<html><body><p>Ulrich, <br><br>Thank you very much for your feedback. <br><br>You wrote, &quot;<tt>Could it be you forgot &quot;allow-migrate=true&quot; at the resource level or some migration IP address at the node level?<br>I only have SLES11 here...</tt>&quot;<br><br>I know for sure that the pacemaker remote node (zs95kjg110102) I mention below is configured correctly for pacemaker Live Guest Migration. <br>I can demonstrate this using the 'pcs resource move' command: <br><br>The &quot;remote node&quot; guest (zs95kjg110102) and its resource &quot;zs95kjg110102_res&quot; are currently running on zs93kjpcs1 (10.20.93.11). <br>I will migrate them to another cluster node, zs95kjpcs1 (10.20.93.12), addressing the target by its 'pcs1' hostname / IP: <br><br>[root@zs95kj ~]# pcs resource show |grep zs95kjg110102_res<br> zs95kjg110102_res      (ocf::heartbeat:VirtualDomain): Started zs93kjpcs1<br><br>[root@zs93kj ~]# pcs resource move zs95kjg110102_res zs95kjpcs1<br><br>[root@zs93kj ~]# pcs resource show |grep zs95kjg110102_res<br> zs95kjg110102_res      (ocf::heartbeat:VirtualDomain): Started zs95kjpcs1<br><br>## On zs95kjpcs1,  you can see that the guest is actually running there...<br><br>[root@zs95kj ~]# virsh list |grep zs95kjg110102<br> 63    zs95kjg110102                  running<br><br>[root@zs95kj ~]# ping 10.20.110.102<br>PING 10.20.110.102 (10.20.110.102) 56(84) bytes of data.<br>64 bytes from 10.20.110.102: icmp_seq=1 ttl=63 time=0.775 ms<br><br>So, everything seems to be set up correctly for live guest migration of this VirtualDomain resource. <br><br>What I am really looking for is a way to ensure 100% availability of a &quot;live guest migratable&quot; pacemaker remote node guest<br>when the ring0_addr interface (in this case vlan1293) goes down.  I thought that maybe configuring<br>Redundant Ring Protocol (RRP) for corosync would provide this, but from what I've seen so far it doesn't<br>look that way.    
If the ring0_addr interface is lost in an RRP configuration while the remote guest is connected<br>to the host over that ring0_addr, the guest gets rebooted and re-establishes the &quot;remote-node-to-host&quot; connection over the ring1_addr, <br>which is fine as long as you don't mind the guest being rebooted.   Corosync is doing its job of preventing the<br>cluster node from being fenced by failing over its heartbeat messaging to ring1. However, the remote_node guests take<br>a short-term hit due to the remote-node-to-host reconnect. <br><br>In the event of a ring0_addr failure, I don't see any attempt by pacemaker to migrate the remote_node to another cluster node, <br>but maybe this is by design, since there is no alternate path for the guest to use for LGM (i.e. ring0 is a single point of failure). <br>If the guest could be migrated over an alternate route, the guest outage would be avoided.<br><br>So my question is: is there any way to facilitate an alternate Live Guest Migration path in the event of a ring0_addr failure? <br>This might apply to a single-ring configuration as well. <br><br>Thanks, <br><br>Scott Greenlese ... 
KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.<br>  INTERNET:  swgreenl@us.ibm.com  <br><br><br><font size="2" color="#5F5F5F">From:        </font><font size="2">&quot;Ulrich Windl&quot; &lt;Ulrich.Windl@rz.uni-regensburg.de&gt;</font><br><font size="2" color="#5F5F5F">To:        </font><font size="2">&lt;users@clusterlabs.org&gt;</font><br><font size="2" color="#5F5F5F">Date:        </font><font size="2">03/02/2017 02:39 AM</font><br><font size="2" color="#5F5F5F">Subject:        </font><font size="2">[ClusterLabs] Antw: Expected recovery behavior of remote-node guest when corosync ring0 is lost in a passive mode RRP config?</font><br><hr width="100%" size="2" align="left" noshade style="color:#8091A5; "><br><br><br><tt>&gt;&gt;&gt; &quot;Scott Greenlese&quot; &lt;swgreenl@us.ibm.com&gt; wrote on 01.03.2017 at 22:07 in<br>message<br>&lt;OFFC50C6DC.1138528D-ON002580D6.006F49AA-852580D6.00740858@notes.na.collabserv.c <br>m&gt;:<br><br>&gt; Hi..<br>&gt; <br>&gt; I am running a few corosync &quot;passive mode&quot; Redundant Ring Protocol (RRP)<br>&gt; failure scenarios, where<br>&gt; my cluster has several remote-node VirtualDomain resources running on each<br>&gt; node in the cluster,<br>&gt; which have been configured to allow Live Guest Migration (LGM) operations.<br>&gt; <br>&gt; While both corosync rings are active, if I drop ring0 on a given node where<br>&gt; I have remote node (guests) running,<br>&gt; I noticed that the guest will be shut down / re-started on the same host,<br>&gt; after 
which the connection is re-established<br>&gt; and the guest proceeds to run on that same cluster node.<br><br>Could it be you forgot &quot;allow-migrate=true&quot; at the resource level or some migration IP address at the node level?<br>I only have SLES11 here...<br><br>&gt; <br>&gt; I am wondering why pacemaker doesn't try to &quot;live&quot; migrate the remote node<br>&gt; (guest) to a different node, instead<br>&gt; of rebooting the guest? &nbsp;Is there some way to configure the remote nodes<br>&gt; such that the recovery action is<br>&gt; LGM instead of reboot when the host-to-remote_node connection is lost in an<br>&gt; RRP situation? &nbsp; I guess the<br>&gt; next question is, is it even possible to LGM a remote node guest if the<br>&gt; corosync ring fails over from ring0 to ring1<br>&gt; (or vice versa)?<br>&gt; <br>&gt; # For example, here's a remote node's VirtualDomain resource definition.<br>&gt; <br>&gt; [root@zs95kj]# pcs resource show &nbsp;zs95kjg110102_res<br>&gt; &nbsp;Resource: zs95kjg110102_res (class=ocf provider=heartbeat<br>&gt; type=VirtualDomain)<br>&gt; &nbsp; Attributes: config=/guestxml/nfs1/zs95kjg110102.xml<br>&gt; hypervisor=qemu:///system migration_transport=ssh<br>&gt; &nbsp; Meta Attrs: allow-migrate=true remote-node=zs95kjg110102<br>&gt; remote-addr=10.20.110.102<br>&gt; &nbsp; Operations: start interval=0s timeout=480<br>&gt; (zs95kjg110102_res-start-interval-0s)<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; stop interval=0s timeout=120<br>&gt; (zs95kjg110102_res-stop-interval-0s)<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; monitor interval=30s (zs95kjg110102_res-monitor-interval-30s)<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; migrate-from interval=0s timeout=1200<br>&gt; (zs95kjg110102_res-migrate-from-interval-0s)<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; migrate-to interval=0s timeout=1200<br>&gt; (zs95kjg110102_res-migrate-to-interval-0s)<br>&gt; [root@zs95kj VD]#<br>&gt; 
<br>&gt; <br>&gt; <br>&gt; <br>&gt; # My RRP rings are active, and configured &quot;rrp_mode=&quot;passive&quot;<br>&gt; <br>&gt; [root@zs95kj ~]# corosync-cfgtool -s<br>&gt; Printing ring status.<br>&gt; Local node ID 2<br>&gt; RING ID 0<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; id &nbsp; &nbsp; &nbsp;= 10.20.93.12<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; status &nbsp;= ring 0 active with no faults<br>&gt; RING ID 1<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; id &nbsp; &nbsp; &nbsp;= 10.20.94.212<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; status &nbsp;= ring 1 active with no faults<br>&gt; <br>&gt; <br>&gt; <br>&gt; # Here's the corosync.conf ..<br>&gt; <br>&gt; [root@zs95kj ~]# cat /etc/corosync/corosync.conf<br>&gt; totem {<br>&gt; &nbsp; &nbsp; version: 2<br>&gt; &nbsp; &nbsp; secauth: off<br>&gt; &nbsp; &nbsp; cluster_name: test_cluster_2<br>&gt; &nbsp; &nbsp; transport: udpu<br>&gt; &nbsp; &nbsp; rrp_mode: passive<br>&gt; }<br>&gt; <br>&gt; nodelist {<br>&gt; &nbsp; &nbsp; node {<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring0_addr: zs95kjpcs1<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring1_addr: zs95kjpcs2<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; nodeid: 2<br>&gt; &nbsp; &nbsp; }<br>&gt; <br>&gt; &nbsp; &nbsp; node {<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring0_addr: zs95KLpcs1<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring1_addr: zs95KLpcs2<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; nodeid: 3<br>&gt; &nbsp; &nbsp; }<br>&gt; <br>&gt; &nbsp; &nbsp; node {<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring0_addr: zs90kppcs1<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring1_addr: zs90kppcs2<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; nodeid: 4<br>&gt; &nbsp; &nbsp; }<br>&gt; <br>&gt; &nbsp; &nbsp; node {<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring0_addr: zs93KLpcs1<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring1_addr: zs93KLpcs2<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; nodeid: 5<br>&gt; &nbsp; &nbsp; }<br>&gt; <br>&gt; &nbsp; &nbsp; node {<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring0_addr: zs93kjpcs1<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; ring1_addr: 
zs93kjpcs2<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; nodeid: 1<br>&gt; &nbsp; &nbsp; }<br>&gt; }<br>&gt; <br>&gt; quorum {<br>&gt; &nbsp; &nbsp; provider: corosync_votequorum<br>&gt; }<br>&gt; <br>&gt; logging {<br>&gt; &nbsp; &nbsp; to_logfile: yes<br>&gt; &nbsp; &nbsp; logfile: /var/log/corosync/corosync.log<br>&gt; &nbsp; &nbsp; timestamp: on<br>&gt; &nbsp; &nbsp; syslog_facility: daemon<br>&gt; &nbsp; &nbsp; to_syslog: yes<br>&gt; &nbsp; &nbsp; debug: on<br>&gt; <br>&gt; &nbsp; &nbsp; logger_subsys {<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; debug: off<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; subsys: QUORUM<br>&gt; &nbsp; &nbsp; }<br>&gt; }<br>&gt; <br>&gt; <br>&gt; <br>&gt; <br>&gt; # Here's the vlan / route situation on cluster node zs95kj:<br>&gt; <br>&gt; ring0 is on vlan1293<br>&gt; ring1 is on vlan1294<br>&gt; <br>&gt; [root@zs95kj ~]# route -n<br>&gt; Kernel IP routing table<br>&gt; Destination &nbsp; &nbsp; Gateway &nbsp; &nbsp; &nbsp; &nbsp; Genmask &nbsp; &nbsp; &nbsp; &nbsp; Flags Metric Ref &nbsp; &nbsp;Use<br>&gt; Iface<br>&gt; 0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 10.20.93.254 &nbsp; &nbsp;0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; UG &nbsp; &nbsp;400 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan1293 &nbsp;&lt;&lt; default route to guests from ring0<br>&gt; 9.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 9.12.23.1 &nbsp; &nbsp; &nbsp; 255.0.0.0 &nbsp; &nbsp; &nbsp; UG &nbsp; &nbsp;400 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan508<br>&gt; 9.12.23.0 &nbsp; &nbsp; &nbsp; 0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.255.0 &nbsp; U &nbsp; &nbsp; 400 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan508<br>&gt; 10.20.92.0 &nbsp; &nbsp; &nbsp;0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.255.0 &nbsp; U &nbsp; &nbsp; 400 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan1292<br>&gt; 10.20.93.0 &nbsp; &nbsp; &nbsp;0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.255.0 &nbsp; U &nbsp; &nbsp; 0 &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan1293 
&nbsp;&lt;&lt; ring0 IPs<br>&gt; 10.20.93.0 &nbsp; &nbsp; &nbsp;0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.255.0 &nbsp; U &nbsp; &nbsp; 400 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan1293<br>&gt; 10.20.94.0 &nbsp; &nbsp; &nbsp;0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.255.0 &nbsp; U &nbsp; &nbsp; 0 &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan1294 &nbsp; &lt;&lt; ring1 IPs<br>&gt; 10.20.94.0 &nbsp; &nbsp; &nbsp;0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.255.0 &nbsp; U &nbsp; &nbsp; 400 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan1294<br>&gt; 10.20.101.0 &nbsp; &nbsp; 0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.255.0 &nbsp; U &nbsp; &nbsp; 400 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan1298<br>&gt; 10.20.109.0 &nbsp; &nbsp; 10.20.94.254 &nbsp; &nbsp;255.255.255.0 &nbsp; UG &nbsp; &nbsp;400 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan1294 &nbsp;&lt;&lt; Route to guests on 10.20.109 from ring1<br>&gt; 10.20.110.0 &nbsp; &nbsp; 10.20.94.254 &nbsp; &nbsp;255.255.255.0 &nbsp; UG &nbsp; &nbsp;400 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; vlan1294 &nbsp;&lt;&lt; Route to guests on 10.20.110 from ring1<br>&gt; 169.254.0.0 &nbsp; &nbsp; 0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.0.0 &nbsp; &nbsp; U &nbsp; &nbsp; 1007 &nbsp; 0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; enccw0.0.02e0<br>&gt; 169.254.0.0 &nbsp; &nbsp; 0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.0.0 &nbsp; &nbsp; U &nbsp; &nbsp; 1016 &nbsp; 0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; ovsbridge1<br>&gt; 192.168.122.0 &nbsp; 0.0.0.0 &nbsp; &nbsp; &nbsp; &nbsp; 255.255.255.0 &nbsp; U &nbsp; &nbsp; 0 &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;0<br>&gt; virbr0<br>&gt; <br>&gt; <br>&gt; <br>&gt; # On remote node, you can see we have a connection back to the host.<br>&gt; <br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; crm_log_init: &nbsp;Changed active directory to 
/var/lib/heartbeat/cores/root<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: lrmd<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; notice:<br>&gt; lrmd_init_remote_tls_server: &nbsp; Starting a tls listener on port 3121.<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; notice:<br>&gt; bind_and_listen: &nbsp; &nbsp; &nbsp; Listening on address ::<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: cib_ro<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: cib_rw<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: cib_shm<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: attrd<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: stonith-ng<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: crmd<br>&gt; Feb 28 14:30:22 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info: main:<br>&gt; Starting<br>&gt; Feb 28 14:30:27 [928] zs95kjg110102 pacemaker_remoted: &nbsp; notice:<br>&gt; lrmd_remote_listen: &nbsp; &nbsp;LRMD client connection established. 
0x9ec18b50 id:<br>&gt; 93e25ef0-4ff8-45ac-a6ed-f13b64588326<br>&gt; <br>&gt; zs95kjg110102:~ # netstat -anp<br>&gt; Active Internet connections (servers and established)<br>&gt; Proto Recv-Q Send-Q Local Address &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Foreign Address &nbsp; &nbsp; &nbsp; &nbsp; State<br>&gt; PID/Program name<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 0.0.0.0:22 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 946/sshd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 127.0.0.1:25 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 1022/master<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 0.0.0.0:5666 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 931/xinetd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 0.0.0.0:5801 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 931/xinetd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 0.0.0.0:5901 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 931/xinetd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::21 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; :::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 926/vsftpd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::22 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; :::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 946/sshd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 ::1:25 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;:::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 
&nbsp;LISTEN<br>&gt; 1022/master<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::44931 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;:::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 1068/xdm<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::80 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; :::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 929/httpd-prefork<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::3121 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; :::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 928/pacemaker_remot<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 10.20.110.102:3121 &nbsp; &nbsp; &nbsp;10.20.93.12:46425<br>&gt; ESTABLISHED 928/pacemaker_remot<br>&gt; udp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::177 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;:::*<br>&gt; 1068/xdm<br>&gt; <br>&gt; <br>&gt; <br>&gt; <br>&gt; ## Drop the ring0 (vlan1293) interface on cluster node zs95kj, causing fail<br>&gt; over to ring1 (vlan1294)<br>&gt; <br>&gt; [root@zs95kj]# date;ifdown vlan1293<br>&gt; Tue Feb 28 15:54:11 EST 2017<br>&gt; Device 'vlan1293' successfully disconnected.<br>&gt; <br>&gt; <br>&gt; <br>&gt; ## Confirm that ring0 is now offline (a.k.a. 
&quot;FAULTY&quot;)<br>&gt; <br>&gt; [root@zs95kj]# date;corosync-cfgtool -s<br>&gt; Tue Feb 28 15:54:49 EST 2017<br>&gt; Printing ring status.<br>&gt; Local node ID 2<br>&gt; RING ID 0<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; id &nbsp; &nbsp; &nbsp;= 10.20.93.12<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; status &nbsp;= Marking ringid 0 interface 10.20.93.12 FAULTY<br>&gt; RING ID 1<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; id &nbsp; &nbsp; &nbsp;= 10.20.94.212<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; status &nbsp;= ring 1 active with no faults<br>&gt; [root@zs95kj VD]#<br>&gt; <br>&gt; <br>&gt; <br>&gt; <br>&gt; # See that the resource stayed local to cluster node zs95kj.<br>&gt; <br>&gt; [root@zs95kj]# date;pcs resource show |grep zs95kjg110102<br>&gt; Tue Feb 28 15:55:32 EST 2017<br>&gt; &nbsp;zs95kjg110102_res &nbsp; &nbsp; &nbsp;(ocf::heartbeat:VirtualDomain): Started zs95kjpcs1<br>&gt; You have new mail in /var/spool/mail/root<br>&gt; <br>&gt; <br>&gt; <br>&gt; # On the remote node, show new entries in pacemaker.log showing connection<br>&gt; re-established.<br>&gt; <br>&gt; Feb 28 15:55:17 [928] zs95kjg110102 pacemaker_remoted: &nbsp; notice:<br>&gt; crm_signal_dispatch: &nbsp; Invoking handler for signal 15: Terminated<br>&gt; Feb 28 15:55:17 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; lrmd_shutdown: Terminating with &nbsp;1 clients<br>&gt; Feb 28 15:55:17 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_withdraw: &nbsp; withdrawing server sockets<br>&gt; Feb 28 15:55:17 [928] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; crm_xml_cleanup: &nbsp; &nbsp; &nbsp; Cleaning up memory from libxml2<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; crm_log_init: &nbsp;Changed active directory to /var/lib/heartbeat/cores/root<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: lrmd<br>&gt; Feb 28 
15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; notice:<br>&gt; lrmd_init_remote_tls_server: &nbsp; Starting a tls listener on port 3121.<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; notice:<br>&gt; bind_and_listen: &nbsp; &nbsp; &nbsp; Listening on address ::<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: cib_ro<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: cib_rw<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: cib_shm<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: attrd<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: stonith-ng<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info:<br>&gt; qb_ipcs_us_publish: &nbsp; &nbsp;server name: crmd<br>&gt; Feb 28 15:55:37 [942] zs95kjg110102 pacemaker_remoted: &nbsp; &nbsp; info: main:<br>&gt; Starting<br>&gt; Feb 28 15:55:38 [942] zs95kjg110102 pacemaker_remoted: &nbsp; notice:<br>&gt; lrmd_remote_listen: &nbsp; &nbsp;LRMD client connection established. 
0xbed1ab50 id:<br>&gt; b19ed532-6f61-4d9c-9439-ffb836eea34f<br>&gt; zs95kjg110102:~ #<br>&gt; <br>&gt; <br>&gt; <br>&gt; zs95kjg110102:~ # netstat -anp |less<br>&gt; Active Internet connections (servers and established)<br>&gt; Proto Recv-Q Send-Q Local Address &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Foreign Address &nbsp; &nbsp; &nbsp; &nbsp; State<br>&gt; PID/Program name<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 0.0.0.0:22 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 961/sshd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 127.0.0.1:25 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 1065/master<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 0.0.0.0:5666 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 946/xinetd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 0.0.0.0:5801 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 946/xinetd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 0.0.0.0:5901 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.0.0.0:* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LISTEN<br>&gt; 946/xinetd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 10.20.110.102:22 &nbsp; &nbsp; &nbsp; &nbsp;10.20.94.32:57749<br>&gt; ESTABLISHED 1134/0<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::21 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; :::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 941/vsftpd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::22 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; :::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 
961/sshd<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 ::1:25 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;:::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 1065/master<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::80 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; :::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 944/httpd-prefork<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::3121 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; :::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 942/pacemaker_remot<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::34836 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;:::* &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LISTEN<br>&gt; 1070/xdm<br>&gt; tcp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 10.20.110.102:3121 &nbsp; &nbsp; &nbsp;10.20.94.212:49666<br>&gt; ESTABLISHED 942/pacemaker_remot<br>&gt; udp &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp;0 :::177 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;:::*<br>&gt; 1070/xdm<br>&gt; <br>&gt; <br>&gt; <br>&gt; ## On host node, zs95kj show system messages indicating remote node (guest)<br>&gt; shutdown / start ... 
&nbsp;(but no attempt to LGM).<br>&gt; <br>&gt; [root@zs95kj ~]# grep &quot;Feb 28&quot; /var/log/messages |grep zs95kjg110102<br>&gt; <br>&gt; Feb 28 15:55:07 zs95kj crmd[121380]: &nbsp; error: Operation<br>&gt; zs95kjg110102_monitor_30000: Timed Out (node=zs95kjpcs1, call=2,<br>&gt; timeout=30000ms)<br>&gt; Feb 28 15:55:07 zs95kj crmd[121380]: &nbsp; error: Unexpected disconnect on<br>&gt; remote-node zs95kjg110102<br>&gt; Feb 28 15:55:17 zs95kj crmd[121380]: &nbsp;notice: Operation<br>&gt; zs95kjg110102_stop_0: ok (node=zs95kjpcs1, call=38, rc=0, cib-update=370,<br>&gt; confirmed=true)<br>&gt; Feb 28 15:55:17 zs95kj attrd[121378]: &nbsp;notice: Removing all zs95kjg110102<br>&gt; attributes for zs95kjpcs1<br>&gt; Feb 28 15:55:17 zs95kj VirtualDomain(zs95kjg110102_res)[173127]: INFO:<br>&gt; Issuing graceful shutdown request for domain zs95kjg110102.<br>&gt; Feb 28 15:55:23 zs95kj systemd-machined: Machine qemu-38-zs95kjg110102<br>&gt; terminated.<br>&gt; Feb 28 15:55:23 zs95kj crmd[121380]: &nbsp;notice: Operation<br>&gt; zs95kjg110102_res_stop_0: ok (node=zs95kjpcs1, call=858, rc=0,<br>&gt; cib-update=378, confirmed=true)<br>&gt; Feb 28 15:55:24 zs95kj systemd-machined: New machine qemu-64-zs95kjg110102.<br>&gt; Feb 28 15:55:24 zs95kj systemd: Started Virtual Machine<br>&gt; qemu-64-zs95kjg110102.<br>&gt; Feb 28 15:55:24 zs95kj systemd: Starting Virtual Machine<br>&gt; qemu-64-zs95kjg110102.<br>&gt; Feb 28 15:55:25 zs95kj crmd[121380]: &nbsp;notice: Operation<br>&gt; zs95kjg110102_res_start_0: ok (node=zs95kjpcs1, call=859, rc=0,<br>&gt; cib-update=385, confirmed=true)<br>&gt; Feb 28 15:55:38 zs95kj crmd[121380]: &nbsp;notice: Operation<br>&gt; zs95kjg110102_start_0: ok (node=zs95kjpcs1, call=44, rc=0, cib-update=387,<br>&gt; confirmed=true)<br>&gt; [root@zs95kj ~]#<br>&gt; <br>&gt; <br>&gt; Once the remote node established re-connection, there was no further remote<br>&gt; node / resource instability.<br>&gt; <br>&gt; Anyway, just wondering why there was no 
attempt to migrate this remote node<br>&gt; guest as opposed to a reboot? &nbsp; Is it necessary to reboot the guest in<br>&gt; order to be managed<br>&gt; by pacemaker and corosync over the ring1 interface if ring0 goes down?<br>&gt; Is live guest migration even possible if ring0 goes away and ring1 takes<br>&gt; over?<br>&gt; <br>&gt; Thanks in advance..<br>&gt; <br>&gt; Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie,<br>&gt; N.Y.<br>&gt; &nbsp; INTERNET: &nbsp;swgreenl@us.ibm.com <br><br></tt><br><br><BR>
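<br>For anyone who wants to script the ring check used in this test, here is a minimal sketch in Python. It only parses text in the same layout as the 'corosync-cfgtool -s' output quoted in this thread and reports which rings are marked FAULTY. The function name faulty_rings and the SAMPLE constant are hypothetical names chosen for illustration; the sketch assumes the status layout shown here and may need adjusting for other corosync versions.<br><br>
<pre>
```python
import re

# Sample text copied from the 'corosync-cfgtool -s' output quoted in this thread.
SAMPLE = """Printing ring status.
Local node ID 2
RING ID 0
        id      = 10.20.93.12
        status  = Marking ringid 0 interface 10.20.93.12 FAULTY
RING ID 1
        id      = 10.20.94.212
        status  = ring 1 active with no faults
"""


def faulty_rings(cfgtool_output):
    """Return {ring_id: address} for each ring whose status line contains FAULTY.

    Hypothetical helper: it assumes the three-line 'RING ID / id / status'
    layout shown in the sample above.
    """
    faulty = {}
    ring_id = None
    address = None
    for raw_line in cfgtool_output.splitlines():
        line = raw_line.strip()
        ring_match = re.match(r"RING ID (\d+)$", line)
        if ring_match:
            # A new ring section starts; forget the previous address.
            ring_id = int(ring_match.group(1))
            address = None
            continue
        id_match = re.match(r"id\s*=\s*(\S+)$", line)
        if id_match:
            address = id_match.group(1)
            continue
        if line.startswith("status") and "FAULTY" in line and ring_id is not None:
            faulty[ring_id] = address
    return faulty


if __name__ == "__main__":
    print(faulty_rings(SAMPLE))
```
</pre>
On a live cluster node the input text could come from running corosync-cfgtool -s through Python's subprocess module instead of the SAMPLE constant; that wiring is left out here because it depends on the local setup.<br>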

</body></html>