<div dir="ltr">Actually, looking at my configs I didn&#39;t see where the stonith RA&#39;s are actually associated with a specific cluster node:<div><br></div><div><div>Stonith Devices:  </div><div> Resource: NFS1 (class=stonith type=fence_xvm)</div><div>  Attributes: key_file=/etc/cluster/fence_xvm_ceph1.key multicast_address=225.0.0.12 port=NFS1</div><div>  Operations: monitor interval=20s (NFS1-monitor-interval-20s)</div><div> Resource: NFS2 (class=stonith type=fence_xvm)</div><div>  Attributes: key_file=/etc/cluster/fence_xvm_ceph2.key multicast_address=225.0.1.12 port=NFS2</div><div>  Operations: monitor interval=20s (NFS2-monitor-interval-20s)</div><div> Resource: NFS3 (class=stonith type=fence_xvm)</div><div>  Attributes: key_file=/etc/cluster/fence_xvm_ceph3.key multicast_address=225.0.2.12 port=NFS3</div><div>  Operations: monitor interval=20s (NFS3-monitor-interval-20s)</div></div><div><br></div><div>My cluster contains nodes: node1, node2, node3. But pacemaker wouldn&#39;t know that guest NFS1 = node1, etc. Looking through the options for fence_xvm I also didn&#39;t see parameters containing cluster node name.</div><div><br></div><div>So I changed the VM names to node[1-3] and its working! Also, before I changed the VM name, I also changed the stonith RA names from NFS[1-3] to node[1-3], doubt that makes any difference as its just a name, but at the least its a logical name now for me.<br></div><div><br></div><div>Now the next issue..</div><div><br></div><div>After stopping the network service on node1, the other two nodes decided they should stonith node1 as expected. Then as I&#39;m watching (watch -n 2 pcs status) on node2 where the services have started up I notice it appears that the services are flapping (log file time 12:09:19 - 12:09:56 matches the video I took here <a href="https://youtu.be/mnCJ9FZqGjA" target="_blank">https://youtu.be/mnCJ9FZqGjA</a>)</div><div><br></div><div>Logs:</div><div><a href="https://dl.dropboxusercontent.com/u/21916057/pacemaker-node1.log">https://dl.dropboxusercontent.com/u/21916057/pacemaker-node1.log</a><br></div><div><a href="https://dl.dropboxusercontent.com/u/21916057/pacemaker-node2.log">https://dl.dropboxusercontent.com/u/21916057/pacemaker-node2.log</a><br></div><div><a href="https://dl.dropboxusercontent.com/u/21916057/pacemaker-node3.log">https://dl.dropboxusercontent.com/u/21916057/pacemaker-node3.log</a><br></div><div><div><br></div></div><div><br></div><div>Then at 12:12:15 (ish) node3 gets fenced (by itself it appears). 
Now the next issue...

After stopping the network service on node1, the other two nodes decided they should stonith node1, as expected. Then, while watching (watch -n 2 pcs status) on node2, where the services had started up, I noticed that the services appear to be flapping (log file time 12:09:19 - 12:09:56 matches the video I took here: https://youtu.be/mnCJ9FZqGjA).

Logs:
https://dl.dropboxusercontent.com/u/21916057/pacemaker-node1.log
https://dl.dropboxusercontent.com/u/21916057/pacemaker-node2.log
https://dl.dropboxusercontent.com/u/21916057/pacemaker-node3.log

Then at 12:12:15 (ish) node3 gets fenced (by itself, it appears). I see this on node3's hypervisor:

Jun  3 12:12:16 ceph3 fence_virtd: Rebooting domain node3

But afterwards the resources are all still running on node2, which I suppose is the 'safe' bet:

# pcs status
Cluster name: nfs
Last updated: Wed Jun  3 13:10:17 2015
Last change: Wed Jun  3 11:32:42 2015
Stack: corosync
Current DC: node2 (2) - partition WITHOUT quorum
Version: 1.1.12-a14efad
3 Nodes configured
9 Resources configured

Online: [ node2 ]
OFFLINE: [ node1 node3 ]

Full list of resources:

 Resource Group: group_rbd_fs_nfs_vip
     rbd_nfs-ha (ocf::ceph:rbd.in):     Started node2
     rbd_home   (ocf::ceph:rbd.in):     Started node2
     fs_nfs-ha  (ocf::heartbeat:Filesystem):    Started node2
     FS_home    (ocf::heartbeat:Filesystem):    Started node2
     nfsserver  (ocf::heartbeat:nfsserver):     Started node2
     vip_nfs_private    (ocf::heartbeat:IPaddr):        Started node2
 node1  (stonith:fence_xvm):    Stopped
 node2  (stonith:fence_xvm):    Started node2
 node3  (stonith:fence_xvm):    Stopped

PCSD Status:
  node1: Online
  node2: Online
  node3: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@node2 ~]# exportfs
/mnt/home       10.0.231.0/255.255.255.0
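The only things I've thought to check so far are the quorum state and the fencing history. I'm assuming something like the following is the right starting point (untested sketch; I'm not sure stonith_admin --history is available in this version):

# quorum state as corosync sees it on node2
corosync-quorumtool -s
# effective no-quorum-policy (I haven't set it explicitly, so it should be the default)
pcs property list --all | grep no-quorum-policy
# last fencing operation recorded for node3, if --history is supported
stonith_admin --history node3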
I realize this is a ton of logs to go through; I'd appreciate it if anyone has the time, as I'm not sure how to troubleshoot why node3 was fenced when everything was working properly with 2/3 nodes.

On Wed, Jun 3, 2015 at 10:38 AM, Ken Gaillot <kgaillot@redhat.com> wrote:

----- Original Message -----
> I've tried configuring without pcmk_host_list as well with the same result.

What do the logs say now?

> Stonith Devices:
>  Resource: NFS1 (class=stonith type=fence_xvm)
>   Attributes: key_file=/etc/cluster/fence_xvm_ceph1.key multicast_address=225.0.0.12 port=NFS1
>   Operations: monitor interval=20s (NFS1-monitor-interval-20s)
>  Resource: NFS2 (class=stonith type=fence_xvm)
>   Attributes: key_file=/etc/cluster/fence_xvm_ceph2.key multicast_address=225.0.1.12 port=NFS2
>   Operations: monitor interval=20s (NFS2-monitor-interval-20s)
>  Resource: NFS3 (class=stonith type=fence_xvm)
>   Attributes: key_file=/etc/cluster/fence_xvm_ceph3.key multicast_address=225.0.2.12 port=NFS3
>   Operations: monitor interval=20s (NFS3-monitor-interval-20s)
>
> I can get the list of VMs from any of the 3 cluster nodes using the multicast address:
>
> # fence_xvm -a 225.0.0.12 -k /etc/cluster/fence_xvm_ceph1.key -o list
> NFS1                 1814d93d-3e40-797f-a3c6-102aaa6a3d01 on
>
> # fence_xvm -a 225.0.1.12 -k /etc/cluster/fence_xvm_ceph2.key -o list
> NFS2                 75ab85fc-40e9-45ae-8b0a-c346d59b24e8 on
>
> # fence_xvm -a 225.0.2.12 -k /etc/cluster/fence_xvm_ceph3.key -o list
> NFS3                 f23cca5d-d50b-46d2-85dd-d8357337fd22 on
>
> On Tue, Jun 2, 2015 at 10:07 AM, Ken Gaillot <kgaillot@redhat.com> wrote:
>
> > On 06/02/2015 11:40 AM, Steve Dainard wrote:
> > > Hello,
> > >
> > > I have 3 CentOS7 guests running on 3 CentOS7 hypervisors and I can't get stonith operations to work.
> > >
> > > Config:
> > >
> > > Cluster Name: nfs
> > > Corosync Nodes:
> > >  node1 node2 node3
> > > Pacemaker Nodes:
> > >  node1 node2 node3
> > >
> > > Resources:
> > >  Group: group_rbd_fs_nfs_vip
> > >   Resource: rbd_nfs-ha (class=ocf provider=ceph type=rbd.in)
> > >    Attributes: user=admin pool=rbd name=nfs-ha cephconf=/etc/ceph/ceph.conf
> > >    Operations: start interval=0s timeout=20 (rbd_nfs-ha-start-timeout-20)
> > >                stop interval=0s timeout=20 (rbd_nfs-ha-stop-timeout-20)
> > >                monitor interval=10s timeout=20s (rbd_nfs-ha-monitor-interval-10s)
> > >   Resource: rbd_home (class=ocf provider=ceph type=rbd.in)
> > >    Attributes: user=admin pool=rbd name=home cephconf=/etc/ceph/ceph.conf
> > >    Operations: start interval=0s timeout=20 (rbd_home-start-timeout-20)
> > >                stop interval=0s timeout=20 (rbd_home-stop-timeout-20)
> > >                monitor interval=10s timeout=20s (rbd_home-monitor-interval-10s)
> > >   Resource: fs_nfs-ha (class=ocf provider=heartbeat type=Filesystem)
> > >    Attributes: directory=/mnt/nfs-ha fstype=btrfs device=/dev/rbd/rbd/nfs-ha fast_stop=no
> > >    Operations: monitor interval=20s timeout=40s (fs_nfs-ha-monitor-interval-20s)
> > >                start interval=0 timeout=60s (fs_nfs-ha-start-interval-0)
> > >                stop interval=0 timeout=60s (fs_nfs-ha-stop-interval-0)
> > >   Resource: FS_home (class=ocf provider=heartbeat type=Filesystem)
> > >    Attributes: directory=/mnt/home fstype=btrfs device=/dev/rbd/rbd/home options=rw,compress-force=lzo fast_stop=no
> > >    Operations: monitor interval=20s timeout=40s (FS_home-monitor-interval-20s)
> > >                start interval=0 timeout=60s (FS_home-start-interval-0)
> > >                stop interval=0 timeout=60s (FS_home-stop-interval-0)
> > >   Resource: nfsserver (class=ocf provider=heartbeat type=nfsserver)
> > >    Attributes: nfs_shared_infodir=/mnt/nfs-ha
> > >    Operations: stop interval=0s timeout=20s (nfsserver-stop-timeout-20s)
> > >                monitor interval=10s timeout=20s (nfsserver-monitor-interval-10s)
> > >                start interval=0 timeout=40s (nfsserver-start-interval-0)
> > >   Resource: vip_nfs_private (class=ocf provider=heartbeat type=IPaddr)
> > >    Attributes: ip=10.0.231.49 cidr_netmask=24
> > >    Operations: start interval=0s timeout=20s (vip_nfs_private-start-timeout-20s)
> > >                stop interval=0s timeout=20s (vip_nfs_private-stop-timeout-20s)
> > >                monitor interval=5 (vip_nfs_private-monitor-interval-5)
> > >
> > > Stonith Devices:
> > >  Resource: NFS1 (class=stonith type=fence_xvm)
> > >   Attributes: pcmk_host_list=10.0.231.50 key_file=/etc/cluster/fence_xvm_ceph1.key multicast_address=225.0.0.12 port=NFS1
> > >   Operations: monitor interval=20s (NFS1-monitor-interval-20s)
> > >  Resource: NFS2 (class=stonith type=fence_xvm)
> > >   Attributes: pcmk_host_list=10.0.231.51 key_file=/etc/cluster/fence_xvm_ceph2.key multicast_address=225.0.1.12 port=NFS2
> > >   Operations: monitor interval=20s (NFS2-monitor-interval-20s)
> > >  Resource: NFS3 (class=stonith type=fence_xvm)
> > >   Attributes: pcmk_host_list=10.0.231.52 key_file=/etc/cluster/fence_xvm_ceph3.key multicast_address=225.0.2.12 port=NFS3
> >
> > I think pcmk_host_list should have the node name rather than the IP address. If fence_xvm -o list -a whatever shows the right nodes to fence, you don't even need to set pcmk_host_list.
> >
> > >   Operations: monitor interval=20s (NFS3-monitor-interval-20s)
> > > Fencing Levels:
> > >
> > > Location Constraints:
> > >   Resource: NFS1
> > >     Enabled on: node1 (score:1) (id:location-NFS1-node1-1)
> > >     Enabled on: node2 (score:1000) (id:location-NFS1-node2-1000)
> > >     Enabled on: node3 (score:500) (id:location-NFS1-node3-500)
> > >   Resource: NFS2
> > >     Enabled on: node2 (score:1) (id:location-NFS2-node2-1)
> > >     Enabled on: node3 (score:1000) (id:location-NFS2-node3-1000)
> > >     Enabled on: node1 (score:500) (id:location-NFS2-node1-500)
> > >   Resource: NFS3
> > >     Enabled on: node3 (score:1) (id:location-NFS3-node3-1)
> > >     Enabled on: node1 (score:1000) (id:location-NFS3-node1-1000)
> > >     Enabled on: node2 (score:500) (id:location-NFS3-node2-500)
> > > Ordering Constraints:
> > > Colocation Constraints:
> > >
> > > Cluster Properties:
> > >  cluster-infrastructure: corosync
> > >  cluster-name: nfs
> > >  dc-version: 1.1.12-a14efad
> > >  have-watchdog: false
> > >  stonith-enabled: true
> > >
> > > When I stop networking services on node1 (stonith resource NFS1) I see logs on the other two cluster nodes attempting to reboot the vm NFS1 without success.
> > >
> > > Logs:
> > >
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:   notice: LogActions:      Move    rbd_nfs-ha      (Started node1 -> node2)
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:   notice: LogActions:      Move    rbd_home        (Started node1 -> node2)
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:   notice: LogActions:      Move    fs_nfs-ha       (Started node1 -> node2)
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:   notice: LogActions:      Move    FS_home (Started node1 -> node2)
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:   notice: LogActions:      Move    nfsserver       (Started node1 -> node2)
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:   notice: LogActions:      Move    vip_nfs_private (Started node1 -> node2)
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:     info: LogActions:      Leave   NFS1    (Started node2)
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:     info: LogActions:      Leave   NFS2    (Started node3)
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:   notice: LogActions:      Move    NFS3    (Started node1 -> node2)
> > > Jun 01 15:38:17 [2130] nfs3.pcic.uvic.ca    pengine:  warning: process_pe_message:      Calculated Transition 8: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> > > Jun 01 15:38:17 [2131] nfs3.pcic.uvic.ca       crmd:     info: do_state_transition:     State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > > Jun 01 15:38:17 [2131] nfs3.pcic.uvic.ca       crmd:     info: do_te_invoke:    Processing graph 8 (ref=pe_calc-dc-1433198297-78) derived from /var/lib/pacemaker/pengine/pe-warn-0.bz2
> > > Jun 01 15:38:17 [2131] nfs3.pcic.uvic.ca       crmd:   notice: te_fence_node:   Executing reboot fencing operation (37) on node1 (timeout=60000)
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:   notice: handle_request:  Client crmd.2131.f7e79b61 wants to fence (reboot) 'node1' with device '(any)'
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:   notice: initiate_remote_stonith_op:      Initiating remote operation reboot for node1: a22a16f3-b699-453e-a090-43a640dd0e3f (0)
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:   notice: can_fence_host_with_device:      NFS1 can not fence (reboot) node1: static-list
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:   notice: can_fence_host_with_device:      NFS2 can not fence (reboot) node1: static-list
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:   notice: can_fence_host_with_device:      NFS3 can not fence (reboot) node1: static-list
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:     info: process_remote_stonith_query:    All queries have arrived, continuing (2, 2, 2, a22a16f3-b699-453e-a090-43a640dd0e3f)
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:   notice: stonith_choose_peer:     Couldn't find anyone to fence node1 with <any>
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:     info: call_remote_stonith:     Total remote op timeout set to 60 for fencing of node node1 for crmd.2131.a22a16f3
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:     info: call_remote_stonith:     None of the 2 peers have devices capable of terminating node1 for crmd.2131 (0)
> > > Jun 01 15:38:17 [2127] nfs3.pcic.uvic.ca stonith-ng:    error: remote_op_done:  Operation reboot of node1 by <no-one> for crmd.2131@node3.a22a16f3: No such device
> > > Jun 01 15:38:17 [2131] nfs3.pcic.uvic.ca       crmd:   notice: tengine_stonith_callback:        Stonith operation 2/37:8:0:241ee032-f3a1-4c2b-8427-63af83b54343: No such device (-19)
> > > Jun 01 15:38:17 [2131] nfs3.pcic.uvic.ca       crmd:   notice: tengine_stonith_callback:        Stonith operation 2 for node1 failed (No such device): aborting transition.
> > > Jun 01 15:38:17 [2131] nfs3.pcic.uvic.ca       crmd:   notice: abort_transition_graph:  Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
> > > Jun 01 15:38:17 [2131] nfs3.pcic.uvic.ca       crmd:   notice: tengine_stonith_notify:  Peer node1 was not terminated (reboot) by <anyone> for node3: No such device (ref=a22a16f3-b699-453e-a090-43a640dd0e3f) by client crmd.2131
> > > Jun 01 15:38:17 [2131] nfs3.pcic.uvic.ca       crmd:   notice: run_graph:     Transition 8 (Complete=1, Pending=0, Fired=0, Skipped=27, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-0.bz2): Stopped
> > > Jun 01 15:38:17 [2131] nfs3.pcic.uvic.ca       crmd:   notice: too_many_st_failures:    No devices found in cluster to fence node1, giving up
> > >
> > > I can manually fence a guest without any issue:
> > > # fence_xvm -a 225.0.0.12 -k /etc/cluster/fence_xvm_ceph1.key -o reboot -H NFS1
> > >
> > > But the cluster doesn't recover resources to another host:
> >
> > The cluster doesn't know that the manual fencing succeeded, so it plays it safe by not moving resources. If you fix the cluster fencing issue, I'd expect this to work.
> >
> > > # pcs status   <-- after manual fencing
> > > Cluster name: nfs
> > > Last updated: Tue Jun  2 08:34:18 2015
> > > Last change: Mon Jun  1 16:02:58 2015
> > > Stack: corosync
> > > Current DC: node3 (3) - partition with quorum
> > > Version: 1.1.12-a14efad
> > > 3 Nodes configured
> > > 9 Resources configured
> > >
> > >
> > > Node node1 (1): UNCLEAN (offline)
> > > Online: [ node2 node3 ]
> > >
> > > Full list of resources:
> > >
> > >  Resource Group: group_rbd_fs_nfs_vip
> > >      rbd_nfs-ha (ocf::ceph:rbd.in):     Started node1
> > >      rbd_home   (ocf::ceph:rbd.in):     Started node1
> > >      fs_nfs-ha  (ocf::heartbeat:Filesystem):    Started node1
> > >      FS_home    (ocf::heartbeat:Filesystem):    Started node1
> > >      nfsserver  (ocf::heartbeat:nfsserver):     Started node1
> > >      vip_nfs_private    (ocf::heartbeat:IPaddr):        Started node1
> > >  NFS1   (stonith:fence_xvm):    Started node2
> > >  NFS2   (stonith:fence_xvm):    Started node3
> > >  NFS3   (stonith:fence_xvm):    Started node1
> > >
> > > PCSD Status:
> > >   node1: Online
> > >   node2: Online
> > >   node3: Online
> > >
> > > Daemon Status:
> > >   corosync: active/disabled
> > >   pacemaker: active/disabled
> > >   pcsd: active/enabled
> > >
> > > Fence_virtd config on one of the hypervisors:
> > > # cat fence_virt.conf
> > > backends {
> > >         libvirt {
> > >                 uri = "qemu:///system";
> > >         }
> > >
> > > }
> > >
> > > listeners {
> > >         multicast {
> > >                 port = "1229";
> > >                 family = "ipv4";
> > >                 interface = "br1";
> > >                 address = "225.0.0.12";
> > >                 key_file = "/etc/cluster/fence_xvm_ceph1.key";
> > >         }
> > >
> > > }
> > >
> > > fence_virtd {
> > >         module_path = "/usr/lib64/fence-virt";
> > >         backend = "libvirt";
> > >         listener = "multicast";
> > > }
> >
> >
> > _______________________________________________
> > Users mailing list: Users@clusterlabs.org
> > http://clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
>

--
-- Ken Gaillot <kgaillot@redhat.com>