On Tue, Jun 28, 2016 at 11:51 AM, Klaus Wenninger <kwenning@redhat.com> wrote:
> On 06/28/2016 11:24 AM, Marcin Dulak wrote:
> >
> >
> > On Tue, Jun 28, 2016 at 5:04 AM, Andrew Beekhof <abeekhof@redhat.com> wrote:
> >
> >     On Sun, Jun 26, 2016 at 6:05 AM, Marcin Dulak <marcin.dulak@gmail.com> wrote:
> >     > Hi,
> >     >
> >     > I'm trying to get familiar with STONITH Block Devices (SBD) on a 3-node
> >     > CentOS7 cluster built in VirtualBox.
> >     > The complete setup is available at
> >     > https://github.com/marcindulak/vagrant-sbd-tutorial-centos7.git
> >     > so hopefully with some help I'll be able to make it work.
> >     >
> >     > Question 1:
> >     > The shared device /dev/sdb1 is a VirtualBox "shareable hard disk"
> >     > (https://www.virtualbox.org/manual/ch05.html#hdimagewrites);
> >     > will SBD fencing work with that type of storage?
> >
> >     unknown
> >
> >     >
> >     > I start the cluster using vagrant_1.8.1 and virtualbox-4.3 with:
> >     > $ vagrant up  # takes ~15 minutes
> >     >
> >     > The setup brings up the nodes, installs the necessary packages, and
> >     > prepares for the configuration of the pcs cluster.
> >     > You can see which scripts the nodes execute at the bottom of the Vagrantfile.
> >     > While there is 'yum -y install sbd' on CentOS7, the fence_sbd agent has
> >     > not been packaged yet.
> >
> >     you're not supposed to use it
> >
> >     > Therefore I rebuilt the Fedora 24 package using the latest
> >     > https://github.com/ClusterLabs/fence-agents/archive/v4.0.22.tar.gz
> >     > plus the update to fence_sbd from
> >     > https://github.com/ClusterLabs/fence-agents/pull/73
> >     >
> >     > The configuration is inspired by
> >     > https://www.novell.com/support/kb/doc.php?id=7009485 and
> >     > https://www.suse.com/documentation/sle-ha-12/book_sleha/data/sec_ha_storage_protect_fencing.html
> >     >
> >     > Question 2:
> >     > After reading http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit
> >     > I expect that, with just one stonith resource configured,
> >
> >     there shouldn't be any stonith resources configured
> >
> >
> > It's a test setup.
> > Found https://www.suse.com/documentation/sle-ha-12/book_sleha/data/sec_ha_storage_protect_fencing.html
> >
> > crm configure
> > property stonith-enabled="true"
> > property stonith-timeout="40s"
> > primitive stonith_sbd stonith:external/sbd op start interval="0" timeout="15" start-delay="10"
> > commit
> > quit
>
> For what is supported (self-fencing via watchdog) a stonith resource is simply
> not needed, because sbd and pacemaker interact via the CIB.
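
For reference, a purely watchdog-based (diskless) setup on CentOS 7 needs no
stonith resource at all; a minimal sketch, assuming the stock sbd package and
a working /dev/watchdog device:

  # /etc/sysconfig/sbd  (leave SBD_DEVICE unset for watchdog-only operation)
  SBD_WATCHDOG_DEV=/dev/watchdog
  SBD_WATCHDOG_TIMEOUT=5

  # enable sbd so corosync/pacemaker pull it in at cluster start, then tell
  # pacemaker to rely on watchdog self-fencing
  systemctl enable sbd
  pcs property set stonith-enabled=true
  pcs property set stonith-watchdog-timeout=10s

This corresponds to the 'have-watchdog: true' property and the "Relying on
watchdog integration for fencing" messages that show up in the transcript
further down.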
>
> >
> >
> > and I am trying to configure CentOS7 similarly.
> >
> >
> >
> >     > a node will be fenced when I stop pacemaker and corosync with `pcs
> >     > cluster stop node-1`, or just run `stonith_admin -F node-1`, but this
> >     > is not the case.
> >     >
> >     > As can be seen below from uptime, node-1 is not shut down by `pcs
> >     > cluster stop node-1` executed on itself.
> >     > I found some discussions on users@clusterlabs.org about whether a node
> >     > running the SBD resource can fence itself,
> >     > but the conclusion was not clear to me.
> >
> >     on RHEL and derivatives it can ONLY fence itself. the disk-based
> >     poison pill isn't supported yet
> >
> >
> > once it's supported on RHEL I'll be ready :)
>
> Note that "not supported" in this case doesn't (just) mean that you will
> receive very limited help, if any; it means that sbd is built with
> "--disable-shared-disk". So unless you rebuild the package accordingly
> (making it the other kind of unsupported then ;-) ), testing with a
> block device won't make much sense, I guess.

I also rebuilt sbd:
https://github.com/marcindulak/vagrant-sbd-tutorial-centos7/blob/master/sbd_build.sh
because I noticed that for the same version, sbd-1.2.1, as CentOS7 ships,
Fedora no longer uses --disable-shared-disk:
http://pkgs.fedoraproject.org/cgit/rpms/sbd.git/tree/sbd.spec
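
The rebuild itself boils down to roughly the following; this is only a sketch
(the exact steps are in the sbd_build.sh script linked above, and the sed line
assumes the spec file passes --disable-shared-disk to %configure):

  # grab the sbd source RPM (e.g. from vault.centos.org) and unpack it
  rpm -ivh sbd-1.2.1-*.src.rpm
  # drop the flag that disables shared-disk (poison-pill) support
  sed -i 's/ --disable-shared-disk//' ~/rpmbuild/SPECS/sbd.spec
  yum-builddep -y ~/rpmbuild/SPECS/sbd.spec
  rpmbuild -ba ~/rpmbuild/SPECS/sbd.spec
  yum -y install ~/rpmbuild/RPMS/x86_64/sbd-1.2.1-*.rpm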
Marcin

> I'm already a little surprised that you get what you get ;-)
> >
> >
> >     >
> >     > Question 3:
> >     > Neither is node-1 fenced by `stonith_admin -F node-1` executed on
> >     > node-2, despite the fact that /var/log/messages on node-2 (the node
> >     > currently running MyStonith) reports:
> >     > ...
> >     > notice: Operation 'off' [3309] (call 2 from stonith_admin.3288)
> >     > for host 'node-1' with device 'MyStonith' returned: 0 (OK)
> >     > ...
> >     > What is happening here?
> >
> >     have you tried looking at the sbd logs?
> >     is the watchdog device functioning correctly?
> >
> >
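
A quick way to check both on each node (a sketch, assuming the watchdog is
exposed as /dev/watchdog):

  ls -l /dev/watchdog          # is a watchdog device node present?
  wdctl                        # watchdog driver, state and timeout (util-linux)
  systemctl status sbd         # is the sbd daemon actually running?
  journalctl -b | grep -i sbd  # sbd logs via syslog/journal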
> > It turned out (as suggested here:
> > http://clusterlabs.org/pipermail/users/2016-June/003355.html) that the
> > reason for node-1 not being fenced by `stonith_admin -F node-1` executed
> > on node-2 was the previously executed `pcs cluster stop node-1`. In my
> > setup SBD seems integrated with corosync/pacemaker, and the latter command
> > stopped the sbd service on node-1.
> > Killing corosync on node-1 instead of `pcs cluster stop node-1` fences
> > node-1 as expected:
> >
> > [root@node-1 ~]# killall -15 corosync
> > Broadcast message from systemd-journald@node-1 (Sat 2016-06-25 21:55:07 EDT):
> > sbd[4761]:  /dev/sdb1:    emerg: do_exit: Rebooting system: off
> >
> > I'm left with further questions: how to set up fence_sbd so that the fenced
> > node shuts down instead of rebooting?
> > Both action=off and mode=onoff action=off, passed to fence_sbd when creating
> > the MyStonith resource, result in a reboot.
> >
> > [root@node-2 ~]# pcs stonith show MyStonith
> >  Resource: MyStonith (class=stonith type=fence_sbd)
> >   Attributes: devices=/dev/sdb1 power_timeout=21 action=off
> >   Operations: monitor interval=60s (MyStonith-monitor-interval-60s)
> >
> > [root@node-2 ~]# pcs status
> > Cluster name: mycluster
> > Last updated: Tue Jun 28 04:55:43 2016        Last change: Tue Jun 28 04:48:03 2016 by root via cibadmin on node-1
> > Stack: corosync
> > Current DC: node-3 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
> > 3 nodes and 1 resource configured
> >
> > Online: [ node-1 node-2 node-3 ]
> >
> > Full list of resources:
> >
> >  MyStonith    (stonith:fence_sbd):    Started node-2
> >
> > PCSD Status:
> >   node-1: Online
> >   node-2: Online
> >   node-3: Online
> >
> > Daemon Status:
> >   corosync: active/disabled
> >   pacemaker: active/disabled
> >   pcsd: active/enabled
> >
> > Starting from the above cluster state:
> > [root@node-2 ~]# stonith_admin -F node-1
> > also results in a reboot of node-1 instead of a shutdown.
> >
> > /var/log/messages after the last command shows "reboot" on node-2:
> > ...
> > Jun 28 04:49:39 localhost stonith-ng[3081]:  notice: Client stonith_admin.3179.fbc038ee wants to fence (off) 'node-1' with device '(any)'
> > Jun 28 04:49:39 localhost stonith-ng[3081]:  notice: Initiating remote operation off for node-1: 8aea4f12-538d-41ab-bf20-0c8b0f72e2a3 (0)
> > Jun 28 04:49:39 localhost stonith-ng[3081]:  notice: watchdog can not fence (off) node-1: static-list
> > Jun 28 04:49:40 localhost stonith-ng[3081]:  notice: MyStonith can fence (off) node-1: dynamic-list
> > Jun 28 04:49:40 localhost stonith-ng[3081]:  notice: watchdog can not fence (off) node-1: static-list
> > Jun 28 04:49:44 localhost stonith-ng[3081]:  notice: crm_update_peer_proc: Node node-1[1] - state is now lost (was member)
> > Jun 28 04:49:44 localhost stonith-ng[3081]:  notice: Removing node-1/1 from the membership list
> > Jun 28 04:49:44 localhost stonith-ng[3081]:  notice: Purged 1 peers with id=1 and/or uname=node-1 from the membership cache
> > Jun 28 04:49:45 localhost stonith-ng[3081]:  notice: MyStonith can fence (reboot) node-1: dynamic-list
> > Jun 28 04:49:45 localhost stonith-ng[3081]:  notice: watchdog can not fence (reboot) node-1: static-list
> > Jun 28 04:49:46 localhost stonith-ng[3081]:  notice: Operation reboot of node-1 by node-3 for crmd.3063@node-3.36859c4e: OK
> > Jun 28 04:50:00 localhost stonith-ng[3081]:  notice: Operation 'off' [3200] (call 2 from stonith_admin.3179) for host 'node-1' with device 'MyStonith' returned: 0 (OK)
> > Jun 28 04:50:00 localhost stonith-ng[3081]:  notice: Operation off of node-1 by node-2 for stonith_admin.3179@node-2.8aea4f12: OK
> > ...
> >
> >
> > Another question (I think this one is also valid for a potential SUSE setup):
> > what is the proper way of operating a cluster with SBD after node-1 was fenced?
> >
> > [root@node-2 ~]# sbd -d /dev/sdb1 list
> > 0    node-3    clear
> > 1    node-2    clear
> > 2    node-1    off    node-2
> >
> > I found that executing `sbd watch` on node-1 clears the SBD status:
> > [root@node-1 ~]# sbd -d /dev/sdb1 watch
> > [root@node-1 ~]# sbd -d /dev/sdb1 list
> > 0    node-3    clear
> > 1    node-2    clear
> > 2    node-1    clear
> > Then I make sure that sbd is not running on node-1 (I can do that because
> > node-1 is currently not a part of the cluster):
> > [root@node-1 ~]# killall -15 sbd
> > I have to kill sbd because it's integrated with corosync, and corosync
> > fails to start on node-1 with sbd already running.
> >
> > I can now join node-1 to the cluster from node-2:
> > [root@node-2 ~]# pcs cluster start node-1
> >
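
(Side note: instead of starting `sbd watch` only to clear the slot, the slot
can also be reset explicitly from any node that sees the device, e.g.
`sbd -d /dev/sdb1 message node-1 clear`; untested in this particular setup.)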
> >
> > Marcin
> >
> >
> >     >
> >     > Question 4 (for the future):
> >     > Assuming node-1 was fenced, what is the way of operating SBD?
> >     > I see that sbd now lists:
> >     > 0       node-3  clear
> >     > 1       node-1  off    node-2
> >     > 2       node-2  clear
> >     > How to clear the status of node-1?
> >     >
> >     > Question 5 (also for the future):
> >     > While the relation 'stonith-timeout = Timeout (msgwait) + 20%' presented at
> >     > https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_storage_protect_fencing.html
> >     > is clearly described, I wonder about the relation of 'stonith-timeout'
> >     > to other timeouts like the 'monitor interval=60s' reported by
> >     > `pcs stonith show MyStonith`.
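
(For the numbers used in this setup: the sbd header dumped below has
Timeout (msgwait) = 20 s, so stonith-timeout = 20 s * 1.2 = 24 s, which is the
value set later with `pcs property set stonith-timeout=24s`. The monitor
interval=60s only controls how often pacemaker re-checks the health of the
MyStonith device; it is separate from the timeout applied to an actual
fencing action.)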
> >     >
> >     > Here is how I configure the cluster and test it. The run.sh script
> >     > is attached.
> >     >
> >     > $ sh -x run01.sh 2>&1 | tee run01.txt
> >     >
> >     > with the result:
> >     >
> >     > $ cat run01.txt
> >     >
> >     > Each block below shows the executed ssh command and the result.
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'pcs cluster auth -u hacluster -p password node-1 node-2 node-3'
> >     > node-1: Authorized
> >     > node-3: Authorized
> >     > node-2: Authorized
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'pcs cluster setup --name mycluster node-1 node-2 node-3'
> >     > Shutting down pacemaker/corosync services...
> >     > Redirecting to /bin/systemctl stop  pacemaker.service
> >     > Redirecting to /bin/systemctl stop  corosync.service
> >     > Killing any remaining services...
> >     > Removing all cluster configuration files...
> >     > node-1: Succeeded
> >     > node-2: Succeeded
> >     > node-3: Succeeded
> >     > Synchronizing pcsd certificates on nodes node-1, node-2, node-3...
> >     > node-1: Success
> >     > node-3: Success
> >     > node-2: Success
> >     > Restaring pcsd on the nodes in order to reload the certificates...
> >     > node-1: Success
> >     > node-3: Success
> >     > node-2: Success
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'pcs cluster start --all'
> >     > node-3: Starting Cluster...
> >     > node-2: Starting Cluster...
> >     > node-1: Starting Cluster...
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'corosync-cfgtool -s'
> >     > Printing ring status.
> >     > Local node ID 1
> >     > RING ID 0
> >     >     id    = 192.168.10.11
> >     >     status    = ring 0 active with no faults
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'pcs status corosync'
> >     > Membership information
> >     > ----------------------
> >     >     Nodeid      Votes Name
> >     >          1          1 node-1 (local)
> >     >          2          1 node-2
> >     >          3          1 node-3
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'pcs status'
> >     > Cluster name: mycluster
> >     > WARNING: no stonith devices and stonith-enabled is not false
> >     > Last updated: Sat Jun 25 15:40:51 2016        Last change: Sat Jun 25 15:40:33 2016 by hacluster via crmd on node-2
> >     > Stack: corosync
> >     > Current DC: node-2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
> >     > 3 nodes and 0 resources configured
> >     > Online: [ node-1 node-2 node-3 ]
> >     > Full list of resources:
> >     > PCSD Status:
> >     >   node-1: Online
> >     >   node-2: Online
> >     >   node-3: Online
> >     > Daemon Status:
> >     >   corosync: active/disabled
> >     >   pacemaker: active/disabled
> >     >   pcsd: active/enabled
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'sbd -d /dev/sdb1 list'
> >     > 0    node-3    clear
> >     > 1    node-2    clear
> >     > 2    node-1    clear
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'sbd -d /dev/sdb1 dump'
> >     > ==Dumping header on disk /dev/sdb1
> >     > Header version     : 2.1
> >     > UUID               : 79f28167-a207-4f2a-a723-aa1c00bf1dee
> >     > Number of slots    : 255
> >     > Sector size        : 512
> >     > Timeout (watchdog) : 10
> >     > Timeout (allocate) : 2
> >     > Timeout (loop)     : 1
> >     > Timeout (msgwait)  : 20
> >     > ==Header on disk /dev/sdb1 is dumped
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'pcs stonith list'
> >     > fence_sbd - Fence agent for sbd
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'pcs stonith create MyStonith fence_sbd devices=/dev/sdb1 power_timeout=21 action=off'
> >     > ssh node-1 -c sudo su - -c 'pcs property set stonith-enabled=true'
> >     > ssh node-1 -c sudo su - -c 'pcs property set stonith-timeout=24s'
> >     > ssh node-1 -c sudo su - -c 'pcs property'
> >     > Cluster Properties:
> >     >  cluster-infrastructure: corosync
> >     >  cluster-name: mycluster
> >     >  dc-version: 1.1.13-10.el7_2.2-44eb2dd
> >     >  have-watchdog: true
> >     >  stonith-enabled: true
> >     >  stonith-timeout: 24s
> >     >  stonith-watchdog-timeout: 10s
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'pcs stonith show MyStonith'
> >     >  Resource: MyStonith (class=stonith type=fence_sbd)
> >     >   Attributes: devices=/dev/sdb1 power_timeout=21 action=off
> >     >   Operations: monitor interval=60s (MyStonith-monitor-interval-60s)
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'pcs cluster stop node-1'
> >     > node-1: Stopping Cluster (pacemaker)...
> >     > node-1: Stopping Cluster (corosync)...
> >     >
> >     >
> >     > ############################
> >     > ssh node-2 -c sudo su - -c 'pcs status'
> >     > Cluster name: mycluster
> >     > Last updated: Sat Jun 25 15:42:29 2016        Last change: Sat Jun 25 15:41:09 2016 by root via cibadmin on node-1
> >     > Stack: corosync
> >     > Current DC: node-2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
> >     > 3 nodes and 1 resource configured
> >     > Online: [ node-2 node-3 ]
> >     > OFFLINE: [ node-1 ]
> >     > Full list of resources:
> >     >  MyStonith    (stonith:fence_sbd):    Started node-2
> >     > PCSD Status:
> >     >   node-1: Online
> >     >   node-2: Online
> >     >   node-3: Online
> >     > Daemon Status:
> >     >   corosync: active/disabled
> >     >   pacemaker: active/disabled
> >     >   pcsd: active/enabled
> >     >
> >     >
> >     > ############################
> >     > ssh node-2 -c sudo su - -c 'stonith_admin -F node-1'
> >     >
> >     >
> >     > ############################
> >     > ssh node-2 -c sudo su - -c 'grep stonith-ng /var/log/messages'
> >     > Jun 25 15:40:11 localhost stonith-ng[3102]:  notice: Additional logging available in /var/log/cluster/corosync.log
> >     > Jun 25 15:40:11 localhost stonith-ng[3102]:  notice: Connecting to cluster infrastructure: corosync
> >     > Jun 25 15:40:11 localhost stonith-ng[3102]:  notice: crm_update_peer_proc: Node node-2[2] - state is now member (was (null))
> >     > Jun 25 15:40:12 localhost stonith-ng[3102]:  notice: Watching for stonith topology changes
> >     > Jun 25 15:40:12 localhost stonith-ng[3102]:  notice: Added 'watchdog' to the device list (1 active devices)
> >     > Jun 25 15:40:12 localhost stonith-ng[3102]:  notice: crm_update_peer_proc: Node node-3[3] - state is now member (was (null))
> >     > Jun 25 15:40:12 localhost stonith-ng[3102]:  notice: crm_update_peer_proc: Node node-1[1] - state is now member (was (null))
> >     > Jun 25 15:40:12 localhost stonith-ng[3102]:  notice: New watchdog timeout 10s (was 0s)
> >     > Jun 25 15:41:03 localhost stonith-ng[3102]:  notice: Relying on watchdog integration for fencing
> >     > Jun 25 15:41:04 localhost stonith-ng[3102]:  notice: Added 'MyStonith' to the device list (2 active devices)
> >     > Jun 25 15:41:54 localhost stonith-ng[3102]:  notice: crm_update_peer_proc: Node node-1[1] - state is now lost (was member)
> >     > Jun 25 15:41:54 localhost stonith-ng[3102]:  notice: Removing node-1/1 from the membership list
> >     > Jun 25 15:41:54 localhost stonith-ng[3102]:  notice: Purged 1 peers with id=1 and/or uname=node-1 from the membership cache
> >     > Jun 25 15:42:33 localhost stonith-ng[3102]:  notice: Client stonith_admin.3288.eb400ac9 wants to fence (off) 'node-1' with device '(any)'
> >     > Jun 25 15:42:33 localhost stonith-ng[3102]:  notice: Initiating remote operation off for node-1: 848cd1e9-55e4-4abc-8d7a-3762eaaf9ab4 (0)
> >     > Jun 25 15:42:33 localhost stonith-ng[3102]:  notice: watchdog can not fence (off) node-1: static-list
> >     > Jun 25 15:42:33 localhost stonith-ng[3102]:  notice: MyStonith can fence (off) node-1: dynamic-list
> >     > Jun 25 15:42:33 localhost stonith-ng[3102]:  notice: watchdog can not fence (off) node-1: static-list
> >     > Jun 25 15:42:54 localhost stonith-ng[3102]:  notice: Operation 'off' [3309] (call 2 from stonith_admin.3288) for host 'node-1' with device 'MyStonith' returned: 0 (OK)
> >     > Jun 25 15:42:54 localhost stonith-ng[3102]:  notice: Operation off of node-1 by node-2 for stonith_admin.3288@node-2.848cd1e9: OK
> >     > Jun 25 15:42:54 localhost stonith-ng[3102]: warning: new_event_notification (3102-3288-12): Broken pipe (32)
> >     > Jun 25 15:42:54 localhost stonith-ng[3102]: warning: st_notify_fence notification of client stonith_admin.3288.eb400a failed: Broken pipe (-32)
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'sbd -d /dev/sdb1 list'
> >     > 0    node-3    clear
> >     > 1    node-2    clear
> >     > 2    node-1    off    node-2
> >     >
> >     >
> >     > ############################
> >     > ssh node-1 -c sudo su - -c 'uptime'
> >     >  15:43:31 up 21 min,  2 users,  load average: 0.25, 0.18, 0.11
> >     >
> >     >
> >     > Cheers,
> >     >
> >     > Marcin
> >     >
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org