<div dir="ltr">Thank you!<div><br></div><div>However, what is proper fencing in this situation?</div><div><br></div><div>Kind Regards,</div><div><br></div><div>Alex</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 1, 2015 at 11:30 PM, Ken Gaillot <span dir="ltr">&lt;<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 07/01/2015 09:39 AM, alex austin wrote:<br>

&gt; This is what crm_mon shows<br>

&gt;<br>

&gt;<br>

&gt; Last updated: Wed Jul  1 10:35:40 2015<br>

&gt;<br>

&gt; Last change: Wed Jul  1 09:52:46 2015<br>

&gt;<br>

&gt; Stack: classic openais (with plugin)<br>

&gt;<br>

&gt; Current DC: host2 - partition with quorum<br>

&gt;<br>

&gt; Version: 1.1.11-97629de<br>

&gt;<br>

&gt; 2 Nodes configured, 2 expected votes<br>

&gt;<br>

&gt; 4 Resources configured<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; Online: [ host1 host2 ]<br>

&gt;<br>

&gt;<br>

&gt; ClusterIP (ocf::heartbeat:IPaddr2): Started host2<br>

&gt;<br>

&gt;  Master/Slave Set: redis_clone [redis]<br>

&gt;<br>

&gt;      Masters: [ host2 ]<br>

&gt;<br>

&gt;      Slaves: [ host1 ]<br>

&gt;<br>

&gt; pcmk-fencing    (stonith:fence_pcmk):   Started host2<br>

&gt;<br>

&gt; On Wed, Jul 1, 2015 at 3:37 PM, alex austin &lt;<a href="mailto:alexixalex@gmail.com">alexixalex@gmail.com</a>&gt; wrote:<br>

&gt;<br>

&gt;&gt; I am running version 1.4.7 of corosync<br>

<br>

</span>If you can&#39;t upgrade to corosync 2 (which has many improvements), you&#39;ll<br>

need to set the no-quorum-policy=ignore cluster option.<br>

<br>

Proper fencing is necessary to avoid a split-brain situation, which can<br>

corrupt your data.<br>
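<br>
As a minimal sketch, assuming the crm shell (crmsh) that the configuration quoted below appears to be written in, the option could be set and then checked like this:<br>
<pre>
# Assumes crmsh is installed and this is run as root on one cluster node.
# Keep managing resources even when the partition has lost quorum:
crm configure property no-quorum-policy=ignore

# Print the configuration to confirm the property is now set:
crm configure show
</pre>
With quorum ignored, working fencing is what protects against the split-brain mentioned above.<br>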

<div class="HOEnZb"><div class="h5"><br>

&gt;&gt; On Wed, Jul 1, 2015 at 3:25 PM, Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt; wrote:<br>

&gt;&gt;<br>

&gt;&gt;&gt; On 07/01/2015 08:57 AM, alex austin wrote:<br>

&gt;&gt;&gt;&gt; I have now configured stonith-enabled=true. What device should I use for<br>

&gt;&gt;&gt;&gt; fencing, given that it&#39;s a virtual machine but I don&#39;t have access<br>

&gt;&gt;&gt;&gt; to its configuration? Would fence_pcmk do? If so, what parameters<br>

&gt;&gt;&gt;&gt; should I configure for it to work properly?<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; No, fence_pcmk is not meant for use in pacemaker itself; it is for RHEL6&#39;s<br>

&gt;&gt;&gt; CMAN, to redirect its fencing requests to pacemaker.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; For a virtual machine, ideally you&#39;d use fence_virtd running on the<br>

&gt;&gt;&gt; physical host, but I&#39;m guessing from your comment that you can&#39;t do<br>

&gt;&gt;&gt; that. Does whoever provides your VM also provide an API for controlling<br>

&gt;&gt;&gt; it (starting/stopping/rebooting)?<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Regarding your original problem, it sounds like the surviving node<br>

&gt;&gt;&gt; doesn&#39;t have quorum. What version of corosync are you using? If you&#39;re<br>

&gt;&gt;&gt; using corosync 2, you need &quot;two_node: 1&quot; in corosync.conf, in addition<br>

&gt;&gt;&gt; to configuring fencing in pacemaker.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; This is my new config:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; node <a href="http://dcwbpvmuas004.edc.nam.gm.com" rel="noreferrer" target="_blank">dcwbpvmuas004.edc.nam.gm.com</a> \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         attributes standby=off<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; node <a href="http://dcwbpvmuas005.edc.nam.gm.com" rel="noreferrer" target="_blank">dcwbpvmuas005.edc.nam.gm.com</a> \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         attributes standby=off<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; primitive ClusterIP IPaddr2 \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         params ip=198.208.86.242 cidr_netmask=23 \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         op monitor interval=1s timeout=20s \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         op start interval=0 timeout=20s \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         op stop interval=0 timeout=20s \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         meta is-managed=true target-role=Started resource-stickiness=500<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; primitive pcmk-fencing stonith:fence_pcmk \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         params pcmk_host_list=&quot;<a href="http://dcwbpvmuas004.edc.nam.gm.com" rel="noreferrer" target="_blank">dcwbpvmuas004.edc.nam.gm.com</a><br>

&gt;&gt;&gt;&gt; <a href="http://dcwbpvmuas005.edc.nam.gm.com" rel="noreferrer" target="_blank">dcwbpvmuas005.edc.nam.gm.com</a>&quot; \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         op monitor interval=10s \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         meta target-role=Started<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; primitive redis redis \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         meta target-role=Master is-managed=true \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         op monitor interval=1s role=Master timeout=5s on-fail=restart<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; ms redis_clone redis \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         meta notify=true is-managed=true ordered=false interleave=false<br>

&gt;&gt;&gt;&gt; globally-unique=false target-role=Master migration-threshold=1<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; colocation ip-on-redis inf: ClusterIP redis_clone:Master<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; property cib-bootstrap-options: \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         dc-version=1.1.11-97629de \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         cluster-infrastructure=&quot;classic openais (with plugin)&quot; \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         expected-quorum-votes=2 \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         stonith-enabled=true<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; property redis_replication: \<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;         redis_REPL_INFO=<a href="http://dcwbpvmuas005.edc.nam.gm.com" rel="noreferrer" target="_blank">dcwbpvmuas005.edc.nam.gm.com</a><br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander &lt;<br>

&gt;&gt;&gt;&gt; <a href="mailto:alexander.nekrasov@emc.com">alexander.nekrasov@emc.com</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; stonith-enabled=false<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; This might be the issue. To resolve the death of a peer node, the<br>

&gt;&gt;&gt;&gt;&gt; surviving node must call STONITH on the peer. If it’s disabled, it might not<br>

&gt;&gt;&gt;&gt;&gt; be able to resolve the event.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Alex<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; *From:* alex austin [mailto:<a href="mailto:alexixalex@gmail.com">alexixalex@gmail.com</a>]<br>

&gt;&gt;&gt;&gt;&gt; *Sent:* Wednesday, July 01, 2015 9:51 AM<br>

&gt;&gt;&gt;&gt;&gt; *To:* <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>

&gt;&gt;&gt;&gt;&gt; *Subject:* Re: [ClusterLabs] Pacemaker failover failure<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; So I noticed that if I kill redis on one node, it starts on the other, no<br>

&gt;&gt;&gt;&gt;&gt; problem, but if I actually kill pacemaker itself on one node, the other<br>

&gt;&gt;&gt;&gt;&gt; doesn&#39;t &quot;sense&quot; it so it doesn&#39;t fail over.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; On Wed, Jul 1, 2015 at 12:42 PM, alex austin &lt;<a href="mailto:alexixalex@gmail.com">alexixalex@gmail.com</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Hi all,<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; I have configured a virtual IP and redis in master-slave mode with corosync<br>

&gt;&gt;&gt;&gt;&gt; and pacemaker. If redis fails, the failover is successful, and redis gets<br>

&gt;&gt;&gt;&gt;&gt; promoted on the other node. However, if pacemaker itself fails on the active<br>

&gt;&gt;&gt;&gt;&gt; node, the failover is not performed. Is there anything I missed in the<br>

&gt;&gt;&gt;&gt;&gt; configuration?<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Here&#39;s my configuration (I have masked the IP address):<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; node <a href="http://host1.com" rel="noreferrer" target="_blank">host1.com</a><br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; node <a href="http://host2.com" rel="noreferrer" target="_blank">host2.com</a><br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; primitive ClusterIP IPaddr2 \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; op monitor interval=1s timeout=20s \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; op start interval=0 timeout=20s \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; op stop interval=0 timeout=20s \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; meta is-managed=true target-role=Started resource-stickiness=500<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; primitive redis redis \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; meta target-role=Master is-managed=true \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; op monitor interval=1s role=Master timeout=5s on-fail=restart<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; ms redis_clone redis \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; meta notify=true is-managed=true ordered=false interleave=false<br>

&gt;&gt;&gt;&gt;&gt; globally-unique=false target-role=Master migration-threshold=1<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; colocation ip-on-redis inf: ClusterIP redis_clone:Master<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; property cib-bootstrap-options: \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; dc-version=1.1.11-97629de \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; cluster-infrastructure=&quot;classic openais (with plugin)&quot; \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; expected-quorum-votes=2 \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; stonith-enabled=false<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; property redis_replication: \<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; redis_REPL_INFO=<a href="http://host.com" rel="noreferrer" target="_blank">host.com</a><br>

<br>

</div></div></blockquote></div><br></div>