<div dir="ltr"><div><div><div><div><div><div><div>Hey people <br><br></div>I&#39;m experiencing a very strange issue that appears every time I try to fence a node.<br></div>I
 have a test environment with a three-node cluster (CentOS 6.6 x86_64) 
where rgmanager is replaced with pacemaker (CMAN + pacemaker).<br><br></div>I&#39;ve configured fencing with pcs for all three nodes<br><br>Pacemaker:<br><span style="font-family:monospace,monospace">pcs stonith create node1-ipmi \<br>fence_ipmilan
 pcmk_host_list=&quot;node1&quot; ipaddr=1.1.1.1 login=fencer passwd=******** 
privlvl=OPERATOR power_wait=10 lanplus=1 action=off \<br>op monitor interval=10s timeout=30s  <br><br></span></div><div><span style="font-family:monospace,monospace">pcs constraint location node1-ipmi avoids node1<br></span></div><div><span style="font-family:monospace,monospace"><br></span></div><span style="font-family:monospace,monospace">pcs property set stonith-enabled=true</span><br><br><br></div>CMAN - /etc/cluster/cluster.conf:<br><span style="font-family:monospace,monospace">&lt;?xml version=&quot;1.0&quot;?&gt;<br>&lt;cluster config_version=&quot;10&quot; name=&quot;mycluster&quot;&gt;<br>        &lt;fence_daemon/&gt;<br>        &lt;clusternodes&gt;<br>                &lt;clusternode name=&quot;node1&quot; nodeid=&quot;1&quot;&gt;<br>                        &lt;fence&gt;<br>                                &lt;method name=&quot;pcmk-redirect&quot;&gt;<br>                                        &lt;device action=&quot;off&quot; name=&quot;pcmk&quot; port=&quot;node1&quot;/&gt;<br>                                &lt;/method&gt;<br>                        &lt;/fence&gt;<br>                &lt;/clusternode&gt;<br>                &lt;clusternode name=&quot;node2&quot; nodeid=&quot;2&quot;&gt;<br>                        &lt;fence&gt;<br>                                &lt;method name=&quot;pcmk-redirect&quot;&gt;<br>                                        &lt;device action=&quot;off&quot; name=&quot;pcmk&quot; port=&quot;node2&quot;/&gt;<br>                                &lt;/method&gt;<br>                        &lt;/fence&gt;<br>                &lt;/clusternode&gt;<br>                &lt;clusternode name=&quot;node2&quot; nodeid=&quot;3&quot;&gt;<br>                        &lt;fence&gt;<br>                                &lt;method name=&quot;pcmk-redirect&quot;&gt;<br>                                        &lt;device action=&quot;off&quot; name=&quot;pcmk&quot; port=&quot;node2&quot;/&gt;<br>                                &lt;/method&gt;<br>                
        &lt;/fence&gt;<br>                &lt;/clusternode&gt;<br>        &lt;/clusternodes&gt;<br>        &lt;cman/&gt;<br>        &lt;fencedevices&gt;<br>                &lt;fencedevice agent=&quot;fence_pcmk&quot; name=&quot;pcmk&quot;/&gt;<br>        &lt;/fencedevices&gt;<br>        &lt;rm&gt;<br>                &lt;failoverdomains/&gt;<br>                &lt;resources/&gt;<br>        &lt;/rm&gt;<br>        &lt;logging debug=&quot;on&quot;/&gt;<br>        &lt;quorumd interval=&quot;1&quot; label=&quot;QuorumDisk&quot; status_file=&quot;/qdisk_status&quot; tko=&quot;70&quot;/&gt;<br>        &lt;totem token=&quot;108000&quot;/&gt;<br>&lt;/cluster&gt;</span><br><br></div>Every
 time I try to fence a node, I get a timeout error, and the node is only 
fenced at the end (on the second attempt). I&#39;m wondering why it
 took so long to fence a node?<br><br></div><div>So when I run stonith_admin or fence_node (which in the end also runs stonith_admin, as you can clearly see from the log file), it always fails on the first attempt. My guess is that it doesn&#39;t get a status code back, or something like that:<br><span style="font-family:monospace,monospace">strace stonith_admin --fence node1 --tolerance 5s --tag cman<br></span><br></div><div>Partial output from strace:<br><span style="font-family:monospace,monospace">  ...<br>  <span style="color:rgb(255,0,0)">poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 500)   = 0 (Timeout)<br>  poll([{fd=4, events=POLLIN}], 1, 291)   = 0 (Timeout)<br>  </span>fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0<br>  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb2a8c37000<br>  write(1, &quot;Command failed: Timer expired\n&quot;, 30Command failed: Timer expired<br>  ) = 30<br>  poll([{fd=4, events=POLLIN}], 1, 0)     = 0 (Timeout)<br>  shutdown(4, 2 /* send and receive */)   = 0<br>  close(4)                                = 0<br>  munmap(0x7fb2a8b98000, 270336)          = 0<br>  
munmap(0x7fb2a8bda000, 8248)            = 0<br>  munmap(0x7fb2a8b56000, 270336)          = 0<br>  munmap(0x7fb2a8c3b000, 8248)            = 0<br>  munmap(0x7fb2a8b14000, 270336)          = 0<br>  munmap(0x7fb2a8c38000, 8248)            = 0<br>  munmap(0x7fb2a8bdd000, 135168)          = 0<br>  munmap(0x7fb2a8bfe000, 135168)          = 0<br>  exit_group(-62)                         = ?</span><br><br><br>Or via cman:<br><span style="font-family:monospace,monospace">[node1:~]# fence_node -vv node3<br>fence node3 dev 0.0 agent fence_pcmk result: error from agent<br>agent args: action=off port=node3 timeout=15 nodename=node3 agent=fence_pcmk<br>fence node3 failed<br></span><br><br><span style="font-family:monospace,monospace">/var/log/messages:<br> 
 Jun 19 10:57:43 node1 stonith_admin[3804]:   notice: crm_log_args: 
Invoked: <span style="color:rgb(0,0,255)">stonith_admin --fence node1 --tolerance 5s --tag cman</span><br>  Jun 19 10:57:43 node1 stonith-ng[8283]:   notice: handle_request: Client stonith_admin.cman.3804.65de6378 wants to fence (off) &#39;node1&#39; with device &#39;(any)&#39;<br> 
 Jun 19 10:57:43 node1 stonith-ng[8283]:   notice: 
initiate_remote_stonith_op: Initiating remote operation off for node1: 
fbc7fe61-9451-4634-9c12-57d933ccd0a4 (  0)<br>  Jun 19 10:57:43 
node1 stonith-ng[8283]:   notice: can_fence_host_with_device: node2-ipmi
 can not fence (off) node1: static-list<br>  Jun 19 10:57:43 node1 stonith-ng[8283]:   notice: can_fence_host_with_device: node3-ipmi can fence (off) node3: static-list<br>  <span style="color:rgb(0,0,0)">Jun 19 10:57:54 node1 stonith-ng[8283]:  warning: get_xpath_object: No match for //@st_delegate in /st-reply</span><br>  Jun 19 10:59:00 node1 qdiskd[7409]: Node 3 evicted<br>  Jun 19 10:59:31 node1 corosync[7349]:   [TOTEM ] A processor failed, forming new configuration.<br>  Jun 19 11:01:21 node1 corosync[7349]:   [QUORUM] Members[2]: 1 2<br>  Jun 19 11:01:21 node1 corosync[7349]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.<br> 
 Jun 19 11:01:21 node1 crmd[8287]:   notice: crm_update_peer_state: 
cman_event_callback: Node node3[3] - state is now lost (was member)<br>  Jun 19 11:01:21 node1 kernel: dlm: closing connection to node 3<br> 
 Jun 19 11:01:21 node1 stonith-ng[8283]:   notice: remote_op_done: 
Operation off of node3 by node2 for stonith_admin.cman.3804@node1.  com.fbc7fe61: OK<br> 
 Jun 19 11:01:21 node1 crmd[8287]:   notice: tengine_stonith_notify: 
Peer node3 was terminated (off) by node2 for node1: OK (  
ref=fbc7fe61-9451-4634-9c12-57d933ccd0a4) by client stonith_admin.cman.3804<br>  Jun 19 11:01:21 node1 crmd[8287]:   notice: tengine_stonith_notify: Notified CMAN that &#39;node3&#39; is now fenced<br>  <br>  Jun 19 11:01:21 node1 fenced[7625]: fencing node node3<br>  Jun 19 11:01:22 node1 fence_pcmk[8067]: Requesting Pacemaker fence node3 (off)<br>  Jun 19 11:01:22 node1 stonith_admin[8068]:   notice: crm_log_args: Invoked: <span style="color:rgb(0,0,255)"><span style="background-color:rgb(255,255,255)">stonith_admin --fence node3 --tolerance 5s --tag cman</span></span><br>  Jun 19 11:01:22 node1 stonith-ng[8283]:   notice: handle_request: Client stonith_admin.cman.8068.fcd7f751 wants to fence (off) &#39;node3&#39; with device &#39;(any)&#39;<br> 
 Jun 19 11:01:22 node1 stonith-ng[8283]:   notice: 
stonith_check_fence_tolerance: Target node3 was fenced (off) less than 
5s ago by node2 on   behalf of node1<br>  Jun 19 11:01:22 node1 fenced[7625]: fence node3 success<br></span><br><br><br>    <span style="font-family:monospace,monospace">[node1:~]# ls -ahl /proc/</span><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace">22505</span>/fd<br>  total 0<br>  dr-x------ 2 root root  0 Jun 19 11:55 .<br>  dr-xr-xr-x 8 root root  0 Jun 19 11:55 ..<br>  lrwx------ 1 root root 64 Jun 19 11:56 0 -&gt; /dev/pts/8<br>  lrwx------ 1 root root 64 Jun 19 11:56 1 -&gt; /dev/pts/8<br>  lrwx------ 1 root root 64 Jun 19 11:55 2 -&gt; /dev/pts/8<br>  lrwx------ 1 root root 64 Jun 19 11:56 3 -&gt; socket:[4061683]<br> <span style="color:rgb(255,0,0)"> lrwx------ 1 root root 64 Jun 19 11:56 4 -&gt; socket:[4061684]</span><br><br>  [node1:~]# lsof -p 22505<br>  ...<br>  stonith_admin 22505 root    3u  unix 0xffff880c14889b80      0t0 </span><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace">4061683</span> socket<br>  <span style="color:rgb(255,0,0)">stonith_admin 22505 root    4u  unix 0xffff880c2a4fbc40      0t0 </span></span><span style="font-family:monospace,monospace"><span style="color:rgb(255,0,0)"><span style="font-family:monospace,monospace"><span style="color:rgb(255,0,0)">4061684</span></span> socket</span></span><br><br><br></div><div>Obviously
 it&#39;s trying to read data from a Unix socket but doesn&#39;t get anything from the other side. Can 
anyone explain to me why the fence command always fails on the first attempt?<br><br></div>Thanks</div>