<div class="zcontentRow"> <p><span style="line-height: 21px;">Is&nbsp;there&nbsp;a&nbsp;reason&nbsp;not&nbsp;to&nbsp;use&nbsp;a&nbsp;colocation&nbsp;constraint&nbsp;instead?&nbsp;If&nbsp;X_vip</span><br style="white-space: normal;"><span style="line-height: 21px;">is&nbsp;colocated&nbsp;with&nbsp;X,&nbsp;it&nbsp;will&nbsp;be&nbsp;moved&nbsp;if&nbsp;X&nbsp;fails.</span><br style="white-space: normal;"><br><span style="color: rgb(0, 112, 192);">[hhl]: the move should also take place if X is stopped (while its start is still ongoing). I don't know whether colocation would satisfy this requirement.</span><br><br style="white-space: normal;"><span style="line-height: 21px;">I&nbsp;don't&nbsp;see&nbsp;any&nbsp;reason&nbsp;in&nbsp;your&nbsp;configuration&nbsp;why&nbsp;the&nbsp;services&nbsp;wouldn't</span><br style="white-space: normal;"><span style="line-height: 21px;">be&nbsp;restarted.&nbsp;It's&nbsp;possible&nbsp;the&nbsp;cluster&nbsp;tried&nbsp;to&nbsp;restart&nbsp;the&nbsp;service,</span><br style="white-space: normal;"><span style="line-height: 21px;">but&nbsp;the&nbsp;stop&nbsp;action&nbsp;failed.&nbsp;Since&nbsp;you&nbsp;have&nbsp;stonith&nbsp;disabled,&nbsp;the&nbsp;cluster</span><br style="white-space: normal;"><span style="line-height: 21px;">can't&nbsp;recover&nbsp;from&nbsp;a&nbsp;failed&nbsp;stop&nbsp;action.</span></p><p><br></p><p><span style="color: rgb(0, 112, 192);">[hhl]: the OCF logs show that Pacemaker never entered the stop function in this case.</span><br style="white-space: normal;"><br style="white-space: normal;"><span style="line-height: 21px;">Is&nbsp;there&nbsp;a&nbsp;reason&nbsp;you&nbsp;disabled&nbsp;quorum?&nbsp;With&nbsp;3&nbsp;nodes,&nbsp;if&nbsp;they&nbsp;get&nbsp;split</span><br style="white-space: normal;"><span style="line-height: 21px;">into&nbsp;groups&nbsp;of&nbsp;1&nbsp;node&nbsp;and&nbsp;2&nbsp;nodes,&nbsp;quorum&nbsp;is&nbsp;what&nbsp;keeps&nbsp;the&nbsp;groups&nbsp;from</span><br style="white-space: normal;"><span 
style="line-height: 21px;">both&nbsp;starting&nbsp;all&nbsp;resources.</span></p><p><br></p><p><span style="color: rgb(0, 112, 192);">[hhl]: I enabled&nbsp;<span style="line-height: 21px;">quorum and retried; the same thing happens.</span></span></p><p><span style="color: rgb(0, 112, 192);">By the way, I repeated the test several times today and found that when I trigger the condition on one node that fails all the clone resources, only one gets restarted; the other two fail to restart.</span></p><p><br></p><p><span style="color: rgb(0, 176, 80);">> trigger the failure condition on <span style="line-height: 21px;">paas-controller-1</span></span></p><p><br></p><p><span style="color: rgb(0, 176, 80);">Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]</span></p><p><br></p><p><span style="color: rgb(0, 176, 80);">&nbsp;router_vip &nbsp; &nbsp; (ocf::heartbeat:IPaddr2): &nbsp; &nbsp; &nbsp; Started paas-controller-2&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;sdclient_vip &nbsp; (ocf::heartbeat:IPaddr2): &nbsp; &nbsp; &nbsp; Started paas-controller-3&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;apigateway_vip (ocf::heartbeat:IPaddr2): &nbsp; &nbsp; &nbsp; Started paas-controller-3&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;Clone Set: sdclient_rep [sdclient]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;Started: [ paas-controller-2 paas-controller-3 ]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;Stopped: [ paas-controller-1 ]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;Clone Set: router_rep [router]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;router &nbsp; &nbsp; (ocf::heartbeat:router): &nbsp; &nbsp; &nbsp; &nbsp;Started paas-controller-1 FAILED&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;Started: [ paas-controller-2 paas-controller-3 ]</span></p><p><span style="color: rgb(0, 176, 
80);">&nbsp;Clone Set: apigateway_rep [apigateway]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;apigateway (ocf::heartbeat:apigateway): &nbsp; &nbsp;Started paas-controller-1 FAILED&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;Started: [ paas-controller-2 paas-controller-3 ]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;<br></span></p><p><span style="line-height: 21px; color: rgb(0, 176, 80);">>&nbsp;trigger the failure condition on&nbsp;paas-controller-3</span></p><p><span style="line-height: 21px; color: rgb(0, 176, 80);"></span></p><p><span style="color: rgb(0, 176, 80);">Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]</span></p><p><br></p><p><span style="color: rgb(0, 176, 80);">&nbsp;router_vip &nbsp; &nbsp; (ocf::heartbeat:IPaddr2): &nbsp; &nbsp; &nbsp; Started paas-controller-2&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;sdclient_vip &nbsp; (ocf::heartbeat:IPaddr2): &nbsp; &nbsp; &nbsp; Started paas-controller-3&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;apigateway_vip (ocf::heartbeat:IPaddr2): &nbsp; &nbsp; &nbsp; Started paas-controller-3&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;Clone Set: sdclient_rep [sdclient]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;sdclient &nbsp; (ocf::heartbeat:sdclient): &nbsp; &nbsp; &nbsp;Started paas-controller-3 FAILED&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;Started: [ paas-controller-1 paas-controller-2 ]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;Clone Set: router_rep [router]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;Started: [ paas-controller-1 paas-controller-2 ]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;Stopped: [ paas-controller-3 ]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp;Clone Set: apigateway_rep 
[apigateway]</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;apigateway (ocf::heartbeat:apigateway): &nbsp; &nbsp;Started paas-controller-3 FAILED&nbsp;</span></p><p><span style="color: rgb(0, 176, 80);">&nbsp; &nbsp; &nbsp;Started: [ paas-controller-1 paas-controller-2 ]</span></p><p><span style="line-height: 21px;"><br></span><br></p><p><br></p><div><div class="zhistoryRow" style="display:block"><div class="zhistoryDes" style="width: 100%; height: 28px; line-height: 28px; background-color: #E0E5E9; color: #1388FF; text-align: center;" language-data="HistoryOrgTxt">Original message</div><div id="zwriteHistoryContainer"><div class="control-group zhistoryPanel"><div class="zhistoryHeader" style="padding: 8px; background-color: #F5F6F8;"><div><strong language-data="HistorySenderTxt">From:</strong><span class="zreadUserName"> <kgaillot@redhat.com>;</span></div><div><strong language-data="HistoryTOTxt">To:</strong><span class="zreadUserName" style="display: inline-block;">何海龙10164561;</span></div><div><strong language-data="HistoryCCTxt">Cc:</strong><span class="zreadUserName" style="display: inline-block;"> <users@clusterlabs.org>;</span></div><div><strong language-data="HistoryDateTxt">Date:</strong><span class="">2017-02-15 06:14</span></div><div><strong language-data="HistorySubjectTxt">Subject:</strong><span class="zreadTitle"><strong>Re: Reply: Re: [ClusterLabs] clone resource not get restarted on fail</strong></span></div></div><p 
class="zhistoryContent"><br></p><div>On&nbsp;02/13/2017&nbsp;07:08&nbsp;PM,&nbsp;he.hailong5@zte.com.cn&nbsp;wrote:<br>>&nbsp;Hi,<br>>&nbsp;<br>>&nbsp;<br>>&nbsp;>&nbsp;crm&nbsp;configure&nbsp;show<br>>&nbsp;<br>>&nbsp;+&nbsp;crm&nbsp;configure&nbsp;show<br>>&nbsp;<br>>&nbsp;node&nbsp;$id="336855579"&nbsp;paas-controller-1<br>>&nbsp;<br>>&nbsp;node&nbsp;$id="336855580"&nbsp;paas-controller-2<br>>&nbsp;<br>>&nbsp;node&nbsp;$id="336855581"&nbsp;paas-controller-3<br>>&nbsp;<br>>&nbsp;primitive&nbsp;apigateway&nbsp;ocf:heartbeat:apigateway&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;monitor&nbsp;interval="2s"&nbsp;timeout="20s"&nbsp;on-fail="restart"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;stop&nbsp;interval="0"&nbsp;timeout="200s"&nbsp;on-fail="restart"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;start&nbsp;interval="0"&nbsp;timeout="9999h"&nbsp;on-fail="restart"<br>>&nbsp;<br>>&nbsp;primitive&nbsp;apigateway_vip&nbsp;ocf:heartbeat:IPaddr2&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;params&nbsp;ip="20.20.2.7"&nbsp;cidr_netmask="24"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;start&nbsp;interval="0"&nbsp;timeout="20"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;stop&nbsp;interval="0"&nbsp;timeout="20"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;monitor&nbsp;timeout="20s"&nbsp;interval="2s"&nbsp;depth="0"<br>>&nbsp;<br>>&nbsp;primitive&nbsp;router&nbsp;ocf:heartbeat:router&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;monitor&nbsp;interval="2s"&nbsp;timeout="20s"&nbsp;on-fail="restart"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;stop&nbsp;interval="0"&nbsp;timeout="200s"&nbsp;on-fail="restart"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nb
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;start&nbsp;interval="0"&nbsp;timeout="9999h"&nbsp;on-fail="restart"<br>>&nbsp;<br>>&nbsp;primitive&nbsp;router_vip&nbsp;ocf:heartbeat:IPaddr2&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;params&nbsp;ip="10.10.1.7"&nbsp;cidr_netmask="24"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;start&nbsp;interval="0"&nbsp;timeout="20"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;stop&nbsp;interval="0"&nbsp;timeout="20"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;monitor&nbsp;timeout="20s"&nbsp;interval="2s"&nbsp;depth="0"<br>>&nbsp;<br>>&nbsp;primitive&nbsp;sdclient&nbsp;ocf:heartbeat:sdclient&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;monitor&nbsp;interval="2s"&nbsp;timeout="20s"&nbsp;on-fail="restart"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;stop&nbsp;interval="0"&nbsp;timeout="200s"&nbsp;on-fail="restart"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;start&nbsp;interval="0"&nbsp;timeout="9999h"&nbsp;on-fail="restart"<br>>&nbsp;<br>>&nbsp;primitive&nbsp;sdclient_vip&nbsp;ocf:heartbeat:IPaddr2&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;params&nbsp;ip="10.10.1.8"&nbsp;cidr_netmask="24"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;start&nbsp;interval="0"&nbsp;timeout="20"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;stop&nbsp;interval="0"&nbsp;timeout="20"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;op&nbsp;monitor&nbsp;timeout="20s"&nbsp;interval="2s"&nbsp;depth="0"<br>>&nbsp;<br>>&nbsp;clone&nbsp;apigateway_rep&nbsp;apigateway<br>>&nbsp;<br>>&nbsp;clone&nbsp;router_rep&nbsp;router<br>>&nbsp;<br>>&nbsp;clone&nbsp;sdclien
t_rep&nbsp;sdclient<br>>&nbsp;<br>>&nbsp;location&nbsp;apigateway_loc&nbsp;apigateway_vip&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rule&nbsp;$id="apigateway_loc-rule"&nbsp;+inf:&nbsp;apigateway_workable&nbsp;eq&nbsp;1<br>>&nbsp;<br>>&nbsp;location&nbsp;router_loc&nbsp;router_vip&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rule&nbsp;$id="router_loc-rule"&nbsp;+inf:&nbsp;router_workable&nbsp;eq&nbsp;1<br>>&nbsp;<br>>&nbsp;location&nbsp;sdclient_loc&nbsp;sdclient_vip&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rule&nbsp;$id="sdclient_loc-rule"&nbsp;+inf:&nbsp;sdclient_workable&nbsp;eq&nbsp;1<br>>&nbsp;<br>>&nbsp;property&nbsp;$id="cib-bootstrap-options"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dc-version="1.1.10-42f2063"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cluster-infrastructure="corosync"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;stonith-enabled="false"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;no-quorum-policy="ignore"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;start-failure-is-fatal="false"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;last-lrm-refresh="1486981647"<br>>&nbsp;<br>>&nbsp;op_defaults&nbsp;$id="op_defaults-options"&nbsp;\<br>>&nbsp;<br>>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;on-fail="restart"<br>>&nbsp;<br>>&nbsp;-------------------------------------------------------------------------------------------------<br>>&nbsp;<br>>&nbsp;<br>>&nbsp;and&nbsp;B.T.W,&nbsp;I&nbsp;am&nbsp;using&nbsp;"crm_attribute&nbsp;-N&nbsp;$HOSTNAME&nbsp;-q&nbsp;-l&nbsp;reboot&nbsp;--name<br>>&nbsp;<prefix>_workable&nbsp;-v&nbsp;<1&nbsp;or&nbsp;0>"&nbsp;in&nbsp;the&nbsp;monitor&nbsp;to&nbsp;update&nbsp;the<br>>&nbsp;transient&nbsp;attributes,&nbsp;which&nbsp;contro
l&nbsp;the&nbsp;vip&nbsp;location.<br><br>Is&nbsp;there&nbsp;a&nbsp;reason&nbsp;not&nbsp;to&nbsp;use&nbsp;a&nbsp;colocation&nbsp;constraint&nbsp;instead?&nbsp;If&nbsp;X_vip<br>is&nbsp;colocated&nbsp;with&nbsp;X,&nbsp;it&nbsp;will&nbsp;be&nbsp;moved&nbsp;if&nbsp;X&nbsp;fails.<br><br>I&nbsp;don't&nbsp;see&nbsp;any&nbsp;reason&nbsp;in&nbsp;your&nbsp;configuration&nbsp;why&nbsp;the&nbsp;services&nbsp;wouldn't<br>be&nbsp;restarted.&nbsp;It's&nbsp;possible&nbsp;the&nbsp;cluster&nbsp;tried&nbsp;to&nbsp;restart&nbsp;the&nbsp;service,<br>but&nbsp;the&nbsp;stop&nbsp;action&nbsp;failed.&nbsp;Since&nbsp;you&nbsp;have&nbsp;stonith&nbsp;disabled,&nbsp;the&nbsp;cluster<br>can't&nbsp;recover&nbsp;from&nbsp;a&nbsp;failed&nbsp;stop&nbsp;action.<br><br>Is&nbsp;there&nbsp;a&nbsp;reason&nbsp;you&nbsp;disabled&nbsp;quorum?&nbsp;With&nbsp;3&nbsp;nodes,&nbsp;if&nbsp;they&nbsp;get&nbsp;split<br>into&nbsp;groups&nbsp;of&nbsp;1&nbsp;node&nbsp;and&nbsp;2&nbsp;nodes,&nbsp;quorum&nbsp;is&nbsp;what&nbsp;keeps&nbsp;the&nbsp;groups&nbsp;from<br>both&nbsp;starting&nbsp;all&nbsp;resources.<br><br>>&nbsp;and&nbsp;also&nbsp;found,&nbsp;the&nbsp;vip&nbsp;resource&nbsp;won't&nbsp;get&nbsp;moved&nbsp;if&nbsp;the&nbsp;related&nbsp;clone<br>>&nbsp;resource&nbsp;failed&nbsp;to&nbsp;restart.<br>>&nbsp;<br>>&nbsp;<br>>&nbsp;Original&nbsp;message<br>>&nbsp;*From:*<kgaillot@redhat.com>;<br>>&nbsp;*To:*<users@clusterlabs.org>;<br>>&nbsp;*Date:*2017-02-13&nbsp;23:04<br>>&nbsp;*Subject:**Re:&nbsp;[ClusterLabs]&nbsp;clone&nbsp;resource&nbsp;not&nbsp;get&nbsp;restarted&nbsp;on&nbsp;fail*<br>>&nbsp;<br>>&nbsp;<br>>&nbsp;On&nbsp;02/13/2017&nbsp;07:57&nbsp;AM,&nbsp;he.hailong5@zte.com.cn&nbsp;wrote:<br>>&nbsp;>&nbsp;Pacemaker&nbsp;1.1.10<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;Corosync&nbsp;2.3.3<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;this&nbsp;is&nbsp;a&nbsp;3-node&nbsp;cluster&nbsp;configured&nbsp;with&nbsp;3&nbsp;clone&nbsp;resources,&nbsp;each<br>>&nbsp;>&nbsp;attached&nbsp;with&nbsp;a&nbsp;vip&nbsp;r
esource&nbsp;of&nbsp;IPAddr2:<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;>crm&nbsp;status<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;Online:&nbsp;[&nbsp;paas-controller-1&nbsp;paas-controller-2&nbsp;paas-controller-3&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;router_vip&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(ocf::heartbeat:IPaddr2):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started&nbsp;paas-controller-1&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;sdclient_vip&nbsp;&nbsp;&nbsp;(ocf::heartbeat:IPaddr2):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started&nbsp;paas-controller-3&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;apigateway_vip&nbsp;(ocf::heartbeat:IPaddr2):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started&nbsp;paas-controller-2&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;Clone&nbsp;Set:&nbsp;sdclient_rep&nbsp;[sdclient]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started:&nbsp;[&nbsp;paas-controller-1&nbsp;paas-controller-2&nbsp;paas-controller-3&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;Clone&nbsp;Set:&nbsp;router_rep&nbsp;[router]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started:&nbsp;[&nbsp;paas-controller-1&nbsp;paas-controller-2&nbsp;paas-controller-3&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;Clone&nbsp;Set:&nbsp;apigateway_rep&nbsp;[apigateway]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started:&nbsp;[&nbsp;paas-controller-1&nbsp;paas-controller-2&nbsp;paas-controller-3&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;It&nbsp;is&nbsp;observed&nbsp;that&nbsp;sometimes&nbsp;the&nbsp;clone&nbsp;resource&nbsp;is&nbsp;stuck&nbsp;to&nbsp;monitor<br>>&nbsp;>&nbsp;when&nbsp;the&nbsp;service&nbsp;fails:<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;router_vip&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(ocf::heartbeat:IPaddr2):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started&nbsp;paas-controller-1&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbs
p;&nbsp;sdclient_vip&nbsp;&nbsp;&nbsp;(ocf::heartbeat:IPaddr2):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started&nbsp;paas-controller-2&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;apigateway_vip&nbsp;(ocf::heartbeat:IPaddr2):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started&nbsp;paas-controller-3&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;Clone&nbsp;Set:&nbsp;sdclient_rep&nbsp;[sdclient]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started:&nbsp;[&nbsp;paas-controller-1&nbsp;paas-controller-2&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Stopped:&nbsp;[&nbsp;paas-controller-3&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;Clone&nbsp;Set:&nbsp;router_rep&nbsp;[router]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;router&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(ocf::heartbeat:router):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started<br>>&nbsp;>&nbsp;paas-controller-3&nbsp;FAILED&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started:&nbsp;[&nbsp;paas-controller-1&nbsp;paas-controller-2&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;Clone&nbsp;Set:&nbsp;apigateway_rep&nbsp;[apigateway]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;apigateway&nbsp;(ocf::heartbeat:apigateway):&nbsp;&nbsp;&nbsp;&nbsp;Started<br>>&nbsp;>&nbsp;paas-controller-3&nbsp;FAILED&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started:&nbsp;[&nbsp;paas-controller-1&nbsp;paas-controller-2&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;in&nbsp;the&nbsp;example&nbsp;above.&nbsp;the&nbsp;sdclient_rep&nbsp;get&nbsp;restarted&nbsp;on&nbsp;node&nbsp;3,&nbsp;while<br>>&nbsp;>&nbsp;the&nbsp;other&nbsp;two&nbsp;hang&nbsp;at&nbsp;monitoring&nbsp;on&nbsp;node&nbsp;3,&nbsp;here&nbsp;are&nbsp;the&nbsp;ocf&nbsp;logs:<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;abnormal&nbsp;(apigateway_rep):<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:53&nbsp;[2
3586]===print_log&nbsp;test_monitor&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;Starting&nbsp;health&nbsp;check.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:53&nbsp;[23586]===print_log&nbsp;test_monitor&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;health&nbsp;check&nbsp;succeed.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:55&nbsp;[24010]===print_log&nbsp;test_monitor&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;Starting&nbsp;health&nbsp;check.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:55&nbsp;[24010]===print_log&nbsp;test_monitor&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;Failed:&nbsp;docker&nbsp;daemon&nbsp;is&nbsp;not&nbsp;running.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:57&nbsp;[24095]===print_log&nbsp;test_monitor&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;Starting&nbsp;health&nbsp;check.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:57&nbsp;[24095]===print_log&nbsp;test_monitor&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;Failed:&nbsp;docker&nbsp;daemon&nbsp;is&nbsp;not&nbsp;running.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:59&nbsp;[24159]===print_log&nbsp;test_monitor&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;Starting&nbsp;health&nbsp;check.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:59&nbsp;[24159]===print_log&nbsp;test_monitor&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;Failed:&nbsp;docker&nbsp;daemon&nbsp;is&nbsp;not&nbsp;running.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;normal&nbsp;(sdclient_rep):<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:52&nbsp;[23507]===print_log&nbsp;sdclient_monitor&nbsp;run_func<br>>&nbsp;>&nbsp;main===&nbsp;health&nbsp;check&nbsp;succeed.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:54&nbsp;[23630]===print_log&nbsp;sdclient_monitor&nbsp;run_func<br>>&nbsp;>&nbsp;main===&nbsp;Starting&nbsp;health&nbsp;check.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:54&nbsp;[23630]===print_log&nbsp;
sdclient_monitor&nbsp;run_func<br>>&nbsp;>&nbsp;main===&nbsp;Failed:&nbsp;docker&nbsp;daemon&nbsp;is&nbsp;not&nbsp;running.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:55&nbsp;[23710]===print_log&nbsp;sdclient_stop&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;Starting&nbsp;stop&nbsp;the&nbsp;container.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:55&nbsp;[23710]===print_log&nbsp;sdclient_stop&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;docker&nbsp;daemon&nbsp;lost,&nbsp;pretend&nbsp;stop&nbsp;succeed.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:55&nbsp;[23763]===print_log&nbsp;sdclient_start&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;Starting&nbsp;run&nbsp;the&nbsp;container.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:27:55&nbsp;[23763]===print_log&nbsp;sdclient_start&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;docker&nbsp;daemon&nbsp;lost,&nbsp;try&nbsp;again&nbsp;in&nbsp;5&nbsp;secs.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:28:00&nbsp;[23763]===print_log&nbsp;sdclient_start&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;docker&nbsp;daemon&nbsp;lost,&nbsp;try&nbsp;again&nbsp;in&nbsp;5&nbsp;secs.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;2017-02-13&nbsp;18:28:05&nbsp;[23763]===print_log&nbsp;sdclient_start&nbsp;run_func&nbsp;main===<br>>&nbsp;>&nbsp;docker&nbsp;daemon&nbsp;lost,&nbsp;try&nbsp;again&nbsp;in&nbsp;5&nbsp;secs.<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;If&nbsp;I&nbsp;disable&nbsp;2&nbsp;clone&nbsp;resource,&nbsp;the&nbsp;switch&nbsp;over&nbsp;test&nbsp;for&nbsp;one&nbsp;clone<br>>&nbsp;>&nbsp;resource&nbsp;works&nbsp;as&nbsp;expected:&nbsp;fail&nbsp;the&nbsp;service&nbsp;->&nbsp;monitor&nbsp;fails&nbsp;->&nbsp;stop<br>>&nbsp;>&nbsp;->&nbsp;start<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;Online:&nbsp;[&nbsp;paas-controller-1&nbsp;paas-controller-2&nbsp;paas-controller-3&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;sdclient_vip&nbsp;&nbsp;&nbsp;(ocf::heartbeat:I
Paddr2):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started&nbsp;paas-controller-2&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;Clone&nbsp;Set:&nbsp;sdclient_rep&nbsp;[sdclient]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Started:&nbsp;[&nbsp;paas-controller-1&nbsp;paas-controller-2&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Stopped:&nbsp;[&nbsp;paas-controller-3&nbsp;]<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;<br>>&nbsp;>&nbsp;what's&nbsp;the&nbsp;reason&nbsp;behind????&nbsp;<br>>&nbsp;<br>>&nbsp;Can&nbsp;you&nbsp;show&nbsp;the&nbsp;configuration&nbsp;of&nbsp;the&nbsp;three&nbsp;clones,&nbsp;their&nbsp;operations,<br>>&nbsp;and&nbsp;any&nbsp;constraints?<br>>&nbsp;<br>>&nbsp;Normally,&nbsp;the&nbsp;response&nbsp;is&nbsp;controlled&nbsp;by&nbsp;the&nbsp;monitor&nbsp;operation's&nbsp;on-fail<br>>&nbsp;attribute&nbsp;(which&nbsp;defaults&nbsp;to&nbsp;restart).<br></div><p><br></p></div></div></div></div><p><br></p> </div>
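<p>For reference, the colocation suggestion made in the reply above might look roughly like the following in crm shell syntax. This is only a sketch: the constraint IDs (<code>router_vip_with_router</code> and so on) are made-up names, and the resource names are taken from the posted <code>crm configure show</code> output. Whether colocation alone also covers the case raised in the [hhl] reply, where the clone instance is stopped while its start is still running, is not confirmed anywhere in this thread.</p>
<pre>
# Hypothetical constraint IDs; resource names come from the posted configuration.
# Each VIP may only run on a node where an instance of its clone is active.
colocation router_vip_with_router inf: router_vip router_rep
colocation sdclient_vip_with_sdclient inf: sdclient_vip sdclient_rep
colocation apigateway_vip_with_apigateway inf: apigateway_vip apigateway_rep
</pre>
<p>If constraints like these were adopted, the attribute-based <code>location</code> rules (<code>router_loc</code>, <code>sdclient_loc</code>, <code>apigateway_loc</code>) would presumably need to be reviewed so the two mechanisms do not conflict.</p>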