<div dir="ltr"><div>Hi<br></div><div><div><span style="white-space:pre">                </span></div><div>I translated a PostgreSQL multi-state RA (<a href="https://github.com/dalibo/PAF">https://github.com/dalibo/PAF</a>) into Python (<a href="https://github.com/ulodciv/deploy_cluster">https://github.com/ulodciv/deploy_cluster</a>), and I have been editing it heavily.</div><div><br></div><div>In parallel I am writing unit tests and functional tests.</div><div><br></div><div>I am having an issue with a functional test that abruptly powers off a slave, say &quot;test3&quot; (a hot standby PG instance). Later on I boot the slave back up. Once it is up, I run &quot;pcs cluster start test3&quot;. This is where the problem starts.</div><div><br></div><div>I check the output of &quot;pcs status xml&quot; every second until test3 is reported as ready as a slave again. I take the following output to mean that test3 is ready as a slave:</div><div><br></div><div>    &lt;nodes&gt;</div><div>        &lt;node name=&quot;test1&quot; id=&quot;1&quot; online=&quot;true&quot; standby=&quot;false&quot; standby_onfail=&quot;false&quot; maintenance=&quot;false&quot; pending=&quot;false&quot; unclean=&quot;false&quot; shutdown=&quot;false&quot; expected_up=&quot;true&quot; is_dc=&quot;false&quot; resources_running=&quot;2&quot; type=&quot;member&quot; /&gt;</div><div>        &lt;node name=&quot;test2&quot; id=&quot;2&quot; online=&quot;true&quot; standby=&quot;false&quot; standby_onfail=&quot;false&quot; maintenance=&quot;false&quot; pending=&quot;false&quot; unclean=&quot;false&quot; shutdown=&quot;false&quot; expected_up=&quot;true&quot; is_dc=&quot;true&quot; resources_running=&quot;1&quot; type=&quot;member&quot; /&gt;</div><div>        &lt;node name=&quot;test3&quot; id=&quot;3&quot; online=&quot;true&quot; standby=&quot;false&quot; standby_onfail=&quot;false&quot; maintenance=&quot;false&quot; pending=&quot;false&quot; unclean=&quot;false&quot; shutdown=&quot;false&quot; 
expected_up=&quot;true&quot; is_dc=&quot;false&quot; resources_running=&quot;1&quot; type=&quot;member&quot; /&gt;</div><div>    &lt;/nodes&gt;</div><div>    &lt;resources&gt;</div><div>        &lt;clone id=&quot;pgsql-ha&quot; multi_state=&quot;true&quot; unique=&quot;false&quot; managed=&quot;true&quot; failed=&quot;false&quot; failure_ignored=&quot;false&quot; &gt;</div><div>            &lt;resource id=&quot;pgsqld&quot; resource_agent=&quot;ocf::heartbeat:pgha&quot; role=&quot;Slave&quot; active=&quot;true&quot; orphaned=&quot;false&quot; managed=&quot;true&quot; failed=&quot;false&quot; failure_ignored=&quot;false&quot; nodes_running_on=&quot;1&quot; &gt;</div><div>                &lt;node name=&quot;test3&quot; id=&quot;3&quot; cached=&quot;false&quot;/&gt;</div><div>            &lt;/resource&gt;</div><div>            &lt;resource id=&quot;pgsqld&quot; resource_agent=&quot;ocf::heartbeat:pgha&quot; role=&quot;Master&quot; active=&quot;true&quot; orphaned=&quot;false&quot; managed=&quot;true&quot; failed=&quot;false&quot; failure_ignored=&quot;false&quot; nodes_running_on=&quot;1&quot; &gt;</div><div>                &lt;node name=&quot;test1&quot; id=&quot;1&quot; cached=&quot;false&quot;/&gt;</div><div>            &lt;/resource&gt;</div><div>            &lt;resource id=&quot;pgsqld&quot; resource_agent=&quot;ocf::heartbeat:pgha&quot; role=&quot;Slave&quot; active=&quot;true&quot; orphaned=&quot;false&quot; managed=&quot;true&quot; failed=&quot;false&quot; failure_ignored=&quot;false&quot; nodes_running_on=&quot;1&quot; &gt;</div><div>                &lt;node name=&quot;test2&quot; id=&quot;2&quot; cached=&quot;false&quot;/&gt;</div><div>            &lt;/resource&gt;</div><div>        &lt;/clone&gt;</div><div><span style="white-space:pre">                </span></div><div>By ready to go I mean that upon running &quot;pcs cluster start test3&quot;, the following occurs before test3 appears ready in the XML:</div><div><br></div><div>pcs cluster start test3<span 
style="white-space:pre">        </span></div><div>monitor<span style="white-space:pre">                                        </span>-&gt; RA returns unknown error (1) </div><div>notify/pre-stop    <span style="white-space:pre">                </span>-&gt; RA returns ok (0)</div><div>stop<span style="white-space:pre">                        </span>    <span style="white-space:pre">        </span>-&gt; RA returns ok (0)</div><div>start<span style="white-space:pre">                                        </span>-&gt; RA returns ok (0)</div><div><br></div><div>The problem is that between &quot;pcs cluster start test3&quot; and &quot;monitor&quot;, the XML returned by &quot;pcs status xml&quot; already seems to say that test3 is ready (the XML extract above is what I get at that moment). Once &quot;monitor&quot; has run, the returned XML shows test3 as offline, and only after the start has finished is test3 shown as ready again.</div><div><br></div><div>Am I getting anything wrong? Is there a simpler or better way to check that test3 is fully functional again, i.e. that the OCF start was successful?</div><div><br></div><div>Thanks</div><div><br></div><div>Ludovic</div></div>
</div>
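<div>For reference, the polling check described above can be sketched roughly as follows. This is only a minimal sketch, not the code from the test suite: the function names, the <code>stable_checks</code> parameter and the idea of requiring several consecutive "ready" readings are assumptions added for illustration. It reads only the attributes visible in the XML extract above, and it is a heuristic that may reduce the chance of accepting the early "ready" report; it does not prove that the OCF start succeeded.</div>

```python
import subprocess
import time
import xml.etree.ElementTree as ET


def is_ready_slave(status_xml, node_name):
    """True if node_name is online and hosts an active, non-failed Slave."""
    root = ET.fromstring(status_xml)
    # Only look at <node> elements directly under <nodes>, not the
    # <node> children of <resource>, which carry no "online" attribute.
    online = any(
        node.get("name") == node_name and node.get("online") == "true"
        for nodes in root.iter("nodes")
        for node in nodes.findall("node")
    )
    if not online:
        return False
    for res in root.iter("resource"):
        if (res.get("role") == "Slave"
                and res.get("active") == "true"
                and res.get("failed") == "false"
                and any(n.get("name") == node_name
                        for n in res.findall("node"))):
            return True
    return False


def pcs_status_xml():
    """Run "pcs status xml" and return its standard output as text."""
    result = subprocess.run(["pcs", "status", "xml"], check=True,
                            capture_output=True, text=True)
    return result.stdout


def wait_until_ready(node_name, fetch=pcs_status_xml, stable_checks=3,
                     interval=1.0, timeout=120.0):
    """Poll until node_name looks ready stable_checks times in a row.

    Hypothetical helper: any non-ready reading resets the streak, so a
    short-lived "ready" report followed by "offline" is not accepted.
    """
    deadline = time.monotonic() + timeout
    streak = 0
    while time.monotonic() < deadline:
        streak = streak + 1 if is_ready_slave(fetch(), node_name) else 0
        if streak >= stable_checks:
            return True
        time.sleep(interval)
    return False
```

<div>A test would call <code>wait_until_ready("test3")</code> after "pcs cluster start test3". If the stale "ready" report can last longer than <code>stable_checks</code> polls, this check would still be fooled, so the values would need tuning against the real cluster.</div>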