<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 1:58 AM, Ken Gaillot <span dir="ltr">&lt;<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On 09/22/2016 09:53 AM, Jan Pokorný wrote:<br>

&gt; On 22/09/16 08:42 +0200, Kristoffer Grönlund wrote:<br>

&gt;&gt; Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt; writes:<br>

&gt;&gt;<br>

&gt;&gt;&gt; I&#39;m not saying it&#39;s a bad idea, just that it&#39;s more complicated than it<br>

&gt;&gt;&gt; first sounds, so it&#39;s worth thinking through the implications.<br>

&gt;&gt;<br>

&gt;&gt; Thinking about it and looking at how complicated it gets, maybe what<br>

&gt;&gt; you&#39;d really want, to make it clearer for the user, is the ability to<br>

&gt;&gt; explicitly configure the behavior, either globally or per-resource. So<br>

&gt;&gt; instead of having to tweak a set of variables that interact in complex<br>

&gt;&gt; ways, you&#39;d configure something like rule expressions:<br>

&gt;&gt;<br>

&gt;&gt; &lt;on_fail&gt;<br>

&gt;&gt;   &lt;restart repeat=&quot;3&quot; /&gt;<br>

&gt;&gt;   &lt;migrate timeout=&quot;60s&quot; /&gt;<br>

&gt;&gt;   &lt;fence/&gt;<br>

&gt;&gt; &lt;/on_fail&gt;<br>

&gt;&gt;<br>

&gt;&gt; So: try to restart the service 3 times; if that fails, migrate the<br>

&gt;&gt; service; if it still fails, fence the node.<br>

&gt;&gt;<br>

&gt;&gt; (obviously the details and XML syntax are just an example)<br>

&gt;&gt;<br>

&gt;&gt; This would then replace on-fail, migration-threshold, etc.<br>

&gt;<br>

&gt; I must admit that I wasn&#39;t able to follow the previous emails in this<br>

&gt; thread on the first pass, which is not the case with this procedural<br>

&gt; (sequence-ordered) approach.  Though one could argue it doesn&#39;t take<br>

&gt; the type of operation into account, which might again open the door to<br>

&gt; non-obvious interactions.<br>

<br>

</span>&quot;restart&quot; is the only on-fail value that it makes sense to escalate.<br>

<br>

block/stop/fence/standby are final. Block means &quot;don&#39;t touch the<br>

resource again&quot;, so there can&#39;t be any further response to failures.<br>

Stop/fence/standby move the resource off the local node, so failure<br>

handling is reset (there are 0 failures on the new node to begin with).<br>
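<br>A minimal sketch of the distinction above follows. It assumes that an escalation chain in the style of the &lt;on_fail&gt; example is written as a plain list of on-fail values. <code>validate_chain</code> and the list representation are hypothetical and are not part of Pacemaker. The function flags any step placed after a final action, and any use of &quot;ignore&quot; as an escalation step. Kristoffer&#39;s &quot;migrate&quot; is not an on-fail value, so this sketch reports it as unknown:<br>
<pre>
```python
# Hypothetical sketch, not Pacemaker code: check an escalation chain of
# on-fail values, e.g. ["restart", "fence"], against the rules above.
FINAL = {"block", "stop", "fence", "standby"}  # no further local handling
KNOWN = FINAL | {"restart", "ignore"}


def validate_chain(chain):
    """Return a list of problems; an empty list means the chain is usable."""
    problems = []
    for index, action in enumerate(chain):
        is_last = index == len(chain) - 1
        if action not in KNOWN:
            problems.append(f"step {index}: unknown on-fail value {action!r}")
        elif is_last or action == "restart":
            continue  # any known value may end the chain; "restart" may escalate
        elif action in FINAL:
            problems.append(f"step {index}: {action!r} is final, later steps never run")
        else:  # "ignore": escalating it would need a re-implementation
            problems.append(f"step {index}: {action!r} cannot be escalated")
    return problems


print(validate_chain(["restart", "fence"]))  # []
print(validate_chain(["stop", "restart"]))   # ["step 0: 'stop' is final, later steps never run"]
```
</pre>
Only &quot;restart&quot; is allowed in a non-final position, which mirrors the point that it is the only value worth escalating.<br>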

<br>

&quot;Ignore&quot; is theoretically possible to escalate, e.g. &quot;ignore 3 failures<br>

then migrate&quot;, but I can&#39;t think of a real-world situation where that<br>

makes sense, </blockquote><div><br></div><div>Really?</div><div><br></div><div>It is not uncommon to hear &quot;I know it&#39;s failed, but I don&#39;t want the cluster to do anything until it&#39;s _really_ failed&quot;.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">and it would be a significant re-implementation of &quot;ignore&quot;<br>

(which currently ignores the state of having failed, as opposed to a<br>

particular instance of failure).<br></blockquote><div><br></div><div>Agreed.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

What the interface needs to express is: &quot;If this operation fails,<br>

optionally try a soft recovery [always stop+start], but if &lt;N&gt; failures<br>

occur on the same node, proceed to a [configurable] hard recovery&quot;.<br>

<br>

And of course the interface will need to be different depending on how<br>

certain details are decided, e.g. whether any failures count toward &lt;N&gt;<br>

or just failures of one particular operation type, and whether the hard<br>

recovery type can vary depending on what operation failed.<br>
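<br>One way to read that rule is shown in the minimal Python sketch below. <code>FailurePolicy</code>, its parameters and the per-(node, operation) counter are hypothetical names chosen for illustration; they are not a Pacemaker API. A threshold of 1 means &quot;no soft recovery&quot;. <code>count_all_ops</code> stands in for the open question of whether any failure counts toward N, or only failures of one operation type:<br>
<pre>
```python
from collections import Counter

# Hypothetical model of the rule above, not Pacemaker code.
HARD_RECOVERIES = {"block", "stop", "fence", "standby"}


class FailurePolicy:
    def __init__(self, threshold, hard_recovery, count_all_ops=True):
        if hard_recovery not in HARD_RECOVERIES:
            raise ValueError(f"unsupported hard recovery: {hard_recovery!r}")
        self.threshold = threshold          # the N in the rule above
        self.hard_recovery = hard_recovery  # configurable hard response
        self.count_all_ops = count_all_ops  # any failure, or per operation type?
        self.failures = Counter()           # keyed by (node, op) or (node, None)

    def on_failure(self, node, op):
        """Return the response to one failure of operation `op` on `node`."""
        key = (node, None if self.count_all_ops else op)
        self.failures[key] += 1
        if self.failures[key] >= self.threshold:
            return self.hard_recovery
        return "restart"  # soft recovery: stop + start in place


policy = FailurePolicy(3, "standby")
print([policy.on_failure("node1", "monitor") for _ in range(3)])
# ['restart', 'restart', 'standby']
```
</pre>
The counter is keyed by node. Moving the resource (stop, fence or standby) therefore starts the new node at 0 failures, as described earlier. Letting the hard recovery vary by failed operation would mean replacing <code>hard_recovery</code> with a per-operation mapping.<br>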

<div class="gmail-HOEnZb"><div class="gmail-h5"><br>

______________________________<wbr>_________________<br>

Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>

<a href="http://clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://clusterlabs.org/<wbr>mailman/listinfo/users</a><br>

<br>

Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>

Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/<wbr>doc/Cluster_from_Scratch.pdf</a><br>

Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>

</div></div></blockquote></div><br></div></div>