<div dir="ltr">Thanks Ken for the detailed response.<div>I suppose I could even use some of the pcs/crm CLI commands then.</div><div>Cheers.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 8:27 PM, Ken Gaillot <span dir="ltr">&lt;<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 03/16/2016 05:22 AM, Nikhil Utane wrote:<br>

&gt; I see the following info gets updated in the CIB. Can I use this, or is there a better<br>

&gt; way?<br>

&gt;<br>

</span>&gt; &lt;node_state id=&quot;*node1*&quot; uname=&quot;node1&quot; in_ccm=&quot;false&quot; crmd=&quot;offline&quot;<br>

&gt; crm-debug-origin=&quot;peer_update_callback&quot; join=&quot;*down*&quot; expected=&quot;member&quot;&gt;<br>

<br>

in_ccm/crmd/join reflect the current state of the node (as known by the<br>

partition that you&#39;re looking at the CIB on), so if the node went down<br>

and came back up, it won&#39;t tell you anything about being down.<br>

<br>

- in_ccm indicates that the node is part of the underlying cluster layer<br>

(heartbeat/cman/corosync)<br>

<br>

- crmd indicates that the node is communicating at the pacemaker layer<br>

<br>

- join indicates what phase of the join process the node is at<br>

<br>
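As a rough illustration of how the three attributes above could be read together, here is a minimal Python sketch. It assumes the attribute values seen in this thread (in_ccm true/false, crmd online/offline); describe_node_state is a hypothetical helper, not part of Pacemaker, and like the CIB entry itself it only reports the current state, not history.<br>

```python
import xml.etree.ElementTree as ET


def describe_node_state(node_state):
    """Summarise the current state recorded in a CIB node_state element."""
    in_cluster = node_state.get("in_ccm") == "true"   # cluster-layer membership
    crmd_online = node_state.get("crmd") == "online"  # pacemaker-layer communication
    join = node_state.get("join", "unknown")          # phase of the join process
    if in_cluster and crmd_online and join == "member":
        return "fully joined"
    if not in_cluster and not crmd_online:
        return "down (join=" + join + ")"
    return "partially up (join=" + join + ")"


# Attribute values copied from the node_state entry quoted above.
example = ET.Element("node_state", {
    "id": "node1", "uname": "node1", "in_ccm": "false",
    "crmd": "offline", "join": "down", "expected": "member",
})
print(example.get("uname"), "is", describe_node_state(example))
```

<br>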

There&#39;s not a direct way to see what node went down after the fact.<br>

There are indirect ways, however:<br>

<br>

- if the node was running resources, those will be failed, and those<br>

failures (including node) will be shown in the cluster status<br>

<br>

- the logs show all node membership events; you can search for logs such<br>

as &quot;state is now lost&quot; and &quot;left us&quot;<br>

<br>

- &quot;stonith_admin -H $NODE_NAME&quot; will show the fence history for a given node,<br>

so if the node went down due to fencing, it will show up there<br>

<br>

- you can configure an ocf:pacemaker:ClusterMon resource to run crm_mon<br>

periodically and run a script for node events, and you can write the<br>

script to do whatever you want (email you, etc.) (in the upcoming 1.1.15<br>

release, built-in notifications will make this more reliable and easier,<br>

but any script you use with ClusterMon will still be usable with the new<br>

method)<br>
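<br>
For the log-search option, a minimal Python sketch might look like the following. The two phrases are the ones mentioned above; the function name and the sample lines are made up for illustration, and real log wording and log file locations vary by version and distribution.<br>

```python
import re

# Phrases mentioned above; real log wording can differ between versions.
NODE_LOSS = re.compile(r"state is now lost|left us")


def find_node_loss_events(lines):
    """Return the log lines that mention a node leaving the cluster."""
    return [line.rstrip("\n") for line in lines if NODE_LOSS.search(line)]


# Made-up sample lines, for illustration only (not real Pacemaker output).
sample = [
    "Mar 16 10:00:01 node6 example: node1 state is now lost\n",
    "Mar 16 10:00:02 node6 example: resource check completed\n",
    "Mar 16 10:00:03 node6 example: node1 left us\n",
]
for event in find_node_loss_events(sample):
    print(event)
```

To scan a real log, pass an open file object instead of the sample list; the function accepts any iterable of lines.<br>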

<div class="HOEnZb"><div class="h5"><br>

&gt; On Wed, Mar 16, 2016 at 12:40 PM, Nikhil Utane &lt;<a href="mailto:nikhil.subscribed@gmail.com">nikhil.subscribed@gmail.com</a>&gt;<br>

&gt; wrote:<br>

&gt;<br>

&gt;&gt; Hi Ken,<br>

&gt;&gt;<br>

&gt;&gt; Sorry about the long delay. This activity was de-focussed but now it&#39;s<br>

&gt;&gt; back on track.<br>

&gt;&gt;<br>

&gt;&gt; One part of the question that is still not answered is: on the newly active<br>

&gt;&gt; node, how do I find out which node went down?<br>

&gt;&gt; Anything that gets updated in the status section that can be read and<br>

&gt;&gt; figured out?<br>

&gt;&gt;<br>

&gt;&gt; Thanks.<br>

&gt;&gt; Nikhil<br>

&gt;&gt;<br>

&gt;&gt; On Sat, Jan 9, 2016 at 3:31 AM, Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt; wrote:<br>

&gt;&gt;<br>

&gt;&gt;&gt; On 01/08/2016 11:13 AM, Nikhil Utane wrote:<br>

&gt;&gt;&gt;&gt;&gt; I think stickiness will do what you want here. Set a stickiness higher<br>

&gt;&gt;&gt;&gt;&gt; than the original node&#39;s preference, and the resource will want to stay<br>

&gt;&gt;&gt;&gt;&gt; where it is.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Not sure I understand this. Stickiness will ensure that resources don&#39;t<br>

&gt;&gt;&gt;&gt; move back when original node comes back up, isn&#39;t it?<br>

&gt;&gt;&gt;&gt; But in my case, I want the newly standby node to become the backup node<br>

&gt;&gt;&gt; for<br>

&gt;&gt;&gt;&gt; all other nodes. i.e. it should now be able to run all my resource<br>

&gt;&gt;&gt; groups<br>

&gt;&gt;&gt;&gt; albeit with a lower score. How do I achieve that?<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Oh right. I forgot to ask whether you had an opt-out<br>

&gt;&gt;&gt; (symmetric-cluster=true, the default) or opt-in<br>

&gt;&gt;&gt; (symmetric-cluster=false) cluster. If you&#39;re opt-out, every node can run<br>

&gt;&gt;&gt; every resource unless you give it a negative preference.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Partly it depends on whether there is a good reason to give each<br>

&gt;&gt;&gt; instance a &quot;home&quot; node. Often, there&#39;s not. If you just want to balance<br>

&gt;&gt;&gt; resources across nodes, the cluster will do that by default.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; If you prefer to put certain resources on certain nodes because the<br>

&gt;&gt;&gt; resources require more physical resources (RAM/CPU/whatever), you can<br>

&gt;&gt;&gt; set node attributes for that and use rules to set node preferences.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Either way, you can decide whether you want stickiness with it.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Also can you answer, how to get the values of node that goes active and<br>

&gt;&gt;&gt; the<br>

&gt;&gt;&gt;&gt; node that goes down inside the OCF agent?  Do I need to use<br>

&gt;&gt;&gt; notification or<br>

&gt;&gt;&gt;&gt; some simpler alternative is available?<br>

&gt;&gt;&gt;&gt; Thanks.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; On Fri, Jan 8, 2016 at 9:30 PM, Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt;<br>

&gt;&gt;&gt; wrote:<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; On 01/08/2016 06:55 AM, Nikhil Utane wrote:<br>

&gt;&gt;&gt;&gt;&gt;&gt; Would like to validate my final config.<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; As I mentioned earlier, I will be having (up to) 5 active servers and 1<br>

&gt;&gt;&gt;&gt;&gt;&gt; standby server.<br>

&gt;&gt;&gt;&gt;&gt;&gt; The standby server should take up the role of active that went down.<br>

&gt;&gt;&gt; Each<br>

&gt;&gt;&gt;&gt;&gt;&gt; active has some unique configuration that needs to be preserved.<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; 1) So I will create a total of 5 groups. Each group has a<br>

&gt;&gt;&gt; &quot;heartbeat::IPaddr2&quot;<br>

&gt;&gt;&gt;&gt;&gt;&gt; resource (for the virtual IP) and my custom resource.<br>

&gt;&gt;&gt;&gt;&gt;&gt; 2) The virtual IP needs to be read inside my custom OCF agent, so I<br>

&gt;&gt;&gt; will<br>

&gt;&gt;&gt;&gt;&gt;&gt; make use of attribute reference and point to the value of IPaddr2<br>

&gt;&gt;&gt; inside<br>

&gt;&gt;&gt;&gt;&gt; my<br>

&gt;&gt;&gt;&gt;&gt;&gt; custom resource to avoid duplication.<br>

&gt;&gt;&gt;&gt;&gt;&gt; 3) I will then configure location constraint to run the group resource<br>

&gt;&gt;&gt;&gt;&gt; on 5<br>

&gt;&gt;&gt;&gt;&gt;&gt; active nodes with higher score and lesser score on standby.<br>

&gt;&gt;&gt;&gt;&gt;&gt; For e.g.<br>

&gt;&gt;&gt;&gt;&gt;&gt; Group              Node            Score<br>

&gt;&gt;&gt;&gt;&gt;&gt; ---------------------------------------------<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup1        node1           500<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup1        node6           0<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup2        node2           500<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup2        node6           0<br>

&gt;&gt;&gt;&gt;&gt;&gt; ..<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup5        node5           500<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup5        node6           0<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; 4) Now if say node1 were to go down, then stop action on node1 will<br>

&gt;&gt;&gt; first<br>

&gt;&gt;&gt;&gt;&gt;&gt; get called. Haven&#39;t decided if I need to do anything specific here.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; To clarify, if node1 goes down intentionally (e.g. standby or stop),<br>

&gt;&gt;&gt;&gt;&gt; then all resources on it will be stopped first. But if node1 becomes<br>

&gt;&gt;&gt;&gt;&gt; unavailable (e.g. crash or communication outage), it will get fenced.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; 5) But when the start action of node 6 gets called, then using crm<br>

&gt;&gt;&gt;&gt;&gt; command<br>

&gt;&gt;&gt;&gt;&gt;&gt; line interface, I will modify the above config to swap node 1 and<br>

&gt;&gt;&gt; node 6.<br>

&gt;&gt;&gt;&gt;&gt;&gt; i.e.<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup1        node6           500<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup1        node1           0<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup2        node2           500<br>

&gt;&gt;&gt;&gt;&gt;&gt; MyGroup2        node1           0<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; 6) To do the above, I need the newly active and newly standby node<br>

&gt;&gt;&gt; names<br>

&gt;&gt;&gt;&gt;&gt; to<br>

&gt;&gt;&gt;&gt;&gt;&gt; be passed to my start action. What&#39;s the best way to get this<br>

&gt;&gt;&gt; information<br>

&gt;&gt;&gt;&gt;&gt;&gt; inside my OCF agent?<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Modifying the configuration from within an agent is dangerous -- too<br>

&gt;&gt;&gt;&gt;&gt; much potential for feedback loops between pacemaker and the agent.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; I think stickiness will do what you want here. Set a stickiness higher<br>

&gt;&gt;&gt;&gt;&gt; than the original node&#39;s preference, and the resource will want to stay<br>

&gt;&gt;&gt;&gt;&gt; where it is.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; 7) Apart from node name, there will be other information which I plan<br>

&gt;&gt;&gt; to<br>

&gt;&gt;&gt;&gt;&gt;&gt; pass by making use of node attributes. What&#39;s the best way to get this<br>

&gt;&gt;&gt;&gt;&gt;&gt; information inside my OCF agent? Use crm command to query?<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Any of the command-line interfaces for doing so should be fine, but I&#39;d<br>

&gt;&gt;&gt;&gt;&gt; recommend using one of the lower-level tools (crm_attribute or<br>

&gt;&gt;&gt;&gt;&gt; attrd_updater) so you don&#39;t have a dependency on a higher-level tool<br>

&gt;&gt;&gt;&gt;&gt; that may not always be installed.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; Thank You.<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; On Tue, Dec 22, 2015 at 9:44 PM, Nikhil Utane &lt;<br>

&gt;&gt;&gt;&gt;&gt; <a href="mailto:nikhil.subscribed@gmail.com">nikhil.subscribed@gmail.com</a>&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; wrote:<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; Thanks to you Ken for giving all the pointers.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; Yes, I can use service start/stop which should be a lot simpler.<br>

&gt;&gt;&gt; Thanks<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; again. :)<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt; On Tue, Dec 22, 2015 at 9:29 PM, Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt;<br>

&gt;&gt;&gt;&gt;&gt; wrote:<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; On 12/22/2015 12:17 AM, Nikhil Utane wrote:<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; I have prepared a write-up explaining my requirements and current<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; solution<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; that I am proposing based on my understanding so far.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Kindly let me know if what I am proposing is good or there is a<br>

&gt;&gt;&gt; better<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; way<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; to achieve the same.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt; <a href="https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing" rel="noreferrer" target="_blank">https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing</a><br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; Let me know if you face any issue in accessing the above link.<br>

&gt;&gt;&gt; Thanks.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; This looks great. Very well thought-out.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; One comment:<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; &quot;8. In the event of any failover, the standby node will get notified<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; through an event and it will execute a script that will read the<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; configuration specific to the node that went down (again using<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; crm_attribute) and become active.&quot;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; It may not be necessary to use the notifications for this. Pacemaker<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; will call your resource agent with the &quot;start&quot; action on the standby<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; node, after ensuring it is stopped on the previous node. Hopefully<br>

&gt;&gt;&gt; the<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; resource agent&#39;s start action has (or can have, with configuration<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; options) all the information you need.<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; If you do end up needing notifications, be aware that the feature<br>

&gt;&gt;&gt; will<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; be disabled by default in the 1.1.14 release, because changes in<br>

&gt;&gt;&gt; syntax<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; are expected in further development. You can define a compile-time<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; constant to enable them.<br>

<br>

</div></div></blockquote></div><br></div>