<div dir="ltr">Thanks Ken.<div><br></div><div>Regards,</div><div>Ashutosh  <br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 10, 2017 at 6:57 AM,  <span dir="ltr">&lt;<a href="mailto:users-request@clusterlabs.org" target="_blank">users-request@clusterlabs.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send Users mailing list submissions to<br>

        <a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="http://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.clusterlabs.org/<wbr>mailman/listinfo/users</a><br>

or, via email, send a message with subject or body &#39;help&#39; to<br>

        <a href="mailto:users-request@clusterlabs.org">users-request@clusterlabs.org</a><br>

<br>

You can reach the person managing the list at<br>

        <a href="mailto:users-owner@clusterlabs.org">users-owner@clusterlabs.org</a><br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than &quot;Re: Contents of Users digest...&quot;<br>

<br>

<br>

Today&#39;s Topics:<br>

<br>

   1. Re: issues with pacemaker daemonization (Ken Gaillot)<br>

   2. Re: Pacemaker 1.1.18 Release Candidate 4 (Ken Gaillot)<br>

   3. Re: Issue in starting Pacemaker Virtual IP in RHEL 7 (Jan Pokorný)<br>

   4. Re: One cluster with two groups of nodes (Alberto Mijares)<br>

   5. Pacemaker responsible of DRBD and a systemd resource<br>

      (Derek Wuelfrath)<br>

<br>

<br>

------------------------------<wbr>------------------------------<wbr>----------<br>

<br>

Message: 1<br>

Date: Thu, 09 Nov 2017 09:49:20 -0600<br>

From: Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt;<br>

To: Cluster Labs - All topics related to open-source clustering<br>

        welcomed        &lt;<a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>&gt;<br>

Subject: Re: [ClusterLabs] issues with pacemaker daemonization<br>

Message-ID: &lt;<a href="mailto:1510242560.5244.3.camel@redhat.com">1510242560.5244.3.camel@<wbr>redhat.com</a>&gt;<br>

Content-Type: text/plain; charset=&quot;UTF-8&quot;<br>

<br>

On Thu, 2017-11-09 at 15:59 +0530, ashutosh tiwari wrote:<br>

&gt; Hi,<br>

&gt;<br>

&gt; We are observing that sometimes the pacemaker daemon gets the same<br>

&gt; process group ID as the process/script calling &quot;service pacemaker<br>

&gt; start&quot;, while the child processes of pacemaker (cib/crmd/pengine) have<br>

&gt; their process group ID equal to their PID, which is how things should<br>

&gt; be for a daemon, AFAIK.<br>

&gt;<br>

&gt; Do we expect this to be handled by the init.d script (CentOS 6) or by the<br>

&gt; pacemaker binary?<br>

&gt;<br>

&gt; pacemaker version: pacemaker-1.1.14-8.el6_8.1.<wbr>x86_64<br>

&gt;<br>

&gt;<br>

&gt; Thanks and Regards,<br>

&gt; Ashutosh Tiwari<br>

<br>

When pacemakerd spawns a child (cib etc.), it calls setsid() in the<br>

child to start a new session, which will set the process group ID and<br>

session ID to the child&#39;s PID.<br>
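A minimal Python sketch of the difference described above, for illustration only (pacemakerd is written in C and is not structured like this). It assumes a Unix host where os.fork() is available. One child skips setsid() and therefore keeps the parent's process group, which resembles what was observed for pacemakerd when started from a script; the other calls os.setsid() first, as pacemakerd does for cib/crmd/pengine, and so reports PGID == SID == its own PID.

```python
import os


def child_ids(new_session: bool):
    """Fork a child, optionally calling setsid() in it, and return
    (child_pid, child_pgid, child_sid) as reported by the child itself."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(read_fd)
        if new_session:
            # New session: the child becomes session leader and process-group
            # leader, so its PGID and SID both equal its own PID.
            os.setsid()
        report = f"{os.getpid()} {os.getpgrp()} {os.getsid(0)}"
        os.write(write_fd, report.encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: collect the child's report until the pipe reaches EOF.
    os.close(write_fd)
    data = b""
    while chunk := os.read(read_fd, 1024):
        data += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)
    child_pid, pgid, sid = (int(x) for x in data.split())
    return child_pid, pgid, sid


if __name__ == "__main__":
    print("parent pgid:", os.getpgrp())
    print("without setsid (pid, pgid, sid):", child_ids(new_session=False))
    print("with setsid    (pid, pgid, sid):", child_ids(new_session=True))
```

Without setsid() the child's PGID equals the parent's, so signals sent to the caller's process group also reach it.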

<br>

However, it doesn&#39;t do anything similar for itself. Possibly it should.<br>

It&#39;s a longstanding to-do item to make pacemaker daemonize itself more<br>

&quot;properly&quot;, but no one&#39;s had the time to address it.<br>
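For comparison, the conventional way a Unix daemon detaches itself is a double fork: fork, call setsid() in the child, fork again and let the intermediate child exit. The surviving process is then in its own session and process group, and it is not a session leader. The Python sketch below shows that general pattern, assuming a Unix host; it is not a description of any planned pacemakerd change, and a real daemon would also redirect its standard streams and typically write a PID file. The daemon reports its IDs over a pipe so they can be checked.

```python
import os


def daemonize_and_report():
    """Double-fork a 'daemon' that reports its IDs back over a pipe.
    Returns (intermediate_pid, daemon_pid, daemon_pgid, daemon_sid)."""
    read_fd, write_fd = os.pipe()
    first = os.fork()
    if first == 0:
        # Intermediate child: start a new session, leaving the caller's
        # process group and controlling terminal behind.
        os.close(read_fd)
        os.setsid()
        if os.fork() > 0:
            os._exit(0)  # intermediate child exits; grandchild is reparented
        # Grandchild (the daemon): not a session leader, so opening a tty
        # cannot make it acquire a controlling terminal.
        os.chdir("/")
        os.umask(0o022)
        report = f"{os.getpid()} {os.getpgrp()} {os.getsid(0)}"
        os.write(write_fd, report.encode())
        os.close(write_fd)
        os._exit(0)
    os.close(write_fd)
    os.waitpid(first, 0)  # reap the intermediate child
    data = b""
    while chunk := os.read(read_fd, 1024):
        data += chunk
    os.close(read_fd)
    daemon_pid, pgid, sid = (int(x) for x in data.split())
    return first, daemon_pid, pgid, sid


if __name__ == "__main__":
    first, daemon_pid, pgid, sid = daemonize_and_report()
    print(f"caller pgid={os.getpgrp()} daemon pid={daemon_pid} pgid={pgid} sid={sid}")
```

The daemon's PGID and SID equal the intermediate child's PID, never the caller's process group.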

--<br>

Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt;<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Thu, 09 Nov 2017 10:11:08 -0600<br>

From: Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt;<br>

To: Kristoffer Grønlund &lt;<a href="mailto:kgronlund@suse.com">kgronlund@suse.com</a>&gt;, Cluster   Labs - All<br>

        topics related to open-source clustering welcomed<br>

        &lt;<a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>&gt;<br>

Subject: Re: [ClusterLabs] Pacemaker 1.1.18 Release Candidate 4<br>

Message-ID: &lt;<a href="mailto:1510243868.5244.5.camel@redhat.com">1510243868.5244.5.camel@<wbr>redhat.com</a>&gt;<br>

Content-Type: text/plain; charset=&quot;UTF-8&quot;<br>

<br>

On Fri, 2017-11-03 at 08:24 +0100, Kristoffer Grønlund wrote:<br>

&gt; Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt; writes:<br>

&gt;<br>

&gt; &gt; I decided to do another release candidate, because we had a large<br>

&gt; &gt; number of changes since rc3. The fourth release candidate for<br>

&gt; &gt; Pacemaker version 1.1.18 is now available at:<br>

&gt; &gt;<br>

&gt; &gt; <a href="https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18-rc4" rel="noreferrer" target="_blank">https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18-rc4</a><br>

&gt; &gt;<br>

&gt; &gt; The big changes are numerous scalability improvements and bundle<br>

&gt; &gt; fixes. We&#39;re starting to test Pacemaker with as many as 1,500<br>

&gt; &gt; bundles (Docker containers) running on 20 guest nodes running on<br>

&gt; &gt; three 56-core physical cluster nodes.<br>

&gt;<br>

&gt; Hi Ken,<br>

&gt;<br>

&gt; That&#39;s really cool. What&#39;s the size of the CIB with that kind of<br>

&gt; configuration? I guess it would compress pretty well, but still.<br>

<br>

The test cluster is gone now, so not sure ... Beekhof might know.<br>

<br>

I know it&#39;s big enough that the transition graph could get too big to<br>

send via IPC, and we had to re-enable pengine&#39;s ability to write it to<br>

disk instead, and have the crmd read it from disk.<br>
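The fallback described above (send the transition graph inline over IPC when it fits, otherwise write it to disk and have the reader load it from there) can be sketched generically as below. This is a Python illustration, not Pacemaker's code: the size limit, the message tuple format and the function names are all hypothetical.

```python
import os
import tempfile

# Hypothetical limit for illustration only; not Pacemaker's actual IPC limit.
MAX_IPC_BYTES = 128 * 1024


def package_graph(graph_xml: str, max_bytes: int = MAX_IPC_BYTES):
    """Sender side: return ("inline", bytes) when the graph fits in one
    message, otherwise write it to a temporary file and return ("file", path)."""
    data = graph_xml.encode("utf-8")
    if len(data) <= max_bytes:
        return ("inline", data)
    fd, path = tempfile.mkstemp(prefix="transition-graph-", suffix=".xml")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return ("file", path)


def load_graph(message):
    """Receiver side: use the inline payload or read the file it points to."""
    kind, payload = message
    if kind == "inline":
        return payload.decode("utf-8")
    with open(payload, "rb") as f:
        return f.read().decode("utf-8")
```

A real implementation would also need to decide who removes the file and make sure the reader never sees a partially written one.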

<br>

&gt;<br>

&gt; Cheers,<br>

&gt; Kristoffer<br>

&gt;<br>

&gt; &gt;<br>

&gt; &gt; For details on the changes in this release, see the ChangeLog.<br>

&gt; &gt;<br>

&gt; &gt; This is likely to be the last release candidate before the final<br>

&gt; &gt; release next week. Any testing you can do is very welcome.<br>

--<br>

Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt;<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 3<br>

Date: Thu, 9 Nov 2017 20:18:26 +0100<br>

From: Jan Pokorný &lt;<a href="mailto:jpokorny@redhat.com">jpokorny@redhat.com</a>&gt;<br>

To: <a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>

Subject: Re: [ClusterLabs] Issue in starting Pacemaker Virtual IP in<br>

        RHEL 7<br>

Message-ID: &lt;<a href="mailto:20171109191826.GD10004@redhat.com">20171109191826.GD10004@<wbr>redhat.com</a>&gt;<br>

Content-Type: text/plain; charset=&quot;us-ascii&quot;<br>

<br>

On 06/11/17 10:43 +0000, Somanath Jeeva wrote:<br>

&gt; I am using a two node pacemaker cluster with teaming enabled. The cluster has<br>

&gt;<br>

&gt; 1.       Two team interfaces with different subnets.<br>

&gt;<br>

&gt; 2.       team1 has an NFS VIP plumbed to it.<br>

&gt;<br>

&gt; 3.       The VirtualIP from pacemaker is configured to plumb to team0 (Corosync ring number 0).<br>

&gt;<br>

&gt; In this case, corosync takes the NFS IP as its ring address and<br>

&gt; checks it against corosync.conf. Since the conf file has the team0<br>

&gt; hostname, corosync fails to start.<br>

&gt;<br>

&gt; Outputs:<br>

&gt;<br>

&gt;<br>

&gt; $ip a output:<br>

&gt;<br>

&gt; [...]<br>

&gt; 10: team1: &lt;BROADCAST,MULTICAST,UP,LOWER_<wbr>UP&gt; mtu 1500 qdisc noqueue state UP qlen 1000<br>

&gt;     link/ether 38:63:bb:3f:a4:ad brd ff:ff:ff:ff:ff:ff<br>

&gt;     inet 10.64.23.117/28 brd 10.64.23.127 scope global team1<br>

&gt;        valid_lft forever preferred_lft forever<br>

&gt;     inet 10.64.23.121/24 scope global secondary team1:~m0<br>

&gt;        valid_lft forever preferred_lft forever<br>

&gt;     inet6 fe80::3a63:bbff:fe3f:a4ad/64 scope link<br>

&gt;        valid_lft forever preferred_lft forever<br>

&gt; 11: team0: &lt;BROADCAST,MULTICAST,UP,LOWER_<wbr>UP&gt; mtu 1500 qdisc noqueue state UP qlen 1000<br>

&gt;     link/ether 38:63:bb:3f:a4:ac brd ff:ff:ff:ff:ff:ff<br>

&gt;     inet 10.64.23.103/28 brd 10.64.23.111 scope global team0<br>

&gt;        valid_lft forever preferred_lft forever<br>

&gt;     inet6 fe80::3a63:bbff:fe3f:a4ac/64 scope link<br>

&gt;        valid_lft forever preferred_lft forever<br>

&gt;<br>

&gt; Corosync Conf File:<br>

&gt;<br>

&gt; cat /etc/corosync/corosync.conf<br>

&gt; totem {<br>

&gt;     version: 2<br>

&gt;     secauth: off<br>

&gt;     cluster_name: DES<br>

&gt;     transport: udp<br>

&gt;     rrp_mode: passive<br>

&gt;<br>

&gt;     interface {<br>

&gt;         ringnumber: 0<br>

&gt;         bindnetaddr: 10.64.23.96<br>

&gt;         mcastaddr: 224.1.1.1<br>

&gt;         mcastport: 6860<br>

&gt;     }<br>

&gt; }<br>

&gt;<br>

&gt; nodelist {<br>

&gt;     node {<br>

&gt;         ring0_addr: dl380x4415<br>

&gt;         nodeid: 1<br>

&gt;     }<br>

&gt;<br>

&gt;     node {<br>

&gt;         ring0_addr: dl360x4405<br>

&gt;         nodeid: 2<br>

&gt;     }<br>

&gt; }<br>

&gt;<br>

&gt; quorum {<br>

&gt;     provider: corosync_votequorum<br>

&gt;     two_node: 1<br>

&gt; }<br>

&gt;<br>

&gt; logging {<br>

&gt;     to_logfile: yes<br>

&gt;     logfile: /var/log/cluster/corosync.log<br>

&gt;     to_syslog: yes<br>

&gt; }<br>

&gt;<br>

&gt; /etc/hosts:<br>

&gt;<br>

&gt; $ cat /etc/hosts<br>

&gt; [...]<br>

&gt; 10.64.23.103       dl380x4415<br>

&gt; 10.64.23.105       dl360x4405<br>

&gt; [...]<br>

&gt;<br>

&gt; Logs:<br>

&gt;<br>

&gt; [3029] dl380x4415 corosyncerror   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.<br>

&gt; [19040] dl380x4415 corosyncnotice  [MAIN  ] Corosync Cluster Engine (&#39;2.4.0&#39;): started and ready to provide service.<br>

&gt; [19040] dl380x4415 corosyncinfo    [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp pie relro bindnow<br>

&gt; [19040] dl380x4415 corosyncnotice  [TOTEM ] Initializing transport (UDP/IP Multicast).<br>

&gt; [19040] dl380x4415 corosyncnotice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none<br>

&gt; [19040] dl380x4415 corosyncnotice  [TOTEM ] The network interface [10.64.23.121] is now up.<br>

&gt; [19040] dl380x4415 corosyncnotice  [SERV  ] Service engine loaded: corosync configuration map access [0]<br>

&gt; [19040] dl380x4415 corosyncinfo    [QB    ] server name: cmap<br>

&gt; [19040] dl380x4415 corosyncnotice  [SERV  ] Service engine loaded: corosync configuration service [1]<br>

&gt; [19040] dl380x4415 corosyncinfo    [QB    ] server name: cfg<br>

&gt; [19040] dl380x4415 corosyncnotice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]<br>

&gt; [19040] dl380x4415 corosyncinfo    [QB    ] server name: cpg<br>

&gt; [19040] dl380x4415 corosyncnotice  [SERV  ] Service engine loaded: corosync profile loading service [4]<br>

&gt; [19040] dl380x4415 corosyncnotice  [QUORUM] Using quorum provider corosync_votequorum<br>

&gt; [19040] dl380x4415 corosynccrit    [QUORUM] Quorum provider: corosync_votequorum failed to initialize.<br>

&gt; [19040] dl380x4415 corosyncerror   [SERV  ] Service engine &#39;corosync_quorum&#39; failed to load for reason &#39;configuration error: nodelist or quorum.expected_votes must be configured!&#39;<br>

<br>

I suspect whether teaming is involved or not is irrelevant here.<br>
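A rough check of that suspicion, as a Python sketch over the addresses quoted above. It assumes corosync 2.x treats an interface address as matching bindnetaddr when bindnetaddr falls inside that address's network, and that the first match in interface order becomes the ring address; both are assumptions, not taken from the corosync source. Under them, team1's /24 secondary address 10.64.23.121 matches 10.64.23.96 just as team0's 10.64.23.103/28 does, and it is listed first, which is consistent with the "network interface [10.64.23.121] is now up" log line and with the poster's reading that the NFS IP ends up as the ring address. Nothing in this check depends on teaming.

```python
import ipaddress

# Values copied from the quoted "ip a" output, corosync.conf and /etc/hosts.
BINDNETADDR = ipaddress.ip_address("10.64.23.96")
INTERFACES = [  # (device, address/prefix) in the order "ip a" lists them
    ("team1", "10.64.23.117/28"),
    ("team1", "10.64.23.121/24"),  # secondary address (the NFS VIP)
    ("team0", "10.64.23.103/28"),
]
NODELIST = {"dl380x4415": "10.64.23.103", "dl360x4405": "10.64.23.105"}


def matching_addresses(bindnetaddr, interfaces):
    """Return (device, address) pairs whose network contains bindnetaddr.
    ASSUMPTION: this approximates corosync's bindnetaddr matching."""
    matches = []
    for device, cidr in interfaces:
        iface = ipaddress.ip_interface(cidr)
        if bindnetaddr in iface.network:
            matches.append((device, str(iface.ip)))
    return matches


if __name__ == "__main__":
    matches = matching_addresses(BINDNETADDR, INTERFACES)
    print("matches:", matches)
    chosen = matches[0][1]  # ASSUMPTION: the first match becomes the ring address
    print("chosen ring0 address:", chosen)
    print("listed in nodelist:", chosen in NODELIST.values())
```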

<br>

You are not using the latest and greatest 2.4.3, so I&#39;d suggest either<br>

upgrading or applying this patch (included in that version), if that helps:<br>

<br>

<a href="https://github.com/corosync/corosync/commit/95f9583a25007398e3792bdca2da262db18f658a" rel="noreferrer" target="_blank">https://github.com/corosync/<wbr>corosync/commit/<wbr>95f9583a25007398e3792bdca2da26<wbr>2db18f658a</a><br>

<br>

--<br>

Jan (Poki)<br>


<br>

------------------------------<br>

<br>

Message: 4<br>

Date: Thu, 9 Nov 2017 17:34:35 -0400<br>

From: Alberto Mijares &lt;<a href="mailto:amijaresp@gmail.com">amijaresp@gmail.com</a>&gt;<br>

To: Cluster Labs - All topics related to open-source clustering<br>

        welcomed        &lt;<a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>&gt;<br>

Subject: Re: [ClusterLabs] One cluster with two groups of nodes<br>

Message-ID:<br>

        &lt;<a href="mailto:CAGZBXN_Lv0pXUkVB_u_MWo_ZpHcFxVC3gYnS9xFYNxUZ46qTaA@mail.gmail.com">CAGZBXN_Lv0pXUkVB_u_MWo_<wbr>ZpHcFxVC3gYnS9xFYNxUZ46qTaA@<wbr>mail.gmail.com</a>&gt;<br>

Content-Type: text/plain; charset=&quot;UTF-8&quot;<br>

<br>

&gt;<br>

&gt; The first thing I&#39;d mention is that a 6-node cluster can only survive<br>

&gt; the loss of two nodes, as 3 nodes don&#39;t have quorum. You can tweak that<br>

&gt; behavior with corosync quorum options, or you could add a quorum-only<br>

&gt; node, or use corosync&#39;s new qdevice capability to have an arbiter node.<br>
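The arithmetic behind that first point, as a small Python sketch assuming one vote per node and plain majority quorum (votequorum without options such as two_node, auto_tie_breaker or last_man_standing): 6 votes need 4 for quorum, so only 2 nodes may be lost. A single extra vote from a quorum-only node raises that to 3. How many votes a qdevice contributes depends on its configuration, so here it is modeled simply as "extra votes".

```python
def quorum(total_votes: int) -> int:
    """Votes needed for a simple majority."""
    return total_votes // 2 + 1


def tolerated_node_failures(nodes: int, extra_votes: int = 0) -> int:
    """How many one-vote nodes can fail while the survivors keep quorum.
    extra_votes models a quorum-only node or arbiter that stays up."""
    total = nodes + extra_votes
    return min(nodes, total - quorum(total))


if __name__ == "__main__":
    for extra in (0, 1):
        total = 6 + extra
        print(f"6 nodes + {extra} extra vote(s): quorum={quorum(total)}, "
              f"tolerates {tolerated_node_failures(6, extra)} node failure(s)")
```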

&gt;<br>

&gt; Coincidentally, I recently stumbled across a long-time Pacemaker<br>

&gt; feature that I wasn&#39;t aware of, that can handle this type of situation.<br>

&gt; It&#39;s not documented yet but will be when 1.1.18 is released soon.<br>

&gt;<br>

&gt; Colocation constraints may take a &quot;node-attribute&quot; parameter, which<br>

&gt; basically means, &quot;Put this resource on a node of the same class as the<br>

&gt; one running resource X&quot;.<br>

&gt;<br>

&gt; In this case, you might set a &quot;group&quot; node attribute on all nodes, to<br>

&gt; &quot;1&quot; on the three primary nodes and &quot;2&quot; on the three failover nodes.<br>

&gt; Pick one resource as your base resource that everything else should go<br>

&gt; along with. Configure colocation constraints for all the other<br>

&gt; resources with that one, using &quot;node-attribute=group&quot;. That means that<br>

&gt; all the other resources must be on a node with the same &quot;group&quot;<br>

&gt; attribute value as the node that the base resource is running on.<br>

&gt;<br>

&gt; &quot;node-attribute&quot; defaults to &quot;#uname&quot; (node name), thus giving the<br>

&gt; usual behavior of colocation constraints: put the resource only on a<br>

&gt; node with the same name, i.e. the same node.<br>
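A rough Python model of the placement rule described above: a resource colocated (at INFINITY) with the base resource may only run on nodes whose value for the constraint's node-attribute matches the value on the base resource's node. The node names A1 to A3 and B1 to B3 and the "group" values are hypothetical, following the A1/B1 naming used later in this thread, and the model ignores scores, stickiness and everything else Pacemaker's scheduler actually weighs.

```python
# Hypothetical six-node layout: "group" is "1" on the primary nodes and "2"
# on the failover nodes. "#uname" is the built-in node-name attribute.
NODES = {
    name: {"#uname": name, "group": "1" if name.startswith("A") else "2"}
    for name in ("A1", "A2", "A3", "B1", "B2", "B3")
}


def allowed_nodes(base_node, nodes, node_attribute="#uname"):
    """Nodes a colocated resource may use, given where the base resource
    runs and which node attribute the constraint compares."""
    wanted = nodes[base_node].get(node_attribute)
    return sorted(name for name, attrs in nodes.items()
                  if attrs.get(node_attribute) == wanted)


if __name__ == "__main__":
    print(allowed_nodes("A2", NODES, "group"))  # ['A1', 'A2', 'A3']
    print(allowed_nodes("A2", NODES))           # ['A2'] (default: same node)
```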

&gt;<br>

&gt; The remaining question is, how do you want the base resource to fail<br>

&gt; over? If the base resource can fail over to any other node, whether in<br>

&gt; the same group or not, then you&#39;re done. If the base resource can only<br>

&gt; run on one node in each group, ban it from the other nodes using<br>

&gt; -INFINITY location constraints. If the base resource should only fail<br>

&gt; over to the opposite group, that&#39;s trickier, but something roughly<br>

&gt; similar would be to prefer one node in each group with an equal<br>

&gt; positive score location constraint, and migration-threshold=1.<br>

&gt; --<br>

&gt; Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt;<br>
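The last option can be modeled roughly as follows. This Python sketch is a simplification, not Pacemaker's scheduler. It assumes the base resource gets the same positive location score (100, an arbitrary choice) on one node per group (A1 and B1, the hypothetical names used in the reply below), is banned with -INFINITY from every other node (combining this with the ban suggestion above), and skips any node where its fail count has reached migration-threshold. Ties are broken by node name only to keep the sketch deterministic; Pacemaker's real tie-breaking, resource-stickiness and failure-timeout are not modeled.

```python
INFINITY = 1_000_000  # Pacemaker treats a score of 1,000,000 as INFINITY


def choose_node(scores, failcounts, migration_threshold):
    """Pick a node from location scores, skipping banned nodes and nodes
    where the fail count has reached migration-threshold."""
    eligible = {
        node: score
        for node, score in scores.items()
        if score > -INFINITY and failcounts.get(node, 0) < migration_threshold
    }
    if not eligible:
        return None  # nowhere left to run: the resource stays stopped
    best = max(eligible.values())
    return min(node for node, score in eligible.items() if score == best)


SCORES = {"A1": 100, "B1": 100, "A2": -INFINITY, "A3": -INFINITY,
          "B2": -INFINITY, "B3": -INFINITY}

if __name__ == "__main__":
    print(choose_node(SCORES, {}, 1))         # starts on A1 (tie broken by name)
    print(choose_node(SCORES, {"A1": 1}, 1))  # one failure on A1: moves to B1
```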

<br>

<br>

Thank you very very much for this. I&#39;m starting some tests in my lab tonight.<br>

<br>

I&#39;ll let you know my results, and I hope I can count on you if I get<br>

lost along the way.<br>

<br>

BTW, every resource is supposed to run only on its designated node<br>

within a group. For example, if nginx normally runs on A1, it MUST<br>

fail over to B1. The same goes for every resource.<br>

<br>

Best regards,<br>

<br>

<br>

Alberto Mijares<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 5<br>

Date: Thu, 9 Nov 2017 20:27:40 -0500<br>

From: Derek Wuelfrath &lt;<a href="mailto:dwuelfrath@inverse.ca">dwuelfrath@inverse.ca</a>&gt;<br>

To: <a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a><br>

Subject: [ClusterLabs] Pacemaker responsible of DRBD and a systemd<br>

        resource<br>

Message-ID: &lt;<a href="mailto:57EF4B1D-42A5-4B20-95C7-3A3C95F47803@inverse.ca">57EF4B1D-42A5-4B20-95C7-<wbr>3A3C95F47803@inverse.ca</a>&gt;<br>

Content-Type: text/plain; charset=&quot;utf-8&quot;<br>

<br>

Hello there,<br>

<br>

First post here, but I&#39;ve been following for a while!<br>

<br>

Here&#39;s my issue:<br>

we have been deploying and running this type of cluster for a while and have never really encountered this kind of problem.<br>

<br>

I recently set up a Corosync / Pacemaker / PCS cluster to manage DRBD along with various other resources. Some of these resources are systemd resources, and this is the part where things are &quot;breaking&quot;.<br>

<br>

A two-server cluster running only DRBD, or DRBD with an OCF IPaddr2 resource (a cluster IP, for instance), works just fine. I can easily move from one node to the other without any issue.<br>

As soon as I add a systemd resource to the resource group, things break. Moving from one node to the other using standby mode works just fine, but as soon as a Corosync / Pacemaker restart involves polling a systemd resource, Pacemaker seems to try to start the whole resource group and therefore creates a split-brain of the DRBD resource.<br>

<br>

This is the best explanation / description of the situation that I can give. If you need any clarification, examples, etc., I am more than happy to share them.<br>

<br>

Any guidance would be appreciated :)<br>

<br>

Here&#39;s the output of &quot;pcs config&quot;:<br>

<br>

<a href="https://pastebin.com/1TUvZ4X9" rel="noreferrer" target="_blank">https://pastebin.com/1TUvZ4X9</a><br>

<br>

Cheers!<br>

-dw<br>

<br>

--<br>

Derek Wuelfrath<br>

<a href="mailto:dwuelfrath@inverse.ca">dwuelfrath@inverse.ca</a> :: +1.514.447.4918 (x110) :: +1.866.353.6153 (x110)<br>

Inverse inc. :: Leaders behind SOGo (<a href="http://www.sogo.nu" rel="noreferrer" target="_blank">www.sogo.nu</a>), PacketFence (<a href="http://www.packetfence.org" rel="noreferrer" target="_blank">www.packetfence.org</a>) and Fingerbank (<a href="http://www.fingerbank.org" rel="noreferrer" target="_blank">www.fingerbank.org</a>)<br>

<br>

-------------- next part --------------<br>

An HTML attachment was scrubbed...<br>

URL: &lt;<a href="http://lists.clusterlabs.org/pipermail/users/attachments/20171109/9be1798b/attachment.html" rel="noreferrer" target="_blank">http://lists.clusterlabs.org/<wbr>pipermail/users/attachments/<wbr>20171109/9be1798b/attachment.<wbr>html</a>&gt;<br>

<br>

------------------------------<br>

<br>

______________________________<wbr>_________________<br>

Users mailing list<br>

<a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>

<a href="http://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.clusterlabs.org/<wbr>mailman/listinfo/users</a><br>

<br>

<br>

End of Users Digest, Vol 34, Issue 18<br>

******************************<wbr>*******<br>

</blockquote></div><br></div></div></div>