<div dir="ltr">Thank you.<div><br><div>Indeed, the latest corosync and pacemaker do work with large clusters, though some tuning is required.</div><div>By working I also mean recovering after a node is lost and regained, which was the major issue before: corosync worked (it re-established membership), but pacemaker was not able to sync the CIB. It still needs some time and CPU power to do so, though.</div><div><br></div><div>It works for me on a 34-node cluster with a few hundred resources (I haven&#39;t tested anything bigger yet).</div></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 19, 2015 at 2:43 AM, Cédric Dufour - Idiap Research Institute <span dir="ltr">&lt;<a href="mailto:cedric.dufour@idiap.ch" target="_blank">cedric.dufour@idiap.ch</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div text="#000000" bgcolor="#FFFFFF">

    [coming over from the old mailing list

    <a href="mailto:pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a>; sorry for any thread discrepancy]<span class=""><br>

    <br>

    Hello,<br>

    <br>

    We&#39;ve also set up a fairly large cluster - 24 nodes / 348 resources
    (pacemaker 1.1.12, corosync 1.4.7) - and pacemaker 1.1.12 is
    definitely the minimum version you&#39;ll want, thanks to changes in how
    the CIB is handled.<br>

    <br>

    If you&#39;re going to handle a large number (several hundred) of
    resources too, you may also need to pay attention to the CIB
    size.<br></span>

    You may want to have a look at pp.17-18 of the document I wrote to

    describe our setup: <a href="http://cedric.dufour.name/cv/download/idiap_havc2.pdf" target="_blank">http://cedric.dufour.name/cv/download/idiap_havc2.pdf</a><span class=""><br>

    <br>
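    A minimal sketch (not part of our actual setup) of how one might keep an eye on the CIB size: it parses a CIB dump and reports its size in bytes along with the node and primitive counts. The helper names summarize_cib, build_sample_cib and dump_live_cib are hypothetical, the sample document is synthetic, and cibadmin --query is assumed to be available on a cluster node.<br>

```python
import subprocess
import xml.etree.ElementTree as ET


def summarize_cib(cib_xml: bytes) -> dict:
    """Return the byte size and element counts of a CIB XML dump."""
    root = ET.fromstring(cib_xml)
    return {
        "bytes": len(cib_xml),
        "nodes": len(root.findall("./configuration/nodes/node")),
        # iter() also finds primitives nested in groups or clones
        "primitives": sum(1 for _ in root.iter("primitive")),
    }


def build_sample_cib(n_nodes: int, n_resources: int) -> bytes:
    """Build a tiny synthetic CIB-like document (illustration only)."""
    cib = ET.Element("cib")
    conf = ET.SubElement(cib, "configuration")
    nodes = ET.SubElement(conf, "nodes")
    for i in range(n_nodes):
        ET.SubElement(nodes, "node", {"id": str(i + 1), "uname": f"node{i + 1}"})
    resources = ET.SubElement(conf, "resources")
    for i in range(n_resources):
        ET.SubElement(
            resources,
            "primitive",
            {"id": f"rsc{i}", "class": "ocf", "provider": "heartbeat", "type": "Dummy"},
        )
    return ET.tostring(cib)


def dump_live_cib() -> bytes:
    """Dump the live CIB; assumes cibadmin is installed and a cluster is running."""
    result = subprocess.run(["cibadmin", "--query"], check=True, capture_output=True)
    return result.stdout


if __name__ == "__main__":
    # Synthetic numbers matching the cluster described in this message.
    print(summarize_cib(build_sample_cib(24, 348)))
```

    On a real node, summarize_cib(dump_live_cib()) would report the same figures for the live CIB; tracking the byte count over time is one rough way to notice when the CIB is growing.<br>
    <br>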

    Currently, I would say that with 24 nodes / 348 resources, we
    are close to the limit of what our cluster can handle, the
    bottleneck being CPU (core) power for CIB/CRM handling. Our &quot;worst
    performing nodes&quot; (out of the 24 in the cluster) are Xeon E7-2830 @
    2.13GHz.<br></span>

    The main issue we currently face is when a DC is taken out and a new
    one must be elected: CPU goes to 100% for several tens of seconds (even
    minutes), during which the cluster is totally unresponsive.
    Fortunately, resources themselves just sit tight and remain
    available (I can&#39;t say about those that would need to be migrated
    because they are colocated with the DC; we manually avoid that
    situation when performing maintenance that may affect the DC).<span class=""><br>

    <br>

    I&#39;m looking forward to migrating to corosync 2+ (there are some
    backports available for Debian/Jessie) to see if this would allow
    us to push the limit further. Unfortunately, I can&#39;t say for sure, as I
    have only a limited understanding of how Pacemaker/Corosync work and
    where CPU is bound to become a bottleneck.<br>

    <br></span>

    [UPDATE] Thanks Ken for the Pacemaker Remote pointer; I&#39;m heading off to
    have a look at that.<span class=""><br>

    <br>

    Hope this helps,<br>

    <br>

    Cédric<br>

    <br>

    <div>On 04/11/15 23:26, Radoslaw Garbacz

      wrote:<br>

    </div>

    </span><div><div class="h5"><blockquote type="cite">

      <div dir="ltr">Thank you, will give it a try.<br>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Wed, Nov 4, 2015 at 12:50 PM, Trevor

          Hemsley <span dir="ltr">&lt;<a href="mailto:themsley@voiceflex.com" target="_blank">themsley@voiceflex.com</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 04/11/15 18:41, Radoslaw Garbacz wrote:<br>

              &gt; Details:<br>

              &gt; OS: CentOS 6<br>

              &gt; Pacemaker: Pacemaker 1.1.9-1512.el6<br>

              &gt; Corosync: Corosync Cluster Engine, version &#39;2.3.2&#39;<br>

              <br>

            </span>yum update<br>

            <br>

            Pacemaker is currently 1.1.12 and corosync 1.4.7 on CentOS
            6. There were major improvements in speed with later versions of
            pacemaker.<br>

            <br>

            Trevor<br>

            <br>

            _______________________________________________<br>

            Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org" target="_blank">Pacemaker@oss.clusterlabs.org</a><br>

            <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" rel="noreferrer" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

            <br>

            Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>

            Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

            Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>

          </blockquote>

        </div>

        <br>

        <br clear="all">

        <br>

        -- <br>

        <div>

          <div dir="ltr">

            <div>Best Regards,<br>

              <br>

              Radoslaw Garbacz<br>

            </div>

            XtremeData Incorporation<br>

          </div>

        </div>

      </div>

      <br>


    </blockquote>

    <br>

  </div></div></div>

<br>_______________________________________________<br>

Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>

<a href="http://clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>


<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div>Best Regards,<br><br>Radoslaw Garbacz<br></div>XtremeData Incorporation<br></div></div>

</div>