<div dir="ltr">Thank you, Ken.<div>This helps a lot.</div><div>Now I am sure that my current approach fits best for me =)</div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr">Thank you,<div>Kostia</div></div></div></div></div></div>

<br><div class="gmail_quote">On Wed, Mar 30, 2016 at 11:10 PM, Ken Gaillot <span dir="ltr">&lt;<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 03/29/2016 08:22 AM, Kostiantyn Ponomarenko wrote:<br>

&gt; Ken, thank you for the answer.<br>

&gt;<br>

&gt; Every node in my cluster under normal conditions has &quot;load average&quot; of<br>

&gt; about 420. It is mainly connected to the high disk IO on the system.<br>

&gt; My system is designed to use almost 100% of its hardware (CPU/RAM/disks),<br>

&gt; so the situation when the system consumes almost all HW resources is<br>

&gt; normal.<br>

<br>

</span>420 suggests that HW resources are outstripped -- anything above the<br>

system&#39;s number of cores means processes are waiting for some resource.<br>

(Although with an I/O-bound workload like this, the number of cores<br>

isn&#39;t very important -- most will be sitting idle despite the high<br>

load.) And if that&#39;s during normal conditions, what will happen during a<br>

usage spike? It sounds like a recipe for less-than-HA.<br>

<br>

Under high load, there&#39;s a risk of a runaway feedback loop, where monitors<br>

time out, causing pacemaker to schedule recovery actions, which cause<br>

load to go higher and more monitors to time out, etc. That&#39;s why<br>

throttling is there.<br>

<span class=""><br>

&gt; I would like to get rid of &quot;High CPU load detected&quot; messages in the<br>

&gt; log, because<br>

&gt; they flood corosync.log as well as system journal.<br>

&gt;<br>

&gt; Maybe you can advise on the best way to do it?<br>

&gt;<br>

&gt; So far I have come up with the idea of setting &quot;load-threshold&quot; to 1000%,<br>

&gt; because of:<br>

&gt;     420(load average) / 24 (cores) = 17.5 (adjusted_load);<br>

&gt;     2 (THROTTLE_FACTOR_HIGH) * 10 (throttle_load_target) = 20<br>

&gt;<br>

&gt;     if(adjusted_load &gt; THROTTLE_FACTOR_HIGH * throttle_load_target) {<br>

&gt;         crm_notice(&quot;High %s detected: %f&quot;, desc, load);<br>

<br>

</span>That should work, as far as reducing the log messages, though of course<br>

it also reduces the amount of throttling pacemaker will do.<br>
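<br>
As a rough check of the arithmetic above, here is a minimal Python sketch of the<br>
comparison quoted from crmd/throttle.c. It assumes that load-threshold is turned<br>
into a fraction by dividing the percentage by 100 (consistent with 1000% giving a<br>
target of 10) and that the load average is divided by the number of cores. The<br>
function name high_load_logged is made up for illustration and is not part of<br>
Pacemaker.<br>

```python
# Factor quoted in the thread from crmd/throttle.c.
THROTTLE_FACTOR_HIGH = 2.0


def high_load_logged(load_average, cores, load_threshold_percent):
    """Return True if the "High ... detected" notice would be logged.

    Hypothetical helper: it only mirrors the single comparison quoted
    in the thread, under the assumptions stated in the lead-in.
    """
    adjusted_load = load_average / cores                   # 420 / 24 = 17.5
    throttle_load_target = load_threshold_percent / 100.0  # 1000% gives 10.0
    return adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target


if __name__ == "__main__":
    # With load-threshold=1000%: 17.5 is not above 20, so no notice.
    print(high_load_logged(420, 24, 1000))
    # With the default load-threshold=80%: 17.5 is above 1.6, so the notice is logged.
    print(high_load_logged(420, 24, 80))
```

With these numbers the margin is small: a load average above 480 on 24 cores<br>
would cross the limit of 20 again.<br>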

<span class=""><br>

&gt; In this case, do I need to set &quot;node-action-limit&quot; to something less than &quot;2<br>

&gt; x cores&quot; (which is the default)?<br>

<br>

</span>It&#39;s not necessary, but it would help compensate for the reduced<br>

throttling by imposing a maximum number of actions run at one time.<br>

<br>

I usually wouldn&#39;t recommend reducing log verbosity, because detailed<br>

logs are often necessary for troubleshooting cluster issues, but if your<br>

logs are on the same I/O controller that is overloaded, you might<br>

consider logging only to syslog and not to an additional detail file.<br>

That would cut back on the amount of I/O due to pacemaker itself. You<br>

could even drop PCMK_logpriority to warning, but then you&#39;re losing even<br>

more information.<br>

<span class=""><br>

&gt; Because the logic is (crmd/throttle.c):<br>

&gt;<br>

&gt;     switch(r-&gt;mode) {<br>

&gt;         case throttle_extreme:<br>

&gt;         case throttle_high:<br>

&gt;             jobs = 1; /* At least one job must always be allowed */<br>

&gt;             break;<br>

&gt;         case throttle_med:<br>

&gt;             jobs = QB_MAX(1, r-&gt;max / 4);<br>

&gt;             break;<br>

&gt;         case throttle_low:<br>

&gt;             jobs = QB_MAX(1, r-&gt;max / 2);<br>

&gt;             break;<br>

&gt;         case throttle_none:<br>

&gt;             jobs = QB_MAX(1, r-&gt;max);<br>

&gt;             break;<br>

&gt;         default:<br>

&gt;             crm_err(&quot;Unknown throttle mode %.4x on %s&quot;, r-&gt;mode, node);<br>

&gt;             break;<br>

&gt;     }<br>

&gt;     return jobs;<br>

&gt;<br>

&gt;<br>

&gt; The thing is, I know that there is &quot;High CPU load&quot; and that this is the normal<br>

&gt; state, but I want Pacemaker to stop reporting it and to handle this state as<br>

&gt; well as it can.<br>

<br>

</span>If you can&#39;t improve your I/O performance, what you suggested is<br>

probably the best that can be done.<br>
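<br>
To make the effect of the throttle mode on the job limit concrete, the following<br>
is a small Python sketch of the switch statement quoted above from<br>
crmd/throttle.c. The mode names are plain strings here, and the function name<br>
throttle_job_limit is an illustrative assumption, not Pacemaker&#39;s API. Unlike the<br>
C code, which logs an error for an unknown mode, this sketch raises an exception.<br>

```python
def throttle_job_limit(mode, max_jobs):
    """Return how many jobs may run at once for a given throttle mode.

    Hypothetical helper mirroring the quoted switch statement;
    max_jobs plays the role of r->max (the node's action limit).
    """
    if mode in ("extreme", "high"):
        return 1  # at least one job must always be allowed
    if mode == "med":
        return max(1, max_jobs // 4)
    if mode == "low":
        return max(1, max_jobs // 2)
    if mode == "none":
        return max(1, max_jobs)
    raise ValueError("Unknown throttle mode: %r" % (mode,))


if __name__ == "__main__":
    # 24 cores with the default limit of 2 x cores gives 48.
    for mode in ("none", "low", "med", "high", "extreme"):
        print(mode, throttle_job_limit(mode, 48))
```

With a limit of 48, an unthrottled node may run 48 jobs at once, so lowering<br>
node-action-limit is the remaining way to cap parallel actions once the<br>
threshold has been raised.<br>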

<br>

When I/O is that critical to you, there are many tweaks that can make a<br>

big difference in performance. I&#39;m not sure how familiar you are with<br>

them already. Options depend on what your storage is (local or network,<br>

hardware/software/no RAID, etc.) and what your I/O-bound application is<br>

(database, etc.), but I&#39;d look closely at cache/buffer settings at all<br>

levels from hardware to application, RAID stripe alignment, filesystem<br>

choice and tuning, log verbosity, etc.<br>

<div class="HOEnZb"><div class="h5"><br>

&gt;<br>

&gt; Thank you,<br>

&gt; Kostia<br>

&gt;<br>

&gt; On Mon, Mar 14, 2016 at 7:18 PM, Ken Gaillot &lt;<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>&gt; wrote:<br>

&gt;<br>

&gt;&gt; On 02/29/2016 07:00 AM, Kostiantyn Ponomarenko wrote:<br>

&gt;&gt;&gt; I am back to this question =)<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; I am still trying to understand the impact of &quot;High CPU load detected&quot;<br>

&gt;&gt;&gt; messages in the log.<br>

&gt;&gt;&gt; Looking in the code I figured out that setting &quot;load-threshold&quot; parameter<br>

&gt;&gt;&gt; to something higher than 100% solves the problem.<br>

&gt;&gt;&gt; And actually for 8 cores (12 with Hyper Threading), load-threshold=400%<br>

&gt;&gt;&gt; kind of works.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Also, I noticed that this parameter may affect &quot;the maximum number<br>

&gt;&gt;&gt; of jobs that can be scheduled per node&quot;, since there is a<br>

&gt;&gt;&gt; formula that limits F_CRM_THROTTLE_MAX based on F_CRM_THROTTLE_MODE.<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; Is my understanding correct that setting &quot;load-threshold&quot; high enough<br>

&gt;&gt;&gt; (so that there are no noisy messages) affects only<br>

&gt;&gt;&gt; &quot;throttle_job_max&quot; and nothing more?<br>

&gt;&gt;&gt; Also, if I understand correctly, &quot;throttle_job_max&quot; is the number of allowed<br>

&gt;&gt;&gt; parallel actions per node in lrmd.<br>

&gt;&gt;&gt; And a child of the lrmd is actually an RA process running some actions<br>

&gt;&gt;&gt; (monitor, start, etc).<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; So there is no impact on how many RAs (resources) can run on a node, only on<br>

&gt;&gt;&gt; how many of their actions Pacemaker will run in parallel (I am not sure I<br>

&gt;&gt;&gt; understand this part correctly).<br>

&gt;&gt;<br>

&gt;&gt; I believe that is an accurate description. I think the job limit applies<br>

&gt;&gt; to fence actions as well as lrmd actions.<br>

&gt;&gt;<br>

&gt;&gt; Note that if /proc/cpuinfo exists, pacemaker will figure out the number<br>

&gt;&gt; of cores from there, and divide the actual reported load by that number<br>

&gt;&gt; before comparing against load-threshold.<br>

&gt;&gt;<br>

&gt;&gt;&gt; Thank you,<br>

&gt;&gt;&gt; Kostia<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt; On Wed, Jun 3, 2015 at 12:17 AM, Andrew Beekhof &lt;<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a>&gt;<br>

&gt;&gt; wrote:<br>

&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; On 27 May 2015, at 10:09 pm, Kostiantyn Ponomarenko &lt;<br>

&gt;&gt;&gt;&gt; <a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; I think I wasn&#39;t precise in my questions.<br>

&gt;&gt;&gt;&gt;&gt; So I will try to ask more precise questions.<br>

&gt;&gt;&gt;&gt;&gt; 1. Why is the default value for &quot;load-threshold&quot; 80%?<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Experimentation showed it better to begin throttling before the node<br>

&gt;&gt;&gt;&gt; became saturated.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; 2. What would be the impact on the cluster of<br>

&gt;&gt;&gt;&gt;&gt; &quot;load-threshold=100%&quot;?<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt; Your nodes will be busier.  Will they be able to handle your load, or will<br>

&gt;&gt;&gt;&gt; it result in additional recovery actions (creating more load and more<br>

&gt;&gt;&gt;&gt; failures)?  Only you will know when you try.<br>

&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Thank you,<br>

&gt;&gt;&gt;&gt;&gt; Kostya<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; On Mon, May 25, 2015 at 4:11 PM, Kostiantyn Ponomarenko &lt;<br>

&gt;&gt;&gt;&gt; <a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;&gt; Guys, please, if anyone can help me understand this parameter better,<br>

&gt;&gt;&gt;&gt;&gt; I would appreciate it.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Thank you,<br>

&gt;&gt;&gt;&gt;&gt; Kostya<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; On Fri, May 22, 2015 at 4:15 PM, Kostiantyn Ponomarenko &lt;<br>

&gt;&gt;&gt;&gt; <a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;&gt; Another question: is measuring CPU usage by &quot;I/O wait&quot; specific to<br>

&gt;&gt;&gt;&gt;&gt; crmd?<br>

&gt;&gt;&gt;&gt;&gt; And if I need to get the most performance out of the resources running in the<br>

&gt;&gt;&gt;&gt;&gt; cluster, should I set &quot;load-threshold=95%&quot; (or even 100%)?<br>

&gt;&gt;&gt;&gt;&gt; Will it impact the cluster behavior in any way?<br>

&gt;&gt;&gt;&gt;&gt; The man page for crmd says: &quot;The cluster will slow down its<br>

&gt;&gt;&gt;&gt;&gt; recovery process when the amount of system resources used (currently CPU)<br>

&gt;&gt;&gt;&gt;&gt; approaches this limit&quot;.<br>

&gt;&gt;&gt;&gt;&gt; Does it mean there will be delays in moving resources if<br>

&gt;&gt;&gt;&gt;&gt; a node goes down, or something else?<br>

&gt;&gt;&gt;&gt;&gt; I just want to understand it better.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Thank you in advance for the help =)<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; P.S.: The main resource does a lot of disk I/Os.<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Thank you,<br>

&gt;&gt;&gt;&gt;&gt; Kostya<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; On Fri, May 22, 2015 at 3:30 PM, Kostiantyn Ponomarenko &lt;<br>

&gt;&gt;&gt;&gt; <a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;&gt; I didn&#39;t know that.<br>

&gt;&gt;&gt;&gt;&gt; You mentioned &quot;as opposed to other Linuxes&quot;, but I am using Debian<br>

&gt;&gt; Linux.<br>

&gt;&gt;&gt;&gt;&gt; Does it also measure CPU usage by I/O waits?<br>

&gt;&gt;&gt;&gt;&gt; You are right about &quot;I/O waits&quot; (a screenshot of &quot;top&quot; is attached).<br>

&gt;&gt;&gt;&gt;&gt; But why does it show 50% CPU usage for a single process (the main<br>

&gt;&gt;&gt;&gt;&gt; one) while &quot;I/O waits&quot; shows a bigger number?<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; Thank you,<br>

&gt;&gt;&gt;&gt;&gt; Kostya<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; On Fri, May 22, 2015 at 9:40 AM, Ulrich Windl &lt;<br>

&gt;&gt;&gt;&gt; <a href="mailto:Ulrich.Windl@rz.uni-regensburg.de">Ulrich.Windl@rz.uni-regensburg.de</a>&gt; wrote:<br>

&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt; &quot;Ulrich Windl&quot; &lt;<a href="mailto:Ulrich.Windl@rz.uni-regensburg.de">Ulrich.Windl@rz.uni-regensburg.de</a>&gt; schrieb am<br>

&gt;&gt;&gt;&gt; 22.05.2015 um<br>

&gt;&gt;&gt;&gt;&gt; 08:36 in Nachricht &lt;<a href="mailto:555EEA72020000A10001A71D@gwsmtp1.uni-regensburg.de">555EEA72020000A10001A71D@gwsmtp1.uni-regensburg.de</a><br>

&gt;&gt;&gt; :<br>

&gt;&gt;&gt;&gt;&gt;&gt; Hi!<br>

&gt;&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt;&gt; I Linux I/O waits are considered for load (as opposed to other<br>

&gt;&gt;&gt;&gt; Linuxes) Thus<br>

&gt;&gt;&gt;&gt;&gt; ^^ &quot;In&quot;<br>

&gt;&gt;&gt;&gt;                             s/Linux/UNIX/<br>

&gt;&gt;&gt;&gt;&gt;<br>

&gt;&gt;&gt;&gt;&gt; (I should have my coffee now to awake ;-) Sorry.<br>

&gt;&gt;<br>

&gt;&gt; _______________________________________________<br>

&gt;&gt; Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>

&gt;&gt; <a href="http://clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>

&gt;&gt;<br>

&gt;&gt; Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>

&gt;&gt; Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

&gt;&gt; Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>

&gt;&gt;<br>

&gt;<br>

<br>

</div></div></blockquote></div><br></div>