[ClusterLabs] Resource Parameter Change Not Honoring Constraints

Ken Gaillot kgaillot at redhat.com
Wed Apr 1 20:00:42 EDT 2020


On Thu, 2020-03-19 at 13:39 -0400, Marc Smith wrote:
> On Mon, Mar 16, 2020 at 1:26 PM Marc Smith <msmith626 at gmail.com>
> wrote:
> > 
> > On Thu, Mar 12, 2020 at 10:51 AM Ken Gaillot <kgaillot at redhat.com>
> > wrote:
> > > 
> > > On Wed, 2020-03-11 at 17:24 -0400, Marc Smith wrote:
> > > > Hi,
> > > > 
> > > > I'm using Pacemaker 1.1.20 (yes, I know, a bit dated now). I noticed
> > > 
> > > I'd still consider that recent :)
> > > 
> > > > when I modify a resource parameter (e.g., update the value), this
> > > > causes the resource itself to restart. And that's fine, but when
> > > > this resource is restarted, it doesn't appear to honor the full set
> > > > of constraints for that resource.
> > > > 
> > > > I see the output like this (right after the resource parameter
> > > > change):
> > > > ...
> > > > Mar 11 20:43:25 localhost crmd[1943]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
> > > > Mar 11 20:43:25 localhost crmd[1943]:   notice: Current ping state: S_POLICY_ENGINE
> > > > Mar 11 20:43:25 localhost pengine[1942]:   notice: Clearing failure of p_bmd_140c58-1 on 140c58-1 because resource parameters have changed
> > > > Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_bmd_140c58-1             (                   140c58-1 )   due to resource definition change
> > > > Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_dummy_g_lvm_140c58-1     (                   140c58-1 )   due to required g_md_140c58-1 running
> > > > Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_lvm_140c58_vg_01         (                   140c58-1 )   due to required p_dummy_g_lvm_140c58-1 start
> > > > Mar 11 20:43:25 localhost pengine[1942]:   notice: Calculated transition 41, saving inputs in /var/lib/pacemaker/pengine/pe-input-173.bz2
> > > > Mar 11 20:43:25 localhost crmd[1943]:   notice: Initiating stop operation p_lvm_140c58_vg_01_stop_0 on 140c58-1
> > > > Mar 11 20:43:25 localhost crmd[1943]:   notice: Transition aborted by deletion of lrm_rsc_op[@id='p_bmd_140c58-1_last_failure_0']: Resource operation removal
> > > > Mar 11 20:43:25 localhost crmd[1943]:   notice: Current ping state: S_TRANSITION_ENGINE
> > > > ...
> > > > 
> > > > The stop on 'p_lvm_140c58_vg_01' then times out, because the other
> > > > constraint (to stop the service above LVM) is never executed. I can
> > > > see from the messages it never even tries to demote the resource
> > > > above that.
> > > > 
> > > > Yet, if I use crmsh at the shell, and do a restart on that same
> > > > resource, it works correctly, and all constraints are honored:
> > > > crm resource restart p_bmd_140c58-1
> > > > 
> > > > I can certainly provide my full cluster config if needed, but hoping
> > > > to keep this email concise for clarity. =)
> > > > 
> > > > I guess my questions are: 1) Is the difference in restart behavior
> > > > expected, and not all constraints are followed when resource
> > > > parameters change (or some other restart event that originated
> > > > internally like this)? 2) Or perhaps this is a known bug that was
> > > > already resolved in newer versions of Pacemaker?
> > > 
> > > No to both. Can you attach that pe-input-173.bz2 file (with any
> > > sensitive info removed)?
> > 
> > Thanks; that system got wiped, so I reproduced it on another system
> > and I am attaching that pe-input file. Log snippet is below for
> > completeness:
> > 
> > Mar 16 17:16:50 localhost crmd[1340]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
> > Mar 16 17:16:50 localhost pengine[1339]:   notice:  * Restart p_bmd_126c4f-1             (                   126c4f-1 )   due to resource definition change
> > Mar 16 17:16:50 localhost pengine[1339]:   notice:  * Restart p_dummy_g_lvm_126c4f-1     (                   126c4f-1 )   due to required g_md_126c4f-1 running
> > Mar 16 17:16:50 localhost pengine[1339]:   notice:  * Restart p_lvm_126c4f_vg_01         (                   126c4f-1 )   due to required p_dummy_g_lvm_126c4f-1 start
> > Mar 16 17:16:50 localhost pengine[1339]:   notice: Calculated transition 149, saving inputs in /var/lib/pacemaker/pengine/pe-input-46.bz2
> > 
> 
> Hi Ken,
> 
> Just a friendly bump to see if you had a chance to take a look at this
> issue? I appreciate your time and expertise! =)
> 
> --Marc

Sorry, I've been slammed lately.

There does appear to be a scheduler bug. The relevant constraint is (in
plain language)

   start g_lvm_* then promote ms_alua_*

The implicit inverse of that is

   demote ms_alua_* then stop g_lvm_*

The bug is that ms_alua_* isn't demoted before g_lvm_* is stopped.
(Note however that the configuration does not require ms_alua_* to be
stopped.)
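
To illustrate the inverse relationship described above, here is a minimal
Python sketch. It is only a model of the idea, not Pacemaker's scheduler
code: the names Ordering, INVERSE, and implied_inverse are hypothetical,
and the resource names just reuse the wildcard placeholders from the
constraint above.

```python
from dataclasses import dataclass

# Each action paired with the action that undoes it.
INVERSE = {
    "start": "stop",
    "stop": "start",
    "promote": "demote",
    "demote": "promote",
}


@dataclass(frozen=True)
class Ordering:
    """Hypothetical model of '<first_action> <first_rsc> then <then_action> <then_rsc>'."""
    first_action: str
    first_rsc: str
    then_action: str
    then_rsc: str


def implied_inverse(o: Ordering) -> Ordering:
    """Swap the two sides of the ordering and invert each action."""
    return Ordering(
        first_action=INVERSE[o.then_action],
        first_rsc=o.then_rsc,
        then_action=INVERSE[o.first_action],
        then_rsc=o.first_rsc,
    )


constraint = Ordering("start", "g_lvm_*", "promote", "ms_alua_*")
inverse = implied_inverse(constraint)
print(f"{inverse.first_action} {inverse.first_rsc} "
      f"then {inverse.then_action} {inverse.then_rsc}")
# prints: demote ms_alua_* then stop g_lvm_*
```

Note that the inverse only mentions demote, not stop, for ms_alua_*,
which matches the point that the configuration does not require it to be
stopped.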

> > 
> > --Marc
> > 
> > 
> > > > 
> > > > I searched a bit for #2 but I didn't get many (well, any) hits on
> > > > other users experiencing this behavior.
> > > > 
> > > > Many thanks in advance.
> > > > 
> > > > --Marc
> > > 
> > > --
> > > Ken Gaillot <kgaillot at redhat.com>
> > > 
> > > _______________________________________________
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > 
> > > ClusterLabs home: https://www.clusterlabs.org/
> 
-- 
Ken Gaillot <kgaillot at redhat.com>
