[ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

Strahil Nikolov hunter86_bg at yahoo.com
Thu Feb 27 13:42:11 EST 2020


On February 27, 2020 7:00:36 PM GMT+02:00, Ken Gaillot <kgaillot at redhat.com> wrote:
>On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais wrote:
>> On Thu, 27 Feb 2020 09:48:23 -0600
>> Ken Gaillot <kgaillot at redhat.com> wrote:
>> 
>> > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
>> > wrote:
>> > > On Thu, 27 Feb 2020 12:24:46 +0100
>> > > "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>> > >   
>> > > > > > > Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on
>> > > > > > > 27.02.2020 at 11:05 in message
>> > > > > > > <20200227110502.3624cb87 at firost>:
>> > > > 
>> > > > [...]  
>> > > > > What about something like "lock‑location=bool" and    
>> > > > 
>> > > > For "lock-location" I would assume the value is a "location". I
>> > > > guess you
>> > > > wanted a "use-lock-location" Boolean value.  
>> > > 
>> > > Mh, maybe "lock-current-location" would better reflect what I
>> > > meant.
>> > > 
>> > > The point is to lock the resource on the node currently running
>> > > it.  
>> > 
>> > Though it only applies for a clean node shutdown, so that has to be
>> > in
>> > the name somewhere. The resource isn't locked during normal cluster
>> > operation (it can move for resource or node failures, load
>> > rebalancing,
>> > etc.).
>> 
>> Well, I was trying to make the new feature a bit wider than just the
>> narrow shutdown feature.
>> 
>> Speaking of shutdown, what is the status of a clean shutdown of the
>> whole cluster handled by Pacemaker? Currently, I advise stopping
>> resources gracefully (e.g. using pcs resource disable [...]) before
>> shutting down each node, either by hand or using some higher-level
>> tool (e.g. pcs cluster stop --all).
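
For illustration, with a made-up resource name, that manual procedure is
roughly:

    pcs resource disable my-resource   # repeat per resource; stops it gracefully
    pcs cluster stop --all             # then stop the cluster services on all nodes
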
>
>I'm not sure why that would be necessary. It should be perfectly fine
>to stop pacemaker in any order without disabling resources.
>
>Start-up is actually more of an issue ... if you start corosync and
>pacemaker on nodes one by one, and you're not quick enough, then once
>quorum is reached, the cluster will fence all the nodes that haven't
>yet come up. So on start-up, it makes sense to start corosync on all
>nodes, which will establish membership and quorum, then start pacemaker
>on all nodes. Obviously that can't be done within pacemaker so that has
>to be done manually or by a higher-level tool.
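
A rough sketch of that start-up order, run on every node (e.g. via ssh in a
loop; the unit names are the standard systemd ones):

    systemctl start corosync    # first, on all nodes, so membership and quorum form
    systemctl start pacemaker   # then, on all nodes, once they are all members

A higher-level tool can do both steps across nodes in one go, e.g.
pcs cluster start --all.
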
>
>> Shouldn't this feature be discussed in this context as well?
>> 
>> [...] 
>> > > > > it would lock the resource location (unique or clones) until
>> > > > > the operator unlocks it or the "lock‑location‑timeout" expires,
>> > > > > no matter what happens to the resource, maintenance mode or not.
>> > > > > 
>> > > > > At first look, it seems to pair nicely with maintenance‑mode
>> > > > > and would avoid resource migration after a node reboot.
>> > 
>> > Maintenance mode is useful if you're updating the cluster stack
>> > itself
>> > -- put in maintenance mode, stop the cluster services (leaving the
>> > managed services still running), update the cluster services, start
>> > the
>> > cluster services again, take out of maintenance mode.
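
Roughly, that cluster-stack update sequence might look like this
(maintenance-mode is the standard cluster property; the package update
step is only a placeholder):

    pcs property set maintenance-mode=true
    systemctl stop pacemaker corosync      # managed services keep running
    # ... update the cluster stack packages here ...
    systemctl start corosync pacemaker
    pcs property set maintenance-mode=false
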
>> > 
>> > This is useful if you're rebooting the node for a kernel update
>> > (for
>> > example). Apply the update, reboot the node. The cluster takes care
>> > of
>> > everything else for you (stop the services before shutting down and
>> > do
>> > not recover them until the node comes back).
>> 
>> I'm a bit lost. If a resource doesn't move during maintenance mode,
>> could you detail a scenario where we should ban it explicitly from the
>> other nodes to secure its current location when getting out of
>> maintenance? Isn't it an
>
>Sorry, I was unclear -- I was contrasting maintenance mode with
>shutdown locks.
>
>You wouldn't need a ban with maintenance mode. However, maintenance mode
>leaves any active resources running. That means the node shouldn't be
>rebooted in maintenance mode, because those resources will not be
>cleanly stopped.
>
>With shutdown locks, the active resources are cleanly stopped. That
>does require a ban of some sort because otherwise the resources will be
>recovered on another node.
>
>> excessive precaution? Is it just to avoid it moving somewhere else
>> when exiting maintenance mode? If the resource has a preferred node, I
>> suppose the location constraint should take care of this, shouldn't it?
>
>Having a preferred node doesn't prevent the resource from starting
>elsewhere if the preferred node is down (or in standby, or otherwise
>ineligible to run the resource). Even a +INFINITY constraint allows
>recovery elsewhere if the node is not available. To keep a resource
>from being recovered, you have to put a ban (-INFINITY location
>constraint) on any nodes that could otherwise run it.
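
In pcs terms (resource and node names made up), the difference is roughly:

    pcs constraint location my-rsc prefers node1=INFINITY   # preference: does not stop
                                                            # recovery elsewhere if node1 is down
    pcs resource ban my-rsc node2                           # -INFINITY ban: my-rsc may never run on node2
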
>
>> > > > I wonder: how is this different from a time-limited "ban" (that
>> > > > wording also exists already)? If you ban all resources from
>> > > > running on a specific node, the resources would move away, and
>> > > > when booting the node, the resources won't come back.
>> > 
>> > It actually is equivalent to this process:
>> > 
>> > 1. Determine what resources are active on the node about to be shut
>> > down.
>> > 2. For each of those resources, configure a ban (location
>> > constraint
>> > with -INFINITY score) using a rule where node name is not the node
>> > being shut down.
>> > 3. Apply the updates and reboot the node. The cluster will stop the
>> > resources (due to shutdown) and not start them anywhere else (due
>> > to
>> > the bans).
>> 
>> In maintenance mode, the resources would not move either.
>
>The problem with maintenance mode for this scenario is that the reboot
>would uncleanly terminate any active resources.
>
>> > 4. Wait for the node to rejoin and the resources to start on it
>> > again,
>> > then remove all the bans.
>> > 
>> > The advantage is automation, and in particular the sysadmin
>> > applying
>> > the updates doesn't need to even know that the host is part of a
>> > cluster.
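
A rough manual sketch of steps 1-4 above (resource/node names and the
constraint id are placeholders; the rule is one way to express "every node
except the one being shut down", and exact pcs rule syntax may vary by
version):

    crm_resource --resource my-rsc --locate                                  # 1. where is it running?
    pcs constraint location my-rsc rule score=-INFINITY '#uname' ne node1    # 2. ban it everywhere but node1
    # 3. apply updates and reboot node1; the cluster stops my-rsc and cannot recover it elsewhere
    pcs constraint remove <constraint-id>                                    # 4. once my-rsc is back on node1
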
>> 
>> Could you elaborate? I suppose the operator still needs to issue a
>> command to set the shutdown‑lock before rebooting, right?
>
>Ah, no -- this is intended as a permanent cluster configuration
>setting, always in effect.
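
So, presumably, once 2.0.4 is available this would be a one-time property
change along these lines (an older pcs may need --force for a property it
does not yet know about):

    pcs property set shutdown-lock=true
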
>
>> Moreover, if shutdown‑lock is just a matter of setting ±INFINITY
>> constraints on nodes, maybe a higher-level tool could take care of this?
>
>In this case, the operator applying the reboot may not even know what
>pacemaker is, much less what command to run. The goal is to fully
>automate the process so a cluster-aware administrator does not need to
>be present.
>
>I did consider a number of alternative approaches, but they all had
>problematic corner cases. For a higher-level tool or anything external
>to pacemaker, one such corner case is a "time-of-check/time-of-use"
>problem -- determining the list of active resources has to be done
>separately from configuring the bans, and it's possible the list could
>change in the meantime.
>
>> > > This is the standby mode.  
>> > 
>> > Standby mode will stop all resources on a node, but it doesn't
>> > prevent
>> > recovery elsewhere.
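
For completeness, standby is just (node name made up; older pcs versions
use "pcs cluster standby" instead of "pcs node standby"):

    pcs node standby node1     # stops everything on node1, but recovery elsewhere is still allowed
    pcs node unstandby node1
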
>> 
>> Yes, I was just commenting on Ulrich's description (history context
>> cropped here).
>-- 
>Ken Gaillot <kgaillot at redhat.com>
>
>_______________________________________________
>Manage your subscription:
>https://lists.clusterlabs.org/mailman/listinfo/users
>
>ClusterLabs home: https://www.clusterlabs.org/

Hi Ken,

Can you tell me the logic of that feature?
So far it looks like:
1. Mark the resources/groups that will be affected by the feature
2. Those resources/groups are stopped (target-role=Stopped)
3. The node exits the cluster cleanly once no resources are running on it any more
4. The node rejoins the cluster after the reboot
5. Positive (on the rebooted node) and negative (ban on the rest of the nodes) constraints are created for the resources marked in step 1
6. target-role is set back to Started and the resources come back up and running
7. When each resource group (or standalone resource) is back online, the mark from step 1 is removed and any location constraints (cli-ban & cli-prefer) for the resource/group are removed
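
Once 2.0.4 builds are available, a rough way to check this sequence would be
(node name made up; an older pcs may need --force to set a property it does
not know yet):

    pcs property set shutdown-lock=true
    pcs cluster stop node1      # clean shutdown of one node
    crm_mon -1                  # its resources stay stopped instead of being recovered
    pcs constraint --full       # check whether any visible ban constraints were added
    pcs cluster start node1     # after the node rejoins, its resources start there again
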

Still, if that feature attracts more end users (or even enterprises), I think it will be positive for the stack.

Best Regards,
Strahil Nikolov



