[ClusterLabs] trigger something at ?

lejeczek peljasz at yahoo.co.uk
Fri Feb 2 02:15:13 EST 2024



On 01/02/2024 15:02, Jehan-Guillaume de Rorthais wrote:
> On Wed, 31 Jan 2024 18:23:40 +0100
> lejeczek via Users <users at clusterlabs.org> wrote:
>
>> On 31/01/2024 17:13, Jehan-Guillaume de Rorthais wrote:
>>> On Wed, 31 Jan 2024 16:37:21 +0100
>>> lejeczek via Users <users at clusterlabs.org> wrote:
>>>   
>>>> On 31/01/2024 16:06, Jehan-Guillaume de Rorthais wrote:
>>>>> On Wed, 31 Jan 2024 16:02:12 +0100
>>>>> lejeczek via Users <users at clusterlabs.org> wrote:
>>>>>   
>>>>>> On 29/01/2024 17:22, Ken Gaillot wrote:
>>>>>>> On Fri, 2024-01-26 at 13:55 +0100, lejeczek via Users wrote:
>>>>>>>> Hi guys.
>>>>>>>>
>>>>>>>> Is it possible to trigger some... action - I'm thinking specifically
>>>>>>>> of shutdown/start.
>>>>>>>> If it's not possible within the cluster, then perhaps outside of it.
>>>>>>>> I would like to create/remove constraints when the cluster starts &
>>>>>>>> stops, respectively.
>>>>>>>>
>>>>>>>> many thanks, L.
>>>>>>>>   
>>>>>>> You could use node status alerts for that, but it's risky for alert
>>>>>>> agents to change the configuration (since that may result in more
>>>>>>> alerts and potentially some sort of infinite loop).
>>>>>>>
>>>>>>> Pacemaker has no concept of a full cluster start/stop, only node
>>>>>>> start/stop. You could approximate that by checking whether the node
>>>>>>> receiving the alert is the only active node.
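
A minimal sketch of such an alert agent, for reference - the path and the
logger calls are hypothetical; the CRM_alert_* variables are the ones
Pacemaker exports to alert agents:

    #!/bin/sh
    # /usr/local/bin/node-alert.sh -- hypothetical node-status alert agent
    [ "${CRM_alert_kind}" = "node" ] || exit 0
    case "${CRM_alert_desc}" in
        member)
            # node joined - constraints could be (re)created here, but per
            # the warning above, editing the CIB from an alert agent can
            # trigger further alerts and loop
            logger "alert: node ${CRM_alert_node} joined"
            ;;
        lost)
            logger "alert: node ${CRM_alert_node} left"
            ;;
    esac
    exit 0

It would be registered with something like:

    -> $ pcs alert create path=/usr/local/bin/node-alert.sh id=node-alert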
>>>>>>>
>>>>>>> Another possibility would be to write a resource agent that does what
>>>>>>> you want and order everything else after it. However it's even more
>>>>>>> risky for a resource agent to modify the configuration.
>>>>>>>
>>>>>>> Finally you could write a systemd unit to do what you want and order it
>>>>>>> after pacemaker.
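
A sketch of that last, systemd option - the unit and script names are
hypothetical; the helpers would run the pcs commands adding/removing the
constraints:

    # /etc/systemd/system/cluster-constraints.service
    [Unit]
    Description=Create/remove constraints around pacemaker (sketch)
    After=pacemaker.service
    BindsTo=pacemaker.service

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/local/bin/add-constraints.sh
    ExecStop=/usr/local/bin/remove-constraints.sh

    [Install]
    WantedBy=multi-user.target

With Type=oneshot plus RemainAfterExit=yes the unit stays "active" after
ExecStart, so ExecStop (the constraint removal) runs when it is stopped,
e.g. at shutdown.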
>>>>>>>
>>>>>>> What's wrong with leaving the constraints permanently configured?
>>>>>> yes, that would be for a node start/stop
>>>>>> I struggle with using constraints to move the pgsql (PAF) master
>>>>>> onto a given node - it seems that co/locating PAF's master
>>>>>> results in troubles (replication breaks) at/after node
>>>>>> shutdown/reboot (not always, but way too often)
>>>>> What? What's wrong with colocating PAF's masters exactly? How does it
>>>>> break any replication? What are these constraints you are dealing with?
>>>>>
>>>>> Could you share your configuration?
>>>> Constraints beyond/above what is required by the PAF agent
>>>> itself, say...
>>>> you have multiple pgSQL clusters with PAF - thus multiple
>>>> (separate, for each pgSQL cluster) masters - and you want to
>>>> spread/balance those across the HA cluster
>>>> (or in other words - avoid having more than 1 pgsql master
>>>> per HA node)
>>> ok
>>>   
>>>> These below I've tried; they move the master onto the chosen
>>>> node but... then the issues I mentioned.
>>> You just mentioned it breaks the replication, but there is so little
>>> information about your architecture and configuration that it's impossible
>>> to imagine how this could break the replication.
>>>
>>> Could you add details about the issues ?
>>>   
>>>> -> $ pcs constraint location PGSQL-PAF-5438-clone prefers
>>>> ubusrv1=1002
>>>> or
>>>> -> $ pcs constraint colocation set PGSQL-PAF-5435-clone
>>>> PGSQL-PAF-5434-clone PGSQL-PAF-5433-clone role=Master
>>>> require-all=false setoptions score=-1000
>>> I suppose the "colocation" constraint is the way to go, not the "location"
>>> one.
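
For comparison, the same "spread the masters" idea expressed as pairwise
anti-colocations instead of one set - a sketch only; the exact role keyword
(master vs. Promoted) depends on the pcs version:

    -> $ pcs constraint colocation add master PGSQL-PAF-5435-clone \
           with master PGSQL-PAF-5434-clone -1000
    -> $ pcs constraint colocation add master PGSQL-PAF-5435-clone \
           with master PGSQL-PAF-5433-clone -1000
    -> $ pcs constraint colocation add master PGSQL-PAF-5434-clone \
           with master PGSQL-PAF-5433-clone -1000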
>> This should be easy to replicate, 3 x VMs, Ubuntu 22.04 in
>> my case
> No, this is not easy to replicate. I have no idea how you set up your
> PostgreSQL replication, nor do I have your full Pacemaker configuration.
>
> Please provide either a detailed setup and/or Ansible and/or Terraform and/or
> Vagrant files, then a detailed scenario showing how it breaks. This is how you
> can help and motivate devs to reproduce your issue and work on it.
>
> I will not try to poke around for hours until I find an issue that might not
> even be the same as yours.
How about you start with the basics - there is a strange inclination to 
complicate things when they are not complicated, I hear it from you too - 
that's what I was doing when I "stumbled" upon these "issues"
How about just:
a) set up a vanilla-default pgSQL on Ubuntu (or perhaps any other 
OS of your choice); I use _pg_createcluster_
b) follow the official PAF guide (a single PAF resource should 
suffice) - see the sketch below
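
Roughly, for a) and b) - a sketch only; the PostgreSQL version, paths and
timeouts here are placeholders, the real values are in the PAF quick start:

    # a) one stock Debian/Ubuntu-style instance per node:
    -> $ pg_createcluster 14 main
    # b) the PAF resource as a promotable clone, per its guide:
    -> $ pcs resource create PGSQL-PAF-5432 ocf:heartbeat:pgsqlms \
           bindir=/usr/lib/postgresql/14/bin \
           pgdata=/etc/postgresql/14/main \
           op start timeout=60s op stop timeout=60s \
           op promote timeout=30s op demote timeout=120s \
           op monitor interval=15s timeout=10s role="Master" \
           op monitor interval=16s timeout=10s role="Slave" \
           op notify timeout=60s \
           promotable notify=true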
Once you have a healthy pgSQL cluster, OS-_reboot_ the nodes - play with 
that; all should be OK, and moving around/electing the master should 
work just fine.
Then... add and play with "additional" co/location constraints, then 
OS reboots - things should begin breaking.
I have a 3-node HA cluster & a 3-node PAF resource = 1 master + 
2 slaves.
The only thing I deliberately set, to help pgsql 
replication, was _wal_keep_size_ - I increased it (see below), but this 
is subjective.
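
i.e. something like this on each instance - the value is a guess, size it
to your write volume (wal_keep_size needs PostgreSQL 13+):

    -> $ sudo -u postgres psql -c "ALTER SYSTEM SET wal_keep_size = '1GB'"
    -> $ sudo -u postgres psql -c "SELECT pg_reload_conf()"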

It's fine with me if you don't feel like doing this.

