[ClusterLabs] [Problem] In RHEL8.4beta, pgsql resource control fails.

Fri Apr 9 08:12:06 EDT 2021

On 4/8/21 11:21 PM, renayama19661014 at ybb.ne.jp wrote:
> Hi Ken,
> Hi All,
>
> The pgsql resource agent runs crm_mon during its demote and stop operations and processes the result.
>
> However, with the Pacemaker included in RHEL8.4beta, this crm_mon call fails.
>   - The problem also occurs on GitHub master (c40e18f085fad9ef1d9d79f671ed8a69eb3e753f).
>
> The problem can be easily reproduced as follows.
>
> Step1. Modify the stop function of the Dummy resource agent so that it runs crm_mon.
> ----
>
> dummy_stop() {
>      # Added for reproduction: run crm_mon during stop and log its exit code and output.
>      mon=$(crm_mon -1)
>      ret=$?
>      ocf_log info "### YAMAUCHI #### crm_mon[${ret}] : ${mon}"
>      dummy_monitor
>      if [ $? -eq $OCF_SUCCESS ]; then
>          rm "${OCF_RESKEY_state}"
>      fi
>      return $OCF_SUCCESS
> }
> ----
>
> Step2. Configure a cluster with two nodes.
> ----
>
> [root at rh84-beta01 ~]# crm_mon -rfA1
> Cluster Summary:
>    * Stack: corosync
>    * Current DC: rh84-beta01 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>    * Last updated: Thu Apr  8 18:00:52 2021
>    * Last change:  Thu Apr  8 18:00:38 2021 by root via cibadmin on rh84-beta01
>    * 2 nodes configured
>    * 1 resource instance configured
>
> Node List:
>    * Online: [ rh84-beta01 rh84-beta02 ]
>
> Full List of Resources:
>    * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta01
>
> Migration Summary:
> ----
>
> Step3. Stop the node where the Dummy resource is running. The resource will fail over.
> ----
> [root at rh84-beta02 ~]# crm_mon -rfA1
> Cluster Summary:
>    * Stack: corosync
>    * Current DC: rh84-beta02 (version 2.0.5-8.el8-ba59be7122) - partition with quorum
>    * Last updated: Thu Apr  8 18:08:56 2021
>    * Last change:  Thu Apr  8 18:05:08 2021 by root via cibadmin on rh84-beta01
>    * 2 nodes configured
>    * 1 resource instance configured
>
> Node List:
>    * Online: [ rh84-beta02 ]
>    * OFFLINE: [ rh84-beta01 ]
>
> Full List of Resources:
>    * dummy-1     (ocf::heartbeat:Dummy):  Started rh84-beta02
> ----
>
> However, the log shows that crm_mon failed when it was run from the stop operation of the Dummy resource.
>
> ----
> Apr 08 18:05:17  Dummy(dummy-1)[2631]:    INFO: ### YAMAUCHI #### crm_mon[102] : Pacemaker daemons shutting down ...
> Apr 08 18:05:17 rh84-beta01 pacemaker-execd     [2219] (log_op_output)  notice: dummy-1_stop_0[2631] error output [ crm_mon: Error: cluster is not available on this node ]
Hmm ... is that with SELinux enabled?
If so, do you see any related AVC messages?
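
A rough way to check both, assuming the audit tools (ausearch) are installed and you run it as root on the node that was stopped. The function name check_selinux_avc is only for illustration, it is not part of Pacemaker or any resource agent:

```shell
# Sketch only: report the SELinux mode and list recent AVC denials.
check_selinux_avc() {
    if command -v getenforce >/dev/null 2>&1; then
        echo "SELinux mode: $(getenforce)"
    else
        echo "SELinux mode: unknown (getenforce not found)"
    fi

    if command -v ausearch >/dev/null 2>&1; then
        # -m avc selects AVC records; -ts recent limits the search to the last 10 minutes.
        # ausearch exits non-zero when nothing matches or the audit log is not readable.
        ausearch -m avc -ts recent 2>/dev/null \
            || echo "No recent AVC messages found (or audit log not readable)"
    else
        echo "ausearch not found; check /var/log/audit/audit.log manually"
    fi
}

check_selinux_avc
```

Run it right after reproducing the failed stop, so that any denial still falls inside the "recent" window.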

Klaus
> ----
>
> Similarly, pgsql runs crm_mon during demote and stop, so its resource control fails.
>
> The problem seems to be related to the following fix.
>   * Report pacemakerd in state waiting for sbd
>    - https://github.com/ClusterLabs/pacemaker/pull/2278
>
> The problem does not occur with the release version of Pacemaker 2.0.5 or the Pacemaker included with RHEL8.3.
>
> This issue has a huge impact on users.
>
> Perhaps it also affects the control of other resources that utilize crm_mon.
>
> Please make sure that the release version of RHEL8.4 includes a Pacemaker that does not have this problem.
>   * Distributions other than RHEL may also be affected in future releases.
>
> ----
> The same report has been filed in the following Bugzilla entry.
>   - https://bugs.clusterlabs.org/show_bug.cgi?id=5471
> ----
>
> Best Regards,
> Hideo Yamauchi.