7.8. Test Cluster Failover

Previously, we used pcs cluster stop pcmk-1 to stop all cluster services on pcmk-1, which forced its resources to fail over. There is another way to safely simulate node failure.

We can put the node into standby mode. Nodes in this state continue to run corosync and pacemaker but are not allowed to run resources. Any resources found active there will be moved elsewhere. This feature can be particularly useful when performing system administration tasks such as updating packages used by cluster resources.
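A maintenance window built on standby mode might be scripted roughly as follows. This is a minimal sketch, not part of pcs: the function name maintain_node, the 5-second polling interval, the 60-attempt limit and the grep pattern used to spot active resources are all assumptions, and the pattern relies on the pcs status output format shown in this section. It uses the pcs cluster standby/unstandby syntax from this guide; later pcs releases provide pcs node standby and pcs node unstandby instead.

```shell
# Hypothetical helper (not part of pcs): drain NODE with standby mode, run
# one maintenance command, and return NODE to service only if it succeeds.
# Usage: maintain_node NODE COMMAND [ARGS...]
maintain_node() {
    node=$1
    shift
    pcs cluster standby "$node" || return 1

    # Poll until pcs status no longer lists the node as running anything.
    # The pattern matches "... Started NODE" primitive lines and set member
    # lists such as "Masters: [ ... NODE ... ]"; it assumes the output
    # format shown in this section. Interval and retry limit are arbitrary.
    tries=0
    while pcs status | grep -Eq "Started ${node}\$|(Masters|Slaves|Started): \[.* ${node} "; do
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            echo "resources still active on $node; leaving it in standby" >&2
            return 1
        fi
        sleep 5
    done

    # Run the maintenance command; on failure, keep the node in standby
    # so an administrator can inspect it before it hosts resources again.
    if "$@"; then
        pcs cluster unstandby "$node"
    else
        echo "maintenance command failed; $node left in standby" >&2
        return 1
    fi
}
```

For example, maintain_node pcmk-1 yum -y update httpd would drain pcmk-1, update the package, and bring the node back only if the update succeeded. Leaving the node in standby after a failure is a deliberate choice in this sketch; other sites may prefer to always run unstandby.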

Put the active node into standby mode, and observe the cluster move all the resources to the other node. The node’s status will change to indicate that it can no longer host resources, and eventually all the resources will move.

[root@pcmk-1 ~]# pcs cluster standby pcmk-1
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 18:04:22 2018
Last change: Mon Sep 10 18:03:43 2018 by root via cibadmin on pcmk-1

2 nodes configured
5 resources configured

Node pcmk-1: standby
Online: [ pcmk-2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started pcmk-2
 WebSite        (ocf::heartbeat:apache):        Started pcmk-2
 Master/Slave Set: WebDataClone [WebData]
     Masters: [ pcmk-2 ]
     Stopped: [ pcmk-1 ]
 WebFS  (ocf::heartbeat:Filesystem):    Started pcmk-2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Once we’ve done everything we needed to on pcmk-1 (in this case nothing; we just wanted to see the resources move), we can allow the node to be a full cluster member again.

[root@pcmk-1 ~]# pcs cluster unstandby pcmk-1
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 18:05:22 2018
Last change: Mon Sep 10 18:05:21 2018 by root via cibadmin on pcmk-1

2 nodes configured
5 resources configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started pcmk-2
 WebSite        (ocf::heartbeat:apache):        Started pcmk-2
 Master/Slave Set: WebDataClone [WebData]
     Masters: [ pcmk-2 ]
     Slaves: [ pcmk-1 ]
 WebFS  (ocf::heartbeat:Filesystem):    Started pcmk-2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Notice that pcmk-1 is back in the Online state, and that the cluster resources stay where they are because of the resource stickiness we configured earlier.