"I want a Pacemaker cluster to manage virtual machine resources, but I also want Pacemaker to be able to manage the resources that live within those virtual machines."
Without pacemaker_remote, every option for implementing the above use case has significant limitations:
The cluster stack could be run on the physical hosts only, which loses the ability to monitor resources within the guests.
A separate cluster could be run on the virtual guests, which quickly hits scalability issues.
The cluster stack could be run on the guests using the same cluster as the physical hosts, which also hits scalability issues and complicates fencing.
With pacemaker_remote:
The physical hosts are cluster nodes (running the full cluster stack).
The virtual machines are guest nodes (running the pacemaker_remote service). Nearly zero configuration is required on the virtual machine.
The cluster stack on the cluster nodes launches the virtual machines and immediately connects to the pacemaker_remote service on them, allowing the virtual machines to integrate into the cluster.
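As a minimal sketch of what this looks like in practice (assuming a libvirt/KVM host and the pcs shell; the VM name, domain XML path, and guest node name below are placeholders), a single resource definition on a cluster node both launches the virtual machine and registers it as a guest node via the remote-node meta attribute:

    # Define the VM as a cluster resource; the remote-node meta attribute
    # turns it into a guest node named "guest1" (names and paths are examples)
    pcs resource create vm-guest1 VirtualDomain \
        hypervisor="qemu:///system" \
        config="/etc/libvirt/qemu/vm-guest1.xml" \
        meta remote-node="guest1"

On the virtual machine itself, the setup is typically limited to installing the pacemaker_remote package, sharing the cluster's /etc/pacemaker/authkey, and allowing TCP port 3121 (the default pacemaker_remote port).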
The key difference here between the guest nodes and the cluster nodes is that the guest nodes do not run the cluster stack. This means they will never become the cluster's Designated Controller (DC), initiate fencing actions, or participate in quorum voting.
On the other hand, this also means that they are not bound to the scalability limits associated with the cluster stack (no 16-node corosync member limits to deal with). That isn’t to say that guest nodes can scale indefinitely, but it is known that guest nodes scale horizontally much further than cluster nodes.
Other than the quorum limitation, these guest nodes behave just like cluster nodes with respect to resource management. The cluster is fully capable of managing and monitoring resources on each guest node. You can build constraints against guest nodes, put them in standby, or do whatever else you'd expect to be able to do with cluster nodes. They even show up in crm_mon output as nodes.
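To illustrate (resource and node names here are hypothetical), the same day-to-day commands you would run against a cluster node work unchanged against a guest node:

    # Prefer running a resource on a guest node, just as with a cluster node
    pcs constraint location WebServer prefers guest1

    # Put the guest node into standby, then bring it back
    pcs node standby guest1
    pcs node unstandby guest1

    # Guest nodes appear alongside cluster nodes in the status output
    crm_mon --one-shot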
To solidify the concept, below is an example that is very similar to an actual deployment we test in our developer environment to verify guest node scalability:
16 cluster nodes running the full corosync + pacemaker stack
64 Pacemaker-managed virtual machine resources running pacemaker_remote configured as guest nodes
64 Pacemaker-managed webserver and database resources configured to run on the 64 guest nodes
With this deployment, you would have 64 webservers and databases running on 64 virtual machines on 16 hardware nodes, all of which are managed and monitored by the same Pacemaker deployment. pacemaker_remote is known to scale to this size, and possibly much further, depending on the specific scenario.
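A rough sketch of how such a deployment might be wired together (the resource agent choices, names, and paths below are illustrative assumptions, not the exact test configuration):

    # Create 64 guest nodes, each hosting one webserver resource
    # (database resources would be created and constrained analogously)
    for i in $(seq 1 64); do
        # VM resource that doubles as guest node "guest$i"
        pcs resource create vm-guest$i VirtualDomain \
            hypervisor="qemu:///system" \
            config="/etc/libvirt/qemu/vm-guest$i.xml" \
            meta remote-node=guest$i

        # Webserver resource that prefers its own guest node
        pcs resource create webserver$i apache \
            configfile="/etc/httpd/conf/httpd.conf"
        pcs constraint location webserver$i prefers guest$i
    done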