All examples assume two nodes that are reachable by their short name and IP address:
- node1 - 192.168.1.1
- node2 - 192.168.1.2
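If the names are supplied through /etc/hosts rather than DNS (an assumption; the guide does not say which), a small check along the following lines can confirm that both mappings are present. The helper name has_hosts_entry is hypothetical and only serves this sketch.

```shell
#!/bin/sh
# Sketch: check that a hosts-format file maps an IP address to a node name.
# has_hosts_entry is a hypothetical helper, not part of Pacemaker or Corosync.

# has_hosts_entry NAME IP [FILE]: succeed if FILE (default /etc/hosts) maps IP to NAME.
has_hosts_entry() {
    name=$1
    ip=$2
    file=${3:-/etc/hosts}
    awk -v name="$name" -v ip="$ip" '
        { sub(/#.*/, "") }                 # drop comments before matching
        $1 == ip {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1  # name listed as hostname or alias
        }
        END { exit !found }
    ' "$file"
}

# Report on the two nodes assumed by this guide.
while read -r node addr; do
    if has_hosts_entry "$node" "$addr"; then
        echo "$node ($addr): present in /etc/hosts"
    else
        echo "$node ($addr): not found in /etc/hosts"
    fi
done <<EOF
node1 192.168.1.1
node2 192.168.1.2
EOF
```

Run it on every node; a "not found" line only matters if you are not relying on DNS for these names.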
The convention followed is that [ALL] # denotes a command that needs to be run on all cluster machines, and [ONE] # indicates a command that only needs to be run on one cluster host.
Ubuntu appears to have switched to Corosync 2 for its LTS releases.
We use aptitude to install Pacemaker and the other packages we will need:
[ALL] # aptitude install pacemaker corosync fence-agents
Configure Cluster Membership and Messaging
Since the pcs tool from RHEL does not exist on Ubuntu, we will create the corosync configuration file on both machines manually:
[ALL] # cat <
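For orientation, a two-node Corosync 2 configuration can look roughly like the sketch below. Everything in it is an assumption rather than the file this guide originally used: the cluster name example_cluster, the udpu transport, the node IDs, and the helper name print_corosync_conf are all illustrative. Check the values against corosync.conf(5) for your release before using them.

```shell
#!/bin/sh
# Sketch: print a minimal two-node Corosync 2 configuration.
# All values are illustrative assumptions, not the guide's original file.

print_corosync_conf() {
    cat <<'EOF'
totem {
    version: 2
    cluster_name: example_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}
EOF
}

# Print the sketch so it can be reviewed before it is written anywhere.
print_corosync_conf
```

Once reviewed, the output could be redirected to /etc/corosync/corosync.conf on each node, for example with `print_corosync_conf > /etc/corosync/corosync.conf` as root.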
Start the Cluster
On each machine, run:
[ALL] # service pacemaker start
Set Cluster Options
With so many devices and possible topologies, it is nearly impossible to include Fencing in a document like this. For now we will disable it.
[ONE] # crm configure property stonith-enabled=false
One of the most common ways to deploy Pacemaker is in a 2-node configuration. However, quorum as a concept makes no sense in this scenario (because you only have it when more than half the nodes are available), so we'll disable it too.
[ONE] # crm configure property no-quorum-policy=ignore
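The "more than half" rule can be illustrated with a little arithmetic. The sketch below is a simplified model of majority quorum, not the cluster's actual implementation, and the function name has_quorum is hypothetical.

```shell
#!/bin/sh
# Sketch: simplified majority-quorum check (not Pacemaker's or Corosync's real code).

# has_quorum TOTAL ONLINE: succeed when ONLINE is strictly more than half of TOTAL.
has_quorum() {
    total=$1
    online=$2
    [ $((online * 2)) -gt "$total" ]
}

# In a 2-node cluster, losing either node leaves exactly half, which is not a majority.
for online in 2 1; do
    if has_quorum 2 "$online"; then
        echo "2 nodes, $online online: quorum"
    else
        echo "2 nodes, $online online: no quorum"
    fi
done
```

With three nodes the same rule tolerates one failure, since 2 of 3 is still a majority; with two nodes any single failure loses quorum, which is why the policy is set to ignore here.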
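For demonstration purposes, we will force the cluster to move services after a single failure. Because migration-threshold is a resource option rather than a cluster property, we set it as a resource default:
[ONE] # crm configure rsc_defaults migration-threshold=1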
Add a Resource
Let's add a cluster service. To keep things easy, we'll choose one that doesn't require any configuration and works everywhere. Here's the command:
[ONE] # crm configure primitive my_first_svc ocf:pacemaker:Dummy op monitor interval=120s
"my_first_svc" is the name the service will be known as.
"ocf:pacemaker:Dummy" tells Pacemaker which script to use (Dummy - an agent that's useful as a template and for guides like this one), which namespace it is in (pacemaker) and what standard it conforms to (OCF).
"op monitor interval=120s" tells Pacemaker to check the health of this service every 2 minutes by calling the agent's monitor action.
You should now be able to see the service running using:
[ONE] # crm_mon -1
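The agent specification ocf:pacemaker:Dummy used above has three colon-separated parts: standard, namespace (provider), and agent name. As a small illustration, the sketch below splits such a string with plain shell parameter expansion; split_agent is a hypothetical helper, and the directory mentioned in the comment is the usual OCF location on many installations rather than something this guide states.

```shell
#!/bin/sh
# Sketch: split a three-part agent specification into its components.
# split_agent is a hypothetical helper for illustration only.

# split_agent SPEC: print the standard, provider and agent name contained in SPEC.
split_agent() {
    spec=$1
    standard=${spec%%:*}   # text before the first colon
    rest=${spec#*:}        # text after the first colon
    provider=${rest%%:*}   # namespace the agent lives in
    agent=${rest#*:}       # name of the agent script
    echo "standard=$standard provider=$provider agent=$agent"
}

# OCF agents are typically installed under /usr/lib/ocf/resource.d/<provider>/<agent>.
split_agent ocf:pacemaker:Dummy
```

For the service defined above this prints `standard=ocf provider=pacemaker agent=Dummy`.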
Simulate a Service Failure
We can simulate an error by telling the service to stop directly (without telling the cluster):
[ONE] # crm_resource --resource my_first_svc --force-stop
If you now run crm_mon in interactive mode (the default), you should see (within the monitor interval - 2 minutes) the cluster notice that my_first_svc failed and move it to another node.
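As a non-interactive alternative to watching crm_mon, the resource's location can be polled with crm_resource and its --locate option. The sketch below is only one way to do that: watch_location is a hypothetical helper, the default of five checks 30 seconds apart is an arbitrary choice that happens to cover the 2-minute monitor interval, and the output of crm_resource is printed as-is rather than parsed.

```shell
#!/bin/sh
# Sketch: periodically print where a resource is running.
# watch_location is a hypothetical helper; defaults are arbitrary.

# watch_location RESOURCE [TRIES] [DELAY]: query the location TRIES times, DELAY seconds apart.
watch_location() {
    resource=$1
    tries=${2:-5}
    delay=${3:-30}
    i=1
    while [ "$i" -le "$tries" ]; do
        # Print whatever crm_resource reports, including any error text.
        echo "check $i: $(crm_resource --resource "$resource" --locate 2>&1)"
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
}

# Usage, on one cluster node after forcing the stop:
#   watch_location my_first_svc
```

If the failure is detected and the service is recovered on the other node, the node named in the output should change between checks.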
Next Steps
- Configure Fencing
- Add more services - see Clusters from Scratch for examples of how to add IP address, Apache and DRBD to a cluster
- Learn how to make services prefer a specific host
- Learn how to make services run on the same host
- Learn how to make services start and stop in a specific order
- Find out what else Pacemaker can do - see Pacemaker Explained for a comprehensive list of concepts and options