All examples assume two nodes, node1 and node2, that are reachable by their short name and IP address.

The convention followed is that [ALL] # denotes a command that needs to be run on all cluster machines, and [ONE] # indicates a command that only needs to be run on one cluster host.
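One way to read the two markers is as a loop over the cluster nodes. The sketch below is only an illustration of the convention: it assumes the nodes are named node1 and node2 and accept SSH logins, and the names run_all, run_one, NODES and RUNNER are hypothetical helpers, not part of Pacemaker or Corosync.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the [ALL] / [ONE] convention.
# Assumption: the nodes are named node1 and node2 and accept SSH logins.
NODES="${NODES:-node1 node2}"
# RUNNER is the program used to reach a node; set RUNNER=echo for a dry run.
RUNNER="${RUNNER:-ssh}"

# [ALL]: run the same command on every node in turn, stopping on failure.
run_all() {
    for node in $NODES; do
        $RUNNER "$node" "$@" || return 1
    done
}

# [ONE]: run the command on the first node in the list only.
run_one() {
    first=${NODES%% *}
    $RUNNER "$first" "$@"
}
```

With these helpers, "[ALL] # service pacemaker start" would correspond to run_all service pacemaker start, and "[ONE] # crm_mon -1" to run_one crm_mon -1.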

Ubuntu

Ubuntu appears to have switched to Corosync 2 for its LTS releases.

We use aptitude to install Pacemaker and the other packages we will need:

[ALL] # aptitude install pacemaker corosync fence-agents

Configure Cluster Membership and Messaging

Since the pcs tool from RHEL does not exist on Ubuntu, we will create the corosync configuration file on both machines manually:

[ALL] # cat <<EOF > /etc/corosync/corosync.conf
totem {
  version: 2
  secauth: off
  cluster_name: pacemaker1
  transport: udpu
}

nodelist {
  node {
    ring0_addr: node1
    nodeid: 101
  }
  node {
    ring0_addr: node2
    nodeid: 102
  }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
  wait_for_all: 1
  last_man_standing: 1
  auto_tie_breaker: 0
}
EOF
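
Before starting the cluster, it can help to confirm that the file contains what was intended. The following is a minimal sketch rather than an official validation tool: the function name check_corosync_conf is hypothetical, and it only counts the node entries and looks for the two_node setting.

```shell
#!/bin/sh
# Hypothetical sanity check for a two-node corosync.conf.
# Usage: check_corosync_conf [path]  (defaults to /etc/corosync/corosync.conf)
check_corosync_conf() {
    conf=${1:-/etc/corosync/corosync.conf}
    [ -r "$conf" ] || { echo "cannot read $conf"; return 1; }

    # Count the ring0_addr lines, one per node in the nodelist.
    nodes=$(grep -c 'ring0_addr:' "$conf" || true)
    if [ "$nodes" -ne 2 ]; then
        echo "expected 2 nodes, found $nodes"
        return 1
    fi

    # two_node: 1 is the votequorum setting for two-node operation.
    if ! grep -q 'two_node:[[:space:]]*1' "$conf"; then
        echo "two_node: 1 is not set"
        return 1
    fi
    echo "ok: 2 nodes, two_node enabled"
}
```

Running check_corosync_conf with no argument on each machine checks the file written above.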

Start the Cluster

On each machine, run:

[ALL] # service pacemaker start

Set Cluster Options

With so many devices and possible topologies, it is nearly impossible to include Fencing in a document like this. For now we will disable it.

[ONE] # crm configure property stonith-enabled=false

One of the most common ways to deploy Pacemaker is in a 2-node configuration. However, quorum as a concept makes no sense in this scenario (because you only have it when more than half the nodes are available), so we'll disable it too.
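
The arithmetic behind that statement can be checked directly. This sketch assumes the usual majority rule, where quorum requires strictly more than half of the nodes; the function name votes_needed is purely illustrative.

```shell
#!/bin/sh
# Votes needed for a strict majority of N nodes: floor(N / 2) + 1.
votes_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Show how many node failures each cluster size can survive with quorum.
for n in 2 3 5; do
    need=$(votes_needed "$n")
    echo "$n nodes: need $need, can lose $(( n - need ))"
done
```

With two nodes, both votes are needed for a majority, so losing either node loses quorum.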

[ONE] # crm configure property no-quorum-policy=ignore

For demonstration purposes, we will force the cluster to move services after a single failure:

[ONE] # crm configure rsc_defaults migration-threshold=1

Add a Resource

Let's add a cluster service. To make things easy, we'll choose one that doesn't require any configuration and works everywhere. Here's the command:

[ONE] # crm configure primitive my_first_svc ocf:pacemaker:Dummy op monitor interval=120s

"my_first_svc" is the name the service will be known as.

"ocf:pacemaker:Dummy" tells Pacemaker which script to use (Dummy - an agent that's useful as a template and for guides like this one), which namespace it is in (pacemaker) and what standard it conforms to (OCF).

"op monitor interval=120s" tells Pacemaker to check the health of this service every 2 minutes by calling the agent's monitor action.

You should now be able to see the service running using:

[ONE] # crm_mon -1

Simulate a Service Failure

We can simulate an error by telling the service to stop directly (without telling the cluster):

[ONE] # crm_resource --resource my_first_svc --force-stop

If you now run crm_mon in interactive mode (the default), you should see (within the monitor interval - 2 minutes) the cluster notice that my_first_svc failed and move it to another node.
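
To compare the service location before and after the failure without watching the interactive display, the one-shot output can be filtered. This is a rough sketch: the helper name started_on is hypothetical, and it assumes the resource line in the crm_mon -1 output contains the resource name, then the word "Started", then the node name. The exact layout differs between Pacemaker versions, so the pattern may need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: print the node a resource is reported as Started on.
# Reads crm_mon -1 style text on stdin. Assumed line layout, for example:
#   my_first_svc (ocf::pacemaker:Dummy): Started node1
started_on() {
    awk -v rsc="$1" '
    {
        # Only consider lines that mention the resource as a whole field.
        found = 0
        for (i = 1; i <= NF; i++)
            if ($i == rsc) found = 1
        if (!found) next
        # The node name is taken to be the field right after "Started".
        for (i = 1; i < NF; i++)
            if ($i == "Started") { print $(i + 1); exit }
    }'
}
```

For example, running crm_mon -1 | started_on my_first_svc before the forced stop and again after the monitor interval has passed should print two different node names if the service was moved.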

Next Steps