FAQ

From ClusterLabs

If you have suggestions for possible FAQ entries, please send them to: andrew@beekhof.net

Organizational

Why Can't I Create a Wiki Account?

At the moment I've not sorted out the site security so I've disabled the creation of new users. If you'd like an account, just email us at pacemaker@oss.clusterlabs.org and we can create one manually for you.

Why was the Project Started?

Pacemaker grew out of the Heartbeat project.

See the project history for more details.

Why is the Project Called Pacemaker?

First of all, it's not called the CRM (for Cluster Resource Manager) because of the abundance of terms that are commonly abbreviated to those three letters.

The Pacemaker name came from Kham, a good friend of mine, and was originally used by a Java GUI that I was prototyping in early 2007. Alas, other commitments have prevented the GUI from progressing much and, when it came time to choose a name for this project, Lars suggested it was an even better fit for an independent CRM.


The idea stems from the analogy between the role of this software and that of the little device that keeps the human heart pumping.

Pacemaker monitors the cluster and intervenes when necessary to ensure the smooth operation of the services it provides.


There were a number of other names (and acronyms) tossed around, but suffice it to say Pacemaker was the best of the lot :-)

What is the Project's Relationship with Heartbeat?

Pacemaker keeps your applications running when they, or the machines they're running on, fail. However, it can't do this without connectivity to the other machines in the cluster, which is a significant problem in its own right.

Rather than reinvent the wheel, Pacemaker supports existing implementations such as Heartbeat. Heartbeat provides:

  • a mechanism to reliably send messages between nodes,
  • notifications when machines appear and disappear, and
  • a list of machines that are up that is consistent throughout the cluster.

Heartbeat was also the first stack supported by the Pacemaker codebase.

What is the Project's Relationship with Corosync?

Pacemaker keeps your applications running when they, or the machines they're running on, fail. However, it can't do this without connectivity to the other machines in the cluster, which is a significant problem in its own right.

Rather than reinvent the wheel, Pacemaker supports existing implementations such as Corosync. Corosync provides Pacemaker with:

  • a mechanism to reliably send messages between nodes,
  • notifications when machines appear and disappear, and
  • a list of machines that are up that is consistent throughout the cluster.

Corosync was the second stack supported by the Pacemaker codebase.

What is the Project's Relationship with OpenAIS?

This one is tricky.

Originally, Corosync and OpenAIS were the same thing. Then they split into two parts: the core messaging and membership capabilities are now called Corosync, and OpenAIS retained the layer containing the implementation of the AIS standard.

Pacemaker itself only needs the Corosync piece in order to function; however, some of the applications it can manage (such as OCFS2 and GFS2) require the OpenAIS layer as well.

Is there any documentation?

Yes. You can find the set relevant to your version in our documentation index.

Where should I ask questions?

Basic questions can often be answered on IRC, but sending them to the relevant mailing list(s) is always a good idea so that everyone can benefit from the answer. Don't worry if you pick the wrong one: many of us are on multiple lists and someone will suggest a more appropriate list if necessary.

Do I need shared storage?

No. We can help manage it if you have some, but Pacemaker itself has no need for shared storage.

Which cluster filesystems does Pacemaker support?

Pacemaker supports the popular OCFS2 and GFS2 filesystems. As you'd expect, you can use them on top of real disks or network block devices like DRBD.

What kind of applications can I manage with Pacemaker?

Pacemaker is application agnostic, meaning anything that can be scripted can be made highly available, provided the script conforms to one of the supported standards: LSB, OCF, systemd, or Upstart.

Do I need a fencing device?

Yes. Fencing is the only 100% reliable way to ensure the integrity of your data and that applications are only active on one host. Although Pacemaker is technically able to function without fencing, there are good reasons SUSE and Red Hat will not support such a configuration.

Do I need to know XML to configure Pacemaker?

No. Although Pacemaker uses XML as its native configuration format, there are two CLIs and at least four GUIs that present the configuration in a human-friendly format.

How do I synchronize the cluster configuration?

Any changes to Pacemaker's configuration are automatically replicated to other machines. The configuration is also versioned, so any offline machines will be updated when they return.
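To illustrate how versioning lets a returning machine be brought up to date, here is a minimal sketch. It assumes the configuration version is the triple of counters described in Pacemaker's documentation (admin_epoch, epoch, num_updates), compared in that order with the highest winning. The node names and the `newest` helper are hypothetical, and this simplified model is not Pacemaker's actual implementation.

```python
from typing import Dict, NamedTuple


class CibVersion(NamedTuple):
    """Assumed version triple; fields are compared left to right."""
    admin_epoch: int
    epoch: int
    num_updates: int


def newest(copies: Dict[str, CibVersion]) -> str:
    """Return the node holding the most recent configuration copy."""
    if not copies:
        raise ValueError("no configuration copies supplied")
    # NamedTuple instances compare like plain tuples, so admin_epoch
    # dominates, then epoch, then num_updates.
    return max(copies, key=lambda node: copies[node])


if __name__ == "__main__":
    # Hypothetical cluster: node2 was offline and missed an update.
    copies = {
        "node1": CibVersion(admin_epoch=0, epoch=12, num_updates=3),
        "node2": CibVersion(admin_epoch=0, epoch=11, num_updates=40),
    }
    print("most recent copy is on", newest(copies))
```

In this model, a node that returns with an older triple would simply receive the copy held by the node that `newest` selects.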

Should I choose pcs or crmsh?

Arguably the best advice is to use whichever one comes with your distro. This is the one that will be tailored to that environment, receive regular bugfixes and feature in the documentation.

Of course, for years people have been side-loading all of Pacemaker onto enterprise distros that didn't ship it, so doing the same for just a configuration tool should be easy if your favorite distro does not ship your favorite tool.

What if my question isn't here?

See our help page and let us know!

What Versions of Pacemaker Are Supported?

When seeking assistance, please try to ensure you have one of the versions supported directly by the project. Please refer to the Releases page for further details including the schedule of planned releases.

Supported Branches

Version  Current Release  First Released  This Release  Next Release
1.1      1.1.9            Jan 15, 2010    Mar 8, 2013   July 2013
1.0      1.0.13           Oct 9, 2008     Feb 13, 2013  As needed

Deprecated Branches

Version  Last Release  First Released  Last Released
0.7      0.7.3         June 25, 2008   Sep 22, 2008
0.6      0.6.7         Jan 16, 2008    Dec 15, 2008

Technical

How Do I Install Pacemaker?

Installation from source and from pre-built packages is described on the Install page.

Can I use Pacemaker with Heartbeat?

Yes. Pacemaker started life as part of the Heartbeat project and continues to support it as an alternative to Corosync. See this documentation for more details.

Can I use Pacemaker with CMAN?

Yes. Pacemaker added support for CMAN in version 1.1.5 to better integrate with distros shipping the RHCS cluster stack. This is particularly relevant for those looking to use GFS2 or OCFS2. See the documentation for more details.

Can I use Pacemaker with Corosync 1.x?

Yes. You will need to configure Corosync to load Pacemaker's custom plugin to provide the membership and quorum information we require. See the documentation for more details.

Can I use Pacemaker with Corosync 2.x?

Yes. Pacemaker can obtain the membership and quorum information it requires directly from Corosync in this configuration. See the documentation for more details.

Can I Choose which Messaging Layer to use at Run Time?

Yes. The CRM will automatically detect who started it and behave accordingly (assuming Pacemaker was built to support that messaging layer).

Can I Have a Mixed Heartbeat-OpenAIS/Corosync/CMAN Cluster?

No.

Where Can I Get the Source Code?

 git clone https://github.com/ClusterLabs/pacemaker.git

Where Can I Get Pre-built Packages?

Most users should be able to install Pacemaker directly from their distribution.

Pacemaker currently ships with Fedora (since 12), Red Hat Enterprise Linux (since 6.0 beta1), openSUSE (since 11.0), Debian Unstable, Ubuntu LTS (since 10.04 "Lucid Lynx") and as a key component of the High Availability Extension for SUSE Linux Enterprise Server 11 (available free of charge to existing SLES10 customers).

It will also be part of Debian Squeeze.

Users of other distributions should refer to our Install page.

What Do the Prefixes in Changelog Mean?

  • High, Med, Low: These all indicate how much the end-user/admin should care about the change.
  • Dev: These are changes that fix bugs that don't exist in any released version of the project

Examples:

  • High - Preventing a segfault, implementing an important new feature or major changes to the behavior of a feature
  • Med - Hard to trigger bugs, bugs with workarounds, minor functional changes
  • Low - Non-functional changes, formatting or logging changes, changes to test code

How Do I Test My Cluster?

Pacemaker comes with a Cluster Test Suite (CTS for short), which is an integral part of our release testing. Traditionally, this has been hard to set up and use; however, a new tool has been written to simplify the process.

It can be found at: http://github.com/ClusterLabs/pacemaker/tree/master/cts/cluster_test

Please give it a try and send feedback via the mailing list.

Resource is Too Active

Pacemaker will try to determine which resources are active on a machine when it starts. To do this, it sends what we call a probe, which uses the monitor operation of your resource agent.

There are two common reasons for seeing this message:

  • Your resource really is active on more than one node
    • Check you are _not_ starting it on boot
    • Did Pacemaker suffer an internal failure? If so, please check the Help:Contents page and report it
  • Your resource doesn't implement the monitor operation correctly
    • Make sure your Resource Agent conforms to the OCF-spec by using the ocf-tester script

You may also want to read the documentation for the multiple-active option which controls what Pacemaker does when it encounters this condition.
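The second cause is easier to see with a small model of how probe results are read. The sketch below is a simplification, not Pacemaker's code: it assumes the standard OCF monitor return codes (0 for running, 7 for not running, 8 for running as master), treats any other code as "not active", and only considers a plain resource that should run on one node. The node names and helper functions are hypothetical.

```python
from typing import Dict, List

# Monitor return codes defined by the OCF resource agent specification.
OCF_SUCCESS = 0         # resource is running
OCF_NOT_RUNNING = 7     # resource is cleanly stopped
OCF_RUNNING_MASTER = 8  # resource is running in the master role


def active_nodes(probe_results: Dict[str, int]) -> List[str]:
    """Return the nodes whose probe reported the resource as active."""
    running = (OCF_SUCCESS, OCF_RUNNING_MASTER)
    return sorted(node for node, rc in probe_results.items() if rc in running)


def is_too_active(probe_results: Dict[str, int]) -> bool:
    """True when a single-instance resource appears active on several nodes."""
    return len(active_nodes(probe_results)) > 1


if __name__ == "__main__":
    # A correct agent: stopped on node2, so the probe returns 7 there.
    print(is_too_active({"node1": OCF_SUCCESS, "node2": OCF_NOT_RUNNING}))
    # A broken agent whose monitor always returns 0, even when stopped.
    print(is_too_active({"node1": OCF_SUCCESS, "node2": OCF_SUCCESS}))
```

An agent whose monitor returns 0 when the resource is actually stopped looks, from the cluster's point of view, exactly like a resource that was started on two nodes.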

I Killed a Node but the Cluster Didn't Recover

One of the most common reasons for this is the way quorum is calculated for a 2-node cluster. Unlike Heartbeat, OpenAIS doesn't pretend 2-node clusters always have quorum.

In order to have quorum, more than half of the total number of cluster nodes need to be online. Clearly this is not the case when a node failure occurs in a 2-node cluster.
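The arithmetic behind this rule can be sketched as follows. This is a minimal illustration of the "more than half" test only; `has_quorum` is a hypothetical helper, not part of any Pacemaker or Corosync API.

```python
def has_quorum(total_nodes: int, online_nodes: int) -> bool:
    """Return True when strictly more than half of the nodes are online."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= online_nodes <= total_nodes:
        raise ValueError("online_nodes must be between 0 and total_nodes")
    # Equivalent to online_nodes > total_nodes / 2, without floating point.
    return 2 * online_nodes > total_nodes


if __name__ == "__main__":
    for total, online in [(2, 2), (2, 1), (3, 2), (4, 2)]:
        print(f"{online} of {total} online -> quorum: {has_quorum(total, online)}")
```

With two nodes, losing one leaves exactly half online, which is not more than half, so the survivor does not have quorum. A three-node cluster can lose one node and still pass the test.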

If you want to allow the remaining node to provide all the cluster services, you need to set the no-quorum-policy to ignore.

 crm configure property no-quorum-policy=ignore

This provides the same behavior as Heartbeat. Just be sure to set up STONITH to ensure data integrity.

How Do I Upgrade from Older Versions of Heartbeat?

If you plan to continue using the Heartbeat stack (as opposed to Corosync), check out the step-by-step guide on their website.

Features

How Do I Enable the GUI? (Corosync)

First you need to install the pacemaker-pygui package. Then you need to find the following lines in corosync.conf

service {
	# Load the Pacemaker Cluster Resource Manager
	name: pacemaker
	ver:  0
}

and add

	use_mgmtd: 1

before the closing brace.

How Do I Enable the GUI? (Heartbeat)

First you need to install the pacemaker-pygui package. Then you need to add the following lines to ha.cf

 apiauth	mgmtd	uid=root
 respawn	root	/usr/lib/heartbeat/mgmtd -v

These used to be implied when crm yes was present, but only when Heartbeat was built with the built-in mgmtd (which it no longer is).

NOTE: People on 64-bit platforms will probably need to replace lib with lib64.

Collocation Sets

The sequential option does not refer to ordering. Instead it tells Pacemaker to create a collocation chain between the members of the set.

That is,

 colocation myset inf: app1 app2 app3 app4

is the equivalent of

 colocation myset-1 inf: app2 app1
 colocation myset-2 inf: app3 app2
 colocation myset-3 inf: app4 app3

(That is, app4 -> app3 -> app2 -> app1)

Putting them in brackets sets sequential=false and removes the internal constraints. So

 colocation myset inf: app1 ( app2 app3 app4 )

is actually the equivalent of

 colocation myset-1 inf: app2 app1
 colocation myset-2 inf: app3 app1
 colocation myset-3 inf: app4 app1

(That is, app2 -> app1, app3 -> app1, app4 -> app1)
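The two expansions above can be reproduced mechanically. The following sketch only mirrors the equivalences listed in this section; the function names are hypothetical and it does not parse real crm syntax or call any Pacemaker tool.

```python
from typing import List, Tuple

# (dependent, target): the dependent must run where the target runs.
Pair = Tuple[str, str]


def expand_sequential(members: List[str]) -> List[Pair]:
    """Chain each member to the one listed before it (sequential=true)."""
    return [(members[i + 1], members[i]) for i in range(len(members) - 1)]


def expand_unordered(anchor: str, members: List[str]) -> List[Pair]:
    """Tie every bracketed member directly to the anchor (sequential=false)."""
    return [(member, anchor) for member in members]


def render(name: str, pairs: List[Pair], score: str = "inf") -> List[str]:
    """Format the pairs as numbered two-resource colocation constraints."""
    return [
        f"colocation {name}-{n} {score}: {dependent} {target}"
        for n, (dependent, target) in enumerate(pairs, start=1)
    ]


if __name__ == "__main__":
    print("\n".join(render("myset", expand_sequential(["app1", "app2", "app3", "app4"]))))
    print()
    print("\n".join(render("myset", expand_unordered("app1", ["app2", "app3", "app4"]))))
```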

The difference has implications when there is a failure. With sequential turned on, a failure in app2 results in app3 and app4 also being restarted. However, with sequential turned off, a failure in app2 does not affect app3 or app4.

In both cases, a failure in app1 results in all resources being restarted.
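The restart behavior described here follows from walking the pairwise constraints. The sketch below is a simplified model under the assumption that a failed resource restarts everything that depends on it, directly or indirectly; `restarted_by` is a hypothetical helper and not how Pacemaker's policy engine is implemented.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# (dependent, target): the dependent must run where the target runs.
Pair = Tuple[str, str]


def restarted_by(failed: str, pairs: Iterable[Pair]) -> Set[str]:
    """Return the failed resource plus everything colocated with it, transitively."""
    dependents: Dict[str, List[str]] = defaultdict(list)
    for dependent, target in pairs:
        dependents[target].append(dependent)

    affected = {failed}
    queue = deque([failed])
    while queue:  # breadth-first walk from the failed resource to its dependents
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    sequential = [("app2", "app1"), ("app3", "app2"), ("app4", "app3")]
    unordered = [("app2", "app1"), ("app3", "app1"), ("app4", "app1")]
    for label, pairs in [("sequential", sequential), ("unordered", unordered)]:
        for failed in ("app1", "app2"):
            print(label, failed, "->", sorted(restarted_by(failed, pairs)))
```

In the chained form, app3 and app4 sit downstream of app2, so they are swept up by its failure. In the bracketed form, they depend only on app1.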
