We support many deployment scenarios, from the simplest
2-node standby cluster to a 32-node active/active configuration.
We can also dramatically reduce hardware costs by allowing
several active/passive clusters to be combined and share a common
We monitor the system for both hardware and software failures.
In the event of a failure, we will automatically recover
your application and make sure it is available from one
of the remaining machines in the cluster.
After a failure, we use advanced algorithms to quickly
determine the optimum locations for services based on
relative node preferences and/or requirements to run with
other cluster services (we call these "constraints").
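
The placement step described above can be illustrated with a toy model. This is a minimal sketch, not Pacemaker's actual algorithm: the function name `place_services`, the preference-score dictionary, and the "must run with" mapping are all hypothetical, invented here only to show how node preferences and constraints might combine.

```python
def place_services(services, nodes, preferences, colocate_with=None):
    """Assign each service to an online node (toy model, hypothetical API).

    services:      service names, in the order they should be placed
    nodes:         names of the nodes that are still online
    preferences:   {(service, node): score}, higher is better, missing means 0
    colocate_with: {service: other_service}, meaning "run on the same node"
    """
    if not nodes:
        raise ValueError("no online nodes to place services on")
    colocate_with = colocate_with or {}
    placement = {}
    for service in services:
        partner = colocate_with.get(service)
        if partner in placement:
            # Constraint: follow the service we are required to run with.
            placement[service] = placement[partner]
            continue
        # Otherwise pick the online node with the highest preference score.
        # Sorting first makes ties resolve to the alphabetically first node.
        placement[service] = max(
            sorted(nodes), key=lambda n: preferences.get((service, n), 0)
        )
    return placement


# Example: node1 has failed, so only node2 and node3 are candidates.
prefs = {("db", "node1"): 100, ("db", "node2"): 50, ("web", "node3"): 10}
print(place_services(["db", "web"], ["node2", "node3"], prefs, {"web": "db"}))
# prints {'db': 'node2', 'web': 'node2'}
```

In this example the database falls back to its second-choice node, and the web service follows it because of the constraint, even though it would otherwise prefer node3.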
At its core, a cluster is a distributed finite state
machine capable of co-ordinating the startup and recovery
of inter-related services across a set of machines.
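
One small piece of that co-ordination, starting inter-related services in a valid order, can be sketched as a dependency sort. This is an illustrative assumption about what "inter-related" might mean in practice, not a description of Pacemaker internals; the service names and the `start_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(depends_on):
    """Return a start order in which every service follows its dependencies.

    depends_on: {service: set of services that must be started first}
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical stack: the web server needs the database and a floating IP,
# and the database needs its storage to be available first.
stack = {"web": {"db", "ip"}, "db": {"storage"}}
order = start_order(stack)
print(order)                  # one valid start order
print(list(reversed(order)))  # a matching stop order
```

Several orders can be valid for the same stack, so the sketch only guarantees that each service appears after everything it depends on.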
System HA is possible without a cluster manager, but you save many headaches using one anyway
Even a distributed and/or replicated application that is
able to survive the failure of one or more components can
benefit from a higher-level cluster:
- awareness of other applications in the stack
- a shared quorum implementation and calculation
- data integrity through fencing (a non-responsive process may still be running and modifying data)
- automated recovery of instances to ensure capacity
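
The quorum item in the list above can be made concrete with a small sketch. It assumes the common majority rule, under which a partition has quorum only when it holds more than half of the expected votes. The function name `has_quorum` and the one-vote-per-node examples are illustrative assumptions, not the Corosync API.

```python
def has_quorum(votes_present, expected_votes):
    """Return True if this partition holds a strict majority of the votes."""
    if expected_votes <= 0:
        raise ValueError("expected_votes must be positive")
    return 2 * votes_present > expected_votes


# A 5-node cluster that splits 3/2: only the larger side may keep running.
print(has_quorum(3, 5))  # True
print(has_quorum(2, 5))  # False

# A 2-node cluster that loses one node has no majority under this rule.
print(has_quorum(1, 2))  # False
```

The last case is one reason two-node clusters typically need special handling, and why fencing matters alongside quorum.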
While SYS-V init replacements like systemd can provide
deterministic recovery of a complex stack of services, the
recovery is limited to one machine and lacks the context
of what is happening on other machines - context that is
crucial to determine the difference between a local
failure, clean startup or recovery after a total site failure.
"The definitive open-source high-availability stack for the Linux
platform builds upon the Pacemaker cluster resource manager."
-- LINUX Journal,
of the Pack: the Pacemaker High-Availability Stack"
A Pacemaker stack is built on five core components:
- libQB - Core services (logging, IPC, etc.)
- Corosync - Membership, messaging and quorum
- Resource agents - A collection of scripts that interact with the underlying services managed by the cluster
- Fencing agents - A collection of scripts that interact with network power switches and SAN devices to isolate cluster members
- Pacemaker itself
We describe each of these in more detail as well as other optional components such as CLIs and GUIs.
Pacemaker has been around since 2004
and is primarily a collaborative effort
between Red Hat and SUSE; however, we also
receive considerable help and support from the folks
at LinBit and the community in general.
"Pacemaker cluster stack is the state-of-the-art high availability
and load balancing stack for the Linux platform."
Corosync also began life in 2004,
but at that time it was part of the OpenAIS project.
It is primarily a Red Hat initiative; however, we also
receive considerable help and support from the folks
in the community.
The core ClusterLabs team is made up of full-time
developers from Australia, Austria, Canada, China, Czech
Republic, England, Germany, Sweden and the USA. Contributions to
the code or documentation are always welcome.
The ClusterLabs stack ships with most modern enterprise
distributions and has been deployed in many critical
environments including Deutsche Flugsicherung GmbH,
which uses Pacemaker to ensure its air
traffic control systems are always available.