11. Collective Resources
Pacemaker supports several types of collective resources, which consist of multiple, related resource instances.
11.1. Groups - A Syntactic Shortcut
One of the most common elements of a cluster is a set of resources that need to be located together, start sequentially, and stop in the reverse order. To simplify this configuration, we support the concept of groups.
A group of two primitive resources
<group id="shortcut">
<primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
<instance_attributes id="params-public-ip">
<nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
</instance_attributes>
</primitive>
<primitive id="Email" class="lsb" type="exim"/>
</group>
Although the example above contains only two resources, there is no limit to the number of resources a group can contain. The example is also sufficient to explain the fundamental properties of a group:
Resources are started in the order in which they appear (Public-IP first, then Email).
Resources are stopped in the reverse of the order in which they appear (Email first, then Public-IP).
If a resource in the group can’t run anywhere, then nothing after it is allowed to run either.
If Public-IP can’t run anywhere, neither can Email;
but if Email can’t run anywhere, this does not affect Public-IP in any way.
The group above is logically equivalent to writing:
How the cluster sees a group resource
<configuration>
<resources>
<primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
<instance_attributes id="params-public-ip">
<nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
</instance_attributes>
</primitive>
<primitive id="Email" class="lsb" type="exim"/>
</resources>
<constraints>
<rsc_colocation id="xxx" rsc="Email" with-rsc="Public-IP" score="INFINITY"/>
<rsc_order id="yyy" first="Public-IP" then="Email"/>
</constraints>
</configuration>
As a group grows, the reduction in configuration effort can become significant.
Another (typical) example of a group is a DRBD volume, the filesystem mount, an IP address, and an application that uses them.
11.1.1. Group Properties
Field | Description |
---|---|
id | A unique name for the group |
description | An optional description of the group, for the user’s own purposes. |
11.1.2. Group Options
Groups inherit the priority, target-role, and is-managed properties from primitive resources. See Resource Options for information about those properties.
11.1.3. Group Instance Attributes
Groups have no instance attributes. However, any that are set for the group object will be inherited by the group’s children.
11.1.4. Group Contents
Groups may only contain a collection of cluster resources (see Resource Properties). To refer to a child of a group resource, just use the child’s id instead of the group’s.
11.1.5. Group Constraints
Although it is possible to reference a group’s children in constraints, it is usually preferable to reference the group itself.
Some constraints involving groups
<constraints>
<rsc_location id="group-prefers-node1" rsc="shortcut" node="node1" score="500"/>
<rsc_colocation id="webserver-with-group" rsc="Webserver" with-rsc="shortcut"/>
<rsc_order id="start-group-then-webserver" first="shortcut" then="Webserver"/>
</constraints>
11.1.6. Group Stickiness
Stickiness, the measure of how much a resource wants to stay where it is, is additive in groups. Every active resource of the group will contribute its stickiness value to the group’s total. So if the default resource-stickiness is 100, and a group has seven members, five of which are active, then the group as a whole will prefer its current location with a score of 500.
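As a rough illustration of the arithmetic above, the fragment below sets a cluster-wide default stickiness of 100 in rsc_defaults. The ids are hypothetical and chosen only for this sketch. With five of a group's seven members active, the group's effective stickiness would be 5 × 100 = 500.

```xml
<rsc_defaults>
  <!-- Hypothetical ids; only the name/value pair matters here -->
  <meta_attributes id="rsc-defaults-meta">
    <!-- Each active group member contributes this value to the group's total -->
    <nvpair id="rsc-defaults-stickiness" name="resource-stickiness" value="100"/>
  </meta_attributes>
</rsc_defaults>
```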
11.2. Clones - Resources That Can Have Multiple Active Instances
Clone resources are resources that can have more than one copy active at the same time. This allows you, for example, to run a copy of a daemon on every node. You can clone any primitive or group resource.
11.2.1. Anonymous versus Unique Clones
A clone resource is configured to be either anonymous or globally unique.
Anonymous clones are the simplest. These behave completely identically everywhere they are running. Because of this, there can be only one instance of an anonymous clone active per node.
The instances of globally unique clones are distinct entities. All instances are launched identically, but one instance of the clone is not identical to any other instance, whether running on the same node or a different node. As an example, a cloned IP address can use special kernel functionality such that each instance handles a subset of requests for the same IP address.
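A minimal sketch of the cloned IP address case might look like the following. It assumes the ocf:heartbeat:IPaddr2 agent with its clusterip_hash parameter is available in your resource-agents version; the ids and the address are hypothetical. The globally-unique, clone-max, and clone-node-max meta-attributes are described under Clone Options.

```xml
<clone id="cluster-ip-clone">
  <meta_attributes id="cluster-ip-clone-meta">
    <!-- Each instance is a distinct entity handling part of the traffic -->
    <nvpair id="cluster-ip-clone-unique" name="globally-unique" value="true"/>
    <!-- Two instances in total; both may run on one node if the other is down -->
    <nvpair id="cluster-ip-clone-max" name="clone-max" value="2"/>
    <nvpair id="cluster-ip-clone-node-max" name="clone-node-max" value="2"/>
  </meta_attributes>
  <primitive id="cluster-ip" class="ocf" type="IPaddr2" provider="heartbeat">
    <instance_attributes id="cluster-ip-params">
      <nvpair id="cluster-ip-addr" name="ip" value="192.0.2.10"/>
      <!-- Assumed agent parameter: how requests are split among instances -->
      <nvpair id="cluster-ip-hash" name="clusterip_hash" value="sourceip"/>
    </instance_attributes>
  </primitive>
</clone>
```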
11.2.2. Promotable clones
If a clone is promotable, its instances can perform a special role that Pacemaker will manage via the promote and demote actions of the resource agent.
Services that support such a special role have various terms for the special role and the default role: primary and secondary, master and replica, controller and worker, etc. Pacemaker uses the terms promoted and unpromoted to be agnostic to what the service calls them or what they do.
All that Pacemaker cares about is that an instance comes up in the unpromoted role when started, and the resource agent supports the promote and demote actions to manage entering and exiting the promoted role.
11.2.3. Clone Properties
Field | Description |
---|---|
id | A unique name for the clone |
description | An optional description of the clone, for the user’s own purposes. |
11.2.4. Clone Options
Options inherited from primitive resources:
priority, target-role, is-managed
Field | Default | Description |
---|---|---|
globally-unique | false | If true, each clone instance performs a distinct function |
clone-max | 0 | The maximum number of clone instances that can be started across the entire cluster. If 0, the number of nodes in the cluster will be used. |
clone-node-max | 1 | If |
clone-min | 0 | Require at least this number of clone instances to be runnable before allowing resources depending on the clone to be runnable. A value of 0 means require all clone instances to be runnable. |
notify | false | Call the resource agent’s notify action for all active instances, before and after starting or stopping any clone instance. The resource agent must support this action. Allowed values: false, true |
ordered | false | If true, clone instances must be started sequentially instead of in parallel. Allowed values: false, true |
interleave | false | When this clone is ordered relative to another clone, if this option is false (the default), the ordering is relative to all instances of the other clone, whereas if this option is true, the ordering is relative only to instances on the same node. Allowed values: false, true |
promotable | false | If true, clone instances can perform a special role that Pacemaker will manage via the resource agent’s promote and demote actions. The resource agent must support these actions. Allowed values: false, true |
promoted-max | 1 | If |
promoted-node-max | 1 | If |
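Clone options are set as meta-attributes of the clone element. The following sketch, with hypothetical ids, limits a clone to two instances cluster-wide and enables interleaving so that orderings against other clones are evaluated per node.

```xml
<clone id="apache-clone">
  <meta_attributes id="apache-clone-meta">
    <!-- Start at most two instances across the whole cluster -->
    <nvpair id="apache-clone-max" name="clone-max" value="2"/>
    <!-- Order relative to the other clone's instance on the same node only -->
    <nvpair id="apache-clone-interleave" name="interleave" value="true"/>
  </meta_attributes>
  <primitive id="apache" class="lsb" type="apache"/>
</clone>
```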
Note
Deprecated Terminology
In older documentation and online examples, you may see promotable clones referred to as multi-state, stateful, or master/slave; these mean the same thing as promotable. Certain syntax is supported for backward compatibility, but is deprecated and will be removed in a future version:
Using a master tag, instead of a clone tag with the promotable meta-attribute set to true
Using the master-max meta-attribute instead of promoted-max
Using the master-node-max meta-attribute instead of promoted-node-max
Using Master as a role name instead of Promoted
Using Slave as a role name instead of Unpromoted
11.2.5. Clone Contents
Clones must contain exactly one primitive or group resource.
A clone that runs a web server on all nodes
<clone id="apache-clone">
<primitive id="apache" class="lsb" type="apache">
<operations>
<op id="apache-monitor" name="monitor" interval="30"/>
</operations>
</primitive>
</clone>
Warning
You should never reference the name of a clone’s child (the primitive or group resource being cloned). If you think you need to do this, you probably need to re-evaluate your design.
11.2.6. Clone Instance Attributes
Clones have no instance attributes; however, any that are set here will be inherited by the clone’s child.
11.2.7. Clone Constraints
In most cases, a clone will have a single instance on each active cluster node. If this is not the case, you can indicate which nodes the cluster should preferentially assign copies to with resource location constraints. These constraints are written no differently from those for primitive resources except that the clone’s id is used.
Some constraints involving clones
<constraints>
<rsc_location id="clone-prefers-node1" rsc="apache-clone" node="node1" score="500"/>
<rsc_colocation id="stats-with-clone" rsc="apache-stats" with-rsc="apache-clone"/>
<rsc_order id="start-clone-then-stats" first="apache-clone" then="apache-stats"/>
</constraints>
Ordering constraints behave slightly differently for clones. In the example above, apache-stats will wait until all copies of apache-clone that need to be started have done so before being started itself. Only if no copies can be started will apache-stats be prevented from being active. Additionally, the clone will wait for apache-stats to be stopped before stopping itself.
Colocation of a primitive or group resource with a clone means that the resource can run on any node with an active instance of the clone. The cluster will choose an instance based on where the clone is running and the resource’s own location preferences.
Colocation between clones is also possible. If one clone A is colocated with another clone B, the set of allowed locations for A is limited to nodes on which B is (or will be) active. Placement is then performed normally.
11.2.7.1. Promotable Clone Constraints
For promotable clone resources, the first-action and/or then-action fields for ordering constraints may be set to promote or demote to constrain the promoted role, and colocation constraints may contain rsc-role and/or with-rsc-role fields.
Constraints involving promotable clone resources
<constraints>
<rsc_location id="db-prefers-node1" rsc="database" node="node1" score="500"/>
<rsc_colocation id="backup-with-db-unpromoted" rsc="backup"
with-rsc="database" with-rsc-role="Unpromoted"/>
<rsc_colocation id="myapp-with-db-promoted" rsc="myApp"
with-rsc="database" with-rsc-role="Promoted"/>
<rsc_order id="start-db-before-backup" first="database" then="backup"/>
<rsc_order id="promote-db-then-app" first="database" first-action="promote"
then="myApp" then-action="start"/>
</constraints>
In the example above, myApp will wait until one of the database copies has been started and promoted before being started itself on the same node. Only if no copies can be promoted will myApp be prevented from being active. Additionally, the cluster will wait for myApp to be stopped before demoting the database.
Colocation of a primitive or group resource with a promotable clone resource means that it can run on any node with an active instance of the promotable clone resource that has the specified role (Promoted or Unpromoted). In the example above, the cluster will choose a location based on where database is running in the promoted role, and if there are multiple promoted instances it will also factor in myApp’s own location preferences when deciding which location to choose.
Colocation with regular clones and other promotable clone resources is also possible. In such cases, the set of allowed locations for the rsc clone is (after role filtering) limited to nodes on which the with-rsc promotable clone resource is (or will be) in the specified role. Placement is then performed as normal.
11.2.7.2. Using Promotable Clone Resources in Colocation Sets
When a promotable clone is used in a resource set inside a colocation constraint, the resource set may take a role attribute.
In the following example, an instance of B may be promoted only on a node where A is in the promoted role. Additionally, resources C and D must be located on a node where both A and B are promoted.
Colocate C and D with A’s and B’s promoted instances
<constraints>
<rsc_colocation id="coloc-1" score="INFINITY" >
<resource_set id="colocated-set-example-1" sequential="true" role="Promoted">
<resource_ref id="A"/>
<resource_ref id="B"/>
</resource_set>
<resource_set id="colocated-set-example-2" sequential="true">
<resource_ref id="C"/>
<resource_ref id="D"/>
</resource_set>
</rsc_colocation>
</constraints>
11.2.7.3. Using Promotable Clone Resources in Ordered Sets
When a promotable clone is used in a resource set inside an ordering constraint, the resource set may take an action attribute.
Start C and D after first promoting A and B
<constraints>
<rsc_order id="order-1" score="INFINITY" >
<resource_set id="ordered-set-1" sequential="true" action="promote">
<resource_ref id="A"/>
<resource_ref id="B"/>
</resource_set>
<resource_set id="ordered-set-2" sequential="true" action="start">
<resource_ref id="C"/>
<resource_ref id="D"/>
</resource_set>
</rsc_order>
</constraints>
In the above example, B cannot be promoted until A has been promoted. Additionally, resources C and D must wait until A and B have been promoted before they can start.
11.2.8. Clone Stickiness
To achieve stable assignments, clones are slightly sticky by default. If no value for resource-stickiness is provided, the clone will use a value of 1. Being a small value, it causes minimal disturbance to the score calculations of other resources but is enough to prevent Pacemaker from needlessly moving instances around the cluster.
Note
For globally unique clones, this may result in multiple instances of the clone staying on a single node, even after another eligible node becomes active (for example, after being put into standby mode then made active again). If you do not want this behavior, specify a resource-stickiness of 0 for the clone temporarily and let the cluster adjust, then set it back to 1 if you want the default behavior to apply again.
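The temporary override described in this note could be expressed as a meta-attribute on the clone itself. This is only a sketch with hypothetical ids; once the cluster has rebalanced, the value would be changed back to 1 (or the nvpair removed) to restore the default behavior.

```xml
<clone id="cluster-ip-clone">
  <meta_attributes id="cluster-ip-clone-stickiness-meta">
    <!-- Temporary: let instances move to a newly eligible node -->
    <nvpair id="cluster-ip-clone-stickiness" name="resource-stickiness" value="0"/>
  </meta_attributes>
  <!-- Hypothetical child resource; any cloned primitive or group works the same way -->
  <primitive id="cluster-ip" class="ocf" type="IPaddr2" provider="heartbeat"/>
</clone>
```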
Important
If resource-stickiness is set in the rsc_defaults section, it will apply to clone instances as well. This means an explicit resource-stickiness of 0 in rsc_defaults works differently from the implicit default used when resource-stickiness is not specified.
11.2.9. Monitoring Promotable Clone Resources
The usual monitor actions are insufficient to monitor a promotable clone resource, because Pacemaker needs to verify not only that the resource is active, but also that its actual role matches its intended one.
Define two monitoring actions: the usual one will cover the unpromoted role, and an additional one with role="Promoted" will cover the promoted role.
Monitoring both states of a promotable clone resource
<clone id="myPromotableRsc">
<meta_attributes id="myPromotableRsc-meta">
<nvpair id="myPromotableRsc-meta-promotable" name="promotable" value="true"/>
</meta_attributes>
<primitive id="myRsc" class="ocf" type="myApp" provider="myCorp">
<operations>
<op id="public-ip-unpromoted-check" name="monitor" interval="60"/>
<op id="public-ip-promoted-check" name="monitor" interval="61" role="Promoted"/>
</operations>
</primitive>
</clone>
Important
It is crucial that every monitor operation has a different interval! Pacemaker currently differentiates between operations only by resource and interval; so if (for example) a promotable clone resource had the same monitor interval for both roles, Pacemaker would ignore the role when checking the status – which would cause unexpected return codes, and therefore unnecessary complications.
11.2.10. Determining Which Instance is Promoted
Pacemaker can choose a promotable clone instance to be promoted in one of two ways:
Promotion scores: These are node attributes set via the crm_attribute command using the --promotion option, which generally would be called by the resource agent’s start action if it supports promotable clones. This tool automatically detects both the resource and host, and should be used to set a preference for being promoted. Based on this, promoted-max, and promoted-node-max, the instance(s) with the highest preference will be promoted.
Constraints: Location constraints can indicate which nodes are most preferred to be promoted.
Explicitly preferring node1 to be promoted
<rsc_location id="promoted-location" rsc="myPromotableRsc">
<rule id="promoted-rule" score="100" role="Promoted">
<expression id="promoted-exp" attribute="#uname" operation="eq" value="node1"/>
</rule>
</rsc_location>
11.3. Bundles - Containerized Resources
Pacemaker supports a special syntax for launching a service inside a container [https://en.wikipedia.org/wiki/Operating-system-level_virtualization] with any infrastructure it requires: the bundle.
Pacemaker bundles support Docker [https://www.docker.com/], podman [https://podman.io/] (since 2.0.1), and rkt [https://coreos.com/rkt/] container technologies.
A bundle for a containerized web server
<bundle id="httpd-bundle">
<podman image="pcmk:http" replicas="3"/>
<network ip-range-start="192.168.122.131"
host-netmask="24"
host-interface="eth0">
<port-mapping id="httpd-port" port="80"/>
</network>
<storage>
<storage-mapping id="httpd-syslog"
source-dir="/dev/log"
target-dir="/dev/log"
options="rw"/>
<storage-mapping id="httpd-root"
source-dir="/srv/html"
target-dir="/var/www/html"
options="rw,Z"/>
<storage-mapping id="httpd-logs"
source-dir-root="/var/log/pacemaker/bundles"
target-dir="/etc/httpd/logs"
options="rw,Z"/>
</storage>
<primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
</bundle>
11.3.1. Bundle Prerequisites
Before configuring a bundle in Pacemaker, the user must install the appropriate container launch technology (Docker, podman, or rkt), and supply a fully configured container image, on every node allowed to run the bundle.
Pacemaker will create an implicit resource of type ocf:heartbeat:docker, ocf:heartbeat:podman, or ocf:heartbeat:rkt to manage a bundle’s container. The user must ensure that the appropriate resource agent is installed on every node allowed to run the bundle.
11.3.2. Bundle Properties
Field | Description |
---|---|
id | A unique name for the bundle (required) |
description | An optional description of the bundle, for the user’s own purposes. |
A bundle must contain exactly one docker, podman, or rkt element.
11.3.3. Bundle Container Properties
Attribute | Default | Description |
---|---|---|
image | | Container image tag (required) |
replicas | Value of | A positive integer specifying the number of container instances to launch |
replicas-per-host | 1 | A positive integer specifying the number of container instances allowed to run on a single node |
promoted-max | 0 | A non-negative integer that, if positive, indicates that the containerized service should be treated as a promotable service, with this many replicas allowed to run the service in the promoted role |
network | | If specified, this will be passed to the |
run-command | | This command will be run inside the container when launching it (“PID 1”). If the bundle contains a primitive, this command must start |
options | | Extra command-line options to pass to the |
Note
Considerations when using cluster configurations or container images from Pacemaker 1.1:
If the container image has a pre-2.0.0 version of Pacemaker, set run-command to /usr/sbin/pacemaker_remoted (note the underbar instead of dash).
masters is accepted as an alias for promoted-max, but is deprecated since 2.0.0, and support for it will be removed in a future version.
11.3.4. Bundle Network Properties
A bundle may optionally contain one <network> element.
Attribute | Default | Description |
---|---|---|
add-host | TRUE | If TRUE, and |
ip-range-start | | If specified, Pacemaker will create an implicit |
host-netmask | 32 | If |
host-interface | | If |
control-port | 3121 | If the bundle contains a |
Note
Replicas are named by the bundle id plus a dash and an integer counter starting with zero. For example, if a bundle named httpd-bundle has replicas=2, its containers will be named httpd-bundle-0 and httpd-bundle-1.
Additionally, a network element may optionally contain one or more port-mapping elements.
Attribute | Default | Description |
---|---|---|
id | | A unique name for the port mapping (required) |
port | | If this is specified, connections to this TCP port number on the host network (on the container’s assigned IP address, if |
internal-port | value of | If |
range | | If this is specified, connections to these TCP port numbers (expressed as first_port-last_port) on the host network (on the container’s assigned IP address, if |
Note
If the bundle contains a primitive, Pacemaker will automatically map the control-port, so it is not necessary to specify that port in a port-mapping.
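The following sketch shows how the port-mapping attributes might be combined. It assumes that port names the port on the host network, that internal-port names the port inside the container, and that range forwards a contiguous set of ports unchanged; the ids, addresses, and port numbers are hypothetical.

```xml
<network ip-range-start="192.168.122.131" host-netmask="24" host-interface="eth0">
  <!-- Connections to port 8080 on the container's assigned IP reach port 80 inside it -->
  <port-mapping id="httpd-port" port="8080" internal-port="80"/>
  <!-- A range, written as first_port-last_port, instead of a single port -->
  <port-mapping id="httpd-extra-ports" range="9000-9010"/>
</network>
```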
11.3.5. Bundle Storage Properties
A bundle may optionally contain one storage element. A storage element has no properties of its own, but may contain one or more storage-mapping elements.
Attribute | Default | Description |
---|---|---|
id | | A unique name for the storage mapping (required) |
source-dir | | The absolute path on the host’s filesystem that will be mapped into the container. Exactly one of |
source-dir-root | | The start of a path on the host’s filesystem that will be mapped into the container, using a different subdirectory on the host for each container instance. The subdirectory will be named the same as the replica name. Exactly one of |
target-dir | | The path name within the container where the host storage will be mapped (required) |
options | | A comma-separated list of file system mount options to use when mapping the storage |
Note
Pacemaker does not define the behavior if the source directory does not already exist on the host. However, it is expected that the container technology and/or its resource agent will create the source directory in that case.
Note
If the bundle contains a primitive, Pacemaker will automatically map the equivalent of source-dir=/etc/pacemaker/authkey target-dir=/etc/pacemaker/authkey and source-dir-root=/var/log/pacemaker/bundles target-dir=/var/log into the container, so it is not necessary to specify those paths in a storage-mapping.
Important
The PCMK_authkey_location environment variable must not be set to anything other than the default of /etc/pacemaker/authkey on any node in the cluster.
Important
If SELinux is used in enforcing mode on the host, you must ensure the container is allowed to use any storage you mount into it. For Docker and podman bundles, adding “Z” to the mount options will create a container-specific label for the mount that allows the container access.
11.3.6. Bundle Primitive
A bundle may optionally contain one primitive resource. The primitive may have operations, instance attributes, and meta-attributes defined, as usual.
If a bundle contains a primitive resource, the container image must include the Pacemaker Remote daemon, and at least one of ip-range-start or control-port must be configured in the bundle. Pacemaker will create an implicit ocf:pacemaker:remote resource for the connection, launch Pacemaker Remote within the container, and monitor and manage the primitive resource via Pacemaker Remote.
If the bundle has more than one container instance (replica), the primitive resource will function as an implicit clone – a promotable clone if the bundle has promoted-max greater than zero.
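As a sketch of the promotable case, the bundle below runs three replicas and allows one of them to be promoted. The image name, ids, addresses, and the ocf:myCorp:myApp agent are hypothetical placeholders; the two monitor operations use different intervals, one per role.

```xml
<bundle id="db-bundle">
  <!-- promoted-max greater than zero makes the primitive an implicit promotable clone -->
  <podman image="pcmk:db" replicas="3" promoted-max="1"/>
  <network ip-range-start="192.168.122.140" host-netmask="24" host-interface="eth0"/>
  <primitive id="db" class="ocf" provider="myCorp" type="myApp">
    <operations>
      <op id="db-unpromoted-check" name="monitor" interval="60"/>
      <op id="db-promoted-check" name="monitor" interval="61" role="Promoted"/>
    </operations>
  </primitive>
</bundle>
```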
Note
If you want to pass environment variables to a bundle’s Pacemaker Remote connection or primitive, you have two options:
Environment variables whose value is the same regardless of the underlying host may be set using the container element’s options attribute.
If you want variables to have host-specific values, you can use the storage-mapping element to map a file on the host as /etc/pacemaker/pcmk-init.env in the container (since 2.0.3). Pacemaker Remote will parse this file as a shell-like format, with variables set as NAME=VALUE, ignoring blank lines and comments starting with “#”.
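Both options might be combined as in the sketch below. It assumes the container launcher accepts -e NAME=VALUE to set an environment variable (as docker run and podman run do); the variable names, the host-side file path, and the ids are hypothetical.

```xml
<bundle id="httpd-bundle">
  <!-- Same value on every host: passed straight to the container launcher -->
  <podman image="pcmk:http" replicas="3" options="-e APP_MODE=production"/>
  <network ip-range-start="192.168.122.131" host-netmask="24" host-interface="eth0"/>
  <storage>
    <!-- Host-specific values: each host keeps its own copy of this file, e.g.
           # comment lines and blank lines are ignored
           APP_SITE=rack-a
    -->
    <storage-mapping id="httpd-env"
                     source-dir="/etc/pacemaker/httpd-bundle.env"
                     target-dir="/etc/pacemaker/pcmk-init.env"
                     options="ro"/>
  </storage>
  <primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
</bundle>
```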
Important
When a bundle has a primitive, Pacemaker on all cluster nodes must be able to contact Pacemaker Remote inside the bundle’s containers.
The containers must have an accessible network (for example, network should not be set to “none” with a primitive).
The default, using a distinct network space inside the container, works in combination with ip-range-start. Any firewall must allow access from all cluster nodes to the control-port on the container IPs.
If the container shares the host’s network space (for example, by setting network to “host”), a unique control-port should be specified for each bundle. Any firewall must allow access from all cluster nodes to the control-port on all cluster and remote node IPs.
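For the shared network space case, a bundle might be configured roughly as follows. The port number 3122 is an arbitrary choice for this sketch; what matters is that it differs from the control-port of every other bundle and that the firewall admits it.

```xml
<bundle id="httpd-bundle">
  <!-- The container shares the host's network space -->
  <podman image="pcmk:http" replicas="2" network="host"/>
  <!-- No ip-range-start, so a control-port unique to this bundle is configured -->
  <network control-port="3122"/>
  <primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
</bundle>
```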
11.3.7. Bundle Node Attributes
If the bundle has a primitive, the primitive’s resource agent may want to set node attributes such as promotion scores. However, with containers, it is not apparent which node should get the attribute.
If the container uses shared storage that is the same no matter which node the container is hosted on, then it is appropriate to use the promotion score on the bundle node itself.
On the other hand, if the container uses storage exported from the underlying host, then it may be more appropriate to use the promotion score on the underlying host.
Since this depends on the particular situation, the container-attribute-target resource meta-attribute allows the user to specify which approach to use. If it is set to host, then user-defined node attributes will be checked on the underlying host. If it is anything else, the local node (in this case the bundle node) is used as usual.
This only applies to user-defined attributes; the cluster will always check the local node for cluster-defined attributes such as #uname.
If container-attribute-target is host, the cluster will pass additional environment variables to the primitive’s resource agent that allow it to set node attributes appropriately: CRM_meta_container_attribute_target (identical to the meta-attribute value) and CRM_meta_physical_host (the name of the underlying host).
Note
When called by a resource agent, the attrd_updater and crm_attribute commands will automatically check those environment variables and set attributes appropriately.
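A sketch of a bundle that keeps promotion scores on the underlying host might look like this. It sets the meta-attribute on the bundle so that the primitive inherits it; the image name, ids, source directory, and the ocf:myCorp:myApp agent are hypothetical.

```xml
<bundle id="db-bundle">
  <meta_attributes id="db-bundle-meta">
    <!-- User-defined node attributes are checked on the host, not the bundle node -->
    <nvpair id="db-bundle-attr-target" name="container-attribute-target" value="host"/>
  </meta_attributes>
  <podman image="pcmk:db" replicas="3" promoted-max="1"/>
  <network ip-range-start="192.168.122.140" host-netmask="24" host-interface="eth0"/>
  <storage>
    <!-- Storage exported from the host is the case where "host" is appropriate -->
    <storage-mapping id="db-data" source-dir="/srv/db" target-dir="/var/lib/db" options="rw,Z"/>
  </storage>
  <primitive id="db" class="ocf" provider="myCorp" type="myApp"/>
</bundle>
```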
11.3.8. Bundle Meta-Attributes
Any meta-attribute set on a bundle will be inherited by the bundle’s primitive and any resources implicitly created by Pacemaker for the bundle.
This includes options such as priority, target-role, and is-managed. See Resource Options for more information.
Bundles support clone meta-attributes including notify, ordered, and interleave.
11.3.9. Limitations of Bundles
Restarting Pacemaker while a bundle is unmanaged or the cluster is in maintenance mode may cause the bundle to fail.
Bundles may not be explicitly cloned or included in groups. This includes the bundle’s primitive and any resources implicitly created by Pacemaker for the bundle. (If replicas is greater than 1, the bundle will behave like a clone implicitly.)
Bundles do not have instance attributes, utilization attributes, or operations, though a bundle’s primitive may have them.
A bundle with a primitive can run on a Pacemaker Remote node only if the bundle uses a distinct control-port.