11. Collective Resources

Pacemaker supports several types of collective resources, which consist of multiple, related resource instances.

11.1. Groups - A Syntactic Shortcut

One of the most common elements of a cluster is a set of resources that need to be located together, start sequentially, and stop in the reverse order. To simplify this configuration, Pacemaker supports the concept of groups.

A group of two primitive resources

<group id="shortcut">
   <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
    <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
    </instance_attributes>
   </primitive>
   <primitive id="Email" class="lsb" type="exim"/>
</group>

Although the example above contains only two resources, there is no limit to the number of resources a group can contain. The example is also sufficient to explain the fundamental properties of a group:

  • Resources are started in the order in which they appear (Public-IP first, then Email).
  • Resources are stopped in the reverse of that order (Email first, then Public-IP).

If a resource in the group can’t run anywhere, then nothing after it in the group is allowed to run either.

  • If Public-IP can’t run anywhere, neither can Email.
  • If Email can’t run anywhere, however, this does not affect Public-IP in any way.

The group above is logically equivalent to writing:

How the cluster sees a group resource

<configuration>
   <resources>
    <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
     <instance_attributes id="params-public-ip">
        <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
     </instance_attributes>
    </primitive>
    <primitive id="Email" class="lsb" type="exim"/>
   </resources>
   <constraints>
      <rsc_colocation id="xxx" rsc="Email" with-rsc="Public-IP" score="INFINITY"/>
      <rsc_order id="yyy" first="Public-IP" then="Email"/>
   </constraints>
</configuration>

As a group grows, the savings in configuration effort can become significant: each additional member would otherwise need its own colocation and ordering constraints.

Another (typical) example of a group is a DRBD volume, the filesystem mount, an IP address, and an application that uses them.

11.1.1. Group Properties

Properties of a Group Resource
Field Description
id

A unique name for the group

description

An optional description of the group, for the user’s own purposes. E.g. resources needed for website

11.1.2. Group Options

Groups inherit the priority, target-role, and is-managed properties from primitive resources. See Resource Options for information about those properties.
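
As an illustration, these options can be set as meta-attributes on the group itself. The following is a minimal sketch that reuses the shortcut group from the earlier example; the meta_attributes and nvpair ids, and the chosen values, are assumptions made for illustration.

```xml
<group id="shortcut">
   <!-- Hypothetical ids; options set here apply to the group as a whole -->
   <meta_attributes id="shortcut-meta">
      <nvpair id="shortcut-target-role" name="target-role" value="Stopped"/>
      <nvpair id="shortcut-priority" name="priority" value="10"/>
   </meta_attributes>
   <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
    <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
    </instance_attributes>
   </primitive>
   <primitive id="Email" class="lsb" type="exim"/>
</group>
```

With target-role set to Stopped, the cluster would be expected to keep both members stopped until the value is changed or removed.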

11.1.3. Group Instance Attributes

Groups have no instance attributes. However, any that are set for the group object will be inherited by the group’s children.

11.1.4. Group Contents

Groups may only contain a collection of cluster resources (see Resource Properties). To refer to a child of a group resource, just use the child’s id instead of the group’s.

11.1.5. Group Constraints

Although it is possible to reference a group’s children in constraints, it is usually preferable to reference the group itself.

Some constraints involving groups

<constraints>
    <rsc_location id="group-prefers-node1" rsc="shortcut" node="node1" score="500"/>
    <rsc_colocation id="webserver-with-group" rsc="Webserver" with-rsc="shortcut"/>
    <rsc_order id="start-group-then-webserver" first="shortcut" then="Webserver"/>
</constraints>

11.1.6. Group Stickiness

Stickiness, the measure of how much a resource wants to stay where it is, is additive in groups. Every active resource of the group will contribute its stickiness value to the group’s total. So if the default resource-stickiness is 100, and a group has seven members, five of which are active, then the group as a whole will prefer its current location with a score of 500.
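
The default stickiness assumed in that calculation could be configured as sketched below. The ids are hypothetical; only the resource-stickiness name and the value of 100 come from the text above.

```xml
<rsc_defaults>
   <!-- Hypothetical ids; sets the default stickiness for all resources -->
   <meta_attributes id="rsc-defaults-meta">
      <nvpair id="rsc-defaults-stickiness" name="resource-stickiness" value="100"/>
   </meta_attributes>
</rsc_defaults>
<!-- A group with 7 members, 5 of them active: 5 * 100 = 500 -->
```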

11.2. Clones - Resources That Can Have Multiple Active Instances

Clone resources are resources that can have more than one copy active at the same time. This allows you, for example, to run a copy of a daemon on every node. You can clone any primitive or group resource [1].

11.2.1. Anonymous versus Unique Clones

A clone resource is configured to be either anonymous or globally unique.

Anonymous clones are the simplest. These behave completely identically everywhere they are running. Because of this, there can be only one instance of an anonymous clone active per node.

The instances of globally unique clones are distinct entities. All instances are launched identically, but one instance of the clone is not identical to any other instance, whether running on the same node or a different node. As an example, a cloned IP address can use special kernel functionality such that each instance handles a subset of requests for the same IP address.
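
A globally unique cloned IP address might be configured roughly as follows. This is a sketch only: the ids and the address are made up, and it assumes the IPaddr2 agent and its clusterip_hash parameter are available and supported on the platform in use.

```xml
<clone id="ip-clone">
   <meta_attributes id="ip-clone-meta">
      <!-- Each instance is a distinct entity -->
      <nvpair id="ip-clone-unique" name="globally-unique" value="true"/>
      <!-- Two instances in total, both allowed on one node if necessary -->
      <nvpair id="ip-clone-max" name="clone-max" value="2"/>
      <nvpair id="ip-clone-node-max" name="clone-node-max" value="2"/>
   </meta_attributes>
   <primitive id="ClusterIP" class="ocf" type="IPaddr2" provider="heartbeat">
      <instance_attributes id="ClusterIP-params">
         <nvpair id="ClusterIP-ip" name="ip" value="192.0.2.10"/>
         <!-- Assumed hashing mode: split requests by source address -->
         <nvpair id="ClusterIP-hash" name="clusterip_hash" value="sourceip"/>
      </instance_attributes>
   </primitive>
</clone>
```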

11.2.2. Promotable clones

If a clone is promotable, its instances can perform a special role that Pacemaker will manage via the promote and demote actions of the resource agent.

Services that support such a special role have various terms for the special role and the default role: primary and secondary, master and replica, controller and worker, etc. Pacemaker uses the terms promoted and unpromoted to be agnostic to what the service calls them or what they do.

All that Pacemaker cares about is that an instance comes up in the unpromoted role when started, and the resource agent supports the promote and demote actions to manage entering and exiting the promoted role.

11.2.3. Clone Properties

Properties of a Clone Resource
Field Description
id

A unique name for the clone

description

An optional description of the clone, for the user’s own purposes. E.g. IP address for website

11.2.4. Clone Options

Options inherited from primitive resources: priority, target-role, is-managed

Clone-specific configuration options
Field Default Description
globally-unique false

If true, each clone instance performs a distinct function

clone-max 0

The maximum number of clone instances that can be started across the entire cluster. If 0, the number of nodes in the cluster will be used.

clone-node-max 1

If globally-unique is true, the maximum number of clone instances that can be started on a single node

clone-min 0

Require at least this number of clone instances to be runnable before allowing resources depending on the clone to be runnable. A value of 0 means require all clone instances to be runnable.

notify false

Call the resource agent’s notify action for all active instances, before and after starting or stopping any clone instance. The resource agent must support this action. Allowed values: false, true

ordered false

If true, clone instances must be started sequentially instead of in parallel. Allowed values: false, true

interleave false

When this clone is ordered relative to another clone, if this option is false (the default), the ordering is relative to all instances of the other clone, whereas if this option is true, the ordering is relative only to instances on the same node. Allowed values: false, true

promotable false

If true, clone instances can perform a special role that Pacemaker will manage via the resource agent’s promote and demote actions. The resource agent must support these actions. Allowed values: false, true

promoted-max 1

If promotable is true, the number of instances that can be promoted at one time across the entire cluster

promoted-node-max 1

If promotable and globally-unique are true, the number of clone instances that can be promoted at one time on a single node
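
As a sketch of how several of these options combine, the clone below is limited to three instances, and a resource ordered after it is allowed to start once two instances are runnable. The ids, the values, and the apache-stats resource are assumptions made for illustration.

```xml
<configuration>
   <resources>
      <clone id="apache-clone">
         <meta_attributes id="apache-clone-meta">
            <!-- At most three instances across the cluster -->
            <nvpair id="apache-clone-max" name="clone-max" value="3"/>
            <!-- Dependent resources need two runnable instances -->
            <nvpair id="apache-clone-min" name="clone-min" value="2"/>
         </meta_attributes>
         <primitive id="apache" class="lsb" type="apache"/>
      </clone>
      <primitive id="apache-stats" class="ocf" type="Dummy" provider="pacemaker"/>
   </resources>
   <constraints>
      <rsc_order id="start-clone-then-stats" first="apache-clone" then="apache-stats"/>
   </constraints>
</configuration>
```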

Note

Deprecated Terminology

In older documentation and online examples, you may see promotable clones referred to as multi-state, stateful, or master/slave; these mean the same thing as promotable. Certain syntax is supported for backward compatibility, but is deprecated and will be removed in a future version:

  • Using a master tag, instead of a clone tag with the promotable meta-attribute set to true
  • Using the master-max meta-attribute instead of promoted-max
  • Using the master-node-max meta-attribute instead of promoted-node-max
  • Using Master as a role name instead of Promoted
  • Using Slave as a role name instead of Unpromoted
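
For comparison, the current syntax for a promotable clone might look like the sketch below. The resource agent (ocf:myCorp:myDb) and all ids are hypothetical.

```xml
<clone id="database">
   <meta_attributes id="database-meta">
      <!-- Replaces the deprecated master tag -->
      <nvpair id="database-promotable" name="promotable" value="true"/>
      <!-- Replaces the deprecated master-max and master-node-max -->
      <nvpair id="database-promoted-max" name="promoted-max" value="1"/>
      <nvpair id="database-promoted-node-max" name="promoted-node-max" value="1"/>
   </meta_attributes>
   <!-- Hypothetical agent that supports promote and demote -->
   <primitive id="db" class="ocf" type="myDb" provider="myCorp"/>
</clone>
```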

11.2.5. Clone Contents

Clones must contain exactly one primitive or group resource.

A clone that runs a web server on all nodes

<clone id="apache-clone">
    <primitive id="apache" class="lsb" type="apache">
        <operations>
           <op id="apache-monitor" name="monitor" interval="30"/>
        </operations>
    </primitive>
</clone>

Warning

You should never reference the name of a clone’s child (the primitive or group resource being cloned). If you think you need to do this, you probably need to re-evaluate your design.

11.2.6. Clone Instance Attributes

Clones have no instance attributes; however, any that are set here will be inherited by the clone’s child.

11.2.7. Clone Constraints

In most cases, a clone will have a single instance on each active cluster node. If this is not the case, you can indicate which nodes the cluster should preferentially assign copies to with resource location constraints. These constraints are written no differently from those for primitive resources except that the clone’s id is used.

Some constraints involving clones

<constraints>
    <rsc_location id="clone-prefers-node1" rsc="apache-clone" node="node1" score="500"/>
    <rsc_colocation id="stats-with-clone" rsc="apache-stats" with-rsc="apache-clone"/>
    <rsc_order id="start-clone-then-stats" first="apache-clone" then="apache-stats"/>
</constraints>

Ordering constraints behave slightly differently for clones. In the example above, apache-stats will wait until all copies of apache-clone that need to be started have done so before being started itself. Only if no copies can be started will apache-stats be prevented from being active. Additionally, the clone will wait for apache-stats to be stopped before stopping itself.

Colocation of a primitive or group resource with a clone means that the resource can run on any node with an active instance of the clone. The cluster will choose an instance based on where the clone is running and the resource’s own location preferences.

Colocation between clones is also possible. If one clone A is colocated with another clone B, the set of allowed locations for A is limited to nodes on which B is (or will be) active. Placement is then performed normally.
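
A minimal sketch of colocation between two clones follows. The clone names, ids, and use of the ocf:pacemaker:Dummy agent are assumptions for illustration; interleave is set on A-clone so that its ordering is relative only to the B-clone instance on the same node.

```xml
<configuration>
   <resources>
      <clone id="B-clone">
         <primitive id="B" class="ocf" type="Dummy" provider="pacemaker"/>
      </clone>
      <clone id="A-clone">
         <meta_attributes id="A-clone-meta">
            <!-- Order against the local B instance only -->
            <nvpair id="A-clone-interleave" name="interleave" value="true"/>
         </meta_attributes>
         <primitive id="A" class="ocf" type="Dummy" provider="pacemaker"/>
      </clone>
   </resources>
   <constraints>
      <!-- A-clone instances may run only where B-clone is (or will be) active -->
      <rsc_colocation id="A-with-B" rsc="A-clone" with-rsc="B-clone" score="INFINITY"/>
      <rsc_order id="B-then-A" first="B-clone" then="A-clone"/>
   </constraints>
</configuration>
```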

11.2.7.1. Promotable Clone Constraints

For promotable clone resources, the first-action and/or then-action fields for ordering constraints may be set to promote or demote to constrain the promoted role, and colocation constraints may contain rsc-role and/or with-rsc-role fields.

Constraints involving promotable clone resources

<constraints>
   <rsc_location id="db-prefers-node1" rsc="database" node="node1" score="500"/>
   <rsc_colocation id="backup-with-db-unpromoted" rsc="backup"
     with-rsc="database" with-rsc-role="Unpromoted"/>
   <rsc_colocation id="myapp-with-db-promoted" rsc="myApp"
     with-rsc="database" with-rsc-role="Promoted"/>
   <rsc_order id="start-db-before-backup" first="database" then="backup"/>
   <rsc_order id="promote-db-then-app" first="database" first-action="promote"
     then="myApp" then-action="start"/>
</constraints>

In the example above, myApp will wait until one of the database copies has been started and promoted before being started itself on the same node. Only if no copies can be promoted will myApp be prevented from being active. Additionally, the cluster will wait for myApp to be stopped before demoting the database.

Colocation of a primitive or group resource with a promotable clone resource means that it can run on any node with an active instance of the promotable clone resource that has the specified role (Promoted or Unpromoted). In the example above, the cluster will choose a location based on where database is running in the promoted role, and if there are multiple promoted instances it will also factor in myApp’s own location preferences when deciding which location to choose.

Colocation with regular clones and other promotable clone resources is also possible. In such cases, the set of allowed locations for the rsc clone is (after role filtering) limited to nodes on which the with-rsc promotable clone resource is (or will be) in the specified role. Placement is then performed as normal.

11.2.7.2. Using Promotable Clone Resources in Colocation Sets

When a promotable clone is used in a resource set inside a colocation constraint, the resource set may take a role attribute.

In the following example, an instance of B may be promoted only on a node where A is in the promoted role. Additionally, resources C and D must be located on a node where both A and B are promoted.

Colocate C and D with A’s and B’s promoted instances

<constraints>
    <rsc_colocation id="coloc-1" score="INFINITY" >
      <resource_set id="colocated-set-example-1" sequential="true" role="Promoted">
        <resource_ref id="A"/>
        <resource_ref id="B"/>
      </resource_set>
      <resource_set id="colocated-set-example-2" sequential="true">
        <resource_ref id="C"/>
        <resource_ref id="D"/>
      </resource_set>
    </rsc_colocation>
</constraints>

11.2.7.3. Using Promotable Clone Resources in Ordered Sets

When a promotable clone is used in a resource set inside an ordering constraint, the resource set may take an action attribute.

Start C and D after first promoting A and B

<constraints>
    <rsc_order id="order-1" score="INFINITY" >
      <resource_set id="ordered-set-1" sequential="true" action="promote">
        <resource_ref id="A"/>
        <resource_ref id="B"/>
      </resource_set>
      <resource_set id="ordered-set-2" sequential="true" action="start">
        <resource_ref id="C"/>
        <resource_ref id="D"/>
      </resource_set>
    </rsc_order>
</constraints>

In the above example, B cannot be promoted until A has been promoted. Additionally, resources C and D must wait until A and B have been promoted before they can start.

11.2.8. Clone Stickiness

To achieve stable assignments, clones are slightly sticky by default. If no value for resource-stickiness is provided, the clone will use a value of 1. Being a small value, it causes minimal disturbance to the score calculations of other resources but is enough to prevent Pacemaker from needlessly moving instances around the cluster.

Note

For globally unique clones, this may result in multiple instances of the clone staying on a single node, even after another eligible node becomes active (for example, after being put into standby mode then made active again). If you do not want this behavior, specify a resource-stickiness of 0 for the clone temporarily and let the cluster adjust, then set it back to 1 if you want the default behavior to apply again.

Important

If resource-stickiness is set in the rsc_defaults section, it will apply to clone instances as well. This means an explicit resource-stickiness of 0 in rsc_defaults works differently from the implicit default used when resource-stickiness is not specified.
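
To override stickiness for a single clone rather than through rsc_defaults, the value can be set as a meta-attribute of the clone, as in this sketch. The ids are hypothetical, and the clone is the apache-clone example from earlier in this chapter.

```xml
<clone id="apache-clone">
   <meta_attributes id="apache-clone-meta">
      <!-- Explicit 0: instances may be moved freely while this is set -->
      <nvpair id="apache-clone-stickiness" name="resource-stickiness" value="0"/>
   </meta_attributes>
   <primitive id="apache" class="lsb" type="apache"/>
</clone>
```

Setting the value back to 1 afterward restores a placement preference equal to the clone default.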

11.2.9. Monitoring Promotable Clone Resources

The usual monitor actions are insufficient to monitor a promotable clone resource, because Pacemaker needs to verify not only that the resource is active, but also that its actual role matches its intended one.

Define two monitoring actions: the usual one will cover the unpromoted role, and an additional one with role="Promoted" will cover the promoted role.

Monitoring both states of a promotable clone resource

<clone id="myPromotableRsc">
   <meta_attributes id="myPromotableRsc-meta">
       <nvpair id="myPromotableRsc-promotable" name="promotable" value="true"/>
   </meta_attributes>
   <primitive id="myRsc" class="ocf" type="myApp" provider="myCorp">
    <operations>
     <op id="public-ip-unpromoted-check" name="monitor" interval="60"/>
     <op id="public-ip-promoted-check" name="monitor" interval="61" role="Promoted"/>
    </operations>
   </primitive>
</clone>

Important

It is crucial that every monitor operation has a different interval! Pacemaker currently differentiates between operations only by resource and interval; so if (for example) a promotable clone resource had the same monitor interval for both roles, Pacemaker would ignore the role when checking the status – which would cause unexpected return codes, and therefore unnecessary complications.

11.2.10. Determining Which Instance is Promoted

Pacemaker can choose a promotable clone instance to be promoted in one of two ways:

  • Promotion scores: These are node attributes set via the crm_attribute command using the --promotion option, which generally would be called by the resource agent’s start action if it supports promotable clones. This tool automatically detects both the resource and host, and should be used to set a preference for being promoted. Based on this, promoted-max, and promoted-node-max, the instance(s) with the highest preference will be promoted.
  • Constraints: Location constraints can indicate which nodes are most preferred to be promoted.

Explicitly preferring node1 to be promoted

<rsc_location id="promoted-location" rsc="myPromotableRsc">
    <rule id="promoted-rule" score="100" role="Promoted">
      <expression id="promoted-exp" attribute="#uname" operation="eq" value="node1"/>
    </rule>
</rsc_location>

11.3. Bundles - Containerized Resources

Pacemaker supports a special syntax for launching a service inside a container with any infrastructure it requires: the bundle.

Pacemaker bundles support Docker, podman (since 2.0.1), and rkt container technologies. [2]

A bundle for a containerized web server

<bundle id="httpd-bundle">
   <podman image="pcmk:http" replicas="3"/>
   <network ip-range-start="192.168.122.131"
            host-netmask="24"
            host-interface="eth0">
      <port-mapping id="httpd-port" port="80"/>
   </network>
   <storage>
      <storage-mapping id="httpd-syslog"
                       source-dir="/dev/log"
                       target-dir="/dev/log"
                       options="rw"/>
      <storage-mapping id="httpd-root"
                       source-dir="/srv/html"
                       target-dir="/var/www/html"
                       options="rw,Z"/>
      <storage-mapping id="httpd-logs"
                       source-dir-root="/var/log/pacemaker/bundles"
                       target-dir="/etc/httpd/logs"
                       options="rw,Z"/>
   </storage>
   <primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
</bundle>

11.3.1. Bundle Prerequisites

Before configuring a bundle in Pacemaker, the user must install the appropriate container launch technology (Docker, podman, or rkt), and supply a fully configured container image, on every node allowed to run the bundle.

Pacemaker will create an implicit resource of type ocf:heartbeat:docker, ocf:heartbeat:podman, or ocf:heartbeat:rkt to manage a bundle’s container. The user must ensure that the appropriate resource agent is installed on every node allowed to run the bundle.

11.3.2. Bundle Properties

XML Attributes of a bundle Element
Field Description
id

A unique name for the bundle (required)

description

An optional description of the bundle, for the user’s own purposes. E.g. manages the container that runs the service

A bundle must contain exactly one docker, podman, or rkt element.

11.3.3. Bundle Container Properties

XML attributes of a docker, podman, or rkt Element
Attribute Default Description
image  

Container image tag (required)

replicas Value of promoted-max if that is positive, else 1

A positive integer specifying the number of container instances to launch

replicas-per-host 1

A positive integer specifying the number of container instances allowed to run on a single node

promoted-max 0

A non-negative integer that, if positive, indicates that the containerized service should be treated as a promotable service, with this many replicas allowed to run the service in the promoted role

network  

If specified, this will be passed to the docker run, podman run, or rkt run command as the network setting for the container.

run-command /usr/sbin/pacemaker-remoted if bundle contains a primitive, otherwise none

This command will be run inside the container when launching it (“PID 1”). If the bundle contains a primitive, this command must start pacemaker-remoted (but could, for example, be a script that performs other setup as well).

options  

Extra command-line options to pass to the docker run, podman run, or rkt run command

Note

Considerations when using cluster configurations or container images from Pacemaker 1.1:

  • If the container image has a pre-2.0.0 version of Pacemaker, set run-command to /usr/sbin/pacemaker_remoted (note the underscore instead of the dash).
  • masters is accepted as an alias for promoted-max, but is deprecated since 2.0.0, and support for it will be removed in a future version.

11.3.4. Bundle Network Properties

A bundle may optionally contain one network element.

XML attributes of a network Element
Attribute Default Description
add-host TRUE

If TRUE, and ip-range-start is used, Pacemaker will automatically ensure that /etc/hosts inside the containers has entries for each replica name and its assigned IP.

ip-range-start  

If specified, Pacemaker will create an implicit ocf:heartbeat:IPaddr2 resource for each container instance, starting with this IP address, using up to replicas sequential addresses. These addresses can be used from the host’s network to reach the service inside the container, though they are not visible within the container itself. Only IPv4 addresses are currently supported.

host-netmask 32

If ip-range-start is specified, the IP addresses are created with this CIDR netmask (as a number of bits).

host-interface  

If ip-range-start is specified, the IP addresses are created on this host interface (by default, it will be determined from the IP address).

control-port 3121

If the bundle contains a primitive, the cluster will use this integer TCP port for communication with Pacemaker Remote inside the container. Changing this is useful when the container is unable to listen on the default port, for example, when the container uses the host’s network rather than ip-range-start (in which case replicas-per-host must be 1), or when the bundle may run on a Pacemaker Remote node that is already listening on the default port. Any PCMK_remote_port environment variable set on the host or in the container is ignored for bundle connections.

Note

Replicas are named by the bundle id plus a dash and an integer counter starting with zero. For example, if a bundle named httpd-bundle has replicas=2, its containers will be named httpd-bundle-0 and httpd-bundle-1.

Additionally, a network element may optionally contain one or more port-mapping elements.

Attributes of a port-mapping Element
Attribute Default Description
id  

A unique name for the port mapping (required)

port  

If this is specified, connections to this TCP port number on the host network (on the container’s assigned IP address, if ip-range-start is specified) will be forwarded to the container network. Exactly one of port or range must be specified in a port-mapping.

internal-port value of port

If port and this are specified, connections to port on the host’s network will be forwarded to this port on the container network.

range  

If this is specified, connections to these TCP port numbers (expressed as first_port-last_port) on the host network (on the container’s assigned IP address, if ip-range-start is specified) will be forwarded to the same ports in the container network. Exactly one of port or range must be specified in a port-mapping.

Note

If the bundle contains a primitive, Pacemaker will automatically map the control-port, so it is not necessary to specify that port in a port-mapping.
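
The port-mapping attributes described above can be combined within one network element, as in the following sketch. The ids, addresses, and port numbers are assumptions chosen for illustration.

```xml
<network ip-range-start="192.168.122.131"
         host-netmask="24"
         host-interface="eth0">
   <!-- Host port 8080 is forwarded to port 80 inside the container -->
   <port-mapping id="httpd-port" port="8080" internal-port="80"/>
   <!-- Host ports 9000 through 9010 are forwarded to the same container ports -->
   <port-mapping id="httpd-extra-ports" range="9000-9010"/>
</network>
```

Each port-mapping uses either port or range, never both.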

11.3.5. Bundle Storage Properties

A bundle may optionally contain one storage element. A storage element has no properties of its own, but may contain one or more storage-mapping elements.

Attributes of a storage-mapping Element
Attribute Default Description
id  

A unique name for the storage mapping (required)

source-dir  

The absolute path on the host’s filesystem that will be mapped into the container. Exactly one of source-dir and source-dir-root must be specified in a storage-mapping.

source-dir-root  

The start of a path on the host’s filesystem that will be mapped into the container, using a different subdirectory on the host for each container instance. The subdirectory will be named the same as the replica name. Exactly one of source-dir and source-dir-root must be specified in a storage-mapping.

target-dir  

The path name within the container where the host storage will be mapped (required)

options  

A comma-separated list of file system mount options to use when mapping the storage

Note

Pacemaker does not define the behavior if the source directory does not already exist on the host. However, it is expected that the container technology and/or its resource agent will create the source directory in that case.

Note

If the bundle contains a primitive, Pacemaker will automatically map the equivalent of source-dir=/etc/pacemaker/authkey target-dir=/etc/pacemaker/authkey and source-dir-root=/var/log/pacemaker/bundles target-dir=/var/log into the container, so it is not necessary to specify those paths in a storage-mapping.

Important

The PCMK_authkey_location environment variable must not be set to anything other than the default of /etc/pacemaker/authkey on any node in the cluster.

Important

If SELinux is used in enforcing mode on the host, you must ensure the container is allowed to use any storage you mount into it. For Docker and podman bundles, adding “Z” to the mount options will create a container-specific label for the mount that allows the container access.

11.3.6. Bundle Primitive

A bundle may optionally contain one primitive resource. The primitive may have operations, instance attributes, and meta-attributes defined, as usual.

If a bundle contains a primitive resource, the container image must include the Pacemaker Remote daemon, and at least one of ip-range-start or control-port must be configured in the bundle. Pacemaker will create an implicit ocf:pacemaker:remote resource for the connection, launch Pacemaker Remote within the container, and monitor and manage the primitive resource via Pacemaker Remote.

If the bundle has more than one container instance (replica), the primitive resource will function as an implicit clone – a promotable clone if the bundle has promoted-max greater than zero.
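
A bundle whose primitive acts as an implicit promotable clone might be sketched as follows. The image name, the resource agent (ocf:myCorp:myDb), the addresses, and all ids are hypothetical.

```xml
<bundle id="db-bundle">
   <!-- Three replicas, one of which may be promoted at a time -->
   <podman image="pcmk:db" replicas="3" promoted-max="1"/>
   <network ip-range-start="192.168.122.141"
            host-netmask="24"
            host-interface="eth0"/>
   <primitive id="db" class="ocf" type="myDb" provider="myCorp">
      <operations>
         <!-- Distinct intervals for each role -->
         <op id="db-monitor-unpromoted" name="monitor" interval="60"/>
         <op id="db-monitor-promoted" name="monitor" interval="61" role="Promoted"/>
      </operations>
   </primitive>
</bundle>
```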

Note

If you want to pass environment variables to a bundle’s Pacemaker Remote connection or primitive, you have two options:

  • Environment variables whose value is the same regardless of the underlying host may be set using the container element’s options attribute.
  • If you want variables to have host-specific values, you can use the storage-mapping element to map a file on the host as /etc/pacemaker/pcmk-init.env in the container (since 2.0.3). Pacemaker Remote will parse this file as a shell-like format, with variables set as NAME=VALUE, ignoring blank lines and comments starting with “#”.
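
Both approaches are sketched together below. The variable name, the host file path, and the ids are assumptions; the example also assumes that the container technology's run command accepts an --env option, as docker run and podman run do.

```xml
<bundle id="httpd-bundle">
   <!-- Same value on every host: passed via the container options -->
   <podman image="pcmk:http" replicas="3" options="--env=MY_SETTING=value"/>
   <network ip-range-start="192.168.122.131" host-netmask="24" host-interface="eth0"/>
   <storage>
      <!-- Host-specific values: each host supplies its own file (hypothetical path) -->
      <storage-mapping id="httpd-env"
                       source-dir="/etc/pacemaker/httpd-bundle.env"
                       target-dir="/etc/pacemaker/pcmk-init.env"
                       options="ro"/>
   </storage>
   <primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
</bundle>
```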

Important

When a bundle has a primitive, Pacemaker on all cluster nodes must be able to contact Pacemaker Remote inside the bundle’s containers.

  • The containers must have an accessible network (for example, network should not be set to “none” with a primitive).
  • The default, using a distinct network space inside the container, works in combination with ip-range-start. Any firewall must allow access from all cluster nodes to the control-port on the container IPs.
  • If the container shares the host’s network space (for example, by setting network to “host”), a unique control-port should be specified for each bundle. Any firewall must allow access from all cluster nodes to the control-port on all cluster and remote node IPs.
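
For the shared-network case, a bundle might be configured along these lines. The port number 3122 and the image name are assumptions; the point of the sketch is that control-port differs from the default so that it does not collide with other bundles or with Pacemaker Remote on the host.

```xml
<bundle id="httpd-bundle">
   <!-- The container uses the host's network space, so one replica per host -->
   <podman image="pcmk:http" replicas="3" replicas-per-host="1" network="host"/>
   <!-- No ip-range-start; a control port unique to this bundle -->
   <network control-port="3122"/>
   <primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
</bundle>
```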

11.3.7. Bundle Node Attributes

If the bundle has a primitive, the primitive’s resource agent may want to set node attributes such as promotion scores. However, with containers, it is not apparent which node should get the attribute.

If the container uses shared storage that is the same no matter which node the container is hosted on, then it is appropriate to use the promotion score on the bundle node itself.

On the other hand, if the container uses storage exported from the underlying host, then it may be more appropriate to use the promotion score on the underlying host.

Since this depends on the particular situation, the container-attribute-target resource meta-attribute allows the user to specify which approach to use. If it is set to host, then user-defined node attributes will be checked on the underlying host. If it is anything else, the local node (in this case the bundle node) is used as usual.

This only applies to user-defined attributes; the cluster will always check the local node for cluster-defined attributes such as #uname.

If container-attribute-target is host, the cluster will pass additional environment variables to the primitive’s resource agent that allow it to set node attributes appropriately: CRM_meta_container_attribute_target (identical to the meta-attribute value) and CRM_meta_physical_host (the name of the underlying host).
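
A sketch of setting this meta-attribute on a bundle follows, relying on the bundle's meta-attributes being inherited by its primitive. The image name, resource agent, source directory, and ids are hypothetical.

```xml
<bundle id="db-bundle">
   <meta_attributes id="db-bundle-meta">
      <!-- Check user-defined node attributes on the underlying host -->
      <nvpair id="db-bundle-attr-target" name="container-attribute-target" value="host"/>
   </meta_attributes>
   <podman image="pcmk:db" replicas="3" promoted-max="1"/>
   <network ip-range-start="192.168.122.141" host-netmask="24" host-interface="eth0"/>
   <storage>
      <!-- Storage exported from the host, which is why host attributes fit here -->
      <storage-mapping id="db-data" source-dir="/srv/db" target-dir="/var/lib/db" options="rw,Z"/>
   </storage>
   <primitive id="db" class="ocf" type="myDb" provider="myCorp"/>
</bundle>
```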

Note

When called by a resource agent, the attrd_updater and crm_attribute commands will automatically check those environment variables and set attributes appropriately.

11.3.8. Bundle Meta-Attributes

Any meta-attribute set on a bundle will be inherited by the bundle’s primitive and any resources implicitly created by Pacemaker for the bundle.

This includes options such as priority, target-role, and is-managed. See Resource Options for more information.

Bundles support clone meta-attributes including notify, ordered, and interleave.

11.3.9. Limitations of Bundles

Restarting Pacemaker while a bundle is unmanaged or the cluster is in maintenance mode may cause the bundle to fail.

Bundles may not be explicitly cloned or included in groups. This includes the bundle’s primitive and any resources implicitly created by Pacemaker for the bundle. (If replicas is greater than 1, the bundle will behave like a clone implicitly.)

Bundles do not have instance attributes, utilization attributes, or operations, though a bundle’s primitive may have them.

A bundle with a primitive can run on a Pacemaker Remote node only if the bundle uses a distinct control-port.

[1]Of course, the service must support running multiple instances.
[2]Docker is a trademark of Docker, Inc. No endorsement by or association with Docker, Inc. is implied.