12. Resource Agents

12.1. Action Completion

If one resource depends on another resource via constraints, the cluster will interpret an expected result as sufficient to continue with dependent actions. This may cause timing issues if the resource agent start returns before the service is not only launched but fully ready to perform its function, or if the resource agent stop returns before the service has fully released all its claims on system resources. At a minimum, the start or stop should not return before a status command would return the expected (started or stopped) result.

12.2. OCF Resource Agents

12.2.1. Location of Custom Scripts

OCF Resource Agents are found in /usr/lib/ocf/resource.d/$PROVIDER

When creating your own agents, you are encouraged to create a new directory under /usr/lib/ocf/resource.d/ so that they are not confused with (or overwritten by) the agents shipped by existing providers.

So, for example, if you choose the provider name of big-corp and want a new resource named big-app, you would create a resource agent called /usr/lib/ocf/resource.d/big-corp/big-app and define a resource:

12.2.2. Actions

All OCF resource agents are required to implement the following actions.

Required Actions for OCF Agents
Action Description Instructions

start

Start the resource Return OCF_SUCCESS on success and an appropriate error code otherwise. Must not report success until the resource is fully active.

stop

Stop the resource Return OCF_SUCCESS on success and an appropriate error code otherwise. Must not report success until the resource is fully stopped.

monitor

Check the resource’s state Return OCF_SUCCESS if the resource is running, OCF_NOT_RUNNING if it is stopped, and any other OCF exit code if it is failed. Note: The monitor action should test the state of the resource on the local machine only.

meta-data

Describe the resource Provide information about this resource in the XML format defined by the OCF standard. Return OCF_SUCCESS. Note: This is not required to be performed as root.

OCF resource agents may optionally implement additional actions. Some are used only with advanced resource types such as clones.

Optional Actions for OCF Resource Agents
Action Description Instructions

validate-all

Validate the instance parameters provided. Return OCF_SUCCESS if parameters are valid, OCF_ERR_ARGS if not valid, and OCF_ERR_CONFIGURED if resource is not configured.

promote

Bring the local instance of a promotable clone resource to the promoted role. Return OCF_SUCCESS on success.

demote

Bring the local instance of a promotable clone resource to the unpromoted role. Return OCF_SUCCESS on success.

notify

Used by the cluster to send the agent pre- and post-notification events telling the resource what has happened and what will happen. Must not fail. Must return OCF_SUCCESS.

reload

Reload the service’s own configuration. Not used by Pacemaker.

reload-agent

Make effective any changes in instance parameters marked as reloadable in the agent’s meta-data. This is used when the agent can handle a change in some of its parameters more efficiently than stopping and starting the resource.

recover

Restart the service. Not used by Pacemaker.

Important

If you create a new OCF resource agent, use ocf-tester to verify that the agent complies with the OCF standard properly.

12.2.3. How Are OCF Return Codes Interpreted?

The first thing the cluster does is to check the return code against the expected result. If the result does not match the expected value, then the operation is considered to have failed, and recovery action is initiated.

There are three types of failure recovery:

Types of Recovery Performed by the Cluster
Type Description Action Taken by the Cluster

soft

A transient error Restart the resource or move it to a new location

hard

A non-transient error that may be specific to the current node Move the resource elsewhere and prevent it from being retried on the current node

fatal

A non-transient error that will be common to all cluster nodes (for example, a bad configuration was specified) Stop the resource and prevent it from being started on any cluster node

12.2.4. OCF Return Codes

The following table outlines the various OCF return codes and the type of recovery the cluster will initiate when a failure code is received. Although counterintuitive, even actions that return OCF_SUCCESS can be considered to have failed, if OCF_SUCCESS was not the expected return value.

OCF Exit Codes and Their Recovery Types
Exit Code OCF Alias Description Recovery

0

OCF_SUCCESS Success. The command completed successfully. This is the expected result for all start, stop, promote, and demote actions. soft

1

OCF_ERR_GENERIC Generic “there was a problem” error code. hard

2

OCF_ERR_ARGS The resource’s parameter values are not valid on this machine (for example, a value refers to a file not found on the local host). hard

3

OCF_ERR_UNIMPLEMENTED The requested action is not implemented. hard

4

OCF_ERR_PERM The resource agent does not have sufficient privileges to complete the task. hard

5

OCF_ERR_INSTALLED The tools required by the resource are not installed on this machine. hard

6

OCF_ERR_CONFIGURED The resource’s parameter values are inherently invalid (for example, a required parameter was not given). fatal

7

OCF_NOT_RUNNING The resource is safely stopped. This should only be returned by monitor actions, not stop actions. N/A

8

OCF_RUNNING_PROMOTED The resource is running in the promoted role. soft

9

OCF_FAILED_PROMOTED The resource is (or might be) in the promoted role but has failed. The resource will be demoted, stopped, and then started (and possibly promoted) again. soft

190

OCF_DEGRADED The resource is properly active, but in such a condition that future failures are more likely. none

191

OCF_DEGRADED_PROMOTED The resource is properly active in the promoted role, but in such a condition that future failures are more likely. none
other none Custom error code. soft

Exceptions to the recovery handling described above:

  • Probes (non-recurring monitor actions) that find a resource active (or in the promoted role) will not result in recovery action unless it is also found active elsewhere.
  • The recovery action taken when a resource is found active more than once is determined by the resource’s multiple-active property.
  • Recurring actions that return OCF_ERR_UNIMPLEMENTED do not cause any type of recovery.
  • Actions that return one of the “degraded” codes will be treated the same as if they had returned success, but status output will indicate that the resource is degraded.

12.2.5. Environment Variables

Pacemaker sets certain environment variables when it executes an OCF resource agent. Agents can check these variables to get information about resource parameters or the execution environment.

Note: Pacemaker may set other environment variables for its own purposes. They may be present in the agent’s environment, but Pacemaker is not providing them for the agent’s use, and so the agent should not rely on any variables not listed in the table below.

OCF Environment Variables
Environment Variable Description

OCF_CHECK_LEVEL

Requested intensity level of checks in monitor and validate-all actions. Usually set as an operation attribute; see Pacemaker Explained for an example.

OCF_EXIT_REASON_PREFIX

Prefix for printing fatal error messages from the resource agent.

OCF_RA_VERSION_MAJOR

Major version number of the OCF Resource Agent API. If the script does not support this revision, it should report an error. See the OCF specification for an explanation of the versioning scheme used. The version number is split into two numbers for ease of use in shell scripts. These two may be used by the agent to determine whether it is run under an OCF-compliant resource manager.

OCF_RA_VERSION_MINOR

Minor version number of the OCF Resource Agent API. See OCF_RA_VERSION_MAJOR for more details.

OCF_RESKEY_crm_feature_set

crm_feature_set on the DC (or on the local node, if the agent is run by crm_resource).

OCF_RESKEY_CRM_meta_interval

Interval (in milliseconds) of the current operation.

OCF_RESKEY_CRM_meta_name

Name of the current operation.

OCF_RESKEY_CRM_meta_notify_*

See Clone Notifications.

OCF_RESKEY_CRM_meta_on_node

Name of the node where the current operation is running.

OCF_RESKEY_CRM_meta_on_node_uuid

Cluster-layer ID of the node where the current operation is running (or node name for Pacemaker Remote nodes).

OCF_RESKEY_CRM_meta_physical_host

If the node where the current operation is running is a guest node, the host on which the container is running.

OCF_RESKEY_CRM_meta_timeout

Timeout (in milliseconds) of the current operation.

OCF_RESKEY_CRM_meta_*

Each of a resource’s meta-attributes is converted to an environment variable prefixed with “OCF_RESKEY_CRM_meta_”. See Pacemaker Explained for some meta-attributes that have special meaning to Pacemaker.

OCF_RESKEY_*

Each of a resource’s instance parameters is converted to an environment variable prefixed with “OCF_RESKEY_”.

OCF_RESOURCE_INSTANCE

The name of the resource instance.

OCF_RESOURCE_PROVIDER

The name of the resource agent provider.

OCF_RESOURCE_TYPE

The name of the resource type.

OCF_ROOT

The root of the OCF directory hierarchy.

OCF_TRACE_FILE

The absolute path or file descriptor to write trace output to, if OCF_TRACE_RA is set to true. Pacemaker sets this only to /dev/stderr and only when running a resource agent via crm_resource.

OCF_TRACE_RA

If set to true, enable tracing of the resource agent. Trace output is written to OCF_TRACE_FILE if set; otherwise, it’s written to a file in OCF_RESKEY_trace_dir if set or in a default directory if not. Pacemaker sets this to true only when running a resource agent via crm_resource with one or more -V flags.

PCMK_DEBUGLOG (and HA_DEBUGLOG)

Where to write resource agent debug logs. Pacemaker sets this to PCMK_logfile if set to a value other than none and if debugging is enabled for the executor.

PCMK_LOGFACILITY (and HA_LOGFACILITY)

Syslog facility for resource agent logs. Pacemaker sets this to PCMK_logfacility if set to a value other than none or /dev/null.

PCMK_LOGFILE (and HA_LOGFILE)

Where to write resource agent logs. Pacemaker sets this to PCMK_logfile if set to a value other than none.

PCMK_service

The name of the Pacemaker subsystem or command-line tool that’s executing the resource agent. Specific values are subject to change; useful mainly for logging.

12.2.6. Clone Resource Agent Requirements

Any resource can be used as an anonymous clone, as it requires no additional support from the resource agent. Whether it makes sense to do so depends on your resource and its resource agent.

12.2.6.1. Resource Agent Requirements for Globally Unique Clones

Globally unique clones require additional support in the resource agent. In particular, it must respond with OCF_SUCCESS only if the node has that exact instance active. All other probes for instances of the clone should result in OCF_NOT_RUNNING (or one of the other OCF error codes if they are failed).

Individual instances of a clone are identified by appending a colon and a numerical offset (for example, apache:2).

A resource agent can find out how many copies there are by examining the OCF_RESKEY_CRM_meta_clone_max environment variable and which instance it is by examining OCF_RESKEY_CRM_meta_clone.

The resource agent must not make any assumptions (based on OCF_RESKEY_CRM_meta_clone) about which numerical instances are active. In particular, the list of active copies is not always an unbroken sequence, nor does it always start at 0.

12.2.6.2. Resource Agent Requirements for Promotable Clones

Promotable clone resources require two extra actions, demote and promote, which are responsible for changing the state of the resource. Like start and stop, they should return OCF_SUCCESS if they completed successfully or a relevant error code if they did not.

The states can mean whatever you wish, but when the resource is started, it must begin in the unpromoted role. From there, the cluster will decide which instances to promote.

In addition to the clone requirements for monitor actions, agents must also accurately report which state they are in. The cluster relies on the agent to report its status (including role) accurately and does not indicate to the agent what role it currently believes it to be in.

Role Implications of OCF Return Codes
Monitor Return Code Description
OCF_NOT_RUNNING

Stopped

OCF_SUCCESS

Running (Unpromoted)

OCF_RUNNING_PROMOTED

Running (Promoted)

OCF_FAILED_PROMOTED

Failed (Promoted)

Other Failed (Unpromoted)

12.2.6.3. Clone Notifications

If the clone has the notify meta-attribute set to true and the resource agent supports the notify action, Pacemaker will call the action when appropriate, passing a number of extra variables. These variables, when combined with additional context, can be used to calculate the current state of the cluster and what is about to happen to it.

Environment Variables Supplied with Clone Notify Actions
Variable Description

OCF_RESKEY_CRM_meta_notify_type

Allowed values: pre, post

OCF_RESKEY_CRM_meta_notify_operation

Allowed values: start, stop

OCF_RESKEY_CRM_meta_notify_start_resource

Resources to be started

OCF_RESKEY_CRM_meta_notify_stop_resource

Resources to be stopped

OCF_RESKEY_CRM_meta_notify_active_resource

Resources that are running

OCF_RESKEY_CRM_meta_notify_inactive_resource

Resources that are not running

OCF_RESKEY_CRM_meta_notify_start_uname

Nodes on which resources will be started

OCF_RESKEY_CRM_meta_notify_stop_uname

Nodes on which resources will be stopped

OCF_RESKEY_CRM_meta_notify_active_uname

Nodes on which resources are running

The variables come in pairs, such as OCF_RESKEY_CRM_meta_notify_start_resource and OCF_RESKEY_CRM_meta_notify_start_uname, and should be treated as an array of whitespace-separated elements.

OCF_RESKEY_CRM_meta_notify_inactive_resource is an exception, as the matching uname variable does not exist since inactive resources are not running on any node.

Thus, in order to indicate that clone:0 will be started on sles-1, clone:2 will be started on sles-3, and clone:3 will be started on sles-2, the cluster would set:

Notification Variables

OCF_RESKEY_CRM_meta_notify_start_resource="clone:0 clone:2 clone:3"
OCF_RESKEY_CRM_meta_notify_start_uname="sles-1 sles-3 sles-2"

Note

Pacemaker will log but otherwise ignore failures of notify actions.

12.2.6.4. Interpretation of Notification Variables

Pre-notification (stop):

  • Active resources: $OCF_RESKEY_CRM_meta_notify_active_resource
  • Inactive resources: $OCF_RESKEY_CRM_meta_notify_inactive_resource
  • Resources to be started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources to be stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource

Post-notification (stop) / Pre-notification (start):

  • Active resources
    • $OCF_RESKEY_CRM_meta_notify_active_resource
    • minus $OCF_RESKEY_CRM_meta_notify_stop_resource
  • Inactive resources
    • $OCF_RESKEY_CRM_meta_notify_inactive_resource
    • plus $OCF_RESKEY_CRM_meta_notify_stop_resource
  • Resources that were started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources that were stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource

Post-notification (start):

  • Active resources:
    • $OCF_RESKEY_CRM_meta_notify_active_resource
    • minus $OCF_RESKEY_CRM_meta_notify_stop_resource
    • plus $OCF_RESKEY_CRM_meta_notify_start_resource
  • Inactive resources:
    • $OCF_RESKEY_CRM_meta_notify_inactive_resource
    • plus $OCF_RESKEY_CRM_meta_notify_stop_resource
    • minus $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources that were started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources that were stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource

12.2.6.5. Extra Notifications for Promotable Clones

Extra Environment Variables Supplied for Promotable Clones
Variable Description

OCF_RESKEY_CRM_meta_notify_promoted_resource

Resources that are running in the promoted role

OCF_RESKEY_CRM_meta_notify_unpromoted_resource

Resources that are running in the unpromoted role

OCF_RESKEY_CRM_meta_notify_promote_resource

Resources to be promoted

OCF_RESKEY_CRM_meta_notify_demote_resource

Resources to be demoted

OCF_RESKEY_CRM_meta_notify_promote_uname

Nodes on which resources will be promoted

OCF_RESKEY_CRM_meta_notify_demote_uname

Nodes on which resources will be demoted

OCF_RESKEY_CRM_meta_notify_promoted_uname

Nodes on which resources are running in the promoted role

OCF_RESKEY_CRM_meta_notify_unpromoted_uname

Nodes on which resources are running in the unpromoted role

12.2.6.6. Interpretation of Promotable Notification Variables

Pre-notification (demote):

  • Active resources: $OCF_RESKEY_CRM_meta_notify_active_resource
  • Promoted resources: $OCF_RESKEY_CRM_meta_notify_promoted_resource
  • Unpromoted resources: $OCF_RESKEY_CRM_meta_notify_unpromoted_resource
  • Inactive resources: $OCF_RESKEY_CRM_meta_notify_inactive_resource
  • Resources to be started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources to be promoted: $OCF_RESKEY_CRM_meta_notify_promote_resource
  • Resources to be demoted: $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Resources to be stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource

Post-notification (demote) / Pre-notification (stop):

  • Active resources: $OCF_RESKEY_CRM_meta_notify_active_resource
  • Promoted resources:
    • $OCF_RESKEY_CRM_meta_notify_promoted_resource
    • minus $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Unpromoted resources: $OCF_RESKEY_CRM_meta_notify_unpromoted_resource
  • Inactive resources: $OCF_RESKEY_CRM_meta_notify_inactive_resource
  • Resources to be started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources to be promoted: $OCF_RESKEY_CRM_meta_notify_promote_resource
  • Resources to be demoted: $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Resources to be stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource
  • Resources that were demoted: $OCF_RESKEY_CRM_meta_notify_demote_resource

Post-notification (stop) / Pre-notification (start)

  • Active resources:
    • $OCF_RESKEY_CRM_meta_notify_active_resource
    • minus $OCF_RESKEY_CRM_meta_notify_stop_resource
  • Promoted resources:
    • $OCF_RESKEY_CRM_meta_notify_promoted_resource
    • minus $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Unpromoted resources:
    • $OCF_RESKEY_CRM_meta_notify_unpromoted_resource
    • minus $OCF_RESKEY_CRM_meta_notify_stop_resource
  • Inactive resources:
    • $OCF_RESKEY_CRM_meta_notify_inactive_resource
    • plus $OCF_RESKEY_CRM_meta_notify_stop_resource
  • Resources to be started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources to be promoted: $OCF_RESKEY_CRM_meta_notify_promote_resource
  • Resources to be demoted: $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Resources to be stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource
  • Resources that were demoted: $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Resources that were stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource

Post-notification (start) / Pre-notification (promote)

  • Active resources:
    • $OCF_RESKEY_CRM_meta_notify_active_resource
    • minus $OCF_RESKEY_CRM_meta_notify_stop_resource
    • plus $OCF_RESKEY_CRM_meta_notify_start_resource
  • Promoted resources:
    • $OCF_RESKEY_CRM_meta_notify_promoted_resource
    • minus $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Unpromoted resources:
    • $OCF_RESKEY_CRM_meta_notify_unpromoted_resource
    • minus $OCF_RESKEY_CRM_meta_notify_stop_resource
    • plus $OCF_RESKEY_CRM_meta_notify_start_resource
  • Inactive resources:
    • $OCF_RESKEY_CRM_meta_notify_inactive_resource
    • plus $OCF_RESKEY_CRM_meta_notify_stop_resource
    • minus $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources to be started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources to be promoted: $OCF_RESKEY_CRM_meta_notify_promote_resource
  • Resources to be demoted: $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Resources to be stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource
  • Resources that were started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources that were demoted: $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Resources that were stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource

Post-notification (promote)

  • Active resources:
    • $OCF_RESKEY_CRM_meta_notify_active_resource
    • minus $OCF_RESKEY_CRM_meta_notify_stop_resource
    • plus $OCF_RESKEY_CRM_meta_notify_start_resource
  • Promoted resources:
    • $OCF_RESKEY_CRM_meta_notify_promoted_resource
    • minus $OCF_RESKEY_CRM_meta_notify_demote_resource
    • plus $OCF_RESKEY_CRM_meta_notify_promote_resource
  • Unpromoted resources:
    • $OCF_RESKEY_CRM_meta_notify_unpromoted_resource
    • minus $OCF_RESKEY_CRM_meta_notify_stop_resource
    • plus $OCF_RESKEY_CRM_meta_notify_start_resource
    • minus $OCF_RESKEY_CRM_meta_notify_promote_resource
  • Inactive resources:
    • $OCF_RESKEY_CRM_meta_notify_inactive_resource
    • plus $OCF_RESKEY_CRM_meta_notify_stop_resource
    • minus $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources to be started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources to be promoted: $OCF_RESKEY_CRM_meta_notify_promote_resource
  • Resources to be demoted: $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Resources to be stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource
  • Resources that were started: $OCF_RESKEY_CRM_meta_notify_start_resource
  • Resources that were promoted: $OCF_RESKEY_CRM_meta_notify_promote_resource
  • Resources that were demoted: $OCF_RESKEY_CRM_meta_notify_demote_resource
  • Resources that were stopped: $OCF_RESKEY_CRM_meta_notify_stop_resource

12.3. LSB Resource Agents (Init Scripts)

12.3.1. LSB Compliance

The relevant part of the LSB specifications includes a description of all the return codes listed here.

Assuming some_service is configured correctly and currently inactive, the following sequence will help you determine if it is LSB-compatible:

  1. Start (stopped):

    # /etc/init.d/some_service start ; echo "result: $?"
    
    • Did the service start?
    • Did the echo command print result: 0 (in addition to the init script’s usual output)?
  2. Status (running):

    # /etc/init.d/some_service status ; echo "result: $?"
    
    • Did the script accept the command?
    • Did the script indicate the service was running?
    • Did the echo command print result: 0 (in addition to the init script’s usual output)?
  3. Start (running):

    # /etc/init.d/some_service start ; echo "result: $?"
    
    • Is the service still running?
    • Did the echo command print result: 0 (in addition to the init
      script’s usual output)?
  4. Stop (running):

    # /etc/init.d/some_service stop ; echo "result: $?"
    
    • Was the service stopped?
    • Did the echo command print result: 0 (in addition to the init script’s usual output)?
  5. Status (stopped):

    # /etc/init.d/some_service status ; echo "result: $?"
    
    • Did the script accept the command?
    • Did the script indicate the service was not running?
    • Did the echo command print result: 3 (in addition to the init script’s usual output)?
  6. Stop (stopped):

    # /etc/init.d/some_service stop ; echo "result: $?"
    
    • Is the service still stopped?
    • Did the echo command print result: 0 (in addition to the init script’s usual output)?
  7. Status (failed):

    This step is not readily testable and relies on manual inspection of the script.

    The script can use one of the error codes (other than 3) listed in the LSB spec to indicate that it is active but failed. This tells the cluster that before moving the resource to another node, it needs to stop it on the existing one first.

If the answer to any of the above questions is no, then the script is not LSB-compliant. Your options are then to either fix the script or write an OCF agent based on the existing script.