Product SiteDocumentation Site

B.3. How Does the Cluster Interpret the OCF Return Codes?

The first thing the cluster does is check the return code against the expected result. If the result does not match the expected value, then the operation is considered to have failed and recovery action is initiated.
There are three types of failure recovery:
Table B.3. Types of recovery performed by the cluster
Recovery Type Description Action Taken by the Cluster
soft A transient error occurred Restart the resource or move it to a new location
hard A non-transient error that may be specific to the current node occurred Move the resource elsewhere and prevent it from being retried on the current node
fatal A non-transient error that will be common to all cluster nodes (I.e. a bad configuration was specified) Stop the resource and prevent it from being started on any cluster node

Assuming an action is considered to have failed, the following table outlines the different OCF return codes and the type of recovery the cluster will initiate when it is received.
Table B.4. OCF Return Codes and How They are Handled
OCF Return Code OCF Alias Description Recovery Type
0 OCF_SUCCESS Success. The command complete successfully. This is the expected result for all start, stop, promote and demote commands. soft
1 OCF_ERR_GENERIC Generic "there was a problem" error code. soft
2 OCF_ERR_ARGS The resource's configuration is not valid on this machine. Eg. Refers to a location/tool not found on the node. hard
3 OCF_ERR_UNIMPLEMENTED The requested action is not implemented. hard
4 OCF_ERR_PERM The resource agent does not have sufficient privileges to complete the task. hard
5 OCF_ERR_INSTALLED The tools required by the resource are not installed on this machine. hard
6 OCF_ERR_CONFIGURED The resource's configuration is invalid. Eg. A required parameters are missing. fatal
7 OCF_NOT_RUNNING The resource is safely stopped. The cluster will not attempt to stop a resource that returns this for any action. N/A
8 OCF_RUNNING_MASTER The resource is running in Master mode. soft
9 OCF_FAILED_MASTER The resource is in Master mode but has failed. The resource will be demoted, stopped and then started (and possibly promoted) again. soft
other NA Custom error code. soft

Although counter intuitive, even actions that return 0 (aka. OCF_SUCCESS) can be considered to have failed. This can happen when a resource that is expected to be in the Master state is found running as a Slave, or when a resource is found active on multiple machines..

B.3.1. Exceptions

  • Non-recurring monitor actions (probes) that find a resource active (or in Master mode) will not result in recovery action unless it is also found active elsewhere
  • The recovery action taken when a resource is found active more than once is determined by the multiple-active property of the resource
  • Recurring actions that return OCF_ERR_UNIMPLEMENTED do not cause any type of recovery