
4.5.2. Monitoring Resources for Failure

When Pacemaker first starts a resource, it runs one-time monitor operations (referred to as probes) to ensure the resource is running where it’s supposed to be, and not running where it’s not supposed to be. (This behavior can be affected by the resource-discovery location constraint property.)
Other than those initial probes, Pacemaker does not (by default) check that the resource remains healthy. [10] You must explicitly configure monitor operations to perform these checks.

Example 4.7. An OCF resource with a recurring health check

<primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
  <operations>
     <op id="Public-IP-start" name="start" timeout="60s"/>
     <op id="Public-IP-monitor" name="monitor" interval="60s"/>
  </operations>
  <instance_attributes id="params-public-ip">
     <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
  </instance_attributes>
</primitive>

By default, a monitor operation checks that the resource is running where it is supposed to be. The operation's role property can be used for further checking.
For example, if a resource has one monitor operation with interval=10 role=Started and a second monitor operation with interval=11 role=Stopped, the cluster will run the first monitor on any nodes it thinks should be running the resource, and the second monitor on any nodes it thinks should not be running the resource. (The second is for the truly paranoid, who want to know when an administrator manually starts a service by mistake.)
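
The two-monitor setup described above might be configured roughly as follows. This is a sketch that reuses the Public-IP resource from Example 4.7; the op ids Public-IP-monitor-started and Public-IP-monitor-stopped are hypothetical names chosen for illustration, and the intervals are written with an explicit "s" unit as an assumption matching the earlier example.

```xml
<primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
  <operations>
     <!-- Runs on nodes where the cluster thinks the resource should be running -->
     <op id="Public-IP-monitor-started" name="monitor" interval="10s" role="Started"/>
     <!-- Runs on nodes where the cluster thinks the resource should not be running -->
     <op id="Public-IP-monitor-stopped" name="monitor" interval="11s" role="Stopped"/>
  </operations>
  <instance_attributes id="params-public-ip">
     <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
  </instance_attributes>
</primitive>
```

The two monitors use different intervals (10 and 11 seconds) because the cluster tells a resource's operations apart by name and interval, so two monitor operations on the same resource should not share an interval.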

Note

Currently, monitors with role=Stopped are not implemented for clone resources.


[10] Currently, anyway. Automatic monitoring operations may be added in a future version of Pacemaker.