SystemHealth
From Cluster Labs
System health is a feature that lets resources in a system implicitly include a score indicating the health of each node.
This feature is implemented in two parts. The first part is a change in Pacemaker. The second part consists of health daemons that set health attributes.
Changes in Pacemaker
Pacemaker's policy engine will include a number of new configuration options. The first is node-health-strategy, which can take the following values:
- none
- migrate-on-red
- only-green
- progressive
- custom
none is the default. With this setting, health attributes have no effect on weight calculations within Pacemaker.
The next three values (migrate-on-red, only-green, and progressive) affect weight calculations within Pacemaker as follows. Every resource defined within Pacemaker will now look for node attributes whose names start with #health, for example #health, #health-ipmi, #health-smart, or #health-foo-bar. Such an attribute can have one of the following values:
- red
- yellow
- green
- integer value
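To make the list of permitted values concrete, here is a minimal Python sketch of how a tool might validate the value of a #health attribute. The function name parse_health_value is hypothetical, and matching colour names case-sensitively is an assumption of this sketch, not documented Pacemaker behaviour.

```python
from typing import Union

# The three colour names a #health attribute may hold.
HEALTH_COLOURS = ("red", "yellow", "green")


def parse_health_value(raw: str) -> Union[str, int]:
    """Return the colour name, or the integer, stored in a #health attribute.

    Hypothetical helper: raises ValueError for anything else.
    """
    value = raw.strip()
    if value in HEALTH_COLOURS:
        return value
    try:
        return int(value)
    except ValueError:
        raise ValueError(f"not a valid health value: {raw!r}") from None
```

For example, parse_health_value("yellow") returns "yellow" and parse_health_value("-250") returns -250, while "blue" is rejected.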
The value of each attribute on a node whose name starts with #health is added to whatever other weights are defined for resources in the system, with red, yellow, and green first converted to scores according to the selected strategy. The resulting totals determine which node a resource runs on.
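The summation described above can be sketched in Python as follows. This is an illustration only, assuming every #health value has already been converted to an integer; the function names node_health_score and total_weights, and the node and attribute names in the example, are hypothetical.

```python
from typing import Dict


def node_health_score(attributes: Dict[str, int]) -> int:
    # Only attributes whose names begin with "#health" contribute.
    return sum(value for name, value in attributes.items()
               if name.startswith("#health"))


def total_weights(base: Dict[str, int],
                  attrs_by_node: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    # Add each node's health score to the weight the resource already has there.
    return {node: weight + node_health_score(attrs_by_node.get(node, {}))
            for node, weight in base.items()}


# Hypothetical example: node1 starts ahead, but a poor SMART score pulls it down.
weights = total_weights(
    {"node1": 100, "node2": 50},
    {"node1": {"#health-smart": -80, "uptime": 999},
     "node2": {"#health-ipmi": 0}},
)
best = max(weights, key=weights.get)
```

Here weights is {"node1": 20, "node2": 50}, so the resource would prefer node2; the non-health attribute "uptime" is ignored.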
The three strategies differ as follows:
- migrate-on-red: red has a value of -INF; yellow and green have a value of 0.
- only-green: red and yellow have a value of -INF; green has a value of 0.
- progressive: red, yellow, and green take their values from the corresponding policy engine settings:
  - node-health-red (default: -INF)
  - node-health-yellow (default: 0)
  - node-health-green (default: 0)
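The strategy table above can be modelled with a short Python sketch. It uses math.inf to stand in for Pacemaker's -INF and assumes that -INF absorbs any other value when scores are combined; that combination rule and all function names here are assumptions of the sketch, not Pacemaker's code. The custom strategy is deliberately not modelled.

```python
import math
from typing import Dict, Iterable, Optional, Union

NEG_INF = -math.inf  # stands in for Pacemaker's -INF in this sketch

# Fixed colour scores for the two non-configurable strategies.
FIXED_SCORES = {
    "migrate-on-red": {"red": NEG_INF, "yellow": 0, "green": 0},
    "only-green": {"red": NEG_INF, "yellow": NEG_INF, "green": 0},
}

# Defaults for node-health-red/-yellow/-green used by "progressive".
PROGRESSIVE_DEFAULTS = {"red": NEG_INF, "yellow": 0, "green": 0}


def colour_score(strategy: str, colour: str,
                 settings: Optional[Dict[str, float]] = None) -> float:
    """Translate red/yellow/green into a score under the given strategy."""
    if strategy in FIXED_SCORES:
        return FIXED_SCORES[strategy][colour]
    if strategy == "progressive":
        # settings uses the option names, e.g. {"node-health-yellow": -50}
        return (settings or {}).get(f"node-health-{colour}",
                                    PROGRESSIVE_DEFAULTS[colour])
    raise ValueError(f"strategy {strategy!r} is not modelled here")


def health_score(strategy: str, values: Iterable[Union[str, int]],
                 settings: Optional[Dict[str, float]] = None) -> float:
    """Combine all #health attribute values of one node into one score."""
    if strategy == "none":
        return 0.0  # health attributes are ignored entirely
    total = 0.0
    for value in values:
        score = value if isinstance(value, int) else colour_score(strategy, value, settings)
        if score == NEG_INF:
            return NEG_INF  # assumption: -INF absorbs every other value
        total += score
    return total
```

Under migrate-on-red, a single red attribute drives the node's score to -INF, so no resource may run there, while under progressive an administrator could set node-health-yellow to a moderate negative value so that yellow nodes are merely less preferred.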
custom tells Pacemaker that the system administrator will define rules that include whichever health attributes they deem appropriate for their setup.
Health Daemons
A health daemon is a program that periodically queries, or listens for events about, the health status of a system. When it detects a change in health, it notifies Pacemaker via the attrd_updater command.
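A polling health daemon of the kind described above might look like the following Python sketch. It calls attrd_updater with its -n (attribute name) and -U (update value) options and only notifies Pacemaker when the reading changes. The check function, the attribute name #health-example, and the injectable run parameter (added so the loop can be exercised without a cluster) are hypothetical.

```python
import subprocess
import time
from typing import Callable, List, Optional

Runner = Callable[[List[str]], object]


def run_command(cmd: List[str]) -> None:
    # Raise CalledProcessError if attrd_updater exits non-zero.
    subprocess.run(cmd, check=True)


def report_health(name: str, value: str, run: Runner = run_command) -> None:
    """Set one node attribute via attrd_updater (-n name, -U value)."""
    run(["attrd_updater", "-n", name, "-U", value])


def health_daemon(check: Callable[[], str], name: str, interval: float,
                  run: Runner = run_command,
                  iterations: Optional[int] = None) -> None:
    """Poll check() every interval seconds; notify Pacemaker only on change.

    iterations=None polls forever; a number limits the loop (useful in tests).
    """
    last: Optional[str] = None
    polls = 0
    while iterations is None or polls < iterations:
        current = check()
        if current != last:
            report_health(name, current, run)
            last = current
        polls += 1
        time.sleep(interval)
```

On a cluster node this could be started as health_daemon(my_check, "#health-example", interval=30), where my_check is any function returning "red", "yellow", "green", or an integer as a string. Reporting only changes keeps attribute updates, and the resulting recalculations, to a minimum.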
Mechanisms that report on the health of a system include:
- IPMI (Intelligent Platform Management Interface) http://www.intel.com/design/servers/ipmi/ipmi.htm
- iBMC (Integrated Baseboard Management Controller)
- /var/log/mcelog
- /var/log/messages
- RSA2 (Remote Supervisor Adapter 2)
- sysfs (Linux kernel filesystem)
- SMART (Self-Monitoring, Analysis, and Reporting Technology)
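As an example of turning one of these sources into a health colour, the Python sketch below reads a Linux sysfs thermal zone, which reports temperature in millidegrees Celsius, and maps it to red, yellow, or green. The default path exists on many but not all Linux systems, and the 70 and 85 degree thresholds are hypothetical values chosen for illustration.

```python
from pathlib import Path

# Hypothetical thresholds in degrees Celsius; tune them for the hardware.
YELLOW_AT_C = 70.0
RED_AT_C = 85.0


def thermal_health(path: str = "/sys/class/thermal/thermal_zone0/temp") -> str:
    """Map a sysfs temperature reading (millidegrees C) to red/yellow/green."""
    celsius = int(Path(path).read_text().strip()) / 1000.0
    if celsius >= RED_AT_C:
        return "red"
    if celsius >= YELLOW_AT_C:
        return "yellow"
    return "green"
```

A function like this could serve as the check passed to a polling daemon, with the result published under a name such as #health-thermal (also hypothetical).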

