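How many failures a resource may accumulate on a node before it is moved elsewhere is controlled by the migration-threshold resource meta-attribute. [16]
If you set migration-threshold=N for a resource, it will be banned from the original node after N failures.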
Note
migration-threshold is per resource, even though fail counts are tracked per operation. The operation fail counts are added together to compare against the migration-threshold.
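The summing rule in the note above can be sketched in a few lines of Python. This is only an illustrative model, not Pacemaker code: the function name is_banned and the operation keys are hypothetical, and the sketch assumes a positive, finite threshold.

```python
# Illustrative model only; not Pacemaker code.
def is_banned(op_fail_counts, migration_threshold):
    """Return True once the summed per-operation fail counts reach the threshold.

    op_fail_counts maps a (hypothetical) operation key to that operation's
    fail count for one resource on one node.
    """
    total = sum(op_fail_counts.values())
    return total >= migration_threshold


# Hypothetical fail counts for a single resource on a single node.
counts = {"monitor_10s": 1, "start_0": 0, "promote_0": 1}

print(is_banned(counts, migration_threshold=2))  # True: 1 + 0 + 1 reaches 2
print(is_banned(counts, migration_threshold=3))  # False: total is still below 3
```

In this model neither operation has failed twice on its own, yet the resource is banned with a threshold of 2 because the counts are added together.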
By default, fail counts persist until an administrator clears them manually with crm_resource --cleanup or crm_failcount --delete (ideally after first fixing the cause of the failure). Fail counts can also expire automatically if you set the failure-timeout resource meta-attribute.
Important
For example, setting migration-threshold=2 and failure-timeout=60s would cause the resource to move to a new node after 2 failures, and allow it to move back (depending on stickiness and constraint scores) after one minute.
Note
failure-timeout is measured since the most recent failure. That is, older failures do not individually time out and lower the fail count. Instead, all failures are timed out simultaneously (and the fail count is reset to 0) if there is no new failure for the timeout period.
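The expiry rule described in this note can be modeled as follows. This is a minimal sketch under stated assumptions: the class FailCounter and its methods are hypothetical, times are plain seconds supplied by the caller, and expiry is evaluated at the exact moment of a query, whereas a real cluster may only notice an expired fail count the next time it re-evaluates its state.

```python
# Illustrative model only; not Pacemaker code.
class FailCounter:
    """Fail count whose failures all expire together, measured from the latest one."""

    def __init__(self, failure_timeout):
        self.failure_timeout = failure_timeout  # seconds
        self.count = 0
        self.last_failure = None  # time of the most recent failure, if any

    def _expire(self, now):
        # Older failures never time out individually: the whole count is
        # reset only if no new failure occurred for the full timeout period.
        if (self.last_failure is not None
                and now - self.last_failure >= self.failure_timeout):
            self.count = 0
            self.last_failure = None

    def record_failure(self, now):
        self._expire(now)
        self.count += 1
        self.last_failure = now

    def current(self, now):
        self._expire(now)
        return self.count


fc = FailCounter(failure_timeout=60)
fc.record_failure(now=0)
fc.record_failure(now=50)
print(fc.current(now=100))  # 2: only 50s since the most recent failure
print(fc.current(now=110))  # 0: 60s without a new failure, so everything expires
```

Note that at 100 seconds the first failure is already older than the timeout, but it still counts because the timer restarted with the second failure.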
If the cluster property start-failure-is-fatal is set to true (which is the default), start failures cause the fail count to be set to INFINITY and thus always cause the resource to move immediately.
Important
failure-timeout.