Multi-site clusters can be considered “overlay” clusters in which each cluster site corresponds to a cluster node in a traditional cluster. The overlay cluster can be managed by a Cluster Ticket Registry (CTR), which guarantees that any cluster resource is active on no more than one cluster site. This is achieved with tickets, which are treated as a failover domain between cluster sites in case a site goes down.
The following sections explain the individual components and mechanisms that were introduced for multi-site clusters in more detail.
Tickets are, essentially, cluster-wide attributes. A ticket grants the right to run certain resources on a specific cluster site. Resources can be bound to a certain ticket by rsc_ticket constraints. The respective resources can be started at a site only if the ticket is available there. Conversely, if the ticket is revoked, the resources depending on that ticket must be stopped.
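The start and stop rule described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class Site and its methods bind, grant, revoke, and start are hypothetical names invented for illustration and are not part of any cluster API. The model only mirrors the rule that a ticket-bound resource runs while its ticket is available and stops when the ticket is revoked.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical model of one cluster site (illustration only, not a cluster API)."""
    name: str
    granted: set = field(default_factory=set)     # tickets currently available at this site
    bindings: dict = field(default_factory=dict)  # resource name -> ticket it depends on
    running: set = field(default_factory=set)     # resources currently active at this site

    def bind(self, resource, ticket):
        # Plays the role of an rsc_ticket constraint: the resource depends on the ticket.
        self.bindings[resource] = ticket

    def grant(self, ticket):
        self.granted.add(ticket)

    def start(self, resource):
        # A bound resource may start only if its ticket is available at this site.
        if self.bindings[resource] not in self.granted:
            return False
        self.running.add(resource)
        return True

    def revoke(self, ticket):
        # Revoking a ticket stops every resource that depends on it.
        self.granted.discard(ticket)
        self.running -= {r for r, t in self.bindings.items() if t == ticket}
```

For example, a resource bound to a ticket named ticketA (an arbitrary name) cannot be started until ticketA is granted to the site, and it is removed from the running set as soon as ticketA is revoked.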
The ticket is thus similar to a site quorum, i.e. the permission to manage or own the resources associated with that site. (One can also think of the current have-quorum flag as a special, cluster-wide ticket that is granted in case of node majority.)
Tickets can be granted and revoked either manually by administrators (which could be the default for classic enterprise clusters), or via the automated CTR mechanism described below.
A ticket can only be owned by one site at a time. Initially, none of the sites has a ticket. Each ticket must be granted once by the cluster administrator.
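The single-owner rule can be sketched as a small bookkeeping class. This is a minimal illustration under stated assumptions, not how the CTR is actually implemented: the name TicketRegistry, its methods, and the choice to raise an error on a conflicting grant are all hypothetical.

```python
class TicketRegistry:
    """Hypothetical ticket ownership bookkeeping (illustration only, not the real CTR)."""

    def __init__(self):
        # ticket name -> owning site; a missing key means no site holds the ticket,
        # which is the initial state for every ticket.
        self.owner = {}

    def grant(self, ticket, site):
        current = self.owner.get(ticket)
        # Enforce the invariant: a ticket is owned by at most one site at a time.
        if current is not None and current != site:
            raise RuntimeError(f"{ticket} is still held by {current}; revoke it first")
        self.owner[ticket] = site

    def revoke(self, ticket):
        # Revoking a ticket that nobody holds is treated as a no-op.
        self.owner.pop(ticket, None)
```

In this sketch a ticket has to be revoked from its current owner before it can be granted to another site, so two sites can never hold it simultaneously.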
The presence or absence of tickets for a site is stored in the CIB as cluster status. With regard to a certain ticket, a site has only two states: true (the site has the ticket) or false (the site does not have the ticket). The absence of a certain ticket (during the initial state of the multi-site cluster) is equivalent to the value false.
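The equivalence between an absent ticket and the value false can be expressed as a lookup with a default. In this sketch the dictionary site_status is a hypothetical stand-in for a site's ticket status, not the actual CIB layout, and has_ticket is an illustrative helper name.

```python
def has_ticket(site_status, ticket):
    """Return True only if the site's status records the ticket as true.

    site_status is a hypothetical dict of ticket name -> bool; a ticket that
    was never recorded is treated exactly like one recorded as False.
    """
    return site_status.get(ticket, False)


initial_status = {}                    # initial state: no ticket recorded at all
later_status = {"ticketA": True, "ticketB": False}
```

With these values, has_ticket(initial_status, "ticketA") and has_ticket(later_status, "ticketB") both evaluate to False, while has_ticket(later_status, "ticketA") evaluates to True.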