Appendix A: Details of How Quorum Works in a Failover Cluster

Applies To: Windows Server 2008, Windows Server 2008 R2

This appendix supplements the information in Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster, which we recommend that you read first.

In this appendix

Notes on the concept of quorum in Windows Server 2008 or Windows Server 2008 R2

Notes on quorum configuration in Windows Server 2008 or Windows Server 2008 R2

The process of achieving quorum

Notes on the concept of quorum in Windows Server 2008 or Windows Server 2008 R2

Quorum is not a new concept for clustering in Windows Server products, but the implementation, and thus the behavior, was new in Windows Server 2008. The new quorum model can be tailored more closely to the high-availability characteristics that the system administrator requires for the supported applications, and it is less tightly coupled to the way the cluster hardware is connected. Furthermore, the new quorum model eliminates the single point of failure that existed in previous clustering releases.

If you choose Node and Disk Majority for the quorum mode, when you select the disk witness, a \Cluster folder is created at the root of the selected disk, and cluster configuration information is stored there. The same information is also stored on each node.

  • \Cluster folder contains the cluster registry hive

  • No more checkpoint files or quorum log files

Quorum is important for three main reasons: it ensures consistency, it acts as a tie-breaker to avoid partitioning, and it ensures cluster responsiveness.

  • Because the basic idea of a cluster is multiple physical servers acting as a single logical server, a primary requirement for a cluster is that each of the physical servers always has a view of the cluster that is consistent with the other servers. The cluster hive acts as the definitive repository for all configuration information relating to the cluster. In the event that the cluster hive cannot be loaded locally on a node, the Cluster service does not start, because it is not able to guarantee that the physical server meets the requirement of having a view of the cluster that is consistent with the other servers.

  • A witness resource is used as the tie-breaker to avoid “split” scenarios and to ensure that one, and only one, collection of the members in a distributed system is considered “official.” A split scenario happens when all of the network communication links between two or more cluster nodes fail. In these cases, the cluster may be split into two or more partitions that cannot communicate with each other. Having only one official membership prevents unsynchronized access to data by other partitions (unsynchronized access can cause data corruption). Likewise, having only one official membership prevents clustered services or applications from being brought online by two different nodes: only a node in the collection of nodes that has achieved quorum can bring the clustered service or application online.

  • To ensure responsiveness, the quorum model ensures that whenever the cluster is running, enough members of the distributed system are operational and in communication with one another, and at least one replica of the cluster's current state can be guaranteed. This means that no additional time is required to bring members into communication or to determine whether a given replica is guaranteed.
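The tie-breaker role described above can be illustrated with a short sketch (Python, purely illustrative; the Cluster service is not implemented this way). The key property is arithmetic: with a fixed number of configured votes, a strict majority can belong to at most one partition, so at most one partition can achieve quorum after a network split.

```python
# Illustrative sketch (not actual Cluster service code): with a fixed set
# of votes, a strict majority can belong to at most one partition, so at
# most one side of a "split" scenario can achieve quorum.

def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """A partition has quorum only with a strict majority of all votes."""
    return partition_votes > total_votes // 2

# A five-vote cluster splits into partitions holding 3 and 2 votes.
total = 5
print(has_quorum(3, total))  # the 3-vote partition keeps running
print(has_quorum(2, total))  # the 2-vote partition stops cluster work

# No division of the votes can give two partitions a majority at once:
assert not any(has_quorum(a, total) and has_quorum(total - a, total)
               for a in range(total + 1))
```

This is why a witness vote matters in clusters with an even number of nodes: without it, a split into two equal halves leaves neither side with a majority.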

Notes on quorum configuration in Windows Server 2008 or Windows Server 2008 R2

The following notes apply to quorum configuration in Windows Server 2008 or Windows Server 2008 R2:

  • It is a good idea to review the quorum mode after the cluster is created, before placing the cluster into production. The cluster software selects the quorum mode for a new cluster, based on the number of nodes, and this is usually the most appropriate quorum mode for that cluster.

  • After the cluster is in production, do not change the quorum configuration unless you have determined that the change is appropriate for your cluster. However, if you decide to change the quorum configuration and have confirmed that the new configuration will be appropriate, you can make the change without stopping the cluster.

  • When nodes are waiting for other members to appear, the Cluster service still shows as started in the Service Control Manager. This is different behavior than in Windows Server 2003.

  • If the Cluster service shuts down because quorum has been lost, Event ID 1177 will appear in the system log.

  • A cluster can be forced into service when it does not have a majority by starting the Cluster service using the net start clussvc command with the /forcequorum option, as described in “Troubleshooting: how to force a cluster to start without quorum” in Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster.

The process of achieving quorum

Because a given cluster has a specific set of nodes and a specific quorum configuration, the cluster software on each node stores information about how many "votes" constitute a quorum for that cluster. If the number drops below the majority, the cluster stops providing services. Nodes continue listening for incoming connections from other nodes on port 3343, in case they appear again on the network, but the nodes will not begin to function as a cluster until quorum is achieved.
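As a hedged sketch of the vote arithmetic (Python, illustrative only; the Cluster service's internal bookkeeping is more involved), the threshold is a strict majority of all configured votes, and a disk witness simply contributes one additional vote:

```python
# Illustrative sketch of the quorum threshold: a strict majority of all
# configured votes must be present. Function name is invented for this
# example; it is not part of any Windows API.

def quorum_threshold(node_votes: int, witness_votes: int = 0) -> int:
    """Minimum number of votes that must be present for quorum."""
    total = node_votes + witness_votes
    return total // 2 + 1

# Node Majority on a 4-node cluster: 4 votes total, 3 must be present,
# so only one node failure can be tolerated.
print(quorum_threshold(4))                   # 3 of 4

# Node and Disk Majority on the same 4 nodes: the disk witness adds a
# vote (5 total), 3 must still be present, so the cluster can survive
# two node failures if the witness remains reachable.
print(quorum_threshold(4, witness_votes=1))  # 3 of 5
```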

There are several phases a cluster must go through in order to achieve quorum. At a high level, they are:

  1. As a given node comes up, it determines whether there are other cluster members with which it can communicate (this process may be in progress on multiple nodes simultaneously).

  2. Once communication is established with other members, the members compare their membership “views” of the cluster until they agree on one view (based on timestamps and other information).

  3. A determination is made as to whether this collection of members “has quorum,” or in other words, has enough members that a “split” scenario cannot exist. A “split” scenario would mean that another set of nodes that are in this cluster was running on a part of the network not accessible to these nodes.

  4. If there are not enough votes to achieve quorum, then the voters wait for more members to appear. If there are enough votes present, the Cluster service begins to bring cluster resources and applications into service.

  5. With quorum attained, the cluster becomes fully functional.
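The phases above can be condensed into a small walk-through (Python, illustrative only; the function and argument names are invented for this sketch and do not correspond to Cluster service internals):

```python
# Illustrative walk-through of the high-level phases of achieving quorum
# (invented names; not the actual Cluster service implementation).

def achieve_quorum(reachable_members: set, total_votes: int) -> str:
    # Phases 1-2: the node has discovered the members it can communicate
    # with, and they have agreed on one membership view (modeled here
    # simply as the set of reachable members).
    view = frozenset(reachable_members)

    # Phase 3: does this collection of members "have quorum", i.e. hold
    # enough votes that no other partition could also be official?
    votes_present = len(view)
    if votes_present <= total_votes // 2:
        # Phase 4, no majority: the voters wait for more members.
        return "waiting for more members"

    # Phases 4-5, majority present: bring clustered services and
    # applications into service; the cluster becomes fully functional.
    return "bringing clustered services online"

print(achieve_quorum({"node1", "node2", "node3"}, total_votes=5))
print(achieve_quorum({"node1", "node2"}, total_votes=5))
```

The second call models the waiting state described in the notes above: the Cluster service still shows as started while the node listens for the remaining members.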