Microsoft Cluster Service

This module discusses the Microsoft® Cluster Service built into the Windows® 2000 Server operating system. It outlines the features and benefits of the Cluster Service, then briefly addresses deployment.

The Cluster Service addresses the issue of server availability. Called Microsoft Cluster Server in Windows NT®, this technology was renamed to Cluster Service in Windows 2000 to signify that it's just one of the services running on Windows 2000.

Windows 2000 provides three clustering technologies:

  • Network Load Balancing, which can be used to build front-end clusters that distribute incoming IP traffic across the nodes of the cluster.

  • Component Load Balancing, which can be used to build middle-tier clusters that load balance COM+ components. Microsoft is planning to include Component Load Balancing in Windows 2000 Datacenter Server, which is scheduled to ship in the middle of 2000.

  • Cluster Service, which can be used to build back-end clusters that provide high availability for your data, mail store, files, and so on.

[Diagram: architecture of a typical cluster deployment]

These clustering technologies are already in use: many ISPs have deployed clusters to improve the availability of their hosting services and to get the most out of their network capacity.

Of course, clustering by itself won't deliver high availability. You have to think about the whole picture: how your environment, practices, hardware, and other factors affect the availability of your systems. As part of that big picture, this module focuses on high availability and the operating system features that provide fail-over capabilities for applications and services.

With Cluster Service you can build back-end clusters with two nodes (four nodes with Windows 2000 Datacenter Server) to provide fail-over capabilities for applications.

You may be running applications on all the nodes in the cluster, with any one instance of an application running on only one node, in what is known as the shared-nothing model. If any component fails, it can cause the application or an entire node to fail. Cluster Service detects these failures and either restarts the application on the same node, if possible, or migrates the application to another node and restarts it there.

Node failure is detected through the use of heartbeats. All nodes in the cluster exchange heartbeats; when a node misses five heartbeats, the other nodes assume that it has failed. They respond by going into the regroup process, in which they recalculate cluster membership, evict the unreachable node from the cluster, and move its resources to the surviving nodes.

The heartbeat method is also used for applications. Unlike Network Load Balancing, Cluster Service can monitor and detect application failures as well as node failures. When Cluster Service detects that an application has failed, it tries to restart the application on the same node, according to the local restart policy defined for that resource. If it fails to restart the application on the same node, it may fail over the resource, together with all the resources it depends on, to another node.

The goal is to maintain the clients' ability to access back-end data, whether it's a mail store, file shares, print spoolers, or similar resources.
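
To make the failure-detection flow concrete, here is a minimal sketch in C. It is an illustration only, not Cluster Service code: the node names and bookkeeping are hypothetical, and only the five-missed-heartbeats rule comes from the description above.

    /* Illustration of heartbeat-based failure detection (not actual
     * Cluster Service code). A peer is evicted after five consecutive
     * missed heartbeats, as described above. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define MISSED_HEARTBEAT_LIMIT 5

    typedef struct {
        const char *name;
        int missed;      /* consecutive heartbeats not received */
        bool member;     /* still a member of the cluster? */
    } node_t;

    /* Called once per heartbeat period for each peer node. */
    static void on_heartbeat_tick(node_t *peer, bool heartbeat_received)
    {
        if (!peer->member)
            return;
        if (heartbeat_received) {
            peer->missed = 0;
            return;
        }
        if (++peer->missed >= MISSED_HEARTBEAT_LIMIT) {
            /* Regroup: recalculate membership, evict the silent node,
             * and let the survivors take over its resources. */
            peer->member = false;
            printf("%s missed %d heartbeats: evicted; failing over its resources\n",
                   peer->name, peer->missed);
        }
    }

    int main(void)
    {
        node_t peer = { "node2", 0, true };
        /* node2 answers twice, then goes silent. */
        bool received[] = { true, true, false, false, false, false, false };
        for (size_t i = 0; i < sizeof received / sizeof received[0]; i++)
            on_heartbeat_tick(&peer, received[i]);
        return 0;
    }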

In a typical two-node cluster, both nodes are attached to one or more shared storage buses. (All nodes in the cluster have physical access to the storage devices, but only one node at a time can read and write to a given disk; all the other nodes are blocked from accessing it. This exclusion helps guarantee that no two nodes mount the same file system on any of the disks.) A private interconnect between the nodes is used for heartbeats and cluster management, and one or more public networks connect the cluster nodes to clients.

The two nodes exchange heartbeats over the private network. If the private network were to fail, they would exchange heartbeats over the public network. Microsoft recommends using multiple private networks because doing so improves performance and eliminates a single point of failure.

When one node fails, the other node detects the failure and then goes into arbitration to see whether it can take over the disks; arbitration guarantees that only one host can access a disk at a time.

Windows 2000 Datacenter Server is being designed to support four nodes. This model follows the same concept as the two-node model: a fail-over, shared-nothing cluster with all nodes connected to the same storage bus. Windows 2000 Datacenter Server, however, will support only Fibre Channel, so if you plan to deploy it, you should build on Fibre Channel from the start.

As in the two-node model, you can have one or more private networks and one or more public networks. Each node can own any of the disks, and when a node fails, another node can pick up its disks and restart the applications that use the data on them.

Because disks can become a single point of failure in this configuration, you should use a RAID storage system; Windows 2000 clustering doesn't support software RAID, so the redundancy must be provided by the hardware. In addition, a special resource, called the quorum resource, is used to guarantee that only one node can own a disk at any time.

Microsoft recommends that you configure one volume between 5 and 500 MB to be the dedicated quorum resource. This quorum resource has several purposes.

Cluster Configuration Repository

First, it stores the cluster configuration database; that is, it is where Cluster Service persists its configuration. No matter which node in the cluster fails, the cluster can always restore its latest state from the quorum resource.

Tiebreaker for Split-Brain Situations

It's also used as a tiebreaker when nodes can no longer communicate (that is, are "split-brain"). When communication is lost, a node cannot really diagnose the problem: the other nodes may be dead, but it is also possible that only the communication links are. In this situation, to prevent each node from concluding that it is the sole survivor and bringing the same resources online independently, the nodes go into arbitration, using the quorum resource.

The node that owns the quorum resource puts a reservation on the device every three seconds; this guarantees that the second node cannot write to the quorum resource. When the second node determines that it cannot communicate with the quorum-owning node and wants to take over the quorum, it first puts a reset on the bus.

The reset breaks the reservation. The second node then waits about 10 seconds, giving the first node time to renew its reservation at least twice, and then tries to put its own reservation on the quorum. If the second node's reservation succeeds, the first node must have failed to renew, and the only reason for failing to renew is that the node is dead. At that point, the second node can take over the quorum resource and restart all the resources.
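
The following minimal C sketch simulates this challenge/defense protocol. In the real cluster the reservation is a SCSI reserve and the challenge is a bus reset; here a single flag stands in for the reservation, and only the three-second renewal period and the ten-second wait come from the description above.

    /* Simulation of quorum arbitration (not actual Cluster Service code). */
    #include <stdbool.h>
    #include <stdio.h>

    #define RENEW_PERIOD_S    3   /* owner re-reserves the quorum disk */
    #define CHALLENGE_WAIT_S 10   /* challenger's grace period after the reset */

    static bool quorum_reserved;  /* stands in for the SCSI reservation */

    /* The owning node's defense: renew the reservation while it is alive. */
    static void defender_tick(bool owner_alive)
    {
        if (owner_alive)
            quorum_reserved = true;
    }

    /* The challenger: break the reservation, give the owner time to renew
     * at least twice, then try to reserve the disk itself. */
    static bool challenge(bool owner_alive)
    {
        quorum_reserved = false;                 /* the bus reset breaks it */
        for (int t = 0; t < CHALLENGE_WAIT_S; t += RENEW_PERIOD_S)
            defender_tick(owner_alive);          /* owner renews if alive */

        if (quorum_reserved)
            return false;       /* owner renewed, so it is alive: back off */
        quorum_reserved = true; /* reservation succeeds: owner is dead */
        return true;
    }

    int main(void)
    {
        printf("owner alive: challenger %s\n",
               challenge(true) ? "takes over" : "backs off");
        printf("owner dead:  challenger %s\n",
               challenge(false) ? "takes over the quorum" : "backs off");
        return 0;
    }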

Cluster Data Consistency

A partition in time is another situation that the quorum resource addresses. Normally, when both nodes are working, they communicate with each other and replicate any configuration changes, so that at any given time all nodes in the cluster maintain the same state.

However, if one node is down (as in the case of a power failure) and you update the configuration on the other node, the node that is down has no way to learn the new state, because it cannot participate in the global update. Now suppose that you cannot bring up the node that has the most recent state, but you can bring up the node that went down first. Because that node has stale data, on its own it would not be able to restore the most recent state.

This is where the quorum resource kicks in. The node with the stale data acquires the quorum resource and replaces its own configuration database with the copy on the quorum, which is always up to date.
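
As a toy illustration of that recovery step, the sketch below compares a sequence number on the local copy with the one on the quorum and adopts the newer copy. The sequence-number scheme is an assumption for illustration; the actual format of the cluster database is not described here.

    /* Illustration only: a node with stale data adopts the quorum copy. */
    #include <stdio.h>

    typedef struct {
        unsigned sequence;   /* bumped on every global configuration update */
        /* ... configuration data would live here ... */
    } cluster_db_t;

    static void rejoin(cluster_db_t *local, const cluster_db_t *quorum)
    {
        if (quorum->sequence > local->sequence) {
            /* The local copy missed updates while the node was down;
             * the copy on the quorum resource is authoritative. */
            *local = *quorum;
            printf("stale database replaced from quorum (sequence %u)\n",
                   local->sequence);
        }
    }

    int main(void)
    {
        cluster_db_t quorum = { 42 };  /* updated while this node was down */
        cluster_db_t local  = { 37 };  /* the last state this node saw */
        rejoin(&local, &quorum);
        return 0;
    }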

[Diagrams: the steps involved in the Cluster Service fail-over mechanism]

There are a number of new features in the Windows 2000 Cluster Service:

  • It offers fail-over support for applications and services, for logical resources (such as IP addresses, network names, and file shares), and for physical resources (such as disks and modem pools).

  • You can configure dependencies. Resources are the basic units managed by the cluster: they can be brought online or offline, stopped, and otherwise controlled by the cluster. What's more, resources can depend on each other, another feature that distinguishes Cluster Service from Network Load Balancing and from clustering in Windows NT.

    Using the Service Control Manager in Windows 2000, you can configure dependencies, but only between services; you cannot make a service depend on some other kind of resource in the system. With Cluster Service, you can specify, for example, that a server running Microsoft SQL Server™ depend on a certain IP address and network name, and that an application depend on SQL Server. As a result, when you bring resources online or offline, all the resources will be brought online or offline in the right order. (A sketch of configuring such a dependency through the Cluster API appears after this group of bullets.)

  • Cluster Service setup is fully integrated into Windows 2000 Setup. It's part of the Optional Component Manager, so you can install Cluster Service during the configuration of Windows 2000. You simply have to check the Cluster Service check box. The configuration is much more streamlined.

  • You can also create an answer file for Windows 2000 Setup, and then script an entire Windows 2000 and Cluster Service setup.

  • Support for SYSPREP allows you to configure a server, take a snapshot of the system disk, and then clone this image to other servers.

  • One of the most important features of Windows 2000 is support for rolling upgrades. On average, upgrading from Windows NT 4.0 to Windows 2000 takes an hour. Without rolling upgrades, you would have to shut down an entire cluster for an hour to do an upgrade; with rolling upgrades, you can limit downtime to a few minutes.

  • More services are supported in Windows 2000, such as Distributed File System (Dfs), Simple Mail Transfer Protocol (SMTP), Network News Transfer Protocol (NNTP), and Internet Information Services (IIS), the Web server built into Windows 2000.

  • Client network recovery has been improved in Windows 2000. In Windows NT 4.0, when a link connecting a cluster node to the public network failed, Cluster Service couldn't detect it. In Windows 2000, on the other hand, all nodes in the cluster build a list of well-known third-party devices, such as routers and gateways, and ping them. If one node cannot see any of the devices, it negotiates with the rest of the cluster to determine whether the problem is a network interface failure, and, if necessary, the resources that depend on that network interface are moved to another node.

  • Windows 2000 Datacenter Server includes four-node support for customers that want to implement scenarios such as server consolidation or want to minimize the costs associated with clustering.
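
    As a concrete example of the dependency support described earlier in this list, here is a minimal sketch against the Cluster API (clusapi.h, linked with clusapi.lib). The resource names are hypothetical, and error handling is kept to a bare minimum.

        /* Sketch: making one cluster resource depend on another through
         * CLUSAPI. Resource names are hypothetical. */
        #include <windows.h>
        #include <clusapi.h>
        #include <stdio.h>

        int main(void)
        {
            /* NULL opens the cluster this machine belongs to. */
            HCLUSTER hCluster = OpenCluster(NULL);
            if (hCluster == NULL) {
                fprintf(stderr, "OpenCluster failed: %lu\n", GetLastError());
                return 1;
            }

            HRESOURCE hSql  = OpenClusterResource(hCluster, L"SQL Server");
            HRESOURCE hName = OpenClusterResource(hCluster, L"SQL Network Name");
            if (hSql != NULL && hName != NULL) {
                /* SQL Server must not come online before its network name. */
                DWORD status = AddClusterResourceDependency(hSql, hName);
                if (status != ERROR_SUCCESS)
                    fprintf(stderr, "AddClusterResourceDependency: %lu\n",
                            status);
            }

            if (hName != NULL) CloseClusterResource(hName);
            if (hSql != NULL)  CloseClusterResource(hSql);
            CloseCluster(hCluster);
            return 0;
        }

    When the group is brought online, the cluster walks the dependency tree and starts the network name before SQL Server; taking the group offline works in the reverse order.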

  • Improved management includes a wizard for configuring the virtual server, resource group, and application. The virtual server is the IP address and network name that clients use to access resources on the cluster.

  • The Cluster Administrator is now integrated with the Microsoft Management Console (MMC) component built in to Windows 2000 Server. In the MMC snap-in, under Services and Applications, you will see Cluster. Clicking Cluster invokes the Cluster Administrator, which enables you to manage any node in the cluster.

  • Support for Plug and Play technology makes it easier to add new resources without rebooting the system. For example, when you add a new storage device, you'll be able to create a volume and format it without rebooting. Cluster Service will automatically detect the new volume, and if it's connected to the shared storage bus, you'll be able to configure it as a cluster resource without shutting down any nodes or affecting other resources on the cluster.

  • A COM Automation interface has been added to the Cluster API (CLUSAPI). With the COM Automation server, you can use Visual Basic® Scripting Edition to write scripts for managing your cluster.

  • Two backup APIs have been added, one to back up the cluster database and one to restore it. They simplify restoring the cluster after events such as a cluster disk failure, or when it is necessary to rebuild the cluster. (A sketch using the backup call appears after this group of bullets.)

  • Windows 2000 includes a class store object in the Active Directory™ service, and you can use the class store to distribute applications to clients. For example, you can deploy the Cluster Administrator extension DLLs to the clients that you use to manage the cluster.
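
    Here is a similarly minimal sketch of the new backup API (also clusapi.h, linked with clusapi.lib). The backup path is hypothetical; the companion RestoreClusterDatabase call restores the database from such a backup.

        /* Sketch: snapshotting the cluster configuration database so the
         * cluster can be rebuilt after, say, a quorum disk failure. */
        #include <windows.h>
        #include <clusapi.h>
        #include <stdio.h>

        int main(void)
        {
            HCLUSTER hCluster = OpenCluster(NULL);   /* the local cluster */
            if (hCluster == NULL) {
                fprintf(stderr, "OpenCluster failed: %lu\n", GetLastError());
                return 1;
            }

            /* The path (hypothetical) must be writable by the caller. */
            DWORD status = BackupClusterDatabase(hCluster, L"C:\\ClusterBackup");
            if (status != ERROR_SUCCESS)
                fprintf(stderr, "BackupClusterDatabase: %lu\n", status);

            CloseCluster(hCluster);
            return 0;
        }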

  • Windows 2000 supports only fully certified cluster configurations, so when you deploy a cluster, you have to make sure that the entire configuration is on the Hardware Compatibility List (HCL), posted on https://www.microsoft.com/. The reason for this restriction is that Windows 2000 clustering depends on the SCSI reserve/release protocol, and while many SCSI devices work fine in a stand-alone server, they don't implement the SCSI protocol 100 percent. As a result, they fail, particularly in heterogeneous configurations.

    For example, if you take two servers and put a SCSI controller from one vendor in one node and a controller from a different vendor in the other, the cluster won't run. You will see strange failures, because the two controllers do bus resets differently even though they both supposedly support the SCSI protocol.

    Therefore, Windows 2000 Server supports only fully certified SCSI and Fibre Channel configurations, and Windows 2000 Datacenter Server supports only Fibre Channel. All the information you need on building a test configuration and getting it certified and added to the HCL is available at https://www.microsoft.com/hwtest/default.asp.

  • Windows 2000 supports only PCI network adapters. If you decide to add additional private networks to your cluster, you have to use PCI network adapters and, if possible, use the same adapters that were used in the original configuration.

  • Microsoft recommends having multiple private networks in your clusters to eliminate single points of failure.

  • Think about capacity. If you configure your cluster with two nodes, each of them must have sufficient capacity to run all the resources on the cluster: if one node fails, all the resources fail over to the other node, which then runs every application and service for the cluster.

  • You also have to think about the distance between nodes. You are limited to about 20 meters with SCSI (or 10 kilometers with Fibre Channel), which is not enough to build geographically dispersed clusters.

Another area of consideration concerns network characteristics.

For example, you have to decide whether to use private or public networks. When you configure the cluster, you have to specify the role for each network in the cluster. Usually, you will configure private interconnects as private networks used only for cluster communication. Public networks connecting the cluster to the clients can be configured either as client-only networks or mixed networks.

While mixed networks can carry both public and private communication, this approach isn't recommended. It makes it possible for someone on the public network to spoof the "poison packets" that cluster nodes send to one another when they detect inconsistencies in the cluster.

Because Cluster Service depends on domain controllers, you need to ensure the reliability of the domain. You must have enough domain controller replicas to guarantee that one is always available; Cluster Service will fail if it cannot find a domain controller.

Also, if you use Kerberos, tickets can expire; when they do, Cluster Service will try to renew them. If Cluster Service cannot reach a domain controller to renew a ticket, it will fail, and the cluster will fall apart.

In Windows 2000, you can use the same troubleshooting tools that you used in Windows NT 4.0.

One especially useful tool is the Cluster Log, where the Cluster Service on each node logs all the operations. The Cluster Log is usually stored in the Cluster Directory.

In Windows 2000, Cluster Log is turned on by default.

If it's not turned on, you have to create a system environment variable to turn it on (the variable, ClusterLog, holds the path of the log file). Once you've created this variable, you just have to restart the Cluster Service on the node, and the Cluster Log will start logging information. Microsoft support technicians prefer the Cluster Log as the tool for debugging customer problems.

Microsoft continues to enhance Cluster Service in future releases.
