Evaluating the Benefits of Clustering

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

A cluster is two or more computers working together to provide higher availability, reliability, and scalability than a single system can provide. When a failure occurs in a cluster, resources are redirected and the workload is redistributed. Microsoft cluster technologies guard against three specific types of failure:

  • Application and service failures, which affect application software and essential services.

  • System and hardware failures, which affect hardware components such as CPUs, drives, memory, network adapters, and power supplies.

  • Site failures in multisite organizations, which can be caused by natural disasters, power outages, or connectivity outages.

Benefits of Clustering

If one server in a cluster stops working, a process called failover automatically shifts the workload of the failed server to another server in the cluster. Failover ensures continuous availability of applications and data.
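The failover behavior described above can be pictured with a small model. The following Python sketch is illustrative only: the Node class, the fail_over function, and the node and workload names are hypothetical and are not part of any Microsoft cluster API. It also assumes that the surviving node with the fewest workloads takes over, which is a simplification chosen for the example rather than the documented placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster node that hosts a list of workloads."""
    name: str
    online: bool = True
    workloads: list = field(default_factory=list)


def fail_over(nodes, failed_name):
    """Mark one node as failed and move its workloads to surviving nodes."""
    failed = next(n for n in nodes if n.name == failed_name)
    failed.online = False
    survivors = [n for n in nodes if n.online]
    if not survivors:
        raise RuntimeError("no surviving node can take over the workload")
    # Assumption: hand each workload to the survivor that hosts the fewest.
    while failed.workloads:
        workload = failed.workloads.pop(0)
        target = min(survivors, key=lambda n: len(n.workloads))
        target.workloads.append(workload)
    return nodes


cluster = [
    Node("node-a", workloads=["file-share", "print-spooler"]),
    Node("node-b", workloads=["database"]),
]
fail_over(cluster, "node-a")
for node in cluster:
    print(node.name, "online" if node.online else "failed", node.workloads)
```

In this model, the workloads that were running on the failed node end up on the remaining node, which is the redistribution that failover provides.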

This ability to handle failure allows clusters to meet two requirements that are typical in most data center environments:

  • High availability. The ability to provide end users with access to a service for a high percentage of time while reducing unscheduled outages.

  • High reliability. The ability to reduce the frequency of system failure.

Additionally, Network Load Balancing clusters address the need for high scalability, which is the ability to add resources and computers to improve performance.
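To make "a high percentage of time" concrete, availability is commonly expressed as uptime divided by total time. The following Python sketch works through that arithmetic. The function names and the example targets are assumptions made for illustration; this guide does not prescribe a specific availability target.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability_percent(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Return availability as a percentage: uptime divided by total time."""
    uptime_minutes = period_minutes - downtime_minutes
    return 100.0 * uptime_minutes / period_minutes


def allowed_downtime_minutes(target_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the downtime budget, in minutes, for a target availability."""
    return period_minutes * (1 - target_percent / 100.0)


# Example targets chosen for illustration only.
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {budget:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.9 percent target leaves a budget of roughly 526 minutes (a little under nine hours) of downtime per year, which shows why reducing unscheduled outages matters for meeting a high-availability requirement.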

Limitations of Clustering

Server clusters are designed to keep applications available, not to protect data. Cluster technology cannot protect against failures caused by viruses, software corruption, or human error. To guard against these and other threats to data, organizations need solid data protection and recovery plans.

The Cluster service, the service behind server clusters, depends on compatible applications and services to operate properly. The software must respond appropriately when a failure occurs. Administrators must be able to configure where an application stores its data on the server cluster. Also, clients that are accessing a clustered application or service must be able to reconnect to the cluster virtual server after a failure has occurred and a new cluster node has taken over the application.
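The client-side reconnection requirement can be sketched as a simple retry loop. The following Python example is a minimal illustration, not a Microsoft API: call_with_reconnect, connect, and request are hypothetical names, and the attempt count and delay are arbitrary values that a real client would tune to the expected failover time.

```python
import time


def call_with_reconnect(connect, request, attempts=5, delay_seconds=2.0,
                        sleep=time.sleep):
    """Run request(session), reconnecting if the connection fails.

    connect: hypothetical callable that opens a session to the cluster
             virtual server and raises ConnectionError on failure.
    request: hypothetical callable that uses the session to do the work.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            session = connect()
            return request(session)
        except ConnectionError as error:
            # The node may be failing over; wait, then try the same
            # virtual server name again.
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Because the client always reconnects to the same virtual server name, it does not need to know which physical node took over the application after the failure.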

Only services and applications that use TCP/IP for client-server communication are supported on Network Load Balancing clusters and server clusters.

You cannot use Windows Server 2003 File Replication service (FRS) on shared cluster storage. You also cannot create domain-based Distributed File System (DFS) roots on shared cluster storage. Finally, you cannot use dynamic disks on shared cluster storage without the proper management tools. For more information about using dynamic disks on shared cluster storage, see article 237853, "Dynamic Disk Configuration Unavailable for Server Cluster Resources," in the Microsoft Knowledge Base. To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources. DFS and FRS are discussed in detail in "Designing and Deploying File Servers" in this book.

Clustering vs. Fault-Tolerant Hardware Components

Both clustering and fault-tolerant hardware protect your system from failures of components such as the CPU, memory, fan, or PCI bus. Fault-tolerant hardware is discussed in the section "Planning and Designing Fault-Tolerant Hardware Solutions" earlier in this chapter. Clustering and fault tolerance can be used together in a complete end-to-end solution, but be aware that the two technologies provide high availability in different ways.

Clustering can protect your system against an application or operating system failure, but a fault-tolerant standby server (or a server that uses hot-swappable hardware, which allows a device to be added while the server is running) cannot. With a cluster, you can also upgrade an application or operating system, or install a service pack or hotfix, without taking the cluster offline. Upgrades on standby servers, however, require taking that hardware offline.