Overview of Configuring the Head Node for Failover with Windows HPC Server 2008 R2
Applies To: Windows HPC Server 2008 R2
This guide describes how to deploy Windows HPC Server 2008 R2 with the head node configured in a failover cluster, where the servers are running Windows Server 2008 R2. This topic provides an overview of the configuration. For a detailed list of requirements for the configuration, see Requirements for Windows HPC Server 2008 R2 in Failover Clusters.
In this section
Services and resources during failover of the head node
In an HPC cluster, if you want to provide high availability for the head node, you can configure it in a failover cluster. The failover cluster contains servers that work together, so if one server in the failover cluster fails, another server in the cluster automatically begins providing service (in a process known as failover).
The word “cluster” can refer to a head node with compute nodes running software in Windows HPC Server 2008 R2, or to a set of servers running Windows Server 2008 R2 that are using the failover cluster feature. The word “node” can refer to a head node, compute node, or WCF broker node running software in Windows HPC Server 2008 R2, or to one of the servers in a failover cluster. In this document, servers in the context of a failover cluster are usually referred to as “servers,” to distinguish failover cluster nodes from an HPC cluster head node or compute node. Also, the word “cluster” is placed in an appropriate phrase (such as “failover cluster”) or used in context in a sentence to distinguish which type of cluster is being referred to.
Each of the servers in a failover cluster must have access to the failover cluster storage. Figure 1 shows the failover of head node services that can run on either of two servers in a failover cluster:
Figure 1 Failover of head node services in HPC cluster
To support the head node, you must also configure SQL Server, either as a SQL Server failover cluster (for higher availability) or as a stand-alone SQL Server. Figure 2 shows a configuration with one failover cluster that runs the head node and another failover cluster that runs SQL Server.
Figure 2 Failover clusters supporting the head node and SQL Server
In the preceding figure (Figure 2), the failover cluster storage for the head node includes one disk (LUN) for a clustered file server and one disk as a disk witness. The disk witness is necessary for any failover cluster that has an even number of nodes (the head node failover cluster has two).
When both the head node and SQL Server run in failover clusters, they must run in separate failover clusters. Figure 3 illustrates that when you configure multiple failover clusters, you must limit the exposure of each storage volume or logical unit number (LUN) to the servers in one failover cluster:
Figure 3 Two failover clusters, each with its own LUNs
Note that for the maximum availability of any server, it is important to follow best practices for server management—for example, carefully managing the physical environment of the servers, testing software changes before fully implementing them, and carefully keeping track of software updates and configuration changes on all servers in a failover cluster.
When the head node is configured in a failover cluster, for the network topology, we recommend either Topology 2 or Topology 4 (the topology shown in Figures 1 and 2). In these topologies, there is an enterprise network and at least one other network. Using multiple networks in this way helps avoid single points of failure. For more information about network topologies, see Requirements for Windows HPC Server 2008 R2 in Failover Clusters.
For more information about network topologies in Windows HPC Server 2008 R2, see Appendix 1: HPC Cluster Networking (http://go.microsoft.com/fwlink/?LinkId=198313) in the Design and Deployment Guide for Windows HPC Server 2008 R2.
Services and resources during failover of the head node
This section summarizes some of the differences between running the head node for Windows HPC Server 2008 R2 on a single server and running it in a failover cluster.
- In a failover cluster, the head node cannot also be a compute node or WCF broker node. These options are disabled when the head node is configured in a failover cluster.
- For connections to a head node that is configured in a failover cluster, do not use the name of a physical server. Instead, use the name that appears in Failover Cluster Manager. To find this name, in the appropriate failover cluster, expand Services and applications, select the clustered instance of the head node, and then, in the center pane, view the name under Server Name. After the head node is configured in a failover cluster, it is not tied to a single physical server, and it does not have the name of a physical server.
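For example, scripts that connect to the clustered head node should use the cluster network name rather than the name of either physical server. The following sketch assumes a hypothetical clustered instance name, HEADNODE-HA; substitute the name shown under Server Name in Failover Cluster Manager:

```powershell
# HEADNODE-HA is a hypothetical cluster network name used for illustration only;
# use the name shown under Server Name in Failover Cluster Manager.
# Load HPC PowerShell (the HPC Pack 2008 R2 snap-in), then query the nodes
# through the clustered instance, regardless of which physical server
# currently owns it:
Add-PSSnapin Microsoft.HPC
Get-HpcNode -Scheduler HEADNODE-HA
```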
The following table summarizes what happens to each service or resource during failover of the head node:
|Service or resource|What happens in a failover cluster|
|---|---|
|HPC SDM Store Service, HPC Job Scheduler Service, HPC Session Service, HPC Diagnostics Service|Fail over to the other server in the failover cluster.|
|Four file shares that are used by the head node|Ownership fails over to the other server in the failover cluster.|
|HPC Management Service, HPC MPI Service, HPC Node Manager Service, HPC Reporting Service|Start automatically and run on each individual server. The failover cluster does not monitor these services for failure.|
|File sharing for compute nodes|Fails over to the other server in the failover cluster if configured through the Failover Cluster Manager snap-in.|
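As a sketch, you can inspect the clustered services and file shares with the Failover Clustering cmdlets that are available in Windows Server 2008 R2. Run these commands on one of the servers in the failover cluster; resource names vary by installation, so this only illustrates the commands:

```powershell
# Load the Failover Clustering module, then list the clustered head node
# instance, the server that currently owns it, and the state of each
# clustered resource.
Import-Module FailoverClusters
Get-ClusterGroup
Get-ClusterResource | Format-Table Name, ResourceType, State, OwnerGroup
```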
The HPC Basic Profile Web Service and the HPC Storage Management Surrogate service are also installed on a head node (whether that head node is in a failover cluster or not). However, these services are not activated by default. For information about uses and requirements for the HPC Basic Profile Web Service, see HPC Server Basic Profile Web Service Operations Guide (http://go.microsoft.com/fwlink/?LinkId=198311).