Performing Maintenance on WCF Broker Nodes in a Failover Cluster with Windows HPC Server 2008 R2

09/20/2010

Applies To: Windows HPC Server 2008 R2

This topic explains how to perform maintenance on WCF broker nodes in a failover cluster where the WCF broker nodes are running Windows HPC Server 2008 R2. For information about the process of configuring a WCF broker node in a failover cluster, see Steps for Setting up Windows HPC Server 2008 R2 with Failover Clustering for WCF Broker Nodes.

Overview of performing maintenance on WCF broker nodes in a failover cluster

To review the sequence of actions to use when performing maintenance on servers running one or more WCF broker nodes in a failover cluster, see one of the following sections:

- Performing maintenance on one physical server at a time
- Performing maintenance on all the servers in a failover cluster at the same time

Performing maintenance on one physical server at a time

If you can perform maintenance on one server at a time, plan to start with standby servers (those not currently running an instance of a WCF broker node) and finish with active servers. The following list outlines the sequence of actions to take:

1. Use HPC Cluster Manager to take a physical server (WCF broker node) offline, and optionally, to shrink running jobs on the server you are taking offline. This ensures that the server will not accept additional jobs. For more information, see Use HPC Cluster Manager to take a physical server offline or bring it online, later in this topic.
2. Use Failover Cluster Manager to see where the clustered instance of the WCF broker node is currently running. If it is running on the server that you want to perform maintenance on, move the clustered instance to a different server. For more information, see Use Failover Cluster Manager to control a clustered instance.
3. Use Failover Cluster Manager to pause the failover cluster node (server) that you want to perform maintenance on. For more information, see Use Failover Cluster Manager to pause or resume a failover cluster node.
4. Perform the necessary maintenance on the server.
5. Use Failover Cluster Manager to resume the failover cluster node that you performed maintenance on. For more information, see Use Failover Cluster Manager to pause or resume a failover cluster node.
6. Use HPC Cluster Manager to bring the physical server (WCF broker node) online. This allows the server to begin accepting jobs again. For more information, see Use HPC Cluster Manager to take a physical server offline or bring it online, later in this topic.
7. As needed, repeat the process.
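
The ordering rule above (standby servers first, active servers last) and the per-server sequence can be sketched as a small planning helper. This is only an illustration: the function name, the server and instance names, and the `instance_owner` mapping are hypothetical, and the helper produces a checklist rather than performing any action.

```python
from typing import Dict, List, Tuple


def plan_rolling_maintenance(
    servers: List[str],
    instance_owner: Dict[str, str],
) -> List[Tuple[str, List[str]]]:
    """Return (server, steps) pairs, standby servers first, active servers last.

    instance_owner maps each clustered WCF broker instance to the server that
    currently owns it (hypothetical input; in practice this is the Current
    Owner shown in Failover Cluster Manager).
    """
    owners = set(instance_owner.values())
    # False sorts before True, so servers that own no instance come first.
    # sorted() is stable, so the given order is kept within each group.
    ordered = sorted(servers, key=lambda s: s in owners)

    plan = []
    for server in ordered:
        steps = [f"HPC Cluster Manager: take {server} offline"]
        for instance, owner in instance_owner.items():
            if owner == server:
                steps.append(
                    f"Failover Cluster Manager: move {instance} off {server}"
                )
        steps += [
            f"Failover Cluster Manager: pause {server}",
            f"Perform maintenance on {server}",
            f"Failover Cluster Manager: resume {server}",
            f"HPC Cluster Manager: bring {server} online",
        ]
        plan.append((server, steps))
    return plan
```

The sketch uses the ownership known at planning time. Because moving a clustered instance changes its owner, check the Current Owner again before starting on each server.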

Performing maintenance on all the servers in a failover cluster at the same time

If you must perform maintenance on all the servers in a failover cluster at the same time, plan for downtime, and notify users as appropriate. The following list outlines the sequence of actions to take:

1. Use HPC Cluster Manager to take all affected WCF broker nodes offline, and optionally, to shrink running jobs on the servers you are taking offline. This ensures that those servers will not accept additional jobs. For more information, see Use HPC Cluster Manager to take a physical server offline or bring it online, later in this topic.
2. Use Failover Cluster Manager to take all clustered instances of WCF broker nodes (running in the failover cluster) offline. For more information, see Use Failover Cluster Manager to control a clustered instance.
3. Perform the necessary maintenance on the servers.
4. Use Failover Cluster Manager to bring all clustered instances of WCF broker nodes online. For more information, see Use Failover Cluster Manager to control a clustered instance.
5. Use HPC Cluster Manager to bring all affected WCF broker nodes online. This allows these servers to begin accepting jobs again. For more information, see Use HPC Cluster Manager to take a physical server offline or bring it online, later in this topic.
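
Note that the sequence above is symmetric: the layers are brought back in the reverse of the order in which they were taken down (HPC first and cluster second on the way down, cluster first and HPC second on the way up). A minimal sketch of that checklist, with hypothetical function, node, and instance names:

```python
from typing import List


def plan_full_outage(broker_nodes: List[str], instances: List[str]) -> List[str]:
    """Build the full-outage checklist described in the list above."""
    # Going down: stop accepting jobs first, then take the instances offline.
    shutdown = [f"HPC Cluster Manager: take {n} offline" for n in broker_nodes]
    shutdown += [f"Failover Cluster Manager: take {i} offline" for i in instances]

    # Coming up: bring the instances online first, then accept jobs again.
    startup = [f"Failover Cluster Manager: bring {i} online" for i in instances]
    startup += [f"HPC Cluster Manager: bring {n} online" for n in broker_nodes]

    return shutdown + ["Perform maintenance on all servers"] + startup
```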

Procedures for performing maintenance on WCF broker nodes in a failover cluster

As outlined in the preceding lists, use the following procedures to perform maintenance on WCF broker nodes in a failover cluster.

Use HPC Cluster Manager to take a physical server offline or bring it online

After you use HPC Cluster Manager to take a physical server offline, the server will not accept additional jobs. When you bring the server back online, it will accept jobs again.

To use HPC Cluster Manager to take a physical server offline or bring it online

1. In HPC Cluster Manager, in Node Management, navigate to a view that shows the WCF broker nodes that you want to perform maintenance on.
2. In the views pane, right-click a node, and then click Take Offline or Bring Online.
3. If you are taking a node offline, in the Take Offline dialog box, you can optionally select the check box labeled Force the node offline and shrink running jobs. If you do not select this check box, the node enters the Draining state, in which jobs are given some time to complete before the node is taken offline.
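
If you prefer to script this step, HPC PowerShell offers the `Set-HpcNodeState` cmdlet. The following sketch only builds the command text and does not run anything. It assumes that the cmdlet's `-Name`, `-State`, and `-Force` parameters behave like the Take Offline and Bring Online commands described above, so verify that against the HPC PowerShell help before relying on it. The helper name and node name are hypothetical.

```python
def hpc_node_state_command(node: str, online: bool, force: bool = False) -> str:
    """Build an HPC PowerShell command line for taking a node offline or online.

    Assumption: -Force corresponds to the "Force the node offline and shrink
    running jobs" check box; without it the node is expected to drain first.
    """
    state = "Online" if online else "Offline"
    command = f"Set-HpcNodeState -Name '{node}' -State {state}"
    # Forcing only makes sense when taking a node offline.
    if force and not online:
        command += " -Force"
    return command
```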

Use Failover Cluster Manager to control a clustered instance

In Failover Cluster Manager, you can control a clustered instance of a WCF broker node that is running in the failover cluster. You can move the clustered instance to a different server in the failover cluster, you can take the clustered instance offline, or you can bring the clustered instance online.

To use Failover Cluster Manager to control a clustered instance

1. To open Failover Cluster Manager, click Start, click Administrative Tools, and then click Failover Cluster Manager. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Yes.
2. In the console tree, expand Services and Applications.
3. In the console tree, click the clustered instance of the WCF broker node and view the status in the center pane. Note the Current Owner, which is listed as part of the status.
4. In the console tree, right-click the clustered instance of the WCF broker node, and then select the appropriate command:
   - Move this service or application to another node
   - Take this service or application offline
   - Bring this service or application online
5. When prompted, confirm your choice.
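
The three commands in this procedure have counterparts in the FailoverClusters Windows PowerShell module: `Move-ClusterGroup`, `Stop-ClusterGroup`, and `Start-ClusterGroup`. As a rough sketch, the mapping can be written as a helper that returns the command text without running it. The helper name, the action keywords, and the group and node names are hypothetical, and the sketch assumes that the clustered instance appears as a cluster group with the name you pass in.

```python
def cluster_group_command(action: str, group: str, target_node: str = "") -> str:
    """Map a Failover Cluster Manager action to a FailoverClusters cmdlet line."""
    if action == "move":
        command = f"Move-ClusterGroup -Name '{group}'"
        # Without -Node, the cluster chooses the destination node itself.
        if target_node:
            command += f" -Node '{target_node}'"
        return command
    if action == "offline":
        return f"Stop-ClusterGroup -Name '{group}'"
    if action == "online":
        return f"Start-ClusterGroup -Name '{group}'"
    raise ValueError(f"unknown action: {action!r}")
```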

Use Failover Cluster Manager to pause or resume a failover cluster node

You can pause a failover cluster node before performing maintenance on the node, and then resume the node after the maintenance is complete.

To use Failover Cluster Manager to pause or resume a failover cluster node

1. To open Failover Cluster Manager, click Start, click Administrative Tools, and then click Failover Cluster Manager. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Yes.
2. In the console tree, expand Nodes.
3. Right-click the node that you want to pause or resume, and then click Pause or Resume.
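
Pause and resume always come as a pair around the maintenance work. If you script the step, one way to keep that pairing explicit is a context manager, sketched below. `Suspend-ClusterNode` and `Resume-ClusterNode` are cmdlets in the FailoverClusters Windows PowerShell module. The `paused_cluster_node` helper, the `run` callback, and the node name are hypothetical, and here `run` only records the command text instead of executing it.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def paused_cluster_node(node: str, run: Callable[[str], None]) -> Iterator[None]:
    """Pause a failover cluster node for the duration of a with-block."""
    run(f"Suspend-ClusterNode -Name '{node}'")
    try:
        yield
    finally:
        # Runs even if the maintenance step raises, so the node is not
        # left paused by accident.
        run(f"Resume-ClusterNode -Name '{node}'")


# Usage: collect the commands in a list instead of executing them.
issued = []
with paused_cluster_node("srv1", issued.append):
    issued.append("(perform maintenance on srv1)")
```

Whether a node should be resumed automatically after a failed maintenance step depends on the failure. The sketch shows only the pairing, not a recovery policy.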

Additional references

Overview of Windows HPC Server 2008 R2 and SOA in Failover Clusters