Deploying and Managing Microsoft Windows Compute Cluster Server 2003

Applies To: Windows Compute Cluster Server 2003

Deploying Microsoft® Windows® Compute Cluster Server 2003 is not a complex operation, but as with all technology, it requires forethought and planning. Therefore, it is critical to understand the key concepts involved, which include:

  • The Windows compute cluster architecture

  • Hardware and software requirements

  • Supported network topologies

  • Deployment strategies, including setup options

Note

For general information about Windows Compute Cluster Server 2003 features and capabilities, see the white paper Overview of Microsoft Windows Compute Cluster Server 2003 (https://go.microsoft.com/fwlink/?LinkId=56090).

Windows Compute Cluster Server 2003 supports several different deployment scenarios based on the system configuration the administrator selects before deployment. Deployment is simplified by the Compute Cluster Administrator, which provides a wizard interface that guides administrators through the process after deployment decisions are made. This paper discusses each supported deployment scenario and covers the basics of post-deployment compute cluster administration.

Windows Compute Cluster Server 2003 Architecture

Windows Compute Cluster Server 2003 provides a complete solution for high-performance computing (HPC) by leveraging several different components, some required and some optional. The cluster design the administrator selects depends on the intended computational goal as well as on the configuration of the servers that make up the cluster. Optimal deployment depends on the applications being run in the cluster. (Sample scenarios that will help in determining appropriate configurations are included later in this white paper.) After the administrator selects the appropriate scenario, a wizard facilitates deployment.

Windows Compute Cluster Server 2003 components include:

  • Head node. The head node provides deployment and administration user interfaces (UIs) as well as management services for the compute cluster. The UIs include the Compute Cluster Administrator, the Compute Cluster Job Manager, and a Command Line Interface (CLI). Management services include job scheduling as well as job and resource management. Optionally, administrators can use Remote Installation Services (RIS) to support automated compute node deployment. If compute nodes do not have interfaces to the public network (private interfaces only), Internet Connection Sharing (ICS) is used to configure Network Address Translation (NAT) between the nodes and the public network. In this case, the head node acts as a gateway between the public and the private networks that make up the cluster and also provides limited Dynamic Host Configuration Protocol (DHCP) and Domain Name System (DNS) services to the private network. These limited DHCP and DNS services are integral to ICS, and administrators can configure them when ICS is implemented. Of course, administrators can also assign static addresses to the compute nodes.

  • Compute nodes. Any computer configured to provide computational resources as part of the compute cluster is a compute node. Compute nodes are where users' computational jobs run. These nodes must run a supported operating system, but they do not require the same operating system or even the same hardware configuration. Ideally, compute nodes share a similar configuration to simplify deployment, administration, and (especially) resource management. A cluster built from mixed hardware configurations limits the cluster's capabilities, because a parallel job that spans nodes of differing capabilities can run only at the speed of the slowest processor among the selected nodes.

  • Job Scheduler. The Compute Cluster Job Scheduler runs on the head node and manages the job queue, all resource allocations, and all job executions by communicating with the Node Manager Service that runs on each compute node.

  • Microsoft® Message Passing Interface (MPI). Microsoft MPI software (called MS MPI) is a key networking component of the compute cluster. MS MPI can utilize any Ethernet connection that Microsoft Windows Server™ 2003 supports as well as low-latency and high-bandwidth connections, such as InfiniBand or Myrinet, through Winsock Direct drivers provided by the hardware manufacturers. Gigabit Ethernet provides a high-speed and cost-effective connection fabric, while InfiniBand is ideal for latency-sensitive and high-bandwidth applications. MS MPI supports several networking scenarios. Selecting the right networking scenario for the MS MPI component of the cluster is one of the most important decisions in compute cluster design and should be made with care, based on the criteria detailed in the Network topology section later in this paper. (An illustrative MS MPI job submission appears after this list.)

  • Public and private networks. Compute nodes are often connected to each other through multiple network interfaces. For management and node deployment, administrators can configure compute clusters with a private network. Administrators can also use a private network for MPI traffic. This traffic can be shared with the management private network, but the highest level of performance is achieved with a second, dedicated private network that supports only MPI traffic. Windows Compute Cluster Server 2003 version 1 supports five different network topologies.

  • Microsoft Active Directory® directory service. Each node of the cluster must be a member of an Active Directory domain, because Active Directory provides authorization and authentication services for Windows Compute Cluster Server 2003. The Active Directory domain can be independent of the cluster (for example, with a cluster running in a production Active Directory domain) or can run within the cluster, specifically on the head node in scenarios where the cluster is a production environment in and of itself.
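To make the MS MPI component concrete, the following sketch shows how a parallel application might reach the cluster from the head node command line. This is an illustration only: the application name (myapp.exe) and processor count are placeholders, and the exact switch spellings should be confirmed against the CLI reference installed with the Compute Cluster Pack.

    rem Submit a hypothetical MPI application to the Job Scheduler; mpiexec (MS MPI) then
    rem launches the parallel processes on the processors the scheduler allocates.
    job submit /numprocessors:8 mpiexec myapp.exe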

Figure 1 illustrates the relationship among the elements that make up a compute cluster.

Figure 1. Compute Cluster Server architectural elements

Hardware and Software Requirements

The hardware and software components that an administrator selects prior to installing the cluster directly affect the configuration of the cluster as it is built.

Hardware requirements

Minimum hardware requirements for Windows Compute Cluster Server 2003 are identical to those of the Microsoft® Windows Server™ 2003, Standard x64 Edition, operating system (see Table 1). This means that Windows Compute Cluster Server 2003 supports up to 32 gigabytes (GB) of RAM.

Note

The Microsoft® Windows Server™ 2003, Compute Cluster Edition, operating system can be used only as part of a Windows Compute Cluster Server 2003 cluster, not as a general-purpose infrastructure server.

The minimum hardware requirements are shown in the following table.

Table 1. Minimum hardware requirements

CPU. x64-based computer with Intel Pentium or Xeon family processors with Extended Memory 64 Technology (EM64T) architecture, AMD Opteron or Athlon family processors, or other compatible processors.

RAM. 512 megabytes (MB) minimum.

Multiprocessor support. Windows Compute Cluster Server 2003 and Windows Server 2003, Standard x64 Edition, support up to four (4) processors per server. Windows Server 2003, Enterprise x64 Edition, supports up to eight (8) processors per server.

Disk space for setup. 4 GB.

Disk volumes. Head node: two volumes (a system volume and a data volume) are required if RIS is used, because RIS data cannot reside on the system volume. Compute nodes: a single system volume is required. Redundant array of independent disks (RAID) configurations are supported but not required.

Network adapter. Each node requires at least one network adapter. Additional network adapters can be used to set up a private network for the cluster or to set up a high-speed network for MS MPI.

Preboot Execution Environment (PXE) support in the BIOS and network adapter. If you plan to use the Automated Addition method of adding compute nodes, servers that you plan to add as compute nodes must support PXE in the boot sequence. This is usually configured in the BIOS.

Software requirements

Windows Compute Cluster Server 2003 has the following software requirements.

Operating system

Windows Compute Cluster Server 2003 requires one of the following operating systems:

  • Windows Server 2003, Compute Cluster Edition

  • Windows Server 2003, Standard x64 Edition

  • Windows Server 2003, Enterprise x64 Edition

  • Windows Server 2003 R2, Standard x64 Edition

  • Windows Server 2003 R2, Enterprise x64 Edition

For an overview of Microsoft 64-bit operating systems, see the Windows Server 2003 x64 Editions Product Overview on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=63899).

Remote workstation computer

The Compute Cluster Administrator and the Compute Cluster Job Manager are installed on the head node by default. An administrator can also manage clusters from a remote workstation by installing the client utilities there. A remote computer running the Compute Cluster Administrator or the Compute Cluster Job Manager must have one of the following operating systems installed:

  • Windows Server 2003, Compute Cluster Edition

  • Windows Server 2003, Standard x64 Edition

  • Windows Server 2003, Enterprise x64 Edition

  • Windows XP Professional x64 Edition

  • Windows Server 2003 R2, Standard x64 Edition

  • Windows Server 2003 R2, Enterprise x64 Edition

Additional requirements

In addition, Windows Compute Cluster Server 2003 requires the following:

  • Microsoft .NET Framework 2.0

  • Microsoft Management Console (MMC) 3.0 to run the Compute Cluster Administrator snap-in

  • Microsoft® SQL Server™ 2000 Desktop Engine (MSDE) to store all job information

Updates

Finally, two updates may be required, depending on the selected implementation:

  • In a configuration that uses ICS on the head node and in which the head node has been promoted to a domain controller (DC) role as well as head of the cluster, an update is required to allow ICS to operate. For more information about this issue, see Microsoft Knowledge Base article 897616, "The Internet Connection Sharing area does not appear in the properties of the active network connection after you install Active Directory to configure a computer that is running Windows Server 2003 with SP1 as a domain controller," on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=56098).

  • In a configuration that uses RIS on the head node, an update is required. For more information about how to use RIS to deploy compute nodes in Windows Compute Cluster Server 2003, including details about this update, see Microsoft Knowledge Base article 907639, "How to use Remote Installation Services (RIS) to deploy compute nodes in Windows Compute Cluster Server 2003," on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=56097).

Network topology

Selecting the appropriate topology depends on the goals for the compute cluster. Typically, administrators select the applications they want to run, and then select the appropriate network topology to support these applications. Windows Compute Cluster Server 2003 supports five different network topologies. Each is described here along with a sample scenario for implementation. Which topology an administrator selects depends on the level of performance required between the compute nodes.

Scenario 1: Two network adapters on the head node, one network adapter on compute nodes

In this configuration, the head node provides ICS between the compute nodes and the public network. The public network adapter for the head node is registered in DNS on the public network, and the private network adapter controls all communication to the cluster. The head node acts as a gateway for all public network–to–compute cluster communications (see Figure 2). The private network is used for the management and deployment of all compute nodes; it can also be used for high-speed MS MPI computational traffic.

Note

Because this configuration relies on ICS, compute nodes are hidden behind the head node.

Figure 2. Scenario 1 network topology

This configuration optionally supports RIS to simplify deployments of compute nodes and can be used to run tightly parallel applications. Because there is no direct communication between the public network and the compute nodes, administrators must perform application debugging directly on the compute nodes or on separate systems on the private network.
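Because the compute nodes reach the public network only through NAT on the head node, it is worth confirming the private addressing once ICS is enabled. The following check is a hedged sketch that assumes the ICS default 192.168.0.x address range and a head node named HEADNODE (both placeholders); run it from a compute node.

    rem Confirm that this compute node received a private (ICS-assigned) address
    ipconfig /all
    rem Confirm that the head node is reachable over the private network
    ping HEADNODE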

Note

If Active Directory is installed on the head node, the ICS update must also be installed to run ICS. In addition, if RIS is selected, the RIS update is required.

Scenario 2: Two network adapters on each node

This configuration supports one network adapter connected to the public (corporate) network and one to the private, dedicated cluster network (see Figure 3). This private network is used for cluster management, optional RIS deployment of node images, and MS MPI traffic. Because each compute node is also directly connected to the public network, debugging applications running on the nodes is easier. Programmers can connect directly to the nodes when issues arise, making debugging more efficient.

Figure 3. Scenario 2 network topology

Like the first scenario, this scenario supports tightly parallel applications, but with the added benefit that the head node does not become a bottleneck, because computational results no longer have to pass through it on their way to operational programs.

In this scenario, use of ICS is optional, because each compute node can directly communicate with production DHCP and DNS servers through the public network. If RIS is used and ICS has not been enabled on the head node, the full-featured DHCP service must be configured for the private network on the head node in support of RIS, because RIS requires automatic IP address assignments during the remote installation process.

Note

If Active Directory is installed on the head node, the ICS update must also be installed to run ICS. In addition, if RIS is selected, the RIS update is required.

Scenario 3: Three network adapters on the head node, two on compute nodes

This configuration is similar to scenario 1 with one key difference: Because each compute node has two network adapters, each node has a connection to the private, dedicated cluster network and a connection to a secondary private network running the MS MPI high-speed protocol (see Figure 4). The head node provides ICS between the compute nodes and the public network and also supports RIS deployments.

Figure 4. Scenario 3 network topology

This scenario is better suited to tightly parallel applications, because their computational traffic is routed on a separate private network. This configuration removes network latency from the equation when running parallel applications. This MS MPI network can run on Ethernet, possibly Gigabit Ethernet, or can use InfiniBand in support of latency-sensitive and high-bandwidth applications.

In this scenario, the administrator must configure the IP address of each MS MPI interface on each node manually, ideally when the node is joined to the cluster but not activated yet.
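A minimal sketch of assigning such an address from the command line follows. The connection name ("MPI"), the 10.0.1.0/24 address range, and the host address are assumptions; substitute the values planned for the cluster and use a unique address on each node.

    rem Assign a static address to the MS MPI network adapter on this node (example values)
    netsh interface ip set address name="MPI" static 10.0.1.11 255.255.255.0
    rem Verify the resulting configuration
    netsh interface ip show config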

Again, debugging support is limited, because the compute nodes can be reached only through the head node itself.

Note

If Active Directory is installed on the head node, the ICS update must also be installed to run ICS. In addition, if RIS is selected, the RIS update is required.

Scenario 4: Three network adapters on each node

This comprehensive configuration is based on three network adapters on each node in the cluster, including the head node and the compute nodes. It supports one network adapter connected to the public (corporate) network; one to a private, dedicated cluster management network; and one to a high-speed, dedicated, MS MPI network as shown in the following graphic:

Figure 5. Scenario 4 network topology

This scenario is ideal for organizations that need to run tightly parallel applications, because computations are performed on the dedicated MS MPI network, while management and deployment can be performed through the private network. In addition, developers can use the public network connection to the compute nodes for debugging application code when errors occur.

In this scenario, administrators must configure the IP address of each MS MPI interface on each node manually, ideally when the node is joined to the cluster but not activated yet.

Note

If Active Directory is installed on the head node, the ICS update must also be installed to run ICS. In addition, if RIS is selected, the RIS update is required.

Scenario 5: One network adapter on each node

In this configuration, the public network is shared for all network traffic (see Figure 6). This scenario is appropriate for testing and concept validation of compute clusters or in scenarios where computations on one node do not depend on other nodes. This configuration is not well suited to parallel computations, because these computations can cause a great deal of intra-cluster communications that may affect the organizational network.

Figure 6. Scenario 5 network topology

In addition, because a single network adapter is used on each node, RIS deployment of compute nodes is not supported. Each compute node must be manually installed.

Update Management of Compute Clusters

Administrators should always be concerned about security, especially in compute clusters that can include hundreds of nodes. Microsoft has released Windows Server Update Services (WSUS), a comprehensive set of technologies that provides update management for Microsoft software. WSUS is freely available to users of Windows Server 2003.

When planning compute cluster configurations, administrators should include WSUS as part of the cluster in order to facilitate update management for the clusters they design. Information about deploying WSUS can be located at the Microsoft Windows Update Services Web site (https://go.microsoft.com/fwlink/?LinkId=54561).

Deployment Strategies

Deploying Windows Compute Cluster Server 2003 is straightforward. First, the administrator installs the head node, then the compute nodes. Compute nodes can be installed manually or by using automated deployment tools. Windows Compute Cluster Server 2003 leverages RIS for automated deployment, but administrators can use non-Microsoft tools, as well. To use RIS on the head node, a private network is required as part of the network topology.

Windows Compute Cluster Server 2003 supports three compute node deployment scenarios.

  • Manual addition. This scenario relies on separate installations both of the operating system and of the compute cluster elements on the node. The administrator then uses the Compute Cluster Administrator on the head node to add the node to the cluster.

  • Compute Cluster Pack installation. This scenario relies on a separate installation of the operating system, followed by the installation of the Windows Compute Cluster Server 2003 elements on the nodes through the Microsoft Compute Cluster Pack CD. During the installation of the Windows Compute Cluster Server 2003 elements, operators have the opportunity to join the cluster immediately.

  • Automated addition. This scenario relies on RIS to deploy operating system images to the new nodes and integrate them into the cluster.

In all cases, the administrator must install and configure the head node separately.

Two CDs are required to perform Windows Compute Cluster Server 2003 installation. The first CD includes Windows Server 2003, Compute Cluster Edition. The second CD includes the Compute Cluster Pack—a combination of the interfaces, utilities, and management infrastructure that make up Windows Compute Cluster Server 2003. Administrators use this CD to configure a head or compute node.

Windows Compute Cluster Server 2003 installation involves two required steps and one optional step:

  1. Install and configure the operating system and the head node services.

  2. Install additional head node services such as ICS and RIS. (This step may be optional depending on the selected network topology.)

  3. Install and configure compute nodes.

Each step involves several activities. As with any successful deployment, proper, documented procedures are required for each step. Complete installation and deployment testing is recommended before the compute cluster is used in production.

Install and Configure the Head Node

The procedure for creating a head node is similar to a typical Windows Server setup but includes additional activities. Local Administrator access rights are required for each operation. The process for installing and configuring the head node is illustrated in Figure 7.

Figure 7. The head node installation and configuration process

System preparation

Preparing the head node system involves decisions such as the number of network adapters to include, the amount of base memory necessary, and the number of disk volumes to set up. A single disk volume may be used if compute nodes are deployed manually. Two volumes are required if RIS is used. More may be required if the head node supports additional services, such as file services.

Note

Be sure to select an appropriate computer name for the head node. This name becomes the cluster name after the head node software is installed and cannot be changed.

Operating system installation

After the system hardware is configured, the administrator can install the operating system. If additional disk volumes have been earmarked for other services, they are also configured at this time.

Active Directory integration

When the head node system is up and running, it must join an Active Directory domain. If this system is being built for testing purposes, the administrator may choose to use a domain separate from the production Active Directory domain. If the system is to be used for production, it must be made a member of the production directory. Optionally, the head node can become a domain controller. If this is the case, it can be in its own separate domain or part of the production domain. If the administrator uses a separate domain, especially a separate forest, the administrator must establish a trust to allow access from the production domain or forest to the compute cluster.

Note

If the head node is a domain controller, the ICS update is required to run ICS on the head node.

Configure the Head Node

To configure the head node, use the Compute Cluster Pack CD. The Microsoft Compute Cluster Pack Installation Wizard automatically starts when the CD is inserted (see Figure 8). Select Create a new compute cluster with this server as the head node, and then click Next to automatically assign the computer name of the head node as the cluster name. Follow the prompts, and identify the location for installed files.

Figure 8. Starting head node installation

When installation is complete, a To Do List is displayed outlining the next steps in the process. This To Do List is also displayed in the Compute Cluster Administrator:

Figure 9. The To Do List in Compute Cluster Administrator

Define the network topology

Use the Networking tile in the To Do List to start the Define Cluster Network Topology Wizard. The defined topology varies based on the number of network adapters included in the head node. This wizard also supports the optional configuration of ICS on the head node.

Configure RIS (optional)

If you plan to use the Automated Addition method to deploy compute nodes using RIS, install and configure RIS on the head node. To do so, use the Configure RIS Wizard from the RIS tile. RIS images must be stored on a volume separate from the system disk. In addition, you must install the RIS update described earlier in this paper. All the configuration options (except for RIS update installation) are performed through the wizard.

Create RIS images (optional)

If RIS has been installed, setup images are required. The administrator can create these images through the Manage Images Wizard on the same tile. Image creation requires a copy of the installation CD for the edition of Windows Server 2003 in use. In addition, each image requires a valid license key. This process is vastly simplified when Enterprise Agreement (EA) licenses are used, because the same key can be used for each compute node. Using retail versions of the installation CD requires a separate key per compute node.

Configure access rights

Deployment is a good time to configure access rights for administrators and users. Use the Configure Users Wizard on the Cluster Security tile. Administrators have access to all the features of the cluster and automatically become local administrators on each compute node. Users can submit jobs through the Compute Cluster Job Manager or the CLI. Though they can view the entire job queue, users can modify only their own jobs. Jobs run under the submitting user’s credentials, so jobs have access only to network resources the users themselves can access.
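Because jobs run under the submitting user's credentials, users who work from the CLI typically cache those credentials with the Job Scheduler once rather than supplying them for every submission. The sketch below assumes the cluscfg command provided by the Compute Cluster Pack CLI and uses a placeholder account name; confirm the exact command and switches against the CLI reference.

    rem Cache this user's job submission credentials with the scheduler (prompts for the password)
    cluscfg setcreds /user:CONTOSO\jsmith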

Optional services

When the core configuration is complete, the administrator can configure additional services if they are required. For example, if DHCP is required to provide private IP addresses to compute nodes, use the Manage Your Server Web interface (located in Administrative Tools) to add this role to the head node by clicking the Add or remove a role link. Ensure that this service is configured on a separate disk volume. Administrators can also use the Manage Your Server Web interface to install and configure file sharing services. Manage Your Server offers several server role installation scenarios that not only enable the services on the server but also install custom MMC snap-ins for the management of the newly installed service.
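If the DHCP Server role is added to the head node, a scope must be defined for the private subnet. The following sketch assumes a private subnet of 192.168.0.0/24 and an address pool of .10 through .200 (example values only); the DHCP wizard or MMC snap-in accomplishes the same thing interactively.

    rem Create a DHCP scope for the cluster's private network and add an address range (example values)
    netsh dhcp server add scope 192.168.0.0 255.255.255.0 "Cluster private network"
    netsh dhcp server scope 192.168.0.0 add iprange 192.168.0.10 192.168.0.200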

Complete all head node configuration tasks before moving on to compute node deployment.

Add Compute Nodes

As mentioned earlier, administrators can add compute nodes in one of three ways. One method (Automated Addition) involves automated deployment of the operating system from the head node using RIS system images. RIS can be used with EA licenses or retail versions, but using retail versions of installation software requires multiple license keys, one for each compute node installation. The other two methods (manually adding nodes and using the Compute Cluster Pack CD) require that the operating system of the prospective compute node computer be installed before Compute Cluster Pack setup is run.

Manually adding nodes and the Compute Cluster Pack addition

Both manual addition and the Compute Cluster Pack addition require that the operating system be preinstalled. The difference lies in how the administrator adds the node to the cluster. The process described here is appropriate for single installations of compute clusters that include only a few nodes. Local administrator access rights are required for each operation. Use the following process to prepare the compute nodes:

  • System preparation. The preparation of the compute node system involves decisions such as the number of network adapters to include and the amount of base memory to set up.

  • Operating system installation. After the system hardware is configured, the operating system can be installed. This operating system must be one of the supported operating systems.

  • Active Directory integration. When the compute node system is up and running, it must be joined to the same Active Directory domain as the head node.

  • Add the compute node to the cluster. At this step of the installation process, you can specify a cluster head node to join, or defer that decision as described in the following bullets.

    • Compute Cluster Pack addition installs the node components and joins the cluster in one step. The Microsoft Compute Cluster Pack Installation Wizard automatically starts when the Compute Cluster Pack CD is inserted. Select Join this server to an existing compute cluster as a compute node, and then type the name of the head node to join. Follow the prompts, and identify the local folder location into which files can be copied.

    Note

    If the head node is not found, Compute Cluster Pack services will be installed, but the compute node will not be joined to the cluster. Join the compute node to the cluster using the same procedure as the manual addition process described below.

    • For manual addition of the node, select Join this server to an existing compute cluster as a compute node, but this time do not specify the name of the head node. Follow the prompts, and identify the location into which files can be copied. After the Compute Cluster Pack components have been installed, make a note of the compute node name and return to the head node to start the Compute Cluster Administrator. Select Add Nodes from the Node Management pane (see Figure 10). Select Manual Addition on the Select Add Node Method page, add the node by typing the computer name (DNS or NetBIOS), and click Add. Multiple nodes can be added at the same time. Follow the wizard prompts until the operation is complete.

Figure 10. Use Compute Cluster Administrator to manually add nodes

  • Approve new nodes. When the addition is complete, added nodes display as Pending for Approval in the Node Management pane. Right-click the node name, and then select Approve. More than one node can be approved at a time. After they are approved, nodes are in a Paused state. Additional configuration, such as software installation or the application of scripts, can be performed while the nodes are in a Paused state.
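The same node lifecycle can also be handled from the command line on the head node. The sketch below uses a placeholder node name, and the exact verbs and switches should be checked against the CLI reference for the installed version.

    rem List the nodes known to the cluster and their current status
    node list
    rem Approve a pending node, and later resume it so that the Job Scheduler can use it
    node approve NODE01
    node resume NODE01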

Note

It is also possible to automate the manual operating system installation process by using unattended text files or disk images with the Sysprep tool. Using disk images may require additional non-Microsoft software. For more information about automating Windows Server 2003 installations, see Windows Server 2003 Deployment Kit: Automating and Customizing Installations (https://go.microsoft.com/fwlink/?LinkId=50476).

Automatically adding nodes (RIS)

Automated node installation is appropriate for multiple-node deployments. In addition, automated deployments are useful when recurrent node imaging is required to reset compute nodes to a given state. Automated additions can only be performed if the following conditions are met:

  • The cluster includes a private network.

  • RIS is installed on the head node along with the RIS update.

  • A proper RIS image has been created for the node, and this image includes a valid license key.

  • The nodes being added have been configured to perform a PXE boot during startup before starting from local media. This step is usually performed through the BIOS settings for the node.

The person performing these procedures must have cluster administrator access rights. Use the following process to automatically add compute nodes:

  • Add the compute node to the cluster. On the head node, start the Compute Cluster Administrator. Select Add Nodes from the Node Management pane. Select Automated addition using Remote Installation Services (RIS) on the Select Add Node Method page. Select the image to use. If you are using retail installation media, a different license key is required for each node. You can supply these keys by using a uniqueness database (.udb) file that lists a unique key for each system. When you have mapped the images to each computer, you can start RIS.

  • Reboot the nodes. When RIS setup is ready, you must restart each node manually to initiate the RIS installation process. Make sure the nodes start using PXE boot.

    Note

    When using retail operating system images, the order in which nodes are restarted is important. Nodes should be restarted during RIS reimaging in the order in which they were originally imaged.

  • Stop RIS. When each system has a new RIS image, that system restarts. You can then stop RIS in the Compute Cluster Administrator and finish the deployment process.

  • Approve new nodes. Newly added nodes display as Pending for Approval in the Node Management pane. Right-click the node name, and then select Approve. More than one node can be approved at a time. When approved, nodes are in a Paused state. You can perform additional configuration activities while nodes are in this state. When the nodes are ready, click Resume Node to restart the Job Scheduler on the new nodes.

Create an Administration Client Console

The most important cluster administrative tasks are performed from the Compute Cluster Administrator on the head node. For most day-to-day management tasks, however, you can install the Compute Cluster Pack client utilities, which include the Compute Cluster Administrator, on any remote (non-cluster) workstation running a supported operating system. Using this remote console, called a client console, you can perform routine tasks, such as performance monitoring and job execution, without logging on to the head node.

To set up a client console, begin by inserting the Compute Cluster Pack CD into the workstation. The Microsoft Compute Cluster Pack Installation Wizard starts automatically. Select Install Client Utilities Only, and then click Next. For an administration console only, install the client utilities. For a development workstation, install the software development kit (SDK) as well as the client utilities.
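After the client utilities are installed, the CLI on the workstation can address the cluster by naming the head node. The example below is a hedged sketch: the /scheduler switch and the head node name HEADNODE are assumptions to verify against the CLI reference.

    rem Query the cluster's job queue from a remote client console (head node name is a placeholder)
    job list /scheduler:HEADNODE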

Windows Compute Cluster Server 2003 installation is complete.

Administration

Administrators manage the compute cluster through the Compute Cluster Administrator. Administrators manage computational jobs through the Compute Cluster Job Manager.

The Compute Cluster Administrator

The Compute Cluster Administrator is based on MMC 3.0 and has five major pages:

  • Start page. This monitoring page displays the number of nodes and their status as well as the number of processors in use and available. The page also displays job information, including the number of jobs and their status.

  • To Do List. Similar to the To Do List displayed at the end of head node installation, this page supports the configuration and administration of the cluster, including networking, RIS, node addition and removal, and security settings.

  • Node Management. This page provides information about nodes and jobs in the cluster and supports the control of node tasks such as adding or approving, pausing or resuming, or restarting nodes.

  • Remote Desktop Sessions (RDP). This page is used to access compute nodes through remote desktop sessions. To simplify RDP sessions to multiple compute nodes, a global user name and password can be used when initiating RDP to the nodes.

  • System Monitor. This page displays performance monitoring data for the cluster, including processor time and jobs and processor statistics per node.

The Compute Cluster Job Manager and the CLI

Users can schedule jobs, allocate resources for jobs, and change the associated tasks through two tools: the Compute Cluster Job Manager and the CLI. The CLI supports scripting in several languages, including Perl, C/C++, C#, and Java™.

Cluster jobs can be as simple as a single task or can include multiple tasks. In addition, jobs can specify which processors are required. Processors can be assigned exclusively to jobs or can be shared among jobs and tasks.

The Compute Cluster Job Manager includes powerful features for job and task management, and each feature has a corresponding equivalent in the CLI. These features include error recovery (automatically retrying failed jobs or tasks and identifying unresponsive nodes), automated cleanup of jobs after they complete (to avoid “run-away” processes on the compute nodes), and security (each job runs in the submitting user’s security context, which limits job and task access rights to those of that user).
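As a sketch of how a multiple-task job looks at the command line, the following sequence creates a job, adds a task to it, and then submits it. The job ID (42), the task command, and the exact switch spellings are illustrative placeholders to confirm against the CLI reference.

    rem Create a job and note the job ID that the command reports (42 is assumed here)
    job new /numprocessors:4 /name:"Nightly model run"
    rem Add an MPI task to the job, and then submit the job to the queue
    job add 42 mpiexec mymodel.exe
    job submit /id:42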

Note

For more information about the Compute Cluster Job Manager, see the white paper "Using the Windows Compute Cluster Server 2003 Job Scheduler" on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=55929).

Remote Command Execution

Windows Compute Cluster Server 2003 also allows administrators to run commands remotely, which provides flexible execution of jobs and tasks. A command line action can be issued on one or more selected compute nodes; it is run immediately through the Job Scheduler on a priority basis and can be any valid command line, including scripts and programs. The action executes under the credentials of the user who submits it.
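As a hedged illustration of the command line equivalent of this feature, the one-liner below submits a simple diagnostic command through the Job Scheduler; the switch names are the same assumptions noted in the earlier CLI examples.

    rem Run a simple command on the cluster under the submitting user's credentials
    job submit /numprocessors:1 hostname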

Launch Remote Desktop Connection

You can create Remote Desktop sessions to multiple compute nodes, switching between each node to perform required operations.

Through the Remote Desktop Session Properties interface, you can store a user name and password so that supplied credentials will be used whenever you start a Remote Desktop session to multiple nodes. These global credentials for desktop sessions are stored on the local computer from which you open the session. After the information is saved, future sessions will use the stored credentials.

Multiple cluster administrators can run the Compute Cluster Administrator remotely on different computers against a single cluster. For each administrator, the global credentials are stored locally on the computer that is running the Compute Cluster Administrator. Cluster administrators can use those credentials to access multiple sessions.

Open System Monitor

Windows Compute Cluster Server 2003 provides two categories of performance objects:

  • Compute Cluster: Cluster-specific performance counters

  • Compute Nodes: Node-specific performance counters

Each object exposes a rich set of performance counters to monitor different characteristics of the cluster and nodes. The cluster administrator can monitor compute cluster and node performance by running an instance of System Monitor from within the Compute Cluster Administrator.
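Outside the Compute Cluster Administrator, counters can also be sampled with the typeperf tool included with Windows Server 2003. The sketch below first lists the counters under the Compute Cluster performance object named above and then samples a standard processor counter; the sample interval and count are arbitrary example values.

    rem List the counters exposed by the cluster-specific performance object
    typeperf -q "Compute Cluster"
    rem Sample total processor utilization every 5 seconds, 12 times
    typeperf "\Processor(_Total)\% Processor Time" -si 5 -sc 12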

Conclusion

Windows Compute Cluster Server 2003 allows users to create HPC clusters using the familiar Windows platform. Clusters can range in size from only a few compute nodes to hundreds of compute nodes. This paper outlines proven practices for the setup and preparation of Windows Compute Cluster Server 2003 clusters. Following these recommendations simplifies operation of the clusters after they have been deployed.

About the Authors

Danielle Ruest and Nelson Ruest (MCSE, MCT, MVP Windows Server) are IT professionals specializing in systems administration, migration, and design. They are the authors of multiple books, most notably Windows Server 2003: Best Practices for Enterprise Deployments (McGraw-Hill/Osborne Media, 2003), Windows Server 2003 Pocket Administrator (McGraw-Hill/Osborne Media, 2003), and Preparing for .NET Enterprise Technologies (Pearson Education, 2001). Both authors work for Resolutions Enterprises Ltd. (https://go.microsoft.com/fwlink/?LinkId=66871).

References

Windows Server High Performance Computing

(https://go.microsoft.com/fwlink/?LinkId=55599)

Windows Server x64 Editions

(https://go.microsoft.com/fwlink/?LinkId=43743)

Overview of Microsoft Windows Compute Cluster Server 2003

(https://go.microsoft.com/fwlink/?LinkId=56095)

Windows Server 2003 Deployment Kit: Automating and Customizing Installations

(https://go.microsoft.com/fwlink/?LinkId=50476)

Using the Compute Cluster Server 2003 Job Scheduler

(https://go.microsoft.com/fwlink/?LinkId=55929)

Using Microsoft Message Passing Interface

(https://go.microsoft.com/fwlink/?LinkId=55930)

Migrating Parallel Applications

(https://go.microsoft.com/fwlink/?LinkId=55931)

Debugging Parallel Applications with Visual Studio 2005

(https://go.microsoft.com/fwlink/?LinkId=55932)

For the latest information about Microsoft Windows Compute Cluster Server 2003, see the Microsoft High-Performance Computing Web site

(https://go.microsoft.com/fwlink/?LinkId=55599)