Step 1: Prepare for Your Deployment

 

Applies To: Windows HPC Server 2008 R2, Windows HPC Server 2008

The first step in the deployment of InfiniBand device drivers using a node template is to prepare the hardware and software that is needed during the deployment. The following checklist describes the steps involved in preparing for your deployment.

Checklist: Prepare for your deployment

Task | Description

1.1. Obtain the latest device drivers | Obtain the latest version of the InfiniBand device drivers with NetworkDirect support for your InfiniBand Host Channel Adapter (HCA) cards.

1.2. Install the InfiniBand hardware | Install the InfiniBand HCA cards in the head node and in all the compute nodes, and connect the InfiniBand fabric.

1.3. Install Windows Server on the head node | Install a supported edition of Windows Server on the computer that will act as the head node.

1.4. Install the latest device drivers on the head node | Install the latest version of the InfiniBand device drivers on the head node computer.

1.5. Start an InfiniBand subnet manager | If your InfiniBand switch is not already running a subnet manager, configure one to run on your network.

1.1. Obtain the latest device drivers

You can obtain the latest version of the InfiniBand device drivers with NetworkDirect support for your HCA cards by downloading them from the website of your InfiniBand hardware vendor, or by directly contacting your vendor and requesting them. For more information about hardware compatibility with Windows HPC Server, see Windows HPC Server Hardware Compatibility (https://go.microsoft.com/fwlink/?LinkId=208302).

Note

When available, obtain device drivers that have the Certified for Windows Server logo or that specifically support Windows HPC Server. If your hardware vendor provides different device drivers for 64-bit editions of Windows Server 2008 and for Windows Server 2008 R2, ensure that you obtain the drivers that are appropriate for the operating system that is running on the nodes in your Windows HPC Server cluster.

You can also download the latest device drivers that are published by the OpenFabrics Alliance (OFA), which work with most commercially available HCA cards. For more information and to download the latest device drivers that are published by the OFA, see The OpenFabrics Alliance (https://go.microsoft.com/fwlink/?LinkID=137347).

DISCLAIMER: Reference to any non-Microsoft products or websites is intended solely for informational purposes and does not constitute or imply any endorsement by Microsoft.

1.2. Install the InfiniBand hardware

The InfiniBand HCA cards have to be installed in the computer that will serve as the head node of your HPC cluster, as well as in all the computers that will serve as compute nodes.

Also, the InfiniBand fabric has to be installed and connected to provide all nodes in your cluster access to the application network. For more information about the installation of your InfiniBand fabric, contact your hardware vendor.

Note

This guide assumes that the head node connects to the application network that uses InfiniBand hardware, which is a common deployment scenario. Although it is possible to configure an InfiniBand network for your cluster that does not include the head node, the steps for doing this are beyond the scope of this guide.

1.3. Install Windows Server on the head node

To deploy the head node of your HPC cluster, you must start by installing a supported edition of Windows Server on the computer that will act as the head node, as follows:

Cluster | Windows Server edition installed on head node | Installation instructions

Windows HPC Server 2008 | Windows Server 2008 HPC Edition, or another 64-bit edition of Windows Server 2008 | Installing Windows Server 2008 (https://go.microsoft.com/fwlink/?LinkID=119578)

Windows HPC Server 2008 R2 | Windows Server 2008 R2 HPC Edition, or another edition of Windows Server 2008 R2 | Installing Windows Server 2008 R2 (https://go.microsoft.com/fwlink/?LinkID=194693)

Important

We strongly recommend that you perform a clean installation of Windows Server before installing Microsoft HPC Pack. If you want to install Microsoft HPC Pack on an existing installation of Windows Server, remove all server roles first and then follow the procedures in this guide.

Note

We recommend that you obtain the latest device drivers for your head node computer from the websites of your hardware vendors.

1.4. Install the latest device drivers on the head node

The following procedure explains how to update or install the InfiniBand device drivers with NetworkDirect support on your head node computer.

Membership in the local Administrators group, or equivalent, is the minimum required to complete this procedure.

To update or install the latest InfiniBand device drivers

  1. Using a network connection or removable media, copy to the head node computer the device drivers that you downloaded in 1.1. Obtain the latest device drivers.

  2. If you are updating the device drivers from a previous version, you must first uninstall the existing device drivers. To do this, use Programs and Features in Control Panel, or follow the instructions provided in the documentation that accompanies the device drivers.

  3. Install the device drivers that you downloaded by running the Windows Installer file that is included with the device drivers (.msi file name extension), or by following the instructions provided in the documentation that accompanies the device drivers.

    Note

    If included in the installation program for the device drivers, select the option to install and enable the NetworkDirect service provider interface.

  4. If the NetworkDirect service provider interface is not enabled after you install the device drivers, open an elevated Command Prompt window. Click Start, point to All Programs, click Accessories, right-click Command Prompt, and then click Run as administrator. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.

    To install the NetworkDirect service provider, type the following command:

    C:\Windows\System32\ndinstall -i
    
  5. To verify that the NetworkDirect service provider is properly registered, type the following command:

    C:\Windows\System32\ndinstall -l
    

    If the NetworkDirect provider is properly registered, it should be listed in the output of the command. For example, the output of the command should include a line that is similar to: 0000001011 - OpenFabrics Network Direct Provider. In some cases, the name of your hardware vendor is listed instead of OpenFabrics.
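The check in step 5 can also be scripted. The following is a minimal Python sketch, not part of the driver tooling. It assumes that each provider appears in the ndinstall output on its own line in the form "ID - name", as in the sample line in step 5, and that the name contains "Network Direct" or "NetworkDirect". The function name list_nd_providers and the sample text are hypothetical; the actual output on your system may differ.

```python
import re

# Assumed line format, based on the sample line in this topic:
#   0000001011 - OpenFabrics Network Direct Provider
_PROVIDER_LINE = re.compile(r"^\s*(\d+)\s+-\s+(.+?)\s*$")


def list_nd_providers(ndinstall_output):
    """Return (id, name) pairs for lines that look like NetworkDirect providers."""
    providers = []
    for line in ndinstall_output.splitlines():
        match = _PROVIDER_LINE.match(line)
        if not match:
            continue
        provider_id, name = match.groups()
        # Accept both "Network Direct" and "NetworkDirect" spellings (assumption).
        if "networkdirect" in name.lower().replace(" ", ""):
            providers.append((provider_id, name))
    return providers


# Hypothetical captured output of "ndinstall -l"; real output may differ.
sample = """\
0000001001 - MSAFD Tcpip [TCP/IP]
0000001011 - OpenFabrics Network Direct Provider
"""
print(list_nd_providers(sample))
```

An empty result suggests that no NetworkDirect provider is registered and that step 4 should be repeated.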

1.5. Start an InfiniBand subnet manager

Important

You do not need to perform this step if a subnet manager is already running on the InfiniBand network that will be used for your HPC cluster. Although it is possible to run multiple subnet managers on the same InfiniBand subnet, the resolution of conflicts can be a complex task. For this reason, it is recommended that you have a single subnet manager on each InfiniBand subnet.

Note

Many InfiniBand switches have an internal subnet manager that runs by default when the switch is powered on. You can run this type of subnet manager on your network, but on large clusters with many compute nodes, the internal subnet manager that is running on a switch might run out of resources. This can cause intermittent connection failures, especially when applications that use the InfiniBand network are starting. If you are experiencing problems, discuss with your hardware vendor the possibility of starting a subnet manager on a computer, as explained in this section.

A simple way to determine whether your InfiniBand network is already running a subnet manager is to review the current state of the InfiniBand network connection on your head node, after you have installed and connected all the InfiniBand hardware and powered up all the switches. If the reported state is anything other than Network cable unplugged, a subnet manager is already running. A Network cable unplugged state indicates that a subnet manager was not found on the InfiniBand subnet.

If you do not have a subnet manager running on your network, you have to configure a computer to run a subnet manager in the InfiniBand subnet that will be used as the application network for your HPC cluster. The computer that you choose to run that InfiniBand subnet manager generally depends on the size of your HPC cluster:

  • If you will be deploying a small HPC cluster (for example, 16 nodes or fewer), you can configure a subnet manager to run on the head node computer.

    Note

    The availability of system resources on the head node computer limits the number of nodes that a subnet manager running on this computer can manage. A common symptom of a head node computer that does not have enough system resources for a subnet manager is intermittent connection failures, especially when applications that use the InfiniBand network are starting. This is a clear sign that the subnet manager needs to run on a different computer.

  • If you will be deploying a large HPC cluster, you need to configure a dedicated computer to run a subnet manager, or configure one of the computers that will act as a compute node to run a subnet manager at the highest available privileges.

    Important

    If the computer that will run an InfiniBand subnet manager will act as a compute node, first deploy that compute node together with the other compute nodes in your HPC cluster by following the steps in this guide, and then return to this section to create the task that will start the InfiniBand subnet manager on that computer.

A subnet manager software application is needed to configure and start a subnet manager on your network. This application is usually provided by your InfiniBand hardware vendor. If your vendor has provided a subnet manager software application, install it on the computer that will be running the subnet manager, by following the documentation that accompanies that application. If you install the subnet manager that is provided by your vendor on the same computer that will serve as a Dynamic Host Configuration Protocol (DHCP) server for your InfiniBand subnet, review the second example procedure in this section that explains how to ensure that the subnet manager starts before the DHCP server.

You can also use the OpenSM application that is provided by the OpenFabrics Alliance, which is available with some InfiniBand device driver distributions. The executable file for the OpenSM application is opensm.exe. The following procedure uses it to show how to configure and start a subnet manager as a service on a computer that is running Windows Server 2008 R2 or Windows Server 2008. The procedure is only intended as an example. If you prefer, you can install and use the subnet manager provided by your vendor.

Important

The following procedure assumes that the latest InfiniBand device drivers have already been installed on the computer. It also assumes that the opensm.exe file is stored in a folder on the computer.

Membership in the local Administrators group, or equivalent, is the minimum required to complete this procedure.

Example: To start an InfiniBand subnet manager as a service using the OpenSM application

  1. On the computer that will run an InfiniBand subnet manager, copy opensm.exe to the System32 folder of the Windows installation (for example, C:\Windows\System32). Also, verify that the ibal.dll and complib.dll files are already stored in the System32 folder. If they are not, copy them to that folder as well.

  2. Open an elevated Command Prompt window. Click Start, point to All Programs, click Accessories, right-click Command Prompt, and then click Run as administrator. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.

  3. To register opensm.exe as a service that starts automatically when the computer starts, type the following command:

    sc create opensm binPath= "%SystemRoot%\system32\opensm.exe --service" start= auto DisplayName= "IB Subnet Manager"
    

    Important

    The blank spaces after the equal signs (“=”) are required by the sc command-line tool.

  4. To start the service that you created, type the following command:

    sc start opensm
    
  5. To verify that the service that you created is running, type the following command:

    sc query opensm
    

    If the service has successfully started and is running, its state will be listed as RUNNING.
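The verification in step 5 can be sketched in Python as follows. This is an illustration only: the function name service_state is hypothetical, and the sketch assumes the usual English-language layout of sc query output, in which the STATE line contains a numeric code followed by the state name.

```python
import re


def service_state(sc_query_output):
    """Return the state name (for example "RUNNING") from "sc query" output, or None."""
    # The STATE line is assumed to look like: "STATE              : 4  RUNNING"
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_query_output, re.MULTILINE)
    return match.group(1) if match else None


# Sample text in the usual "sc query" layout; the values are illustrative.
sample = """\
SERVICE_NAME: opensm
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
"""
print(service_state(sample))
```

On the computer that runs the service, the input text could come from subprocess.run(["sc", "query", "opensm"], capture_output=True, text=True).stdout, run from an elevated session.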

Additional considerations:

  • You can configure the service to start with high priority, by including the start /high command. For example:

    sc create opensm binPath= "start /high %SystemRoot%\system32\opensm.exe --service" start= auto DisplayName= "IB Subnet Manager"
    
  • You can specify that the service must be restarted within five seconds if it fails, by typing the following command:

    sc failure opensm reset= 30 actions= restart/5000
    
  • For more information about the sc command-line tool, see Sc Commands.
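If you script these sc commands instead of typing them, the blank space after each equal sign corresponds to passing the option name (ending in "=") and its value as two separate arguments. The following Python sketch shows one way to build such argument lists. The helper names sc_create_args and sc_failure_args are hypothetical, and the path assumes that Windows is installed in C:\Windows, as in the example in step 1.

```python
def sc_create_args(name, bin_path, display_name, start="auto"):
    """Build the argument list for "sc create".

    Each option name ends with "=" and its value is a separate argument,
    which mirrors the required blank space after the equal sign.
    """
    return ["sc", "create", name,
            "binPath=", bin_path,
            "start=", start,
            "DisplayName=", display_name]


def sc_failure_args(name, reset_seconds, restart_delay_ms):
    """Build the argument list for "sc failure" with a single restart action."""
    return ["sc", "failure", name,
            "reset=", str(reset_seconds),
            "actions=", "restart/%d" % restart_delay_ms]


# Assumed install path; adjust it to match your Windows installation.
print(sc_create_args("opensm",
                     r"C:\Windows\System32\opensm.exe --service",
                     "IB Subnet Manager"))
print(sc_failure_args("opensm", 30, 5000))
```

Either list could then be passed to subprocess.run from an elevated session. The sketch only builds the arguments and does not change any service.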

If you are running the subnet manager on the same computer that will serve as a DHCP server for your InfiniBand subnet (for example, the head node computer), you must ensure that the subnet manager starts before the DHCP server. Otherwise, the DHCP server might fail to create the proper bindings for the InfiniBand subnet. The following procedure gives an example of how to configure OpenSM to start before the DHCP Server service, on a computer that is running Windows Server 2008 R2 or Windows Server 2008. If you have installed a different subnet manager, you can use this example procedure as a reference, and configure your subnet manager and DHCP server similarly.

Important

The following procedure assumes that you have performed all the steps in the previous procedure in this topic and have already configured OpenSM to run as the opensm service.

Membership in the local Administrators group, or equivalent, is the minimum required to complete this procedure.

Example: To configure OpenSM to start before the DHCP Server service

  1. On the computer that is running OpenSM as a service, open an elevated Command Prompt window. Click Start, point to All Programs, click Accessories, right-click Command Prompt, and then click Run as administrator. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.

  2. To stop the DHCP Server service, type the following command:

    sc stop dhcpserver
    
  3. To make the DHCP Server service start only after the opensm service has started, type the following command:

    sc config dhcpserver depend= RpcSs/Tcpip/SamSs/EventLog/EventSystem/opensm
    

    Important

    The blank spaces after the equal signs (“=”) are required by the sc command-line tool.

  4. To start the DHCP Server service, type the following command:

    sc start dhcpserver
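Because the depend= option of sc config sets the complete dependency list of the service, the value in step 3 has to include the existing dependencies as well as opensm. The following Python sketch shows one way to build that value without adding a duplicate entry. The function name depend_value is hypothetical, and the dependency names are the ones used in step 3; on your computer, you could review the current list with sc qc dhcpserver before changing it.

```python
def depend_value(existing, required):
    """Return a "/"-separated dependency list that includes `required` once."""
    names = list(existing)
    # Service names are compared without regard to case to avoid duplicates.
    if required.lower() not in (name.lower() for name in names):
        names.append(required)
    return "/".join(names)


# Dependencies taken from the command in step 3 of this procedure.
current = ["RpcSs", "Tcpip", "SamSs", "EventLog", "EventSystem"]
print(depend_value(current, "opensm"))
# prints: RpcSs/Tcpip/SamSs/EventLog/EventSystem/opensm
```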