Using Windows Deployment Services with Compute Cluster Server
Applies To: Windows Compute Cluster Server 2003
This paper is designed to cover how to use Microsoft® Windows® Deployment Services (WDS) to deploy Microsoft Windows® Compute Cluster Server 2003 (WCCS). Although the example cluster in this document is rather small, the principles employed in creating, imaging, and deploying systems apply to clusters as large as you care to build. In addition, the example in this document is specific to high performance computing (HPC), but you could just as easily be deploying email servers or 5,000 new Windows Vista® desktops – the same general techniques apply.
There are good reasons for using WDS instead of the tuned version of Remote Installation Services (RIS) that comes with Windows Compute Cluster Server 2003. The first is that WDS represents the future of Windows imaging, where everything is based on the Windows Image (WIM) file format. Understanding the new imaging techniques and tools will be key to properly managing your environment. Secondly, the WIM format allows for tremendous flexibility of image deployment – because the format is file based, you can push an existing image to a machine different from the one the image was originally taken from. Say, for example, you created an image of a system but wanted to deploy that image to a new server type with a different hard disk configuration and previously unknown/unused network interface cards (NICs). In the RIS world, you would have to start again: take that new machine, make it an image master by installing everything by hand again, and then capture that whole new image back to disk. Depending on how complex that is, you probably just spent another 4 to 8 hours building an image. With WDS and WIM-based files, you can simply take the existing image, “mount” it on the technician computer, and inject the new drivers you need into it. Even allowing time to properly inject the new drivers, it would most likely take far less than one hour to complete. Happily, as soon as you install SP2 for Windows Server 2003, you get the goodness of WDS by default – Service Pack 2 (SP2) replaces RIS with WDS while still allowing for backwards compatibility with RIS functionality.
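As a sketch of that mount-and-inject workflow (the paths, image name, and driver folder below are hypothetical), ImageX from the WAIK can open a WIM read/write on the technician computer:

```
rem Mount the existing install image read/write
imagex /mountrw c:\images\node.wim 1 c:\mount

rem Drop the new NIC drivers into a folder inside the image
xcopy /s /i c:\newdrivers c:\mount\Drivers\NIC

rem Commit the changes and unmount
imagex /unmount /commit c:\mount
```

For a pre-Vista image such as Windows Server 2003, you would also point the OemPnPDriversPath entry in sysprep.inf at that Drivers\NIC folder so Plug and Play can find the injected drivers during mini-setup.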
After spending time with WDS, I can now push an image to a new server and be booted and running in about 8 minutes. This is WAY cool.
It is important to distinguish between the various terms being used to describe the imaging process. Depending on whose process or technique you are using, there are a lot of ways to image a system and lots of different jargon to describe each. I am defining the ones I use here for those who are new to using the Microsoft tools.
The word *image* is used throughout this paper to describe a couple of different concepts. First, it is used to describe the process by which you will capture the entirety of a master computer. “Bob, I’m going to go ‘image’ that new Windows Vista PC” translates into “Bob, I’m going to go capture an image file of that new Windows Vista PC”.
Secondly, *image* is used to define the physical file that is captured from the master computer as part of the process. You will *image* a system and then move the image file to some other place to be deployed.
The master computer is the machine that is the basic system you will build up to create an image. You setup this PC/Server with all the appropriate settings, configuration and software you need so that it looks the way you would want it to if the next person to touch it just got it from the factory. Keep in mind that some settings are user-specific and will be lost by the preparation process. This loss of fidelity can be partially/totally overcome by a combination of Group Policy, unattended installation files, and post setup installation techniques (that are triggered by policy or unattended files).
The technician PC is the one on which you load all of your tools. On this machine, you will load the Windows Automated Installation Kit (WAIK). You will most likely also create and burn-to-disk/CD/DVD some of the image files you will use to image new systems.
When using WDS, there are four distinct image types you will be using to capture and deploy system images.
The first is a Windows Pre-Installation Environment (WinPE) image. This is a mini version of Windows (limited GUI, command-line only) that allows for system maintenance and in this case it lets us boot a machine to a usable state without touching the files located on the local hard drive. You create this file on the technician machine to start the imaging process.
The second type of image file is a boot image, based on the physical file boot.wim. This file is located on any Windows Vista or Windows Server 2008 disc in the \x86\sources folder or the \x64\sources folder. It is distributed by Microsoft only and cannot currently be generated by anyone else. This file is also WinPE based and is used by WDS to “boot” the target machine for your deployment and then kick off the remote installation process. The key difference between this file and the WinPE disk you will create later is that the WinPE version, if executed on a target system, merely drops you to a command prompt, whereas booting a boot.wim launches installation of the final image type.
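Once WDS is running, a boot.wim is imported either from the WDS MMC or from the command line; the path below is a placeholder for wherever your Windows Vista/Server 2008 media is mounted:

```
rem Import the boot image from the installation media
WDSUTIL /Verbose /Add-Image /ImageFile:"d:\x86\sources\boot.wim" /ImageType:Boot
```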
The third type of image is the installation image. This is the image file that is captured from the master PC and deployed to other machines. This image, like the boot image, can be updated after it is created to incorporate changes in your environment or process. (For example, you can mount it and add new device drivers if you need to.)
The final image type is called a discover image and will not be covered in detail in this paper. You can read more about it in documentation for the Windows Automated Install Kit, or download the whitepaper Deploying and Managing the Windows Deployment Services Update on Windows Server 2003. With the discover image, you can capture the contents of the master machine in an automated fashion but with less control than if you use your own WinPE disk.
To recap, there are 4 images we care about here: 1) a WinPE image that we use to boot the master machine and capture the state of that machine; 2) a capture image, derived from the boot.wim, that you can use as an alternative to automate capturing the state of the master machine; 3) a boot image to boot the target system during deployment; 4) and finally, the installation image which is pushed to the target system and consists of the actual copy of the master you are now deploying elsewhere.
How does this work?
The whole idea of imaging and deployment seems convoluted since it is not entirely clear sometimes just what it is you are doing.
The key here is to understand one basic fact: you cannot capture or deploy an image of a Windows-based system while the system is booted and running from its own hard disk. There are too many moving parts, and capturing the state (or updating the state, if you are deploying) of such a system is impossible. Think of trying to move your \Windows folder while you are using that system – half the files are locked open by some system process.
Therefore, you will be using the different image types mentioned above to capture and deploy the entirety of the master system to a new target machine. The difference here is that you will be booting the target system from a Boot image that is loaded from CD/DVD or from over the wire and using that copy of WinPE to lay down a new operating system image on the local hard drive. If I cannot make a copy of a PC while Windows is loaded from the local HD, then I similarly cannot push an updated version of Windows to the local HD for the same reason.
So how does this all work? First, you will boot the WinPE image described above and capture an image of some PC or Server that is your source (master) system. Next, once that captured image is loaded back into WDS and made available, the boot image is loaded on the new target system. Settings and choice of Installation image can be fully automated to make the process manageable and used to push/pull the Install image to that new server. Finally, your settings and options will kick off the process of laying down the install image which eventually gives you a “mirror” image of the master machine.
The power and flexibility of WDS comes to life, though, through automation. Using the basic tools to image one system once is not very interesting and only mildly useful – many examples you will find on the Internet demonstrate this scenario. However, what happens if you have to image, deploy and manage 400 servers in an HPC cluster or other large environment? This technique is not useful since it overlooks the vagaries of performing a mass deployment. You would have to have automation tools in place to make this work. This is where WDS shows how flexible and powerful it can be.
There are two basic types of automation. The first is the WDS Client piece. The boot image that is pushed to the target system by WDS and boots that computer (referred to collectively as the WDS Client) can be automated using an unattend.xml file. The name of this file can be whatever you want, but for our discussion here it will be unattend.xml. This file, as the name implies, is XML based and is used to automate the process of getting the target system to the point where the Installation image loads hands-free. For example, if you manually walk through a Windows installation today, you will most likely be asked to choose a language, prepare your hard disk for installation, and then choose which partition to install to. All of this can be handled ‘automagically’ for you with the unattend.xml file. You will choose a number of settings that will be used for ALL clients connecting to the WDS server to load a boot image. You can even enforce the choice of which Installation image to use. Combine that with tweaks to the PXE settings on the target systems and your installations can be literally hands-free.
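To make this concrete, here is a sketch of what such a file can look like. The account, image names, and disk identifiers below are placeholder values for illustration only; verify the exact schema against the WAIK/WDS documentation before using it:

```xml
<?xml version="1.0" encoding="utf-8"?>
<unattend xmlns="urn:schemas-microsoft-com:unattend">
  <settings pass="windowsPE">
    <component name="Microsoft-Windows-Setup" processorArchitecture="x86"
               publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS">
      <WindowsDeploymentServices>
        <Login>
          <Credentials>
            <!-- Account allowed to connect to the WDS server (placeholder) -->
            <Domain>HPC</Domain>
            <Username>WDSUser</Username>
            <Password>Password1</Password>
          </Credentials>
        </Login>
        <ImageSelection>
          <InstallImage>
            <!-- Which install image to push (placeholder names) -->
            <ImageGroup>ComputeCluster</ImageGroup>
            <ImageName>WCCS Compute Node</ImageName>
          </InstallImage>
          <InstallTo>
            <DiskID>0</DiskID>
            <PartitionID>1</PartitionID>
          </InstallTo>
        </ImageSelection>
      </WindowsDeploymentServices>
    </component>
  </settings>
</unattend>
```

You would then point the server at it with something like WDSUTIL /Set-Server /WdsUnattend /Policy:Enabled /File:WDSClientUnattend\unattend.xml /Architecture:x86.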
One key piece of the installation puzzle that is not thoroughly documented is that when using WDS to deploy ‘down-level’ clients (for example, Windows XP and Windows Server 2003), you are going to have to mix the use of WDS/WIM-style imaging with the older sysprep/unattend-style unattended installation files. This is REALLY confusing since all the WDS documents talk about unattended installation as if you are only ever going to install Windows Vista or the upcoming Windows Server 2008.
The second type of automation, then, is that used to speed up the mini-setup/Out of Box Experience (OOBE) phase of the image being deployed. In pre-Windows Vista versions of Windows, you would use a combination of sysprep.inf and other unattend.txt files to do this, and that does not change when using WDS to deploy pre-Windows Vista versions of Windows (including Windows CCS). These files contain information such as what the local administrator account should be, whether to join a domain, the default dialing settings, and your product key. You could also do post-setup configuration in this scenario, and that is still supported.
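For readers who have not seen one, a sysprep.inf is a plain INI-style answer file. The sketch below uses entirely hypothetical values (product key, domain, and credentials); consult the reference documentation in Deploy.cab for the full list of settings:

```ini
; sysprep.inf - placeholder values throughout
[Unattended]
OemSkipEula=Yes
InstallFilesPath=C:\Sysprep\i386
OemPnPDriversPath=Drivers\NIC

[GuiUnattended]
AdminPassword=Password1
OEMSkipRegional=1
OemSkipWelcome=1
TimeZone=035

[UserData]
ProductKey=XXXXX-XXXXX-XXXXX-XXXXX-XXXXX
FullName="Cluster Admin"
OrgName="Contoso HPC"
ComputerName=*

[Identification]
JoinDomain=HPC.CONTOSO.COM
DomainAdmin=HPC\joinaccount
DomainAdminPassword=Password1

[Networking]
InstallDefaultComponents=Yes
```

Setting ComputerName=* lets setup generate a unique name per node, which avoids name collisions when the same image lands on many machines.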
For example, RIS-style unattend documentation describes that you will build a directory structure underneath your installation files called \$OEM$. Under that you will add \$1\Sysprep to perform post setup installations of drivers and configurations (and load additional drivers and packages that need to be installed on the clients). You will still need to create these same structures underneath the folder containing your Installation image so that the same kinds of post-installation configuration can take place. This will be demonstrated later in this paper.
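Assuming those RIS conventions carry over as described, the resulting layout on the WDS server looks roughly like this (folder and file names are illustrative):

```text
D:\RemoteInstall\Images\ComputeCluster\   <- image group folder
    node.wim                              <- the Install image
    $OEM$\
        $1\                               <- maps to the root of C: on the client
            Sysprep\
                sysprep.inf
            Drivers\
                NIC\                      <- extra drivers installed post-setup
```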
Now for the current Windows versions, this second type of unattended installation file is referred to as an ImageUnattend file. It is unique in that from the command-line or the GUI you can associate a specific unattend file per image, which gives you the flexibility to have a high-degree of control over the settings for various images. These files also allow for some Macro settings that will pull required information (like Domain name and Computer Name) from the WDS server during OOBE, which really makes the infrastructure flexible.
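For example, associating an unattend file with one specific install image from the command line looks like this (the image name, group, and path are placeholders):

```
WDSUTIL /Set-Image /Image:"WCCS Compute Node" /ImageType:Install /ImageGroup:"ComputeCluster" /UnattendFile:"c:\unattends\node-unattend.xml"
```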
I am describing my environment below to give some idea of what is needed on both the Head Node (HN) and the individual Compute Nodes (CN) in order to properly deploy and manage a Windows-based cluster. As noted earlier, this information could equally be used to deploy 25 Exchange servers or 5,000 new Windows Vista desktops. Currently, the version of WDS shipping for Windows Server 2003 does not support multicasting, which limits its effective use to deploying ~50 nodes or fewer (your mileage will vary) at any one time. However, the version that ships with Windows Server 2008 DOES support multicasting, which means you will have the flexibility to deploy much larger numbers of servers more easily. Current benchmarks target serving ~200 machines at a time with the upcoming multicasting version. Incidentally, there are several features of the Server 2008 version that make it compelling to consider even while in ‘beta’. The most interesting is the ‘always on’ support for multicasting, where clients can request an image at any point in time and either trigger a new multicast deployment or join an existing deployment mid-transmission and still receive all the data. Please verify you have a license from Microsoft that allows you to roll any of this into (pre)production first, though.
The testing for this paper is based on a five node Windows CCS cluster – one head node and four compute nodes (plus 2 admin nodes – 1 for WDS and one for experimenting with System Center Essentials (SCE)). Each node is powered by an AMD® Athlon™ 64 X2 processor configured with 4 GB of RAM. To successfully use a cluster, you should have at least two networks (you can use one, but the traffic latency over the wire may pose problems) and preferably 3 networks – one public network for accessing information off-cluster, a second ‘private’ network for cluster administration and the third network is a data network for moving MPI-based data between nodes. My cluster consists of two networks, a public and a private network. In my case, the cluster uses the private network for both data traffic and node management. My private network is a Class A net with all nodes using a Realtek NIC with the PXE boot-prom installed. It’s important to have a fully validated NIC with in-box driver support before you start this process to avoid hand-adding the drivers during installation or having install failures. For example, systems that use the nForce4 chip-set typically have a network bus enumerator device that sits in front of the NICs themselves on the motherboard. When you boot WinPE on this system, network connectivity will fail since WinPE cannot see the NICs because the bus device (and its driver) cannot be loaded until a full install of Windows completes and the operating system has time to properly enumerate the bus and then load the correct drivers. In other words, make sure you have all the drivers you need handy and that they do not require any special-case installations.
This diagram gives you a better idea of how my network is laid out.
For this little cluster, I chose not to use Internet Connection Sharing but for real-world work, I highly recommend it for both security and traffic management purposes. You will want to limit access to your HPC cluster to protect sensitive information that is being used by the cluster but also to manage the cluster to prevent random users from consuming resources that are better spent on known and defined workloads. Having the nodes in a private/child domain is part of the solution but if the nodes cannot be seen from the outside world to begin with, then all the better.
If WDS is running on the head node, then the HN must be configured with two or more hard disks to hold all the boot and installation images. This second drive must be an NTFS drive that is physically separate from the operating system drive for performance reasons (not just a partition of the first drive). In addition, if you add Windows Server Update Services (WSUS) to keep your cluster systems updated, you will need this extra set of disk(s) to hold your updates as well. As a side note, keep in mind that if your HN is Windows Compute Cluster Server 2003 rather than Windows Server 2003 Standard Edition, your license may strictly prohibit you from running SQL on CCS nodes, which means you will need another server for WSUS/SCE that is running at least Windows Server 2003 Standard Edition. I discovered this while troubleshooting some odd connectivity behavior.
It should also be understood that the private and data networks need to be as fast as possible – Gigabit Ethernet or better. In the best scenario, your data network will be 10 Gigabit Ethernet, InfiniBand, or Quadrics based. Apart from the need for speed for your HPC applications, blowing a 4 GB image over a slow network is painful to endure. Save yourself the agony and go with the fastest connections you can get.
The last gotcha I will warn you about is that when you are setting up your nodes, you should consider the network binding order very carefully (Control Panel -> Network Connections -> Advanced -> Advanced Settings -> Adapters and Bindings). I ran into any number of issues related to the fact that the binding order was different on several of the nodes. This resulted in cases where traffic meant for the head node would end up on the public network, causing connection failures when the head node could not be properly contacted. To correct this, I was zealous about ensuring that all servers had exactly the same binding order and that the cluster network was first.
Where to Begin?
To make this process work, we will need to set up a head node that is also a Domain Controller, DHCP server, WINS server, DNS server and, optionally, a WSUS/SCE host. Alternatively, if your HN is a version of Windows CCS, you will need to move WSUS to a server running Windows Server 2003 Standard Edition/Enterprise Edition because of licensing restrictions. Installing WSUS/SCE is a simple way to ensure that all systems are up to date with the latest critical patches. Using WSUS is also cost-effective since it is free and you do not have to buy additional software as you would for Operations Manager or Configuration Manager.
Step by Step Instructions
Step 1: Install and Configure Windows on the Head Node (HN)
Install the operating system – Either Compute Cluster Server or your other choice. Keep in mind that CCS is a subset of Standard Server and is licensed differently. If you want to run WSUS for updates, you will need a version of Windows that is licensed to run SQL on.
Install ALL updates – After you have the operating system installed and running, you will want to connect to Microsoft Update and make sure you have all the latest patches and updates installed.
Check Device Manager – If your head node is pre-installed at the factory, you can probably bypass this step. However, for clusters you are building from whiteboxes or where your peripherals are newer than the operating system version, you will want to ensure that no devices are banged out. If they are, get the correct drivers installed before proceeding.
Configure default gateway and network order – Assuming that you have at least two networks available for your cluster, you will need to configure the TCP/IP settings on the NIC attached to the private network so that there is no default gateway. Windows determines where to route traffic based on the gateway setting. If the private network is first in the binding order, Windows will search that network first to locate resources and, failing to locate them there, will send the request out to the default gateway.
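If you prefer to script this, the private NIC can be given a static address with no default gateway from a command prompt (the connection name and addresses here are placeholders):

```
rem Static address on the private NIC; the gateway is deliberately omitted
netsh interface ip set address name="Private" static 10.0.0.1 255.0.0.0
```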
Test connectivity – Do a basic connectivity check and see if you can connect to the Internet. If not, check your proxy settings, your default gateway, and look under Advanced Settings to see if the network order is correct (the private network should be first). Alternatively, if you have something sitting on either network, see if you can connect to it.
Install and Configure DHCP, DNS and/or WINS – In order to serve PXE requests via your WDS server, you will need to have DHCP installed somewhere on your cluster network. For simplicity I installed it on my HN (but ended up moving it to the WDS Server) which was already acting as a domain controller for my cluster domain (more on that below). You will also need to install DNS and (optionally) WINS for Host Name resolution. I installed these on my HN as well to keep the number of servers I need to manage low. Since I can predict a low volume of traffic generated to these services, I was not worried about running them on a Domain Controller (DC). Optionally, you can leave DNS installed, but unconfigured. When you promote this server to Domain Controller, the upgrade process will force the necessary configuration anyway.
DNS – You can leave it unconfigured for now if you wish. However, for completeness sake I built out both a Forward and Reverse Lookup zone that covered only the private network for my cluster. On the interfaces tab of the DNS properties dialog for the server I also removed the public interface from being listened on – I did not want to have my DNS server responding to public network requests.
WINS – I tend to browse and use network resources based on NetBIOS (or friendly) names. Using WINS makes this possible.
DHCP – I configured a scope using the wizard and specifically did not include a ‘router’ (a.k.a. gateway) option in this scope; with no gateway handed out on the private network, the default route on the public NIC remains in effect, so Compute Nodes (CNs) are still able to reach the corporate network if necessary. Also, ensure that your DHCP server is only listening on the private network interface to prevent it from serving private network addresses to nodes on the public network. (DHCP MMC snap-in -> Right-click on your server and go to Properties -> Advanced -> Bindings)
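The same scope can be created with netsh instead of the wizard; the addresses below are placeholders for a Class A private network:

```
rem Create the private scope; note that no option 003 (router) is added
netsh dhcp server add scope 10.0.0.0 255.0.0.0 "Cluster Private"
netsh dhcp server scope 10.0.0.0 add iprange 10.0.0.100 10.0.0.200
```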
Run DCPROMO – For security purposes I decided to create a child domain in the greater corporate forest. This allowed me to tightly control who has access to the cluster resources, meet corporate security and governance policies and make it easy to administer the cluster.
Install the Windows Automated Installation Kit (WAIK) – the WAIK has all the tools and information you will need to build out the various images you will be booting or deploying later on.
- You will likely need to install MSXML6 and the other prerequisites that are included on the WAIK disk.
Install Windows Deployment Services.
If you have NOT installed Service Pack 2 for Windows Server 2003, follow these steps:
WDS is an upgrade to RIS; therefore, RIS must be installed already. Go to Add/Remove Programs under Control Panel and install RIS there. DO NOT CONFIGURE RIS, though. The WDS installation and upgrade will take care of this.
Once WDS is installed, click Start, All Programs, Administrative Tools, Windows Deployment Services. DO NOT select Windows Deployment Services Legacy. This is a link back to RIS, and using it is not recommended now that you have upgraded to WDS. Using it will put your server into Legacy Mode, which is undesirable if you are trying to boot Windows PE 2.0. Add your server into the list in the MMC and then start the configuration process.
Basic Settings - If you have a configuration like mine where the headnode/WDS server also has DHCP installed, you will need to select the options for Not listening on port 67 and configuring DHCP Option Tag #60 to ensure that clients are properly given an IP address during boot and then directed to the local WDS server to start the imaging process.
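If you would rather script these two settings than set them in the configuration wizard, WDSUTIL exposes them directly:

```
rem DHCP and WDS share this server: stop WDS listening on port 67
rem and advertise PXEClient to booting clients via DHCP option 60
WDSUTIL /Set-Server /UseDHCPPorts:No /DHCPOption60:Yes
```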
PXE Server settings – you will need to decide how you want the server to respond to clients. I configured mine to respond to all client queries and to auto-add the machines.
- If you HAVE installed Windows Server 2003 Service Pack 2, then simply go to Add/Remove Programs in Control Panel and add the WDS component there. SP2 removes RIS as a feature, and you will only see WDS as an installable option.
Once WDS is installed, click Start, All Programs, Administrative Tools, Windows Deployment Services. DO NOT select Windows Deployment Services Legacy. This is a link back to RIS and will fail now that you have upgraded to WDS. Add your server into the list in the MMC and then start the configuration process.
Basic Settings - If you have a configuration like mine where the headnode/WDS server also has DHCP installed, you will need to select the options for not listening on port 67 and configuring DHCP Option Tag #60 to ensure that clients are properly given an IP address during boot and then directed to the local WDS server to start the imaging process.
PXE Server settings – you will need to decide how you want the server to respond to clients. I configured mine to respond to all client queries and to auto-add the machines.
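The same PXE answer policy can be applied from the command line; this mirrors the answer-all configuration described above:

```
rem Respond to all PXE clients, known or unknown
WDSUTIL /Set-Server /AnswerClients:All
```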
Step 2: Install Compute Cluster Pack on the Head Node
Walk through the installer, reboot as necessary.
Install all patches and updates even if you are not using them currently. In the case of the RIS patch, you will never use it, but the installer will not proceed unless it is on your system.
I saved all of the patches (MMC, ICS, RIS) to the local drive for later use. I found that when my private network was first in the binding order, nodes would attempt to download these patches during the client installation of CCP and would fail (I mentioned earlier the need to properly configure the network gateway; this was the main manifestation).
Note too that if you have SP2 loaded, many of the needed patches will be included with it or with updates already downloaded.
A word of caution: I do not like the idea of running MSDE on my servers when I can get a spiffy copy of SQL Server 2005 Express (complete with current service packs and management consoles), so I installed SQL Express first. Then, when I started the CCP install, setup still insisted on installing MSDE anyway. So the short story is: do not help; let WCCS install the version it likes.
Configure the Head Node – When setup for the Compute Cluster Pack completes, you will be shown a To Do list of administration tasks. Walk through the Configure Cluster Network Topology wizard and define your networks (public/private/data). You may be asked about firewall options. I chose to disable the firewall on my private/data network and keep it enabled on the public network.
Ignore everything you see about setting up RIS – WDS is a superset of RIS, and we will not be using RIS directly.
- If SP2 is loaded, you will see notations in the Cluster Administration MMC that there isn’t an image installed under RIS – but you will not see warnings or cautions about installing RIS to begin with.
Step 3: Install and Configure a Compute Node (Master Node)
Your next step is to build out a master computer, which you will use to image the rest of your nodes. You are going to build this machine the way you want all your nodes to look. Then you are going to image it and capture that image back to the HN (or wherever your WDS Services are running) and push the image out to CNs.
Install Windows manually on this node.
Install all relevant updates. Connect to the Windows Update site and install all pertinent patches for your system. This is where having WSUS or System Center Essentials running is going to help: you may have to download close to 1 GB worth of updates, and an in-house patching mechanism speeds this up considerably. When I rebuilt my test infrastructure post-SP2, I found I needed ~500 MB of updates (SP2, Visual Studio, IE 7.0, etc.) PER MACHINE.
Check Device Manager and correct any banged-out devices. I found that the motherboards in my systems were newer than the operating system, so I had lots of missing devices. This is reason number 3 why using WDS saves you time and money: you do this once for the master node, and all the rest of the nodes get these updates too at no extra labor/time cost to you.
Ensure that the Public, Private and Data networks are correctly configured – I use DHCP for the Private and Data networks and DHCP is enforced by IT on the public network. This is fine since it avoids the potential issues with having the same hard-coded IP show up on all your nodes, for example.
Join the Compute node to your HPC domain and reboot when prompted.
Install the Compute Cluster Pack - Connect to the head-node and run Setup.exe from the \CcpInstallShare.
I chose to “Join this server to an existing compute cluster as a compute node”.
I also opted to install the client utilities.
Check your work. At this point, you will want to verify that the CN is properly communicating with the HN. Go to the head-node and see if the compute node shows up properly in Compute Cluster Administrator under Node Management. If you have something like a cluster ping-pong utility for checking connectivity you can run that here.
Build a tools folder - You will need a folder on the local C drive in which to place tools (potentially), run the capture from, and store the captured image that you will later push back to the head node. I chose to store them in a folder called c:\tools. Later, when you are creating the first WinPE capture image, you will add this folder to the [ExclusionList] section of WIMScript.ini to keep it from being captured in your compute node Install image.
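For reference, a WIMScript.ini along these lines keeps the tools folder (plus the usual volatile files) out of the captured image; this is a sketch based on the documented exclusion-list format:

```ini
[ExclusionList]
\tools
ntfs.log
hiberfil.sys
pagefile.sys
"System Volume Information"
RECYCLER
```

ImageX automatically picks up a wimscript.ini sitting in the same folder as imagex.exe, or you can point at one explicitly with the /config switch when you run the capture.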
Last step: Once you have completely installed everything and are satisfied with the configuration and settings, “Sysprep” the compute node. Since we are going to automate installation, you will sysprep this system to ‘generalize’ the master image so that when each compute node boots the first time, it will be as if it came from the factory. To get around the problem of product keys and to make installation hands-free, I employed RIS-style automation in the form of a sysprep.inf file. More on this shortly. (Get sysprep.exe from your Windows installation CD under \tools\deploy.cab.)
Run SYSPREP and choose RESEAL. It is important to note that you must choose ‘reseal’ versus the option for Factory when the sysprep dialog box appears. Once you choose factory, the image maintains most of the current settings and bypasses mini-setup/oobe-setup. However, in order to customize your configuration you need to be able to pass the parameters you specify in your unattended files to the image as it boots the first time. Doing so only works when you choose Reseal.
Once you have started Sysprep, the utility will prepare the master computer and then shutdown the machine.
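If you want to script this step instead of clicking through the dialog, the Windows Server 2003 version of sysprep accepts equivalent switches (run it from the C:\Sysprep folder you extracted from deploy.cab):

```
rem Generalize the image, run mini-setup on next boot, and shut down when finished
sysprep.exe -reseal -mini -quiet -shutdown
```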
Remember that since we are essentially building the prototype from which all other nodes are derived, it is important for this node to be both as specific and as vague as possible. ‘Specific’ in that this node should be as complete as possible, with all drivers, configuration, and related tools and information that will be common among all nodes. ‘Vague’ in that we want as few hard-coded settings (static IP addresses, and so on) as possible, since those cause instant conflicts and wasted admin time to correct.
Step 4: Start the Imaging Process – build a WinPE Boot disk
Below I have detailed how to create a WinPE boot disk manually on the assumption that many users will have a need to customize the image and perform some sort of custom action with the completed disk. However, there are certainly occasions where this is not necessary so I have added an alternate Step 4 that walks you through creating a Capture image to do the same thing.
Install the WAIK - if you have not done so yet, download and install the Windows Automated Install Kit, which includes a Windows Deployment Services version for Windows Server 2003 SP1 (this also assumes you have not installed SP2). This should be done on your technician computer.
Build a WinPE RAM Disk – the next stage of the project will be to construct a WinPE boot disk that you will load and run on the master system you just SYSPREPed. The following steps were taken, almost verbatim, from the “Windows Automated Installation Kit (Windows AIK) User’s Guide” (which is installed when you load the WAIK on your technician computer). With the guide on the screen in front of you, search on the phrase: Walkthrough: Create a Bootable Windows PE RAM Disk on CD-ROM. (or click that link for the latest online version) Please double-check the docs to make sure some important step has not changed.
On your technician computer, click Start, point to All Programs, point to Windows OPK or Windows AIK, and then click Windows PE Tools Command Prompt.
The menu shortcut opens a Command Prompt window and automatically sets environment variables to point to all the necessary tools. By default, all tools are installed at C:\Program Files\Windows AIK\<version>\Tools, where <version> can be Windows OPK or Windows AIK.
At the command prompt, run the Copype.cmd script. The script requires two arguments: hardware architecture and destination location. For example,
copype.cmd <arch> <destination>
Where <arch> can be x86, amd64, or ia64 and <destination> is a path to local directory. For example,
copype.cmd x86 c:\winpe_x86
The script creates the following directory structure and copies all the necessary files for that architecture. For example,
\winpe_x86
\winpe_x86\ISO
\winpe_x86\mount
Add additional customizations - This step is optional but recommended. You can add applications and scripts to your Windows PE image that you may need while working in Windows PE. The following is a list of common tools to include in your Windows PE image:
ImageX - A tool for capturing and applying images during deployment scenarios. For example, at a command prompt, type:
copy "c:\program files\<version>\Tools\<architecture>\imagex.exe" c:\winpe_x86\iso\
Package Manager (Pkgmgr.exe) - A tool for servicing Windows image (.wim) files offline. You must copy the entire \Servicing folder and MSXML6 binaries. Offline servicing requires ImageX. For example,
xcopy "c:\program files\<version>\Tools\<architecture>\Servicing" c:\winpe_x86\iso\Servicing /s
copy %windir%\system32\msxml6*.dll c:\winpe_x86\iso\Servicing
Note: You will need to pull these files from a 64-bit machine if you are building this disk for a 64-bit system on a 32-bit PC.
Where <version> can be Windows OPK or Windows AIK and <architecture> can be x86, amd64, or ia64. In both previous examples, the tools are not loaded into memory during a Windows PE RAM boot. The media must be available to access the tools.
Create an exclusion list - This step is optional, but recommended if you include ImageX as part of your Windows PE image. During an ImageX capture operation, some files may be locked, which will cause ImageX to fail. You can exclude specific files from being captured by creating a configuration file called Wimscript.ini. A configuration file is a text file; the following is a sample configuration file that includes common files that you must exclude during a capture operation.
Create a configuration file called Wimscript.ini by using any text editor (for example, Notepad).
[ExclusionList]
ntfs.log
hiberfil.sys
pagefile.sys
"System Volume Information"
RECYCLER
Windows\CSC

[CompressionExclusionList]
*.mp3
*.zip
*.cab
You can add additional files or directories that you intend to exclude during a capture operation. For more information about configuration files, see Create an ImageX Configuration File.
Save the configuration file to the same location as ImageX, as specified in step 2 (for example, c:\winpe_x86\iso\).
ImageX will automatically detect wimscript.ini only if it is saved to the same location.
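If you want to preview which files a given pattern will pick up before running a capture, the wildcard matching can be approximated in a few lines of script. The sketch below is purely illustrative (ImageX's exact matching rules may differ slightly), and the helper name is my own:

```python
# Approximate the [CompressionExclusionList] wildcard matching with
# fnmatch so you can preview what a pattern will catch. Illustrative
# only -- not part of ImageX, which applies its own matching rules.
from fnmatch import fnmatch

compression_exclusions = ["*.mp3", "*.zip", "*.cab"]

def is_compression_excluded(filename):
    """Return True if the file matches any compression-exclusion pattern
    (case-insensitively, as on a Windows file system)."""
    return any(fnmatch(filename.lower(), p) for p in compression_exclusions)

if __name__ == "__main__":
    for name in ["song.MP3", "drivers.cab", "report.doc"]:
        print(name, is_compression_excluded(name))
```

The same approach works for previewing [ExclusionList] entries before a long capture run.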
Create a bootable CD-ROM - This step describes how to put the Windows PE RAM disk onto a CD-ROM. This option requires that you create an .iso file by using the Oscdimg tool.
On your technician computer, create an .iso file with Oscdimg. At a command prompt, type:
oscdimg -n -bc:\winpe_x86\etfsboot.com c:\winpe_x86\ISO c:\winpe_x86\winpe_x86.iso
Burn the image (winpe_x86.iso) onto a CD-ROM.
Step 4 Alternative: Start the Imaging Process – Build a Capture Image
While the steps above give you fine-grained control over how and what is put onto your WinPE disk, you can also automate much of this process by creating a capture image. A capture image mostly automates the process of booting the master computer and capturing an image of it. Using the steps below, you will create a capture.wim file and add it to the list of boot images on the WDS server. The master machine boots from this image and an automated capture process begins immediately.
Right-click on an existing boot image in the WDS management UI and select "Create Capture Boot Image."
Right-click on the Boot Images folder and select Add Boot Image. Add the newly created capture image.
Step 5: Capture an Image
Boot the master computer with your bootable Windows PE media (put the PE disk in the CD drive and explicitly choose to boot from it during POST)
Once the PE disk has fully booted, an open Command Prompt window is displayed. Note that you will have several drive letters available: the command prompt sits at X:\ (the RAM disk), the hard disk is at C:\, and your DVD is at D:\ or a nearby drive letter. In the Command Prompt window, type the following command to capture an image of your master computer:
imagex /compress fast /check /scroll /capture c: c:\myimage.wim "Title Goes Here" "Image Description Goes Here"
When the imaging is complete, map a drive from the command line of your WinPE environment back to your head node and copy the image there. I tested my WinPE disk on several different systems and never hit a case where it lacked suitable network drivers. If you have no network connectivity at this point, the most likely cause is that the WinPE image does not contain the correct drivers for your NIC. See the documentation here as a starting point for injecting drivers into a WIM file. You will need to mount the WinPE image you created above and inject the correct drivers for your NIC(s) into it, then burn a new WinPE boot disk from the updated image. Once you have that new disk, come back to this machine, boot with it, and copy the WIM file to your head node.
For example, with the WinPE environment running on the master node, map your drive from its command prompt:
Net use j: \\<headnode name>\<some share> /u:<domain>\<some account with write access to the share you just named>
You will be prompted for your credentials. Once entered and you have the mapped drive, do this:
X:\windows\system32>j:
J:\>copy c:\<name of directory on master where the WIM was copied to>\hpcnode.wim

(or whatever you called your WIM file)
Step 5 Alternative: Use the Capture Image to, well, capture an Image
Boot the master computer with your updated capture image – the one you created in the alternate Step 4 above. During POST, choose to PXE boot, and when the choice of boot images appears, select the capture image you created above.
Once the PE image is fully booted, the WDSCapture utility will auto-magically kick off and guide you through capturing an image of the master computer.
The first choice you have is which volume to capture. If no volumes are present, you have forgotten to Sysprep the master computer. Run Sysprep on it, choose Reseal, and then start this section again.
Follow the wizard to give your image a name, description, location to save to and additionally have it upload to your WDS server.
If the utility did not kick off properly in step 2, you will need to modify the capture WIM file. To do this, take your new capture.wim file and mount it.
Create a directory on your technician computer named something like c:\mounted_image.
Type the following inside a Windows PE Tools Command Prompt window (available off the Start menu under the WAIK folder). This mounts the WIM file to the folder you created above and lets you browse or manipulate the files inside it as if they were part of your local file system. The /mountrw switch allows you to read and write to the mounted image.
imagex /mountrw c:\testpe\winpe.wim c:\mounted_image
Using Windows Explorer, navigate to the %systemroot%\system32 folder in the \mounted_image directory and create a text file called winpeshl.ini.
Add the following 2 lines to this file and save it.
Next unmount your WIM and commit the changes by typing:
imagex /unmount /commit c:\mounted_image
You will need to add this WIM file to the Boot Images on your WDS server. Open the WDS management MMC, right-click Boot Images, and select Add Boot Image.
Finally, boot the master computer up from the network and load this image when offered a boot image choice by WDS. The Capture utility will immediately kick off.
Step 6: Prepare to Deploy - Deployment Automation
Now that we have captured the image of the basic CCS server we want to distribute, we will want to automate the deployment process as much as possible, to avoid visiting 256 nodes and manually selecting setup/installation options. That would defeat the purpose of using WDS in the first place.
At this point I will reiterate that there are two distinct pieces of setup that we will need to script. The first is referred to as the WDS client: the piece that is downloaded from the WDS server over TFTP to the target compute node and kicks off the loading and execution of boot.wim. For this piece of setup we choose such things as whether to partition the hard disk of the target system, what language to use for setup, what administrator password the client should use to connect back to the server, and so on. The final choice we can consider automating is which image to deploy to the target systems. In a large WDS deployment it is possible, even likely, that you will have a number of distinct images that can be deployed; by scripting this part of installation, we can force the selection of one of them. For this type of automation, we will be using WDS-style unattend.xml files.
The second type of automation to consider is the OOBE (Out Of Box Experience) for the install image we captured. Remember that we ran Sysprep on the master image, which effectively removed the server's domain affiliation and local settings and reset the SIDs (security identifiers) on the master machine. For this type of automation, we will rely on RIS-style sysprep.inf files and the related infrastructure. Here we elect to join the domain that the cluster sits in, configure proxy settings, and, if the image is being deployed to hardware for which it does not contain drivers, script the additional driver installs as well. In my case, I imaged one machine and deployed back to identical nodes, so I did not need those additional steps.
Given the number of possible permutations for each of these files, I will simply list out both of my default files here, discuss them and then give you the link for more information. Scott Dickens of Microsoft (lead program manager for WDS) has written an in-depth whitepaper covering all the possible variations you could ever need for automation. Go here for his excellent treatise on the subject.
WDS Client Unattend File
This is a copy of the WDS-Client unattend.xml file (aka boot image unattend file) I use to specify the settings and install image that is to be deployed on the target system. I have highlighted the important bits to illustrate some key items to keep in mind and to call out some different options.
The file is straightforward to read and makes sense once you have parsed it all. Notice at the top that the processor architecture is set to amd64. This distinction is important: in my case I am using a 64-bit version of WinPE, and the boot image and, of course, the install image are also 64-bit. If you use WDS for a 32-bit image or one for Itanium, you will need to adjust that parameter accordingly.
<?xml version="1.0" ?>
<unattend xmlns="urn:schemas-microsoft-com:unattend">
  <settings pass="windowsPE">
    <component name="Microsoft-Windows-Setup" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" processorArchitecture="amd64">
      <WindowsDeploymentServices>
        <Login>
          <WillShowUI>OnError</WillShowUI>
          <Credentials>
            <Username>Administrator</Username>
            <Domain>HPC</Domain>
            <Password>password</Password>
          </Credentials>
        </Login>
        <ImageSelection>
          <WillShowUI>OnError</WillShowUI>
          <InstallImage>
            <ImageName>HPC Node Image</ImageName>
            <ImageGroup>HPC</ImageGroup>
            <Filename>hpcnode.wim</Filename>
          </InstallImage>
          <InstallTo>
            <DiskID>0</DiskID>
            <PartitionID>1</PartitionID>
          </InstallTo>
        </ImageSelection>
      </WindowsDeploymentServices>
      <DiskConfiguration>
        <Disk>
          <CreatePartitions>
            <CreatePartition>
              <Extend>true</Extend>
              <Order>1</Order>
              <Type>Primary</Type>
            </CreatePartition>
          </CreatePartitions>
          <ModifyPartitions>
            <ModifyPartition>
              <Active>true</Active>
              <Format>NTFS</Format>
              <Label>OS</Label>
              <Letter>C</Letter>
              <Order>1</Order>
              <PartitionID>1</PartitionID>
            </ModifyPartition>
          </ModifyPartitions>
          <DiskID>0</DiskID>
          <WillWipeDisk>true</WillWipeDisk>
        </Disk>
        <WillShowUI>OnError</WillShowUI>
      </DiskConfiguration>
    </component>
  </settings>
</unattend>
Notice too that my configuration is set to wipe the drive of the target system. This is handy in an HPC scenario, where you will need to do this fairly regularly. You will want to consider reimaging periodically in HPC scenarios anyway, given that simply adding new codes to a compute node can carry a performance impact versus reimaging the node with just the codes needed for the task at hand.
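If you maintain both 32-bit and 64-bit clusters, hand-editing the processorArchitecture attribute in each unattend file gets tedious. A rough sketch of scripting the switch follows; only the element and attribute names come from the unattend file above, while the helper itself is my own illustration:

```python
# Sketch: switch the processorArchitecture attribute in a WDS-client
# unattend.xml (amd64 -> x86, for example). Only the element/attribute
# names come from the unattend file shown above; this helper is mine.
import xml.etree.ElementTree as ET

NS = "urn:schemas-microsoft-com:unattend"

def set_architecture(unattend_xml, arch):
    """Return the unattend XML with every <component>'s
    processorArchitecture attribute set to arch (x86, amd64, or ia64)."""
    ET.register_namespace("", NS)  # keep the default namespace unprefixed
    root = ET.fromstring(unattend_xml)
    for comp in root.iter("{%s}component" % NS):
        comp.set("processorArchitecture", arch)
    return ET.tostring(root, encoding="unicode")
```

Run it against a copy of the file, eyeball the diff, and keep one variant per architecture on the WDS server.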
Once you have this file built out the way you like it, use the WDS admin MMC snap-in to map it to the boot.wim file you are using (right-click your server name in the MMC, go to Properties, then the Client tab, check "Enable unattended installation", and add an unattend.xml file for each architecture). Note that you can have only one WDS-client unattend file per architecture type, but one sysprep.inf/unattend.xml file per install image.
Install Image Unattend file
As I mentioned above, unattended installation for the install image is handled via RIS-style sysprep.inf files.
;SetupMgrTag
[Unattended]
OemSkipEula=Yes
InstallFilesPath=C:\sysprep\i386
[GuiUnattended]
AdminPassword="Pass@W0rd1"
EncryptedAdminPassword=NO
OEMSkipRegional=1
OEMDuplicatorstring="HPC Sysprep Image"
TimeZone=4
OemSkipWelcome=1
[UserData]
ProductKey=ABC1D-EFG7H-8IJK9-KLMNO-PQ2RS
FullName="%ORGNAME%"
OrgName="%ORGNAME%"
ComputerName="%MACHINENAME%"
[LicenseFilePrintData]
AutoMode=PerSeat
[TapiLocation]
CountryCode=1
Dialing=Tone
AreaCode=425
[RegionalSettings]
LanguageGroup=1
Language=00000409
[Identification]
JoinDomain=%MACHINEDOMAIN%
DomainAdmin=HPC\Administrator
DomainAdminPassword=Pass@W0rd2
DoOldStyleDomainJoin=Yes
[Networking]
InstallDefaultComponents=Yes
[Branding]
BrandIEUsingUnattended=Yes
[Proxy]
Proxy_Enable=1
Use_Same_Proxy=1
HTTP_Proxy_Server=svlproxy:74
Proxy_Override=<local>
You will notice that some of the settings above have been replaced by unattend variables. The values for these variables are pulled directly from the WDS server, which makes it super easy to auto-populate most of the major fields.
Of special interest is the %Machinename% variable. You may be wondering where that value comes from since it has to be unique to each server. This data is provided by WDS as specified on the Directory Services tab of the properties page for your server in the WDS MMC snap-in. In my case, the format is set as HPC-NODE%03# which translates to machine names of HPC-NODE001, HPC-NODE002, etc. When the nodes are built out with the new operating system image, their machine name is set to the next machine name in order following that format and their machine account is created in AD – we made sure that would happen by specifying a domain account and the “JoinDomain” command in the sysprep.inf file we covered.
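If you want to predict the names WDS will hand out (say, to pre-stage DNS entries or a hosts file), the zero-padded counter behind %03# is easy to reproduce. A small sketch, assuming the HPC-NODE%03# format shown above; the generator itself is my own illustration, not part of WDS:

```python
# Sketch of the zero-padded numbering that a WDS naming policy like
# HPC-NODE%03# produces (format string taken from the Directory
# Services tab described above; this generator is my own illustration).
def node_names(prefix="HPC-NODE", width=3, count=4, start=1):
    """Yield HPC-NODE001, HPC-NODE002, ... in the order WDS assigns them."""
    for n in range(start, start + count):
        yield "%s%0*d" % (prefix, width, n)

if __name__ == "__main__":
    for name in node_names():
        print(name)
```

Changing width to 2 mimics a %02# policy, and so on.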
For more information on unattend variables and for more detail on unattended installation in general, please refer to Chapter 8 of Deploying and Managing the Windows Deployment Services Update on Windows Server 2003
By the way, you can automate the creation of the Sysprep.inf file to a large degree by using the Setupmgr.exe utility included in the deploy.cab.
Final Step: Start Deploying Compute Node Images
Now that you have a captured image, you will want to start deploying it broadly and quickly. The first thing to check is that your client can successfully PXE boot to the WDS server: boot one of your CNs and ensure that it can both see and attempt a PXE boot from the WDS server. One sign of initial success is that the server's IP address appears on screen as part of the boot process.
You may find that on systems running x64 processors (like AMD Athlon™ processors and AMD Opteron™ processors), which are capable of booting both 32-bit and 64-bit versions of Windows, that PXE booting fails with an error that looks something like this:
CLIENT IP: 192.168.10.20 MASK 255.255.255.0 DHCP IP: 192.168.10.10
Contacting Server: 192.168.10.10
TFTP Download: boot\x64\pxeboot.com
Failed to restart TFTP.
TFTP download failed
Systems that are capable of running both 32-bit and 64-bit operating systems are sometimes seen by WDS (owing to BIOS issues) as being only 32-bit aware or are not correctly detected as 64-bit capable. In the example above, WDS is attempting to deploy a 64-bit version of Windows to the client but is failing. To resolve this issue, you will want to enable the following switch via the WDSUTIL command-line tool: ArchitectureDiscovery:YES. This switch causes WDS to more thoroughly interrogate the hardware and correctly report back which operating system versions it is capable of running.
To enable the switch via command line go to Start, All Programs, Microsoft Windows AIK, Windows PE Tools Command Prompt and type the following command:
WDSUTIL /set-server /architecturediscovery:yes
Alternately, you can force a specific platform architecture choice by using this command instead:
WDSUTIL /set-server /DefaultX86X64ImageType:x86
WDSUTIL /set-server /DefaultX86X64ImageType:x64
The last option that you will want to set up via the command line is bypassing the F12 key press that each machine otherwise requires to choose a network boot. On a small cluster, hitting F12 four times is not a huge deal, but it becomes an impossible task when you have 256 nodes to worry about. To enable this feature, type the following in the Windows PE Tools Command Prompt:
WDSUTIL /Set-Server /AllowN12ForNewClients:Yes
Please note, however, that in the current Windows Server 2003 SP2 based version of WDS there is a known bug that prevents this from working correctly on 64-bit targeted systems; this is corrected in Windows Server 2008.
Before walking through a deployment, ensure that the network boot device is second in the BIOS boot order and that the hard drive is first. This matters because if the NIC were first, the system would constantly reboot and reimage itself. With the hard drive first, once the system has been imaged it boots from disk and does not attempt to boot from the network again.
At some point, you are going to want to re-image those servers so how do you do that? The solution here is to issue a diskpart command on the targeted system and wipe the operating system drive then force a reboot. When the system reboots again, it will attempt to boot to disk and fail since the disk is now empty, and will then boot to the network and download your new image file.
Sample Diskpart Script
C:\>Diskpart
REM Select future C:\ drive
SELECT DISK 1
REM Wipe the disk clean
CLEAN
REM Create a single Primary Partition
CREATE PARTITION PRIMARY
REM Set the drive to mount at C:\
ASSIGN LETTER=C
Exit
Shutdown.exe /r /t 005
You will notice that this script wipes my disk, recreates a primary partition and forces a reboot at the end. When the system reboots it will attempt to boot to disk, fail then boot to the network.
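If you prefer to generate that script per node rather than maintain it by hand (for example, when the target disk number varies by hardware), it can be emitted programmatically and then run non-interactively with diskpart /s. The sketch below is my own illustration; the diskpart commands themselves are the ones shown above:

```python
# Sketch: emit the diskpart wipe script shown above so it can be pushed
# to nodes as a package. On the node, run it non-interactively with:
#   diskpart /s wipe.txt
#   shutdown /r /t 005
# Disk number 1 matches the script above; adjust for your hardware.
# The helper itself is my own illustration, not part of WDS or diskpart.
def diskpart_wipe_script(disk=1, letter="C"):
    """Return a diskpart script that wipes the disk, recreates a single
    primary partition, and mounts it at the given drive letter."""
    return "\n".join([
        "REM Select future %s:\\ drive" % letter,
        "SELECT DISK %d" % disk,
        "REM Wipe the disk clean",
        "CLEAN",
        "REM Create a single Primary Partition",
        "CREATE PARTITION PRIMARY",
        "REM Set the drive to mount at %s:\\" % letter,
        "ASSIGN LETTER=%s" % letter,
        "EXIT",
    ])

if __name__ == "__main__":
    with open("wipe.txt", "w") as f:
        f.write(diskpart_wipe_script())
```

Packaging wipe.txt plus the two commands in the header comment as a software distribution job gives you a one-shot "wipe and reimage" action per node.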
A Word About System Center Essentials 2007
OK, then how do you push a script like that out to your compute nodes? There are a number of methods for performing that task. At the beginning of this white paper, I talked about using WSUS to push updates and keep your systems up to date. However, a better approach is to adopt a unified set of tools that lets you monitor the health of your nodes, update them, and push out products as necessary. Remember, too, that your management and security footprint grows with each new tool, so going with something consolidated and lightweight makes a lot of sense. I discovered System Center Essentials at TechEd this year and am now using it to manage the cluster. With this beasty, I can push updates to my nodes as they come in from Windows Update, push out the latest build of some code that will run on the cluster, and kill all the nodes and reload their operating system in short order by pushing out that diskpart script as a software package. I can also keep a running profile of my nodes' health through the same tools. Very cool.
Tools for remotely deploying software:
http://www.codeguru.com/Cpp/I-N/network/remoteinvocation/article.php/c5433 (not tested for this article but looks interesting)
With this white paper I have shown you, systematically, how to build and deploy a Windows-based HPC cluster using Windows Deployment Services, with WSUS/SCE to manage the nodes and keep them up to date. Once you have completed one deployment, you will probably want to use these tools for more scenarios than just your cluster.
Please send me any thoughts or suggestions you have to:
About the Author:
John McCrae is a member of the Microsoft team at AMD where he works to understand and articulate the benefits of technology to solve every-day and not so every-day problems. When he is not discussing High Performance Computing or Microsoft Office SharePoint Server, he can be found playing hockey with the Greater Seattle Hockey League.