The Desktop Files: Network-Booting Windows

Wes Miller

Contents

How PXE Works
Diving into RIS
WDS—The Beginning
Other Players
Wrapping Up

Over the next few months I'm planning to cover Windows Deployment Services (WDS), which is available for Windows Server 2003 and is built into Windows Server 2008. WDS can be a very important component in your deployment infrastructure, so I want to make sure you have a good foundation for the discussion. This first column will drill into the architecture of the Pre-boot eXecution Environment (PXE; pronounced pixie), the history of Remote Installation Services (RIS), as well as other PXE-related technologies used at Microsoft.

When I moved to the Windows core OS group in 2001, RIS was one of the technologies I inherited (and was a bit terrified to own, due to its complexity and dependency on BIOS implementations and hardware). But it was, in addition to Windows® PE, one of the technologies I enjoyed most in my role as a Program Manager.

I remember when I first installed Windows 3.0—using a set of 3.5" diskettes. Over time, installation became easier by virtue of bootable CDs (included in some versions of Windows 98). But the thing was, installation always required local media of some type, as well as a local hard disk.

Through the years, the ability to network-boot Windows—that is, to boot it completely over the network without requiring a hard disk—has been a frequent request of customers. Though some early versions of Windows had that ability, Windows NT® never did. And while current versions of Windows Server® 2003 and Windows Server 2008 can be booted via an iSCSI initiator, the process is quite different in that it isn't truly local—it entails an ongoing dependency on a remote drive as the boot drive.


Beginning with Windows 2000, Microsoft developed the technology that came to be called RIS, which allowed for network-based installation. The goal of RIS was a relatively simple one: to put an operating system image onto the local disk of the target computer using PXE.

How PXE Works

Figure 1 shows the PXE boot sequence. PXE is a relatively simple protocol that was developed by Intel and other vendors as part of the Wired for Management initiative. PXE is derived from the Dynamic Host Configuration Protocol (DHCP), itself derived from BOOTP, and is typically implemented in the firmware of your Network Interface Card (NIC). Simply put, here's what happens:

Figure 1 PXE boot sequence

Step 1 The system BIOS boots up and determines boot order.

Step 2 If the boot order puts PXE ahead of hard disks, flash drives, and CD-ROMs, or if none of those devices is present, the Universal Network Driver Interface (UNDI) is loaded from the NIC. The NIC's PXE firmware provides an extremely small network device driver and a Trivial File Transfer Protocol (TFTP) implementation. (Some BIOS implementations require end users to press the F12 key in order to PXE boot. That isn't standard, and I appreciate the ability to disable it.)

Step 3 The system sends a simple User Datagram Protocol (UDP) broadcast, looking for a DHCP server. This is actually the first step of the PXE boot sequence and is referred to as Discover. Notice that this is a UDP broadcast, and routers don't forward broadcasts by default. If you haven't yet, you will need to spend some quality time with your routers and switches (for example, configuring DHCP relay, also known as IP helper addresses) to ensure that all of the PXE communications can make it across.

Step 4 If a DHCP server hears the broadcast, it responds with an offer of an IP address. This step is referred to as Offer. The important point to remember here is that PXE is stateless and the amount of system-unique state information the client has to offer at this point is pretty limited (the MAC address and, if available, the System Management BIOS GUID, also known as SMBIOS GUID).

Step 5 After receiving the packet with the IP address, the client states that it actually needs further information, namely the PXE server's address. Another broadcast occurs, which includes the information from the DHCP server that originally responded. Here the client is telling the DHCP server, "I need more information; specifically, I need the location of a Network Boot Program (NBP)." This step is called Request.

Step 6 The PXE server responds with the address of the PXE server and the location of the NBP, an extremely small boot executable that is required to be smaller than 32KB. This step is called Acknowledge. If you're playing along, you've probably noticed the acronym DORA (Discover, Offer, Request, Acknowledge), a good way to remember the sequence.

Note that if you have installed Microsoft DHCP and WDS (or used some other vendor's technologies), the request step doesn't occur and, in fact, the original Offer packet from the DHCP server already includes the location of the PXE server and the NBP (thus removing two steps and saving some time).

Step 7 The client, which has the small TFTP protocol stack mentioned earlier, downloads the NBP from the network location specified by the PXE server. TFTP is a dated, exceedingly small, connectionless protocol that runs over UDP. It wasn't chosen for its security or its performance, and as a result many network administrators block it (UDP port 69) at their routers by default. It must be allowed through for PXE to function.

Many PXE implementations (including RIS) include the ability to ask the user to press F12 to continue at this point, but typically this can be disabled by the administrator of the PXE server. Next month, when I look further at WDS, I'll cover some of the enhancements Microsoft has made to the TFTPD (TFTP Daemon) in WDS for Windows Server 2008 to improve performance.

Step 8 The NBP runs. In the case of RIS, this starts a Windows boot loader that carries the deployment forward. PXE (at least the actual boot-level protocol) is no longer a component in the process.
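To make Steps 3 and 4 more concrete, here is a minimal Python sketch that builds (but does not send) the Discover packet a PXE client broadcasts. It follows the DHCP message layout from RFC 2131 and the PXE-related options as I understand them from RFC 4578 and the PXE specification: a "PXEClient" vendor class in option 60, the client architecture in option 93, the UNDI version in option 94, and the machine GUID in option 97. The specific values (x86 BIOS architecture, UNDI 2.1) and the function names are assumptions for illustration, not anything RIS or WDS exposes.

```python
import struct
import uuid
from typing import Optional

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options area


def dhcp_option(code: int, value: bytes) -> bytes:
    """Encode one DHCP option in code/length/value form."""
    return bytes([code, len(value)]) + value


def build_pxe_discover(mac: bytes, machine_guid: Optional[uuid.UUID], xid: int) -> bytes:
    """Build (but do not send) a minimal DHCPDISCOVER as a PXE client might broadcast it."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    fixed = struct.pack(
        "!BBBBIHHIIII",
        1,           # op: BOOTREQUEST
        1,           # htype: Ethernet
        6,           # hlen: length of the MAC address
        0,           # hops
        xid,         # transaction ID chosen by the client
        0,           # secs
        0x8000,      # flags: ask for a broadcast reply, since the client has no IP yet
        0, 0, 0, 0,  # ciaddr, yiaddr, siaddr, giaddr are all zero in a Discover
    )
    chaddr = mac.ljust(16, b"\x00")  # client hardware address: the MAC, zero-padded
    sname = bytes(64)                # server host name (unused here)
    boot_file = bytes(128)           # boot file name (filled in by the server's reply)
    options = [
        dhcp_option(53, b"\x01"),                              # message type: DISCOVER
        dhcp_option(60, b"PXEClient:Arch:00000:UNDI:002001"),  # vendor class: "I am a PXE client"
        dhcp_option(93, struct.pack("!H", 0)),                 # client architecture: x86 BIOS (assumed)
        dhcp_option(94, b"\x01\x02\x01"),                      # network interface: UNDI 2.1 (assumed)
    ]
    if machine_guid is not None:
        # Option 97: a type byte of 0 followed by the 16-byte machine GUID.
        # (Byte-order quirks of real SMBIOS GUIDs are ignored in this sketch.)
        options.append(dhcp_option(97, b"\x00" + machine_guid.bytes))
    # Option 255 terminates the options list.
    return fixed + chaddr + sname + boot_file + DHCP_MAGIC_COOKIE + b"".join(options) + b"\xff"
```

With a GUID supplied, the result is a 306-byte payload. Notice how little of it is unique to the machine: the MAC address in the client hardware address field and the GUID in option 97, which is exactly the limited state information described in Step 4.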

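It's important to remember that PXE (whether RIS, WDS, or any other infrastructure) doesn't work well over slow links (it can transfer considerable amounts of data) or over high-latency links such as satellite connections (the communication simply doesn't perform well and may not even survive). High latency hurts TFTP in particular, because it sends one block at a time and waits for each acknowledgment, so every block costs at least one full round trip.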
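The latency problem is easy to see with a little arithmetic. TFTP (RFC 1350) is lock-step: the sender transmits one block, waits for its acknowledgment, and only then sends the next, so each block costs at least one round trip no matter how fast the link is. The sketch below assumes the default 512-byte block size with no block-size negotiation (the blksize option from RFC 2348 can raise it) and ignores retransmissions and header overhead; the function name is hypothetical.

```python
from typing import Optional


def tftp_transfer_seconds(
    size_bytes: int,
    rtt_seconds: float,
    block_size: int = 512,
    link_bits_per_second: Optional[float] = None,
) -> float:
    """Rough lock-step TFTP transfer time: one block in flight per round trip."""
    if size_bytes < 0 or block_size <= 0:
        raise ValueError("size must be non-negative and block size positive")
    # A short (or empty) final block marks the end of the transfer, so a file
    # that is an exact multiple of the block size needs one extra, empty block.
    blocks = size_bytes // block_size + 1
    per_block = rtt_seconds
    if link_bits_per_second:
        # Optionally add the time to put one full block on the wire.
        per_block += block_size * 8 / link_bits_per_second
    return blocks * per_block
```

For example, a 32KB NBP works out to 65 round trips: roughly 39 seconds at a satellite-like 600 ms round trip, versus well under a tenth of a second at a 1 ms LAN round trip (both round-trip figures are illustrative assumptions). Larger downloads scale the same way, which is why bandwidth alone doesn't rescue a high-latency link.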

You may notice in the PXE boot process that when the client sends the request, there is nothing that specifically asks, "Are you my mother?" There simply isn't a lot of state information for the PXE server to know one way or another. What usually occurs is a race condition, in which the first server to respond to your client's request wins. A few things can help reduce this problem:

  • Adjust the response speed of one PXE server or the other. Network latency and server horsepower affect how fast the servers respond. In fact, at Microsoft the servers used by Microsoft IT were at one point so fast that even if the PXE server was in your office, the corporate servers would sometimes win. In this case, you simply set your local PXE server to answer your prestaged clients with no delay at all.
  • Prestage the clients. This is very important if you are manipulating your PXE server to respond ahead of other corporate IT servers. By prestaging your clients, you allow Active Directory® to tell WDS or RIS that yes, in fact, "I am your mother." Note that the SMBIOS GUID is far preferable as the unique identifier for your systems in Active Directory, but if an SMBIOS GUID is not implemented in the systems (more likely in older hardware), you can (and will have to) use a GUID based on the MAC address. For more information, see the "Prestaging Clients" sidebar.
  • Don't allow PXE communications to cross switches or routers; put a PXE server on each side. This has the downside of being both expensive to implement and expensive to maintain (each server must have its own image maintained).
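Here is a toy model of the race and of how prestaging tilts it. The client simply takes the first answer it hears, so whichever server answers soonest wins; a server that recognizes a prestaged client answers it immediately. The class and function names are hypothetical, the model ignores network latency and server horsepower, and the MAC-to-GUID conversion assumes the convention of padding the 12 MAC hex digits with 20 leading zeros, which is how I understand RIS and WDS expect a MAC-derived GUID. Verify the format against your product documentation before prestaging real clients.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


def mac_to_netboot_guid(mac: str) -> str:
    """Build a GUID-shaped ID from a MAC: 20 zero hex digits, then the 12 MAC digits (assumed convention)."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).upper()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    h = "0" * 20 + digits
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"


@dataclass
class PxeServer:
    name: str
    response_delay: float                              # seconds to wait before answering unknown clients
    prestaged: Set[str] = field(default_factory=set)  # client GUIDs this server knows about
    answer_unknown: bool = True                        # False: stay silent for clients it doesn't know

    def __post_init__(self) -> None:
        self.prestaged = {g.upper() for g in self.prestaged}

    def delay_for(self, client_guid: str) -> Optional[float]:
        """Seconds until this server answers the client, or None if it never answers."""
        if client_guid in self.prestaged:
            return 0.0  # "I am your mother": answer prestaged clients immediately
        return self.response_delay if self.answer_unknown else None


def winning_server(servers: Iterable[PxeServer], smbios_guid: Optional[str], mac: str) -> Optional[str]:
    """The client takes the first answer it hears, so the smallest delay wins the race.

    Ties go to the alphabetically first server name, an arbitrary choice for this model.
    """
    # Prefer the SMBIOS GUID; fall back to the MAC-derived GUID on older hardware.
    client_guid = (smbios_guid or mac_to_netboot_guid(mac)).upper()
    answers = []
    for server in servers:
        delay = server.delay_for(client_guid)
        if delay is not None:
            answers.append((delay, server.name))
    return min(answers)[1] if answers else None
```

For instance, with a corporate server that answers unknown clients after 0.5 seconds and a local server that waits 5 seconds but has prestaged 00:11:22:33:44:55, `winning_server` picks the local server for that client and the corporate server for everyone else.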

RIS (and now WDS) servers, like Microsoft DHCP servers, must be authorized in the Active Directory implementation they are associated with. The goal is to reduce the issues that rogue PXE servers may cause (such as PXE broadcast storms) by making Active Directory aware of all of those servers.

Note that this safeguard works only for servers that check their authorization against your Active Directory. If you set up your own domain, or a non-Microsoft PXE server, that check doesn't happen.

At Microsoft, an overzealous employee once configured a non-RIS PXE server for "zero-touch" deployment. This implementation worked by completely erasing the hard disk and putting down a new image. That would have been fine if the deployment had occurred in an isolated (off-network) lab, but unfortunately it didn't, and it wound up erasing the disk of a Microsoft executive who had PXE ahead of the hard disk in his boot order.

That hadn't been a problem, as Microsoft IT always required pressing F12 to PXE boot, but this PXE server did not have a delay, an F12 prompt, or any type of notification. This meant that the executive effectively lost his computer and any data not protected by his Roaming User Profile.

Let this story serve to highlight to you the necessity of isolating your PXE servers if you're going "zero-touch," or at the very least to require pressing F12.

Diving into RIS

I inherited RIS after Windows 2000 had shipped. Time hadn't been kind to Windows 2000 as far as RIS was concerned: testing, performance, and other constraints led to RIS in Windows 2000 Server being used solely for Windows 2000 Professional deployment. The server products, unfortunately, couldn't be deployed via RIS. Windows 2000 was only available for x86 machines, so it proved to be a good test bed for RIS since it involved one product on one architecture. RIS included (and required) complete integration with Active Directory, integrated well with the Microsoft DHCP server, and included its own TFTPD.

RIS uses the NBP to continue the TFTP download and bring down enough of the Windows kernel to begin the setup process. (At a certain point, once Windows has switched from TFTP to a Server Message Block (SMB) connection to the server, the code path itself is actually shared with a traditional floppy-disk-initiated install of Windows 2000 or Windows XP.) Once native-mode Windows has been initialized, Windows setup begins the RIS OS Chooser (OSC) Wizard.

OSC screens are somewhat configurable, HTML 2.0-like pages. They are severely constrained and cannot contain images or the like and, in fact, can't contain non-ANSI characters (which made deploying certain locales of Windows complicated).

The end product of RIS is a txtsetup.sif file that sits on the RIS server. When the OSChooser Wizard completes, the client is "soft-rebooted," but the locations of the RIS server and of the txtsetup.sif file are retained and reloaded after the soft reboot. This txtsetup.sif file is essentially the same as an unattend.txt file, with several additional fields included to help RIS complete the setup process.

RIS could also perform a setup that looked much like a traditional unattended setup (RISetup), and it had a cloning-based infrastructure (RIPrep) that was similar to Sysprep and, in fact, shared code with it. Unlike Sysprep, though, RIPrep could also upload an image of the reference computer to a RIS server.

RIS had some fundamental limitations that became apparent, however. The first was the lack of support for server deployment. Certain exploits, such as Code Red and Sasser, combined with the IT complexities that several key customers experienced in recovering from the September 11th tragedy in 2001, led us to fast-track a solution for existing Windows 2000 RIS servers to allow deployment of Windows 2000 Server. This was something we had been working on for Windows Server 2003 but had not formally released.


Second, RIS lacked the ability to completely automate the OSChooser Wizard, a capability later enabled by the <META ACTION="AUTOENTER"> element in Windows Server 2003. Finally, OSChooser was unable to function properly with non-ANSI characters, a key weakness pointed out by several customers outside the United States.

As a result, you could not complete a RIS installation with a French keyboard, for example. Getting non-ANSI characters working safely at the BIOS level on PCs from around the world was extremely complex.

With the release of Windows Server 2003, we formally added support for the Intel Itanium architecture and all server variants of Windows 2000 and Windows Server 2003. Windows Server 2003 SP1 took that one step further with support for the x64 architecture.

RIS also had a significantly rewritten TFTPD to increase performance. Windows Server 2003 supported booting up to 75 clients at the same time; bear in mind that the upper bound here is reached as the SMB pipe fills up with network traffic to the clients.

WDS—The Beginning

When we started working on RIS for "Longhorn" (the code name for what became Windows Server 2008), it became apparent that we needed to take a step back. As I've mentioned in my column before, we had already made big bets on image-based (WIM) setup from Windows PE. As a result, the key tenet underlying WDS was image-based deployment from a PXE-booted instance of Windows PE.

We also knew that Windows Server 2003 would be the common platform for Windows Vista® deployment, and that we would need an "out of band" solution for running WDS downlevel. As a result, WDS was able to run on Windows Server 2003 SP1 and was built into Windows Server 2003 SP2.

Since it can operate as a RIS server (Legacy mode), a hybrid server (Mixed mode), or a WDS-only server (Native mode), WDS allows you to migrate to formal WDS-style deployments as warranted. I have heard customers ask whether there is a way to install RIS on a Windows Server 2003 SP2 system. Yes, there is: you install WDS and run it in Legacy mode. Figure 2 shows the supported operating systems.

Figure 2 Platforms supported for deployment

Operating System    | RIS (Windows 2000) | RIS (Windows Server 2003)** | WDS (Windows Server 2003)**** | WDS (Windows Server 2008)
Windows 2000 Pro    | X                  | X                           | X                             | X
Windows 2000 Server | *                  | X                           | X                             | X
Windows XP Pro      |                    | X (x86 and IA64)***         | X                             | X
Windows Server 2003 |                    | X (x86 and IA64)***         | X                             | X
Windows Vista       |                    |                             | X                             | X
Windows Server 2008 |                    |                             | X                             | X

* support.microsoft.com/kb/308508 and support.microsoft.com/kb/313069 added support for Windows 2000 Server via RIS.
** WDS Legacy and Mixed mode support this same matrix for legacy installs.
*** Windows Server 2003 SP1 added support for x64-based systems. IA64 systems only supported RISetup-based installation.
**** Native Mode support.
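The three WDS modes lend themselves to a small model of the migration path. This Python sketch assumes my reading of the modes: Legacy mode offers only the RIS OSChooser path (RISetup and RIPrep images), Native mode offers only a PXE-booted Windows PE with WIM images, and Mixed mode offers both. The names are hypothetical and nothing here calls WDS.

```python
from enum import Enum
from typing import Iterable, List


class WdsMode(Enum):
    LEGACY = "Legacy"  # operates as a RIS server
    MIXED = "Mixed"    # hybrid: RIS-style and WDS-style deployments side by side
    NATIVE = "Native"  # WDS-only server


# Boot paths each mode offers (an illustrative model, not a WDS API).
BOOT_PATHS = {
    WdsMode.LEGACY: {"OSChooser"},
    WdsMode.MIXED: {"OSChooser", "Windows PE"},
    WdsMode.NATIVE: {"Windows PE"},
}

# RIS images are reached through OSChooser; WIM images through a PXE-booted Windows PE.
REQUIRED_BOOT_PATH = {"RISetup": "OSChooser", "RIPrep": "OSChooser", "WIM": "Windows PE"}


def can_deploy(mode: WdsMode, image_type: str) -> bool:
    """True if a server in this mode can deliver the image type (KeyError for unknown types)."""
    return REQUIRED_BOOT_PATH[image_type] in BOOT_PATHS[mode]


def modes_supporting(image_types: Iterable[str]) -> List[WdsMode]:
    """Modes that can serve every image type you still have in production."""
    needed = list(image_types)
    return [mode for mode in WdsMode if all(can_deploy(mode, t) for t in needed)]
```

For example, `modes_supporting(["RIPrep", "WIM"])` returns only Mixed mode, which is why Mixed mode is the natural stepping stone while RIPrep images are still in production; once they're retired, Native mode becomes an option.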

WDS in Windows Server 2003 SP1 and SP2 sought to begin the migration process to WDS. As I've mentioned before, the key features WDS added in Windows Server 2008 were a revised TFTPD, Extensible Firmware Interface (EFI) boot support, and of course, multicast-based deployment.

Other Players

Automated Deployment Services (ADS) was built by another team within Microsoft, primarily with the goal of rapid server provisioning. ADS featured formal sector-based imaging, its own boot agent (smaller than Windows PE but not as full-featured), its own TFTPD, and very advanced multicast. The functionality built into ADS became available to a degree in System Center Configuration Manager (SCCM), though there is not 100 percent feature parity.

Windows XP Embedded supported a full PXE boot into a RAM disk, using its own TFTPD, but could not perform remote deployment that way. The technology was designed to support booting numerous systems from the same disk image at the same time via PXE.

Wrapping Up

So that's the history, in brief. To find out more, see the "More on PXE-Related Technologies" sidebar. Next month, I'll dive into WDS fundamentals, to be followed by columns on WDS advanced functionality (multicast and more) and, finally, using WDS without using WDS, by which I mean going beyond the existing WDS/setup experience to roll your own deployment techniques.

Wes Miller is a Senior Technical Product Manager at CoreTrace (www.CoreTrace.com) in Austin, Texas. Previously, he worked at Winternals Software and as a Program Manager at Microsoft. Wes can be reached at technet@getwired.com.

© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.