Windows 2000 Web Server Best Practices for High Availability

By Tim Hodgkins

Achieving high server availability on a Web server requires a combination of well designed and tested applications, thoroughly tested server hardware, and disciplined server monitoring and management. This article examines strategies and techniques that organizations can use to achieve 99.99% single server availability using Microsoft technologies.

On This Page

Select a Server Availability Goal
Web Application Design Best Practices
Web Server Deployment Planning
Managing Web Servers
Best Practices for Server Failure and Recovery


Achieving high server availability on a Web server requires a combination of well designed and tested applications, thoroughly tested server hardware, and disciplined server monitoring and management. Microsoft has been working with ten external high profile dot-com companies along with five Microsoft properties since the launch of the Windows 2000 operating system to measure and improve the availability of their respective Web sites. Of the many things learned from these valuable relationships, the most important was understanding how operational practices influence server availability.

This paper examines strategies and techniques that organizations can use to achieve 99.99% single server availability using Microsoft technologies. The first section defines common availability and reliability terms, describes the relationship between the variables of server availability, and lists the tools that Microsoft has created to help organizations measure their server availability and reliability.

The next section looks at the first step towards achieving 99.99% single server availability—best practices for designing and building your Web application. This section is written with the Web application developer in mind; however, Web server administrators may find this section especially helpful when troubleshooting Web server problems. The focus of this section is sharing best practices for building great components and Active Server Pages® (ASP) processes with additional discussion on database connectivity.

The third section of the paper looks at the next step towards achieving 99.99% single server availability— building and deploying high availability servers. This section is geared towards Web server administrators with a focus on what steps they can take to ensure the stability of the servers, before and after deployment. Extensive application and hardware compatibility testing and performance tuning will ensure that your Web servers will be able to maintain a high level of availability. This section ends with a discussion of fault tolerant and load balancing technologies.

The final section, targeted at Web server administrators, looks at the best practices that an organization can use to manage Web servers. It discusses the key server management tools available from Microsoft. It then discusses proactive best practices that an organization can use to minimize server downtime, such as debugging tips and hotfix and security updates.

Throughout this paper, real customer problem scenarios are discussed along with recommendations that your organization can use to identify and resolve these common problems.

Use the list below to learn the top ten recommendations your organization can use to improve the reliability and availability of your Web servers:

  • If you are developing your application in Visual Basic, use a utility called VBCHKW2K to verify proper compilation settings.

  • Thoroughly test your hardware and software for Windows 2000 compatibility.

  • Use pooled-process ASP pages as much as possible.

  • Use pooled-process COM components as much as possible.

  • Use the Web Capacity Analysis Tool and HTTP monitor to stress test your application.

  • Document and follow a Web server deployment plan.

  • Install and use the IIS 5 Recycle Tool to increase Web server availability.

  • Document and follow a process for evaluating and prioritizing the applicability of hotfixes and Security Updates as they are released.

  • Use HFCheck and QFECheck to ensure a standard OS installation across servers.

  • Select a server availability goal and use Microsoft tools to track your progress towards that goal.

The problems and recommendations mentioned here are discussed in great detail with customer scenarios and code samples in the rest of this paper and can be used by your organization to understand and improve your Web server availability and reliability.

Note: This paper focuses on server availability as opposed to availability of a Web site or other service. All tools and best practices discussed in this paper are focused on measuring and improving the availability of single servers. Adding fault tolerant configurations and Network Load Balancing (NLB) on top of highly available and reliable individual servers will result in a highly available Web site.

Select a Server Availability Goal

Availability and Reliability Terminology

This section defines the availability and reliability terms used in this article.

Reliability — the probability that the product (system) will perform its intended function for a specified time period when operating under normal (or stated) environmental conditions.1

In this paper, reliability of an individual server is determined by calculating the mean time between outages of that server. Mean time between outages, also commonly referred to as mean time between reboots, is the average amount of time between outages on a server. Although not all reboots are the result of a failure, all reboots lead to downtime and, from a reliability perspective, can be considered failures. Because servers typically run for many days without being rebooted, mean time between outages is usually reported in days.

Mean time between outages — the average amount of time between outages on a server. Assuming a 24x7 production server, it is calculated using the following formula:

Mean time between outages = Runtime / Number of outages

Runtime — the sum of time covered by the Windows event logs while the server was running a particular operating system version. Since it is necessary to have long runtimes to estimate availability, runtime is often expressed in measurement units of years.

Availability — the probability that a server will perform its intended function under normal operating conditions when needed, usually expressed as a percentage. Assuming a 24x7 production server, availability is calculated using the following formula:

Availability = Mean time between outages / (Mean time between outages + Mean time to restore)

Mean time to restore — the arithmetic mean of the downtime durations. The unit of measurement for mean time to restore is minutes.

Relating Server Availability to Clock-Time

In general, Web sites exist to serve the global audience of the Internet. This means that the servers hosting these Web sites must be available 24 hours a day, 7 days a week. Knowing this, it is possible to use the availability metric to calculate how much downtime a server can sustain and still achieve your required availability goal.

Table 1 shows the relationship between single server availability and downtime for commonly used availability levels. The downtime column lists the maximum amount of single server downtime permitted in a year that will still satisfy your single server availability goal. The last two columns display the maximum number of outages allowed per year to satisfy your single server availability goal given an average downtime of 5 minutes and 10 minutes. For example, to achieve 99.9% availability on a single server, the server cannot be down for more than 8.76 hours per year. In terms of outages, this equals a maximum of 105 outages per year or 52 outages per year given a repair time of 5 minutes and 10 minutes respectively.

Table 1 Comparing Availability and Downtime

Single Server      Downtime        Maximum Outages Allowed      Maximum Outages Allowed
Availability (%)   per year        per Year (5 min restore)     per Year (10 min restore)
99.9               8.76 hours      105                          52
99.95              4.38 hours      52                           26
99.99              52 minutes      10                           5
99.999             5 minutes       1                            0
Note: To achieve 99.999% single server availability you are permitted 1 outage per year given a 5 minute restore time. This goal will be difficult to achieve without incorporating fault tolerant hardware and software redundancy in your environment.
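The figures in Table 1 follow directly from the availability definition. As an illustration, the following sketch recomputes the downtime budget and outage counts for each goal (function names are illustrative, not part of any Microsoft tool):

```python
# Sketch: recompute the Table 1 figures from an availability goal.
# Assumes a 24x7 production server (525,600 minutes per year).

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability):
    """Maximum downtime per year (minutes) permitted at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR

def max_outages_per_year(availability, restore_minutes):
    """Whole outages of the given restore time that fit in the downtime budget."""
    return int(downtime_minutes_per_year(availability) / restore_minutes)

for goal in (0.999, 0.9995, 0.9999, 0.99999):
    print(f"{goal:.3%} availability: "
          f"{downtime_minutes_per_year(goal):,.1f} min/yr budget, "
          f"{max_outages_per_year(goal, 5)} outages at 5 min, "
          f"{max_outages_per_year(goal, 10)} at 10 min")
```

Running the sketch reproduces the table, including the single 5-minute outage per year permitted at 99.999% availability.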

In the preceding section, you may have noticed that there are two main variables to the calculation of availability: mean time between outages and mean time to restore. Understanding the relationship between these two variables will help you achieve your availability goal.

In general, server outages can be one of two types: planned or unplanned. Planned outages are those that are budgeted by IT departments for routine proactive server maintenance. This could include installing the latest Windows Service Pack or installing a new hard drive to match storage demands on a server. Unplanned outages are those that are not budgeted by IT departments in advance. Examples include server crashes, emergency hardware installation to replace a failed component, server unresponsiveness, or emergency installation of an application hotfix to resolve a critical problem.

Many organizations choose to schedule monthly maintenance outages on their servers for needed hardware or software upgrades. If your organization schedules outages, it's important to understand how your monthly outage affects your ability to achieve your availability goal. Table 2 details the maximum mean time to restore per outage permitted to achieve your availability goal. The calculations use a mean time between outages of 60 days.

Table 2 Maximum Mean Time to Restore

Single Server      Mean Time Between     Maximum Mean Time
Availability (%)   Outages (days)        to Restore (min)
99.9               60                    86.5
99.95              60                    43.2
99.99              60                    8.6
99.999             60                    0.9
Note: Achieving a single server availability goal of 99.999% is impossible with a mean time between outages of 60 days, since the server boot time alone would surpass the 0.9-minute maximum for mean time to restore.

For the high profile Web sites monitored for this study, the mean time to restore for Web servers was between 5 and 10 minutes. This mean time to restore can be achieved if an organization is able to troubleshoot failures quickly and return their Web servers to production.

Table 3 details the minimum mean time between outage permitted to achieve your availability goal given a mean time to restore of 10 minutes. The calculations in Table 3 include planned and unplanned outages. If your organization chooses to discount outages that occur during maintenance windows, you will have to recalculate your availability metrics, removing the planned maintenance downtime.

Table 3 Minimum Mean Time Between Outages

Single Server      Mean Time Between     Mean Time to
Availability (%)   Outages (days)        Restore (min)
99.9               6.9                   10
99.95              13.9                  10
99.99              69.4                  10
99.999             694.4                 10
As you can see from Table 3, your servers will have to run without failure for more than 70 days to achieve a single server availability goal greater than 99.99%, assuming an average mean time to restore of 10 minutes. Achieving 99.999% single server availability will be impossible without using redundant hardware.

To summarize, there is a cost to achieving your desired single server availability goal. Tables 2 and 3 are included as an exercise to show the relationship between the two elements of server availability—mean time between outages and mean time to restore per outage. While we have found in our measurements that the mean time to restore is between 5 and 10 minutes, we do not mean to imply that all outages last 10 minutes. The distributions of measured server outages always contain a small number of very long outages, even though the vast majority are shorter than 10 minutes. By examining the relationship between the components of server availability, you will see that your IT organization can influence server availability by budgeting time to meet your respective availability goals.

If your single server availability goals differ from the examples given, you can use the formulas below to determine the mean time between outages and mean time to restore values necessary to achieve your availability goal. The examples assume a single server availability goal of 99.97%, with a mean time between outages of 60 days in the first example and a mean time to restore of 10 minutes in the second. These examples are included so that you understand the mathematical formulas.

Determining Mean Time to Restore if given Mean Time between Outages:

Mean Time to Restore(min) = Mean time between outages (min) * (1-Availability)/Availability

Mean Time to Restore(min) = 86400 minutes * ((1-.9997)/.9997)

Mean Time to Restore(min) = 86400 minutes * (.0003/.9997)

Mean Time to Restore(min) = 25.93 minutes

Determining Mean Time between Outages if given Mean Time to Restore:

Mean Time between outages(min) = Mean time to restore(min) * (Availability/(1-Availability))

Mean Time between outages(min) = 10min * (.9997/(1-.9997))

Mean Time between outages(min) = 33323 minutes

Mean Time between outages (days) = 33323 minutes / 24 hours / 60 minutes

Mean Time between outages (days) = 23.1 days
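Both calculations can be expressed as a short script. The following sketch (function names are illustrative) reproduces the two worked examples above:

```python
# Sketch: the two availability formulas, worked in both directions.
# All times are in minutes unless converted for display.

def max_restore_minutes(mtbo_minutes, availability):
    """Mean time to restore permitted, given mean time between outages."""
    return mtbo_minutes * (1 - availability) / availability

def min_mtbo_minutes(restore_minutes, availability):
    """Mean time between outages required, given mean time to restore."""
    return restore_minutes * availability / (1 - availability)

# Example 1: 99.97% goal with a 60-day (86,400-minute) mean time between outages
print(round(max_restore_minutes(86400, 0.9997), 2))        # 25.93 minutes

# Example 2: 99.97% goal with a 10-minute mean time to restore
print(round(min_mtbo_minutes(10, 0.9997) / (24 * 60), 1))  # 23.1 days
```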

Availability Measurement Tools

The Microsoft Reliability Group has developed a tool called Event Log Analyst (ELA) that gathers reliability information from remote Windows 2000 and Windows NT 4.0-based servers. ELA runs on a single computer, called the collection computer, and sequentially retrieves event log information from other Windows 2000 and NT 4.0-based servers. Once ELA data is collected, it can be analyzed using other tools.

ELA collects several types of basic information from the System and Application event logs. The data which is collected includes:

  • Basic information about a Windows 2000 event log: the number of system events and the number of application events.

  • Timestamp of all system reboots. On computers running Windows NT 4.0 Service Pack 4 or later, it also collects the timestamps of the system shutdown events.

  • Information about Windows 2000 system crashes (also referred to as either bug checks or blue screens).

  • Information about application crashes (also referred to as either user-mode dumps or Dr. Watson notification messages).

ELA is designed to be unobtrusive. Because ELA only uses publicly documented interfaces for remotely accessing the event log, it does not require any software to be installed on the systems where the event logs reside. ELA is simple to install. It is a single executable file that runs on any Windows 2000 or later system. ELA has a low impact on production environments, typically accessing a remote server for less than 30 seconds during the collection process. In tests using the Microsoft Corporate Data Center, ELA scanned the event logs of over 1500 servers in about 30 minutes using a collection system with a 100Mbps LAN connection.

The complete user's guide for ELA includes the list of default events collected as well as security configuration for collecting data, and can be found in the Appendix.

Web Application Design Best Practices

There are many design choices to consider when building a highly available Web application. Microsoft Internet Information Server (IIS) provides a wealth of opportunity for a developer in the form of tools, languages, database connectivity, and scripting options. It is important for developers to understand the benefits and drawbacks associated with each option. The next section discusses development best practices to help you improve the quality of your existing applications, as well as learn ways to build better Web applications in the future.


The articles listed in this section outline best practices for designing and configuring Components. Examples of Components include a Microsoft Transaction Server (MTS) component, a COM+ component, or a component that is called from Active Server Pages (ASP). When writing components with Visual Basic (VB), you must set particular properties in the VB project and should follow best practice recommendations to improve performance.

For example, developers often declare public variables in basic modules (.bas) using Microsoft Visual Basic. This practice can cause unpredictable behavior and potential memory corruption, which increases the possibility of application crashes. Use the articles mentioned in this section to gain a better understanding of what to do when designing and configuring components.

Avoid public variables in basic modules (.bas). Because each thread has a separate copy of these variables, while requests served by the same thread share the same copy, this practice can cause unpredictable behavior and potential memory corruption.

Make sure to select the 'Retain in Memory' and 'Unattended Execution' check boxes when writing a server-side component using Visual Basic. If these options are not selected, the Visual Basic runtime unloads custom and runtime DLLs unexpectedly, which may cause the computer to stop responding (crash or hang) under multithreaded scenarios.

Use pooled-process ASP pages and components as much as possible. COM components have three options for isolation: unconfigured, configured as a Library Application, and configured as a Server Application. Unconfigured components and Library Applications run in the caller's process space ('in-process'), while Server Applications run in their own process space. Using process isolation in ASP and COM will prevent a crash in your application from crashing other installed applications.

The articles below describe best practices for creating components:

ASP Best Practices and Common Issues

ASP lock ups are one of the top 10 most common problem scenarios that Microsoft IIS Developer Support groups encounter. Although there can be a variety of causes, many times customer application code is the culprit or at least a major contributor to the problem. These types of problems can translate into prolonged downtime when trying to debug code on a production server. A simple example of code that can cause a lock up is making explicit calls to the Application.Lock and Application.Unlock methods from within a component that is called from an ASP page. This is illustrated below:


Application.Lock
Application("myvar") = "Hello World"
Application.Unlock


In the example above, the Application.Lock and Application.Unlock methods are not needed to assign a single Application variable because these methods are called implicitly.
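Explicit locking is appropriate, however, when a page performs a read-modify-write against an Application variable, such as incrementing a shared counter. A minimal sketch (the variable name is illustrative):

```
<%
' Explicit locking is needed here because the increment reads and
' writes the same Application variable in two separate steps.
Application.Lock
Application("HitCount") = Application("HitCount") + 1
Application.Unlock
%>
```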

To prevent crashes in one application from affecting other installed applications, configure your ASP applications to use process isolation. The three isolation levels available in IIS 5.0 are: Low, Medium, and High. Low Isolation, often referred to as 'in-process' isolation, runs your ASP applications within the process context of inetinfo.exe, which is the primary IIS process. Medium Isolation is referred to as 'out-of-process', since ASP runs outside of the IIS process. All ASP applications configured to run in Medium Isolation share a single process space. In High isolation, also referred to as 'out-of-process', all ASP applications run in their own process space, protecting ASP applications from each other.

Use the Web Capacity Analysis Tool and HTTP monitor tool from the IIS Resource Kit to stress test and measure your Web applications. Stress testing your Web application will not only be helpful in isolating coding errors, but will also allow you to determine if you have bottlenecks that you may need to tune.

The following articles describe best practices for writing ASP applications:

ASP-Database Best Practices

Retrieving data from a Microsoft SQL Server using ADO (ActiveX Data Objects) and ASP can be challenging to Web developers. As an increasing number of Web applications are serving as the interface to databases, it is important that you understand ways to maximize performance, scalability, and robustness in your development efforts. Use the following guidelines and best practices if you plan on calling database components from Web applications in IIS:

With ADO, always close recordsets and connections.

If you "open" a connection, use it and "close" it again. The connection can then safely be handed to another thread processing a different command. If the server load gets light, the connection pool is trimmed back automatically and others using that server get better performance. If the server load gets heavy, the pool can grow as needed. Choosing not to pool connections will result in idle connections, which waste server and network resources. In addition, you also may discover threading issues that can occur if multiple concurrent threads end up using the same connection.

Open late-close early.

Open ADO objects just before they're needed and close right after you're done. This shortens the time span in which the database must juggle resources for you, and releases the database connection to the connection pool as quickly as possible to allow new connections.

Do not pass parameters to the Command object in the Execute statement. Doing so forces ADO to perform extra processing and to make assumptions about the types and sizes of the parameters you pass in.

The following code sample illustrates the poor practice of passing parameters to the command object.

Set DB = Server.CreateObject("ADODB.Connection")
' Connection string values below are placeholders
DB.Open "Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDB;" & _
        "User Id=MyUser;Password=MyPassword"
Set RS = DB.Execute("GetCustomerByLastName @LastName='Smith'")
' ... use RS ...
RS.Close
DB.Close
Set RS = Nothing
Set DB = Nothing

A better practice is to explicitly declare parameters for the command object as displayed in the code sample below.

Set DB = Server.CreateObject("ADODB.Connection")
' Connection string values below are placeholders
DB.Open "Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDB;" & _
        "User Id=MyUser;Password=MyPassword"
Set cmdTemp = Server.CreateObject("ADODB.Command")
cmdTemp.ActiveConnection = DB
cmdTemp.CommandText = "GetCustomerByLastName"
cmdTemp.CommandType = adCmdStoredProc
Set params = cmdTemp.Parameters
params.Append cmdTemp.CreateParameter("RETURN_VALUE", adInteger, adParamReturnValue, 0)
' 20 = parameter size; adjust to match the stored procedure definition
params.Append cmdTemp.CreateParameter("@LastName", adVarChar, adParamInput, 20)
cmdTemp("@LastName") = "Smith"
Set RS = cmdTemp.Execute
' ... use RS ...
RS.Close
DB.Close
Set RS = Nothing
Set DB = Nothing
Set cmdTemp = Nothing

Always use Server.CreateObject.

Using Server.CreateObject allows ASP to track the object instance. The Server portion causes the object to be created in a transaction server package so that resources are pooled. Using the CreateObject and GetObject functions in server-side scripts instead of Server.CreateObject gives the object no access to the ASP built-in objects and prevents it from participating in transactions. CreateObject and GetObject also attach each new object to a separate thread, which consumes available system resources much faster than the connection pooling available through Server.CreateObject.

Do not re-use recordset or command variables; create new ones.

Re-using recordset or command variables increases the risk of causing a failure within ADO. The Command object is not designed or intended for this kind of reuse.

While configuring ODBC settings for your data source, use System DSNs as much as possible (rather than File DSNs). A System DSN is three times faster than a File DSN.

Don't put ADO connections in session objects.

When ADO objects are put in sessions, scalability limitations and threading issues are introduced, as well as unnecessary high stress on both the Web server and the database. If a connection is stored in a Session variable, connection pooling is eliminated because variables stored in the Session object persist for the entire user-session. Connection pooling is profitable when connections are shared across multiple clients and resources are in use only as long as they are needed. A Connection object stored in a Session variable will only serve the user for which the Session was created, and the Connection will not be released to the pool until the end of the Session.

Use TCP/IP sockets to connect to SQL Server if it is running on a remote computer.

TCP/IP sockets do not require an NT trusted connection and use standard SQL security, bypassing the authentication issues that are associated with using Named Pipes to a remote computer (see below). In cases where the SQL server is on another computer, TCP/IP sockets will offer a faster connection.

Use Named Pipes if SQL Server is running locally (on the same machine as ASP.)

By default, the use of network named pipes requires a trusted connection. When a client attempts a connection over network named pipes, the computer running SQL Server performs a security check and must authenticate the client's computer account, which requires a round trip to the appropriate domain controller. If the path between the SQL Server and the domain controller is unavailable, a connection may not be established.

In the case where SQL Server is running on the same machine running IIS and hosting ASP, use a local named pipe connection instead of a network named pipe connection. A simple way to do this is to change the keyword SERVER=machinename to SERVER=<local> in the SQL Server connection string of the global.asa file. This will prevent the round trip to a domain controller for authentication, saving precious network bandwidth.
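As an illustration, the two connection styles might look like the following global.asa fragment. All server, database, and credential values are placeholders; the Network=DBMSSOCN keyword selects the TCP/IP sockets network library:

```
Sub Application_OnStart
    ' Remote SQL Server: request TCP/IP sockets explicitly
    Application("ConnStr") = "Provider=SQLOLEDB;Network=DBMSSOCN;" & _
        "Server=RemoteSQL01;Initial Catalog=MyDB;User Id=WebUser;Password=secret"

    ' Local SQL Server: use a local named pipe instead of a network named pipe
    ' Application("ConnStr") = "Provider=SQLOLEDB;Server=<local>;" & _
    '     "Initial Catalog=MyDB;User Id=WebUser;Password=secret"
End Sub
```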

For more links and resources relating to Web application development, see the Appendix.

Web Server Deployment Planning

Once you have finished designing and building your Web application, it is time to build and test the Web servers that will host your new application. In addition to the necessary step of installing Windows 2000, you will also need to make sure any support applications you use as well as the physical hardware itself are compatible with Windows 2000. The next few sections describe important planning that must be done to ensure a highly available Web Server platform.

Application Compatibility

One of the first steps you should take when installing or upgrading to Windows 2000 is to ensure that the applications you use to support your business are compatible with Windows 2000. Microsoft has found by working with external customers that applications that are incompatible with Windows 2000 have been a major source of problems leading to unwanted downtime and decreased server availability.

Incompatible applications produce a broad range of symptoms, such as failure of the application to start, memory leaks, and access violations.

Microsoft provides an online tool called the Windows 2000 Readiness Analyzer that can be used to generate a report detailing known hardware and software compatibility issues on your specific servers. It is important to note that not all third-party compatibility issues are addressed by this tool, especially in the case of applications you have built and deployed yourself.

Best Practice - Identify and resolve application compatibility issues before upgrading to Windows 2000 using the information in the articles below. The Search for Compatible Software Applications Wizard and Microsoft Readiness Analyzer will allow you to search for applications compatible with different versions of Windows 2000. The Windows Application Compatibility Toolkit contains documents and tools to help Microsoft Windows customers diagnose and resolve application compatibility issues, including a white paper on common compatibility issues and several documents on best testing practices and tools to help fix compatibility issues.

Hardware Compatibility

It is also important to consider hardware compatibility when upgrading to or installing Windows 2000. Through root cause analysis, Microsoft has often found that failure events such as blue screens are caused by a faulty filter driver or an incompatible BIOS. Several dot-com companies that worked with the Microsoft Reliability Team reported unplanned stoppages (blue screens) in certain situations when installing Windows 2000 because the driver for their SCSI devices was not present on the Windows 2000 media. Fortunately, Microsoft provides many resources to help you identify and resolve hardware incompatibilities before you install your Web servers in your production environment.

The Windows 2000 Readiness Analyzer tool mentioned in the preceding section can also help determine potential hardware compatibility issues on your servers. Alternatively, if you are upgrading from Windows NT 4.0 to Windows 2000, you can also identify compatibility issues with your server's configuration by running winnt32 /checkupgradeonly from the command line after inserting the Windows 2000 Setup CD-ROM.

In addition to the tools and resources from Microsoft, it is critical to contact your server vendors directly to determine product compatibility with Windows 2000. Often hardware vendors release BIOS updates as well as storage driver updates that have been tested and certified as compatible with Windows 2000. It is important to download, test, and install any driver updates from your hardware vendor prior to installing Windows 2000.

Best Practice – Ensure hardware and driver compatibility with Windows 2000 prior to installation using the links below. The Search for Compatible Hardware Devices and Computers Wizards will allow you to determine if your servers and hardware are compatible with Windows 2000 prior to upgrade or installation. Some search results will include links to download Windows 2000 drivers provided by manufacturers.

Windows 2000 Deployment Considerations

Once you are confident your server hardware and applications are compatible with Windows 2000, it is time to deploy Windows 2000 to your Web servers. There are several options to consider when deciding which deployment method is best for your business environment. The next few sections outline the server build process you should follow to guarantee a highly available platform.

In general – building servers should encompass the following steps:

  1. Determine hardware needs (disk setup, memory, etc.)

  2. Determine application needs (additional software, scripting environments, etc.)

  3. Identify installation options (image based, scripted install, RIS in Whistler)

  4. Determine any OS configuration needed prior to disk duplication (if applicable)

  5. Document a consistent, repeatable installation method

These steps are explained in more detail below.

Step 1 - Determine hardware needs and standard platform

From working with many different customers, it became obvious to Microsoft that choosing and adhering to a standard platform made OS installation, application installation, and troubleshooting much easier for the system administrator. Examples of the type of decisions that need to be made relating to the common hardware platform decision are:

  • How should hard disks be partitioned?

  • Should hard drives or CDROM drives be re-mapped to different drive letters?

  • Do you need single processors or multiple processors in your Web servers?

Best Practice – Use the Web Server Capacity Planning document to help you answer the above questions and many more relating to hardware capacity planning. From a hardware perspective, the best capacity planning strategy is to observe usage carefully, monitor patterns, and increase or decrease resources based on your monitoring results.

Step 2 - Determine application needs

In most cases, a Web application involves multiple parts, often hosted on different servers, which makes installing additional software on the Web servers necessary. For example, Web applications typically rely on databases in scenarios ranging from e-commerce transactions to driving directions to bank account balances. To satisfy this need, you will need to examine configurations such as database connectivity methods and network optimizations to ensure successful integration between the software components.

Best Practice - It is important when planning server builds to note not only what applications are needed, but also the installation methods for those applications and what additional configuration is necessary to support these applications. For example, if you anticipate using a SQL database as part of your Web application, you may need to install and configure ODBC to suit your application needs. This information is critical when it comes time to automate as well as troubleshoot your installation.

Step 3 - Identify installation options

There are three main options when installing servers running Windows 2000 Server and Windows 2000 Advanced Server. These options are:

  • Unattended installation using a bootable CDROM and answer file

  • Unattended installation using network share and answer file

  • Disk image installation with Sysprep

Each one of these installation options has its pros and cons detailed in Table 4 below.

Table 4 Comparing Installation Tools

Unattended CDROM based installation

  Pros:
  - Convenient for computers not connected to networks
  - Easy to customize installation

  Cons:
  - User intervention is required to eject the floppy disk after GUI mode setup
  - Potential security risk with company information on CD

Unattended network based installation

  Pros:
  - Installation files secured on a central read-only server
  - No user intervention required during setup

  Cons:
  - Difficult to fit the network boot files onto a single boot floppy disk

Disk image installation with Sysprep

  Pros:
  - Fastest option for automated installation
  - No scripting required, as the image contains the OS and applications

  Cons:
  - Difficult to keep tight version control of OS files with many CDs
  - User intervention needed in most cases to enter the product ID

Unattended installation with CDROM

As mentioned above, this installation method is best for low network bandwidth scenarios such as a remote branch office.

In order to install Windows 2000 to meet your needs, it is necessary to use answer files. An answer file is essentially a script that answers questions during installation without requiring user input. A template answer file called unattend.txt is included on the Windows 2000 CDROM and can be modified as needed. In addition to the template answer file, it is possible to use the Windows 2000 Setup Manager to create an answer file. The Setup Manager tool can be found in the \Support\Tools folder on your Windows 2000 CDROM.
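For illustration, a minimal answer-file fragment in the unattend.txt style might look like the following. The section and key names follow the Windows 2000 template; the values shown are placeholders to adapt for your environment, not a tested configuration:

```ini
; Hypothetical answer-file fragment -- adapt values for your environment
[Unattended]
UnattendMode = FullUnattended
OemPreinstall = No
TargetPath = \WINNT

[GuiUnattended]
AdminPassword = *
TimeZone = 004

[UserData]
FullName = "Web Operations"
OrgName = "Example Corp"
ComputerName = WEB01
```

A fragment like this suppresses most setup prompts; consult the unattend.txt template on the CDROM for the full set of supported sections and keys.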

Unattended installation with network share

Like the unattended installation using CDROM, the network based unattended installation requires the use of answer files to automate the installation. It is possible to use either the template unattend.txt file available on the Windows 2000 CDROM or use the Windows 2000 Setup Manager (discussed above).

The difference between the network based unattended installation and CDROM based installation is the initial media needed to start the installation. In the case of a network-based installation, setup is started using a pre-configured boot floppy disk.

Best Practice - One of the reasons customers may choose to use this installation method is if they desire tighter control of the Windows 2000 installation files. With all installation files in a single network location, administrators will not have to worry about file versions being incorrect.

Image installation with Sysprep

Perhaps the fastest way to install and configure a Windows 2000 Server is to use imaging software to create a server image and Sysprep to customize the installation and give the server a unique security ID (SID). Symantec Ghost™ and Powerquest DriveImagePro™ are two popular choices for imaging software.

Using this method, a server is built by installing Windows 2000 and any necessary applications. Next, Sysprep is run with the appropriate command line options to configure the server for duplication. Keep in mind that Sysprep duplicates the original system by using third party disk duplication software such as Ghost or DriveImage. Finally, once the new system boots, a new security ID (SID) is generated for that server and another installation wizard is displayed. This final installation wizard can be automated using the sysprep.inf file.

Note: When using this installation method for servers that use different storage devices, use Sysprep 1.1 since you will only need to maintain a single image. Sysprep 1.1 supports the ability to include multiple mass storage device drivers in a single server image. This reduces the amount of time needed to create each server image as well as reduces the complexity of managing multiple server images.
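Sysprep 1.1's support for multiple mass storage devices is driven by the [SysprepMassStorage] section of sysprep.inf. A hedged sketch follows; the Plug and Play IDs and .inf paths shown are placeholders, not a tested configuration:

```ini
; Hypothetical sysprep.inf fragment -- device IDs and paths are examples only
[Unattended]
OemSkipEula = Yes

[SysprepMassStorage]
; Format: plug-and-play-ID = path to the matching driver .inf
PCI\VEN_8086&DEV_1222 = %systemroot%\inf\machine.inf
PCI\VEN_1000&DEV_000F = %systemroot%\inf\scsi.inf
```

Each entry maps a storage controller's Plug and Play ID to its driver, which is what allows a single image to boot on servers with different storage hardware.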

For more information on Sysprep 1.1, see:

Determine constraints with each installation method

As detailed above, there are pros and cons to each installation method. It is important for the system administrator to determine any constraints in your specific environment before deciding on a particular installation method. Perhaps it is a good idea in your environment to use multiple installation methods: network based installations in the home office and CDROM based installation in remote offices without network access.

Here are some questions you should answer before choosing an installation method:

  • Are you limited by network bandwidth?

  • Will installations be necessary in areas with no network access?

  • Is operating system version control a high priority in your environment?

  • Will server operators be on hand to assist, should an installation fail?

  • Do you want only certain people to be able to install the OS?

  • What skill levels are required for the installation team? (scripting, etc.)

Step 4 - Determine any post-installation configuration needed

After the operating system is installed, there are often specific configuration changes, required either by your environment or by the server's role, that cannot be automated using Sysprep and must be applied manually.

Best Practice - Typically, the following configuration is done prior to installing a server into production:

  1. Install latest Windows 2000 Service Pack

  2. Install applicable hotfixes and Security Updates

  3. Disable services that are non-essential to the functioning of your Web application

  4. Ensure proper event log settings

  5. Add server to domain

  6. Change Administrator password

  7. Network configuration (adjust NIC bindings, etc.)

  8. Security configuration (create user accounts and NTFS partitions)

  9. Set IIS configuration parameters

Step 5 - Document a consistent, repeatable installation method

Once an installation method has been chosen, the next step is to document requirements to be completed for a successful installation.

Best Practice - At a minimum, these requirements should contain a flowchart indicating each action throughout the installation process and what should happen in a failure situation. This will help you when you need to troubleshoot a failed installation or revise your installation policies in the future. Using this technique, a dot-com company that Microsoft has been working with has been able to define a process by which they can install a new Windows 2000 Web server from an image in less than 6 minutes. This efficiency is important not only for speed of initial installation, but also for recovery situations when restore time is critical.

Web Server Testing

It is critical to test your Web application and Web servers in a controlled environment before subjecting the application to the real-world load of the Internet. The purpose of testing is twofold. First, it helps you find problems with your hardware and Web application. Second, it isolates and protects your production environment from random problems your new application may cause while you are performing your tests. Several of Microsoft's dot-com customers, before they had created a Web application test environment, discovered application compilation errors and memory leaks only after they released their Web applications on the Internet. These problems were a significant source of downtime and cost to these customers. They found that after instituting a formal application test process, they were able to understand and resolve these types of problems in their test environment, which ultimately led to a higher level of customer satisfaction and reduced costs.

Best Practice – Set up a controlled test lab to test your application design and server hardware while protecting your production environment. The link below contains detailed information on building a Windows 2000 test lab, such as assessing your testing methodology, determining prerequisites for designing a lab, and configuring client computers.

Stress testing is also an important part of ensuring your application is ready for deployment. Tools such as Microsoft's Web Application Stress Tool (WAS) make it possible to easily create automated test scripts based on usage goals you expect your application to handle. It is important when making the test scripts to stress all components of your Web application. You should create scripts to stress test static content as well as ASP, SSL, etc. Once you have started the stress tests, it will be important to monitor your Web server using the Performance tool (Start – Programs – Administrative Tools – Performance) to determine if the usage goals you set are possible given your application and server configuration. Detailed recommendations on performance counters to monitor while testing can be found in the section below Windows 2000 Core Counters.
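As a rough sketch of how a usage goal translates into a stress target for a tool such as WAS, consider the arithmetic below. The traffic figures, requests-per-page ratio, and peak factor are illustrative assumptions, not WAS parameters:

```python
# Hypothetical usage-goal arithmetic -- all inputs are example assumptions
def stress_target_rps(page_views_per_day: int, requests_per_page: int,
                      peak_factor: float = 4.0) -> float:
    """Requests per second to drive during stress testing."""
    avg_rps = page_views_per_day * requests_per_page / 86_400  # seconds/day
    return avg_rps * peak_factor  # headroom for peak traffic

# 1M page views/day, ~10 HTTP requests per page (HTML, images, ASP):
print(round(stress_target_rps(1_000_000, 10), 1))  # -> 463.0 requests/sec
```

Driving scripts at a rate like this, rather than at the daily average, exercises the application at the peaks where bottlenecks actually appear.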

Best Practice – Use the Web Application Stress tool (WAS) to stress test your application and help you with future capacity planning decisions. Before testing, create a stress goal to work towards. An example would be to challenge your application developers to run your Web application under stress for 1-2 weeks without memory leaks or application freezes. While it is a good idea to test the usability of your site, Microsoft has learned that stress and load testing all components of your application will ultimately help you achieve higher availability. Along with helping you identify and resolve poor or inefficient coding in your application, stress testing will also help you make smarter decisions about future scalability needs. Additional information on how to configure WAS can be found using the following link:

Web Server Performance Tuning

In addition to using your test environment to look for coding errors in your application, you should use testing to help identify and ultimately tune any performance bottlenecks on your servers. Tuning your Web servers is beneficial for many reasons, such as creating a better user experience due to decreased server latency, maximizing the use of the hardware you currently have, and budgeting properly for capacity increases in the future. It is important to note that there is a clear relationship between performance and server availability. For example, symptoms such as a sluggish application can often turn out to be due to a memory leak which—when left unchecked—may consume all system resources and cause a server outage.

Microsoft has determined that the following four critical resources should be monitored closely to determine if you have any bottlenecks on your Web servers: processor utilization, disk I/O, network I/O, and memory utilization. Measurements have shown that if utilization of these resources is very high for an extended period of time, either additional resources need to be added or the existing resources need tuning.

It is important to stress that performance monitoring of your application and servers should happen in all phases of the deployment process. You should monitor performance beginning in your test environment, continuing through deployment, and finally in the production environment to pinpoint any bottlenecks that may exist on your systems. Bottlenecks can be caused by improperly configured hardware or software or, in the worst case, by insufficient hardware for the load placed upon it. Fortunately, Microsoft provides tools to help you measure the performance of various aspects of your Windows server, ranging from hardware monitoring to Web application monitoring. Use the Performance tool pictured below in Figure 1 to help you identify and understand any performance bottlenecks that may be affecting your Windows 2000 server. You can access the Performance tool through Start – Programs – Administrative Tools – Performance.

Figure 1: Performance Tool.

The Performance tool groups its contents by Objects, Counters, and Instances. Objects are the highest level of abstraction and are defined as a logical collection of counters that is associated with a resource or service that can be monitored. Counters are data items associated with a performance object. For each counter selected, the Performance Tool presents a value corresponding to a particular aspect of the performance that is defined for that object. Instances are the lowest level of abstraction and are used to distinguish between multiple performance objects of the same type on a computer.

Windows 2000 Core Counters

Before you examine your Web application for performance bottlenecks, it is necessary to look for performance problems at the operating system level. Often a resource issue such as insufficient memory can cause other performance problems on your system. To understand and ultimately improve system performance, you should examine every part of the system for potential bottlenecks.

Examine the following performance counters closely to determine if you have processor, memory, disk, or network bottlenecks on your server. Along with the description of each counter, measured thresholds and recommendations are included to help you resolve any performance bottlenecks on your Web servers. All measurements have been done on Web servers with 2 processors and 1 GB of physical RAM. The performance counters are presented in the following format: Object: Counter: Instance (if applicable).

Memory: Available Bytes – This counter displays the amount of physical memory, in bytes, available to processes running on the computer. This is calculated by adding the space on the Zeroed, Free, and Stand-by memory lists. Free memory is available for use; Zeroed memory consists of pages filled with zeros to prevent later processes from seeing data used by a previous process; and Stand-by memory is memory removed from a process's working set en route to disk, but still available to be recalled. This counter displays the last observed value only.

Low values for Available Bytes (4 MB or less) may indicate there is an overall shortage of memory on your computer or that a program is not releasing memory. When the 4MB threshold is reached, an Event 26 will be recorded in the System Event Log indicating your system is low on virtual memory. In addition to the event logged, low memory conditions will be manifested in general sluggish performance on an affected system.

To avoid memory related performance problems, reserve at least 10% of memory for peak load on your Web site.
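The 4 MB event threshold and the 10% headroom guideline above can be combined into a simple check. A sketch in Python follows; the function and status names are illustrative, not part of any Microsoft tool:

```python
# Illustrative check against the Memory: Available Bytes counter
def memory_status(available_bytes: int, physical_bytes: int) -> str:
    if available_bytes <= 4 * 1024 * 1024:
        return "critical"  # the range where Event 26 is logged
    if available_bytes < 0.10 * physical_bytes:
        return "warning"   # less than 10% headroom reserved for peak load
    return "ok"

ONE_GB = 1024 ** 3
print(memory_status(200 * 1024 ** 2, ONE_GB))  # -> ok (about 20% free)
print(memory_status(80 * 1024 ** 2, ONE_GB))   # -> warning (under 10% free)
```

A check like this could be wired into the Alerter service thresholds discussed later in this document.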

Memory: Cache Bytes – This counter reveals the size of the File System Cache, which by default is allowed to use up to 50% of available physical memory, to a maximum of 900 MB, depending on the amount of physical memory installed. This is an important counter to measure, as IIS automatically trims the cache if it is running out of memory. This counter can be used along with the Process: Private Bytes counter referenced below to isolate an application with a memory leak.

Memory: Page Faults/Sec - This counter measures the average number of pages faulted per second. This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory.) Most processors can handle large numbers of soft faults without significant consequence. However, hard faults, which require disk access, can cause significant performance problems. If the values for this counter are low, your server will be responding to requests quickly. If the values are high, you may have dedicated too much memory to the file system cache, not leaving enough memory for the rest of the system. A sustained high number of page faults/sec may indicate the need to increase the amount of physical memory on your server.

Process: Private Bytes: (Inetinfo, Dllhost) – This counter measures the current amount of memory the selected process has allocated that cannot be shared with other processes. Monitoring this counter is helpful in isolating memory leaks in your Web application, as you will notice these processes allocating more memory without releasing it back to the system over an extended period of time. If the isolation level for your application is set to low, monitor the Private Bytes counter for the Inetinfo process. If the isolation level is set to medium or high, monitor the appropriate DLLHost processes.

System: Processor Queue Length – This counter displays the number of threads waiting to be run in the processor queue. There is a single ready queue for processor time even on computers with multiple processors. This counter counts ready threads only, not threads that are running. A sustained processor queue length greater than 2 threads per processor may indicate processor congestion which may appear to the user as a server being slow or non-responsive. If more than a few program processes are contending for most of the processor's time, installing a faster processor will improve throughput. An additional processor can help if you are running multithreaded processes, but it is important to note that scaling to additional processors may have limited benefits.

Note: In the case of long thread waits, the processor queue can be greater than 2 on an idle system.
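The queue-length guideline above reduces to a quick per-processor comparison. A sketch follows; the function name is illustrative:

```python
# System: Processor Queue Length is one shared queue for all processors
def cpu_queue_congested(queue_length: int, processor_count: int) -> bool:
    """Sustained values above ~2 ready threads per processor suggest congestion."""
    return queue_length > 2 * processor_count

print(cpu_queue_congested(6, 2))  # -> True: congestion likely on a 2-CPU server
print(cpu_queue_congested(4, 2))  # -> False: within the guideline
```

Remember that the guideline applies to sustained values; brief spikes, or long thread waits on an idle system, are not by themselves a problem.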

Processor: % Processor Time – This counter displays the percentage of time that the processor spends executing non-Idle threads. This counter is the primary indicator of processor activity, and can be used to troubleshoot problems such as a server appearing sluggish or non-responsive. It is important to understand the role of the computer when analyzing this performance counter. For example, if you are monitoring a user's computer which is used primarily for a CAD application, the CAD application might easily use 100% of the processor time while it is running. On a server which processes many client requests, values around 100% indicate that processes are queuing up waiting for available processor time, causing a bottleneck. Sustained processor utilization around 100% on servers is therefore unacceptable and suggests the need for additional processors or a change in the workload. An acceptable threshold for processor utilization on Web servers is 70%.

Network Interface : Bytes Total/Sec – This counter measures the rate at which bytes are sent and received on a network interface. To determine if your network connection is creating a bottleneck, compare the Network Interface: Bytes Total/sec counter to the total bandwidth of your network adapter card. To allow headroom for spikes in traffic, you should usually be using no more than 50 percent of capacity. If this number is very close to the capacity of the connection, and processor and memory use are moderate, then the connection may well be the problem.
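Because the counter reports bytes while adapter capacity is rated in bits, the comparison requires a conversion. A sketch of the 50-percent rule follows; the function name is illustrative:

```python
# Compare Network Interface: Bytes Total/sec against link capacity
def nic_utilization(bytes_total_per_sec: float, link_bits_per_sec: float) -> float:
    """Fraction of link capacity in use (bytes converted to bits)."""
    return (bytes_total_per_sec * 8) / link_bits_per_sec

# A 100 Mb/s adapter moving 5 MB/s of traffic:
u = nic_utilization(5_000_000, 100_000_000)
print(round(u, 2), u > 0.5)  # -> 0.4 False: under the 50% headroom guideline
```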

Physical Disk: %Disk Time - This counter measures the elapsed time that the selected disk drive is busy servicing read and write requests. If this counter is high (over 90 percent) you should check the Physical Disk: Current Disk Queue Length counter to see how many system requests are waiting for disk access. The number of waiting I/O requests should be sustained at no more than 1.5 to 2 times the number of spindles making up the physical disk (See your server manufacturer for spindle information). In general, most disks have one spindle, but RAID devices usually have more. A hardware RAID device will appear in System Monitor as a single physical disk while RAID devices created through software appear as multiple drives. You can choose to either monitor the Physical Disk counters for each physical drive (other than RAID), or you can use the _Total instance to monitor data for all the computer's drives.

Use the values from the Physical Disk: Current Disk Queue Length and Physical Disk: % Disk Time counters to detect bottlenecks with the disk subsystem. If these values are consistently high, you may want to consider upgrading the disk drive or moving frequently accessed files to another disk or server.

Note: If you are using a RAID device, the % Disk Time counter can indicate a value greater than 100%. If this happens, you should consider using the Physical Disk: Average Disk Queue Length counter to determine how many system requests on average are waiting for disk access.
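The spindle guideline above reduces to a per-spindle comparison. A sketch follows; the function name is illustrative:

```python
# Compare Physical Disk: Current Disk Queue Length against spindle count
def disk_queue_ok(avg_queue_length: float, spindles: int) -> bool:
    """Sustained waiting I/O should stay under roughly 2 requests per spindle."""
    return avg_queue_length <= 2 * spindles

print(disk_queue_ok(3.0, 1))  # -> False: a single-spindle disk is backing up
print(disk_queue_ok(6.0, 4))  # -> True: a 4-spindle RAID set has headroom
```

As noted above, hardware RAID appears as a single physical disk in System Monitor, so the spindle count must come from your server vendor rather than from the tool.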

Web Application Performance Counters

To aid the administrator in identifying problems with your Web application, Microsoft has provided several objects in the Performance tool. Use the following list of Web application performance counters to determine if you have a performance bottleneck in your application. Along with a description of each performance counter, thresholds and recommendations have been included to help you quickly identify and resolve any performance problems in your application. All thresholds and recommendations are based on measurements on Web servers with two processors and 1 GB of RAM. The performance counters are presented in the format: Object: Counter: Instance (if applicable).

Active Server Pages: Requests Executing – This counter measures the number of requests currently executing. This counter indicates whether the application is effectively executing one request at a time or not. If Requests Executing is just 1, requests are being serialized for some unknown reason. A common source of serialization is having turned ASP debugging on through Internet Services Manager.

For more information and code serialization samples please see:

Active Server Pages:Requests Queued - This counter measures the number of requests waiting for service from the queue (an ideal value would be zero). If this number keeps increasing continuously, the ASPs are blocking threads and no threads are being released to service another request from the queue. If the number of Requests Queued fluctuates considerably during stress and processor utilization remains relatively low, this is an indication that the script is calling a COM object that is receiving more calls than it can handle. In this case, the COM object is the bottleneck. At this point, it may be necessary to increase the ASPProcessorThreadMax metabase entry which specifies the maximum number of worker threads per processor that IIS creates.

For more information on the ASPProcessorThreadMax metabase entry, please see:

Active Server Pages: Sessions Total – This counter measures the total number of sessions since the service was started. If you are monitoring the total sessions created by a specific test script, stop and start the Web service before the test run to monitor the total more accurately. Make sure that while the script is running, the Sessions Total number keeps gradually increasing until it reaches the desired total. If Sessions Total never reaches the desired total, you may want to stop and restart the Web service and start another test run.

Web Service: CGI Requests/sec and ISAPI Extension Requests/Sec – These counters measure the rates at which your server is processing CGI and ISAPI application requests. If these values decrease while the load is increasing, you may want to have your application developers revisit their code.

Web Service: Get Requests/Sec and Post Requests/Sec – These counters reflect the rate at which these two common HTTP request types are being made to your server. POST requests are generally used for forms and are sent to ISAPIs (including ASP) or CGIs. GET requests account for almost all other requests from browsers and include static files, requests for ASPs and other ISAPIs, and CGI requests. These counters are important to understand general load characteristics of your site.

For more detailed information on Web server tuning, examine the following resources:

Achieving 99.999% Availability

As noted earlier, the information contained in this document is meant to inform the administrator how to design, build, and maintain 99.99% single server availability. It is often very difficult to achieve single server availability greater than 99.99%. Fortunately it is possible to employ the lessons learned through this document to achieve 99.999% service availability. 99.999% service availability means that your Web service cannot be unavailable for more than 5 minutes per year. While this is an aggressive goal, there are practical methods you can use to achieve this goal.
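The downtime figures quoted in this document follow directly from the availability percentages. The arithmetic, as a quick sketch:

```python
# Downtime budget implied by an availability target (365-day year)
def downtime_minutes_per_year(availability: float) -> float:
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return minutes_per_year * (1.0 - availability)

print(round(downtime_minutes_per_year(0.9999), 1))   # -> 52.6 ("four nines")
print(round(downtime_minutes_per_year(0.99999), 2))  # -> 5.26 ("five nines")
```

Each additional nine cuts the annual downtime budget by a factor of ten, which is why moving from 99.99% to 99.999% requires redundancy rather than just better single-server practices.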

The most effective strategy to achieve 99.999% service availability is to incorporate network load balancing and hardware fault tolerance in your Web service. The following list details different components that you can employ in your environment to achieve 99.999% service availability:

  • Clusters Microsoft Cluster Server (MSCS) provides failover support for critical applications and services, such as SQL Server and Exchange. Microsoft Cluster Server is best suited for server applications designed to support fault tolerant transactions that maintain persistent, server-based state information. As opposed to NLB, when an application failure happens on MSCS, the Cluster service maps the published IP address of the virtual server to a remaining cluster node so the client can reestablish the connection without knowing the application is being hosted on a new physical node. Windows 2000 Advanced Server has support for 2 node clusters while Windows 2000 DataCenter Server has support for 4 node clusters. The graphic below summarizes the architecture of MSCS.

  • Network Load Balancing Microsoft Network Load Balancing provides failover support for TCP/IP-based applications and services, such as IIS and Windows Media Services. Organizations can use Network Load Balancing to cluster up to 32 computers to evenly distribute incoming traffic while also monitoring the health of servers and network adapters. NLB is an ideal solution for mission critical e-commerce and hosting applications. Unlike MSCS, NLB should not be used to scale applications such as Microsoft SQL Server that independently update inter-client state because updates made on one cluster host will not be visible to other cluster hosts. NLB has been provided by Microsoft since Windows NT 4 Enterprise Edition where it was referred to as WLBS (Windows Load Balancing Service). The following picture summarizes the architecture of NLB.

  • Component Load Balancing (CLB). Component Load Balancing is a feature available in Microsoft Application Center 2000 which allows COM+ components to be load balanced. In CLB, the COM+ components are maintained on servers in a separate COM+ cluster. Calls to activate COM+ components are load balanced to different servers within the COM+ cluster. More details on COM+ load balancing and Application Center 2000 can be found in the article, "Application Center 2000 Component Load Balancing Technology Overview".

  • Standby systems. Standby systems provide backup systems in case of total failure of a primary system. MSCS can offer this level of functionality if you choose to implement the cluster in an Active-Passive configuration. The Passive node will only be used when an application such as SQL 2000 or Exchange 2000 forces a failover or if the administrator chooses to fail over from the Active node.

  • Spare parts. Spare parts ensure replacement parts are available in case of failure. Hard drives, processors, and memory are examples of spare parts you can keep onsite to resolve hardware failures quickly.

  • Fault-tolerant components. Fault-tolerant components improve the internal redundancy of systems. Storage devices, network components, fans and power supplies are examples of components that can be configured for fault tolerance. For example, storage devices can be configured to provide fault tolerance by using RAID (Redundant Array of Inexpensive Disks).

Best Practice - Work with your server vendor to employ redundant hardware in your Web servers as well as use network load balancing service to achieve 99.999% service availability. The following articles describe the technical details behind NLB, MSCS and CLB and ultimately will help you understand the benefits associated with each technology.

Managing Web Servers

Once you have designed, built, and tested your Web applications and servers thoroughly for compatibility and performance bottlenecks, you must consider how you plan to monitor and manage these servers. An effective strategy for managing and monitoring Web servers will minimize the time necessary to assess and understand the root causes of failures, as well as ensure servers are performing as designed under real-world load scenarios.

Microsoft Monitoring and Management Tools

This section begins with an explanation of management and monitoring tools available from Microsoft included in Windows 2000. The next section describes tools provided in Microsoft® Windows Server System™ applications that you can use to develop your Web server management strategy.

Management and Monitoring using tools included with Windows 2000

Microsoft Windows 2000 Server products give administrators the tools and technologies they need to efficiently maintain servers and networked users from a centralized location. In addition to using the Performance tool described earlier, the following tools should be used to efficiently manage and monitor your Web servers.

  • Microsoft Management Console (MMC) – MMC provides a consistent way to perform management tasks. Microsoft provides a series of snap-ins that administrators can use to manage all aspects of their Windows 2000 Servers. For example, Web server administrators can connect to and manage remote Web servers through the Internet Information Services snap-in.

  • Windows Management Instrumentation (WMI) – WMI is a standard which lets management applications from different sources manage an organization's devices such as services and applications in a consistent way. Web server administrators can use WMI to query the status of IIS services on remote servers.

  • Terminal Services – Terminal Services can be installed in 'Remote Administration' mode which will allow administrators to connect to the familiar Windows desktop environment from remote machines for administrative purposes such as starting and stopping a particular service.

  • Task Scheduler – Task Scheduler allows any script or program to be invoked at any time interval as well as on events like system boots and user logons. A Web server administrator could use Task Scheduler to create a task to copy IIS logs to a central management station on a scheduled basis.

  • Alerter Service – The Alerter service expands the capabilities of Performance Monitor by allowing the administrator to send alerts when one or more of the counters exceed a preset threshold. For example, a Web server administrator can configure an alert to be sent when the % Processor Time performance counter exceeds 70%.

For more information on these tools and other management tools included in Windows 2000, please examine the following articles:

Management and Monitoring using Windows Server System applications

In addition to the management and monitoring tools included in Windows 2000, Microsoft has developed management applications that help administrators monitor and manage the events and performance of Windows 2000 based systems.

Microsoft Operations Manager 2000

Microsoft Operations Manager 2000 provides event and performance management for the Windows 2000 Server family of operating systems and Windows Server System applications. Specifically, this application:

  • Provides comprehensive event management through an enterprise event log that collects and reports on problems and information generated from systems and applications across the network.

  • Provides proactive monitoring and alert messaging to pagers, through email or other external means. These alerts can also cause actions that repair the original problem.

  • Provides reporting and trend analysis that can be used to track problems over time and generate detailed reports on the overall health of your environment.

  • Interoperates with other management systems through its support for management technology standards such as SNMP, WMI, and CIM.

Microsoft Application Center 2000

Microsoft Application Center 2000 is Microsoft's deployment and management tool for high-availability Web applications built on the Windows 2000 platform. Application Center 2000 allows Web developers and administrators to deploy applications easily to multiple Web servers. Specifically, this application:

  • Allows administrators to create logical groupings of applications, including components and configuration, which minimizes the complexity of installing and testing.

  • Allows administrators to manage a group of servers as a single entity.

  • Streamlines deployment of applications, which ensures consistency of applications across all Web servers.

  • Provides monitoring tools that enable viewing of performance and event log data from one server or the entire cluster through a remotely viewable Web interface.

  • Helps eliminate manual tasks by automating responses to particular events and conditions.

For more details on these two application suites, see the following Web sites:

Now that you are aware of management and monitoring tools offered by Microsoft, the next several sections discuss best practices for server failure and recovery such as debugging Windows 2000 as well as your Web application, using the IIS Recycle tool, and recommendations for evaluating and installing security updates and hotfixes.

Best Practices for Server Failure and Recovery

In addition to performance monitoring and planning for your Web server deployment, it is also necessary to have a plan in case your Web server fails. There are many tools available from Microsoft that will help you determine the root cause of a failure and allow you to return your Web server to an operational state. As mentioned in the availability section earlier, to achieve 99.99% availability on a single server you cannot have more than 52 minutes of downtime in a year. To achieve this goal you will need to minimize the downtime associated with any failure events by quickly understanding root cause and working to restore the Web server to production service in a timely manner.
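
The 52-minute figure follows directly from the availability arithmetic; a quick sketch to verify it:

```python
def downtime_budget_minutes(availability, period_hours=365 * 24):
    """Return the maximum tolerable downtime, in minutes, for a given
    availability target over a period (default: one non-leap year)."""
    return period_hours * 60 * (1 - availability)

# 99.99% availability leaves roughly 52.6 minutes of downtime per year;
# each additional "nine" divides the budget by ten.
print(round(downtime_budget_minutes(0.9999), 1))
```

Seen this way, the budget makes clear why fast root-cause diagnosis matters: a single reboot cycle on server-class hardware can consume a substantial fraction of the annual allowance.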

The following sections outline tools and processes you can use to help identify failure root causes in your environment as well as prevent the failures from happening in the first place.

Windows 2000 Debugging

Windows 2000 includes many debug tools you can use to isolate failures in your Web application. The debug tools are included on the Customer Support Diagnostics CD-ROM included with every copy of Windows 2000 Server and Windows 2000 Advanced Server. Included on the CD-ROM are the traditional Windows NT debugging tools such as i386kd, KD, Windbg, and CDB, as well as the symbols you will need to use the debug tools. These tools are instrumental in solving such issues as access violations and application hangs.

Best Practice – Download and install the latest debugger tools to assist you in diagnosing and resolving problems with your Web application.

Web Application Debugging

In addition to the standard debug tools mentioned above, there are additional resources available to debug Web applications. These tools and processes were developed as a result of direct customer feedback from many dot-com companies' sites.


After analyzing customer dump files from crashing or hanging COM+ applications and Active Server Pages, we frequently find that the required settings for server-side components are not in place. Two of the required settings for server-side components are the Retain in Memory and Unattended Execution options. Visual Basic 6.0 ActiveX dynamic-link libraries (DLLs) that are compiled without these options may cause COM+ applications and Web sites to crash. By default, ActiveX DLLs created using Microsoft Visual Basic 6.0 do not have the Retain in Memory and Unattended Execution options checked.

VBCHKW2K is a tool that will search for VB6 DLLs that were compiled *without* the "Retain in Memory" and "Unattended Execution" properties enabled.

You can find the VBCHKW2K tool documentation in the Appendix. In addition to the VBCHKW2K tool, the link below will help you debug your COM objects in Windows 2000.

Web Application Failure Prevention

Even after considerable time and effort spent designing and testing your Web application, you may notice problems after moving your application to a production environment, as load simulation and fault injection in a test lab do not create an environment identical to the Internet.

IIS 5 Recycle Tool

Microsoft has recently developed a utility that you can use to automatically recycle IIS processes if you notice problems with your application, such as a memory leak, after moving your application into a production environment. By default, this tool can automatically recycle IIS based on virtual memory usage, total HTTP GET requests served (determined from the Total Get Request performance counter), a schedule, IIS uptime, and the number of ASP requests queued. IIS Recycle will recycle both in-process Web applications (Inetinfo.exe) and out-of-process Web applications (dllhost.exe).

The IIS Recycle tool has integrated support for NLB. On an NLB enabled system, the IIS Recycle tool will remove the Web server from the NLB cluster before recycling the IIS process. After the IIS process has been recycled, the IIS Recycle tool will add the Web server back into the NLB cluster. Microsoft has also included a watermark feature as part of the NLB support in IIS Recycle. The administrator can configure a watermark level, or minimum number of active hosts in a NLB cluster, and if the current number of active hosts is below or equal to the watermark, the IIS Recycle tool will skip the recycle until the number of active hosts is above the threshold.
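
The watermark decision described above reduces to a simple predicate; a sketch with illustrative names (not taken from the tool's actual code):

```python
def should_recycle(active_hosts, watermark):
    """Apply the IIS Recycle watermark rule: proceed with the recycle only
    when the number of active NLB hosts is strictly above the configured
    minimum; at or below the watermark, the recycle is skipped."""
    return active_hosts > watermark

# With a watermark of 2, a 4-host cluster recycles only while at least
# 3 hosts are active, preserving a minimum serving capacity.
```

The strict inequality is the important detail: because the server being recycled is drained out of the cluster first, allowing a recycle at exactly the watermark would drop capacity below the configured minimum.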

The IIS Recycle tool also includes the ability to run custom scripts before and after each IIS recycle. Some customers have used this script feature to enable drain stops on third-party load balancing solutions, as well as to send messages to administrators indicating that the IIS services were recycled successfully on a particular Web server. On an NLB enabled system, the scripts will be executed after the NLB drain and before the NLB start.

Figure 2 displays production Web server uptime improvements at a Microsoft property after installing the IIS Recycle tool. Before the installation of the IIS Recycle tool, all Web servers were rebooted on a weekly basis, as shown at the far left-hand side of Figure 2. After the installation of IIS Recycle, no server has been rebooted, and as the figure shows, Web server uptime is now greater than 60 days.

Figure 2: Production Web Server Uptime Improvements Using IIS 5 Recycle

Best Practice – Use the IIS 5 Recycle Tool to recycle your Web application and prevent future application failures per administrator defined rules. The IIS 5 Recycle Tool can be downloaded from the Microsoft Download Center using the link below. In the future the IIS 5 Recycle Tool will be available through MSDN, TechNet, and the IIS Resource Kit.

Download the IIS 5 Recycle Tool

Documentation for the IIS 5 Recycle tool including custom configuration settings is available in the Appendix.

Scheduled Upgrades

In addition to performance monitoring, another big challenge for any system administrator is managing software maintenance and updates. Software maintenance can take multiple forms, including applying hotfixes, Service Packs and Security updates. Each form may involve a separate maintenance strategy as well as separate installation mechanisms. It is important as a system administrator to be able to assess how critical a particular fix is for your environment as well as understand the most effective and efficient way to install those critical fixes.

One of the ways Microsoft has chosen to maintain their servers is by instituting a scheduled platform upgrade to be installed on all Microsoft managed servers. This platform upgrade is applied twice annually and contains any released Service Packs, tested hotfixes, and may also contain software updates to monitoring or management software and server specific hardware updates (firmware, BIOS, etc.). In the past, Microsoft had tested and deployed Service Packs and hotfixes as they became available. This became a complex management task as the time between Service Pack releases grew. By adhering to a standard bi-annual platform upgrade, Microsoft can ensure their customers (employees, partners, and business units) are running a supportable, reliable configuration of Windows.

Security Updates

While it is not always necessary to install every Security Update that becomes available, it is important to assess the applicability of every Security Update in your environment. For example, if you do not have Windows 2000 Active Directory deployed in your organization, you will find that the Microsoft Security Bulletin MS01-024 will not be applicable to you. Each Security Update contains detailed information which will help you determine whether or not it is applicable in your environment. Understanding your security risks and platform configuration will make the decision to install a particular Security Update easier for the system administrator.

Best Practice – Use the HFNetChk tool periodically to determine which Microsoft Security Updates have not been applied to your Web servers. HFNetChk is a command-line tool that enables an administrator to check the patch status of all the machines in a network from a central location. The tool does this by referring to an XML database maintained by Microsoft. HFNetChk can be run on Windows NT 4.0 or Windows 2000 systems, and will scan either local or remote systems for patches available for Windows NT 4.0, Windows 2000, Internet Explorer 5.01 and later, Microsoft SQL Server 7.0 and SQL Server 2000, and all Windows system services including Internet Information Services 4.0 and 5.0.

Below is an example of the output you can expect to see after running HFNetChk:

C:\downloads\temp>hfnetchk -h server1
Microsoft Network Security Hotfix Checker, 3.1
Developed for Microsoft by Shavlik Technologies, LLC
** Attempting to download the XML from download/xml/security/1.0/NT5/EN-US/ **

File was successfully downloaded. Attempting to load C:\downloads\temp\mssecure.xml.
Using XML data version = Last modified on 8/21/2001.
Scanning server1 ...............

Done scanning server1

Patch NOT Found MS00-077        299796
Patch NOT Found MS00-079        276471
Patch NOT Found MS01-007        285851
Patch NOT Found MS01-013        285156
Patch NOT Found MS01-025        296185
Patch NOT Found MS01-031        299553
Patch NOT Found MS01-036        299687
Patch NOT Found MS01-037        302755
Patch NOT Found MS01-040        292435
Patch NOT Found MS01-041        298012
Patch NOT Found MS01-046        252795
Internet Information Services 5.0
Patch NOT Found MS01-025        296185
Patch NOT Found MS01-044        301625
Internet Explorer 5.01 SP2
Patch NOT Found MS01-027        295106

To download HFNetChk, see Microsoft Knowledge Base article 303215.
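
Because the report is line-oriented, it can be post-processed with a short script; the sketch below pulls the missing-bulletin lines out of a captured report (the column layout is inferred from the sample output above):

```python
import re

def missing_patches(report_text):
    """Return (bulletin, kb_article) pairs for each 'Patch NOT Found'
    line in a captured HFNetChk report."""
    pattern = re.compile(r"Patch NOT Found\s+(MS\d{2}-\d{3})\s+(\d+)")
    return pattern.findall(report_text)

sample = """\
Patch NOT Found MS01-025        296185
Patch NOT Found MS01-044        301625
"""
# → [('MS01-025', '296185'), ('MS01-044', '301625')]
print(missing_patches(sample))
```

A summary like this could be produced per server and compared against the priority rankings discussed later, so that only bulletins assessed as applicable generate work items.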

Hotfix Management

As with Security Updates, it is not always necessary to install every applicable Windows hotfix that becomes available. Hotfixes are created to resolve specific, reproducible errors with a particular component of Windows. As opposed to Service Packs, hotfixes are not fully regression tested, which means you must test hotfixes in a test environment before deploying them on production servers.

In addition to the bi-annual scheduled upgrade described earlier, the Microsoft data center team has created and adopted a standard process for evaluating the applicability and priority of hotfixes and Security Updates as they are released. The following priority ranking system has been developed to help Microsoft's internal customers make intelligent decisions regarding installing fixes outside of the bi-annual scheduled upgrades:

  1. Critical - This fix should be on all systems as soon as possible. An example is a Security Update examined and found to be applicable to the Microsoft environment.

  2. Highly recommended – Considered necessary as part of the standard platform. If a server is going to be debugged, this fix must be installed prior to any debugging.

  3. Recommended - Considered necessary as part of the standard platform. This fix is not absolutely essential to every server but aids in keeping a stable platform on all servers.

  4. May be necessary - This fix is needed only in specific cases and will likely not be included in the next scheduled platform upgrade.

  5. Probably not necessary - Only needed in very special cases. This fix will not be included in a scheduled platform upgrade.

The Microsoft Windows Sustained Engineering team has put considerable effort into reducing the number of reboots necessary when installing hotfixes. Often hotfixes, as well as Security and Critical Updates, are fixes to user mode components. In most cases, it is possible to apply a hotfix to a user mode component and enable the hotfix without a reboot. For example, it is possible in certain situations to stop a running service, copy the files in the hotfix onto your system, and restart the service. Because the files are not locked while the service is stopped, the new files can overwrite the old files without conflict; once the service is restarted, the new files are in use and the hotfix is installed. This functionality will be included in an update to the Microsoft hotfix installer, which will be released after Windows 2000 Service Pack 2. In addition to adding support in the hotfix installer to stop and start the necessary services, Microsoft has added support for hotfix chaining. This feature is a tool customers can use to safely chain multiple hotfixes together with only a single reboot.

Use QFECheck to determine hotfixes installed

One challenge facing many system administrators is determining the OS and hotfix configuration of every server in their environment. Until recently, this task was often difficult and required custom development of tools and scripts to pull fragmented hotfix installation information from the registry and file system. A tool called QFECheck, included in a recent Microsoft Security Bulletin, solves the problem of determining the hotfixes and OS version of a Windows 2000 computer.

Best Practice – Download and run QFECheck to determine the hotfixes and Service Pack installed on your Windows 2000 computers. QFECheck also can be used to validate the successful installation of any Windows 2000 hotfix (Security Update or other) and will offer instructions if it determines a configuration problem exists with a hotfix on your computer.

Here is an example of sample output of QFECheck:

Windows 2000 Hotfix Validation Report for \\Server1
Report Date: 3/1/2001  11:27am
Current Service Pack Level:  Service Pack 1
Hotfixes Identified:
259524:  Current on system.
280838:  Current on system.
282784:  Current on system.

QFECheck cannot be run remotely against a set of servers. It is possible, however, to set up QFECheck to run as a Scheduled Task and write its log to a single location, making hotfix administration easier.
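
When QFECheck logs are gathered to a single location, they can be parsed centrally; a sketch based on the report format shown above (the parsing script is an illustration, not a Microsoft-provided tool):

```python
import re

def parse_qfecheck(report_text):
    """Extract the service pack level and installed hotfix numbers from a
    QFECheck validation report, using the layout of the sample output."""
    sp_match = re.search(r"Current Service Pack Level:\s*(.+)", report_text)
    hotfixes = re.findall(r"^(\d+):\s+Current on system\.",
                          report_text, re.MULTILINE)
    return {
        "service_pack": sp_match.group(1).strip() if sp_match else None,
        "hotfixes": hotfixes,
    }
```

Run against each collected log, this yields a per-server inventory that can be compared to the standard platform definition to spot servers that missed a scheduled upgrade.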

Download QFECheck


Registry Script Used to Disable Non-Essential Services

Here is an example of how many customers have chosen to disable Windows 2000 services that are not essential to their Web application (in the registry, a service Start value of dword:00000004 marks the service as disabled). The list below is not a recommendation by Microsoft of services that should be disabled; rather, it is a list of services that many customers have chosen to disable.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AppMgmt]
"Start"=dword:00000004
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClipSrv]
"Start"=dword:00000004
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Fax]
"Start"=dword:00000004
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Messenger]
"Start"=dword:00000004
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NetDDE]
"Start"=dword:00000004
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Spooler]
"Start"=dword:00000004
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\UPS]
"Start"=dword:00000004
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time]
"Start"=dword:00000004

Registry Script to Increase Event Log Size

In addition to using the .reg files to manage the list of running services, customers have also chosen to use .reg files to ensure the appropriate configuration of their event logs. Here is a typical way customers choose to configure their event logs by using the registry:

; MaxSize is in bytes: 0x01400000 = 20 MB
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\Application]
"MaxSize"=dword:01400000
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\Security]
"MaxSize"=dword:01400000
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\System]
"MaxSize"=dword:01400000

Event Log Analyst Tool

Event Log Analyst (ELA) is a tool that collects reliability information from Windows NT event logs. ELA runs on a single server, called the collection server, and sequentially retrieves event log information from other Windows NT servers. Once ELA data is collected, it can be analyzed using ancillary tools.

ELA collects several types of basic information from the System and Application event logs. Data collected includes:

  • Basic information about a Windows NT event log: the number of system events, the timestamp of the first system event, and the number of application events.

  • The timestamp of all system reboots. On computers running at least Windows NT 4.0 Service Pack 4, it also collects the timestamps of the system shutdown events.

  • Information about any Windows NT system crashes (bug checks or blue screens).

  • Information about any application crashes (user-mode dumps or Dr. Watsons).

For a complete list of events collected by ELA please see Events Collected by ELA.

ELA is designed to be unobtrusive. Because ELA only uses publicly documented interfaces for remotely accessing the event log, it does not require any software to be installed on the systems where the event logs reside. ELA is trivial to install. It is a single executable image that runs on any Windows 2000 or later system. ELA has a low impact on production environments. Typically, ELA accesses a remote server for less than 30 seconds during the collection process. In tests using the Microsoft Corporate Data Center, ELA scanned the event logs of over 1,500 servers in about 40 minutes using a collection system with a 100 Mbps LAN connection.

Installing ELA

To install ELA from floppy:

C:\> mkdir c:\ela
C:\> copy a:\ela.exe c:\ela

To remove ELA from a system:

C:\> del c:\ela\*.*
C:\> rmdir c:\ela

Configuring ELA for Data Collection

A file containing a list of servers tells ELA where to look for event logs. The server list is a simple text file. Here is an example server list to collect data from two Windows-based servers (the server names are placeholders):

# My server list
SERVER01
SERVER02

Any server list line beginning with the "#" character is a comment line and is not processed by ELA. All other lines are assumed to be Windows NT server names. Typically, server lists use the .txt file extension.

A comment line should be used in your server list to identify the server's roles in your datacenter. These roles can include but are not limited to SQL Servers, Web Servers, File/Print Servers, and Domain Controllers. In order to maintain consistency in the data collection, it is recommended to contact Microsoft if you plan to change the names of the collection servers or if you plan to add or remove servers from your collection.

Here is an example server list to collect data from two Web servers, three SQL Servers, and one domain controller (again with placeholder names):

# Web Servers
WEB01
WEB02
# SQL Servers
SQL01
SQL02
SQL03
# Domain Controller
DC01
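
Because the server-list format is plain text with "#" comment lines, a few lines of code suffice to read it; an illustrative sketch:

```python
def parse_server_list(text):
    """Return the server names from an ELA server list, ignoring comment
    lines (which begin with '#') and blank lines."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            servers.append(line)
    return servers
```

A parser like this lets other in-house scripts reuse the same server lists that ELA consumes, keeping one authoritative inventory file per data center role.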

Configuring Remote Servers for Data Collection

In general, no configuration is necessary for remote servers. However, there are optional system settings that can skew ELA results. By default, Windows-based servers are configured to make event log entries when system crashes (STOP errors or Windows NT blue screens) occur.

To check this setting:

  1. Click Start, click Settings, click Control Panel, click System, and click the Advanced tab.

  2. Under Startup and Recovery, click Settings. The setting Write an event to the system log should be enabled so that STOP errors are recorded.

Additionally, because ELA relies on the event logs for data collection, it is recommended to increase each remote server's event log size to 20 MB. To increase the size of the event logs on a Windows 2000 server, perform the following steps:

  1. Open Event Viewer by clicking Start, Programs, Administrative Tools, Event Viewer.

  2. Click an Event log.

  3. Click Action and select Properties.

  4. Increase Maximum Log Size to 20480 KB (20 MB).

Collecting Data

ELA Usage:

Event Log Analyst, Version
Collects specified event log entries from a list of remote machines
(C) Copyright 1998-2001, Microsoft Corporation
Used by permission only - do not distribute.

ELA ServerList [/t:n /r:n /n:Name]

  ServerList      Text file containing the list of servers to process; each
                  line of the file should contain a single server name.
                  NOTE: Not used with the /c switch.
  /t:n            Use n threads to process the list; default is 40, max is 200.
  /n:Name         Use Name to build the output filenames, which helps you
                  distinguish one collection from another.
  /r:<retry count>  Number of times to retry collecting from a particular server.
  /i:<init file>  Gather custom events specified in <init file>.
  /b:<path>       Log to a database located at <path>
                  (/s switch not used with this switch).
  /c              Perform an incremental collection against the database
                  specified with /b or /s (/b or /s switch required with
                  this switch). NOTE: Not used with the ServerList parameter.
  /s:<string>     Enter a user-defined database connection string
                  (/b switch not used with this switch).

If not given the /b switch, ELA generates the following CSV files:

  Autocheck              All instances of automatic disk checking at startup.
  BugChecks              List of all STOP errors found.
  DrWatsons              User-mode access violations recorded by Dr. Watson.
  ExchangeAll            All Exchange events.
  OutOfVM                All occurrences of the Out of Virtual Memory pop-up.
  RAW                    List of all events recorded by ELA on this run.
  RebootReasonCollector  All reboot reason events.
  Reboots                All detected reboots.
  SCM                    All Service Control Manager events.
  ServerErr              List of servers not processed, with explanation.
  Servers                List of servers successfully processed, with
                         additional information.
  Summary                Lists ELA start/stop/elapsed times and the number of
                         servers processed during each run.

ELA generates the following TXT files:

  ServerList             Lists each server and the last event collected, for
                         incremental collections.
  RunInfo                Machine info and time taken to process each server.

Each filename is in the form ELAXXXXXX_Name_Day_Month_Year_OutputFilename.csv.
For example: ELA060000_WebServers_1_Feb_1999_Reboots.csv

Once the server list file is created, you can collect event log data using ELA.

C:\>ela servers.txt
servers.txt: Starting the collection at 19-May-2000 15:54:17
Processing logs on ARAGORN from the beginning (1/43).
...
Processing logs on X1DOCSERV from the beginning (43/43).
Total time elapsed: 00:04:15
All output files written to ELA_19_May_2000_*.csv

ELA displays one line of output for each event log scanned. This line contains:

  • The server name, e.g. "SERVER-01"

  • The server count from the server list, displayed as "(<current server number>/<total number of servers>)". For example, a display of (4/55) means this entry is for the fourth server in a list of 55 servers.

By default ELA uses 40 worker threads to collect event data from up to 40 servers simultaneously. As each thread completes collection from a server, it is assigned the next server from the list. Once collection is started on the last server, ELA waits until all collection threads are completed. This can take several minutes.

You can specify the number of threads by using the command-line switch "/t:n", where 'n' is the number of threads you wish to use. You may specify 1 to 200 threads; each additional thread increases the amount of memory your system uses. Increasing the number of threads will only increase the speed of collection if you are not limited by your network connection. For most networks, 40 threads is sufficient. If you have a 100 Mbps LAN connection, you may find 200 threads the most efficient.
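
ELA's collection model, a fixed pool of worker threads each taking the next server from the list as it finishes, is the classic thread-pool pattern; a sketch of the same idea (the collection function here is a stand-in for the real per-server event-log scan):

```python
from concurrent.futures import ThreadPoolExecutor

def collect_all(servers, collect_fn, threads=40):
    """Scan every server using at most `threads` workers.  As each worker
    finishes one server it is handed the next from the list, mirroring
    ELA's behavior; results are returned keyed by server name."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(zip(servers, pool.map(collect_fn, servers)))
```

Because each collection is dominated by network waits rather than CPU, many threads can overlap usefully, which is why a faster link justifies a larger pool.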

Analyzing ELA Data

ELA produces 12 comma-separated value files and two text files when run:

  • ELA_<your collection name>_<collection date>_Servers.csv

  • ELA_<your collection name>_<collection date>_ServerErr.csv

  • ELA_<your collection name>_<collection date>_Autocheck.csv

  • ELA_<your collection name>_<collection date>_Bugchecks.csv

  • ELA_<your collection name>_<collection date>_RAW.csv

  • ELA_<your collection name>_<collection date>_Reboots.csv

  • ELA_<your collection name>_<collection date>_DrWatsons.csv

  • ELA_<your collection name>_<collection date>_OutofVM.csv

  • ELA_<your collection name>_<collection date>_ExchangeAll.csv

  • ELA_<your collection name>_<collection date>_SCM.csv

  • ELA_<your collection name>_<collection date>_RebootReasonCollector.csv

  • ELA_Summary.csv

  • ServerList_<collection date>.txt

  • Runinfo_<collection date>.txt

These files can be imported into Excel and analyzed for various trends. Microsoft has some internal tools under development to perform further analysis, but those are not ready for distribution.
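
Beyond Excel, the CSV files lend themselves to simple scripted summaries. The sketch below counts reboots per server from a Reboots file; the column name used here is an assumption for illustration, since the actual headers are not documented in this article:

```python
import csv
from collections import Counter
from io import StringIO

def reboots_per_server(csv_text, server_column="Server"):
    """Count reboot events per server in an ELA Reboots CSV.

    `server_column` is a hypothetical header name; adjust it to match
    the columns in the real file."""
    reader = csv.DictReader(StringIO(csv_text))
    return Counter(row[server_column] for row in reader)
```

A summary like this highlights servers rebooting far more often than their peers, which is exactly the kind of trend the reliability analysis is meant to surface.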

ELA Performance Impact

ELA was designed to cause minimal impact in production data centers. ELA uses the same application programming interfaces as the Windows Event Viewer application. ELA's performance impact is roughly the same as an operator using Event Viewer on a remote system and rapidly displaying pages of event information. ELA collection times vary due to the size of the event log and the bandwidth of the network connection.

Privileges Needed to Collect Events with ELA

It is strongly recommended that ELA be run from an account with administrative privileges on the target systems. The time zone of the target systems, for example, can greatly affect the calculations performed on data retrieved by ELA and is only obtainable if the user has administrative privileges.

Access to the event logs is determined by the account under which the application is running. The LocalSystem account is a special account that Windows NT services can use. The Administrator account consists of the administrators for the system. The Server Operator account (ServerOp) consists of the administrators of the domain server. The World account includes all users on all systems. ELA accesses the Application and System event logs, but not the Security event log.

The following table shows which accounts are granted read, write, and clear access to each log. [The table itself was not preserved in this copy of the document; the key facts are restated in the paragraph that follows.]
ELA uses the OpenEventLog function, which requires Read access. A member of the ServerOp account can call OpenEventLog for the Application event log and the System event log, because ServerOp has read access for both of these logs. However, a member of the ServerOp account cannot call OpenEventLog for the Security log, because it does not have read access for this log.

Events Collected by ELA

Event Source        Event Id      Description
EventLog            6008          System was abnormally shut down
EventLog            6005          System start
EventLog            6006          System shutdown
EventLog            6009          Operating system version at boot time
Save Dump           1000, 1001    Blue screen event
                                  Blue screen event
                                  Application failures
Application Popup                 Out of Virtual Memory errors
                                  Chkdsk was run on system boot to scan for errors

(Cells not preserved in this copy are left blank; the EventLog IDs shown are the standard Windows reboot-tracking events.)

IIS 5.0 Recycle Tool

IIs5Recycle runs as a service on a Web server running Windows 2000 with Internet Information Services 5.0. The purpose of the tool is to automatically recycle IIS processes based on the recycle configuration stored in the registry.

Functionality Overview

  • The tool provides the ability to automatically recycle the IIS process based on:

      • Virtual memory usage (InetInfo.exe + DLLHost.exe for out-of-process applications).

      • Total HTTP requests served (Total Get Request performance counter).

      • Schedule (recycle IIS at 1:00 AM local time on a daily or weekly basis).

      • IIS uptime (recycle IIS after it has been running for a configured number of hours, for example 240).

      • ASP requests queued (recycle IIS if the number of ASP requests queued exceeds the configured value for %ASPThresholdRetries% retries).

  • On a Windows Network Load Balanced (NLB) enabled system, it will remove the Web server from the cluster before recycling the IIS process.

  • Provide the ability to run a custom command (script) before and after each IIS recycle.

  • Forcefully stop IIS if a recycle request fails after a configurable amount of time. This ensures that IIS will be recycled when a recycle threshold is met.

  • Ability to configure IIS process recycling settings via a UI.

  • Ability to run IIs5Recycle in audit mode without actually recycling the IIS process. This will enable the administrator to fine-tune the recycling settings based on application and environment specifics.
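
The recycle conditions listed above amount to a set of independent thresholds checked each time the service wakes up; a simplified sketch of that decision, using illustrative field names rather than the tool's actual registry value names:

```python
def recycle_needed(stats, thresholds):
    """Return the names of the recycle conditions currently met.

    `thresholds` maps a measurement name to its configured limit (None
    disables that check); `stats` holds the current measurements.  A
    condition fires when the measured value reaches its limit."""
    return [name for name, limit in thresholds.items()
            if limit is not None and stats.get(name, 0) >= limit]
```

In audit mode, a check like this would only log which conditions fired, letting the administrator tune the thresholds before enabling real recycles.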

Recovery Feature

The Service Control Manager (SCM), available through the Computer Management MMC snap-in, can be configured to take an action when a service fails unexpectedly. The IIs5Recycle service uses this feature as a fatal error notification/recovery method. If any of the following failures happens, the IIs5Recycle service will terminate itself to trigger the SCM recovery action. For example:

After the IIS process was stopped, IIs5Recycle could not restart it after several retries. Or, on a WLBS enabled system, after the server was removed from the cluster, IIs5Recycle couldn't restart WLBS.

Note: Regardless of whether or not the Recovery Feature is enabled, an event will be logged into Windows EventLog.

By configuring the recovery actions for the IIs5Recycle service, the administrator will be notified if an unrecoverable event occurred on the Web server.

Installing IIS 5.0 Recycle Tool

  1. Open a CMD window

  2. Navigate to C:\Program Files\Microsoft\IISRecycle directory

  3. Install the IIS5Recycle service by typing IIS5recycle /install

  4. Merge the IIS5Recycle.reg into the local registry (Note – this step is needed only for the optional values)

  5. Configure the recycling conditions by typing IIS5Recycle /config

  6. Start IIS5Recycle service by typing net start IIS5recycle

IIS Recycle Registry Values

All the IIS Recycle configuration settings will be stored in the registry under:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\IIs5Recycle. Any registry changes will be picked up by IIs5Recycle automatically. For more information, see the table below.

Note: The first five UI configurable registry values set the recycle conditions for IIS, which IIS5Recycle uses to decide when to recycle the IIS process. The rest of the registry values are optional values that are made available by merging the IIS5recycle.reg file into your local registry.

IIs5Recycle will poll registry changes each time it wakes up.


An optional audit-mode setting allows you to run the IIS5Recycle tool without interrupting the IIS service. When using audit mode, IIS Recycle events are written to the Windows event log and can be used to assess the IIS Recycle conditions that should be configured for your Web server.

Advanced Control (the scripting feature)

The "Advanced Control" UI allows the customer to specify a command (script) to run before and after each IIS recycle. The return value of the "Before Command" is verified if the "Verify return value" checkbox is checked; if the script does not return the expected value, IIS5Recycle treats it as a failure and skips the current recycle. The output of the script is redirected to a log file under %windir%\system32\LogFiles\IIs5Recycle.

If you do not see the expected output in the log file, make sure the script name is correct and that the script has been copied to the %windir%\system32 directory. If the script needs to access other executable files, make sure all of those files are under %windir%\system32 as well.

On a WLBS-enabled system, the script is executed after the NLB drainstop and before the NLB start.

WaterMark feature for WLBS

There are two requirements for using the WaterMark feature:

  • Enable the remote control on each WLBS node.

  • If a password is required for remote control, each node must use the same password.

IIS Recycle Events

The following is a partial list of the events that IIS Recycle logs.


VBCHKW2K Requirements

  • Supports Windows 2000 Systems only

  • Administrative privileges required

VBCHKW2K Features

VBCHKW2K.exe is able to perform the following when run from a command line:

  • A non-intrusive scan of the COM+ catalog (which can even detect uninstantiated components) for improperly compiled VB DLLs registered in COM+ applications.

  • Scan for any VB components running in IIS In-Process applications (inetinfo.exe).

  • Scan for any VB components running in IIS Out-Of-Process Pooled applications (dllhost.exe).

  • Scan for any VB components running in IIS Out-Of-Process applications (dllhost.exe).

  • Scan for any VB components running in their own COM+ servers (dllhost.exe).

VBCHKW2K then produces a report, displayed on the screen, listing all of the DLLs that were found to be improperly compiled. Here is a sample report:

VBCheckW2k Version 1.2.1a
Microsoft Corporation - Dec 2000
Mon Dec 11 07:15:47 2000
Scanning IIS, all Medium & High Isolation Web applications and all running COM+ server 
packages on SIEWEB...Scanning dllhost.exe (PID:1376)  System Package
Scanning inetinfo.exe (PID:5436) IIS In-Process Applications
Retained In Memory   :OFF
Unattended Execution :OFF
C:\Inetpub\SIEWeb\vbrimcheck\VBRimCheck.dll <<< Please Recompile!
VBCHKW2K has detected one or more Visual Basic DLL's that have not been compiled properly 
for use in a server process such as IIS or COM+. Please make note of the DLL's listed 
above and re-compile these DLL's with 'Unattended Execution' and 'Retain in Memory'.

In the example report above, the tool has flagged a DLL called VBRimCheck.dll, which is running in-process with IIS. When the tool encounters a DLL that is not compiled correctly, it displays an important note at the end of the report notifying you of the problem.
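If you run VBCHKW2K across many servers, the captured reports can be scanned mechanically rather than read by hand. A small Python sketch (assuming the "<<< Please Recompile!" marker shown in the sample report above) that pulls out the flagged DLL paths:

```python
def flagged_dlls(report_text):
    # Collect the paths on lines that VBCHKW2K marks for recompilation.
    dlls = []
    for line in report_text.splitlines():
        if "<<< Please Recompile!" in line:
            dlls.append(line.split("<<<")[0].strip())
    return dlls
```

Feeding in the sample report would yield a one-element list containing the VBRimCheck.dll path, which could then be aggregated across a server farm.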


The list below summarizes the top ten recommendations your organization can follow to improve the reliability and availability of its Web servers:

  • If you are developing your application in Visual Basic, use the VBCHKW2K utility to verify proper compilation settings.

  • Thoroughly test your hardware and software for Windows 2000 compatibility.

  • Use pooled-process ASP pages as much as possible.

  • Use pooled-process COM components as much as possible.

  • Use the Web Capacity Analysis Tool and HTTP monitor to stress test your application.

  • Document and follow a Web server deployment plan.

  • Install and use the IIS 5 Recycle Tool to increase Web server availability.

  • Document and follow a process for evaluating and prioritizing the applicability of hotfixes and Security Updates as they are released.

  • Use HFCheck and QFECheck to ensure a standard OS installation across servers.

  • Select a server availability goal and use Microsoft tools to track your progress towards that goal.
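Selecting an availability goal is more concrete when you translate it into a downtime budget. A short calculation of the annual downtime implied by a given availability percentage (using a 365-day year):

```python
def allowed_downtime_minutes(availability_percent):
    # Annual downtime budget implied by an availability goal,
    # over a 365-day year (365 * 24 * 60 = 525,600 minutes).
    minutes_per_year = 365 * 24 * 60
    return minutes_per_year * (1 - availability_percent / 100.0)
```

At the 99.99% goal discussed in this article, the budget works out to roughly 52.6 minutes of downtime per year, which is why disciplined monitoring and fast recovery procedures matter as much as failure prevention.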


Acknowledgments

Katie Beers, Reza Baghai, Wael Bahaa-El-Din, Mario Garzia, Björn Levidow, Matthew Kerner, Ram Papatla, Michael Risse, Microsoft Corporation.

1 Blischke, Wallace R. and Murthy, D. N. Prabhakar, Reliability: Modeling, Prediction, and Optimization, p. 14.