Dedicated Application Scenario (Web and Application Server Infrastructure - Performance and Scalability)
Applies To: Windows Server 2003 with SP1
The Dedicated Application scenario describes a server that is predominantly dedicated to serving a single, large, specific application. Generally, this application serves some critical business purpose, and multiple servers are typically set up as copies of one another, with load-balancing software distributing the load across the physical servers running the application.
While many customers prefer this administration approach, where a single application is placed on each server, it is not the preferred Microsoft architecture for application servers. Generally, we find that this approach leads to inefficient use of computing resources; most server deployments of this design leave the server hardware grossly under-utilized. Worse, this deployment style leads to higher costs in a data center, because total cost of ownership (TCO) maps directly to the number of operating system instances being managed, regardless of whether they run on dedicated hardware or are virtualized within a host operating system (for example, VMware).
Windows Server 2003 has been redesigned to inherently support many applications running side-by-side on the same hardware and in the same instance of the operating system, and where possible it should be configured in such a way as to minimize TCO. See the documents referred to in the More Information section for greater detail.
A defining characteristic of the Dedicated Application scenario is that all settings of the application environment, the application itself, and the operating system can be tuned to get better performance out of the application. This is relevant because there is no sociability with other applications that must be considered when tuning.
This section covers relevant considerations for the Dedicated Application scenario.
Application and Content Types
Managed Code Applications
A managed code application is one built on the Microsoft Common Language Runtime (CLR). The CLR offers many services that allow application developers to build applications very efficiently.
Examples include applications built in languages such as Microsoft Visual C# and Visual Basic .NET using the Microsoft Visual Studio .NET tools.
ASP.NET Web Applications (ASPX)
ASP.NET Web applications are managed code applications that tend to be built as a thin client. For instance, a browser sends a request to a server, and the server platform performs business processing, application navigation, and generates the user interface in HTML. The HTML is then sent back to the client browser and the browser renders the interface for the application. The browser then accepts the next user request and sends a new request to the server.
In this scenario, the vast majority of the processing is done on the server platform; therefore, it is vital to minimize per-request cost in order to achieve substantial throughput on a server. It is also important that applications are further analyzed for scalability issues.
One of the tools available for additional scalability is the Web garden feature, which is discussed in detail in the Software Locks (Resource Contention) section.
When should I consider using a Web garden rather than a single worker process?
Whenever rich applications are being built, there is a chance that the application developer can inadvertently introduce software contention in an application. See the Software Locks (Resource Contention) section for a description of this effect.
To illustrate this effect and the assistance a Web garden can offer, an ASP.NET page was written in which every request had to navigate a linked list of 2,000 objects to manipulate a particular item in the list. While one request was accessing the list, other requests had to wait. This application was placed on an 8-processor, 900-MHz server, and 2,600 simulated clients threw load at the application. Each client had a think time of 50 milliseconds between requests. The results are below:
|                | Single Worker Process | 8 Worker Process Web Garden | Difference |
| Requests / Sec |                       |                             |            |
As can be seen, when pushing the server as hard as possible, the test run with one worker process was only able to reach 92% CPU utilization. This is an indicator of a lock contention problem: no matter how much load you throw at the server, you cannot drive it to 100% CPU utilization. With the addition of extra load, IIS would have started returning server too busy status codes when the kernel queues became full, and requests would have been rejected.
The 8 worker process Web garden distributed the 2,600 connections across the 8 worker processes serving the application, 325 connections to each worker process. Because all of the clients were sending in the same load, the requests were evenly distributed across the 8 worker processes in the Web garden. Although contention was still present, the severe contention (queuing) behind one lock was split out into 8 locks with smaller queues behind them, hence the 12% gain in performance when the same request load was attempted.
It is important to note that a Web garden is not a silver bullet. There is a substantial memory cost to initializing a Web garden. The non-Web garden case had real memory usage of approximately 35 MB (typical memory usage of a worker process running managed code), while the 8 worker process Web garden used roughly 8 times that amount: close to 280 MB for the application. On a large, 8-processor server that is dedicated to one application this might be acceptable, but it is a tradeoff that must be considered.
If the actual problem is not software contention within a process, a Web garden will likely have a negative effect on throughput. If, for example, the contention problem for the application is a lock to gain access to a database page on a remote database server, an increase in worker processes is unlikely to have a positive effect. In fact, the additional threads per process are likely to increase overall context switch rates and decrease overall performance. The bottom line is not to assume that a Web garden will help. There are instances where it will, but its utility depends on the nature of the contention problem. The only way to know is to test the Web garden configuration in a stress environment with expected loads before going into production.
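In IIS 6.0, a Web garden is configured by raising the number of worker processes assigned to an application pool, which is the MaxProcesses metabase property. As a sketch, it can be set with the adsutil.vbs script that ships in the AdminScripts directory; the pool name DefaultAppPool and the value 8 are illustrative assumptions, and should match your own application pool and processor count:

```
cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs set W3SVC/AppPools/DefaultAppPool/MaxProcesses 8
```

Setting MaxProcesses back to 1 returns the pool to a single worker process.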
ASP.NET Threading Settings and High Latency Workloads
If an application accesses slow resources, like a mainframe doing large, complicated queries, the threads of the ASP.NET subsystem can get blocked and, for a busy application, ASP.NET itself can run out of threads. If all ASP.NET threads are blocked, performance suffers, as there is no ability for ASP.NET to pick up new work. The work gets queued until other requests complete, and the situation ends up affecting user response times.
It is possible to override the number of threads ASP.NET is using to process requests. The parameter is found in the ASP.NET configuration system and can be changed at the root of the configuration system (the machine.config file), or specifically for the one instance of the application (the Web.config file in the virtual directory of the application). The threads parameter is in the following section of the configuration:
<httpRuntime minFreeThreads="[count]" />
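As a sketch of how this might look in the Web.config file of a single application (the value 88 is purely illustrative; the right number depends on measured load and processor count):

```xml
<configuration>
  <system.web>
    <!-- Keep this many threads free so ASP.NET can continue to
         accept new work while other requests block on slow
         back-end calls. Illustrative value only. -->
    <httpRuntime minFreeThreads="88" />
  </system.web>
</configuration>
```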
ASP.NET Applications and Queuing
This section identifies how ASP.NET requests are queued in Windows Server 2003. The pipeline is complicated, but useful to understand when making performance tuning changes.
1. HTTP requests come in off the network and are received by HTTP.sys.
2. After looking at the request, HTTP.sys places the request in a queue for an application pool.
3. The worker process servicing that application pool picks up the request and decides what type of processing environment the request needs in order to be executed. For ASP.NET Web applications, the worker process hands the request to the ASP.NET handler.
4. The ASP.NET handler places the request on a queue, and ASP.NET threads pick up individual requests and process them.
As you can see, there are a few places where a request could end up being queued, depending on how capable the subsequent layers are of processing additional requests. The queue referred to in step 2 is controlled in the IIS 6.0 application pool dialog box (Figure 7).
Figure 9: Kernel-Mode Queue Dialog (application pool properties)
The queue referred to in step 4 is controlled through the machine.config / Web.config configuration system. For busy servers, it might be worthwhile to try increasing this parameter. The parameter is:
<httpRuntime appRequestQueueLimit="[count]" />
The default queue length is set to 100.
This parameter also applies to ASMX (Web services).
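As a sketch, the queue limit might be raised for a busy server in machine.config or in an application's Web.config as follows (the value 500 is illustrative only, not a recommendation):

```xml
<configuration>
  <system.web>
    <!-- Allow more requests to queue before ASP.NET rejects
         new work. Illustrative value; the default is 100. -->
    <httpRuntime appRequestQueueLimit="500" />
  </system.web>
</configuration>
```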
ASP.NET Web Services (ASMX)
Rich client devices such as cell phones, Pocket PCs, and tablet devices can run rich applications that communicate over public networks using a Web service style of application integration. With this in mind, rich client applications connecting to Web services over HTTP have a few subtle differences from the traditional HTTP browser client. These differences can affect their performance.
If you were to trace the network packets associated with a Web service request over a physical network, and compare it to a standard ASPX Web application request made by a browser, you would notice a few differences:
The typical packet a browser sends is quite small (~400 - 500 bytes). It consists of the TCP/IP headers, an HTTP GET verb, the requested URI (/downloads/downloads.aspx), a few request headers that the browser uses to communicate content length, and a few other attributes of the request.
Given that browsers typically display data and/or graphics, the browser request usually gets back a large response (text/HTML, tens of KB of data) with embedded references to other large content files (.gif/.jpeg), which the browser then requests individually.
A Web service behaves a little differently. When a Web service client communicates with a Web service server, the client is actually invoking a component; the client end supplies data (a set of parameters) to call the server program. The server program processes the request, generates a response, and sends the response back to the client. Therefore, with a Web service running over the HTTP protocol, the client sends an HTTP POST (rather than an HTTP GET) with an entity body that contains an XML-serialized set of parameters, data structures, and other attributes of the request. The client side therefore typically sends larger requests and does not typically get back huge responses.
The physical packets put on the wire, and the interaction between client and server, will differ. When a browser-style application makes an HTTP GET request, the request fits in one packet; the server responds with a number of packets containing the response to the GET (the response is typically much larger than what can fit in one 1,500-byte frame).
Because the Web service client is potentially going to send large amounts of data to the server, it behaves differently. Given that Web service clients are likely to be non-PC devices, running on high-latency networks with the potential of a per-byte charge on the data sent and received, the Web service stack sends a packet with the HTTP POST, URI, standard headers, and a special header: the Expect: 100-continue header. This is the equivalent of the client pinging the server to make sure it is worthwhile to send the bulk of its call. If the server responds with 100 Continue, it is telling the client that everything is OK and the client should send the body of its call. The server then picks up the request, processes it, and sends back a response. Figure 8 depicts the Web service interaction.
Figure 10 Expect100Continue packet interaction
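The exchange can be sketched as the following HTTP dialog (the host name, URI, and sizes are illustrative assumptions, not taken from the test below):

```
Client:  POST /Service.asmx HTTP/1.1
         Host: server.example.com
         Content-Type: text/xml; charset=utf-8
         Content-Length: 1840
         Expect: 100-continue

Server:  HTTP/1.1 100 Continue

Client:  <XML-serialized SOAP request body>

Server:  HTTP/1.1 200 OK
         Content-Type: text/xml; charset=utf-8
         <XML-serialized SOAP response body>
```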
These different behaviors can really make the Web service client interaction efficient, because the client does not spend noticeable amounts of time sending requests that would eventually fail and frustrate the user.
However, when running on an internal network, where cost per byte might not be a concern, the additional latency introduced by the default Expect100Continue behavior can slow down an application's responsiveness and the overall throughput of a server. As an example, a Web service test was created to measure the difference between the Expect100Continue communication mechanism and the same request done as a single HTTP POST. The table below summarizes the results for one Web service invocation:
|                       | Default (Expect100Continue) Behavior | All in one HTTP POST | Difference |
| Total Data (bytes)    |                                      |                      |            |
| Total Ethernet Frames |                                      |                      |            |
The data numbers themselves are not necessarily a concern on a fast network; the bigger issue is the associated latency potentially experienced on the client side, especially given that the TCP/IP client stack will, at times, invoke its nagling behavior and delay small packets from being sent, in the hope that the application will fill the packet buffer further and make the overhead of transmission smaller.
If users perceive slowness (perceived high latency in applications), the above discussion is worth investigating. First, perform a network sniff on the client side of the application, then manipulate the parameters accordingly.
How to Change Expect100Continue and Nagling Behavior
Both the Expect100Continue behavior and the nagling behavior can be changed on a per-server or per-application basis in Windows Server 2003. By default, both behaviors are on; that is, the value in the configuration system is true. To change the parameters for a system, edit the following file:
%SYSTEMROOT%\Microsoft.NET\Framework\v[framework version]\CONFIG\machine.config
Add the following element to the configuration file and override the properties as needed:
<servicePointManager expect100Continue="true|false" useNagleAlgorithm="true|false" />
To change the behavior for a specific application, modify the app.config file in the same directory as the application, making the same change as above.
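As a sketch, the element lives under the system.net settings section of the configuration file; here both behaviors are disabled, as might suit an internal, low-latency network (the values are illustrative and should be validated under test load):

```xml
<configuration>
  <system.net>
    <settings>
      <!-- Turn off the Expect: 100-continue handshake and the
           Nagle algorithm for this client. Validate under load
           before deploying. -->
      <servicePointManager expect100Continue="false"
                           useNagleAlgorithm="false" />
    </settings>
  </system.net>
</configuration>
```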
Expect100Continue and useNagleAlgorithm are changed on the client side of the Web service (that is, on the consumer of the Web service), not the server side. The server behaves according to whatever behavior the client exhibits when interacting with the server.
.NET Remoting vs. Web Services
Applications often have many component parts. For scalability reasons, these component parts are sometimes separated onto different servers. This creates a need for some form of object-based communication mechanism to glue the component parts of an application together. In the managed code world, two possible options are .NET Remoting and Web services.
There are some functional differences between the two, which might force the use of one over the other. Assuming no functional restrictions, the use of .NET Remoting, using TCP/IP as a transport and making inter-object calls using the binary formatter, is the most performant option. This is discussed in depth (along with operational and functional tradeoffs) in the document:
Unmanaged Code Applications
Unmanaged applications are those that do not run on the Microsoft .NET Framework. These are the previous generation of applications (circa Windows NT 4.0 and Windows 2000), comprising binaries compiled directly from Microsoft Visual C++ and other compilers, as well as applications hosted under the VBScript and other script engines (interpreted applications). The tools used to build unmanaged applications include the following, among others:
Visual C++ 6.0 and earlier
Visual Basic 6.0 and earlier
ASP (Script engine)
Microsoft Visual J++
The unmanaged code environment is much less forgiving for an application developer, because features such as automatic garbage collection, automatic memory management, and other productivity language features are not available.
ASP technology (.asp) is the predecessor to the ASP.NET page (.aspx). A little-known fact is that ASP pages are actually compiled. The high-level process for executing an ASP page is:
1. The raw ASP page is read by the ASP subsystem in IIS. The page is converted into what is called a template (the ASP page with all included code inserted and syntactically assembled, ready for a script engine to execute).
2. The template is then sent to the relevant script engine (VBScript or JScript) for compilation and execution.
The ASP subsystem caches both templates and script engines. Like any other cache, this avoids the processing costs associated with executing a page again. For example, overriding the AspScriptFileCacheSize metabase parameter allows ASP to cache more of the templates referred to above, saving the cost of converting the raw script into a template.
Overriding the AspScriptEngineCacheMax metabase parameter allows ASP to store more instances of the VBScript and JScript script engines in memory. If one of these engines was used in the recent past to execute the same page for a different user, a further gain is realized, because the script engine does not need to recompile the page; the in-memory, already-compiled page is executed.
A single script engine can only execute one page at a time, therefore, if 10 concurrent users all requested the same page at the same time, the system would have 10 concurrent copies of the script engine in memory.
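As a sketch, both cache parameters can be raised with the adsutil.vbs script that ships in the AdminScripts directory (the values 750 and 400 are illustrative assumptions, not recommendations; this sets them at the W3SVC level for all sites):

```
cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs set W3SVC/AspScriptFileCacheSize 750
cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs set W3SVC/AspScriptEngineCacheMax 400
```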
Persisted ASP Template Cache
In IIS 6.0, a further enhancement for ASP templates was undertaken. Templates can be written to disk to avoid the overhead of keeping them in memory and having to regenerate them.
If a site consists of numerous ASP pages, this cache de-allocates the oldest templates from memory to free space for new ones. If one of those ASP files is requested again, the ASP engine loads the persisted template from disk instead of reprocessing the raw ASP file and spending additional CPU cycles.
Many ASP applications instantiate and call into COM+ components. The idea is that the ASP page provides HTML formatting and navigation, and calls into COM+ application components that contain the business logic. An additional advantage of using COM+ components is that they can be re-used in client and server-based applications without modification.
One of the key things to be aware of in tuning an ASP/COM+ application on Windows Server 2003 is the threading model. The major applicable threading models are Apartment, Free, and Both. Apartment-threaded components are COM objects that cannot function with two separate callers inside the component simultaneously; only one caller can be in the context of a method at a time. The converse is true for free-threaded components: a free-threaded component internally protects its data structures so that any client can call in at any time. Both-threaded components can exist in either mode and will typically inherit the mode of the objects that call into them.
As a general guideline, COM components developed in Visual Basic 6.0 or Microsoft Visual J++ are all apartment threaded, while components built using Visual C++ 6.0 can use any of the threading models. These components should be inspected for the appropriate threading model.
The COM+ environment in which ASP executes is, by default, the single-threaded apartment (STA). Therefore, COM objects that are instantiated are placed in an apartment separate from the ASP code execution. This isn't generally a bad thing, except that for execution control to transition to the COM+ object, the operating system must stop a thread and perform a context switch. This can become a performance issue when it occurs thousands of times a second.
On Windows Server 2003, a new ASP feature has been developed for ASP/COM+ applications that involve only COM+ components that are both or free threaded. Windows Server 2003 allows the ASP scripts themselves to run in the multi-threaded apartment (MTA). This means that the ASP script calls directly into a free- or both-threaded COM+ object without causing a context switch or adding threads to the system, which helps scalability and overall system performance.
The most important consideration here is that the components called by the ASP page must be free or both threaded. For example, an ASP script that calls into the ADODB.Recordset object can do so, without problem, under the MTA, since the ADODB object has a threading model of Both.
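As a sketch, a page such as the following remains safe when ASP scripts run in the MTA, because ADODB.Recordset is a Both-threaded component (the page content itself is illustrative only):

```
<%@ Language=VBScript %>
<%
  ' ADODB.Recordset is marked Both threaded, so this page can
  ' execute safely with AspExecuteInMTA enabled.
  Dim rs
  Set rs = Server.CreateObject("ADODB.Recordset")
  Response.Write "Recordset state: " & rs.State
  Set rs = Nothing
%>
```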
After checking the threading model of the components an application uses and ensuring they're either both or free threaded, an administrator can enable all applications on the server to run in the multi-threaded apartment by running the following:
cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs set W3SVC/AspExecuteInMTA 1
Administrators can also run the above command in the specific application directories (for example, W3SVC/1/AspExecuteInMTA) to make the setting for a particular application.
Given the nature of ASP, all execution is done synchronously; therefore, when an ASP application accesses resources off the server, or resources that have naturally high latency, ASP can quickly run out of threads. Once ASP exhausts its thread allotment, the response time of a system can be drastically affected.
This is an area where an administrator or operations person should have knowledge of the application. Guidelines for how to handle highly synchronous applications can be found in the section entitled High Latency ASP Pages.
Large Multiprocessor Single Application Scalability
For the Dedicated Application scenario, the scale up of a single application is very similar to the considerations for scale up of applications on Windows platforms in general. Please see the Multiprocessor section for more information.