File Cache Performance and Tuning
By Mark Friedman, Odysseas Pentakalos
This is a reprint of Chapter 7 in the Windows 2000 Performance Guide, published by O'Reilly & Associates, Inc. (January 2002).
The built-in Windows 2000 file cache is essential to the performance of Windows 2000 file servers. It is also an important factor in the performance of Microsoft Internet Information Server (IIS) as well as many other applications. The file cache is a specific, reserved area of virtual storage in the range of system memory addresses, as discussed in Chapter 6. As the name implies, it operates on files, or, more specifically, sections of files. When file sections are referenced, they are mapped into this area of virtual memory by the Cache Manager. This mapping occurs transparently to the application that is trying to read or write the file in question. (The everyday meaning of the word cache refers to it being a hidden storehouse. Caching functions are hidden from the applications.) The memory used for the file cache is managed just like any other area of real memory allocated within the system working set, subject to the same Windows 2000 virtual memory management page replacement policy discussed in the previous chapter. Frequently referenced files tend to remain in cache memory automatically, so no external tuning parameters need to be set to initiate caching under Windows 2000 on either Professional or Server.
The idea behind memory-resident file cache is to speed up access to data that is otherwise stored permanently on disk. Access to any file segments that are resident in memory is much, much faster than having to retrieve the file from a disk or CD-ROM. It is important to realize, however, that caching does not speed up file access unconditionally. For example, the performance benefit of caching a file that is read once from start to finish by a single application is minimal. The data blocks associated with the file still have to be accessed serially from the disk, copied into RAM, and stored there. The Windows 2000 file cache does attempt to improve upon the apparent responsiveness of this serial, sequential disk file activity by prefetching blocks from the disk in anticipation of the next disk file I/O request. This helps the responsiveness of foreground applications like Microsoft Word because the Windows 2000 Cache Manager automatically reads ahead in the file being opened. By the time the user of the application presses the Page Down key, the file data needed to refresh the display has already been retrieved from disk and is resident in memory.
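The effect of read ahead on a serial scan can be sketched with a toy model. This is only an illustration under stated assumptions, not the Cache Manager's actual algorithm: the function name is hypothetical, the file is treated as numbered blocks, and each prefetch is assumed to complete before the application asks for the next block.

```python
def synchronous_reads(blocks, read_ahead):
    """Count requests that must wait on the disk during a scan.

    Toy model: with read_ahead enabled, reading block n also stages
    block n + 1, and that prefetch is assumed to finish in time.
    """
    cache = set()
    waits = 0
    for block in blocks:
        if block not in cache:
            waits += 1            # application waits for a physical read
            cache.add(block)
        if read_ahead:
            cache.add(block + 1)  # anticipatory read of the next block
    return waits


sequential_scan = range(10)
print(synchronous_reads(sequential_scan, read_ahead=False))
print(synchronous_reads(sequential_scan, read_ahead=True))
```

In this model every block is still read from disk exactly once, which matches the point above: the total disk work is unchanged, but with read ahead only the first request makes the application wait.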
Once a file is read again, the benefits of memory caching become evident. As long as the data block requested remains resident in memory, subsequent file I/O requests are resolved directly from memory without ever having to access the disk. A file that is shared among multiple users (being accessed across the network, for example) is stored just once in the cache. The original user of the file responsible for initiating the file access causes the data to be staged from disk into memory. All subsequent users of the file benefit from the data being held in cache memory through faster access. Typical single-user applications like MS Office products do not normally derive a great deal of benefit from this aspect of the Windows 2000 file cache, but many multiuser server-based applications can and do. The Microsoft IIS application is a good example of a server application that benefits from the built-in file cache. HTML and GIF files frequently accessed by users of the IIS web publishing service, for example, tend to remain in memory, allowing IIS to process them without having to perform physical disk accesses.
Another way that application performance benefits from caching is by buffering writes in memory and deferring the update of the physical disk. This form of file caching is known as deferred write-back cache, also frequently referred to as lazy write. Deferring writes has two potential benefits. The first is that the application may re-write the same data block. If this occurs before the write is flushed from cache to disk, it is no longer necessary to perform the original write. Having been deferred, there is now an I/O operation that no longer needs to occur at all, resulting in a net savings. Consider, for example, an editing session using a typical word processing application like MS Word. Often the user edits and re-edits the same section of the file repeatedly. Deferred write-back cache reduces the number of physical disk operations that need to take place whenever subsequent updates overlay a current change residing in the cache.
The second benefit of deferred write-back caching is that by waiting, disk writes can often be flushed to disk in bulk I/O operations that may be more efficient than the way the original application specified. In Windows 2000, deferred writes are allowed to accumulate in cache memory until a threshold value for the number of dirty file pages in cache is reached. An operating system thread to flush dirty file pages resident in the cache to disk using efficient bulk I/O operations is then dispatched.
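The two benefits of deferring writes can be illustrated with a small simulation. It is a simplified sketch rather than the Windows 2000 lazy writer: the function name is hypothetical, and flushing is triggered after a fixed number of logical writes instead of by a dirty page threshold.

```python
def count_physical_writes(block_writes, flush_every):
    """Count disk writes when dirty blocks are flushed in batches.

    Simplifying assumption: a flush happens after every `flush_every`
    logical writes, and one final flush runs at orderly shutdown.
    """
    dirty = set()
    physical = 0
    for i, block in enumerate(block_writes, start=1):
        dirty.add(block)           # re-writing a dirty block adds no extra disk work
        if i % flush_every == 0:
            physical += len(dirty)  # one bulk flush of all dirty blocks
            dirty.clear()
    physical += len(dirty)          # final flush of anything still dirty
    return physical


# An editing session that keeps updating the same few blocks.
writes = [7, 7, 7, 8, 7, 8, 9, 7]
write_through = count_physical_writes(writes, flush_every=1)
deferred = count_physical_writes(writes, flush_every=4)
print(write_through, deferred)
```

With a flush after every logical write, all eight writes reach the disk. Deferring the flush lets repeated updates to blocks 7 and 8 overlay one another in memory, so fewer physical writes are needed.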
Using lazy write cache management means that Windows 2000 must be shut down gracefully. Pulling the plug on the computer abruptly strands numerous file updates parked temporarily in cache memory, which is why Windows NT issues that "Please wait while the system writes unsaved data to the disk" message during shutdown. During an orderly shutdown of the operating system, all dirty pages in the file cache are flushed to disk before the signal is sent that it is OK to power off the machine. This means that important Windows 2000 server machines should always be connected to an uninterruptible power system (UPS) to prevent data loss due to dirty file pages that have not yet been written to disk accumulating in the file cache.
The basic principles behind the use of a memory-resident file cache in Windows 2000 are no different from the other examples of caching discussed in earlier chapters. These include the Level 1 and Level 2 caches inside the processor, and the caching of active virtual address pages in RAM that is the essence of virtual memory management. When the active segments of frequently accessed files are held in RAM, access to them is considerably faster than having to retrieve them from disk (or from CD-ROM or from across the network). The size of the file cache in Windows 2000 is managed like any other process working set--it just happens to be part of the system working set and, in fact, is normally a major component of it. There are no tuning knobs or parameters in Windows 2000 to configure the minimum or maximum file cache beyond the architectural limitation that it can be no larger than 960 MB. If more memory is available and file access demands it, the file cache will expand in size. If there is a shortage of real memory available and application processes are more demanding, the file cache will shrink. In Windows NT, the file cache is limited to 512 MB of virtual memory.
The size of the file cache is adjusted dynamically according to load, using the basic page trimming algorithm discussed in Chapter 6. One tuning knob, LargeSystemCache, is available that affects cache sizing. What it does is quite extreme, making it effective only in isolated instances. In Windows NT, the tuning knob available produces even more extreme behavior, rendering it almost useless. We discuss this tuning knob near the end of this chapter.
File Cache Sizing
The Windows 2000 file cache is similar in function to the file caches built into most commercial versions of the Unix operating system. The Windows 2000 file cache stores segments of active files in RAM in anticipation of future requests. This function speeds access to files that are used by multiple users and applications. Besides accelerating access to shared files, the Windows 2000 file cache also performs anticipatory read aheads for sequentially accessed files. In addition, the file cache provides for buffering writes in memory, using deferred write-back caching, or lazy write. When writes are issued for cached files, data is first written into memory-resident file buffers. The permanent update to disk, requiring a time-consuming physical disk operation, is deferred until some later time. The caching read ahead and lazy write functions are typically effective at improving the performance of most Windows 2000 applications.
Transparently to applications, I/O requests to files are diverted to check the file cache prior to accessing the disk. If the data requested is already resident in the file cache memory, it is not necessary to access a relatively slow disk drive to read the data requested. The operation becomes a cache hit. If the section of the file requested is not in memory, there is a cache miss, which ultimately is passed back to the filesystem to resolve by reading the information requested from disk into the cache. The best single indicator of cache performance is the cache hit ratio, reported as the percentage of total file I/O requests satisfied from the cache.
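The hit ratio calculation itself is simple arithmetic, sketched below. The function names and the idea of feeding in two cumulative (hits, total requests) samples are assumptions made for illustration; they are not taken from any Windows 2000 interface.

```python
def hit_ratio_percent(hits, total_requests):
    """Return the percentage of requests satisfied from the cache."""
    if total_requests == 0:
        return 0.0  # no activity, so report zero rather than divide by zero
    return 100.0 * hits / total_requests


def interval_hit_ratio(prev_sample, curr_sample):
    """Hit ratio for one interval, given two cumulative (hits, total) samples."""
    hits = curr_sample[0] - prev_sample[0]
    total = curr_sample[1] - prev_sample[1]
    return hit_ratio_percent(hits, total)


print(hit_ratio_percent(450, 500))
print(interval_hit_ratio((1000, 1200), (1450, 1700)))
```

Both calls describe an interval in which 450 of 500 requests were cache hits, so both report a hit ratio of 90%.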
The single most important issue in file cache performance is sizing the cache so that it is large enough to be effective. If the cache is too small, access to disk files is slower. On the other hand, having too large a file cache means that the machine is configured more expensively than necessary.
It is difficult to determine in advance what size cache will deliver what kind of hit ratio for your workload. In general, as cache size increases, so does the cache hit ratio, usually in a nonlinear fashion. Figure 7-1 illustrates the theoretical relationship between cache size and hit ratio for a given workload. The curve in Figure 7-1 is broken down into three distinct regions. Cache performance can usually be characterized according to these three regions. Initially, very small increases in the size of the cache result in very dramatic increases in cache hit ratios. This area is indicated on the chart as Region 1. Here, performance is quite sensitive to minor changes in the cache size (or minor changes in the workload). Because cache effectiveness is very volatile in this region, this is not a desirable configuration. Eventually, caches are subject to diminishing returns as they get larger and larger, as illustrated by the area of the curve marked as Region 3. Here, adding more memory only marginally increases cache effectiveness, so this is not desirable either. That leaves Region 2 as the most desirable configuration, probably leaning toward Region 3 for the sake of stability, but not too far for the sake of efficiency. Understanding this trade-off between cache size and cache effectiveness in the context of your specific workload is crucial.
Figure 7-1: The theoretical relationship between cache size and cache effectiveness (hit ratio)
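A curve of this general kind can be reproduced with a small simulation. The sketch below is an assumption-laden stand-in, not a model of the Windows 2000 file cache: it uses a fixed-size cache with least-recently-used replacement and a synthetic reference pattern in which block b is referenced every b-th round, so low-numbered blocks are the most active.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, cache_blocks):
    """Hit ratio (percent) of a fixed-size LRU cache over a reference trace."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return 100.0 * hits / len(trace)


def skewed_trace(rounds=200, blocks=40):
    """Synthetic workload: block b is referenced once every b rounds."""
    trace = []
    for r in range(rounds):
        for b in range(1, blocks + 1):
            if r % b == 0:
                trace.append(b)
    return trace


trace = skewed_trace()
sizes = (1, 2, 4, 8, 16, 32, 40)
ratios = {size: lru_hit_ratio(trace, size) for size in sizes}
for size in sizes:
    print(f"{size:3d} blocks: {ratios[size]:5.1f}% hits")
```

With LRU replacement, a larger cache never produces a lower hit ratio on the same trace, so the printed values rise with cache size. How sharply they rise in each region depends entirely on the workload, which is why the shape of the curve has to be established for your own file access patterns.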
The actual amount of memory the Windows 2000 file cache consumes is determined dynamically based on system activity, so this theoretical model of cache effectiveness does not provide much practical guidance for the Windows 2000 systems administrator. Windows 2000 provides no external parameters that control the size of the file cache directly. (As mentioned, there is one cache size tuning parameter available that must be used very carefully.) In Windows 2000, the current size of the file cache is a function of the amount of system RAM installed, the number and size of the files currently in use, and the contention for memory from other applications that are also running. It is important to remember that the file cache competes for access to the same real memory resources as all other applications.
Cache Performance Counters
For performance monitoring, there is a Cache performance object and a full set of performance measurement counters that provide a great deal of information about the Cache Manager and related cache activity. To monitor cache effectiveness, there are counters that report on cache memory size, activity, and the various cache hit ratios. As we describe the various file caching mechanisms, we also discuss the relevant performance counters that are available to measure cache effectiveness.
Following Figure 7-1, it is important to monitor both the size of the file cache and the cache hit ratio. The real memory pages that the file cache occupies are counted as part of the system working set, subject to normal page trimming. The Memory object performance counter System Cache Resident Bytes reports the amount of real memory currently in use by the file cache. As the number of System Cache Resident Bytes increases, we normally expect that the various measures of hit ratio will also increase. Moreover, the cache size can grow simply as a function of the size of the files that are currently in use and their pattern of access. If there is little contention for real memory, the file cache grows in direct proportion to the rate of requests for new files (or new sections of currently open files). However, since the file cache must compete for real memory with other applications, what the other applications running in the system are doing at the same time influences the growth of the file cache. As other applications make demands on real memory, relatively inactive pages from the file cache are subject to Windows 2000 page trimming.
The Windows 2000 file cache is allocated within the system's virtual address range. This range of addresses spans only 960 MB of virtual storage (512 MB in Windows NT 4.0), setting an upper limit on the size of the file cache that is available in Windows NT and Windows 2000. Buy as much RAM as you want, but it is not possible for the file cache in Windows 2000 to be any larger than 960 MB. The 960 MB range of virtual storage reserved for use by the cache is divided into consecutive 256 KB sections. Files are mapped into the cache region using 256 KB logical segments defined from the beginning of each individual file's byte stream. The file cache interacts with the Windows 2000 Virtual Memory Manager using the generic mapped file support described in Chapter 6.
As file segments are cached, the Cache Manager simply allocates the next available 256 KB slot until the cache is full. The Cache Manager keeps track of which files are actively mapped into the cache virtual memory so that when a file is closed (and no other application has the same file open), the Cache Manager can delete it from virtual storage. In case more than 960 MB worth of file segments ever needs caching, the Cache Manager simply wraps around to the beginning of the cache to find inactive segments. In this fashion, a new file request replaces an older one.
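The arithmetic behind the 256 KB segments can be sketched as follows. The constant and function names are hypothetical; only the 256 KB segment size and the 960 MB and 512 MB limits come from the text.

```python
SEGMENT_BYTES = 256 * 1024                   # 256 KB logical segment
W2K_CACHE_VIRTUAL_BYTES = 960 * 1024 * 1024  # Windows 2000 limit
NT4_CACHE_VIRTUAL_BYTES = 512 * 1024 * 1024  # Windows NT 4.0 limit


def segment_for_offset(file_offset):
    """Return (segment index, offset within segment) for a byte offset.

    Segments are counted from the beginning of the file's byte stream.
    """
    return divmod(file_offset, SEGMENT_BYTES)


def segment_slots(cache_virtual_bytes):
    """Number of 256 KB slots that fit in the cache's virtual address range."""
    return cache_virtual_bytes // SEGMENT_BYTES


print(segment_for_offset(1_000_000))
print(segment_slots(W2K_CACHE_VIRTUAL_BYTES))
print(segment_slots(NT4_CACHE_VIRTUAL_BYTES))
```

A byte at offset 1,000,000 falls in the fourth segment of its file (index 3). The 960 MB range holds 3,840 segments in total, compared with 2,048 in the 512 MB range available in Windows NT 4.0.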
The System Cache Resident Bytes counter reports the amount of real memory the file cache is currently occupying. The Cache Bytes counter, which sounds like it might tell you the size of the cache, actually reports the full system working set, which includes System Cache Resident Bytes and several other real memory areas. In a Windows 2000 file server (remembering Windows 2000's heritage as the follow-on to the joint IBM/Microsoft-developed OS/2 LAN Manager), the file cache so dominates the system working set that internal documentation frequently refers to the entire system working set as the cache. This usage carries over to tools like Task Manager, which labels the system working set as the System Cache in the Performance tab, illustrated in Figure 7-2. The Windows NT version of Task Manager called this field File Cache, which is probably just as misleading. Curiously, the number of bytes in the System Cache reported by Task Manager does not correspond exactly to the Cache Bytes counter in the System Monitor.
Figure 7-2: The memory that Taskman reports is allocated for the file cache is not limited to the file cache
The file cache relies on the standard Windows 2000 page replacement policy to control the amount of real memory available for caching files. As you might expect, this has its good and bad points. The fact that system administrators do not have to tweak a lot of cache sizing parameters is a benefit. When the rate of file access increases, the dynamic nature of virtual memory management allows the file cache to expand accordingly. On the other hand, there is no way to set a minimum cache size, below which Windows 2000 will not steal file cache pages. This means that when real memory is under stress, it is entirely possible for the Windows 2000 file cache to be squeezed out of real memory entirely. This means that you must monitor Windows 2000 file servers and IIS machines to ensure that they always have enough memory to cache their file workloads effectively.
Chapter 6 discussed the virtual memory management page replacement policy implemented in Windows 2000 that relies on an operating system component called the Balance Set Manager to trim the working sets of active processes. The resident pages in the file cache are considered part of the system working set for purposes of page trimming. By default, the system working set minimum and maximum values are approximately 4 and 8 MB, respectively. (Refer back to Chapter 6.) These values are used to regulate the size of the file cache in Windows 2000 since the file cache is part of the system working set. The system maximum working set is also subject to gradual upward adjustment whenever the working set is at its maximum value but there is ample free memory available (Available Bytes > 4 MB). In the "Cache Tuning" case study reported later in this chapter, you will be able to see quite clearly this upward adjustment mechanism in action.
One final sizing consideration is that the Windows 2000 file cache is not limited to use by disk requests. Files stored on CD-ROM, a DVD disk, or a networked disk are all diverted to use the same built-in file cache.
Cache Hit Ratio
Most file I/O is directed through the Cache Manager. Later, we discuss two sets of Cache Manager interfaces specifically designed for Windows 2000 system applications. For instance, one of the major users of the file cache is the file server service, which utilizes a special interface that was defined with it in mind. Only a few applications that run on Windows 2000 take advantage of the two special interfaces.
The default behavior of Windows 2000 subjects all files to cache management, although it is possible to turn off caching for specific files. At the time the file is opened, it is possible to specify that caching should be disabled. Bypassing cache management, however, forces the application to code its own low-level I/O routines. Consequently, bypassing the Cache Manager is done only by some server applications specifically designed to run on Windows 2000, like MS SQL Server and MS Exchange. On the other hand, anyone developing an application intended to run under both Windows 9x and Windows 2000, for instance, is unlikely to choose to perform this extra work. This extends to applications like Microsoft's Office suite, whose applications utilize the Windows 2000 file cache in the normal manner.
There are four Cache performance object counters that report the cache hit ratio based on the different filesystem Cache Manager interfaces. The hit ratio counters are Copy Read Hits %, Data Map Hits %, MDL Read Hits %, and Pin Read Hits %. The differences between the Copy, Mapped Data, and MDL interfaces to the Cache Manager are discussed in the later section "How the Windows 2000 File Cache Works."
The file cache is built into both Windows 2000 Professional and Server, and it functions identically in either environment. Caching is everywhere! An interesting aspect of the Windows 2000 file cache stems from this ubiquity. When Windows 2000 Server file servers are accessed by Windows 2000 Professional clients, files accessed across the network are subject to memory-resident caching on both the server and the client side! With caching everywhere, frequently accessed files are likely to be resident in memory on multiple machines. From one standpoint, this leads to duplication and adds to the memory requirements of Windows 2000 machines. On the other hand, all this file caching is very effective from a performance standpoint, so there is an overall benefit.
On systems configured to run as file servers or Internet web servers, for example, one of the major consumers of real memory is the file cache. Since even large-scale PC servers generally have limited I/O bandwidth compared to enterprise-scale servers, configuring an adequate amount of memory for use as a file cache is important. You should consider that any memory-resident disk cache is an explicit trade-off of memory for disk activity. Normally, in PC workstations and servers, this trade-off yields significant performance benefits. When we discuss disk I/O performance in the next three chapters, we quantify the performance benefit of cache versus disk access more definitively.
Caching and Disk Activity Statistics
Another aspect of the file cache that should be noted is its relationship to the performance counters that measure physical I/O operations to disk. Whenever there are performance problems and you find an overloaded disk, it is natural to want to know which application processes are responsible for that physical disk activity. The operation of the Windows 2000 file cache makes it very difficult to determine this.
The cache statistics that are maintained count logical file requests issued by various applications, encompassing all the different Windows 2000 filesystems at the system level. Operations that are cache hits eliminate the need for physical disk operations to occur. The cache statistics are global; how individual applications that are running are using the cache cannot be determined. Meanwhile, logical (and physical) disk statistics count physical I/O operations. Since the great majority of logical file requests are handled through the file cache, logical and physical disk counters reflect those logical file requests that miss the cache. Cache misses generate synchronous physical disk operations.
One additional process at work here needs to be understood. Caching transforms some logical file I/O operations from synchronous requests into asynchronous disk requests. These transformations are associated with read ahead requests for sequential files and lazy write deferred disk updates. As the name implies, read ahead requests are issued in anticipation of future logical I/O requests. (These anticipated future requests may not even occur. Think about the number of times you open a document file in MS Word but do not scroll all the way through it.) Lazy write deferred disk updates occur sometime after the original logical file request. The update needs to be applied to the physical disk, but the required physical disk operation is usually not performed right away. So what is happening at the physical disk right now, as expressed by the current logical and physical disk statistics, is usually not in sync with logical file requests. This is the influence of caching. Caching makes it almost impossible to determine which applications are causing a physical disk to be busy except under very limited conditions (when very few applications are using the disk, for example).
Windows 2000 introduces a series of per-process file I/O counters that keep track of Reads, Writes, and Bytes at the process level. These new counters can be accessed from both the System Monitor, as illustrated in Figure 7-3, and Task Manager (not illustrated; the Windows 2000 Task Manager reports only cumulative values for these counters, so its usefulness in debugging a performance problem that is occurring right now is limited). As illustrated, per-process I/Os are separated into three basic categories: Reads, Writes, and Other Operations. The fourth category, Data Operations, is derived from the sum of the Reads and Writes. Both I/O operations and bytes are counted.
These new counters account for I/O operations (and bytes) at the process level, prior to the influence of the cache. While they certainly provide interesting and useful information on running processes, the new counters do not solve the problem of being able to relate physical disk activity back to the individual process. Since the new counters count logical file operations, because of cache effects, it is still not possible to see the load that individual applications are putting on the disk hardware. Nor do we expect that this problem will be alleviated in future releases of the Windows 2000 operating system. Due to lazy write caching, for example, it is never easy to associate an application process with the physical disk activity that occurs because cached operations decouple these otherwise logically related system events.
Figure 7-3: Per-process I/O counters track operations at the process level
Cache Loading Effects
One final point about the ubiquitous Windows 2000 file cache is that whenever there is any significant interaction with the file cache, it is difficult to run repeatable benchmark tests. Cache loading effects are notorious for their impact on benchmarks. At the beginning of a benchmark when the cache is empty, most of the file accesses are direct to disk. (This is known as a cache cold start.) Later on, when all the files have been loaded into the file cache, very few accesses are direct to disk. (When the cache is already populated with the working set of objects being managed, it leads to cache warm starts.) Two otherwise identical benchmark runs, one executed from a cache cold start and the other from a warm start, will yield very different results.
Both the pervasiveness of the Windows 2000 file cache and its impact on local and networked requests make it difficult to factor in the effect of caching when you attempt to run repeatable benchmark workloads. Commercial-grade benchmarking programs are aware of these difficulties and provide specific mechanisms to allow for cache loading effects. For any benchmark runs you decide to initiate yourself, caveat emptor. The examples discussed later in this chapter should provide you with adequate guidance to help interpret the various cache object statistics, and will help you understand the nature and extent of any cache loading effects present during your specific benchmark test.
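The difference between a cold start and a warm start is easy to demonstrate with a toy model. This sketch assumes a hypothetical cache that is large enough to hold the entire benchmark working set and that nothing is trimmed between runs, which a real system under memory pressure does not guarantee.

```python
def run_benchmark(file_blocks, cache):
    """Replay a workload against a shared cache and count physical disk reads."""
    disk_reads = 0
    for block in file_blocks:
        if block not in cache:
            disk_reads += 1   # cache miss: block must be staged from disk
            cache.add(block)
    return disk_reads


cache = set()                      # empty cache: the first run is a cold start
workload = list(range(100)) * 3    # 100 distinct blocks, each read three times

cold_start_reads = run_benchmark(workload, cache)
warm_start_reads = run_benchmark(workload, cache)
print(cold_start_reads, warm_start_reads)
```

The same 300 logical requests produce 100 disk reads on the first run and none on the second. Any timing comparison between the two runs measures the state of the cache as much as it measures the workload.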
How the Windows 2000 File Cache Works
The Windows 2000 file cache works by converting (most) normal file I/O requests into requests for virtual memory mapped files. The Cache Manager interfaces to applications in several different ways. The standard interface is the Copy interface. Because it is entirely transparent to applications, the Copy interface is used by most applications. As the name implies, the Copy interface copies file data from a buffer inside the file cache to an application data file buffer on a read, and copies data in the opposite direction on a write.
Two other standard interfaces to the Windows 2000 file cache use memory more efficiently than the Copy interface: the Mapping interface and the MDL interface. Applications that want to take advantage of these Cache Manager interfaces must contain significant amounts of Windows 2000-specific code. The Windows 2000 file server service, Redirector, NTFS, and IIS use these more efficient interfaces.
The Copy Interface
Figure 7-4 diagrams in simple terms how the Copy interface functions for read hit requests. An application file read request calls the appropriate filesystem driver in Windows 2000, where the request is immediately routed to the Cache Manager. The Cache Manager maps each open file into the virtual memory reserved for the cache. This mapping is performed on 256 KB sections of a file at a time. Responding to the read request, the Cache Manager locates the block of data specified and copies it into an application-provided data buffer. This satisfies the original file request, and the application thread resumes its normal processing. At this point, the file data requested resides in two places concurrently, which is an inefficient use of computer memory. However, this is the only way to plug the file cache function into existing applications transparently, without requiring extensive modifications to allow the application to accept a pointer to file data resident in the system cache.
Figure 7-4: The Copy interface copies data from the system cache into an application file buffer
Figure 7-5 illustrates what happens when a cache miss occurs. The Cache Manager incurs a page fault when it attempts to access a file segment that is not resident in the cache. The Windows 2000 Virtual Memory Manager is then invoked to resolve the page fault. VMM determines that the page fault falls within the scope of a mapped file virtual address range. VMM accesses the virtual address descriptor (VAD) associated with the virtual address, which, in turn, points to the file object mapped into that specific 256 KB cached file segment block. VMM then issues a callback to the appropriate filesystem driver, which generates and processes a physical disk I/O request. This disk request copies data into Cache memory, not the application's virtual address space. At this point, the page fault is resolved and the Cache Manager processing resumes. As on a hit, the data in cache is then copied into the application data buffer provided.
Figure 7-5: A page fault occurs when the Cache Manager tries to access data not in cache memory
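The read hit and read miss paths of the Copy interface can be mimicked with a toy model. Everything here is a simplification for illustration: the class and its names are hypothetical, the "disk" is a dictionary of byte strings, and a miss loads a whole 256 KB segment at once, whereas the text describes the miss being resolved by the Virtual Memory Manager through its page fault handling.

```python
SEGMENT = 256 * 1024  # files are mapped 256 KB at a time


class ToyCopyCache:
    """Toy model of the Copy interface read path."""

    def __init__(self, files):
        self.files = files    # hypothetical "disk": file name -> bytes
        self.resident = {}    # (file name, segment index) -> cached bytes
        self.disk_reads = 0

    def copy_read(self, name, offset, length):
        """Copy `length` bytes starting at `offset` into a new application buffer."""
        app_buffer = bytearray()
        while length > 0:
            seg, start = divmod(offset, SEGMENT)
            key = (name, seg)
            if key not in self.resident:
                # Miss: stands in for the page fault that reads data into cache memory.
                base = seg * SEGMENT
                self.resident[key] = self.files[name][base:base + SEGMENT]
                self.disk_reads += 1
            chunk = self.resident[key][start:start + length]
            if not chunk:
                break             # request runs past end of file
            app_buffer += chunk   # the copy: data now exists in cache and in the buffer
            offset += len(chunk)
            length -= len(chunk)
        return bytes(app_buffer)


files = {"a.dat": bytes(range(256)) * 2048}   # 512 KB file, two segments
cache = ToyCopyCache(files)
first = cache.copy_read("a.dat", 262140, 8)   # spans both segments: two misses
second = cache.copy_read("a.dat", 262140, 8)  # same request again: all hits
print(cache.disk_reads, first == second)
```

The second request is satisfied entirely from the cached segments, yet the application still receives its own copy of the bytes. That second copy is the memory cost of the Copy interface noted above.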
The Copy Reads/sec and Copy Read Hits % counters measure file cache activity that uses the Copy interface, which applies to most normal applications, such as MS Word.
There is a variation of the Copy interface called the Fast Copy interface that bypasses the filesystem and goes directly to cache. The Redirector service used to resolve networked file requests uses the Fast Copy interface routinely, as does the file server service for smaller sized requests. NTFS requests can use the Fast Copy variation, while FAT16 requests cannot. The Fast Copy interface is more efficient than the plain Copy interface because it avoids the processing overhead associated with the initial call to the filesystem. It still maintains two copies of the file data in memory, so it is not any more efficient in that respect. The Fast Reads/sec counter measures file cache activity that uses the Fast Copy interface. Fast Reads are included in the Copy Reads/sec counter, and the Copy Read Hits % counter also includes Fast Read Hits.
File write requests across the Copy interface are subject to deferred write-back caching, or lazy write. The Copy interface copies data from the application's file buffers into virtual storage locations associated with the cache. Dirty pages in the cache are backed with real memory, as illustrated in Figure 7-6, which again means that a copy of the current data resides in two separate memory locations: in the application and in the system working set. Sometime later, dirty pages in the system cache are flushed to disk by lazy write system worker threads. Cached file write requests subject to lazy write mean that the logical I/O request returns almost immediately after copying data into the system cache. Windows 2000 counts these logical write operations on a per-process basis (as illustrated in Figure 7-3). In earlier versions of Windows NT, the rate of logical write operations per process is unreported.
Figure 7-6: Dirty pages in the system cache are written to disk by lazy write system worker threads
Physical disk updates
The file cache lazy writer must apply updates to disk eventually. In Windows 2000, there are three mechanisms that accomplish this. The first mechanism writes to disk when the disk is idle. If the disk is very busy and too much changed data backs up in the file cache, it eventually triggers the second mechanism, a lazy write flush. This attempts to write data back to disk in bulk. A third mechanism is required in case the file cache lazy writer does not keep up with the demand for virtual memory. It depends on the normal real memory page trimming algorithm used in Windows 2000. File cache pages residing in the system working set are subject to page trimming, like any other hunk of virtual memory. When the demand for virtual memory exceeds the supply, file cache pages from the system working set can be stolen. A dirty file cache page that is stolen must then be processed by a mapped page writer kernel thread before it can be made available for reuse. We discuss these three mechanisms in the following sections.
Disk Idle writes
If the disk is idle, the lazy writer leisurely writes back earlier write requests to disk. The file cache can determine that the disk is idle when there are no file cache requests that trigger synchronous disk operations. Since the disk update must be applied eventually, it makes sense to send write commands to the disk if it is idle anyway. In practice, it is never easy for the Cache Manager to determine when the disk is idle. About the only way for the lazy writer to determine this is to schedule an I/O to the disk and see what happens. (There may be applications using the disk that bypass the Cache Manager. Resolving hard page faults is one such application.)
An older Cache Manager utility, which can be downloaded from http://www.sysinternals.com/ntw2k/source/cacheman.shtml, documents some of the internal constants that the Windows NT Cache Manager uses. (Be careful. Cache Manager does not work on current systems, including Windows 2000. It only executes on NT 4.0 at the level of service pack 3.0 and below.) The Cache Manager program documents a series of constants that are used in the lazy write algorithm. These include CcFirstDelay, which delays writes three seconds after their first access; CcIdleDelay, which triggers writes one second into an idle period; and CcCollisionDelay, which triggers a 100-millisecond delay if a speculative lazy write encounters a disk busy condition. As of this writing, it is not certain if these parameters that control Cache Manager operation were carried forward into Windows 2000, but it seems likely they were.
Threshold-triggered lazy write flushes
Just in case the disk is never idle, it would not do to allow dirty file cache pages to accumulate in the cache forever. At some point, lazy write flushes are triggered by a threshold value when too many dirty file cache pages have accumulated in RAM. Threshold-driven lazy write flushes are the second mechanism used to update the physical disks. The System Internals Cache Manager utility reveals a constant called CcDirtyPageThreshold, which is set by default to 1538 pages. When the number of dirty pages in cache reaches this threshold, it triggers a lazy write flush. A second constant called CcDirtyPageTarget is set to 1153 pages, 75% of the value of CcDirtyPageThreshold. When a lazy write flush is triggered, Windows 2000 attempts to flush one quarter of the current dirty pages to disk each time. In Inside Windows 2000, Solomon and Russinovich report that the dirty cache page threshold that triggers a flush to physical disk is equal to 3/8 the size of RAM. It makes sense that as the potential size of the file cache grows, this threshold value is adjusted upwards proportionally.
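The threshold arithmetic described above can be checked with a few lines of Python. This is only a sketch of the relationships quoted in the text, not the Cache Manager's actual code: the function names are ours, the 4K page size is taken from elsewhere in this chapter, and rounding down to whole pages is an assumption.

```python
PAGE_SIZE = 4096  # bytes; 4K pages, as used elsewhere in this chapter


def dirty_page_target(threshold_pages):
    """Target is 75% of the threshold (rounding down is our assumption)."""
    return threshold_pages * 3 // 4


def pages_per_flush(dirty_pages):
    """Each lazy write flush attempts one quarter of the current dirty pages."""
    return dirty_pages // 4


def threshold_from_ram(ram_bytes):
    """Threshold of 3/8 of RAM, as reported in Inside Windows 2000, in pages."""
    return ram_bytes * 3 // 8 // PAGE_SIZE


if __name__ == "__main__":
    print(dirty_page_target(1538))               # 1153, matching CcDirtyPageTarget
    print(pages_per_flush(1538))                 # 384 pages attempted at the threshold
    print(threshold_from_ram(64 * 1024 * 1024))  # 6144 pages (24 MB) on a 64 MB machine
```

Under the 3/8 rule, a 64 MB machine would have a threshold four times larger than the 1538-page constant, which is consistent with the idea that the threshold scales with the potential size of the cache.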
The activity from both idle disk flushes and threshold-triggered lazy write flushes is grouped together in a single set of performance counters available for monitoring file cache performance. Lazy Write Flushes/sec is the rate of both idle disk and threshold-triggered lazy write flushes. Because threshold-triggered dirty file pages are flushed to disk in bulk, be sure to monitor Lazy Write Pages/sec, too. This counter reports the total number of physical disk transfers associated with file cache-initiated writes. In lightly loaded file servers, file cache pages are flushed one at a time using the idle disk mechanism. But in more heavily accessed systems, the impact of threshold-driven bulk lazy write flushes should be apparent, as illustrated in the performance monitor data displayed in Figure 7-7.
Figure 7-7: Two Cache object performance counters report on lazy write physical disk activity
Stolen mapped file pages
A third mechanism Windows 2000 uses to copy back changed pages to disk is the mapped page writer thread. During Balance Set Manager trimming, if Windows 2000 steals a page from the system cache (or a private address space page mapped to a file), the system checks to see if the page is dirty. A page becomes dirty when it is written into and remains dirty until the disk is updated. If a page is not dirty, it can be stolen and added immediately to the Standby list. If the page is dirty, the page cannot be used for any other purpose until the disk-resident copy is current.
The mapped page writer is a kernel thread with responsibility for sending changed file data to disk. There is more urgency associated with the mapped page writer than the lazy write thread because Windows 2000 has stolen a number of pages from the system working set that turn out to be mainly pages containing mapped file data from the cache, and the OS is unable to utilize them because they also contain updates. The mapped page writer is invoked on demand when the number of modified file pages in the system cache or associated with memory mapped files becomes too large.
For the sake of data integrity, there are some applications that need to write through the cache to disk synchronously, specifically instructing the Cache Manager that certain dirty file pages in the cache be written to physical disk immediately. The standard Copy interface supports two different mechanisms to accomplish this. When a file is opened, the calling application program can specify that write-through semantics are to be used throughout for all write operations. Or, the application can open the file normally, but then issue a standard call to fflush (or its Win32 equivalent, FlushFileBuffers) to commit all dirty pages resident in the file cache associated with that file immediately. The number of explicitly requested file cache flushes, if any, is reported in the Data Flushes/sec and Data Flush Pages/sec counters.
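The second mechanism, opening the file normally and then committing explicitly, can be sketched in Python as follows. The helper name write_and_commit is ours. Python's os.fsync is documented to use the C runtime's _commit function on Windows, which we expect to result in a FlushFileBuffers request; treat that last step as an assumption rather than something this example verifies. The write-through-at-open alternative is not shown.

```python
import os
import tempfile


def write_and_commit(path, data):
    """Write data normally, then ask the OS to commit it to disk."""
    with open(path, "wb") as f:
        f.write(data)         # lands in the application's own buffer first
        f.flush()             # hand the application's buffer to the OS file cache
        os.fsync(f.fileno())  # request that cached data for this file be committed


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "commit_demo.dat")
    write_and_commit(path, b"x" * 4096)
    print(os.path.getsize(path))  # 4096
```

Note the two distinct steps: flush moves data out of the application's buffer, while fsync asks the operating system to write its cached copy through to disk.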
The Mapping Interface
The standard file cache Copy interface has some obvious drawbacks in terms of memory efficiency. The Copy interface causes data to be copied to and from buffers in the application process virtual address space and the system cache. A more efficient approach would allow the application to access data in the system cache directly, passing a pointer to the file data residing in the cache back to the application. Windows 2000 supports two additional Cache Manager interfaces that return pointers to the calling application. These pointers reference file data residing in cache memory directly.
One problem with passing a pointer to the cache back to an application is that the Cache Manager no longer understands how file buffers are actually being used. Consequently, the Cache Manager cannot replace file segments that are referenced by pointers until those applications have signaled they are done with that specific portion of the file. Furthermore, the Cache Manager must lock the virtual memory areas pointed to by file segment pointers in real memory to prevent the Virtual Memory Manager from trimming those pages from the system working set. This ensures that the application using the file segment pointer is always pointing to the right file segment, but it does complicate the interface between the applications and the Cache Manager. Applications can take advantage of the more efficient Cache Manager interfaces, but you must develop extensive Windows 2000-specific code to do so. Normal processes cannot access virtual addresses in the system cache range, so any application that wants to take advantage of the more efficient file cache interfaces has to be rewritten specifically for Windows 2000. Currently, the only applications that take advantage of these Cache Manager interfaces are NTFS (the native Windows 2000 filesystem), the network Redirector, the Internet Information Server (IIS), and the Windows 2000 file server service that implements file sharing on Windows 2000 workstations and servers. As we have seen, existing Win16 and Win32 applications do not need to be rewritten to enjoy the benefits of the Copy interface.
Both the Mapping interface and the MDL interface pass file cache pointers back to the calling application. The Mapping interface returns a virtual address pointer to the calling application, which allows it to access file data in the system cache directly. The MDL interface returns real addresses to the calling application. It is used by IIS and the file server service. We discuss the Mapping interface first.
The Windows 2000 Redirector uses the Cache Manager Mapping interface when a client application attempts to write or update a file stored on a remote machine. This allows Redirector to cache the remote file locally and to exercise more control over the manner in which dirty file pages are managed in local memory. This is part of the Common Internet File System (CIFS) file sharing protocol that Microsoft calls op locking or opportunistic locking.
The Mapping interface is also used by NTFS to manage file metadata structures associated with its Master File Table, change log, and other filesystem files. (Metadata refers to the data the filesystem stores about the files it holds, e.g., their names, size, or last modified date.) Since ntfs.sys runs in privileged mode, it can access virtual addresses in the system cache directly.
Besides being more efficient, the Mapping interface provides a method for controlling the order and sequence of dirty page lazy write flushes. A request to pin a mapped page prevents that file page from being trimmed from the system cache. It instructs the Cache Manager and VMM to keep the referenced cache pages resident in memory until they are unpinned. While a page is pinned, the application using the Mapping interface must specify to the Cache Manager that the page has been changed. (Having a direct pointer to the file page in the system cache, the application is free to manipulate the data stored there directly without going through the Cache Manager interfaces.) Later, when an application like NTFS needs to commit filesystem changes to disk, it calls the Cache Manager to unpin the changed pages. Unpinning signals the lazy writer that dirty pages need to be flushed immediately. The Mapping interface also allows the application to mark a series of unpinned mapped pages that need to be written to disk together. NTFS uses the Mapping interface to cache filesystem metadata, taking advantage of this capability to ensure that certain physical disk write operations are executed synchronously. This is the mechanism NTFS uses to guarantee the integrity of the filesystem metadata stored on the physical disk media, while still reaping the performance benefits of the memory-resident file cache.
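The pin, mark-dirty, and unpin sequence just described can be pictured with a toy model. The class below is purely illustrative: every name in it is hypothetical, it runs in user mode, and it does not correspond to the real kernel-mode Cache Manager routines. It models only three rules from the text: pinned pages cannot be trimmed, the caller must report changes while the page is pinned, and unpinning a group of dirty pages causes them to be written together in the order given.

```python
class ToyMappedCache:
    """Toy model of pinning in the Mapping interface; all names are hypothetical."""

    def __init__(self):
        self.resident = set()  # pages currently held in the cache
        self.pinned = set()    # pages that must not be trimmed
        self.dirty = set()     # pages changed since they were last written
        self.flushed = []      # order in which pages were written to "disk"

    def pin(self, page):
        self.resident.add(page)
        self.pinned.add(page)

    def mark_dirty(self, page):
        # The caller, not the cache, knows the page changed.
        if page not in self.pinned:
            raise ValueError("page must be pinned before it is marked dirty")
        self.dirty.add(page)

    def unpin(self, pages):
        """Unpin a group of pages; dirty ones are written together, in order."""
        for page in pages:
            self.pinned.discard(page)
            if page in self.dirty:
                self.dirty.discard(page)
                self.flushed.append(page)

    def trim(self):
        """Trimming may remove only unpinned, clean pages."""
        victims = {p for p in self.resident
                   if p not in self.pinned and p not in self.dirty}
        self.resident -= victims
        return victims


if __name__ == "__main__":
    cache = ToyMappedCache()
    cache.pin("log-page")
    cache.pin("metadata-page")
    cache.mark_dirty("log-page")
    cache.mark_dirty("metadata-page")
    print(cache.trim())                          # set(): pinned pages survive trimming
    cache.unpin(["log-page", "metadata-page"])   # caller chooses the write order
    print(cache.flushed)                         # ['log-page', 'metadata-page']
```

The point of the model is the ordering guarantee: because the caller decides when and in what order pages are unpinned, it controls the sequence in which its updates reach the disk.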
Several performance counters track the usage of the Mapping interface by the network Redirector and NTFS. Pin Reads/sec are calls to the Cache Manager to pin mapped data in the file cache. This is typically done prior to updating the information stored in the pinned page. Pinning normally occurs right after the unpinned page was flushed to disk so the page referenced is often still in the file cache. The Pin Read Hits % counter reports the percentage of pin reads that were satisfied from cache without having to access the physical disk. As Figure 7-8 illustrates, NTFS pinned reads are subject to extremely bursty behavior. The burst of activity illustrated was generated by emptying out the Recycle bin on our desktop machine. This initiates a flurry of NTFS filesystem metadata changes. If NTFS is going to work efficiently, the value of Pin Read Hits % should be close to 100%.
Figure 7-8: NTFS pinned reads are subject to "bursty" behavior
Be careful when interpreting the average values for the Pin Read Hits % counter as calculated by Microsoft's System Monitor. We discussed this problem back in Chapter 2. Instead of calculating an average hit % value based on the total number of hits compared to the total number of requests, System Monitor simply averages individual interval counter values. The average value for Pin Read Hits % (in light gray) that Perfmon reports in this case was 92.0% (not shown in this illustration). The actual average over the interval can be calculated by accessing the Report View or importing the chart data into Excel and computing a weighted average hit %, for example:
SUM(Pin Reads/sec * Pin Read Hits % / 100) / SUM(Pin Reads/sec)
The actual average hit % calculated using Excel in this case was 98.7%. (All the Cache object counters reporting file cache hits % are subject to similar problems due to Sysmon calculating an average of averages, a well-known statistical no-no.)
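The same weighted calculation is easy to reproduce outside Excel. The following Python sketch contrasts the two methods; the function names and the three sample intervals are made up for illustration and are not the data behind Figure 7-8.

```python
def average_of_averages(hit_pcts):
    """What System Monitor reports: a simple mean of the interval values."""
    return sum(hit_pcts) / len(hit_pcts)


def weighted_hit_pct(rates, hit_pcts):
    """Total hits divided by total requests, expressed as a percentage."""
    total_requests = sum(rates)
    if total_requests == 0:
        return 0.0
    total_hits = sum(r * h / 100 for r, h in zip(rates, hit_pcts))
    return 100 * total_hits / total_requests


if __name__ == "__main__":
    # Hypothetical intervals: two nearly idle ones and one burst of pin reads.
    pin_reads_per_sec = [2, 500, 3]
    pin_read_hits_pct = [80, 99, 90]
    print(average_of_averages(pin_read_hits_pct))
    print(weighted_hit_pct(pin_reads_per_sec, pin_read_hits_pct))
```

With this sample data the simple mean comes out near 90%, while the weighted value is close to 99%, because nearly all of the requests occurred in the interval with the high hit ratio. Bursty counters such as Pin Reads/sec make the discrepancy especially pronounced.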
Notice that the Pin Reads/sec counter in Figure 7-8 reports calls to the Cache Manager Mapping interface. These correspond to NTFS operations on file metadata buffered in cache memory--they do not represent either logical or physical disk operations. The Data Maps/sec and Data Map Pins/sec counters are similar. They also report the number of calls to the Mapping interface API, as shown in Figure 7-9. (Data Maps/sec is highlighted.) Having a direct pointer to the file cache data, NTFS is free to manipulate the data stored in cache as frequently as it needs to. Notice in Figure 7-9 that the scale for Data Maps/sec has been adjusted by a factor of 0.1. The maximum rate of Data Map requests reported was 1929 per second. The maximum value for Data Map pins was only 64 per second over the same interval. The Data Map Hits % counter is subject to the same averaging of averages reporting flaw just discussed.
Figure 7-9: Data Maps/sec and Data Map Pins/sec report the number of calls to the Mapping interface API
The MDL Interface
MDL stands for Memory Descriptor List, which refers to a set of one or more real address file cache pointers. Similar to the Mapping interface, the MDL interface is more efficient than the generic Copy interface because it stores file data in only one spot in memory. Any application that uses the MDL interface must be prepared to deal with all the complexities of managing real addresses.
This interface is used by both the Windows 2000 file server service and IIS to send real address cache pointers directly to a direct memory access (DMA) network interface card (NIC). Peripheral cards that support DMA can access PC memory directly without having to otherwise engage the CPU. The trick is that DMA cards can operate only on real addresses; using virtual addresses would require virtual-to-real address translation, which only the processor can do. To stay out of the CPU's way, DMA cards must operate exclusively on real memory locations.
Once a DMA device is passed a valid real address, it can read or write data at that real memory location without further intervention from the processor. DMA devices then raise an interrupt to indicate that the operation requested has completed. Nearly all NICs support DMA. You will sometimes see specifications like "Busmaster DMA" to indicate that the card uses DMA. Sometimes the card documentation won't specify DMA, but will otherwise indicate that it is able to read or write PC memory directly.
Because the MDL interface deals with real addresses, not virtual ones, the Cache Manager ensures that these cache pages remain in memory until the file server service marks them as complete. As in the Mapping interface, the file server service must signal the Cache Manager that it is done with the file cache pointer. To do this, the NIC generates an interrupt when the DMA operation is complete. This interrupt is processed by the file server service DPC, which is responsible for notifying the Cache Manager that the application is finished using the file segment pointer.
Let's look at an example of how the file server service uses the MDL Cache Manager interface. Say you open an MS Word document file stored on a remote machine. The Windows 2000 Redirector intercepts the file I/O request. Redirector translates your request into Server Message Block (SMB) commands that it sends to the remote machine to request opening and reading the file. On the machine where the file resides, the Windows 2000 file server service fields these SMB requests. The Server service actually opens the file locally, enabling it for caching using the MDL interface. This part of the process is illustrated in Figure 7-10.
Figure 7-10: Caching using the MDL interface, part 1
In response to specific Server file access requests, the MDL interface accesses the requested file segment from disk and brings it into the virtual memory associated with the system cache. The MDL interface then returns the real address of the data stored in the cache to the Server application. Server constructs an SMB reply message, which includes the real address pointer for use by the NIC. This request is passed down the network protocol stack to the NIC. Ultimately, the network device driver builds a request for the NIC that copies data directly from the cache buffer into packets that are returned over the wire to the sender.
Meanwhile, once the file is opened and caching is initiated, the Cache Manager on the file server begins to perform read-aheads automatically to stage the file in cache in anticipation of future requests. At the same time, the network Redirector also invokes the Cache Manager using the Copy interface as it begins processing SMB reply messages to return the data requested to the application. Figure 7-11 shows the file Server service replying to the client Redirector, which then copies the file data requested from the client cache into the application's file buffer. Notice that by the time the data requested is successfully passed back to MS Word, there are three copies of the data in memory: two on the client machine and one on the file server, as depicted. The MDL interface's more efficient use of memory is manifest only on the file server side, where just one copy of the file data is resident in memory.
Figure 7-11: Caching using the MDL interface, part 2
Several Cache object performance counters measure MDL activity. MDL Reads/sec is the rate of file server (or IIS) reads from cache in reply to SMBs representing networked file requests. This counter is an excellent indicator of overall file server activity (see Figure 7-12). MDL Read Hits % is the percentage of MDL reads satisfied without having to access the physical disk. Again, Performance Monitor (and System Monitor) calculates an average of averages. Instead of the 37% MDL Read Hits % average that Perfmon reports (the lighter of the two gray lines), the overall hit ratio over the interval (recalculating the weighted average) was actually 68%. This calculated value allows you to measure the physical disk load associated with file server service requests that miss the cache. The file server service disk load corresponds to the MDL misses (1 - MDL Read Hits %) times the request rate (MDL Reads/sec). There is no current user of the MDL write interface, so no corresponding performance statistics are maintained. The MDL write interface was probably conceived for use by SCSI miniport drivers for disks and tapes, but never fully implemented.
Figure 7-12: Monitoring MDL Read Hits % and MDL Reads/sec
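The disk load calculation for MDL misses can be expressed as a one-line function. In this sketch, the function name is ours and the request rate of 100 MDL reads per second is a hypothetical figure chosen to keep the arithmetic simple; the two hit percentages are the ones quoted in the text.

```python
def mdl_miss_rate(mdl_reads_per_sec, mdl_read_hits_pct):
    """Reads per second that miss the cache and must go to the physical disk."""
    return (1 - mdl_read_hits_pct / 100) * mdl_reads_per_sec


if __name__ == "__main__":
    rate = 100.0  # hypothetical MDL Reads/sec
    print(round(mdl_miss_rate(rate, 68), 1))  # 32.0, using the weighted hit ratio
    print(round(mdl_miss_rate(rate, 37), 1))  # 63.0, using the average of averages
```

Using the misleading 37% figure would roughly double the estimated disk load, which is why the weighted hit ratio should be the input to this calculation.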
Performance Monitoring Considerations
On Windows 2000 machines configured to run as file servers using NTFS, it is wise to monitor cache effectiveness across all three Cache Manager interfaces. Local applications running on a file server use the Copy interface. Network requests handled by the file Server service use the MDL interface. NTFS filesystem metadata is cached using the Mapping interface. As Figure 7-13 illustrates, there are a total of four Read Hits % counters worth tracking: Copy Read Hits %, Data Map Hits %, Pin Read Hits %, and MDL Read Hits %.
Unfortunately, calculating an overall cache read hit ratio for the Windows 2000 file cache is not easy. Calls to the Copy read interface and the MDL read interface correspond directly to logical disk and networked disk I/O requests, respectively. However, the Mapping interface counts calls that NTFS makes to the Cache Manager interface API, which relate to the status of a mapped data buffer. This count reflects more than just logical I/O requests. In constructing an overall weighted cache read hit %, a high rate of NTFS metadata Mapping interface requests would bias the average.
A reasonable alternative for monitoring cache effectiveness on a Windows 2000 file server is to calculate a combined average for logical disk and networked disk I/O requests, simply ignoring NTFS metadata requests to the Mapping interface. Given these difficult matters of interpretation, perhaps it is best to simply report the four hit ratio counters separately from our basic Windows 2000 performance reporting set, as shown in Figure 7-13.
Figure 7-13: It is best to report the four hit ratio counters separately
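For readers who do want the combined figure for logical disk and networked disk requests, the calculation is a weighted average over the Copy and MDL read counters only. The sketch below assumes you have interval totals (or properly weighted averages) for Copy Reads/sec, Copy Read Hits %, MDL Reads/sec, and MDL Read Hits %; the function name and the sample values are hypothetical.

```python
def combined_read_hit_pct(copy_reads, copy_hit_pct, mdl_reads, mdl_hit_pct):
    """Weighted hit % across Copy and MDL reads, ignoring the Mapping interface."""
    total = copy_reads + mdl_reads
    if total == 0:
        return None  # no read activity, so no meaningful ratio
    hits = copy_reads * copy_hit_pct / 100 + mdl_reads * mdl_hit_pct / 100
    return 100 * hits / total


if __name__ == "__main__":
    # Hypothetical server: 200 local Copy reads/sec at 95%, 100 MDL reads/sec at 68%.
    print(combined_read_hit_pct(200, 95, 100, 68))
```

With these sample values the combined ratio is about 86%. Mapping interface counters are deliberately left out, since they count API calls rather than logical I/O requests.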
About the only cache tuning action that can safely be conducted on most Windows 2000 machines is monitoring cache size and cache effectiveness, and adding memory to machines displaying signs of diminished cache effectiveness. In this section, we examine a series of file caching experiments that compare and contrast effective versus ineffective Windows 2000 file caching. These experiments were conducted using the Windows 2000 Resource Kit's Performance Probe program for generating artificial file I/O workloads. In the first scenario, we show how effective the Windows 2000 file cache can be when the cache size is adequate. In the second scenario, we observe the Windows 2000 Cache Manager under stress, attempting to manage a file access pattern that resists caching. Finally, we examine the impact of the one Cache Manager tuning parameter that is available for controlling the size of the Windows 2000 file cache. All tests were conducted originally on a Windows NT Server running Version 3.51 on a Pentium Pro 200 machine with 64 MB of RAM installed. Subsequently, we verified that the identical behavior occurs under both Windows NT 4.0 and Windows 2000. The tests were conducted on a standalone machine. The only other application running concurrently was Perfmon, which was logging performance monitor data to disk during the test runs.
Scenario 1: Effective Caching
In the first scenario, we ran the Probe program, available in the Windows NT Resource Kit, defining a 16 MB file to be accessed randomly. A read/write ratio of 1:3 was specified--there are three logical write operations occurring for each logical read request, all pointing to random 4K blocks within the file. The reason we used a logical I/O workload heavily skewed towards write activity is that we had initially planned to use the Probe program to test the speed of various hardware and software RAID configurations. RAID 5 performance, in particular, is very sensitive to a write-oriented workload. The set of benchmark runs reported here was initially conceived because we wanted to understand the Windows 2000 file cache loading effects first. As noted previously, cache effects make benchmarking notoriously difficult.
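For readers without access to the Probe program, a workload of the same general shape can be approximated in a few lines of Python. This is not the Probe program and makes no attempt to match its behavior or timing; the function names, the fill byte, and the fixed random seed are our own choices. It simply issues random 4K reads and writes against a preallocated file, with writes making up 75% of the operations.

```python
import os
import random
import tempfile

BLOCK = 4096  # 4K blocks, as in the test description


def make_file(path, file_blocks):
    """Preallocate a file of file_blocks zero-filled 4K blocks."""
    with open(path, "wb") as f:
        f.write(b"\0" * (file_blocks * BLOCK))


def run_random_io(path, file_blocks, operations, write_fraction=0.75, seed=0):
    """Issue random 4K reads and writes against path; return (reads, writes)."""
    rng = random.Random(seed)
    payload = b"\xAA" * BLOCK
    reads = writes = 0
    with open(path, "r+b") as f:
        for _ in range(operations):
            f.seek(rng.randrange(file_blocks) * BLOCK)  # pick a random block
            if rng.random() < write_fraction:
                f.write(payload)
                writes += 1
            else:
                f.read(BLOCK)
                reads += 1
    return reads, writes


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "probe_like.dat")
    make_file(path, 4096)  # 4096 blocks x 4 KB = 16 MB
    print(run_random_io(path, 4096, 10000))
```

Because all writes land inside the preallocated range, the file never grows, and repeated writes to the same block are candidates for absorption by deferred write-back caching.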
We were also motivated to test the very aggressive claims made in Russ Blake's original book Optimizing Windows NT, packaged as part of the Version 3.51 Resource Kit, advertising the NT file cache as "self-tuning." To test Blake's claims, we specified a file cache workload in Scenarios 2 and 3 that would defeat all but the most intelligent file caching algorithms. Needless to say, the Windows NT and 2000 file cache is not nearly as intelligent as Blake's book claims, although you can see for yourself how effective it can be under the right circumstances. Subsequent versions of the Resource Kit documentation, written after Blake retired from Microsoft, expunged his more extravagant suggestions that the Windows 2000 operating system was "self-tuning." By the way, when we found flaws in the Resource Kit's Probe program that made it difficult to interpret our disk benchmark results, we decided to switch and use the Intel Iometer disk I/O testing program instead. Some of the many disk performance tests we made using Iometer are reported in other chapters of this book.
As the Perfmon report in Figure 7-14 illustrates, the Cache Manager is quite effective when the files being accessed fit nicely into the real memory available for caching. The Probe program uses the Fast Copy interface, which reports an average of 186.25 logical read requests per second during the test. This is the rate of read operations that the cache handled. Notice that these were all synchronous operations. The number of fast reads per second was almost identical. The Copy Read Hits % is reported as 99.9%. Pretty effective cache utilization.
Figure 7-14: Overall cache statistics for Scenario 1
The lazy writer in this scenario operates at a leisurely pace of 1.883 flushes per second, writing 7.639 pages per second. Lazy write is using a burst size for writes of 7.639/1.883, or about four 4K pages at a time on average. The rate of data flushes is slightly higher because this counter includes lazy writes, but also counts pages flushed by the mapped page writer thread. Altogether, there are about 8.5 file pages being written through cache to disk per second. Since the test was set up to process 75% writes, if the Probe is generating almost 190 reads per second, it should be generating three times that number of writes, about 550 writes per second. Where did they all go? Windows 2000, in effect, absorbs these writes into memory. This shows how effective deferred write-back caching can be when an application overwrites the same file locations repeatedly.
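The arithmetic behind these observations is worth making explicit. The sketch below uses the counter values quoted above; the function names are ours, and comparing pages written with logical writes assumes that each 4K logical write dirties exactly one page, which is a simplification.

```python
def burst_size(pages_per_sec, flushes_per_sec):
    """Average number of pages written per lazy write flush."""
    return pages_per_sec / flushes_per_sec


def expected_writes_per_sec(reads_per_sec, writes_per_read=3):
    """Logical writes implied by the observed read rate and the 1:3 ratio."""
    return reads_per_sec * writes_per_read


def absorbed_fraction(logical_writes_per_sec, pages_written_per_sec):
    """Share of logical writes that never resulted in a physical page write."""
    return 1 - pages_written_per_sec / logical_writes_per_sec


if __name__ == "__main__":
    print(round(burst_size(7.639, 1.883), 2))      # about 4.06 pages per flush
    writes = expected_writes_per_sec(186.25)
    print(writes)                                  # 558.75 logical writes per second
    print(round(absorbed_fraction(writes, 8.5), 3))  # about 0.985
```

Under these assumptions, more than 98% of the logical writes were absorbed in memory and never reached the disk as separate physical operations.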
The Performance Probe is not the only application writing data to disk during the interval. Perfmon is writing data to its log file at the same time. In the 10-minute test, the Perfmon log grew to approximately 9 MB, so Perfmon is writing about 15 KB per second, too. Data Map Pins/sec is 43.854 per second. The ntfs.sys filesystem I/O drivers use the Data Map service to cache directories, the NTFS Master File Table, and other filesystem metadata. Windows 2000 pins an NTFS mapped file metadata page in cache in preparation for an update. As the Perfmon log file grows, NTFS filesystem changes reflecting the current size of the file must be written back to disk, too.
Figure 7-15 shows a Perfmon chart that displays a few of the important cache statistics from the test over time. The top line at the far left edge of the graph is the cache Copy Read Hits %. Within seconds, the file cache read hit ratio rises to almost 100%. (The scale is set to 200, so 100% cache hits is in the middle of the chart.) It takes a little bit of time at the start of the test to load the cache with data, which is evidence of a cache cold start. The Cache Bytes counter providing the total size of the system working set is also shown, highlighted as a white line. At the beginning of the test, Cache Bytes was 6 MB. The amount of memory the file cache consumes rises steadily over the course of the test, to about 13 MB by the end of the run. Since there is no other process running on the system making demands on memory, the file cache can acquire pretty much all the memory it wants. Because of the way the Copy interface works, the Probe process working set (not shown) expands, too.
Figure 7-15: Performance Monitor displaying some important cache statistics
Another point of interest is Lazy Write Pages/sec. This is the saw-toothed line in the lower portion of the screen in Figure 7-15. The peaks represent large bursts of lazy write activity triggered by a threshold at regular, predictable intervals. Lazy write is deferring I/Os to disk as long as possible. Eventually, the number of dirty pages in the file cache exceeds a threshold value, and the lazy write thread generates a spurt of activity. During peaks, more than 50 pages per second are written to disk. These spikes are followed by longer periods where the disk appears to be idling along at a leisurely pace.
Scenario 2: Overrunning the Cache
Figure 7-16 shows data from a second test, identical to the first except that this time we used a 128 MB file that overflows available cache memory. In this test scenario, it is impossible to buffer the entire file in memory, so the random access pattern forces many more cache misses. In fact, the overall Copy Read Hits % is only about 48%. Because there are many more misses to process, the Performance Probe program is able to perform far fewer cached file I/O operations, slightly under 70 reads per second. Lazy write activity increases sharply to about 30 lazy writes per second, flushing about 50 I/Os per second from the cache to disk. Similar numbers of data flushes and data flush pages per second are also occurring.
Figure 7-16: Overrunning the cache
Figure 7-17 shows the Copy Read Hits % over time, denoted by a narrow dotted line starting at near zero and increasing to about 50% after the first 30 seconds of the test. The dark, heavy line at the top of the screen is the number of Cache Bytes. It rises rapidly, only to level off at about 26 MB. This illustrates why it is not necessary to size the cache manually in Windows NT and 2000: the cache will grow depending on how much memory is available and how much it needs. The gray line in the lower half of the screen represents lazy write activity. Notice that there are bursts when the number of lazy writes exceeds 80 pages per second. The bursts are much more erratic than in the previous scenario, a sign that the operating system is having trouble managing in the face of this workload. This is not totally unexpected. Scenario 2 is accessing a file that is significantly larger than available RAM, a worst case for almost any kind of cache management.
Figure 7-17: Cache statistics for Scenario 2
Meanwhile, we are starting to accumulate some evidence about how intelligent the caching algorithms incorporated in Windows NT and 2000 are. In Figure 7-17, we changed the chart scale to make it possible to view the trend in the growth of the size of the cache (thick gray line at the top of the chart). We also track Available Bytes, highlighted in white. VMM allows the cache to grow rapidly up to the point where Available Bytes falls below 4 MB. With approximately 4 MB available, the system cache can no longer expand unrestrained. At this point, the system does not have many other applications from which to steal virtual memory pages, so evidently it begins to steal from the system cache. As the cache adds more pages to the system working set, they become subject to page trimming. Over time and in fits and starts, the file cache manages to pick up a few more pages at the expense of other processes, but overall its glory days are over. This system shows the classic symptoms of thrashing, but there is evidently no capability in the Cache Manager to detect that virtual memory thrashing is occurring and to take remedial action to minimize the damage.
Summarizing Scenario 2, this workload stresses the Windows NT and 2000 file cache, which expands to try to encompass the random disk I/O request pattern. The memory-resident file cache is managed like any other piece of real memory associated with the system working set. As free memory is absorbed (and the pool of Available Bytes is depleted), the memory allocated to the file cache becomes subject to vigorous VMM page trimming. The random file references continue relentlessly, so finally something has to give. That something is the lazy write process, which grows noticeably more irregular and less efficient under stress.
Under the circumstances, we should not judge the performance of the Windows 2000 file cache too harshly. We deliberately selected a workload designed to defeat most caching schemes. Analyzing the test scenario measurements provides empirical evidence to support the following rule of thumb that should be incorporated into your daily performance monitoring regimen: any system that is observed consistently at or below 4 MB of Available Bytes and is paging heavily requires more RAM.
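This rule of thumb lends itself to an automated check against logged performance data. The sketch below assumes each sample is a pair of Available Bytes and Pages/sec values from the Memory object. The 4 MB limit comes from the text; what counts as "paging heavily" (100 pages per second here) and "consistently" (80% of samples here) are placeholder values of our own that you should adjust for your environment.

```python
MB = 1024 * 1024


def needs_more_ram(samples, available_limit=4 * MB,
                   paging_limit=100.0, min_fraction=0.8):
    """Return True if most samples show low Available Bytes and heavy paging.

    samples: iterable of (available_bytes, pages_per_sec) pairs.
    paging_limit and min_fraction are assumed values, not documented thresholds.
    """
    samples = list(samples)
    if not samples:
        return False
    flagged = sum(1 for available, pages in samples
                  if available <= available_limit and pages >= paging_limit)
    return flagged / len(samples) >= min_fraction


if __name__ == "__main__":
    # Hypothetical log: nine stressed intervals and one quiet one.
    log = [(3 * MB, 250.0)] * 9 + [(10 * MB, 5.0)]
    print(needs_more_ram(log))  # True
```

Requiring both conditions matters: a machine with little free memory that is not paging is simply making good use of its RAM.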
Scenario 3: Favoring the Cache
We performed one final experiment in this series, repeating the large file random access pattern of Scenario 2, but changing a performance parameter to favor system cache memory over application process working sets when it comes to page trimming. We changed a memory management parameter associated with file server optimization from its default value in Windows NT Server (Maximize Throughput for File Sharing) to Maximize Throughput for Network Applications. To access this parameter on a Windows NT Server, activate the Network applet from the Control Panel and access the Properties for Server under the Services tab, as illustrated in Figure 7-18. On Windows 2000 Server, you can find the same tuning knob from Settings → Network and Dial-up Connections → Local Area Connection. Click on Properties, highlight File and Printer Sharing for Microsoft Networks, and then click on Properties again. To take effect, the system must be rebooted after the change is made. After rebooting, we reran the Scenario 2 workload.
Figure 7-18: NT file server optimization parameters
The effect of this change is to set two registry-resident tuning parameters: Size, which is associated with the Server process address space, and LargeSystemCache, which is associated with Memory Management. LargeSystemCache is used to change the default maximum working set size of the system cache from its default of about 8 MB to a value equal to the size of main memory, minus 4 MB on Windows NT. It operates slightly differently under Windows 2000, as we explain shortly.
In Windows NT Server, clicking on the Maximize Throughput for Network Applications radio button sets LanmanServer Size to Large and LargeSystemCache to 1 from its default of 0, indicating that you want to run with a large system cache. The location of this parameter is shown in Figure 7-19, as the documentation is confusing. The regentry.hlp documentation file that comes with the NT Resource Kit explains that this setting "[e]stablishes a large system cache working set that can expand to physical memory minus 4 MB if needed. The system allows changed pages to remain in physical memory until the number of available pages drops to approximately 250." The regentry.hlp documentation file also suggests, "This setting is recommended for most computers running Windows NT Server on large networks." You might take exception to that advice when you see what happens in Scenario 3. The regentry.hlp documentation file also says that LargeSystemCache defaults to 1 on NT Server, which is incorrect. The value of LargeSystemCache is set to 0 (no large system cache) on both NT Server and Workstation.
Figure 7-19: The location of the LargeSystemCache parameter in the Windows 2000 Registry
Setting LargeSystemCache to 1 produces somewhat less extreme behavior in Windows 2000. It sets the system cache maximum working set to 80% of the size of main memory, so that the file cache cannot overrun all the memory. (If you think reserving 20% of memory for other applications is a bit arbitrary, you are right. More on this subject later.) There is no trace of either the LanmanServer Size parameter or LargeSystemCache in the Windows 2000 Resource Kit documentation, but you can still find LargeSystemCache in its customary spot in the Registry. In Windows 2000, setting the Maximize Throughput for File Sharing radio button turns LargeSystemCache on. You still need to reboot to have this change take effect. (If you are confused about what radio button sets which option, as nearly everyone is, just check the Registry.)
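Checking the Registry directly can be scripted. The sketch below reads the two values with Python's standard `winreg` module. The key paths are the locations where these values are commonly documented and are an assumption here, since the text identifies them only through Figure 7-19. The helper names are hypothetical. On a non-Windows machine the script simply reports that the values are unavailable.

```python
try:
    import winreg  # standard library, available only on Windows
except ImportError:
    winreg = None

# Assumed key paths under HKEY_LOCAL_MACHINE.
MEMORY_MANAGEMENT_KEY = (
    r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
)
LANMAN_PARAMETERS_KEY = (
    r"SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"
)


def read_hklm_value(key_path, value_name):
    """Return a value from HKEY_LOCAL_MACHINE, or None if it cannot be read."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, value_name)
            return value
    except OSError:
        return None


def describe_large_system_cache(value):
    """Translate the LargeSystemCache value into plain English."""
    if value is None:
        return "LargeSystemCache not found (or not running on Windows)"
    if value == 1:
        return "large system cache enabled"
    return "large system cache disabled (default)"


if __name__ == "__main__":
    lsc = read_hklm_value(MEMORY_MANAGEMENT_KEY, "LargeSystemCache")
    size = read_hklm_value(LANMAN_PARAMETERS_KEY, "Size")
    print(describe_large_system_cache(lsc))
    print("LanmanServer Size =", size)
```

Because the script only reads the values, it is safe to run before and after toggling the radio button to see which option actually changed.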
We reran Scenario 2 to see what transpired after the change, and summarized the results in Figures 7-20 through 7-22. Figure 7-20 suggests that file cache performance improves slightly following the change. The Copy Read Hits % improves to 61%. Consequently, the read hit I/O rate improves to almost 74 I/Os per second. To put that number in perspective, that is an improvement of less than 10%. The rate of lazy write activity also decreases modestly. Flushes per second falls to about 21, and the number of pages flushed per second is reduced to just under 35.
Figure 7-20: Setting the value of LargeSystemCache to 1 improves the cache hit ratio marginally
Figure 7-21 shows the Copy Read Hits % trending upward over time. There is a spike about one minute into the test--the explanation for this anomaly is that after rebooting and rerunning the test, we discovered that we had not restarted MS Word. The spike in the Copy Read Hits % value corresponds to loading MS Word from disk. Besides MS Word, the Probe program, and Perfmon, there was no other activity on the system during the test. Outside that one blip, the trend is steadily up, although it levels off over time. The last measurement interval shows that the system reaches a read hit % of 69%, a distinct improvement over Figure 7-17.
Figure 7-21: LargeSystemCache allows the file cache to grow at the expense of all other virtual address spaces
Judging from just the Copy Read Hits % counter, it looks like the operating system is handling this workload better when LargeSystemCache is set to 1. However, the increase in I/O throughput is not as high as you would expect, given the improvement in the hit ratio. Figure 7-22 shows the cost at which this improvement is achieved. The heavy line trending upward to the top of the display is Cache Bytes. Previously, the Virtual Memory Manager refused to allow the cache to grow much beyond a limit equal to about one half the total amount of real memory. Evidently, the parameter change relaxes that limit and allows the cache to grow and grow and grow. By the end of the 10-minute test run, the cache has increased in size to over 40 MB of the 64 MB total installed on the machine. These results show that the LargeSystemCache tuning knob allows the file cache to grow until it effectively takes over all of memory (minus 20% in Windows 2000) at the expense of any other applications.
We added one more detail to Figure 7-22. Highlighted in white is the Pages/sec counter, the system's hard paging rate. This is the total number of pages written to and read from disk. With file cache memory ballooning in size to over 40 MB, paging levels increase because there is so little memory left over. For the sake of comparison, the paging rate averaged 96 pages per second during this run, and only 45 pages per second during the previous run. During a peak interval, the total paging rate reached 235 pages per second, evidently an upper limit on what the disks in this configuration can deliver.
In this scenario, the system is subject to erratic spikes of paging activity, an additional element of instability and unpredictability. The additional paging load provides a clue to understanding why overall I/O throughput is not much improved in Scenario 3 when the LargeSystemCache parameter is changed. The cache hit ratio improves a bit, but I/Os that miss the cache have to contend with busier devices due to the extra paging, and therefore take longer.
Figure 7-22: With LargeSystemCache set to 1, Pages/sec begins to spike
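The reason a better hit ratio does not translate into much more throughput can be illustrated with a simple weighted-average service time model. This is a sketch under stated assumptions: a single requester issuing one I/O at a time, and hypothetical service times (0.1 ms for a cache hit, 20 ms for a miss on an idle disk, 23 ms on a disk that is also servicing paging). The 55% starting hit ratio is also an assumed value. Only the 61% hit ratio is taken from the measurements above.

```python
def mean_io_time_ms(hit_ratio, hit_ms, miss_ms):
    """Average time per I/O: hits served from memory, misses from disk."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


def throughput_per_sec(hit_ratio, hit_ms, miss_ms):
    """I/Os per second for one requester issuing requests back to back."""
    return 1000.0 / mean_io_time_ms(hit_ratio, hit_ms, miss_ms)


if __name__ == "__main__":
    HIT_MS = 0.1  # assumed memory-copy time for a cache hit

    before = throughput_per_sec(0.55, HIT_MS, miss_ms=20.0)
    same_disk = throughput_per_sec(0.61, HIT_MS, miss_ms=20.0)
    busy_disk = throughput_per_sec(0.61, HIT_MS, miss_ms=23.0)

    print(f"before the change:             {before:6.1f} I/Os per second")
    print(f"better hit ratio, idle disk:   {same_disk:6.1f} I/Os per second")
    print(f"better hit ratio, busier disk: {busy_disk:6.1f} I/Os per second")
```

In this model the miss penalty dominates the average, so even a modest increase in disk service time caused by extra paging can cancel most of the gain from a few extra points of hit ratio.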
The LargeSystemCache Tuning Knob
Now that you understand what the one tuning parameter available does, it seems appropriate to assess its usefulness. Under what circumstances should you change the default setting of LargeSystemCache? LargeSystemCache forces Windows 2000 to ignore the system working set where the file cache resides and look elsewhere when it needs to trim working sets. As described in the regentry.hlp documentation file, setting LargeSystemCache to 1 sets the system working set maximum in NT 4.0 to the size of RAM minus 4 MB--the designated target for the size of the Available Bytes pool. In Windows 2000, the system's maximum working set size is set to 80% of the size of real memory. Under the default value, when LargeSystemCache is set to 0, the system working set maximum is set to approximately 8 MB in both Windows 2000 and Windows NT. When set to 1, LargeSystemCache preserves and protects the file cache, which is considered part of the system working set, from page trimming. Turning the LargeSystemCache on forces Windows 2000 to trim excess pages from other process working sets.
In the context of configuring a large Windows 2000 file server, setting LargeSystemCache to 1 may not be a bad idea. The regentry.hlp documentation file explains that both the Maximize Throughput for File Sharing and Maximize Throughput for Network Applications buttons set the value of the file server Size parameter to 3, which sets a large file server service working set. Presumably, the file Server service running in services.exe issues an appropriate call to SetProcessWorkingSetSize based on this Registry setting. In other words, the working set of the file Server service running in services.exe also must be protected from page trimming when the LargeSystemCache option is enabled. Remember that the file Server service uses the more efficient MDL Cache Manager Interface that stores file data in RAM only once. Meanwhile, the file Server service address space itself is also protected from excessive page trimming, while the file cache is allowed to expand until it fills the remainder of the system's memory.
The problem is any other applications running on the server. On a consolidated Windows 2000 Server that runs more than file and print services with LargeSystemCache set to 1, Windows 2000 preserves only 20% of RAM for other applications (including the file Server service running inside services.exe). If you are not careful, Windows 2000 may trim back the working sets of other applications too much with the LargeSystemCache setting in effect. Some applications may become slow and unresponsive due to excessive page stealing directed at them. If file cache activity heats up, there may be a noticeable delay when desktop applications are swapped back into memory following a period of inactivity. Due to high paging rates, any application that suffers a hard page fault may encounter delays at the busy paging disk.
The fact that the file cache can map a maximum of 960 MB of RAM (or 512 MB in Windows NT) does establish an upper limit to the amount of memory the file cache will use, even when LargeSystemCache is set to 1. This limit suggests that when you configure a very large server with, say, 2 GB of RAM, setting the value of LargeSystemCache to 1 will not squeeze out other applications once the file cache grabs what it can. A final consideration is that server applications like MS SQL Server that make an explicit call to SetProcessWorkingSetSize to set their working set minimum and maximum are afforded a measure of protection from page trimming even when LargeSystemCache is set to 1.
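The sizing rules described in this section can be combined into one small calculation. The following is a simplified sketch: it treats the smaller of the system working set maximum and the file cache virtual address limit as the effective ceiling, which is a modeling assumption, and it ignores the other parts of the system working set. The function name and the `"nt4"`/`"win2000"` labels are hypothetical.

```python
def system_cache_ceiling_mb(ram_mb, large_system_cache, os_name):
    """Estimate the ceiling on file cache memory, in MB.

    os_name is "nt4" or "win2000" (labels chosen for this sketch).
    """
    if os_name not in ("nt4", "win2000"):
        raise ValueError("os_name must be 'nt4' or 'win2000'")

    if not large_system_cache:
        working_set_max = 8            # default: approximately 8 MB
    elif os_name == "nt4":
        working_set_max = ram_mb - 4   # RAM minus 4 MB
    else:
        working_set_max = 0.8 * ram_mb # 80% of RAM

    # Virtual address range the file cache can map.
    virtual_limit = 512 if os_name == "nt4" else 960

    return min(working_set_max, virtual_limit)


if __name__ == "__main__":
    for ram in (64, 256, 2048):
        ceiling = system_cache_ceiling_mb(ram, True, "win2000")
        print(f"{ram:5d} MB RAM -> cache ceiling about {ceiling:.0f} MB, "
              f"leaving about {ram - ceiling:.0f} MB for everything else")
```

On the 2 GB configuration the virtual address limit, not the 80% rule, is the binding constraint, which is why the large server in the example still has more than 1 GB left for other applications.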
This drastic behavior of the LargeSystemCache parameter was modified in Windows 2000. Instead of increasing the system working set maximum to the size of RAM minus 4 MB, turning on the LargeSystemCache in Windows 2000 sets the system working set maximum to 80% of available RAM. The intent of this change is to dampen the extreme behavior of this tuning knob, making it easier for a system running with a LargeSystemCache to run some other mission-critical applications inside the same box. Unfortunately, reserving 20% of the remaining RAM for other applications is a purely arbitrary partitioning of available memory resources. It is not clear why the Windows 2000 developers are not willing to accept the inevitable and provide a more flexible tuning knob to specify the desired size of the file cache.
No Minimum File Cache Setting
The corollary of not having a tuning knob to set a ceiling on the size of the file cache (or the system working set) is that there is no knob to establish a floor under the file cache either. Because of this, it is possible for memory contention due to other applications to shrink the size of the file cache. Figure 7-23 is a chart illustrating how other applications can drastically impact the size of the file cache. It shows a Windows 2000 Server running an application with a memory leak inside an infinite loop. The Paging File % Usage counter, charted on the right-hand Y axis, shows a steady increase in virtual memory allocations, with a dip at one point where Windows 2000 automatically extended the paging file to increase the system's Commit Limit. Finally, toward the end of the performance monitoring session, we used Task Manager to cancel the offending application. (It was one of those 3D graphics rendering applications where it takes so long to rotate a 3D image that you cannot be sure whether it is working correctly or looping uncontrollably. In this case, the app, which shall remain nameless, went haywire.) Canceling the application allows Windows 2000 to recover all the excess committed virtual memory that this application acquired.
Figure 7-23: An application with an infinite loop can drain resources from the file cache
Notice the impact on System Cache Resident Bytes. The file cache size is a modest 2 MB to begin with, but when the memory leak loop occurs, the file cache is squeezed even further. System Cache Resident Bytes was virtually at zero by the time we killed the looping process.
The implication of Figure 7-23 is that with no way to establish a minimum file cache size, there is no guarantee that any real memory will be available for the file cache when other applications are stressing real memory resources. It is no wonder that applications like MS SQL Server and Exchange, which rely on a memory-resident disk cache for performance, bypass the Cache Manager and develop their own virtual memory management routines under Windows 2000.
The CacheSet Utility
For those who have a tin ear for Microsoft's "no knobs" mantra, there is always Mark Russinovich and his trusty http://www.sysinternals.com freeware web site. At http://www.sysinternals.com/ntw2k/source/cacheset.shtml, you can download Russinovich's Windows 2000 CacheSet utility, which provides a simple control for setting the minimum and maximum system working set size (see Figure 7-24). Unlike most of the other http://www.sysinternals.com freeware utilities, CacheSet has a command-line interface, which makes it suitable for use in automating parameter changes.
Figure 7-24: CacheSet's defaults for a 256 MB machine with LargeSystemCache enabled
Because CacheSet deals with the operating system, it cannot call SetProcessWorkingSetSize to set minimum and maximum working set size parameters. The CacheSet program that Russinovich wrote calls a Windows 2000 native API routine called NtQuerySystemInformation that resides in NTDLL.DLL to obtain information about the file cache settings (actually the system working set). It then calls NtSetSystemInformation to set the new sizing parameters. Of course, these minimum and maximum working set size parameters serve as guidelines for Windows 2000's Memory Manager; they are not definitive. When the system is under stress, which generally means having 1 MB or less of Available Bytes, the Memory Manager can and will shrink the system cache working set.
Using the CacheSet utility raises the question of how to set a minimum and maximum system working set to control the size of the file cache. The only method that works is trial and error, setting the file cache size minimum and maximum values and observing the cache hit ratio that results. The theoretical relationship between cache size and cache effectiveness postulated in Figure 7-1 suggests how to proceed. Iteratively, you can establish individual points on this theoretical curve, eventually filling in enough detail that you can extrapolate the rest. Keep in mind the normal variations in hourly or daily file server activity will add a good deal of noise to the tidy relationship hypothesized in Figure 7-1. Several measurements at different cache sizes that show great variation between the measured hit ratios strongly suggest Region 1 of the theoretical curve. On the other hand, several measurements at different cache sizes that show little or no variation between the measured hit ratios suggest Region 3.
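One way to organize the trial-and-error measurements is to fit a straight line through the (cache size, hit ratio) points and look at its slope. The sketch below uses an ordinary least-squares fit so that a single noisy measurement does not dominate. The function name and both slope thresholds (0.5 and 0.05 percentage points of hit ratio per MB) are assumptions chosen for illustration, and the sample data is invented, not measured.

```python
def classify_region(measurements, steep=0.5, flat=0.05):
    """Classify where a set of measurements falls on the cache curve.

    measurements: iterable of (cache_size_mb, hit_ratio_percent) pairs.
    steep, flat: assumed slope thresholds, in percentage points per MB.
    """
    points = list(measurements)
    if len(points) < 2:
        raise ValueError("need at least two measurements")

    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    spread = sum((x - mean_x) ** 2 for x, _ in points)
    if spread == 0:
        raise ValueError("measurements must use different cache sizes")

    # Least-squares slope: hit ratio gained per additional MB of cache.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / spread

    if slope >= steep:
        return "Region 1"
    if slope <= flat:
        return "Region 3"
    return "between Region 1 and Region 3"


if __name__ == "__main__":
    small_cache = [(16, 40.0), (24, 52.0), (32, 61.0)]      # invented data
    large_cache = [(128, 90.0), (192, 90.5), (256, 91.0)]   # invented data
    print(classify_region(small_cache))
    print(classify_region(large_cache))
```

A result of Region 1 suggests that raising the cache size is still paying off, while Region 3 suggests the memory would be better spent elsewhere. Repeating the fit over measurements taken at comparable times of day helps separate the effect of cache size from ordinary workload variation.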
If the file access patterns are relatively consistent, and you eventually accumulate enough reliable measurements by changing the cache size parameters and measuring the cache hit ratios that result, you may be able to profile enough of the relationship between file cache size and cache effectiveness for your workload to make effective use of the CacheSet utility. Of course, this is exactly the sort of systems administrator tuning activity that Microsoft hopes its "no knobs" approach eliminates. But until Microsoft delivers an operating system that really is self-tuning, this work cannot be avoided, especially on systems and critical applications where performance counts.
1 In the case of these two applications, another important factor in the decision to bypass built-in Cache Manager functions is that these rely on database management system (DBMS) objects (rows, columns, tables), which the OS does not understand. DBMS entities are not objects like files that the OS can cache.
© 2002 O'Reilly & Associates, Inc. All rights reserved.