Investigating Bottlenecks

Investigating performance problems should always start with monitoring the whole system before looking at individual components. In precise terms, a bottleneck exists if a particular component's limitation is keeping the entire system from performing more quickly. Therefore, even if one or more components in your system are heavily used, there is no bottleneck as long as the other components and the system as a whole show no adverse effects.

For example, suppose that a process had 10 threads, each of which used exactly 0.999 seconds of processor time once every 10 seconds. If each thread made a request exactly 1 second after the previous one in perfect sequence, the processor would be 99.9 percent busy, but there would be no queue, no interference between the threads, and, technically, no bottleneck, although the system probably could not support any increased load or variation in its request scheduling without creating one.
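The arithmetic in this example can be checked with a small simulation. The following is only a sketch: it assumes a single processor that serves requests first-come, first-served, and the function name `simulate` and its parameters are hypothetical names introduced for illustration.

```python
def simulate(threads=10, period_ms=10_000, burst_ms=999, cycles=3):
    """Single processor, first-come first-served (an assumption).

    Each thread requests burst_ms of processor time once per period_ms,
    and the threads are staggered evenly across the period.
    Returns (utilization, max_wait_ms).
    """
    spacing_ms = period_ms // threads
    arrivals = sorted(
        t * spacing_ms + c * period_ms
        for c in range(cycles)
        for t in range(threads)
    )
    cpu_free_at = 0   # time at which the processor next becomes idle
    busy_ms = 0       # total processor time consumed
    max_wait_ms = 0   # longest time any request spent queued
    for arrival in arrivals:
        start = max(arrival, cpu_free_at)
        max_wait_ms = max(max_wait_ms, start - arrival)
        cpu_free_at = start + burst_ms
        busy_ms += burst_ms
    # Measure over the scheduled window, or longer if work ran past it.
    horizon_ms = max(cycles * period_ms, cpu_free_at)
    return busy_ms / horizon_ms, max_wait_ms


if __name__ == "__main__":
    print(simulate())               # perfectly staggered 0.999-second bursts
    print(simulate(burst_ms=1001))  # each burst only 2 ms longer
```

With the figures from the text, the sketch reports a processor that is 99.9 percent busy with no request ever waiting. Lengthening each burst to 1.001 seconds makes every request wait slightly longer than the one before it, which is the queue the paragraph warns about.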

Factors involved in the development of a bottleneck are the number of requests for service, the frequency with which requests occur, and the duration of each request. As long as these are perfectly synchronized, no queue will develop and no bottleneck will arise. The device with the smallest throughput ratio is probably the primary source of the bottleneck.
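As a rough illustration of how these factors combine, the sketch below assumes that "throughput ratio" means the rate at which a device can complete requests divided by the rate at which requests arrive. That definition, the function names, and the sample figures are all assumptions made for this example, not values taken from this guide.

```python
def throughput_ratios(devices):
    """devices maps a name to (requests_per_sec, seconds_per_request).

    Assumed definition: ratio = capacity / demand. A ratio below 1 means
    requests arrive faster than the device can complete them.
    """
    ratios = {}
    for name, (rate, service_s) in devices.items():
        capacity = 1.0 / service_s       # requests per second the device can finish
        ratios[name] = capacity / rate   # headroom relative to the offered load
    return ratios


def likely_bottleneck(devices):
    """Return the device with the smallest throughput ratio."""
    ratios = throughput_ratios(devices)
    return min(ratios, key=ratios.get)


# Hypothetical workload figures, for illustration only.
sample = {
    "processor": (50.0, 0.010),   # 50 requests/sec at 10 ms each
    "disk": (80.0, 0.012),        # 80 requests/sec at 12 ms each
    "network": (200.0, 0.002),    # 200 requests/sec at 2 ms each
}

if __name__ == "__main__":
    print(throughput_ratios(sample))
    print(likely_bottleneck(sample))
```

In this sample the disk has the least headroom, so it is the first place a queue would form if the load increased.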

It is difficult to detect multiple bottlenecks in a system. You might spend several days testing and retesting to identify and eliminate a bottleneck, only to find that another appears in its place. Only thorough and patient testing of all elements can ensure that you have found all of the problems.

It is not unusual to trace a performance problem to multiple sources. Poor response time on a workstation is most likely to result from memory and processor problems. Servers are more susceptible to disk and network problems.

Also, problems in one component might be the result of problems in another component, not the cause. For example, when memory is scarce, the system begins moving pages of code and data between disks and physical memory. The memory shortage becomes evident from increased disk and processor use, but the problem is memory, not the processor or disk.
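The reasoning above, checking for a memory shortage before blaming the disk or processor, can be sketched as a simple ordered test. The threshold values and the function and key names below are hypothetical placeholders chosen for illustration; substitute the thresholds recommended for your own system.

```python
def probable_cause(sample,
                   pages_per_sec_limit=20.0,   # hypothetical threshold
                   min_available_mb=4.0,       # hypothetical threshold
                   busy_pct_limit=80.0):       # hypothetical threshold
    """Classify one set of averaged counter values.

    sample keys (hypothetical names): pages_per_sec, available_mb,
    disk_time_pct, processor_time_pct.
    Memory is tested first, because heavy paging inflates both disk
    and processor activity.
    """
    paging = sample["pages_per_sec"] > pages_per_sec_limit
    low_memory = sample["available_mb"] < min_available_mb
    if paging and low_memory:
        return "memory"
    if sample["disk_time_pct"] > busy_pct_limit:
        return "disk"
    if sample["processor_time_pct"] > busy_pct_limit:
        return "processor"
    return "none"


if __name__ == "__main__":
    # Busy disk and processor, but the paging figures point to memory.
    print(probable_cause({
        "pages_per_sec": 150.0,
        "available_mb": 2.0,
        "disk_time_pct": 95.0,
        "processor_time_pct": 85.0,
    }))
```

The order of the tests is the point of the sketch: a busy disk is treated as a disk problem only after a memory shortage has been ruled out.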

If you identify a resource whose values are out of range, either compared with your baseline or compared with the recommended thresholds discussed in the preceding section, investigate the activity of that resource in greater detail. This includes the following steps:

  • Analyze your hardware and software configurations. Does your configuration match Microsoft recommendations for the operating system and the services you are supporting?

  • Review entries in the event log for the time period when you began seeing out-of-range counter values; these entries might identify problems that contribute to poor system performance.

  • Examine the kinds of applications you are running and the resources they demand, to determine whether the available resources are adequate for them.

  • Consider variables in your workload, such as processing different jobs at different times. For more efficient analysis, when you are looking for a specific problem, limit your charts and reports to specific events occurring at known times.

  • For immediate diagnosis and problem solving of situations such as shutdowns and logon failures, log or monitor for a shorter time. Sampling should be frequent when monitoring over a short period. Similarly, for long-term planning and analysis, log for a longer period and set the update interval accordingly.

  • Consider network or disk utilization or other activities occurring at the times that you see increasing resource utilization. Try to understand the usage patterns. Are they associated with specific protocols or computers?

  • Approach bottleneck correction in a scientific manner. For example, never make more than one change at a time, always repeat monitoring after a change to validate the results, eliminate results that are suspect, and keep good records of what you have done and what you have learned.
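Before working through these steps, you need to know which counters are actually out of range for your baseline. The following is a minimal sketch of one way to make that comparison, assuming you have exported baseline and current samples for each counter. The function name `out_of_range`, the three-standard-deviation rule, and the sample values are assumptions for illustration, not recommendations from this guide.

```python
from statistics import mean, stdev


def out_of_range(baseline, current, k=3.0):
    """Flag counters whose current average is far from the baseline average.

    baseline and current map a counter name to a list of samples.
    A counter is flagged when its current mean differs from the baseline
    mean by more than k baseline standard deviations (an assumed rule).
    Returns {counter: (baseline_mean, current_mean)}.
    """
    flagged = {}
    for counter, samples in baseline.items():
        if counter not in current or len(samples) < 2:
            continue  # not enough data to judge this counter
        mu, sigma = mean(samples), stdev(samples)
        now = mean(current[counter])
        if sigma == 0:
            deviates = now != mu
        else:
            deviates = abs(now - mu) > k * sigma
        if deviates:
            flagged[counter] = (mu, now)
    return flagged


# Hypothetical sample data, for illustration only.
baseline = {
    "Processor\\% Processor Time": [30, 35, 32, 31, 34],
    "Memory\\Pages/sec": [5, 6, 4, 5, 6],
}
current = {
    "Processor\\% Processor Time": [33, 34],
    "Memory\\Pages/sec": [60, 75],
}

if __name__ == "__main__":
    for name, (was, now) in out_of_range(baseline, current).items():
        print(f"{name}: baseline {was:.1f}, now {now:.1f}")
```

Running the same comparison again after each individual change gives you the repeated, recorded measurement that the last step calls for.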

When investigating bottlenecks in specific resources, focus on the performance objects and counters that pertain to the specific resource that appears to be your bottleneck. Your reference for information about these counters and how to detect and correct bottlenecks should be the chapter of this guide that refers to the resource you are investigating. These chapters also discuss how to use other Windows 2000 tools and utilities on the Windows 2000 Resource Kit companion CD for bottleneck detection and tuning. The chapters are as follows: