SQL Server 2000 Operations Guide: Capacity and Storage Management

Capacity and Storage Management

Updated: October 26, 2001

Abstract

This chapter describes the database administrator's (DBA's) responsibilities with regard to configuring and maintaining physical storage components of the data tier to meet requirements of capacity, throughput, and performance.

On This Page

Introduction
Process Flowchart
System Classification
Capacity Management
Summary

Introduction

A significant aspect of database administration is balancing data and data access traffic over the server layer (the files and hardware, referred to here as the database storage components). The database administrator (DBA) who maintains a database is involved in a continuous cycle of architecture and operations. For this reason, this chapter gives attention to both evaluating and optimizing your Microsoft® SQL Server™ 2000 system at the storage-component level of the data tier.

Resource Requirements

The guidance provided in this chapter assumes a competent staff of network administrators is available to assist you with your network capacity planning. The guidance also relies on published or communicated hardware specifications. Both staff and specifications are required to properly plan for and execute capacity improvements.

These resources are relied on because of the concentration of knowledge they contain. It would not be reasonable to attempt to build out your application infrastructure without access to these resources. Network administration and vendor resources, as described here, can provide the understanding of critical technologies that you will need in order to meet the demands of your application.

Note: It is assumed that all hardware that you purchase or consider purchasing is on the Hardware Compatibility List (HCL). You should carefully consider the consequences of maintaining any existing non-HCL hardware, as well as the ramifications of purchasing non-HCL hardware, in terms of cost effectiveness, support, and information about potential problems.

Network administrator: This administrator needs to understand the bandwidth provided by your current network infrastructure and the options provided by alternate technologies. He or she should also be aware of your organization's schedule for upgrading and expanding the network.

Vendor representative: This representative can provide guidance on the optimal use of his or her company's products. It is possible for the vendor's Web site to stand in for the actual representative, if the site provides white papers on the best practices associated with the vendor's products.

Application developer: For each new release, an application developer needs to provide the DBA with a profile of the application's footprint and expected usage characterization, either personally or by providing the appropriate documentation. It is also possible for the developer to provide a load tool that would allow you to see the effects of load on a test environment.

Monitoring data: It is assumed that the DBA regularly monitors storage resources and has the information necessary to begin a new cycle of proposed changes. You should review your capacity every six months to a year. If you anticipate a change in usage of your production environment, if there are changes to hardware or an application, or if your organization takes a long time to fulfill requests for hardware, consider performing a capacity planning review more frequently. As soon as it appears that the load handled by your data center has grown appreciably, go through the capacity planning exercise again. When you expand the capacity of your system, always plan a minimum of six months in advance.

Process Flowchart


Figure 6.1: Capacity Planning Flowchart

The capacity planning flowchart (Figure 6.1) details the steps you should walk through each time you review your infrastructure. This review should be done on a regular basis. How frequently this review is done, however, depends on your environment.

One of the steps worth noting in the flowchart asks, "Is the application anticipating similar growth in future?" This attempts to determine if the historical growth your application has seen will continue. This is one of the most difficult things to determine as a DBA. Since you have no control over the forces that bring about load, how could you possibly determine what the future load might be like? You will need to read the signs.

Ask the application development team what releases are planned over your capacity planning timeframe. Will these releases introduce new functionality that could draw more users to the application? Will this functionality use more resources than previous procedures? Will any optimization be added to the application?

Evaluate all the server resources when you are planning, including CPU, memory (RAM), disk, and network. You will need to take all of these factors into account when you determine the state of your application over the capacity planning timeframe.

System Classification

Table 6.1 delineates a standardized classification system. Intended to create a common reference point in dealing with the transactional volume, read/write activity, and data mass (size), these classification codes can be used to determine whether a specific piece of advice applies to you or not. So, rather than describing a system as a "small reporting system," you can more accurately identify your system in relation to others.

Table 6.1 Standard System Classification


The generic codes used here are simply for the purpose of providing an example. To create a classification code that is specific to your system, use the following measures:

  • Transactions per second (TPS). For the purposes of this chapter, TPS is measurable through the \SQLServer:Databases(_Total)\Transactions/sec counter in System Monitor. You can also observe transaction statistics by analyzing a trace that contains Transactions event class data, or by creating a customized counter specific to your system.

  • The read/write ratio

  • The total size of the related database files (mass)

The measurement of the database size (mass) can be applied at any level in the chart. If you have a terabyte of data, you would indicate this by adding the prefix "1T" to the classification code: 1T 7000 R100. If you have 100 GB of data, the code would look like this: 100G 5 R20. You should also consider the rate of growth (also called the percent delta, "%Δ") as a factor in making decisions about your system. This measurement is based on the percentage of increase in the transactions per second and/or the total mass, whichever is more relevant to your system.
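If you want to derive these two measures directly from the server, the following Transact-SQL sketch is one rough approach (it assumes a default instance, so the counter object is named SQLServer:Databases; a named instance exposes MSSQL$<instance>:Databases instead). It samples the cumulative Transactions/sec counter from master.dbo.sysperfinfo over one minute and totals the size of all database files from sysaltfiles:

    -- Approximate transactions per second by sampling the cumulative counter.
    DECLARE @t1 bigint, @t2 bigint
    SELECT @t1 = cntr_value
    FROM master.dbo.sysperfinfo
    WHERE object_name = 'SQLServer:Databases'
      AND counter_name = 'Transactions/sec'
      AND instance_name = '_Total'

    WAITFOR DELAY '00:01:00'    -- sample interval of one minute

    SELECT @t2 = cntr_value
    FROM master.dbo.sysperfinfo
    WHERE object_name = 'SQLServer:Databases'
      AND counter_name = 'Transactions/sec'
      AND instance_name = '_Total'

    SELECT (@t2 - @t1) / 60 AS avg_transactions_per_sec

    -- Total data mass in megabytes (the size column is stored in 8-KB pages).
    SELECT SUM(CAST(size AS bigint)) * 8 / 1024 AS total_mass_MB
    FROM master.dbo.sysaltfiles

Sampled during representative hours, these two numbers supply the TPS figure and the mass prefix for your classification code.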

Capacity Management

The key points of capacity planning are as follows:

  • Be familiar with your system's use of hardware resources

  • Maintain a tangible record of your system's performance over time

  • Use this information to plan for future hardware needs or for the software projects required

Capacity planning does not need to be a complicated process, but it does involve meticulous work, requiring numeric precision and documented proof. The complexity involved in capacity planning increases as the magnitude of the system increases. Capacity planning becomes critical when expanding capacity is an expensive endeavor. Expenses will be different for each organization.

Traditional capacity planning will tell you how much hardware you need to support a specific load on the server (assuming that the application takes full advantage of the hardware). As your system grows, you will have to do this evaluative process more than once, and as you gain experience with it, you will undoubtedly refine and enhance the process for your individual system. This is going to depend on your maintaining a historic baseline record of your system.

Three things drive the type of hardware and configuration you need:

  • Performance required

  • Volume of usage

  • Application design (the data access methods).

Of these three, the last has the most influence. Essentially, you have two paths to choose from:

  • Concentrate on designing the best data access methods for your application to get the most out of your hardware, which results in higher cost for the short term, or

  • Concentrate on the best configuration for your hardware to get the most out of your application, which results in higher cost for the long term

For maximum scalability and performance, you must do both and then reevaluate your system regularly to find improvements that can be made over time. This provides the best value overall.

If interaction with data is the foundation of your system, particular emphasis should be placed on the architecture and maintenance of the database server, as it generally requires more centralized resources than other parts of the system. Design architecture and optimization should be regarded as an iterative process, and should take a high priority in the database administration group until system growth and usage have completely stabilized.

Important: You must have a test environment that is separate from both production and development, in order to have a stable production environment. Ideally, it should be a complete and exact match of the production environment and be controlled in the same way. Minimally, you must have enough equipment to simulate the production environment and an impact analysis of all known variances between the systems.

If you are putting together a smaller system, it is usually best to work with the hardware vendor to determine your hardware needs. For a smaller system, the level of detail in a thorough capacity planning process may seem like overkill. It can still be a good idea to go through the steps of capacity planning, as the process itself reveals much about your system, but for systems in the range of (for example) 1000 G100 R80, a smaller scale of planning can be applied.

The best practice for capacity planning is to monitor your system periodically, making certain that you communicate the disparity between what the server is capable of and what the application can support (see Chapter 5: Monitoring and Control). Then, give your requirements and findings to your hardware vendor, who is in the best position to judge what you should consider purchasing.

For detailed coverage of capacity planning techniques, see Microsoft SQL Server 2000 Administrator's Companion, an MS Press book.

CPU and Memory Management

Two of the main components that you will manage are the CPU and memory. These components are managed much as they would be for any Microsoft Windows® 2000 server.

CPU Planning

Processor planning is fairly straightforward. Monitor your current CPU utilization (\Processor(_Total)\% Processor Time). If average utilization is over 50 percent, if you have frequent peak periods during which utilization spikes over 90 percent, or if utilization spikes and stays high for extended periods, you should consider adding processors or moving to faster processors.

In general, the processors you choose should be able to deliver the speed implied in your other system purchases. If your system is highly specialized and filled with processor-intensive activities, you will become aware of that as you observe the system over time. Examples of such activities include extensive or frequent usage of Data Transformation Services, or anything involving many calculations (science, accounting, and so on). SQL Server is a CPU-intensive application, so look for processors with a large high-speed cache. Always get the fastest and newest when it comes to processing power. The processor allows the rest of the server to do its job well.
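As a supplement to the System Monitor counter, the following sketch reads SQL Server's own cumulative CPU counters. It shows how much processor and I/O time the instance itself has consumed since it last started; these are server-wide totals, not a utilization percentage, so use them alongside (not instead of) \Processor(_Total)\% Processor Time:

    -- Cumulative CPU and I/O time consumed by this SQL Server instance since
    -- startup; @@TIMETICKS converts ticks to microseconds.
    SELECT
        CAST(@@CPU_BUSY AS bigint) * @@TIMETICKS / 1000000 AS sql_cpu_busy_seconds,
        CAST(@@IO_BUSY  AS bigint) * @@TIMETICKS / 1000000 AS sql_io_busy_seconds,
        CAST(@@IDLE     AS bigint) * @@TIMETICKS / 1000000 AS sql_idle_seconds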

If you have a dedicated SQL Server computer, use all of the processors for SQL Server. If your system is running applications in addition to SQL Server (such as Microsoft Commerce Server), then consider restricting SQL Server from using one or more processors. Otherwise, allow SQL Server and Windows to balance across all processors, as they were designed to do.

Memory Planning

While the sum of all hardware together dictates the capacity of a system, memory serves mainly to optimize data access. SQL Server uses memory to store execution plans, store data pages between uses, and so on. Without enough memory, you will incur more disk I/O in reading data. If your system does many reads, you might reduce disk I/O by significantly increasing your memory, because the data will then remain in cache. Insufficient memory, or over-allocation of memory, can result in paging. Memory plays an important role in SQL Server, and it is a resource you should carefully monitor.

For systems for which reads are the highest priority (decision support systems or DSS), more memory is better. Memory can be used to compensate for disk I/O, and large amounts of memory can significantly decrease the number of disks (spindles) you will need to achieve high performance.

For systems for which writes are the highest priority (on line transaction processing or OLTP), memory is still an important part of the system, but you may benefit more from the addition of disk spindles, and more or faster controller channels, rather than memory. To be sure, you will need to carefully monitor your system to see which resources are in highest demand.
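One quick, hedged way to gauge whether memory is keeping reads out of the disk subsystem is the buffer cache hit ratio, which can be read from sysperfinfo (again assuming a default instance):

    -- Buffer cache hit ratio; sustained values well below roughly 90 percent
    -- suggest that additional memory could reduce disk reads.
    SELECT 100.0 * r.cntr_value / NULLIF(b.cntr_value, 0) AS buffer_cache_hit_ratio_pct
    FROM master.dbo.sysperfinfo r
    JOIN master.dbo.sysperfinfo b
      ON  b.object_name  = r.object_name
      AND b.counter_name = 'Buffer cache hit ratio base'
    WHERE r.object_name  = 'SQLServer:Buffer Manager'
      AND r.counter_name = 'Buffer cache hit ratio'

Trend this value over time rather than reacting to a single reading; a brief dip after a restart or a large scan is normal.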

Disk Planning

The important point to remember about data storage is that the number of disks is far more important than the total storage size of the disks. One big physical disk may hold all your data, but it still has only one disk arm, which must service each data request individually. The more disk arms you have, the better off you will be. So, when you size for new disks, do a quick check to be sure you have enough drive space, but spend more time analyzing how many spindles you really need. For example, if your system performs a lot of transactions, you will enhance performance by adding more spindles (provided there is sufficient memory and CPU to support the system).

When you are ordering your hardware, request a specific number of disks, rather than a specific amount of disk space. You want to have many smaller disks rather than fewer larger disks. If you have external storage, go for the fastest array controller card, and one that has multiple channels. Look at this card as a potential bottleneck: If you have many spindles, you need to invest in a card that can support them. How much performance you get will be directly proportional to the quality of the controller, and the type of I/O your system produces.

In OLTP, where most I/O is random, each disk spends more time seeking for data, so the controller channel does not become saturated as quickly and you can attach more disks per controller card.

In DSS, where more of the queries are apt to be sequential reads, you will need more controller channels for a smaller set of disk drives. Also remember that additional memory is a good way to reduce disk I/O in a DSS system.

For more information on all of these topics, please see SQL Server Books Online. Also see Inside Microsoft SQL Server 2000 and Microsoft SQL Server 2000 Administrator's Companion.

Tips

In a data-centric environment, the database server must support interaction with many clients and systems, and will take a proportionally higher percentage of the total system hardware investment in relation to its importance and usage rate. The best practice is to do one of the following:

  • If you expect sudden growth, buy hardware you can add on to.

  • If you expect slow steady growth (or none), buy what you need, and then when you upgrade to new hardware, transfer the current hardware into the test or development environment.

This rule of thumb applies to most situations, with the notable exception of systems of Class 5000 and above, which require a larger data storage system that is less likely to be cycled out of production.

Disk Controllers

Cache

Not all write caching is safe for use by a database server. You should make sure that your disk controller has features such as safeguards to avoid uncontrolled reset of the caching controller, on-board battery backup, and mirrored or ERC (error checking and correcting) memory. Check with your hardware vendor to ensure that the write cache includes these and any other features necessary to avoid data loss. Do not implement write caching unless the hardware vendor guarantees that their write cache includes these features and any others required to prevent data loss.

Array accelerator cache settings can be left at the default value, which is typically 50:50 read:write. These settings can also be adjusted to favor reads or writes if you know which your system requires. Note that if you are using a write setting above zero here, you have enabled write caching.

If you are using failover clustering, it is a good idea to consider using mirrored cache.

For more information on SQL Server and caching controllers, see Knowledge Base article Q86903 at https://support.microsoft.com.

Channels

If your array controller supports more than one channel, make sure you take advantage of it. Fast channels have a tremendous effect on I/O performance.

On a SCSI array controller, the only way to check this is to open the configuration tool that was provided by your controller manufacturer, and look at the settings. Use caution when doing this, because if you press OK or Save at any point while viewing the controller settings, you may inadvertently reconfigure your array (which would result in immediate loss of any data stored on that drive).

Windows NT File System (NTFS) Allocation Unit

SCSI Drives: When you format the new drives in Disk Administrator, you should consider an allocation unit, or block size, that will provide optimal performance. Significant performance gains may be obtained by sizing this to a larger value in order to reduce disk I/Os; however, the default value is based on the size of the physical disk. The best practice for SQL Server is to choose 64 KB, because this reduces the likelihood of I/Os that span distinct NTFS allocations, which then might result in split I/Os. Keep in mind that although this information can be useful, the type of storage you are using (and in some cases also your backup software) will drive the format of your disks. If you are changing the block size on an existing system, be sure to run a baseline in your test environment and another after you have tested the changes.

For more information on planning and configuring hardware, see Inside SQL Server 2000 by Kalen Delaney. For specific information on your hardware, refer to the customer service site of your hardware manufacturer.

Data Storage Subsystem Management

When you are ready to configure the disks, you should make a diagram of the hardware to help you decide how to configure it in accordance with the data file layout strategy you need. You should be aware of a few concepts. Beyond confirming that you have enough total disk space, give less attention to the size of the hard drives than to the number of hard drives in a set. This was important when ordering the hardware, and it is important now as you lay out the data files.

RAID

RAID (Redundant Array of Independent Disks) protects data by keeping redundant copies of it on different physical disks, so that if one disk is lost through hardware failure, enough information exists on the remaining disks to reconstruct the original data. For a database server, always choose hardware-level RAID rather than software RAID. Software RAID uses CPU cycles, and this prevents SQL Server from performing at the maximum level. Two core RAID levels are of value for a database server: striping with parity (RAID 5) and striped mirror (RAID 0+1). The best overall option is to choose RAID 0+1 (also called RAID 01 or "striped mirror"). RAID 5 can be used in certain circumstances, but is generally more expensive in the long run, and less reliable.

Important: Do not use software RAID on a SQL Server unless you can spare the CPU usage. Also, do not use file compression on a database server—this is not supported.

In RAID 5, each time data is written to disk, it actually takes four I/O operations: the existing data and parity blocks must be read, and the new data and parity blocks must be written. This is slow for two reasons: first, the operations are consecutive, so they must wait for each other; second, they occur while many other transactions are vying for the disk resources. RAID 0+1 writes to the primary disk and the mirror in one operation; although you must wait for the write to complete on both drives, the two writes are issued simultaneously.

The hidden cost of RAID 5 is its lower write performance, which limits the number of records that can be entered, orders that can be processed, and data requests that can be returned. In deciding to go with RAID 5, which seems more economical at first glance, you are actually balancing a few thousand dollars in hard drives against system productivity.
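To make the write penalty concrete, consider a rough illustration (the disk figures here are assumptions, not measurements): with ten disks each sustaining about 125 random I/Os per second, a purely write-bound workload tops out near (10 × 125) / 4 ≈ 312 writes per second on RAID 5, because each logical write costs roughly four physical I/Os, versus (10 × 125) / 2 ≈ 625 writes per second on RAID 0+1, where each logical write costs two. Controller cache softens these numbers, but the roughly two-to-one gap in write capacity remains.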

The only advantage RAID 5 offers is more usable storage space from the same number of disks. So, the question to ask is: Do you need to store a great deal of data on as little hardware as possible (but still have some fault tolerance), or do you need fast access and higher fault tolerance? Take a hard look at the size of your data files before making this decision, even on a small server with internal-only drives.

Given the same number of physical disks, RAID 0+1 will be faster than RAID 5. An increase in the number of drives in RAID 0+1 provides a linear increase in I/O capacity. A read can be obtained from either the primary drive or its mirror, so as I/O increases, the drives will become faster as a set because they can perform multiple reads simultaneously.

Although data can be restored from backups of the database, it is important to note the effect that failed drives can have. If any two disks fail in RAID 5, the database will stop (unless you have a hot standby disk that has been synced within the chain, but in any case, you cannot lose more than one disk from the whole working set). RAID 0+1 will stop the database only if a disk fails in both sides of a mirrored set at the same time, and the odds of that occurring based on random factors are about 5.3 percent. RAID 5 imposes a significant penalty for losing even one disk. When one drive is lost on a RAID 5 system, the read performance of the system immediately decreases. Every read or write request to the failed drive initiates a verification process against all other drives in the parity group. This performance degradation will exist until the drive is replaced and completely rebuilt by the system. During the rebuild process, the system will be more sensitive to system load due to the considerably heavier I/O requirements of the failed system as described previously. This can be a critical consideration.

RAID 0+1 will see minimal loss of performance in a failed state where the hardware allows reads from both disks in a set. In this case, read performance would be slightly reduced, but only for data stored on that particular set. RAID 0+1 can actually read simultaneously from both drives in a mirrored set. This is not a simultaneous read for the same I/O operation, but for different ones. So when you have multiple read requests for the same physical disk, the I/O operations are spread over the two disks in the mirrored set.

Standardize Your Configuration

Although this is vital in an environment with many servers, standardizing configurations across servers is useful in any environment. To do this, create a list of all available settings on the server that you are not leaving at the default value, and document them. Then label this as the standard configuration for every new machine. Obviously, many of your systems may immediately deviate from the standard. To handle this, you simply document the difference for that machine between your standard configuration and that of the server in question. These documents, stored as read-only files in an easily accessible area (a Web repository is good), can help you better support a large number of servers.
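A simple way to produce the raw material for such a document is to capture the server-wide settings with sp_configure; any values that differ from your documented standard stand out when the output is saved alongside the server's build sheet. A minimal sketch:

    -- List every server-wide setting (advanced options included) so that
    -- non-default values can be recorded in the standard configuration document.
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    EXEC sp_configure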

For the storage subsystem, standardizing drive letters is very helpful. Table 6.2 provides an example.

Table 6.2 Standard Drive Letter Assignments

Logical drive letter    Description
C                       Operating system, SQL executables
D                       Usually a CD-ROM drive
E                       Reserve for another system drive, if needed
F through H             Tempdb
I through P             Data files
Q                       Quorum drive
R, S                    SQL executables and system databases
T through V             Transaction log files
X, Y, Z                 Backups, or imported data for bulk loading
\SQLAdmin               A standard admin directory to store reports such as logs, trace results, sqldiag output, and so on

For network drives, you should consider using Universal Naming Convention (UNC) paths, to allow maximum portability of code that relies on these network drives.

No matter how meticulous you are, there will be times when you must make exceptions. Simply document them, and move on; the goal is to make 90 percent of it standard, and then the remaining 10 percent will be easier to remember.

Changing the Subsystem

It is important to remember that once a RAID level is selected, it can only be changed by complete reconfiguration at the hardware level, which destroys all the data on that drive. If you decide to change your RAID level, make sure that you have verified backups of the system. You must also either have a sufficient window of agreed-upon downtime to rebuild the array, or have a standby server to take over during the implementation together with a verified (tested and approved) failback plan that covers all potential requirements, such as how you would re-synchronize the data from the failover system. Before you begin to reconfigure the storage subsystem, carefully plan exactly how you want your data files laid out. Then, according to those requirements, determine which array each disk will be allocated to, how the arrays will be configured, and how the logical drives will be formatted and laid out.

Database File Placement

Here are a few tips and best practices regarding the placement of your database files:

SCSI, SAN (storage area network): The most important considerations in determining where to place your files on the server are the number of disks (spindles) available to a particular drive and the speed of the drives involved. For this reason, it is good to design the server hardware requirements and/or layout with your database needs in mind. Be careful about buying hardware before you have a firm design plan.

If you have a set of tables that is used together frequently, you should consider putting these tables on separate filegroups on separate physical drives, to balance I/O between them. In a larger, more heavily used system, this can make a significant difference.

If disk I/O is a problem, and you cannot add more spindles to the set of disks, consider putting non-clustered indexes in a separate filegroup on a separate disk, in order to split I/O between filegroups.
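As a sketch of that approach (the database, filegroup, file path, table, and index names below are all hypothetical), you can add a filegroup on a separate drive and build a nonclustered index on it:

    -- Hypothetical names throughout; J: is assumed to map to a separate set of spindles.
    ALTER DATABASE Sales
    ADD FILEGROUP INDEX_FG
    GO
    ALTER DATABASE Sales
    ADD FILE
    (
        NAME = Sales_Index1,
        FILENAME = 'J:\MSSQL\Data\Sales_Index1.ndf',
        SIZE = 2GB,
        FILEGROWTH = 500MB
    )
    TO FILEGROUP INDEX_FG
    GO
    -- Build the nonclustered index on the new filegroup to split I/O
    -- between the table's filegroup and the index filegroup.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    ON INDEX_FG
    GO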

Group your tables based on usage, in order to generate as many simultaneous reads to different filegroups (and therefore disks) as possible. Grouping tables into filegroups based on a maintenance need for convenient backup plans will not generate as much performance as separating the tables and indexes by usage.

If you have more than enough spindles for your data performance, then you might consider breaking the data across filegroups on this set of disks, for the purpose of speeding up some administrative tasks that you might be doing, such as reindexing. The main reason to do this, however, is to speed up any restores that might be necessary.

For systems smaller than Class 1000A, you could use Auto Grow for your database files. For systems in this class or above, keep in mind that when a "grow" is initiated, transactions must wait while the database grows. In a small database or lightly queried system this is not a big issue, but if you have a 100 GB OLTP database set to grow in 10 percent increments, and it runs out of space during peak times, the online users will be held up while the 10 GB is allocated. (Allocation speed per gigabyte can be measured by copying a 1 GB file to the data drive on that server.)

For these systems, the best practice is to anticipate database growth, and manually increase the database at a scheduled time. Or, choose a reasonable amount to grow by that is neither too cumbersome nor so small that it will initiate expansion too frequently.
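A minimal sketch of such a scheduled, manual expansion (the logical file name and target size are illustrative):

    -- Expand the data file to its planned size during a maintenance window,
    -- rather than letting autogrow interrupt transactions at peak times.
    ALTER DATABASE Sales
    MODIFY FILE
    (
        NAME = Sales_Data1,
        SIZE = 120GB        -- new total size, chosen to cover 6-12 months of growth
    )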

Ideally, the best plan is to expand your database for six to twelve months' growth, or whatever seems feasible for the size of your data and the rate of growth compared to the budget and hardware available. Although it is administratively easier to let the system autogrow, there are two risks associated with this:

  • If autogrow fills the disk beyond roughly 80 percent of its capacity, you will experience performance degradation as the disk continues to fill.

  • If the disk fills completely or to the point that the database cannot grow by the allotted amount or percentage, the database will stop. If this occurs, the only option is to make more physical space available by adding storage:

    • If the data disk is configured as a dynamic disk, you can add more disks from your emergency spares and expand the array and the logical disk.

    • If it is configured as a basic disk, such as in the case of a failover cluster, then you can add sufficient disks to make a new RAID set, and create another file on that drive and add it into the database file group.

If the transaction log disk fills completely, or to the point where the log cannot autogrow in the space left, the database will stop. To rectify this, you must analyze whether you have an unusual problem that can be resolved by backing up (dumping) and shrinking the transaction log.
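If the log has genuinely outgrown its planned size because of a one-time event, a hedged sketch of the recovery steps looks like the following (names and paths are hypothetical, and the example assumes the database is not using the Simple recovery model):

    -- Back up (dump) the transaction log to free internal log space ...
    BACKUP LOG Sales TO DISK = 'X:\Backups\Sales_log.bak'
    GO
    -- ... then shrink the physical log file back toward its planned size (in MB).
    DBCC SHRINKFILE (Sales_Log1, 2048)
    GO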

If you have multiple files in your filegroup and you add another one, you will need to expand each of them in order to re-establish proportional fill.

For more information see "Using Files and Filegroups" in SQL Server Books Online.

Log File Placement

Here are a few tips and best practices regarding the placement of your log files:

Create the transaction log on a physically separate disk or RAID array. The transaction log file is written sequentially; therefore, using a separate, dedicated disk allows the disk heads to stay in place for the next write operation. For this reason, smaller systems will do well by using a single mirrored disk for the transaction log. A single mirrored physical disk should support up to approximately 1,000 transactions per second, depending on the speed of the disk itself. Systems requiring more than that should stripe the transaction log across a RAID 0+1 array for maximum performance. For highest bandwidth, the RAID controller on this array should have a (battery-backed) write-back cache to speed log writes.

Set your transaction log to autogrow, but try to size it so it should not need to grow. The optimal size should be based on your recovery model, the level of logged activity in the database, and the interval of time between backups. Set the growth increment to a reasonable percentage, but try to anticipate when the log should be resized. If the transaction log expands too frequently or takes a long time to expand, performance can be affected.

The size of the log should be based on your current recovery model and your application design. If you find that you need to shrink the log periodically, you should further investigate what is causing the log to fill up, in order to fix the problem at the root rather than simply fixing the symptom.
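DBCC SQLPERF (LOGSPACE) is a quick way to trend log size and log use across all databases; capturing its output on a schedule shows whether the log is sized correctly or is filling for some other reason:

    -- Returns database name, log size in MB, log space used (%), and status
    -- for every database on the instance.
    DBCC SQLPERF (LOGSPACE)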

tempdb File Placement

Here are a few tips and best practices regarding the placement of your tempdb files:

Place the tempdb database on a fast I/O subsystem to ensure good performance. Stripe the tempdb database across multiple disks for better performance. Move the tempdb database to disks different from those used by user databases.

The tempdb database can be located with the data in most situations. Larger systems that make heavy use of tempdb should consider putting tempdb on a set of disks by itself, to achieve extra performance. It is not a good idea to co-locate any database files with the page file of the operating system.
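Moving tempdb to its own set of disks is done with ALTER DATABASE. The drive letters below are illustrative (they follow the standard layout in Table 6.2), and the change takes effect only after SQL Server is restarted, because tempdb is re-created at startup:

    -- tempdev and templog are the default logical file names for tempdb.
    ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'F:\MSSQL\Data\tempdb.mdf')
    GO
    ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'G:\MSSQL\Data\templog.ldf')
    GO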

Other File Placement

Here are some tips and best practices regarding the placement of other files:

The operating system should be installed on a single mirrored disk (RAID 1); this provides resilience to disk failure (but have a boot disk ready to boot off the mirror). The page file performs well on the operating system drive, and may be left there for a database server. If you must move the page file, make sure that you do not locate it on any drive that contains a data file, a log file, or tempdb.

As a matter of common sense, if you are storing your backups on the same server with your data, make sure not to store them on the same disks with your data files or log files.

The number of files in your filegroups has nothing to do with how much I/O SQL Server issues. Use them for your convenience, not as a tuning tool.

In short, make files for your convenience; they are not a performance knob that allows you to tune performance by changing configuration.

SQL Server issues read-ahead in proportion to how big your buffer pool is. The presumption is that if you have a lot of memory, you have the I/O capacity to go with it. For instance, if SQL Server decides to keep 400 pages of outstanding reads and you only have one spindle underlying the file, you'll end up with most of the reads queued up in Windows. If you have 10 spindles underlying the file, there is probably enough outstanding I/O to keep them all busy. You could get the 10 spindles either through striping or multiple files. Of course, multiple files on the same spindle aren't a great performance idea.

Both table scans and index ranges will issue read-ahead.


Monitoring for Capacity Management

To begin monitoring for capacity management, choose a set of measurable objects that relate directly to the hardware performance of your system. This list provides some basics that need to be included:

  • Disk drive capacities

  • Size of databases, size of free space inside the database

  • The difference between the database size and the available disk drive space

  • The rate of database growth

  • The drives where the data files (*.MDF, *.NDF, *.LDF) are located

Essential System Monitor Counters

Collect these counters together in a way that will provide a graph over time. One way to do this is to store the counters, sampled every one to three minutes over a period of one hour, during pre-defined monitoring hours. If you are storing more counter data, extract this data and analyze it separately for the purpose of capacity management. (For more information see Chapter 5, Monitoring and Control.)
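One hedged way to build that record without extra tooling is to snapshot a handful of counters from sysperfinfo into a history table on a schedule; the table name and counter list below are illustrative, and a SQL Server Agent job could run the INSERT at your chosen interval:

    -- One-time setup: a small history table for counter snapshots.
    CREATE TABLE dbo.CapacityBaseline
    (
        sample_time   datetime      NOT NULL DEFAULT (GETDATE()),
        object_name   nvarchar(128) NOT NULL,
        counter_name  nvarchar(128) NOT NULL,
        instance_name nvarchar(128) NULL,
        cntr_value    bigint        NOT NULL
    )
    GO
    -- Scheduled step: record current values of the counters being trended.
    INSERT dbo.CapacityBaseline (object_name, counter_name, instance_name, cntr_value)
    SELECT RTRIM(object_name), RTRIM(counter_name), RTRIM(instance_name), cntr_value
    FROM master.dbo.sysperfinfo
    WHERE counter_name IN ('Transactions/sec',
                           'Data File(s) Size (KB)',
                           'Log File(s) Size (KB)',
                           'Percent Log Used')

Graphing this table over weeks or months gives the tangible, historic baseline that the rest of the capacity planning process depends on.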

Note on Disk Queue Length

This counter no longer yields meaningful results when interpreting disk I/O problems with SQL Server 2000, because the SQL Server engine dynamically manages disk I/O; this is also why the Max Async I/O setting no longer exists. A simplified explanation is that SQL Server issues a request for disk I/O but does not wait for that request to complete before issuing another. When an I/O request completes, the database engine is notified and processes the results. SQL Server monitors the throughput on these requests and manages the amount of I/O that the disks can effectively handle.

Monitoring for Problems

Hardware problems can be observed in either the Event Viewer or in System Monitor.

The Windows 2000 Event Viewer contains any messages logged by your system as it encounters hardware-related errors.

In System Monitor you should look at the following counters for hardware errors:

  • Network Interface()\Packets Outbound Errors – the number of outbound packets that could not be transmitted because of errors.

  • Network Interface()\Packets Received Errors – the number of inbound packets that contained errors preventing them from being delivered to a higher-layer protocol.

  • Server\Errors System – the number of times that an internal server error was detected. Errors can reflect problems with logging on, security, memory allocation, disk operations, transport driver interface operations, communication (such as receipt of unimplemented or unrecognized server message blocks, also known as SMBs), or the I/O Request Packet stack size for the server. Many of these errors are also written to the System log and the Security log in Event Viewer. The server can recover from most of the errors displayed by this counter, but they are unexpected and should be reported to Microsoft Product Support Services.

Summary

This chapter has described the database administrator's (DBA's) responsibilities with regard to configuring and maintaining physical storage components of the data tier to meet requirements of capacity, throughput and performance. It has included information on the process of deciding where there are hardware bottlenecks, or where those bottlenecks are likely to develop in the future. This process includes gathering information from throughout the organization, in particular from developers and network administrators, among others. Tools used to detect problems were examined, including Event Viewer and System Monitor, and suggestions were made as to which counters would be most useful. After implementing these processes, the DBA should be able to meet storage requirements as they evolve.