Microsoft SQL Server 7.0 Storage Engine

Updated: August 2, 2001

On This Page

Introduction
Goals and Features Summary
Storage Engine Architectural Overview
Databases, Files and File Groups
Physical Database Architecture
Locking Enhancements
Base Table and Index Architecture
Data Type Changes
Transaction Log Management
Memory, Buffering and Read-Ahead
Innovation and Evolution

Introduction

Customer Requirements

Ten years ago, it was not uncommon for database application development to take months or years. When databases were built, everything was worked out in advance—the database size, schema, number of users, and so on. This philosophy has changed dramatically in the past few years. Database applications are now developed in weeks or months, evolve during the process, and are put into production before all issues are fully understood.

This rapid deployment of mission-critical applications places rigorous demands on the storage engine. It must be highly available, recover quickly, and provide automatic management utilities. Administrators expect to make changes quickly without bringing down their applications. Databases are growing at a much higher rate than anticipated, while overnight backup windows shrink to a few hours.

These are among the many requirements Microsoft hears from customers. The storage engine team has worked hard to deliver Microsoft® SQL Server™ version 7.0 as a scalable, reliable, and easy-to-use product that will provide a solid foundation for application design through the next 20 years.

Storage Engine Release Goals

There are several important goals for the storage engine in this release. The defining strategy is to further improve ease of use so that applications using database technology can be deployed widely. Ideally, the database becomes completely transparent to end users and nearly transparent to database administrators.

Ease of Use

Customers are looking for solutions to business problems. Most database solutions bring multiple layers of cost and complexity. SQL Server versions 6.0 and 6.5 defined "ease of use" as an RDBMS feature. SQL Server 7.0 takes this concept to the next level, firmly establishing the product as the easiest database for building, managing, and deploying business applications.

For the SQL Server 7.0 storage engine, ease of use includes many innovative features, such as:

  • Eliminating the requirement for a DBA for standard operations. This enables branch office automation as well as desktop and mobile database applications.

  • Hiding the complexity of server configuration, DBCC, maintaining index statistics, and database backups.

  • Streamlining and simplifying configuration options, which now adapt automatically to the specific needs of the environment.

Scalability

Customers need to protect investments in applications, and as businesses grow, the database must grow to handle more data, transactions and users. Microsoft delivers a single database engine that scales from a mobile laptop computer running the Microsoft Windows® 95/98 operating system to terabyte symmetric multiprocessor clusters running Microsoft Windows NT® Server Enterprise Edition. All these systems maintain the security and reliability demanded by mission-critical business systems.

Features of the storage engine are the foundation for scalability, including:

  • The new disk format and storage subsystem provide storage that is scalable from very small to very large databases.

  • Redesigned utilities that support terabyte-size databases efficiently.

  • Large memory support reduces the need for frequent disk access.

  • Dynamic row level locking allows increased concurrency, especially for OLTP applications.

  • Unicode support enables multinational applications.

Reliability

  • SQL Server 7.0 eliminates many concurrency, scalability, and reliability problems by replacing complex data structures and algorithms with simple ones. The new structures scale better, avoid concurrency problems, and are simpler and therefore more reliable.

  • SQL Server 7.0 eliminates the need to run database consistency checks (DBCC) prior to every backup. The simplifications described above, plus run-time checks of critical data structures, make the database more robust, so the recommendation that DBCC be run before every backup has been dropped. DBCC itself is also significantly faster.

Goals and Features Summary

Storage Engine Design Goals

Database applications can now be deployed widely due to intelligent, automated storage engine operations. Sophisticated yet simplified architecture improves performance, reliability and scalability.


Reliability

Reliability is improved with a "fast-fail" philosophy. Problems are caught when encountered, and many consistency problems are corrected automatically. The need for consistency checks is minimized.

Scalability

The new disk format and storage subsystem provide storage that is scalable from very small to very large databases. Specific changes include:

  • Simplified mapping of database objects to files eases management and enables tuning flexibility. Database objects can be mapped to specific disks for load balancing.

  • More efficient space management, including an increase in page size from 2K to 8K, 64K I/O, the lifting of the column limit, variable-length character fields up to 8K, and the ability to add and delete columns from existing tables without an unload/reload of the data.

  • Redesigned utilities that support terabyte-size databases efficiently.

Ease of Use

DBA intervention is eliminated for standard operations, enabling branch office automation as well as desktop and mobile database applications. Many complex server operations are automated.

Storage Engine Features

The following list summarizes the major storage engine features and their benefits.

  • Data Type Sizes: Size limitations of data types are increased dramatically.

  • Databases and Files: Database creation is simplified; databases now reside in operating system files instead of logical devices.

  • Dynamic Memory: Improves performance by optimizing memory allocation and usage. A simplified design minimizes contention with other resource managers.

  • Dynamic Row-Level Locking: Full row-level locking is implemented for both data rows and index entries. Dynamic locking chooses the optimal level of lock (row, page, multiple page, table) automatically for all database operations, providing improved concurrency with no tuning. The database also supports the use of "hints" to force a particular level of locking.

  • Dynamic Space Management: A database can grow and shrink automatically within configurable limits, minimizing the need for DBA intervention. It is no longer necessary to pre-allocate space and manage data structures.

  • Evolution: The new architecture is designed for extensibility, with a foundation for object-relational features.

  • Large Memory Support: SQL Server 7.0 Enterprise Edition supports memory addressing greater than 4 GB, in conjunction with Windows NT Server 5.0, Alpha processor-based systems, and other techniques.

  • Log Manager: A simplified design improves performance for truncation, online backup, and recovery operations.

  • Read Ahead: Smart read-ahead logic improves performance and eliminates the need for manual tuning.

  • Text and Image: Text and image data are stored separately in an optimized format.

  • Unicode: Native Unicode, with ODBC and OLE DB Unicode APIs, improves multilingual support.

Storage Engine Architectural Overview

Overview

Microsoft designed SQL Server to scale upward to handle large enterprise applications, and also to scale downward for desktop applications. The foundation of this growth is a completely new set of on-disk structures that can support application designs for the next 20 years. The changes were made because the previous design was too restrictive.

The original code was inherited from Sybase and designed for 8-megabyte (MB) UNIX systems back in 1983. Microsoft programmers enhanced that code as much as possible, but SQL Server needed a better foundation for the future. The new on-disk formats improve manageability and scalability, and allow the server to scale from low-end to high-end systems with better performance.

Benefits

There are many benefits of the new on-disk layout, including:

  • Improved scalability and integration with Windows NT Server

  • Better performance with larger I/Os

  • Stable record locators allow more indexes

  • More indexes speed decision support queries

  • Simpler data structures provide better quality

  • Greater extensibility, giving subsequent releases a cleaner development process and making new features faster to implement

Storage Engine Subsystems

Most relational database products are divided into relational engine and storage engine components. This document focuses on the storage engine, which has a variety of subsystems:

  • Mechanisms that store data in files and find pages, files and extents

  • Record management for accessing the records on pages

  • Access methods that use B-trees to find records quickly by record identifier

  • Concurrency control, which implements the physical lock manager and the locking protocols for page- or record-level locking

  • I/O buffer management

  • Logging and recovery

  • Utilities for backup and restore, consistency checking and bulk data loading

Databases, Files and File Groups

Overview

SQL Server 7.0 is much more integrated with Windows NT Server. Databases are now stored directly in Windows NT Server files. The old UNIX legacy of database devices and segments has been replaced with a simple system that maps each database to its own set of files.

SQL Server is being stretched toward both the high end and the low end. Some of our competitors started in the mid-range and pushed up to the high end, introducing different products with different data formats, languages, and programming APIs to meet the needs of desktop applications. We are not ignoring the low end. Many Microsoft Access customers are moving to SQL Server, and we need to pay attention to the capabilities needed by desktop and mobile applications.

Files

SQL Server 7.0 creates each database in its own set of operating system files; multiple databases can no longer exist in the same file. There are several important benefits to this simplification: files can now grow and shrink, and space management is simplified greatly.

All data and objects in the database, such as tables, stored procedures, triggers, and views, are stored only within these operating system files.

A database is made up of three types of files:

  • Primary data file: The starting point of the database. Every database has exactly one primary data file.

  • Secondary data files: Optional files that hold all data and objects that are not in the primary data file. Some databases may not have any secondary data files, while others have multiple secondary data files.

  • Log files: Files that hold all of the transaction log information used to recover the database. Every database has at least one log file.

When a database is created, all the files that comprise it are zeroed out (filled with zeros) to overwrite any existing data left on the disk by previously deleted files. Although this means the files take longer to create, it prevents Windows NT from having to zero the files when data is written to them for the first time during normal database operations, which improves the performance of day-to-day operations.

File Groups

A database consists of one or more data files and one or more log files. The data files can be grouped together into user-defined filegroups. Tables and indexes can then be mapped to different filegroups to control data placement on physical disks.

Filegroups are a convenient unit of administration, greatly improving flexibility. With a terabyte database, regardless of backup speed, it is impossible to back up the entire database within a reasonable window. SQL Server 7.0 allows you to back up a different portion of the database each night on a rotating schedule.

Filegroups work well for sophisticated users who know where they want to place indexes and tables. SQL Server 7.0 can work quite effectively without filegroups, so many systems will not need to specify user-defined filegroups. In this case, all files are included in the default filegroup and SQL Server 7.0 can allocate data effectively within the database.

Log files are never a part of a filegroup. Log space is managed separately from data space.

Using Files and Filegroups

Using files and filegroups improves database performance by allowing a database to be created across multiple disks, multiple disk controllers, or RAID systems. For example, if your computer has four disks, you can create a database that comprises three data files and one log file, with one file on each disk. As data is accessed, four read/write heads can access the data simultaneously in parallel, which speeds database operations.

Additionally, files and filegroups allow better data placement because a table can be created in a specific filegroup. This improves performance because all I/O for a specific table can be directed at a specific disk. For example, a heavily used table can be placed on one file in one filegroup, located on one disk, and the other less heavily accessed tables in the database can be placed on the other files in another filegroup, located on a second disk.
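As a sketch of how this looks in Transact-SQL (all database, file, filegroup, and table names, sizes, and paths here are illustrative):

    CREATE DATABASE Sales
    ON PRIMARY
        (NAME = SalesPrimary,                           -- system tables and objects
         FILENAME = 'C:\Mssql7\Data\SalesPrimary.mdf',
         SIZE = 50MB,
         FILEGROWTH = 10MB),
    FILEGROUP SalesGroup1
        (NAME = SalesData1,                             -- user data on one disk
         FILENAME = 'D:\Mssql7\Data\SalesData1.ndf',
         SIZE = 100MB),
        (NAME = SalesData2,                             -- user data on a second disk
         FILENAME = 'E:\Mssql7\Data\SalesData2.ndf',
         SIZE = 100MB)
    LOG ON
        (NAME = SalesLog,                               -- log on its own disk
         FILENAME = 'F:\Mssql7\Log\SalesLog.ldf',
         SIZE = 50MB)
    GO

    USE Sales
    GO

    -- Place a heavily accessed table on the user-defined filegroup.
    CREATE TABLE orders
        (order_id    int NOT NULL,
         customer_id int NOT NULL)
    ON SalesGroup1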

These are some general recommendations for files and filegroups:

  • Most databases work well with a single file and a single log file.

  • If you plan to use multiple files, use the primary file only for system tables and objects, and create at least one secondary file to store user data and objects.

  • To maximize performance, create files or filegroups on as many different local physical disks as are available, and place objects that compete heavily for space in different filegroups.

  • Use filegroups to allow placement of objects on specific physical disks.

  • Place different tables used in the same join queries in different filegroups. This improves performance due to parallel disk I/O searching for joined data.

  • Place heavily accessed tables and the nonclustered indexes belonging to those tables on different filegroups. This will improve performance due to parallel I/O if the files are located on different physical disks.

  • Do not place the log file(s) on the same physical disk with the other files and filegroups.

Space Management

There are many improvements in the allocation and management of space within files. The data structures that keep track of page-to-object relationships were redesigned. Instead of linked lists of pages, bitmaps are used because they are cleaner and simpler, and they facilitate parallel scans. Each file is now more autonomous: it contains more data about itself, within itself. This works well for copying or mailing database files.

SQL Server now has a much more efficient system for tracking table space. The changes enable:

  • Growing and shrinking files.

  • Better support for large I/O.

  • Row space management within a table.

  • Less expensive extent allocations.

In earlier versions of SQL Server, allocations could cause blocking when large amounts of data were added. The new allocation algorithms and data structures are simple and efficient, and do not cause blocking. SQL Server 7.0 also tracks the free space on each page; as rows are deleted from a table that does not have a clustered index, that space can be reused for new inserts. This results in much more efficient use of disk space and speeds table scans by packing data more densely.

SQL Server is very effective at allocating pages to objects quickly and reusing space freed by deleted rows. These operations are internal to the system and use data structures not visible to users, yet the processes and structures are referenced occasionally in SQL Server messages. This overview of the space allocation algorithms and data structures gives users and administrators the knowledge needed to understand those references.

SQL Server version 7.0 introduces some significant changes in the internal data structures used to manage the allocation and reuse of pages. These data structures are not visible to end users, so these changes do not affect users other than by improving speed.

File Shrink

Laptop and desktop systems may have limited disk space, so database files are allowed to shrink automatically if the option is enabled. The server periodically checks the space usage of each database. If a database is found to have a lot of empty space, the size of its files is reduced. Both data and log files can be shrunk. This activity occurs in the background and does not affect user activity within the database. You can also use SQL Server Enterprise Manager or DBCC to shrink files individually or as a group.

Shrinking works by moving rows from pages at the end of the file to pages allocated earlier in the file. In an index, nodes are moved from the end of the file to pages at the beginning of the file. In both cases, pages are freed at the end of the file and then returned to the file system. Databases can be shrunk only to the point at which no free space remains; there is no data compression.
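A minimal sketch of the shrink interfaces, using the pubs sample database (target sizes are illustrative):

    -- Let the server shrink the database automatically in the background.
    EXEC sp_dboption 'pubs', 'autoshrink', 'true'

    -- Shrink a single file (by its logical name) to approximately 10 MB.
    DBCC SHRINKFILE (pubs, 10)

    -- Shrink the entire database, leaving 20 percent free space.
    DBCC SHRINKDATABASE (pubs, 20)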

File Grow

Automated file growth greatly reduces the need for database management and eliminates many problems that occur when logs or databases run out of space. When a database is created, an initial size for each file must be given. SQL Server creates the data files at the sizes provided by the database creator, and the files fill as data is added to the database. By default, data files are allowed to grow as much as necessary until disk space is exhausted. Alternatively, data files can be configured to grow automatically only to a predefined maximum size, which prevents disk drives from running out of space.

When creating a database, data files should be as large as possible, based on the maximum amount of data expected in the database. Permit the data files to grow automatically, but place a limit on the growth. If the initial data file size is exceeded and the file starts to grow automatically, reevaluate the expected maximum database size and plan accordingly by adding more disk space (if necessary), and creating and adding more files and/or filegroups to the database.

Databases can be prevented from growing beyond their initial size, and if the files fill, no more data can be added unless more data files are added. Allowing files to grow automatically can cause fragmentation of those files if a large number of files share the same disk. Therefore, it is recommended that files or filegroups be created on as many different local physical disks as available. Place objects that compete heavily for space in different filegroups.
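The growth limit and increment can also be changed after creation with ALTER DATABASE; a sketch assuming the illustrative Sales database and SalesData1 file from the earlier example:

    -- Cap automatic growth of the file at 500 MB.
    ALTER DATABASE Sales
    MODIFY FILE (NAME = SalesData1, MAXSIZE = 500MB)

    -- Grow in 50 MB increments; larger increments mean fewer,
    -- less fragmented extensions.
    ALTER DATABASE Sales
    MODIFY FILE (NAME = SalesData1, FILEGROWTH = 50MB)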

Physical Database Architecture

SQL Server version 7.0 introduces significant improvements in the way data is stored physically. These changes are largely transparent to general users, but do impact the setup and administration of SQL Server databases.

Pages and Extents

The fundamental unit of data storage in SQL Server is the page. In SQL Server version 7.0, the page size is 8K, increased from 2K. Each page begins with a 96-byte header used to store system information, such as the type of page, the amount of free space on the page, and the object ID of the object owning the page.

There are seven types of pages in the data files of a SQL Server 7.0 database:

  • Data: data rows with all data except text, ntext, and image

  • Index: index entries

  • Log: log records recording data changes for use in recovery

  • Text/Image: text, ntext, and image data

  • Global Allocation Map: information about allocated extents

  • Page Free Space: information about free space available on pages

  • Index Allocation Map: information about extents used by a table or index

Data pages contain all the data in data rows except text, ntext, and image data, which are stored in separate pages. Data rows are placed serially on the page, starting immediately after the header. A row offset table starts at the end of the page; it contains one entry for each row on the page, and each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.

In SQL Server 7.0, rows cannot span pages and the maximum amount of data contained in a single row is 8,060 bytes, not including text, ntext and image data.

Extents are the basic unit in which space is allocated to tables and indexes. An extent is eight contiguous pages, or 64K. To make space allocation efficient, SQL Server 7.0 does not allocate entire extents to tables with small amounts of data, so it has two types of extents. Uniform extents are owned by a single object; all eight pages in the extent can be used only by the owning object.

SQL Server 7.0 introduces a new mixed extent concept that works well for small applications. In all earlier releases, space was added to tables one extent at a time, which would create very large overhead for tiny tables now that pages are 8K. A mixed extent can be shared by up to eight objects, allowing a single page to be allocated to a small table or index. A new table or index is allocated pages from mixed extents; when it grows to eight pages, it switches to uniform extents.

Physical Database Files and Filegroups

Microsoft SQL Server 7.0 maps a database over a set of operating system files. Data and log information are never mixed on the same file, and individual files are used by one database only. This eliminates the need for managing logical devices, offers flexibility in the placement of database files and enables new backup and restore capabilities.

SQL Server 7.0 databases have three types of files:

  • The primary data file is the starting point of the database and points to the rest of the files in the database. Every database has one primary data file. The recommended file extension for primary data files is .mdf.

  • Secondary data files comprise all of the data files other than the primary data file. Some databases may not have any secondary data files, while others have multiple secondary data files. The recommended file extension for secondary data files is .ndf.

  • Log files hold all of the log information used to recover the database. There must be at least one log file for each database, although there can be more than one. The recommended file extension for log files is .ldf.

SQL Server 7.0 files can grow automatically from their originally specified size. When you define a file, you can specify a growth increment; each time the file fills, it increases its size by that increment. If there are multiple files in a filegroup, none of them grows automatically until all are full; growth then occurs among the files in round-robin fashion.

Each file can also have a maximum size specified. If a maximum size is not specified the file can continue to grow until it has used all available space on the disk. This feature is useful when SQL Server is used as a database embedded in an application where the user does not have ready access to a system administrator. The user can let the files grow automatically as needed to lessen the administrative burden of monitoring the amount of free space in the database and allocating additional space manually.

Torn Page Detection

Torn page detection helps ensure database consistency. In SQL Server 7.0, pages are 8K, while Windows NT does I/O in 512-byte segments. This discrepancy makes it possible for a page to be written partially, which could happen if there is a power failure or other problem between the time the first 512-byte segment is written and the completion of the full 8K of I/O.

If the first 512-byte segment is written, it can appear that the page has been updated when in fact it may not have been. (The timestamp for the page is in the header, in the first 96 bytes of the page.) There are several ways to deal with this. One way is to use battery-backed, cached I/O devices that guarantee all-or-nothing I/O; with such a system, torn page detection is unnecessary.

SQL Server can detect incomplete I/O by creating a mask of bits, one for each 512-byte segment in the page (sixteen for an 8K page). Every time a page is written, each bit is flipped from its previous on-disk state and the actual state is saved in the page header. If a page is read and a bit is in the wrong state, an I/O did not complete and the page is torn. This mechanism is less expensive than computing a checksum.

You can turn torn page detection on and off because the page header is marked when the bits are flipped. If torn page detection is turned on and then back off, the state of the pages that have been bit-flipped is observed and corrected the next time they are read.
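Torn page detection is exposed as a database option; a minimal sketch, again using the pubs sample database:

    -- Enable torn page detection; a partially written page is then
    -- detected the next time it is read.
    EXEC sp_dboption 'pubs', 'torn page detection', 'true'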

Locking Enhancements

Row Level Locking

SQL Server 6.5 introduced row locking on inserts. SQL Server 7.0 supports full row-level locking for both data rows and index entries, so transactions can update individual records without locking entire pages. Many OLTP applications can experience increased concurrency, especially when applications append rows to tables and indexes.

The lock manager dynamically adjusts the resources it uses for larger databases, eliminating the need to adjust the locks server configuration option manually. It chooses between page locking (preferable for table scans) and row-level locking (preferable for inserting, updating, and deleting data) automatically.

Dynamic Locking

SQL Server 7.0 has a superior locking mechanism that is unique in the database industry. At run time, the storage engine dynamically cooperates with the query processor to choose the lowest cost locking strategy, based on the characteristics of the schema and query.

Dynamic locking has the following advantages:

  • Simplified database administration, because database administrators no longer have to be concerned with adjusting lock escalation thresholds.

  • Increased performance, because SQL Server minimizes system overhead by using locks appropriate to the task.

  • Application developers can concentrate on development, because SQL Server adjusts locking automatically.

Multigranular locking allows different types of resources to be locked by a transaction. To minimize the cost of locking, SQL Server locks resources automatically at a level appropriate to the task. Locking at a smaller granularity, such as rows, increases concurrency but has a higher overhead because more locks must be held if many rows are locked. Locking at a larger granularity, such as tables, is expensive in terms of concurrency because locking an entire table restricts access to any part of the table by other transactions, but it has a lower overhead because fewer locks are maintained.

SQL Server can lock the following resources, listed from the finest granularity to the coarsest:

  • RID: a row identifier, used to lock a single row within a table.

  • Key: a row lock within an index, used to protect key ranges in serializable transactions.

  • Page: an 8K data page or index page.

  • Extent: a contiguous group of eight data pages or index pages.

  • Table: an entire table, including all data and indexes.

  • DB: an entire database.
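One way to observe these resource types in practice is the sp_lock system stored procedure:

    -- List locks currently held; the Type column reports the resource
    -- (DB, TAB, EXT, PAG, KEY, RID) and Mode reports the lock mode.
    EXEC sp_lock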

Lock Modes

SQL Server locks resources using different lock modes that determine how the resources can be accessed by concurrent transactions.

SQL Server uses the following resource lock modes.

  • Shared: Used for operations that do not change or update data (read-only operations), such as a SELECT statement.

  • Update: Used on resources that can be updated. Prevents a common form of deadlock that occurs when multiple sessions are reading, locking, and then potentially updating resources later.

  • Exclusive: Used for data modification operations, such as UPDATE, INSERT, or DELETE. Ensures that multiple updates cannot be made to the same resource at the same time.

  • Intent: Used to establish a lock hierarchy.

  • Schema: Used when an operation dependent on the schema of a table is executing. The two types of schema locks are schema stability and schema modification.

Shared Locks

Shared locks allow concurrent transactions to read a resource. No other transactions can modify the data while shared locks exist on the resource. Shared locks on a resource are released as soon as the data has been read, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared locks for the duration of the transaction.

Update Locks

Update locks prevent a common form of deadlock. If two transactions acquire shared-mode locks on a resource and then attempt to update data concurrently, both try to convert their locks to exclusive; each waits for the other to release its shared-mode lock, and a deadlock occurs. Update locks eliminate this problem because only one transaction at a time can hold an update lock on a resource.
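A sketch of the read-then-update pattern in Transact-SQL, using a hypothetical accounts table; the UPDLOCK hint acquires an update lock on the initial read, so two sessions running this code cannot deadlock:

    BEGIN TRANSACTION
        -- Update lock: other readers are not blocked, but no other
        -- transaction can take an update lock on the same rows.
        SELECT balance
        FROM accounts (UPDLOCK)
        WHERE account_id = 42

        UPDATE accounts
        SET balance = balance - 100
        WHERE account_id = 42
    COMMIT TRANSACTION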

Exclusive Locks

Exclusive locks prevent access to a resource by concurrent transactions. No other transactions can read or modify data locked with an exclusive lock.

Intent Locks

An intent lock indicates that SQL Server wants to acquire a shared or exclusive lock on some of the resources lower in the hierarchy. Intent locks improve performance because SQL Server only needs to examine intent locks at the table level to determine if a transaction can acquire a lock on that table safely. This removes the need to examine every row or page lock on the table to determine if a transaction can lock the entire table.

Schema Locks

Schema locks are taken when a data definition language (DDL) operation, such as adding a column or dropping a table, is being performed on a table. A second type of schema lock is taken while queries are compiled; this type does not block any transactional locks, including exclusive locks, so other transactions can run while a query is being compiled. However, DDL operations cannot be performed on the table during that time.

Base Table and Index Architecture

Overview

Fundamental changes were made in base table organization. This new organization allows the query processor to make use of more secondary indexes, greatly improving performance for decision support applications. The query optimizer has a wide set of execution strategies and many of the optimization limitations of earlier versions of SQL Server have been removed. In particular, SQL Server 7.0 is less sensitive to index-selection issues, resulting in less tuning work.

Table Organization

The data for each table is now stored in a collection of 8K data pages. Each data page has a 96-byte header containing system information, such as the ID of the table that owns the page and pointers to the next and previous pages for pages linked in a list. A row offset table is at the end of the page, and data rows fill the rest of the page.

SQL Server 7.0 tables use one of two methods to organize their data pages:

  • Clustered tables are tables that have a clustered index. The data rows are stored in order based on the clustered index key, and the data pages are linked in a doubly-linked list. The index is implemented as a B-tree structure that supports fast retrieval of rows based on their clustered index key values.

  • Heaps are tables that have no clustered index. The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list.

Table Indexes

A SQL Server index is a structure associated with a table that speeds retrieval of the rows in the table. An index contains keys built from one or more columns in the table, stored in a structure that allows SQL Server to find the row or rows associated with the key values quickly and efficiently. If a table is created without a clustered index, its data rows are not stored in any particular order; this structure is called a heap. The two types of SQL Server indexes are clustered and nonclustered.

Clustered Indexes

A clustered index is one in which the order of the values in the index is the same as the order of the data stored in the table. (To some degree, the storage of the data is the index.) A clustered index is analogous to the indexing method used in a telephone directory. In a telephone directory, the data is arranged by the last name of each entry, and the location of each entry is determined by the last name.

Because the clustered index dictates the storage of the table's data, a table can contain only one clustered index; however, the index can be created on multiple columns. For example, a telephone directory is organized by the last name but within the last name organization, it is organized by first name when there is more than one entry with the same last name.

The clustered index contains a hierarchical tree that records the range of key values stored in each area of the index. When searching for data based on a clustered index value, SQL Server quickly isolates the page with the specified value and then searches the page for the record or records with that value. The lowest level, or leaf node, of the index tree is the page that contains the data.

Nonclustered Indexes

A nonclustered index is analogous to an index in a textbook: the data is stored in one place and the index in another, with pointers to the storage locations of the data. The lowest level, or leaf node, of a nonclustered index contains the storage location of the data (a page number and an offset within the page) rather than the data itself. Therefore, a nonclustered index, in comparison with a clustered index, has an extra level between the index structure and the data itself.

When SQL Server searches for data based on a nonclustered index, it searches the index for the specified value to obtain the location of the rows of data, and then retrieves the data directly from their storage locations. This makes nonclustered indexes the optimal choice for exact match queries.

Some books contain multiple indexes. For example, a gardening book can contain one index for the common names of plants and another index for the scientific names because these are the two most common ways in which the readers find information. The same is true for nonclustered indexes. You can define a nonclustered index for each of the columns commonly used to find the data in the table.

Because nonclustered indexes store clustered index keys as their pointers to data rows, it is important to keep clustered index keys as small as possible. Avoid choosing large columns as the keys to clustered indexes if a table also has nonclustered indexes.

SQL Server supports up to 249 nonclustered indexes on each table. Nonclustered indexes have a B-tree structure similar to that of clustered indexes; the difference is that nonclustered indexes have no effect on the order of the data rows. The collection of data pages for a heap is not affected when nonclustered indexes are defined for the table.
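As a brief sketch, using the illustrative orders table from the filegroup example earlier:

    -- Only one clustered index per table: the data itself is stored
    -- in order of order_id.
    CREATE CLUSTERED INDEX ci_orders ON orders (order_id)

    -- One of up to 249 nonclustered indexes: a separate B-tree whose
    -- leaf entries point at the data rows.
    CREATE NONCLUSTERED INDEX nc_orders_customer ON orders (customer_id)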

Distribution Statistics

All indexes have distribution statistics that describe the selectivity and distribution of the key values in the index. Selectivity relates to how many rows are typically identified by a key value: a unique key has high selectivity, while a key value found in 1,000 rows has poor selectivity. SQL Server uses the selectivity and distribution statistics to optimize its navigation through tables when processing Transact-SQL statements. The statistics for each index are not limited to a single page; they are stored as a long string of bits across multiple pages, in the same way image data is stored.
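SQL Server 7.0 maintains these statistics automatically, but they can also be refreshed and examined manually; a sketch using the illustrative index above:

    -- Rebuild distribution statistics for one index.
    UPDATE STATISTICS orders nc_orders_customer

    -- Display the statistics, including density (selectivity) details.
    DBCC SHOW_STATISTICS (orders, nc_orders_customer)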

Data Type Changes

Unicode Data

SQL Server supports Unicode data types, which makes it easier to store data in multiple languages within one database by eliminating the problem of converting characters and installing multiple code pages. Unicode stores character data using 2 bytes for each character rather than 1 byte. There are 65,536 different bit patterns in 2 bytes, so Unicode can use one standard set of bit patterns to encode each character in all languages, including languages such as Chinese that have large numbers of characters. Programming languages also support Unicode data types.

The fact that Unicode data needs twice as much storage space is offset by the elimination of the need to convert extended characters between code pages. The new data types that support Unicode are ntext, nchar, and nvarchar. They are the same as text, char, and varchar, except for the wider range of characters supported and the increased storage space used.

Traditional, non-Unicode data types in SQL Server allow the use of characters which are defined by a particular character set. A character set is chosen during SQL Server Setup and is immutable throughout the lifetime of the installation. Using Unicode data types, a column can store any character that is defined by the Unicode Standard, which includes all of the characters that are defined in the various character sets. Using Unicode data types takes twice as much storage space as non-Unicode data types.

Improved Data Storage

Data storage flexibility is improved greatly with the expansion of the maximum size of the char, varchar, binary, and varbinary data types to 8,000 bytes, increased from 255 bytes. It is no longer necessary to use the text and image data types for anything but very large values. The Transact-SQL string functions also support these very long char and varchar values, and the SUBSTRING function can be used to process text and image columns. The handling of nulls and empty strings has been improved, and a new uniqueidentifier data type is provided for storing a globally unique identifier (GUID).
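A short sketch showing the new limits and types together (table and column names are illustrative):

    CREATE TABLE customers
        (customer_id uniqueidentifier NOT NULL DEFAULT NEWID(), -- server-generated GUID
         name        nvarchar(100)    NOT NULL,  -- Unicode: 2 bytes per character
         notes       varchar(8000)    NULL)      -- limited to 255 bytes before 7.0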

Text and Image Enhancements

SQL Server has a solid foundation for building object-relational features. One immediate enhancement is for text and image storage, which uses a new, efficient data structure.

The SQL Server 7.0 ntext, text, and image data types are capable of holding extremely large amounts of data (up to 2 GB) in a single value. A single value is typically larger than an application can retrieve in one step; some values may be larger than the virtual memory available on the client. Special steps are therefore usually needed to retrieve these values. If a value is no longer than a Unicode, character, or binary string (4,000 characters, 8,000 characters, or 8,000 bytes, respectively), it can be referenced in SELECT, UPDATE, and INSERT statements much the same way as the smaller data types.

Text, ntext, and image values are not stored as part of the data row but in a separate collection of pages of their own. For each of these values, all that is stored in the data row is a 16-byte pointer. For each row, this pointer points to the location of data. A row containing multiple text, ntext, or image columns has one pointer for each column.

The data is stored in a collection of 8K pages that are not necessarily located next to each other. In SQL Server 7.0, the pages are organized logically in a B-tree structure, while in earlier versions of SQL Server they were linked together in a page chain. The advantage of the SQL Server 7.0 method is that operations starting in the middle of the string are more efficient: SQL Server 7.0 can navigate the B-tree quickly, while earlier versions had to scan through the page chain. The structure of the B-tree differs slightly depending on whether the value is less than or more than 32K.
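Large values are read through a text pointer; a minimal sketch against the pubs sample database:

    DECLARE @ptr varbinary(16)

    -- Obtain a pointer to the text value in one row.
    SELECT @ptr = TEXTPTR(pr_info)
    FROM pub_info
    WHERE pub_id = '0736'

    -- Read 100 bytes starting at offset 0. With the 7.0 B-tree
    -- organization, mid-string offsets are also reached efficiently.
    READTEXT pub_info.pr_info @ptr 0 100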

Transaction Log Management

Overview

Transaction logs assist in recovering database integrity in the event of system failure. The log records for a database are maintained in one or more operating system files, called log files, that hold a serial recording of all modifications that have occurred in the database and which transaction performed each modification.

The log grows continuously as logged operations occur in the database. For some large operations, it records only the fact that the operation took place. The log also records the commit or rollback of each transaction, which allows SQL Server either to roll forward or to back out each transaction:

  • Rolling a transaction back occurs when backing out of an incomplete transaction. SQL Server restores the database to the condition prior to the time the transaction began by reversing the sequence of alterations.

  • Rolling a transaction forward occurs when restoring a transaction log. Modifications to the database are applied in the same sequence in which they originally occurred, and at the end of this process the database is in the same state it was in at the time the log was backed up.

The log manager in SQL Server 7.0 is improved over version 6.5. The new log manager:

  • Does not compete with data for buffer cache pages.

  • Can be spread over one or more physical files.

  • Grows and shrinks automatically.

  • Allows for quick and non-intrusive truncation.

  • Enables larger I/Os.

Log Manager Architecture

A SQL Server 7.0 log consists of one or more physical files that only contain log entries. Previously, the log was a system table that used ordinary database pages. These log pages were allocated and deallocated just like pages of other tables, and they competed with data pages for space in the memory cache.

Each log file is divided logically into smaller segments called virtual log files. Virtual log files are the unit of truncation for the transaction log. When a virtual log file no longer contains log records for active transactions it can be truncated and the space becomes available to log new transactions.

SQL Server tries to avoid having lots of very small virtual log files. The number of virtual log files grows more slowly than their size. If a log file grows in small increments it will tend to have many small virtual log files. If the log file grows in larger increments SQL Server will create a smaller number of larger virtual log files.

As records are written to the log, the end of the log grows from one virtual log file to the next. If there is more than one physical log file for a database the end of the log grows through each virtual file in each physical file before circling back to the first virtual log file in the first physical file.

The smallest size for a virtual log file is 256K. The minimum size for a transaction log is 512K, which gives you two virtual log files of 256K each. The number and size of the virtual log files in a transaction log both increase as the size of the log file increases. A small log file might have a small number of small virtual log files. A very large log file will have larger virtual log files.

Logs can now grow and shrink automatically. Internally, the log manager divides the physical log files into virtual log files, which are considered either active or reusable. If no reusable space is available, the log is extended by adding another virtual log file. A virtual log file can be backed up if it does not contain any portion of the active log, and it can be reused once it has been backed up.
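A sketch of routine log maintenance, assuming the illustrative Sales database from earlier (the backup path is also illustrative):

    -- Back up the log; inactive virtual log files can then be reused.
    BACKUP LOG Sales TO DISK = 'G:\Backup\SalesLog.bak'

    -- Report current log size and percentage in use for each database.
    DBCC SQLPERF (LOGSPACE)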

Memory, Buffering and Read-Ahead

Automatic Memory Management

SQL Server changes its memory requirements dynamically based on available system resources (or it can be fixed manually if preferred). This mechanism allows the best use to be made of system resources.

Several competing subsystems in SQL Server must share the available memory. The log and recovery systems need memory to read and undo pages, and the query processor uses memory to perform hashing and sorting. Other consumers include the procedure cache, the buffer pool, the lock manager, and various data structures. In SQL Server 7.0 these systems allocate the memory they need dynamically and give memory back when it is no longer needed.

SQL Server starts with a default memory value. As other applications start, the system is queried periodically to determine the amount of free physical memory available. SQL Server grows or shrinks the buffer cache to keep free physical memory at approximately 5 MB, preventing Windows NT from paging. If less than 5 MB is free, SQL Server releases memory to Windows NT, where it usually goes on the free list; if more than 5 MB is free, SQL Server recommits memory to the buffer cache. SQL Server adds memory to the buffer cache only when its workload requires more memory; a server at rest does not grow its buffer cache.
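When fixed memory is preferred over dynamic management, the buffer cache can be bounded through server options; a sketch with illustrative values, in megabytes:

    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE

    -- Pin the buffer cache between 128 MB and 256 MB instead of letting
    -- SQL Server negotiate memory with Windows NT dynamically.
    EXEC sp_configure 'min server memory', 128
    EXEC sp_configure 'max server memory', 256
    RECONFIGURE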

Buffer Management and I/O

Buffer management I/O has changed: it now uses a clocking mechanism instead of a least-recently-used list. The clocking mechanism requires less synchronization, which improves scaling on large SMP systems; there is not much effect on small systems. Query performance is improved because the buffer manager makes use of both parallel I/O and large I/O on the files that make up a database object.

Read-Ahead

The read-ahead mechanism is improved and simplified significantly. Because hints and directions now flow directly between the query processor and the storage engine, more than 1,000 lines of code for a separate read-ahead mechanism were eliminated, along with its configuration parameters. Separate read-ahead threads are gone, minimizing context switches. The new allocation data structures allow reading ahead without following a page chain, and the query processor can assist read-ahead by using the middle tier of an index to predict the next pages of an index scan (including clustered table scans).

The parallel read-ahead mechanism allows the disk subsystem to work at maximum speed. The inputs and outputs are decoupled: the mechanism pulls in data pages and passes them off to the next available CPU (on multiple-CPU systems), and I/Os are issued simultaneously against multiple files.

Innovation and Evolution

SQL Server 7.0 is a defining release for Microsoft's database products, building on the solid foundation established by SQL Server 6.5. More than a dozen new database technology patents were submitted for SQL Server 7.0. Important areas of leadership and innovation in Microsoft SQL Server 7.0 include:

  • First database to scale from the laptop to the enterprise using the same code base, offering 100 percent code compatibility.

  • First database to support auto-configuration and self-tuning.

  • First database with an integrated OLAP server.

  • First database with integrated Data Transformation Services.

  • The Data Warehousing Framework is the first comprehensive approach to solving the metadata problem.

  • First database to provide multiserver management for large numbers of servers.

  • Widest array of replication options of any database.

  • Best integration with Windows NT Server, Microsoft Office and BackOffice.

  • Universal Data Access, Microsoft's strategy for enabling high-performance access to a variety of information sources.

Microsoft listens closely to customer requests for new and enhanced features. The new storage engine architecture was designed to support customer applications well into the next century. Our design teams have worked hard to implement clean subsystems so components can be improved and replaced easily over time. The storage engine is designed for both general extensibility and for specific features that are already planned for future releases.