SQL Server 索引体系结构和设计指南SQL Server Index Architecture and Design Guide

适用对象: yesSQL ServeryesAzure SQL 数据库yesAzure SQL 数据仓库yes并行数据仓库APPLIES TO: yesSQL Server yesAzure SQL Database yesAzure SQL Data Warehouse yesParallel Data Warehouse

索引设计不佳和缺少索引是提高数据库和应用程序性能的主要障碍。Poorly designed indexes and a lack of indexes are primary sources of database application bottlenecks. 设计高效的索引对于获得良好的数据库和应用程序性能极为重要。Designing efficient indexes is paramount to achieving good database and application performance. 在索引体系结构上的本 SQL ServerSQL Server 索引设计指南包含的信息和最佳做法可帮助设计满足应用程序需要的高效索引。This SQL ServerSQL Server index design guide contains information on index architecture, and best practices to help you design effective indexes to meet the needs of your application.

本指南假定读者对 SQL ServerSQL Server中提供的索引类型有一般了解。This guide assumes the reader has a general understanding of the index types available in SQL ServerSQL Server. 有关索引类型的一般说明,请参阅 索引类型For a general description of index types, see Index Types.

本指南涉及以下类型的索引:This guide covers the following types of indexes:

  • 聚集Clustered
  • 非聚集Nonclustered
  • 唯一Unique
  • 筛选Filtered
  • columnstoreColumnstore
  • 哈希Hash
  • 内存优化非聚集索引Memory-Optimized Nonclustered

有关 XML 索引的信息,请参阅 XML 索引概述For information about XML indexes, see XML Indexes Overview.

有关空间索引的信息,请参阅空间索引概述For information about Spatial indexes, see Spatial Indexes Overview.

有关全文索引的信息,请参阅填充全文索引For information about Full-text indexes, see Populate Full-Text Indexes.

索引设计基础知识Index Design Basics

索引是与表或视图关联的磁盘上或内存中结构,可以加快从表或视图中检索行的速度。An index is an on-disk or in-memory structure associated with a table or view that speeds retrieval of rows from the table or view. 索引包含由表或视图中的一列或多列生成的键。An index contains keys built from one or more columns in the table or view. 对于磁盘上索引,这些键存储在某个结构(B 树)中,使 SQL Server 可以快速高效地找到与键值关联的行。For on-disk indexes, these keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.

索引在逻辑上以组织为包含行和列的表存储数据;在物理上以按行数据格式(称为行存储1),或以按列数据格式(称为列存储)存储数据 。An index stores data logically organized as a table with rows and columns, and physically stored in a row-wise data format called rowstore 1, or stored in a column-wise data format called columnstore.

为数据库及其工作负荷选择正确的索引是一项需要在查询速度与更新所需开销之间取得平衡的复杂任务。The selection of the right indexes for a database and its workload is a complex balancing act between query speed and update cost. 如果索引较窄,或者说索引关键字中只有很少的几列,则需要的磁盘空间和维护开销都较少。Narrow indexes, or indexes with few columns in the index key, require less disk space and maintenance overhead. 而另一方面,宽索引可覆盖更多的查询。Wide indexes, on the other hand, cover more queries. 您可能需要试验若干不同的设计,才能找到最有效的索引。You may have to experiment with several different designs before finding the most efficient index. 可以添加、修改和删除索引而不影响数据库架构或应用程序设计。Indexes can be added, modified, and dropped without affecting the database schema or application design. 因此,应试验多个不同的索引而无需犹豫。Therefore, you should not hesitate to experiment with different indexes.

SQL ServerSQL Server 中的查询优化器可在大多数情况下可靠地选择最高效的索引。The query optimizer in SQL ServerSQL Server reliably chooses the most effective index in the vast majority of cases. 总体索引设计策略应为查询优化器提供可供选择的多个索引,并依赖查询优化器做出正确的决定。Your overall index design strategy should provide a variety of indexes for the query optimizer to choose from and trust it to make the right decision. 这在多种情况下可减少分析时间并获得良好的性能。This reduces analysis time and produces good performance over a variety of situations. 若要查看查询优化器对特定查询使用的索引,请在 SQL Server Management StudioSQL Server Management Studio 中的“查询” 菜单上选择“包括实际的执行计划” 。To see which indexes the query optimizer uses for a specific query, in SQL Server Management StudioSQL Server Management Studio, on the Query menu, select Include Actual Execution Plan.

不要总是将索引的使用等同于良好的性能,或者将良好的性能等同于索引的高效使用。Do not always equate index usage with good performance, and good performance with efficient index use. 如果只要使用索引就能获得最佳性能,那查询优化器的工作就简单了。If using an index always helped produce the best performance, the job of the query optimizer would be simple. 但事实上,不正确的索引选择并不能获得最佳性能。In reality, an incorrect index choice can cause less than optimal performance. 因此,查询优化器的任务是只在索引或索引组合能提高性能时才选择它,而在索引检索有碍性能时则避免使用它。Therefore, the task of the query optimizer is to select an index, or combination of indexes, only when it will improve performance, and to avoid indexed retrieval when it will hinder performance.

1 行存储是存储关系表数据的传统方法。1 Rowstore has been the traditional way to store relational table data. SQL ServerSQL Server 中,行存储是指基础数据存储格式为堆、B 树(聚集索引)或内存优化表的表。In SQL ServerSQL Server, rowstore refers to table where the underlying data storage format is a heap, a B-tree (clustered index), or a memory-optimized table.

索引设计任务Index Design Tasks

建议的索引设计策略包括以下任务:The follow tasks make up our recommended strategy for designing indexes:

  1. 了解数据库本身的特征。Understand the characteristics of the database itself.

    • 例如,数据库是否是频繁修改数据的联机事务处理 (OLTP) 数据库,必须可承受高吞吐量。For example, is it an online transaction processing (OLTP) database with frequent data modifications that must sustain a high throughput. SQL Server 2014 (12.x)SQL Server 2014 (12.x) 开始,内存优化表和索引提供无闩锁设计,尤其适用于此应用场景。Starting with SQL Server 2014 (12.x)SQL Server 2014 (12.x), memory-optimized tables and indexes are especially appropriate for this scenario, by providing a latch-free design. 有关详细信息,请参阅本指南中的内存优化表的索引内存优化表的非聚集索引设计指南内存优化表的哈希索引设计指南For more information, see Indexes for Memory-Optimized Tables, or Nonclustered Index for Memory-Optimized Tables Design Guidelines and Hash Index for Memory-Optimized Tables Design Guidelines in this guide.
    • 或者,数据库是否是一种决策支持系统 (DSS) 或数据仓库 (OLAP) 数据库,必须快速处理超大型数据集。Or an example of a Decision Support System (DSS) or data warehousing (OLAP) database that must process very large data sets quickly. SQL Server 2012 (11.x)SQL Server 2012 (11.x) 开始,列存储索引尤其适用于典型的数据仓库数据集。Starting with SQL Server 2012 (11.x)SQL Server 2012 (11.x), columnstore indexes are especially appropriate for typical data warehousing data sets. 列存储索引可以通过为常见数据仓库查询(如筛选、聚合、分组和星型联接查询)提供更快的性能,以转变用户的数据仓库体验。Columnstore indexes can transform the data warehousing experience for users by enabling faster performance for common data warehousing queries such as filtering, aggregating, grouping, and star-join queries. 有关详细信息,请参阅本指南中的列存储索引概述列存储索引设计指南For more information, see Columnstore Indexes overview, or Columnstore Index Design Guidelines in this guide.
  2. 了解最常用的查询的特征。Understand the characteristics of the most frequently used queries. 例如,了解到最常用的查询联接两个或多个表将有助于决定要使用的最佳索引类型。For example, knowing that a frequently used query joins two or more tables will help you determine the best type of indexes to use.

  3. 了解查询中使用的列的特征。Understand the characteristics of the columns used in the queries. 例如,某个索引对于含有整数数据类型同时还是唯一的或非空的列是理想索引。For example, an index is ideal for columns that have an integer data type and are also unique or nonnull columns. 对于具有定义完善的数据子集的列,您可以在 SQL Server 2008SQL Server 2008 和更高版本中使用筛选索引。For columns that have well-defined subsets of data, you can use a filtered index in SQL Server 2008SQL Server 2008 and higher versions. 有关详细信息,请参阅本指南中的 筛选索引设计指南For more information, see Filtered Index Design Guidelines in this guide.

  4. 确定哪些索引选项可在创建或维护索引时提高性能。Determine which index options might enhance performance when the index is created or maintained. 例如,对某个现有大型表创建聚集索引将会受益于 ONLINE 索引选项。For example, creating a clustered index on an existing large table would benefit from the ONLINE index option. ONLINE 选项允许在创建索引或重新生成索引时继续对基础数据执行并发活动。The ONLINE option allows for concurrent activity on the underlying data to continue while the index is being created or rebuilt. 有关详细信息,请参阅 设置索引选项For more information, see Set Index Options.

  5. 确定索引的最佳存储位置。Determine the optimal storage location for the index. 非聚集索引可以与基础表存储在同一个文件组中,也可以存储在不同的文件组中。A nonclustered index can be stored in the same filegroup as the underlying table, or on a different filegroup. 索引的存储位置可通过提高磁盘 I/O 性能来提高查询性能。The storage location of indexes can improve query performance by increasing disk I/O performance. 例如,将非聚集索引存储在表文件组所在磁盘以外的某个磁盘上的一个文件组中可以提高性能,因为可以同时读取多个磁盘。For example, storing a nonclustered index on a filegroup that is on a different disk than the table filegroup can improve performance because multiple disks can be read at the same time.
    或者,聚集索引和非聚集索引也可以使用跨越多个文件组的分区方案。Alternatively, clustered and nonclustered indexes can use a partition scheme across multiple filegroups. 在维护整个集合的完整性时,使用分区可以快速而有效地访问或管理数据子集,从而使大型表或索引更易于管理。Partitioning makes large tables or indexes more manageable by letting you access or manage subsets of data quickly and efficiently, while maintaining the integrity of the overall collection. 有关详细信息,请参阅 Partitioned Tables and IndexesFor more information, see Partitioned Tables and Indexes. 在考虑分区时,应确定是否应对齐索引,即,是按实质上与表相同的方式进行分区,还是单独分区。When you consider partitioning, determine whether the index should be aligned, that is, partitioned in essentially the same manner as the table, or partitioned independently.

常规索引设计指南General Index Design Guidelines

经验丰富的数据库管理员能够设计出好的索引集,但是,即使对于不特别复杂的数据库和工作负荷来说,这项任务也十分复杂、耗时和易于出错。Experienced database administrators can design a good set of indexes, but this task is very complex, time-consuming, and error-prone even for moderately complex databases and workloads. 了解数据库、查询和数据列的特征可以帮助您设计出最佳索引。Understanding the characteristics of your database, queries, and data columns can help you design optimal indexes.

数据库注意事项Database Considerations

设计索引时,应考虑以下数据库准则:When you design an index, consider the following database guidelines:

  • 对表编制大量索引会影响 INSERTUPDATEDELETEMERGE 语句的性能,因为当表中的数据更改时,所有索引都须适当调整。Large numbers of indexes on a table affect the performance of INSERT, UPDATE, DELETE, and MERGE statements because all indexes must be adjusted appropriately as data in the table changes. 例如,如果在多个索引中使用了某个列,并且执行了修改该列数据的 UPDATE 语句,则必须更新包含该列的每个索引以及基础的基表(堆或聚集索引)中的该列。For example, if a column is used in several indexes and you execute an UPDATE statement that modifies that column's data, each index that contains that column must be updated as well as the column in the underlying base table (heap or clustered index).

    • 避免对经常更新的表进行过多的索引,并且索引应保持较窄,就是说,列要尽可能少。Avoid over-indexing heavily updated tables and keep indexes narrow, that is, with as few columns as possible.

    • 使用多个索引可以提高更新少而数据量大的查询的性能。Use many indexes to improve query performance on tables with low update requirements, but large volumes of data. 大量索引可以提高不修改数据的查询(例如 SELECT 语句)的性能,因为查询优化器有更多的索引可供选择,从而可以确定最快的访问方法。Large numbers of indexes can help the performance of queries that do not modify data, such as SELECT statements, because the query optimizer has more indexes to choose from to determine the fastest access method.

  • 对小表进行索引可能不会产生优化效果,因为查询优化器在遍历用于搜索数据的索引时,花费的时间可能比执行简单的表扫描还长。Indexing small tables may not be optimal because it can take the query optimizer longer to traverse the index searching for data than to perform a simple table scan. 因此,小表的索引可能从来不用,但仍必须在表中的数据更改时进行维护。Therefore, indexes on small tables might never be used, but must still be maintained as data in the table changes.

  • 视图包含聚合、表联接或聚合和联接的组合时,视图的索引可以显著地提升性能。Indexes on views can provide significant performance gains when the view contains aggregations, table joins, or a combination of aggregations and joins. 若要使查询优化器使用视图,并不一定非要在查询中显式引用该视图。The view does not have to be explicitly referenced in the query for the query optimizer to use it.

  • 使用数据库引擎优化顾问来分析数据库并生成索引建议。Use the Database Engine Tuning Advisor to analyze your database and make index recommendations. 有关详细信息,请参阅 Database Engine Tuning AdvisorFor more information, see Database Engine Tuning Advisor.

查询注意事项Query Considerations

设计索引时,应考虑以下查询准则:When you design an index, consider the following query guidelines:

  • 为经常用于查询中的谓词和联接条件的列创建非聚集索引。Create nonclustered indexes on the columns that are frequently used in predicates and join conditions in queries. 这些是你的 SARGable1 列。These are your SARGable1 columns. 但是,应避免添加不必要的列。However, you should avoid adding unnecessary columns. 添加太多索引列可能对磁盘空间和索引维护性能产生负面影响。Adding too many index columns can adversely affect disk space and index maintenance performance.

  • 涵盖索引可以提高查询性能,因为符合查询要求的全部数据都存在于索引本身中。Covering indexes can improve query performance because all the data needed to meet the requirements of the query exists within the index itself. 也就是说,只需要索引页,而不需要表的数据页或聚集索引来检索所需数据,因此,减少了总体磁盘 I/O。That is, only the index pages, and not the data pages of the table or clustered index, are required to retrieve the requested data; therefore, reducing overall disk I/O. 例如,对某一表(其中对列 abc创建了组合索引)的列 ab 的查询,仅仅从该索引本身就可以检索指定数据。For example, a query of columns a and b on a table that has a composite index created on columns a, b, and c can retrieve the specified data from the index alone.

    重要

    覆盖索引是针对非聚集索引的指定,它直接解析一个或几个类似的查询结果,而不访问其基表,并且不会引发查找。Covering indexes are the designation for a nonclustered index that resolves one or several similar query results directly with no access to its base table, and without incurring in lookups. 此类索引在叶级别上具有所有必要的非 SARGable 列。Such indexes have all the necessary non-SARGable columns in its leaf level. 这意味着,由 SELECT 子句以及所有 WHERE 和 JOIN 参数返回的列都被索引所覆盖。This means that the columns returned by either the SELECT clause and all the WHERE and JOIN arguments are covered by the index. 当与表本身的行和列相比,如果索引足够窄,那么执行查询的 I/O 可能会少得多,这意味着它是总列的一个真正子集。There is potentially much less I/O to execute the query, if the index is narrow enough when compared to the rows and columns in the table itself, meaning it is a real sub-set of the total columns. 如果选择大型表的一小部分,请考虑覆盖索引,其中的小部分是由一个固定谓词定义,比如一个稀疏列,例如它只包含几个非 NULL 值。Consider covering indexes when selecting a small portion of a large table, and where that small portion is defined by a fixed predicate, such as sparse columns that contain only a few non-NULL values, for example.

  • 将插入或修改尽可能多的行的查询写入单个语句内,而不要使用多个查询更新相同的行。Write queries that insert or modify as many rows as possible in a single statement, instead of using multiple queries to update the same rows. 仅使用一个语句,就可以利用优化的索引维护。By using only one statement, optimized index maintenance could be exploited.

  • 评估查询类型以及如何在查询中使用列。Evaluate the query type and how columns are used in the query. 例如,在完全匹配查询类型中使用的列就适合用于非聚集索引或聚集索引。For example, a column used in an exact-match query type would be a good candidate for a nonclustered or clustered index.

1 “SARGable”一词在关系数据库中指的是一个搜索可论证 的谓词,它可以利用一个索引来加快查询的执行过程。1 The term SARGable in relational databases refers to a Search ARGument-able predicate that can leverage an index to speed up the execution of the query.

列注意事项Column Considerations

设计索引时,应考虑以下列准则:When you design an index consider the following column guidelines:

  • 对于聚集索引,请保持较短的索引键长度。Keep the length of the index key short for clustered indexes. 另外,对唯一列或非空列创建聚集索引可以使聚集索引获益。Additionally, clustered indexes benefit from being created on unique or nonnull columns.

  • 无法指定 ntexttextimagevarchar(max)nvarchar(max)varbinary(max) 数据类型的列为索引键列。Columns that are of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns. 不过, varchar(max)nvarchar(max)varbinary(max)xml 数据类型的列可以作为非键索引列参与非聚集索引。However, varchar(max), nvarchar(max), varbinary(max), and xml data types can participate in a nonclustered index as nonkey index columns. 有关详细信息,请参阅本指南中的 具有包含列的索引For more information, see the section 'Index with Included Columns' in this guide.

  • xml 数据类型的列只能在 XML 索引中用作键列。An xml data type can only be a key column only in an XML index. 有关详细信息,请参阅 XML 索引 (SQL Server)For more information, see XML Indexes (SQL Server). SQL Server 2012 SP1 引入了称作选择性 XML 索引的一种新的 XML 索引。SQL Server 2012 SP1 introduces a new type of XML index known as a Selective XML Index. 这个新的索引可提高 SQL Server 中针对作为 XML 存储的数据的查询性能,从而通过降低索引本身的存储成本来加快大型 XML 数据工作负荷的索引编制和改进可伸缩性。This new index can improve querying performance over data stored as XML in SQL Server, allow for much faster indexing of large XML data workloads, and improve scalability by reducing storage costs of the index itself. 有关详细信息,请参阅选择性 XML 索引 (SXI)For more information, see Selective XML Indexes (SXI).

  • 检查列的唯一性。Examine column uniqueness. 在同一个列组合的唯一索引而不是非唯一索引提供了有关使索引更有用的查询优化器的附加信息。A unique index instead of a nonunique index on the same combination of columns provides additional information for the query optimizer that makes the index more useful. 有关详细信息,请参阅本指南中的 唯一索引设计指南For more information, see Unique Index Design Guidelines in this guide.

  • 在列中检查数据分布。Examine data distribution in the column. 通常情况下,为包含很少唯一值的列创建索引或在这样的列上执行联接将导致长时间运行的查询。Frequently, a long-running query is caused by indexing a column with few unique values, or by performing a join on such a column. 这是数据和查询的基本问题,通常不识别这种情况就无法解决这类问题。This is a fundamental problem with the data and query, and generally cannot be resolved without identifying this situation. 例如,如果物理电话簿按姓的字母顺序排序,而城市里所有人的姓都是 Smith 或 Jones,则无法快速找到某个人。For example, a physical telephone directory sorted alphabetically on last name will not expedite locating a person if all people in the city are named Smith or Jones. 有关数据分布的详细信息,请参阅 统计信息For more information about data distribution, see Statistics.

  • 考虑对具有定义完善的子集的列(例如,稀疏列、大部分值为 NULL 的列、含各类值的列以及含不同范围的值的列)使用筛选索引。Consider using filtered indexes on columns that have well-defined subsets, for example sparse columns, columns with mostly NULL values, columns with categories of values, and columns with distinct ranges of values. 设计良好的筛选索引可以提高查询性能,降低索引维护成本和存储成本。A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce storage costs.

  • 如果索引包含多个列,则应考虑列的顺序。Consider the order of the columns if the index will contain multiple columns. 用于等于 (=)、大于 (>)、小于 (<) 或 BETWEEN 搜索条件的 WHERE 子句或者参与联接的列应该放在最前面。The column that is used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or participates in a join, should be placed first. 其他列应该基于其非重复级别进行排序,就是说,从最不重复的列到最重复的列。Additional columns should be ordered based on their level of distinctness, that is, from the most distinct to the least distinct.

    例如,如果将索引定义为 LastNameFirstName ,则该索引在搜索条件为 WHERE LastName = 'Smith'WHERE LastName = Smith AND FirstName LIKE 'J%'时将很有用。For example, if the index is defined as LastName, FirstName the index will be useful when the search criterion is WHERE LastName = 'Smith' or WHERE LastName = Smith AND FirstName LIKE 'J%'. 不过,查询优化器不会将此索引用于基于 FirstName (WHERE FirstName = 'Jane')而搜索的查询。However, the query optimizer would not use the index for a query that searched only on FirstName (WHERE FirstName = 'Jane').

  • 考虑对计算列进行索引。Consider indexing computed columns. 有关详细信息,请参阅 计算列上的索引For more information, see Indexes on Computed Columns.

索引的特征Index Characteristics

在确定某一索引适合某一查询之后,可以选择最适合具体情况的索引类型。After you have determined that an index is appropriate for a query, you can select the type of index that best fits your situation. 索引包含以下特性:Index characteristics include the following:

  • 聚集还是非聚集Clustered versus nonclustered
  • 唯一还是非唯一Unique versus nonunique
  • 单列还是多列Single column versus multicolumn
  • 索引中的列是升序排序还是降序排序Ascending or descending order on the columns in the index
  • 非聚集索引是全表还是经过筛选Full-table versus filtered for nonclustered indexes
  • 列存储与行存储Columnstore versus rowstore
  • 内存优化表的哈希索引与非聚集索引Hash versus nonclustered for Memory-Optimized tables

您也可以通过设置选项(例如 FILLFACTOR)自定义索引的初始存储特征以优化其性能或维护。You can also customize the initial storage characteristics of the index to optimize its performance or maintenance by setting an option such as FILLFACTOR. 而且,通过使用文件组或分区方案可以确定索引存储位置来优化性能。Also, you can determine the index storage location by using filegroups or partition schemes to optimize performance.

文件组或分区方案的索引设置Index Placement on Filegroups or Partitions Schemes

开发索引设计策略时,应该考虑在与数据库相关联的文件组上放置索引。As you develop your index design strategy, you should consider the placement of the indexes on the filegroups associated with the database. 仔细选择文件组或分区方案可以改进查询性能。Careful selection of the filegroup or partition scheme can improve query performance.

默认情况下,索引存储在基表所在的文件组上,该索引即在该基表上创建。By default, indexes are stored in the same filegroup as the base table on which the index is created. 非分区聚集索引和基表始终在同一个文件组中。A nonpartitioned clustered index and the base table always reside in the same filegroup. 但是,您可以执行以下操作:However, you can do the following:

  • 为除基表或聚集索引的文件组之外的文件组创建非聚集索引。Create nonclustered indexes on a filegroup other than the filegroup of the base table or clustered index.
  • 对要涵盖多个文件组的聚集和非聚集索引进行分区。Partition clustered and nonclustered indexes to span multiple filegroups.
  • 通过删除聚集索引并在 DROP INDEX 语句的 MOVE TO 子句中指定新的文件组或分区方案,或者在 CREATE INDEX 语句中使用 DROP_EXISTING 子句,将表从一个文件组移至另一个文件组。Move a table from one filegroup to another by dropping the clustered index and specifying a new filegroup or partition scheme in the MOVE TO clause of the DROP INDEX statement or by using the CREATE INDEX statement with the DROP_EXISTING clause.

通过对其他文件组创建非聚集索引,可以在文件组通过自带的控制器使用不同的物理驱动器时实现性能提升。By creating the nonclustered index on a different filegroup, you can achieve performance gains if the filegroups are using different physical drives with their own controllers. 这样一来,数据和索引信息即可由多个磁头并行读取。Data and index information can then be read in parallel by the multiple disk heads. 例如,如果文件组 Table_Af1 和文件组 Index_Af2 都由同一个查询使用,就可无争夺地充分使用这两个文件组,因此可以实现性能提升。For example, if Table_A on filegroup f1 and Index_A on filegroup f2 are both being used by the same query, performance gains can be achieved because both filegroups are being fully used without contention. 但是,如果 Table_A 由查询扫描而没有引用 Index_A ,则仅使用文件组 f1However, if Table_A is scanned by the query but Index_A is not referenced, only filegroup f1 is used. 这不会引起性能提升。This creates no performance gain.

由于无法预测将要发生的访问类型以及访问时间,因此更好的办法可能是展开所有文件组中的表和索引。Because you cannot predict what type of access will occur and when it will occur, it could be a better decision to spread your tables and indexes across all filegroups. 这将保证能够访问所有磁盘,因为所有数据和索引在所有磁盘上均匀展开,不受访问数据的方式的限制。This would guarantee that all disks are being accessed because all data and indexes are spread evenly across all disks, regardless of which way the data is accessed. 这对系统管理员来说也是更简单的方法。This is also a simpler approach for system administrators.

在多个文件组中分区Partitions across multiple Filegroups

还可以考虑在多个文件组中对聚集和非聚集索引分区。You can also consider partitioning clustered and nonclustered indexes across multiple filegroups. 根据分区函数,对已分区的索引进行水平分区或按行分区。Partitioned indexes are partitioned horizontally, or by row, based on a partition function. 分区函数定义如何根据某些列(称为分区依据列)的值将每一行映射到一组分区。The partition function defines how each row is mapped to a set of partitions based on the values of certain columns, called partitioning columns. 分区方案将分区映射指定给一组文件组。A partition scheme specifies the mapping of the partitions to a set of filegroups.

对索引进行分区有以下优点:Partitioning an index can provide the following benefits:

  • 提供使大型索引更易管理的可伸缩系统。Provide scalable systems that make large indexes more manageable. 例如,OLTP 系统可以实现处理大型索引的可识别分区的应用程序。OLTP systems, for example, can implement partition-aware applications that deal with large indexes.

  • 使查询运行得更快、更有效。Make queries run faster and more efficiently. 当查询访问索引的几个分区时,查询优化器同时可以处理各个分区,但不包括不受该查询影响的分区。When queries access several partitions of an index, the query optimizer can process individual partitions at the same time and exclude partitions that are not affected by the query.

有关详细信息,请参阅 Partitioned Tables and IndexesFor more information, see Partitioned Tables and Indexes.

索引排序顺序设计指南Index Sort Order Design Guidelines

定义索引时,应该考虑索引键列的数据是按升序还是按降序存储。When defining indexes, you should consider whether the data for the index key column should be stored in ascending or descending order. 升序是默认设置,保持与 SQL ServerSQL Server早期版本的兼容性。Ascending is the default and maintains compatibility with earlier versions of SQL ServerSQL Server. CREATE INDEX、CREATE TABLE 和 ALTER TABLE 语句的语法在索引和约束中的各列上支持关键字 ASC(升序)和 DESC(降序):The syntax of the CREATE INDEX, CREATE TABLE, and ALTER TABLE statements supports the keywords ASC (ascending) and DESC (descending) on individual columns in indexes and constraints.

当引用表的查询包含用以指定索引中键列的不同方向的 ORDER BY 子句时,指定键值存储在该索引中的顺序很有用。Specifying the order in which key values are stored in an index is useful when queries referencing the table have ORDER BY clauses that specify different directions for the key column or columns in that index. 在这些情况下,索引就无需在查询计划中使用 SORT 运算符。因此,使得查询更有效。In these cases, the index can remove the need for a SORT operator in the query plan; therefore, this makes the query more efficient. 例如, Adventure Works CyclesAdventure Works Cycles 采购部门的买方不得不评估他们从供应商处购买的产品的质量。For example, the buyers in the Adventure Works CyclesAdventure Works Cycles purchasing department have to evaluate the quality of products they purchase from vendors. 买方倾向于查验那些由具有高拒绝率的供应商发送的产品。The buyers are most interested in finding products sent by these vendors with a high rejection rate. 检索数据以满足此条件需要将 RejectedQty 表中的 Purchasing.PurchaseOrderDetail 列按降序(由大到小)排序,并且将 ProductID 列按升序(由小到大)排序,如下列查询所示。As shown in the following query, retrieving the data to meet this criteria requires the RejectedQty column in the Purchasing.PurchaseOrderDetail table to be sorted in descending order (large to small) and the ProductID column to be sorted in ascending order (small to large).

SELECT RejectedQty, ((RejectedQty/OrderQty)*100) AS RejectionRate,  
    ProductID, DueDate  
FROM Purchasing.PurchaseOrderDetail  
ORDER BY RejectedQty DESC, ProductID ASC;  

此查询的下列执行计划显示了查询优化器使用 SORT 运算符按 ORDER BY 子句指定的顺序返回结果集。The following execution plan for this query shows that the query optimizer used a SORT operator to return the result set in the order specified by the ORDER BY clause.

IndexSort1

如果使用与查询的 ORDER BY 子句中的键列匹配的键列创建索引,则无需在查询计划中使用 SORT 运算符,从而使查询计划更有效。If an index is created with key columns that match those in the ORDER BY clause in the query, the SORT operator can be eliminated in the query plan and the query plan is more efficient.

CREATE NONCLUSTERED INDEX IX_PurchaseOrderDetail_RejectedQty  
ON Purchasing.PurchaseOrderDetail  
    (RejectedQty DESC, ProductID ASC, DueDate, OrderQty);  

再次执行查询后,下列执行计划显示未使用 SORT 运算符,而使用了新创建的非聚集索引。After the query is executed again, the following execution plan shows that the SORT operator has been eliminated and the newly created nonclustered index is used.

InsertSort2

数据库引擎Database Engine 可以在两个方向上同样有效地移动。The 数据库引擎Database Engine can move equally efficiently in either direction. 对于一个在 ORDER BY 子句中列的排序方向倒排的查询,仍然可以使用定义为 (RejectedQty DESC, ProductID ASC) 的索引。An index defined as (RejectedQty DESC, ProductID ASC) can still be used for a query in which the sort direction of the columns in the ORDER BY clause are reversed. 例如,包含 ORDER BY 子句 ORDER BY RejectedQty ASC, ProductID DESC 的查询可以使用该索引。For example, a query with the ORDER BY clause ORDER BY RejectedQty ASC, ProductID DESC can use the index.

只可以为键列指定排序顺序。Sort order can be specified only for key columns. sys.index_columns 目录视图和 INDEXKEY_PROPERTY 函数报告索引列是按升序还是降序存储。The sys.index_columns catalog view and the INDEXKEY_PROPERTY function report whether an index column is stored in ascending or descending order.

元数据Metadata

使用这些元数据视图可以查看索引的属性。Use these metadata views to see attributes of indexes. 其他体系结构信息嵌入在其中的某些视图中。More architectural information is embedded in some of these views.

备注

对于列存储索引,所有列在元数据中作为包含性列存储。For columnstore indexes, all columns are stored in the metadata as included columns. 列存储索引中没有任何键列。The columnstore index does not have key columns.

sys.indexes (Transact-SQL)sys.indexes (Transact-SQL) sys.index_columns (Transact-SQL)sys.index_columns (Transact-SQL)
sys.partitions (Transact-SQL)sys.partitions (Transact-SQL) sys.internal_partitions (Transact-SQL)sys.internal_partitions (Transact-SQL)
sys.dm_db_index_operational_stats (Transact-SQL)sys.dm_db_index_operational_stats (Transact-SQL) sys.dm_db_index_physical_stats (Transact-SQL)sys.dm_db_index_physical_stats (Transact-SQL)
sys.column_store_segments (Transact-SQL)sys.column_store_segments (Transact-SQL) sys.column_store_dictionaries (Transact-SQL)sys.column_store_dictionaries (Transact-SQL)
sys.column_store_row_groups (Transact-SQL)sys.column_store_row_groups (Transact-SQL) sys.dm_db_column_store_row_group_operational_stats (Transact-SQL)sys.dm_db_column_store_row_group_operational_stats (Transact-SQL)
sys.dm_db_column_store_row_group_physical_stats (Transact-SQL)sys.dm_db_column_store_row_group_physical_stats (Transact-SQL) sys.dm_column_store_object_pool (Transact-SQL)sys.dm_column_store_object_pool (Transact-SQL)
sys.dm_db_column_store_row_group_operational_stats (Transact-SQL)sys.dm_db_column_store_row_group_operational_stats (Transact-SQL) sys.dm_db_xtp_hash_index_stats (Transact-SQL)sys.dm_db_xtp_hash_index_stats (Transact-SQL)
sys.dm_db_xtp_index_stats (Transact-SQL)sys.dm_db_xtp_index_stats (Transact-SQL) sys.dm_db_xtp_object_stats (Transact-SQL)sys.dm_db_xtp_object_stats (Transact-SQL)
sys.dm_db_xtp_nonclustered_index_stats (Transact-SQL)sys.dm_db_xtp_nonclustered_index_stats (Transact-SQL) sys.dm_db_xtp_table_memory_stats (Transact-SQL)sys.dm_db_xtp_table_memory_stats (Transact-SQL)
sys.hash_indexes (Transact-SQL)sys.hash_indexes (Transact-SQL) sys.memory_optimized_tables_internal_attributes (Transact-SQL)sys.memory_optimized_tables_internal_attributes (Transact-SQL)

聚集索引设计指南Clustered Index Design Guidelines

聚集索引基于数据行的键值在表内排序和存储这些数据行。Clustered indexes sort and store the data rows in the table based on their key values. 每个表只能有一个聚集索引,因为数据行本身只能按一个顺序存储。There can only be one clustered index per table, because the data rows themselves can only be sorted in one order. 每个表几乎都对列定义聚集索引来实现下列功能:With few exceptions, every table should have a clustered index defined on the column, or columns, that offer the following:

  • 可用于经常使用的查询。Can be used for frequently used queries.

  • 提供高度唯一性。Provide a high degree of uniqueness.

    备注

    创建 PRIMARY KEY 约束时,将在列上自动创建唯一索引。When you create a PRIMARY KEY constraint, a unique index on the column, or columns, is automatically created. 默认情况下,此索引是聚集索引,但是在创建约束时,可以指定创建非聚集索引。By default, this index is clustered; however, you can specify a nonclustered index when you create the constraint.

  • 可用于范围查询。Can be used in range queries.

如果未使用 UNIQUE 属性创建聚集索引,数据库引擎Database Engine会自动向表添加一个 4 字节的唯一标识符列。If the clustered index is not created with the UNIQUE property, the 数据库引擎Database Engine automatically adds a 4-byte uniqueifier column to the table. 必要时, 数据库引擎Database Engine 将向行自动添加一个唯一标识符值以使每个键唯一。When it is required, the 数据库引擎Database Engine automatically adds a uniqueifier value to a row to make each key unique. 此列和列值供内部使用,用户不能查看或访问。This column and its values are used internally and cannot be seen or accessed by users.

聚集索引体系结构Clustered Index Architecture

SQL ServerSQL Server 中,索引是按 B 树结构组织的。In SQL ServerSQL Server, indexes are organized as B-Trees. 索引 B 树中的每一页称为一个索引节点。Each page in an index B-tree is called an index node. B 树的顶端节点称为根节点。The top node of the B-tree is called the root node. 索引中的底层节点称为叶节点。The bottom nodes in the index are called the leaf nodes. 根节点与叶节点之间的任何索引级别统称为中间级。Any index levels between the root and the leaf nodes are collectively known as intermediate levels. 在聚集索引中,叶节点包含基础表的数据页。In a clustered index, the leaf nodes contain the data pages of the underlying table. 根节点和中间级节点包含存有索引行的索引页。The root and intermediate level nodes contain index pages holding index rows. 每个索引行包含一个键值和一个指针,该指针指向 B 树上的某一中间级页或叶级索引中的某个数据行。Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. 每级索引中的页均被链接在双向链接列表中。The pages in each level of the index are linked in a doubly-linked list.

聚集索引在 sys.partitions中有一行,其中,索引使用的每个分区的 index_id = 1。Clustered indexes have one row in sys.partitions, with index_id = 1 for each partition used by the index. 默认情况下,聚集索引有单个分区。By default, a clustered index has a single partition. 当聚集索引有多个分区时,每个分区都有一个包含该特定分区相关数据的 B 树结构。When a clustered index has multiple partitions, each partition has a B-tree structure that contains the data for that specific partition. 例如,如果聚集索引有四个分区,就有四个 B 树结构,每个分区中有一个 B 树结构。For example, if a clustered index has four partitions, there are four B-tree structures; one in each partition.

根据聚集索引中的数据类型,每个聚集索引结构将有一个或多个分配单元,将在这些单元中存储和管理特定分区的相关数据。Depending on the data types in the clustered index, each clustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. 每个聚集索引的每个分区中至少有一个 IN_ROW_DATA 分配单元。At a minimum, each clustered index will have one IN_ROW_DATA allocation unit per partition. 如果聚集索引包含大型对象 (LOB) 列,则它的每个分区中还会有一个 LOB_DATA 分配单元 。The clustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns. 如果聚集索引包含的变量长度列超过 8,060 字节的行大小限制,则它的每个分区中还会有一个 ROW_OVERFLOW_DATA 分配单元 。It will also have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060 byte row size limit.

数据链内的页和行将按聚集索引键值进行排序。The pages in the data chain and the rows in them are ordered on the value of the clustered index key. 所有插入操作都在所插入行中的键值与现有行中的排序顺序相匹配时执行。All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows.

下图显式了聚集索引单个分区中的结构。This illustration shows the structure of a clustered index in a single partition.

bokind2

查询注意事项Query Considerations

在创建聚集索引之前,应先了解数据是如何被访问的。Before you create clustered indexes, understand how your data will be accessed. 考虑对具有以下特点的查询使用聚集索引:Consider using a clustered index for queries that do the following:

  • 使用运算符(如 BETWEEN、>、>=、< 和 <=)返回一系列值。Return a range of values by using operators such as BETWEEN, >, >=, <, and <=.

    使用聚集索引找到包含第一个值的行后,便可以确保包含后续索引值的行物理相邻。After the row with the first value is found by using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent. 例如,如果某个查询在一系列销售订单号间检索记录, SalesOrderNumber 列的聚集索引可快速定位包含起始销售订单号的行,然后检索表中所有连续的行,直到检索到最后的销售订单号。For example, if a query retrieves records between a range of sales order numbers, a clustered index on the column SalesOrderNumber can quickly locate the row that contains the starting sales order number, and then retrieve all successive rows in the table until the last sales order number is reached.

  • 返回大型结果集。Return large result sets.

  • 使用 JOIN 子句;一般情况下,使用该子句的是外键列。Use JOIN clauses; typically these are foreign key columns.

  • 使用 ORDER BYGROUP BY 子句。Use ORDER BY or GROUP BY clauses.

    在 ORDER BY 或 GROUP BY 子句中指定的列的索引,可以使 数据库引擎Database Engine 不必对数据进行排序,因为这些行已经排序。An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the 数据库引擎Database Engine to sort the data, because the rows are already sorted. 这样可以提高查询性能。This improves query performance.

列注意事项Column Considerations

一般情况下,定义聚集索引键时使用的列越少越好。Generally, you should define the clustered index key with as few columns as possible. 考虑具有下列一个或多个属性的列:Consider columns that have one or more of the following attributes:

  • 唯一或包含许多不重复的值Are unique or contain many distinct values

    例如,雇员 ID 唯一地标识雇员。For example, an employee ID uniquely identifies employees. EmployeeID 列的聚集索引或主键约束可提高基于雇员 ID 号搜索雇员信息的查询的性能。A clustered index or PRIMARY KEY constraint on the EmployeeID column would improve the performance of queries that search for employee information based on the employee ID number. 另外,可对 LastNameFirstNameMiddleName 列创建聚集索引,因为经常以这种方式分组和查询雇员记录,而且这些列的组合还可提供高区分度。Alternatively, a clustered index could be created on LastName, FirstName, MiddleName because employee records are frequently grouped and queried in this way, and the combination of these columns would still provide a high degree of difference.

    提示

    如果没有另行指定,在创建主键约束时,SQL ServerSQL Server 会创建一个聚集索引来支持该约束。If not specified differently, when creating a PRIMARY KEY constraint, SQL ServerSQL Server creates a clustered index to support that constraint. 虽然可使用 uniqueidentifier 来强制实施作为主键的唯一性,但它不是有效的聚集键 。Although a uniqueidentifier can be used to enforce uniqueness as a PRIMARY KEY, it is not an efficient clustering key. 如果使用 uniqueidentifier 作为主键,建议将其创建为非聚集索引,然后使用另一列(如 IDENTITY)创建聚集索引 。If using a uniqueidentifier as PRIMARY KEY, the recommendation is to create it as a nonclustered index, and use another column such as an IDENTITY to create the clustered index.

  • 按顺序被访问Are accessed sequentially

    例如,产品 ID 唯一地标识 Production.Product 数据库的 AdventureWorks2012AdventureWorks2012 表中的产品。For example, a product ID uniquely identifies products in the Production.Product table in the AdventureWorks2012AdventureWorks2012 database. 在其中指定顺序搜索的查询(如 WHERE ProductID BETWEEN 980 and 999)将从 ProductID的聚集索引受益。Queries in which a sequential search is specified, such as WHERE ProductID BETWEEN 980 and 999, would benefit from a clustered index on ProductID. 这是因为行将按该键列的排序顺序存储。This is because the rows would be stored in sorted order on that key column.

  • 定义为 IDENTITYDefined as IDENTITY.

  • 经常用于对表中检索到的数据进行排序。Used frequently to sort the data retrieved from a table.

    按该列对表进行聚集(即物理排序)是一个好方法,它可以在每次查询该列时节省排序操作的成本。It can be a good idea to cluster, that is physically sort, the table on that column to save the cost of a sort operation every time the column is queried.

聚集索引不适用于具有下列属性的列:Clustered indexes are not a good choice for the following attributes:

  • 频繁更改的列Columns that undergo frequent changes

    这将导致整行移动,因为 数据库引擎Database Engine 必须按物理顺序保留行中的数据值。This causes in the whole row to move, because the 数据库引擎Database Engine must keep the data values of a row in physical order. 这一点要特别注意,因为在大容量事务处理系统中数据通常是可变的。This is an important consideration in high-volume transaction processing systems in which data is typically volatile.

  • 宽键Wide keys

    宽键是若干列或若干大型列的组合。Wide keys are a composite of several columns or several large-size columns. 所有非聚集索引将聚集索引中的键值用作查找键。The key values from the clustered index are used by all nonclustered indexes as lookup keys. 为同一表定义的任何非聚集索引都将增大许多,这是因为非聚集索引项包含聚集键,同时也包含为此非聚集索引定义的键列。Any nonclustered indexes defined on the same table will be significantly larger because the nonclustered index entries contain the clustering key and also the key columns defined for that nonclustered index.

非聚集索引设计指南Nonclustered Index Design Guidelines

非聚集索引包含索引键值和指向表数据存储位置的行定位器。A nonclustered index contains the index key values and row locators that point to the storage location of the table data. 可以对表或索引视图创建多个非聚集索引。You can create multiple nonclustered indexes on a table or indexed view. 通常,设计非聚集索引是为改善经常使用的、没有建立聚集索引的查询的性能。Generally, nonclustered indexes should be designed to improve the performance of frequently used queries that are not covered by the clustered index.

与使用书中索引的方式相似,查询优化器在搜索数据值时,先搜索非聚集索引以找到数据值在表中的位置,然后直接从该位置检索数据。Similar to the way you use an index in a book, the query optimizer searches for a data value by searching the nonclustered index to find the location of the data value in the table and then retrieves the data directly from that location. 这使非聚集索引成为完全匹配查询的最佳选择,因为索引包含说明查询所搜索的数据值在表中的精确位置的项。This makes nonclustered indexes the optimal choice for exact match queries because the index contains entries describing the exact location in the table of the data values being searched for in the queries. 例如,为了从 HumanResources. Employee 表中查询向特定经理负责的所有雇员,查询优化器可能使用非聚集索引 IX_Employee_ManagerID;它以 ManagerID 作为其键列。For example, to query the HumanResources. Employee table for all employees that report to a specific manager, the query optimizer might use the nonclustered index IX_Employee_ManagerID; this has ManagerID as its key column. 查询优化器能快速找出索引中与指定 ManagerID匹配的所有项。The query optimizer can quickly find all entries in the index that match the specified ManagerID. 每个索引项都指向表或聚集索引中准确的页和行,其中可以找到相应的数据。Each index entry points to the exact page and row in the table, or clustered index, in which the corresponding data can be found. 在查询优化器在索引中找到所有项之后,它可以直接转到准确的页和行进行数据检索。After the query optimizer finds all entries in the index, it can go directly to the exact page and row to retrieve the data.

非聚集索引体系结构Nonclustered Index Architecture

非聚集索引与聚集索引具有相同的 B 树结构,它们之间的显著差别在于以下两点:Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:

  • 基础表的数据行不按非聚集键的顺序排序和存储。The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.

  • 非聚集索引的叶级别是由索引页而不是由数据页组成。The leaf level of a nonclustered index is made up of index pages instead of data pages.

非聚集索引行中的行定位器或是指向行的指针,或是行的聚集索引键,如下所述:The row locators in nonclustered index rows are either a pointer to a row or are a clustered index key for a row, as described in the following:

  • 如果表是堆(意味着该表没有聚集索引),则行定位器是指向行的指针。If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. 该指针由文件标识符 (ID)、页码和页上的行数生成。The pointer is built from the file identifier (ID), page number, and number of the row on the page. 整个指针称为行 ID (RID)。The whole pointer is known as a Row ID (RID).

  • 如果表有聚集索引或索引视图上有聚集索引,则行定位器是行的聚集索引键。If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row.

对于索引使用的每个分区,非聚集索引在 index_id >1 的 sys.partitions 中都有对应的一行。Nonclustered indexes have one row in sys.partitions with index_id > 1 for each partition used by the index. 默认情况下,一个非聚集索引有单个分区。By default, a nonclustered index has a single partition. 如果一个非聚集索引有多个分区,则每个分区都有一个包含该特定分区的索引行的 B 树结构。When a nonclustered index has multiple partitions, each partition has a B-tree structure that contains the index rows for that specific partition. 例如,如果一个非聚集索引有四个分区,那么就有四个 B 树结构,每个分区中一个。For example, if a nonclustered index has four partitions, there are four B-tree structures, with one in each partition.

根据非聚集索引中数据类型的不同,每个非聚集索引结构会有一个或多个分配单元,在其中存储和管理特定分区的数据。Depending on the data types in the nonclustered index, each nonclustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. 每个非聚集索引至少有一个针对每个分区的 IN_ROW_DATA 分配单元(存储索引 B 树页) 。At a minimum, each nonclustered index will have one IN_ROW_DATA allocation unit per partition that stores the index B-tree pages. 如果非聚集索引包含大型对象 (LOB) 列,则还有一个针对每个分区的 LOB_DATA 分配单元 。The nonclustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns. 此外,如果非聚集索引包含的可变长度列超过 8,060 字节的行大小限制,则还有一个针对每个分区的 ROW_OVERFLOW_DATA 分配单元 。Additionally, it will have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060 byte row size limit.

下图说明了单个分区中的非聚集索引结构。The following illustration shows the structure of a nonclustered index in a single partition.

bokind1a

数据库注意事项Database Considerations

设计非聚集索引时需要注意数据库的特征。Consider the characteristics of the database when designing nonclustered indexes.

  • 更新要求较低但包含大量数据的数据库或表可以从许多非聚集索引中获益从而改善查询性能。Databases or tables with low update requirements, but large volumes of data can benefit from many nonclustered indexes to improve query performance. 与全表非聚集索引相比,考虑为定义完善的数据子集创建筛选索引可以提高查询性能、降低索引存储开销并减少索引维护开销。Consider creating filtered indexes for well-defined subsets of data to improve query performance, reduce index storage costs, and reduce index maintenance costs compared with full-table nonclustered indexes.

    决策支持系统应用程序和主要包含只读数据的数据库可以从许多非聚集索引中获益。Decision Support System applications and databases that contain primarily read-only data can benefit from many nonclustered indexes. 查询优化器具有更多可供选择的索引用来确定最快的访问方法,并且数据库的低更新特征意味着索引维护不会降低性能。The query optimizer has more indexes to choose from to determine the fastest access method, and the low update characteristics of the database mean index maintenance will not impede performance.

  • 联机事务处理 (OLTP) 应用程序和包含经常更新的表的数据库应避免过多索引。Online Transaction Processing (OLTP) applications and databases that contain heavily updated tables should avoid over-indexing. 此外,索引应该是窄的,即列越少越好。Additionally, indexes should be narrow, that is, with as few columns as possible.

    对表编制大量索引会影响 INSERT、UPDATE、DELETE 和 MERGE 语句的性能,因为当表中的数据更改时,所有索引都须进行适当的调整。Large numbers of indexes on a table affect the performance of INSERT, UPDATE, DELETE, and MERGE statements because all indexes must be adjusted appropriately as data in the table changes.

查询注意事项Query Considerations

在创建非聚集索引之前,应先了解访问数据的方式。Before you create nonclustered indexes, you should understand how your data will be accessed. 考虑对具有以下属性的查询使用非聚集索引:Consider using a nonclustered index for queries that have the following attributes:

  • 使用 JOINGROUP BY 子句。Use JOIN or GROUP BY clauses.

    应为联接和分组操作中所涉及的列创建多个非聚集索引,为任何外键列创建一个聚集索引。Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.

  • 不返回大型结果集的查询。Queries that do not return large result sets.

    创建筛选索引以覆盖从大型表中返回定义完善的行子集的查询。Create filtered indexes to cover queries that return a well-defined subset of rows from a large table.

    提示

    通常,CREATE INDEX 语句的 WHERE 子句匹配所覆盖的查询的 WHERE 子句。Typically the WHERE clause of the CREATE INDEX statement matches the WHERE clause of a query being covered.

  • 包含经常包含在查询的搜索条件(例如返回完全匹配的 WHERE 子句)中的列。Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches.

    提示

    添加新索引时,请考虑成本和权益。Consider the cost versus benefit when adding new indexes. 将其他查询需求合并到现有索引中可能更可取。It may be preferable to consolidate additional query needs into an existing index. 例如,如果允许覆盖多个关键查询,而不是每个关键查询都有一个覆盖索引,则考虑添加一个或两个额外的叶级别列到现有索引。For example, consider adding one or two extra leaf level columns to an existing index, if it allows coverage of several critical queries, instead of having one exactly covering index per each critical query.

列注意事项Column Considerations

考虑具有以下一个或多个属性的列:Consider columns that have one or more of these attributes:

  • 覆盖查询。Cover the query.

    当索引包含查询中的所有列时,性能可以提升。Performance gains are achieved when the index contains all columns in the query. 查询优化器可以找到索引内的所有列值;不会访问表或聚集索引数据,这样就减少了磁盘 I/O 操作。The query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations. 使用具有包含列的索引来添加覆盖列,而不是创建宽索引键。Use index with included columns to add covering columns instead of creating a wide index key.

    如果表有聚集索引,则该聚集索引中定义的列将自动追加到表上每个非聚集索引的末端。If the table has a clustered index, the column or columns defined in the clustered index are automatically appended to the end of each nonclustered index on the table. 这可以生成覆盖查询,而不用在非聚集索引定义中指定聚集索引列。This can produce a covered query without specifying the clustered index columns in the definition of the nonclustered index. 例如,如果一个表在 C列上有聚集索引,则 BA 列的非聚集索引将具有其自己的键值列 BACFor example, if a table has a clustered index on column C, a nonclustered index on columns B and A will have as its key values columns B, A, and C.

  • 大量非重复值,如姓氏和名字的组合(前提是聚集索引被用于其他列)。Lots of distinct values, such as a combination of last name and first name, if a clustered index is used for other columns.

    如果只有很少的非重复值,例如仅有 1 和 0,则大多数查询将不使用索引,因为此时表扫描通常更有效。If there are very few distinct values, such as only 1 and 0, most queries will not use the index because a table scan is generally more efficient. 对于这种类型的数据,应考虑对仅出现在少数行中的非重复值创建筛选索引。For this type of data, consider creating a filtered index on a distinct value that only occurs in a small number of rows. 例如,如果大部分值都是 0,则查询优化器可以对包含 1 的数据行使用筛选查询。For example, if most of the values are 0, the query optimizer might use a filtered index for the data rows that contain 1.

使用包含列扩展非聚集索引Use Included Columns to Extend Nonclustered Indexes

您可以通过将非键列添加到非聚集索引的叶级,扩展非聚集索引的功能。You can extend the functionality of nonclustered indexes by adding nonkey columns to the leaf level of the nonclustered index. 通过包含非键列,可以创建覆盖更多查询的非聚集索引。By including nonkey columns, you can create nonclustered indexes that cover more queries. 这是因为非键列具有下列优点:This is because the nonkey columns have the following benefits:

  • 它们可以是不允许作为索引键列的数据类型。They can be data types not allowed as index key columns.

  • 在计算索引键列数或索引键大小时, 数据库引擎Database Engine 不考虑它们。They are not considered by the 数据库引擎Database Engine when calculating the number of index key columns or index key size.

当查询中的所有列都作为键列或非键列包含在索引中时,带有包含性非键列的索引可以显著提高查询性能。An index with included nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. 这样可以实现性能提升,因为查询优化器可以在索引中找到所有列值;不访问表或聚集索引数据,从而减少磁盘 I/O 操作。Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.

备注

当索引包含查询引用的所有列时,它通常称为“覆盖查询”。When an index contains all the columns referenced by the query it is typically referred to as covering the query.

键列存储在索引的所有级别中,而非键列仅存储在叶级别中。While key columns are stored at all levels of the index, nonkey columns are stored only at the leaf level.

使用包含列以避免大小限制Using Included Columns to Avoid Size Limits

可以将非键列包含在非聚集索引中,以避免超过当前索引大小的限制(最大键列数为 16,最大索引键大小为 900 字节)。You can include nonkey columns in a nonclustered index to avoid exceeding the current index size limitations of a maximum of 16 key columns and a maximum index key size of 900 bytes. 数据库引擎Database Engine 计算索引键列数或索引键大小时,不考虑非键列。The 数据库引擎Database Engine does not consider nonkey columns when calculating the number of index key columns or index key size.
例如,假设要为 Document 表中的以下列建立索引:For example, assume that you want to index the following columns in the Document table:

  • Title nvarchar(50)
  • Revision nchar(5)
  • FileName nvarchar(400)

因为 ncharnvarchar 数据类型的每个字符需要 2 个字节,所以包含这三列的索引将超出 900 字节的大小限制 10 个字节 (455 * 2)。Because the nchar and nvarchar data types require 2 bytes for each character, an index that contains these three columns would exceed the 900 byte size limitation by 10 bytes (455 * 2). 使用 INCLUDE 语句的 CREATE INDEX 子句,可以将索引键定义为 (Title, Revision),将 FileName 定义为非键列。By using the INCLUDE clause of the CREATE INDEX statement, the index key could be defined as (Title, Revision) and FileName defined as a nonkey column. 这样,索引键大小将为 110 个字节 (55 * 2),并且索引仍将包含所需的所有列。In this way, the index key size would be 110 bytes (55 * 2), and the index would still contain all the required columns. 下面的语句就创建了这样的索引。The following statement creates such an index.

CREATE INDEX IX_Document_Title   
ON Production.Document (Title, Revision)   
INCLUDE (FileName);   
带有包含列的索引准则Index with Included Columns Guidelines

设计带有包含列的非聚集索引时,请考虑下列准则:When you design nonclustered indexes with included columns consider the following guidelines:

  • 在 CREATE INDEX 语句的 INCLUDE 子句中定义非键列。Nonkey columns are defined in the INCLUDE clause of the CREATE INDEX statement.

  • 只能对表或索引视图的非聚集索引定义非键列。Nonkey columns can only be defined on nonclustered indexes on tables or indexed views.

  • 允许除 textntextimage之外的所有数据类型。All data types are allowed except text, ntext, and image.

  • 精确或不精确的确定性计算列都可以是包含列。Computed columns that are deterministic and either precise or imprecise can be included columns. 有关详细信息,请参阅 计算列上的索引For more information, see Indexes on Computed Columns.

  • 与键列一样,只要允许将计算列数据类型作为非键索引列,从 imagentexttext 数据类型派生的计算列就可以作为非键(包含性)列。As with key columns, computed columns derived from image, ntext, and text data types can be nonkey (included) columns as long as the computed column data type is allowed as a nonkey index column.

  • 不能同时在 INCLUDE 列表和键列列表中指定列名。Column names cannot be specified in both the INCLUDE list and in the key column list.

  • INCLUDE 列表中的列名不能重复。Column names cannot be repeated in the INCLUDE list.

列大小准则Column Size Guidelines
  • 必须至少定义一个键列。At least one key column must be defined. 最大非键列数为 1023 列。The maximum number of nonkey columns is 1023 columns. 也就是最大的表列数减 1。This is the maximum number of table columns minus 1.

  • 索引键列(不包括非键)必须遵守现有索引大小的限制(最大键列数为 16,总索引键大小为 900 字节)。Index key columns, excluding nonkeys, must follow the existing index size restrictions of 16 key columns maximum, and a total index key size of 900 bytes.

  • 所有非键列的总大小只受 INCLUDE 子句中所指定列的大小限制;例如, varchar(max) 列限制为 2 GB。The total size of all nonkey columns is limited only by the size of the columns specified in the INCLUDE clause; for example, varchar(max) columns are limited to 2 GB.

列修改准则Column Modification Guidelines

修改已定义为包含列的表列时,要受下列限制:When you modify a table column that has been defined as an included column, the following restrictions apply:

  • 除非先删除索引,否则无法从表中删除非键列。Nonkey columns cannot be dropped from the table unless the index is dropped first.

  • 除进行下列更改外,不能对非键列进行其他更改:Nonkey columns cannot be changed, except to do the following:

    • 将列的为空性从 NOT NULL 改为 NULL。Change the nullability of the column from NOT NULL to NULL.

    • 增加 varcharnvarcharvarbinary 列的长度。Increase the length of varchar, nvarchar, or varbinary columns.

      备注

      这些列修改限制也适用于索引键列。These column modification restrictions also apply to index key columns.

设计建议Design Recommendations

重新设计索引键大小较大的非聚集索引,以便只有用于搜索和查找的列为键列。Redesign nonclustered indexes with a large index key size so that only columns used for searching and lookups are key columns. 将覆盖查询的所有其他列设置为包含性非键列。Make all other columns that cover the query included nonkey columns. 这样,将具有覆盖查询所需的所有列,但索引键本身较小,而且效率高。In this way, you will have all columns needed to cover the query, but the index key itself is small and efficient.

例如,假设要设计覆盖下列查询的索引。For example, assume that you want to design an index to cover the following query.

SELECT AddressLine1, AddressLine2, City, StateProvinceID, PostalCode  
FROM Person.Address  
WHERE PostalCode BETWEEN N'98000' and N'99999';  

若要覆盖查询,必须在索引中定义每列。To cover the query, each column must be defined in the index. 尽管可以将所有列定义为键列,但键大小为 334 字节。Although you could define all columns as key columns, the key size would be 334 bytes. 因为实际上用作搜索条件的唯一列是 PostalCode 列(长度为 30 字节),所以更好的索引设计应该将 PostalCode 定义为键列并包含作为非键列的所有其他列。Because the only column actually used as search criteria is the PostalCode column, having a length of 30 bytes, a better index design would define PostalCode as the key column and include all other columns as nonkey columns.

下面的语句创建了一个覆盖查询的带有包含列的索引。The following statement creates an index with included columns to cover the query.

CREATE INDEX IX_Address_PostalCode  
ON Person.Address (PostalCode)  
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);  
性能注意事项Performance Considerations

避免添加不必要的列。Avoid adding unnecessary columns. 添加过多的索引列(键列或非键列)会对性能产生下列影响:Adding too many index columns, key or nonkey, can have the following performance implications:

  • 一页上能容纳的索引行将更少。Fewer index rows will fit on a page. 这样会使 I/O 增加并降低缓存效率。This could create I/O increases and reduced cache efficiency.

  • 需要更多的磁盘空间来存储索引。More disk space will be required to store the index. 特别是,将 varchar(max)nvarchar(max)varbinary(max)xml 数据类型添加为非键索引列会显著增加磁盘空间要求。In particular, adding varchar(max), nvarchar(max), varbinary(max), or xml data types as nonkey index columns may significantly increase disk space requirements. 这是因为列值被复制到了索引叶级别。This is because the column values are copied into the index leaf level. 因此,它们既驻留在索引中,也驻留在基表中。Therefore, they reside in both the index and the base table.

  • 索引维护可能会增加对基础表或索引视图执行修改、插入、更新或删除操作所需的时间。Index maintenance may increase the time that it takes to perform modifications, inserts, updates, or deletes, to the underlying table or indexed view.

您应该确定修改数据时在查询性能上的提升是否超过了对性能的影响,以及是否需要额外的磁盘空间要求。You will have to determine whether the gains in query performance outweigh the affect to performance during data modification and in additional disk space requirements.

唯一索引设计指南Unique Index Design Guidelines

唯一索引能够保证索引键中不包含重复的值,从而使表中的每一行从某种方式上具有唯一性。A unique index guarantees that the index key contains no duplicate values and therefore every row in the table is in some way unique. 只有当唯一性是数据本身的特征时,指定唯一索引才有意义。Specifying a unique index makes sense only when uniqueness is a characteristic of the data itself. 例如,如果要确保 NationalIDNumber 表中 HumanResources.Employee 列的值是唯一的,当主键为 EmployeeID时,对 NationalIDNumber 列创建 UNIQUE 约束。For example, if you want to make sure that the values in the NationalIDNumber column in the HumanResources.Employee table are unique, when the primary key is EmployeeID, create a UNIQUE constraint on the NationalIDNumber column. 如果用户尝试在该列中为多个雇员输入相同的值,将显示错误消息并且不能输入重复的值。If the user tries to enter the same value in that column for more than one employee, an error message is displayed and the duplicate value is not entered.

使用多列唯一索引,索引能够保证索引键中值的每个组合都是唯一的。With multicolumn unique indexes, the index guarantees that each combination of values in the index key is unique. 例如,如果为 LastNameFirstNameMiddleName 列的组合创建了唯一索引,则表中的任意两行都不会有这些列值的相同组合。For example, if a unique index is created on a combination of LastName, FirstName, and MiddleName columns, no two rows in the table could have the same combination of values for these columns.

聚集索引和非聚集索引都可以是唯一的。Both clustered and nonclustered indexes can be unique. 只要列中的数据是唯一的,就可以为同一个表创建一个唯一聚集索引和多个唯一非聚集索引。Provided that the data in the column is unique, you can create both a unique clustered index and multiple unique nonclustered indexes on the same table.

唯一索引的优点包括下列几点:The benefits of unique indexes include the following:

  • 能够确保定义的列的数据完整性。Data integrity of the defined columns is ensured.

  • 提供了对查询优化器有用的附加信息。Additional information helpful to the query optimizer is provided.

创建 PRIMARY KEY 或 UNIQUE 约束会自动为指定的列创建唯一索引。Creating a PRIMARY KEY or UNIQUE constraint automatically creates a unique index on the specified columns. 创建 UNIQUE 约束和创建独立于约束的唯一索引没有明显的区别。There are no significant differences between creating a UNIQUE constraint and creating a unique index independent of a constraint. 数据验证的方式是相同的,而且查询优化器不会区分唯一索引是由约束创建的还是手动创建的。Data validation occurs in the same manner and the query optimizer does not differentiate between a unique index created by a constraint or manually created. 但是,如果您的目的是要实现数据完整性,则应为列创建 UNIQUE 或 PRIMARY KEY 约束。However, you should create a UNIQUE or PRIMARY KEY constraint on the column when data integrity is the objective. 这样做才能使索引的目标明确。By doing this the objective of the index will be clear.

注意事项Considerations

  • 如果数据中存在重复的键值,则不能创建唯一索引、UNIQUE 约束或 PRIMARY KEY 约束。A unique index, UNIQUE constraint, or PRIMARY KEY constraint cannot be created if duplicate key values exist in the data.

  • 如果数据是唯一的并且您希望强制实现唯一性,则为相同的列组合创建唯一索引而不是非唯一索引可以为查询优化器提供附加信息,从而生成更有效的执行计划。If the data is unique and you want uniqueness enforced, creating a unique index instead of a nonunique index on the same combination of columns provides additional information for the query optimizer that can produce more efficient execution plans. 在这种情况下,建议创建唯一索引(最好通过创建 UNIQUE 约束来创建)。Creating a unique index (preferably by creating a UNIQUE constraint) is recommended in this case.

  • 唯一非聚集索引可以包括包含性非键列。A unique nonclustered index can contain included nonkey columns. 有关详细信息,请参阅 具有包含列的索引For more information, see Index with Included Columns.

筛选索引设计指南Filtered Index Design Guidelines

筛选索引是一种经过优化的非聚集索引,尤其适用于涵盖从定义完善的数据子集中选择数据的查询。A filtered index is an optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. 筛选索引使用筛选谓词对表中的部分行进行索引。It uses a filter predicate to index a portion of rows in the table. 与全表索引相比,设计良好的筛选索引可以提高查询性能、减少索引维护开销并可降低索引存储开销。A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes.

适用范围SQL Server 2008SQL Server 2008SQL Server 2017SQL Server 2017Applies to: SQL Server 2008SQL Server 2008 through SQL Server 2017SQL Server 2017.

筛选索引与全表索引相比具有以下优点:Filtered indexes can provide the following advantages over full-table indexes:

  • 提高了查询性能和计划质量Improved query performance and plan quality

    设计良好的筛选索引可以提高查询性能和执行计划质量,因为它比全表非聚集索引小并且具有经过筛选的统计信息。A well-designed filtered index improves query performance and execution plan quality because it is smaller than a full-table nonclustered index and has filtered statistics. 与全表统计信息相比,经过筛选的统计信息更加准确,因为它们只涵盖筛选索引中的行。The filtered statistics are more accurate than full-table statistics because they cover only the rows in the filtered index.

  • 减少了索引维护开销Reduced index maintenance costs

    仅在数据操作语言 (DML) 语句对索引中的数据产生影响时,才对索引进行维护。An index is maintained only when data manipulation language (DML) statements affect the data in the index. 与全表非聚集索引相比,筛选索引减少了索引维护开销,因为它更小并且仅在对索引中的数据产生影响时才进行维护。A filtered index reduces index maintenance costs compared with a full-table nonclustered index because it is smaller and is only maintained when the data in the index is affected. 筛选索引的数量可以非常多,特别是在其中包含很少受影响的数据时。It is possible to have a large number of filtered indexes, especially when they contain data that is affected infrequently. 同样,如果筛选索引只包含频繁受影响的数据,则索引大小较小时可以减少更新统计信息的开销。Similarly, if a filtered index contains only the frequently affected data, the smaller size of the index reduces the cost of updating the statistics.

  • 减少了索引存储开销Reduced index storage costs

    在没必要创建全表索引时,创建筛选索引可以减少非聚集索引的磁盘存储开销。Creating a filtered index can reduce disk storage for nonclustered indexes when a full-table index is not necessary. 可以使用多个筛选索引替换一个全表非聚集索引而不会明显增加存储需求。You can replace a full-table nonclustered index with multiple filtered indexes without significantly increasing the storage requirements.

列中包含查询在 SELECT 语句中引用的定义完善的数据子集时,筛选索引很有用。Filtered indexes are useful when columns contain well-defined subsets of data that queries reference in SELECT statements. 以下是一些示例:Examples are:

  • 仅包含少量非 NULL 值的稀疏列。Sparse columns that contain only a few non-NULL values.

  • 包含多种类别的数据的异类列。Heterogeneous columns that contain categories of data.

  • 包含多个范围的值(如美元金额、时间和日期)的列。Columns that contain ranges of values such as dollar amounts, time, and dates.

  • 由列值的简单比较逻辑定义的表分区。Table partitions that are defined by simple comparison logic for column values.

如果索引中的行数与全表索引相比较少时,筛选索引减少的维护开销最为明显。Reduced maintenance costs for filtered indexes are most noticeable when the number of rows in the index is small compared with a full-table index. 如果筛选索引包含表中的大部分行,则与全表索引相比,其维护开销可能更高。If the filtered index includes most of the rows in the table, it could cost more to maintain than a full-table index. 在这种情况下,应使用全表索引而不是筛选索引。In this case, you should use a full-table index instead of a filtered index.

筛选索引是针对一个表定义的,仅支持简单比较运算符。Filtered indexes are defined on one table and only support simple comparison operators. 如果需要引用多个表或具有复杂逻辑的筛选表达式,则应创建视图。If you need a filter expression that references multiple tables or has complex logic, you should create a view.

设计注意事项Design Considerations

为了设计有效的筛选索引,必须了解应用程序使用哪些查询以及这些查询与您的数据子集有何关联。In order to design effective filtered indexes, it is important to understand what queries your application uses and how they relate to subsets of your data. 例如,所含值中大部分为 NULL 的列、含异类类别的值的列以及含不同范围的值的列都属于具有定义完善的子集的数据。Some examples of data that have well-defined subsets are columns with mostly NULL values, columns with heterogeneous categories of values and columns with distinct ranges of values. 以下设计注意事项提供了筛选索引优于全表索引的各种情况。The following design considerations give a variety of scenarios for when a filtered index can provide advantages over full-table indexes.

提示

非聚集列存储索引定义支持使用筛选的条件。The nonclustered columnstore index definition supports using a filtered condition. 若要尽量减少在 OLTP 表中添加列存储索引的性能影响,请使用筛选条件,以便创建仅关于运行工作负荷冷数据的非聚集列存储索引。To minimize the performance impact of adding a columnstore index on an OLTP table, use a filtered condition to create a nonclustered columnstore index on only the cold data of your operational workload.

数据子集的筛选索引Filtered Indexes for subsets of data

在列中只有少量相关值需要查询时,可以针对值的子集创建筛选索引。When a column only has a small number of relevant values for queries, you can create a filtered index on the subset of values. 例如,当列中的值大部分为 NULL 并且查询只从非 NULL 值中进行选择时,可以为非 NULL 数据行创建筛选索引。For example, when the values in a column are mostly NULL and the query selects only from the non-NULL values, you can create a filtered index for the non-NULL data rows. 由此得到的索引与对相同键列定义的全表非聚集索引相比,前者更小且维护开销更低。The resulting index will be smaller and cost less to maintain than a full-table nonclustered index defined on the same key columns.

例如, AdventureWorks2012 数据库中有一个包含了 2679 行的 Production.BillOfMaterials 表。For example, the AdventureWorks2012 database has a Production.BillOfMaterials table with 2679 rows. EndDate 列只有 199 行包含非 NULL 值,其他的 2480 行均包含 NULL。The EndDate column has only 199 rows that contain a non-NULL value and the other 2480 rows contain NULL. 下面的筛选索引将涵盖这样的查询:返回在该索引中定义的列的查询,以及只选择 EndDate中具有非 NULL 值的行的查询。The following filtered index would cover queries that return the columns defined in the index and that select only rows with a non-NULL value for EndDate.

CREATE NONCLUSTERED INDEX FIBillOfMaterialsWithEndDate  
    ON Production.BillOfMaterials (ComponentID, StartDate)  
    WHERE EndDate IS NOT NULL ;  
GO  

筛选索引 FIBillOfMaterialsWithEndDate 对下面的查询有效。The filtered index FIBillOfMaterialsWithEndDate is valid for the following query. 您可以显示查询执行计划,以确定查询优化器是否使用了该筛选索引。You can display the query execution plan to determine if the query optimizer used the filtered index.

SELECT ProductAssemblyID, ComponentID, StartDate   
FROM Production.BillOfMaterials  
WHERE EndDate IS NOT NULL   
    AND ComponentID = 5   
    AND StartDate > '20080101' ;  

有关如何创建筛选索引以及如何定义筛选索引谓词表达式的详细信息,请参阅 创建筛选索引For more information about how to create filtered indexes and how to define the filtered index predicate expression, see Create Filtered Indexes.

异类数据的筛选索引Filtered Indexes for heterogeneous data

表中含有异类数据行时,可以为一种或多种类别的数据创建筛选索引。When a table has heterogeneous data rows, you can create a filtered index for one or more categories of data.

例如,为 Production.Product 表中列出的每种产品均分配了一个 ProductSubcategoryID,后者又与 Bikes、Components、Clothing 或 Accessories 产品类别相关联。For example, the products listed in the Production.Product table are each assigned to a ProductSubcategoryID, which are in turn associated with the product categories Bikes, Components, Clothing, or Accessories. 这些类别是异类类别,因为它们在 Production.Product 表中的列值并不是紧密相关的。These categories are heterogeneous because their column values in the Production.Product table are not closely correlated. 例如,对于每种产品类别,列 ColorReorderPointListPriceWeightClassStyle 均具有唯一特征。For example, the columns Color, ReorderPoint, ListPrice, Weight, Class, and Style have unique characteristics for each product category. 假设会经常查询具有子类别 27-36(包含端点)的 Accessories。Suppose that there are frequent queries for accessories which have subcategories between 27 and 36 inclusive. 通过对 Accessories 子类别创建筛选索引,可以提高对 Accessories 的查询的性能,如下例所示。You can improve the performance of queries for accessories by creating a filtered index on the accessories subcategories as shown in the following example.

CREATE NONCLUSTERED INDEX FIProductAccessories  
    ON Production.Product (ProductSubcategoryID, ListPrice)   
        Include (Name)  
WHERE ProductSubcategoryID >= 27 AND ProductSubcategoryID <= 36;  

筛选索引 FIProductAccessories 涵盖下面的查询,因为查询The filtered index FIProductAccessories covers the following query because the query

结果包含在该索引中,并且查询计划不包括基表查找。results are contained in the index and the query plan does not include a base table lookup. 例如,查询谓词表达式 ProductSubcategoryID = 33 是筛选索引谓词 ProductSubcategoryID >= 27ProductSubcategoryID <= 36的子集,查询谓词中的 ProductSubcategoryIDListPrice 列全都是索引中的键列,并且名称作为包含列存储在索引的叶级别。For example, the query predicate expression ProductSubcategoryID = 33 is a subset of the filtered index predicate ProductSubcategoryID >= 27 and ProductSubcategoryID <= 36, the ProductSubcategoryID and ListPrice columns in the query predicate are both key columns in the index, and name is stored in the leaf level of the index as an included column.

SELECT Name, ProductSubcategoryID, ListPrice  
FROM Production.Product  
WHERE ProductSubcategoryID = 33 AND ListPrice > 25.00 ;  

键列Key Columns

最好在筛选索引定义中包含少量的键或包含列,并且只包含查询优化器为查询执行计划选择筛选索引所需的列。It is a best practice to include a small number of key or included columns in a filtered index definition, and to incorporate only the columns that are necessary for the query optimizer to choose the filtered index for the query execution plan. 无论某一筛选索引是否涵盖了查询,查询优化器都可以为查询选择此筛选索引。The query optimizer can choose a filtered index for the query regardless of whether it does or does not cover the query. 但是,如果某一筛选索引涵盖了查询,则查询优化器更有可能选择此筛选索引。However, the query optimizer is more likely to choose a filtered index if it covers the query.

在某些情况下,筛选索引涵盖查询,但没有将筛选索引表达式中的列作为键或包含列包括在筛选索引定义中。In some cases, a filtered index covers the query without including the columns in the filtered index expression as key or included columns in the filtered index definition. 以下准则说明了筛选索引表达式中的列何时应为筛选索引定义中的键或包含列。The following guidelines explain when a column in the filtered index expression should be a key or included column in the filtered index definition. 这些示例引用了此前创建的筛选索引 FIBillOfMaterialsWithEndDateThe examples refer to the filtered index, FIBillOfMaterialsWithEndDate that was created previously.

如果筛选索引表达式等效于查询谓词并且查询并未在查询结果中返回筛选索引表达式中的列,则筛选索引表达式中的列不需要作为筛选索引定义中的键或包含列。A column in the filtered index expression does not need to be a key or included column in the filtered index definition if the filtered index expression is equivalent to the query predicate and the query does not return the column in the filtered index expression with the query results. 例如, FIBillOfMaterialsWithEndDate 涵盖下面的查询,因为查询谓词等效于筛选表达式,并且查询结果中未返回 EndDateFor example, FIBillOfMaterialsWithEndDate covers the following query because the query predicate is equivalent to the filter expression, and EndDate is not returned with the query results. FIBillOfMaterialsWithEndDate 不需要将 EndDate 作为筛选索引定义中的键或包含列。FIBillOfMaterialsWithEndDate does not need EndDate as a key or included column in the filtered index definition.

SELECT ComponentID, StartDate FROM Production.BillOfMaterials  
WHERE EndDate IS NOT NULL;   

如果查询谓词在不与筛选索引表达式等效的比较中使用了筛选索引表达式中的某列,则该列应为筛选索引定义中的键或包含列。A column in the filtered index expression should be a key or included column in the filtered index definition if the query predicate uses the column in a comparison that is not equivalent to the filtered index expression. 例如, FIBillOfMaterialsWithEndDate 对下面的查询有效,因为它从筛选索引中选择了行的子集。For example, FIBillOfMaterialsWithEndDate is valid for the following query because it selects a subset of rows from the filtered index. 但是,它不涵盖下面的查询,因为在比较 EndDate 中使用了 EndDate > '20040101',此比较不与筛选索引表达式等效。However, it does not cover the following query because EndDate is used in the comparison EndDate > '20040101', which is not equivalent to the filtered index expression. 查询处理器在不查找 EndDate值的情况下无法执行此查询。The query processor cannot execute this query without looking up the values of EndDate. 因此, EndDate 应为筛选索引定义中的键或包含列。Therefore, EndDate should be a key or included column in the filtered index definition.

SELECT ComponentID, StartDate FROM Production.BillOfMaterials  
WHERE EndDate > '20040101';   

如果筛选索引表达式中的某列在查询结果集中,则该列应为筛选索引定义中的键或包含列。A column in the filtered index expression should be a key or included column in the filtered index definition if the column is in the query result set. 例如, FIBillOfMaterialsWithEndDate 不涵盖下面的查询,因为它在查询结果中返回了 EndDate 列。For example, FIBillOfMaterialsWithEndDate does not cover the following query because it returns the EndDate column in the query results. 因此, EndDate 应为筛选索引定义中的键或包含列。Therefore, EndDate should be a key or included column in the filtered index definition.

SELECT ComponentID, StartDate, EndDate FROM Production.BillOfMaterials  
WHERE EndDate IS NOT NULL;  

表的聚集索引键不需要是筛选索引定义中的键或包含列。The clustered index key of the table does not need to be a key or included column in the filtered index definition. 聚集索引键自动包含在所有非聚集索引(包括筛选索引)中。The clustered index key is automatically included in all nonclustered indexes, including filtered indexes.

筛选谓词中的数据转换运算符Data Conversion Operators in the Filter Predicate

如果筛选索引结果的筛选索引表达式中指定的比较运算符会导致隐式或显式数据转换,则转换发生在比较运算符的左边时,会出现错误。If the comparison operator specified in the filtered index expression of the filtered index results in an implicit or explicit data conversion, an error will occur if the conversion occurs on the left side of a comparison operator. 解决方法是在比较运算符的右边编写包含数据转换运算符(CAST 或 CONVERT)的筛选索引表达式。A solution is to write the filtered index expression with the data conversion operator (CAST or CONVERT) on the right side of the comparison operator.

下面的示例创建一个包含多种数据类型的表。The following example creates a table with a variety of data types.

USE AdventureWorks2012;  
GO  
CREATE TABLE dbo.TestTable (a int, b varbinary(4));  

在下面的筛选索引定义中,列 b 隐式转换为整数数据类型,以便与常量 1 进行比较。In the following filtered index definition, column b is implicitly converted to an integer data type for the purpose of comparing it to the constant 1. 因为转换发生在筛选谓词中运算符的左边,所以这会生成错误消息 10611。This generates error message 10611 because the conversion occurs on the left hand side of the operator in the filtered predicate.

CREATE NONCLUSTERED INDEX TestTabIndex ON dbo.TestTable(a,b)  
WHERE b = 1;  

解决方法是将右侧的常量转换为与列 b的类型相同的类型,如下例所示:The solution is to convert the constant on the right hand side to be of the same type as column b, as seen in the following example:

CREATE INDEX TestTabIndex ON dbo.TestTable(a,b)  
WHERE b = CONVERT(Varbinary(4), 1);  

将数据转换从比较运算符的左边移动到右边可能会改变转换的含义。Moving the data conversion from the left side to the right side of a comparison operator might change the meaning of the conversion. 在上例中,将 CONVERT 运算符添加到右边时,相应的比较从整数比较更改为 varbinary 比较。In the above example, when the CONVERT operator was added to the right side, the comparison changed from an integer comparison to a varbinary comparison.

列存储索引设计指南Columnstore Index Design Guidelines

columnstore index 是使用列式数据格式(称为列存储)存储、检索和管理数据的技术。A columnstore index is a technology for storing, retrieving and managing data by using a columnar data format, called a columnstore. 有关详细信息,请参阅列存储索引概述For more information, refer to Columnstore Indexes overview.

有关详细信息,请参阅列存储索引 - 新增功能For version information, see Columnstore indexes - What's new.

列存储索引体系结构Columnstore Index Architecture

掌握这些基础知识可以更轻松地理解其他介绍如何有效使用列存储索引的文章。Knowing these basics will make it easier to understand other columnstore articles that explain how to use them effectively.

数据存储使用列存储索引和行存储压缩Data storage uses columnstore and rowstore compression

在提到列存储索引时,我们使用术语“行存储” 和“列存储” 来强调数据存储的格式。When discussing columnstore indexes, we use the terms rowstore and columnstore to emphasize the format for the data storage. 列存储索引使用这两种类型的存储。Columnstore indexes use both types of storage.

Clustered Columnstore IndexClustered Columnstore Index

  • “列存储” 是在逻辑上组织为包含行和列的表、在物理上以按列数据格式存储的数据。A columnstore is data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format.

    列存储索引使用列存储格式以物理方式存储大部分数据。A columnstore index physically stores most of the data in columnstore format. 使用列存储格式时,数据将以列的形式压缩和解压缩。In columnstore format, the data is compressed and uncompressed as columns. 不需要解压缩每个行中未由查询请求的其他值。There is no need to uncompress other values in each row that are not requested by the query. 这样,便可以快速扫描大型表的整个列。This makes it fast to scan an entire column of a large table.

  • “行存储” 是在逻辑上组织为包含行和列的表、在物理上以按行数据格式存储的数据。A rowstore is data that is logically organized as a table with rows and columns, and then physically stored in a row-wise data format. 这是存储关系表数据(如堆或聚集 B 树索引)的传统方法。This has been the traditional way to store relational table data such as a heap or clustered B-tree index.

    列存储索引还使用称为增量存储的行存储格式以物理方式存储某些行。A columnstore index also physically stores some rows in a rowstore format called a deltastore. 增量存储(也称为增量行组)是数量太少,不符合压缩到列存储中的条件的行的保存位置。The deltastore,also called delta rowgroups, is a holding place for rows that are too few in number to qualify for compression into the columnstore. 每个增量行组作为聚集 B 树索引实现。Each delta rowgroup is implemented as a clustered B-tree index.

  • 增量存储是数量太少,无法压缩到列存储中的行的保存位置 。The deltastore is a holding place for rows that are too few in number to be compressed into the columnstore. 增量存储以行存储格式存储行。The deltastore stores the rows in rowstore format.

针对行组和列段执行操作Operations are performed on rowgroups and column segments

列存储索引将行分组成可管理的单元。The columnstore index groups rows into manageable units. 其中每个单元称为一个行组 。Each of these units is called a rowgroup. 为提供最佳性能,行组中的行数大到能够提高压缩率,同时又小到能够从内存中操作受益。For best performance, the number of rows in a rowgroup is large enough to improve compression rates and small enough to benefit from in-memory operations.

例如,列存储索引针对行组执行以下操作:For example, the columnstore index performs these operations on rowgroups:

  • 将行组压缩到列存储中。Compresses rowgroups into the columnstore. 针对行组中的每个列段执行压缩。Compression is performed on each column segment within a rowgroup.
  • ALTER INDEX ... REORGANIZE 操作期间合并行组。Merges rowgroups during an ALTER INDEX ... REORGANIZE operation.
  • ALTER INDEX ... REBUILD 操作期间创建新行组。Creates new rowgroups during an ALTER INDEX ... REBUILD operation.
  • 在动态管理视图 (DMV) 中报告行组运行状况和碎片。Reports on rowgroup health and fragmentation in the dynamic management views (DMVs).

增量存储由一个或多个名为“增量行组”的行组构成 。The deltastore is comprised of one or more rowgroups called delta rowgroups. 每个增量行组是一个聚集 B 树索引,用于存储较小的大容量加载和插入操作,直到行组包含 1,048,576 行或者重新生成了索引。Each delta rowgroup is a clustered B-tree index that stores small bulk loads and inserts until the rowgroup contains 1,048,576 rows, or until the index is rebuilt. 当增量行组包含 1,048,576 行时,将被标记为已关闭,等待名为 tuple-mover 的进程将它压缩到列存储中。When a delta rowgroup contains 1,048,576 rows it is marked as closed, and waits for a process called the tuple-mover to compress it into the columnstore.

每个列在每个行组中都有自身的一些值。Each column has some of its values in each rowgroup. 这些值称为“列段” 。These values are called column segments. 每个行组包含表中每个列的一个列段。Each rowgroup contains one column segment for every column in the table. 每个列在每个行组中有一个列段。Each column has one column segment in each rowgroup.

Column segmentColumn segment

当列存储索引压缩行组时,会单独压缩每个列段。When the columnstore index compresses a rowgroup, it compresses each column segment separately. 若要解压缩整个列,列存储索引只需解压缩每个行组中的一个列段。To uncompress an entire column, the columnstore index only needs to uncompress one column segment from each rowgroup.

小规模加载和插入操作转到增量存储Small loads and inserts go to the deltastore

列存储索引一次至少可将 102,400 个行压缩到列存储索引中,以此提高列存储的压缩率和性能。A columnstore index improves columnstore compression and performance by compressing at least 102,400 rows at a time into the columnstore index. 若要批量压缩行,列存储索引可在增量存储中累积小规模的加载和插入操作。To compress rows in bulk, the columnstore index accumulates small loads and inserts in the deltastore. 增量存储操作在后台处理。The deltastore operations are handled behind the scenes. 若要返回正确的查询结果,聚集列存储索引将合并来自列存储和增量存储的查询结果。To return the correct query results, the clustered columnstore index combines query results from both the columnstore and the deltastore.

在以下情况下,行将转到增量存储:Rows go to the deltastore when they are:

  • 使用 INSERT INTO ... VALUES 语句插入。Inserted with the INSERT INTO ... VALUES statement.
  • 行位于批量加载操作的末尾,并且编号小于 102,400。At the end of a bulk load and they number less than 102,400.
  • 更新。Updated. 每个更新操作实现为删除并插入。Each update is implemented as a delete and an insert.

增量存储中还会存储标记为已删除、但实际并未从列存储中删除的已删除行的 ID 列表。The deltastore also stores a list of IDs for deleted rows that have been marked as deleted but not yet physically deleted from the columnstore.

增量行组已满时将压缩到列存储中When delta rowgroups are full they get compressed into the columnstore

聚集列存储索引最多收集每个增量行组中的 1,048,576 行,达到此数字后,会将行组压缩到列存储中。Clustered columnstore indexes collect up to 1,048,576 rows in each delta rowgroup before compressing the rowgroup into the columnstore. 这可以提高列存储索引的压缩率。This improves the compression of the columnstore index. 如果增量行组包含 1,048,576 行,列存储索引会将行组标记为已关闭。When a delta rowgroup contains 1,048,576 rows, the columnstore index marks the rowgroup as closed. 名为 tuple-mover 的后台进程查找每个已关闭的行组并将其压缩到列存储中。A background process, called the tuple-mover, finds each closed rowgroup and compresses it into the columnstore.

可以通过使用 ALTER INDEX 重新生成或重新组织索引,强制将增量行组压缩到列存储中。You can force delta rowgroups into the columnstore by using ALTER INDEX to rebuild or reorganize the index. 请注意,如果在压缩期间遇到内存压力,列存储索引可能会减少压缩行组中的行数。Note that if there is memory pressure during compression, the columnstore index might reduce the number of rows in the compressed rowgroup.

每个表分区具有自身的行组和增量行组Each table partition has its own rowgroups and delta rowgroups

索引、堆和列存储索引中的分区概念是相同的。The concept of partitioning is the same in both a clustered index, a heap, and a columnstore index. 将表分区会根据列值范围将表划分为较小的行组。Partitioning a table divides the table into smaller groups of rows according to a range of column values. 分区通常用于管理数据。It is often used for managing the data. 例如,可为每年的数据创建一个分区,然后使用分区切换将数据存档到更经济的存储中。For example, you could create a partition for each year of data, and then use partition switching to archive data to less expensive storage. 分区切换适用于列存储索引,可让你轻松将数据分区移到另一个位置。Partition switching works on columnstore indexes and makes it easy to move a partition of data to another location.

行组始终在表分区中定义。Rowgroups are always defined within a table partition. 将某个列存储索引分区后,每个分区都有其自身的压缩行组和增量行组。When a columnstore index is partitioned, each partition has its own compressed rowgroups and delta rowgroups.

每个分区可以包含多个增量行组Each partition can have multiple delta rowgroups

每个分区可以包含多个增量行组。Each partition can have more than one delta rowgroups. 如果列存储索引需要将数据添加到增量行组,而增量行组已锁定,则列存储索引会尝试获取其他增量行组中的锁。When the columnstore index needs to add data to a delta rowgroup and the delta rowgroup is locked, the columnstore index will try to obtain a lock on a different delta rowgroup. 如果没有任何可用的增量行组,列存储索引将创建新的增量行组。If there are no delta rowgroups available, the columnstore index will create a new delta rowgroup. 例如,具有 10 个分区的表可以轻松包含 20 个或更多的增量行组。For example, a table with 10 partitions could easily have 20 or more delta rowgroups.

可以在同一个表中组合列存储索引和行存储索引You can combine columnstore and rowstore indexes on the same table

非聚集索引包含基础表中部分或全部行与列的副本。A nonclustered index contains a copy of part or all of the rows and columns in the underlying table. 索引将定义为表的一个或多个列,并具有一个用于筛选行的可选条件。The index is defined as one or more columns of the table, and has an optional condition that filters the rows.

SQL Server 2016 (13.x)SQL Server 2016 (13.x) 开始,可以对行存储表创建可更新的非聚集列存储索引 。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x), you can create an updatable nonclustered columnstore index on a rowstore table. 列存储索引将存储数据的副本,因此你需要提供额外的存储。The columnstore index stores a copy of the data so you do need extra storage. 但是,列存储索引中的数据压缩成的大小比行存储表所需的大小更小。However, the data in the columnstore index will compress to a smaller size than the rowstore table requires. 如果采取这种做法,你可以同时对列存储索引以及行存储索引上的事务运行分析。By doing this, you can run analytics on the columnstore index and transactions on the rowstore index at the same time. 当行存储表中的数据更改时,列存储将会更新,因此这两个索引适用于相同的数据。The column store is updated when data changes in the rowstore table, so both indexes are working against the same data.

SQL Server 2016 (13.x)SQL Server 2016 (13.x) 开始,可以对一个列存储索引使用一个或多个非聚集行存储索引 。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x), you can have one or more nonclustered rowstore indexes on a columnstore index. 这样,便可以针对基础列存储上执行有效的表查找。By doing this, you can perform efficient table seeks on the underlying columnstore. 其他选项也可供使用。Other options become available too. 例如,可以通过在行存储表中使用 UNIQUE 约束来强制主键约束。For example, you can enforce a primary key constraint by using a UNIQUE constraint on the rowstore table. 由于不唯一的值无法插入行存储表,SQL Server 无法将值插入列存储。Since an non-unique value will fail to insert into the rowstore table, SQL Server cannot insert the value into the columnstore.

性能注意事项Performance considerations

  • 非聚集列存储索引定义支持使用筛选的条件。The nonclustered columnstore index definition supports using a filtered condition. 若要尽量减少在 OLTP 表中添加列存储索引的性能影响,请使用筛选条件,以便创建仅关于运行工作负荷冷数据的非聚集列存储索引。To minimize the performance impact of adding a columnstore index on an OLTP table, use a filtered condition to create a nonclustered columnstore index on only the cold data of your operational workload.

  • 一个内存中表可以有一个列存储索引。An in-memory table can have one columnstore index. 你可以在创建表时创建它,也可以稍后使用 ALTER TABLE (Transact-SQL) 来添加。You can create it when the table is created or add it later with ALTER TABLE (Transact-SQL). 在低于 SQL Server 2016 (13.x)SQL Server 2016 (13.x) 的版本中,仅基于磁盘的表可以有列存储索引。Before SQL Server 2016 (13.x)SQL Server 2016 (13.x), only a disk-based table could have a columnstore index.

有关详细信息,请参阅列存储索引 - 查询性能For more information, refer to Columnstore indexes - Query performance.

设计指南Design Guidance

  • 一个行存储表可以有一个可更新的非聚集列存储索引。A rowstore table can have one updateable nonclustered columnstore index. 在低于 SQL Server 2014 (12.x)SQL Server 2014 (12.x) 的版本中,非聚集列存储索引是只读的。Before SQL Server 2014 (12.x)SQL Server 2014 (12.x), the nonclustered columnstore index was read-only.

有关详细信息,请参阅列存储索引 - 设计指南For more information, refer to Columnstore indexes - Design Guidance.

哈希索引设计指南Hash Index Design Guidelines

所有内存优化表都至少必须有一个索引,因为行正是通过索引才连接在一起。All memory-optimized tables must have at least one index, because it is the indexes that connect the rows together. 在内存优化表中,每个索引也经过内存优化。On a memory-optimized table, every index is also memory-optimized. 哈希索引是内存优化表中可能存在的索引类型之一。Hash indexes are one of the possible index types in a memory-optimized table. 有关详细信息,请参阅内存优化表的索引For more information, see Indexes for Memory-Optimized Tables.

适用范围SQL Server 2014 (12.x)SQL Server 2014 (12.x)SQL Server 2017SQL Server 2017Applies to: SQL Server 2014 (12.x)SQL Server 2014 (12.x) through SQL Server 2017SQL Server 2017.

哈希索引体系结构Hash Index Architecture

哈希索引包含一个指针数组,该数组的每个元素被称为哈希桶。A hash index consists of an array of pointers, and each element of the array is called a hash bucket.

  • 每个桶为 8 个字节,用于存储键项的链接列表的内存地址。Each bucket is 8 bytes, which are used to store the memory address of a link list of key entries.
  • 每个条目是索引键的值,以及其在基础内存优化表中的对应行的地址。Each entry is a value for an index key, plus the address of its corresponding row in the underlying memory-optimized table.
  • 每个条目指向条目的链接列表中的下一个条目,所有都链接到当前桶。Each entry points to the next entry in a link list of entries, all chained to the current bucket.

必须在定义索引时指定哈希桶的数量:The number of buckets must be specified at index definition time:

  • 桶与表行或非重复值的比例越低,桶链接列表的平均长度就越长。The lower the ratio of buckets to table rows or to distinct values, the longer the average bucket link list will be.
  • 较短的链接列表比较长的链接列表执行更快。Short link lists perform faster than long link lists.
  • 哈希索引中的最大哈希桶数目为 1,073,741,824。The maximum number of buckets in hash indexes is 1,073,741,824.

提示

若要为你的数据确定合适的 BUCKET_COUNT ,请参阅 配置哈希索引桶计数To determine the right BUCKET_COUNT for your data, see Configuring the hash index bucket count.

哈希函数适用于索引键列,其结果用于确定键位于哪个哈希桶中。The hash function is applied to the index key columns and the result of the function determines what bucket that key falls into. 每个哈希桶都包含一个指向行的指针,这些行的哈希键值映射到该哈希桶。Each bucket has a pointer to rows whose hashed key values are mapped to that bucket.

用于哈希索引的哈希函数具有以下特征:The hashing function used for hash indexes has the following characteristics:

  • SQL ServerSQL Server 拥有一个用于所有哈希索引的哈希函数。has one hash function that is used for all hash indexes.
  • 哈希函数具有确定性。The hash function is deterministic. 相同的输入键值始终映射到哈希索引中的同一哈希桶。The same input key value is always mapped to the same bucket in the hash index.
  • 多个索引键可能映射到同一个哈希 Bucket。Multiple index keys may be mapped to the same hash bucket.
  • 哈希函数经过均衡处理,这意味着索引键值在哈希桶上的分布通常符合泊松分布或钟型曲线分布,而不是平坦的线性分布。The hash function is balanced, meaning that the distribution of index key values over hash buckets typically follows a Poisson or bell curve distribution, not a flat linear distribution.
  • 泊松分布并非均匀分布。Poisson distribution is not an even distribution. 索引键值并非均匀地分布在哈希 Bucket中。Index key values are not evenly distributed in the hash buckets.
  • 如果两个索引键映射到同一哈希桶,则产生哈希冲突 。If two index keys are mapped to the same hash bucket, there is a hash collision. 大量哈希冲突可影响读取操作的性能。A large number of hash collisions can have a performance impact on read operations. 现实的目标是 30% 的桶包含两个不同的键值。A realistic goal is for 30% of the buckets contain two different key values.

下图汇总了哈希索引和桶的交互作用。The interplay of the hash index and the buckets is summarized in the following image.

hekaton_tables_23dhekaton_tables_23d

配置哈希索引桶计数Configuring the hash index bucket count

哈希索引桶计数在索引创建时指定,可使用 ALTER TABLE...ALTER INDEX REBUILD 语法进行更改。The hash index bucket count is specified at index create time, and can be changed using the ALTER TABLE...ALTER INDEX REBUILD syntax.

在大多数情况下,桶计数在理想情况下应该介于索引键中非重复值数目的 1 到 2 倍之间。In most cases the bucket count would ideally be between 1 and 2 times the number of distinct values in the index key.
可能无法始终预测到某个特定索引键可能具有或将具有多少个值。You may not always be able to predict how many values a particular index key may have, or will have. 如果 BUCKET_COUNT 值在键值的实际数目的 10 倍以内,性能表现通常依然良好,并且高估通常比低估要好。Performance is usually still good if the BUCKET_COUNT value is within 10 times of the actual number of key values, and overestimating is generally better than underestimating.

太少 会带来以下缺点:Too few buckets has the following drawbacks:

  • 增多非重复键值的哈希冲突。More hash collisions of distinct key values.
  • 每个非重复值被迫与其他非重复值共享同一个桶。Each distinct value is forced to share the same bucket with a different distinct value.
  • 每个桶的平均链长度将会增加。The average chain length per bucket grows.
  • 桶链的长度越长,在索引中进行相等性查找的速度就越慢。The longer the bucket chain, the slower the speed of equality lookups in the index.

太多 会带来以下缺点:Too many buckets has the following drawbacks:

  • 桶计数过高可能会导致空桶增多。Too high a bucket count might result in more empty buckets.
  • 空桶会影响全文检索扫描的性能。Empty buckets impact the performance of full index scans. 如果定期执行这些操作,请考虑选择接近非重复索引键值数目的桶计数。If those are performed regularly, consider picking a bucket count close to the number of distinct index key values.
  • 空桶会使用内存,尽管每个桶只用 8 个字节。Empty buckets use memory, though each bucket uses only 8 bytes.

备注

添加更多的桶无益于减少共享重复值的、链接在一起的条目。Adding more buckets does nothing to reduce the chaining together of entries that share a duplicate value. 值重复率用于确定哈希是否为适当的索引类型,而不是用于计算桶计数。The rate of value duplication is used to decide whether a hash is the appropriate index type, not to calculate the bucket count.

性能注意事项Performance considerations

哈希索引的性能为:The performance of a hash index is:

  • WHERE 子句中的谓词为哈希索引键的每一列指定确切值时表现极好 。Excellent when the predicate in the WHERE clause specifies an exact value for each column in the hash index key. 如果有不等谓词,则哈希索引将恢复为扫描。A hash index will revert to a scan given an inequality predicate.
  • WHERE 子句中的谓词查找索引键中的一系列值时表现不佳 。Poor when the predicate in the WHERE clause looks for a range of values in the index key.
  • WHERE 子句中的谓词为双列哈希索引键的第一列规定了一个特定值,但没有为该键的其他列指定值时表现不佳 。Poor when the predicate in the WHERE clause stipulates one specific value for the first column of a two column hash index key, but does not specify a value for other columns of the key.

提示

谓词必须包括哈希索引键中的所有列。The predicate must include all columns in the hash index key. 哈希索引需要键(哈希)才能仔细查找索引。The hash index requires a key (to hash) to seek into the index. 如果索引键由两列组成,但 WHERE 子句仅提供第一列,则 SQL ServerSQL Server 没有完整的键可进行哈希运算。If an index key consists of two columns and the WHERE clause only provides the first column, SQL ServerSQL Server does not have a complete key to hash. 这将产生索引扫描查询计划。This will result in an index scan query plan.

如果使用哈希索引并且唯一索引键的数目是行计数的 100 倍(或者更多),则考虑增加桶计数以避免较大的行链,或改用非聚集索引If a hash index is used and the number of unique index keys is 100 times (or more) than the row count, consider either increasing to a larger bucket count to avoid large row chains, or use a nonclustered index instead.

声明注意事项Declaration considerations

哈希索引只能存在于内存优化表中,A hash index can exist only on a memory-optimized table. 而不能存在于基于磁盘的表中。It cannot exist on a disk-based table.

可以将哈希索引声明为:A hash index can be declared as:

  • “唯一”,或可以默认为“非唯一”。UNIQUE, or can default to Non-Unique.
  • NONCLUSTERED 为默认值。NONCLUSTERED, which is the default.

以下是在 CREATE TABLE 语句之外创建哈希索引的语法示例:The following is an example of the syntax to create a hash index, outside of the CREATE TABLE statement:

ALTER TABLE MyTable_memop  
ADD INDEX ix_hash_Column2 UNIQUE  
HASH (Column2) WITH (BUCKET_COUNT = 64);

行版本和垃圾回收Row versions and garbage collection

在内存优化表中,如果某行受 UPDATE 影响,表将创建该行的更新版本。In a memory-optimized table, when a row is affected by an UPDATE, the table creates an updated version of the row. 在更新事务期间,其他会话也许能够读取较旧版本的行,从而避免与行锁相关的性能下降。During the update transaction, other sessions might be able to read the older version of the row and thereby avoid the performance slowdown associated with a row lock.

哈希索引可能也会提供不同的条目版本来适应更新。The hash index might also have different versions of its entries to accommodate the update.

以后,当不再需要旧版本时,垃圾回收 (GC) 线程将遍历桶及其链接列表,以清理旧条目。Later when the older versions are no longer needed, a garbage collection (GC) thread traverses the buckets and their link lists to clean away old entries. 如果链接列表链长度较短,GC 线程的执行效果会更佳。The GC thread performs better if the link list chain lengths are short. 有关详细信息,请参阅内存中 OLTP 垃圾回收For more information, refer to In-Memory OLTP Garbage Collection.

内存优化非聚集索引设计指南Memory-Optimized Nonclustered Index Design Guidelines

非聚集索引是内存优化表中可能存在的一种索引类型。Nonclustered indexes are one of the possible index types in a memory-optimized table. 有关详细信息,请参阅内存优化表的索引For more information, see Indexes for Memory-Optimized Tables.

适用范围SQL Server 2014 (12.x)SQL Server 2014 (12.x)SQL Server 2017SQL Server 2017Applies to: SQL Server 2014 (12.x)SQL Server 2014 (12.x) through SQL Server 2017SQL Server 2017.

内存中非聚集索引体系结构In-memory Nonclustered Index Architecture

内存中非聚集索引最初由 Microsoft Research 在 2011 年提出设想并说明,并使用称为 Bw 树的数据结构实现。In-memory nonclustered indexes are implemented using a data structure called a Bw-Tree, originally envisioned and described by Microsoft Research in 2011. Bw 树是 B 树的无锁和无闩锁变体。A Bw-Tree is a lock and latch-free variation of a B-Tree. 有关详细信息,请参阅Bw 树:新硬件平台的 B 树For more details please see The Bw-Tree: A B-tree for New Hardware Platforms.

Bw 树处于一个非常高的级别,可以理解为按页 ID (PidMap) 组织的页映射,用于分配和重复使用页 ID (PidAlloc) 的设施,和在页映射中链接并相互链接的一组页。At a very high level the Bw-Tree can be understood as a map of pages organized by page ID (PidMap), a facility to allocate and reuse page IDs (PidAlloc) and a set of pages linked in the page map and to each other. 这三个高级别子组件组成了 Bw 树的基本内部结构。These three high level sub-components make up the basic internal structure of a Bw-Tree.

该结构一定程度上类似于常规 B 树,每页都有一组经过排序的键值,索引中的每个级别都指向更低的级别,并且叶级别指向数据行。The structure is similar to a normal B-Tree in the sense that each page has a set of key values that are ordered and there are levels in the index each pointing to a lower level and the leaf levels point to a data row. 但也存在一些差异。However there are several differences.

与哈希索引类似,多个数据行可链接在一起(版本)。Just like hash indexes, multiple data rows can be linked together (versions). 级别之间的页指针是逻辑页 ID,这些逻辑页 ID 是页映射表中的偏移量,该表又具有每页的物理地址。The page pointers between the levels are logical page IDs, which are offsets into a page mapping table, that in turn has the physical address for each page.

索引页没有就地更新。There are no in-place updates of index pages. 因此引入了新的增量页。New delta pages are introduced for this purpose.

  • 页更新不需要闩锁获锁定。No latching or locking is required for page updates.
  • 索引页没有固定大小。Index pages are not a fixed size.

每个非叶级别页中描述的键值是其指向的子级所包含的最高值,每一行还包含了该页逻辑页 ID。The key value in each non-leaf level page depicted is the highest value that the child that it points to contains and each row also contains that page logical page ID. 在叶级别页上,除键值外,还包含数据行的物理地址。On the leaf-level pages, along with the key value, it contains the physical address of the data row.

点查找与 B 树相似,但由于仅在一个方向链接页,SQL Server 数据库引擎SQL Server Database Engine跟随右页指针,其中每个非叶页具有其子级的最高值,而不是 B 树中的子级最低值。Point lookups are similar to B-Trees except that because pages are linked in only one direction, the SQL Server 数据库引擎SQL Server Database Engine follows right page pointers, where each non-leaf pages has the highest value of its child, rather than lowest value as in a B-Tree.

必须更改叶级别页时,SQL Server 数据库引擎SQL Server Database Engine不修改该页本身。If a Leaf-level page has to change, the SQL Server 数据库引擎SQL Server Database Engine does not modify the page itself. 相反,SQL Server 数据库引擎SQL Server Database Engine创建描述更改的增量记录,并将其追加到先前的页。Rather, the SQL Server 数据库引擎SQL Server Database Engine creates a delta record that describes the change, and appends it to the previous page. 然后将先前页的页映射表地址更新为增量记录的地址,使该地址成为当前页的物理地址。Then it also updates the page map table address for that previous page, to the address of the delta record which now becomes the physical address for this page.

管理 Bw 树结构可能需要三个不同的操作:整合、拆分和合并。There are three different operations that can be required for managing the structure of a Bw-Tree: consolidation, split and merge.

增量整合Delta Consolidation

增量记录的长链最终可能会降低搜索性能,因为这意味着通过索引搜索时,需要遍历这些长链。A long chain of delta records can eventually degrade search performance as it could mean we are traversing long chains when searching through an index. 如果向已有 16 个元素的链添加了新的增量记录,该增量记录中的更改将整合到引用的索引页中,然后将重新生成该页,其中包括触发整合的新增量记录指示的更改。If a new delta record is added to a chain that already has 16 elements, the changes in the delta records will be consolidated into the referenced index page, and the page will then be rebuilt, including the changes indicated by the new delta record that triggered the consolidation. 重新生成的新页使用相同的页 ID,但使用新的内存地址。The newly rebuilt page will have the same page ID but a new memory address.

hekaton_tables_23ehekaton_tables_23e

拆分页Split page

Bw 树中的索引页可按需增大,从存储单一行的大小开始,最多可存储 8 KB 的索引页。An index page in Bw-Tree grows on as-needed basis starting from storing a single row to storing a maximum of 8 KB. 索引页增大到 8 KB 后,插入一个新行会导致拆分索引页。Once the index page grows to 8 KB, a new insert of a single row will cause the index page to split. 对于内部页,这表示不再有添加另一个键值和指针的空间,对于叶页,这表示所有增量记录整合在一起后,行会因为太大而不能容纳在页中。For an internal page, this means when there is no more room to add another key value and pointer, and for a leaf page, it means that the row would be too big to fit on the page once all the delta records are incorporated. 叶页的页标头中的统计信息会持续跟踪整合增量记录所需的空间,该信息还会随着每个新增量记录的添加不断调整。The statistics information in the page header for a leaf page keeps track of how much space would be required to consolidate the delta records, and that information is adjusted as each new delta record is added.

拆分操作通过两个原子步骤完成。A Split operation is done in two atomic steps. 在下图中,假设因为正在插入值为 5 的键强制进行了叶页拆分,并存在一个指向当前叶级别页(键值 4)末尾的非叶页。In the picture below, assume a Leaf-page forces a split because a key with value 5 is being inserted, and a non-leaf page exists pointing to the end of the current Leaf-level page (key value 4).

hekaton_tables_23fhekaton_tables_23f

步骤 1: 分配两个新页(P1 和 P2),将旧 P1 页中的行拆分到这些新页上(包括新插入的行)。Step 1: Allocate two new pages P1 and P2, and split the rows from old P1 page onto these new pages, including the newly inserted row. 使用页映射表中的新槽存储 P2 页的物理地址。A new slot in Page Mapping Table is used to store the physical address of page P2. 此时,任何并发操作都还无法访问 P1 和 P2 页。These pages, P1 and P2 are not accessible to any concurrent operations yet. 此外,设置了从 P1 指向 P2 的逻辑指针。In addition, the logical pointer from P1 to P2 is set. 然后,在一个原子步骤中更新页映射表,将指针从旧 P1 更改到新 P1。Then, in one atomic step update the Page Mapping Table to change the pointer from old P1 to new P1.

步骤 2: 非叶页指向 P1,但是没有指针从非叶页直接指向 P2。Step 2: The non-leaf page points to P1 but there is no direct pointer from a non-leaf page to P2. 只能通过 P1 到达 P2。P2 is only reachable via P1. 要创建从非叶页指向 P2 的指针,需要分配新的非叶页(内部索引页),复制旧的非叶页中的所有行,并添加一个指向 P2 的新行。To create a pointer from a non-leaf page to P2, allocate a new non-leaf page (internal index page), copy all the rows from old non-leaf page, and add a new row to point to P2. 完成此操作后,在一个原子步骤中更新页映射表,将指针从旧的非叶页更改为新的非叶页。Once this is done, in one atomic step, update the Page Mapping Table to change the pointer from old non-leaf page to new non-leaf page.

合并页Merge page

如果 DELETE 操作导致某页的大小低于最大页大小(当前为 8 KB)的 10%,或该页上只有一行,那么该页会与相邻的页合并。When a DELETE operation results in a page having less than 10% of the maximum page size (currently 8 KB), or with a single row on it, that page will be merged with a contiguous page.

从某页中删除行时,会添加该删除操作的增量记录。When a row is deleted from a page, a delta record for the delete is added. 此外,还会进行相关检查,确定索引页(非叶页)是否符合合并条件。Additionally, a check is made to determine if the index page (non-leaf page) qualifies for Merge. 此检查还会验证删除行之后,剩余空间是否小于最大页大小的 10%。This check verifies if the remaining space after deleting the row will be less than 10% of maximum page size. 如果符合合并条件,可通过三个原子步骤完成合并。If it does qualify, the Merge is performed in three atomic steps.

下图中,假设 DELETE 操作删除键值 10。In the picture below, assume a DELETE operation will delete the key value 10.

hekaton_tables_23ghekaton_tables_23g

步骤 1: 创建表示键值 10(蓝色三角形)的增量页,并将该增量页在非叶页 Pp1 上的指针设置为新的增量页。Step 1: A delta page representing key value 10 (blue triangle) is created and its pointer in the non-leaf page Pp1 is set to the new delta page. 此外,创建一个特殊的合并增量页(绿色三角形),并链接该页,使其指向增量页。Additionally a special merge-delta page (green triangle) is created, and it is linked to point to the delta page. 在此阶段,这两个页(增量页和合并增量页)对任何并发事务都不可见。At this stage, both pages (delta page and merge-delta page) are not visible to any concurrent transaction. 在一个原子步骤中,页映射表中指向叶级别页 P1 的指针更新为指向合并增量页的指针。In one atomic step, the pointer to the Leaf-level page P1 in the Page Mapping Table is updated to point to the merge-delta page. 执行此步骤之后,Pp1 中键值 10 对应的项现在指向合并增量页。After this step, the entry for key value 10 in Pp1 now points to the merge-delta page.

步骤 2: 需要删除非叶页 Pp1 中表示键值 7 的行,然后将键值 10 对应的项更新为指向 P1。Step 2: The row representing key value 7 in the non-leaf page Pp1 needs to be removed, and the entry for key value 10 updated to point to P1. 为此,需要分配新的非叶页 Pp2,复制 Pp1 中除表示键值 7 的行之外的所有行,然后将键值 10 表示的行更新为指向页 P1。To do this, a new non-leaf page Pp2 is allocated and all the rows from Pp1 are copied except for the row representing key value 7; then the row for key value 10 is updated to point to page P1. 完成此操作后,在一个原子步骤中将 Pp1 的页映射表入口点更新为指向 Pp2。Once this is done, in one atomic step, the Page Mapping Table entry pointing to Pp1 is updated to point to Pp2. 无法再访问 Pp1。Pp1 is no longer reachable.

步骤 3: 合并叶级别页 P2 和 P1,并删除增量页。Step 3: The Leaf-level pages P2 and P1 are merged and the delta pages removed. 为此,分配新页 P3,合并 P2 和 P1 中的行,并在新的 P3 中包含增量页的更改。To do this, a new page P3 is allocated and the rows from P2 and P1 are merged, and the delta page changes are included in the new P3. 然后,在一个原子步骤中将页 P1 的页映射表入口点更新为指向页 P3。Then, in one atomic step, the Page Mapping Table entry pointing to page P1 is updated to point to page P3.

性能注意事项Performance considerations

使用不等谓词查询内存优化表时,非聚集索引的性能优于非聚集哈希索引。The performance of a nonclustered index is better than nonclustered hash indexes when querying a memory-optimized table with inequality predicates.

备注

内存优化的表中的列可以同时为哈希索引和非聚集索引的一部分。A column in a memory-optimized table can be part of both a hash index and a nonclustered index.

提示

非聚集索引键列中的列包含许多重复值时,更新、插入和删除的性能会降低。When a column in a nonclustered index key columns have many duplicate values, performance can degrade for updates, inserts, and deletes. 在这种情况下提高性能的一种方法是向非聚集索引添加另一列。One way to improve performance in this situation is to add another column to the nonclustered index.

其他阅读主题Additional Reading

CREATE INDEX (Transact-SQL) CREATE INDEX (Transact-SQL)
ALTER INDEX (Transact-SQL) ALTER INDEX (Transact-SQL)
CREATE XML INDEX (Transact-SQL)CREATE XML INDEX (Transact-SQL)
CREATE SPATIAL INDEX (Transact-SQL) CREATE SPATIAL INDEX (Transact-SQL)
重新组织和重新生成索引 Reorganize and Rebuild Indexes
使用 SQL Server 2008 索引视图提高性能Improving Performance with SQL Server 2008 Indexed Views
Partitioned Tables and IndexesPartitioned Tables and Indexes
创建主键 Create a Primary Key
内存优化表的索引Indexes for Memory-Optimized Tables
列存储索引概述Columnstore Indexes overview
内存优化表的哈希索引疑难解答 Troubleshooting Hash Indexes for Memory-Optimized Tables
内存优化表动态管理视图 (Transact-SQL) Memory-Optimized Table Dynamic Management Views (Transact-SQL)
与索引相关的动态管理视图和函数 (Transact-SQL) Index Related Dynamic Management Views and Functions (Transact-SQL)
计算列上的索引 Indexes on Computed Columns
索引和 ALTER TABLE Indexes and ALTER TABLE
自适应索引碎片整理Adaptive Index Defrag