统计信息Statistics

适用对象: yesSQL ServeryesAzure SQL 数据库yesAzure SQL 数据仓库yes并行数据仓库APPLIES TO: yesSQL Server yesAzure SQL Database yesAzure SQL Data Warehouse yesParallel Data Warehouse

查询优化器使用统计信息来创建可提高查询性能的查询计划。The Query Optimizer uses statistics to create query plans that improve query performance. 对于大多数查询,查询优化器已为高质量查询计划生成必要的统计信息;但在一些情况下,需要创建附加的统计信息或修改查询设计以得到最佳结果。For most queries, the Query Optimizer already generates the necessary statistics for a high quality query plan; in some cases, you need to create additional statistics or modify the query design for best results. 本主题讨论用于高效使用查询优化统计信息的统计信息概念并提供指南。This topic discusses statistics concepts and provides guidelines for using query optimization statistics effectively.

组件和概念Components and Concepts

统计信息Statistics

查询优化的统计信息是二进制大型对象 (BLOB),这些对象包含与值在表或索引视图的一列或多列中的分布有关的统计信息。Statistics for query optimization are binary large objects (BLOBs) that contain statistical information about the distribution of values in one or more columns of a table or indexed view. 查询优化器使用这些统计信息来估计查询结果中的基数 或行数。The Query Optimizer uses these statistics to estimate the cardinality, or number of rows, in the query result. 通过这些基数估计 ,查询优化器可以创建高质量的查询计划。These cardinality estimates enable the Query Optimizer to create a high-quality query plan. 例如,查询优化器可以根据谓词使用基数估计选择索引查找运算符而不是更耗资源的索引扫描运算符,假如这样做能提高查询性能。For example, depending on your predicates, the Query Optimizer could use cardinality estimates to choose the index seek operator instead of the more resource-intensive index scan operator, if doing so improves query performance.

每个统计信息对象都在包含一个或对个表列的列表上创建,并且包括将值的分布显示在第一列的直方图 。Each statistics object is created on a list of one or more table columns and includes a histogram displaying the distribution of values in the first column. 在多列上的统计信息对象也存储与各列中的值的相关性有关的统计信息。Statistics objects on multiple columns also store statistical information about the correlation of values among the columns. 这些相关性统计信息(或 密度)根据列值的不同行的数目派生。These correlation statistics, or densities, are derived from the number of distinct rows of column values.

直方图Histogram

直方图 度量数据集中每个非重复值的出现频率。A histogram measures the frequency of occurrence for each distinct value in a data set. 查询优化器根据统计信息对象第一个键列中的列值来计算直方图,它选择列值的方法是以统计方式对行进行抽样或对表或视图中的所有行执行完全扫描。The query optimizer computes a histogram on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view. 如果直方图是根据一组抽样行创建的,存储的总行数和非重复值总数则为估计值,且不必为整数。If the histogram is created from a sampled set of rows, the stored totals for number of rows and number of distinct values are estimates and do not need to be whole integers.

备注

SQL ServerSQL Server 中的直方图仅为单个列生成 - 统计信息对象键列集的第一列。Histograms in SQL ServerSQL Server are only built for a single column-the first column in the set of key columns of the statistics object.

若要创建直方图,查询优化器将对列值进行排序,计算与每个非重复列值匹配的值数,然后将列值聚合到最多 200 个连续直方图梯级中。To create the histogram, the query optimizer sorts the column values, computes the number of values that match each distinct column value and then aggregates the column values into a maximum of 200 contiguous histogram steps. 每个直方图梯级都包含一个列值范围,后跟上限列值。Each histogram step includes a range of column values followed by an upper bound column value. 该范围包括介于两个边界值之间的所有可能列值,但不包括边界值自身。The range includes all possible column values between boundary values, excluding the boundary values themselves. 最小排序列值是第一个直方图梯级的上限值。The lowest of the sorted column values is the upper boundary value for the first histogram step.

有关详细信息,SQL ServerSQL Server 通过以下三个步骤从已排序的列值集创建直方图 :In more detail, SQL ServerSQL Server creates the histogram from the sorted set of column values in three steps:

  • 直方图初始化:在第一步中,处理始于排序集开始处的一个值序列,并收集 range_high_key、equal_rows、range_rows 和 distinct_range_rows 的最多 200 个值(在此步骤中,range_rows 和 distinct_range_rows 始终为零) 。Histogram initialization: In the first step, a sequence of values starting at the beginning of the sorted set is processed, and up to 200 values of range_high_key, equal_rows, range_rows, and distinct_range_rows are collected (range_rows and distinct_range_rows are always zero during this step). 已用尽所有输入或已找到 200 个值时,结束第一步。The first step ends either when all input has been exhausted, or when 200 values have been found.
  • 使用 Bucket 合并进行扫描:第二步中,按排序顺序处理从统计信息键前导列算起的每个额外值;将每个相继值添加到最后一个范围或在末尾创建一个新范围(这可能是因输入值已排序所致)。Scan with bucket merge: Each additional value from the leading column of the statistics key is processed in the second step, in sorted order; each successive value is either added to the last range or a new range at the end is created (this is possible because the input values are sorted). 如果创建了一个新范围,则一对现有相邻范围折叠为单个范围。If a new range is created, then one pair of existing, neighboring ranges is collapsed into a single range. 选择这对范围以最大限度减少信息丢失。This pair of ranges is selected in order to minimize information loss. 此方法使用最大差异 算法使直方图中的梯级数减至最少,并同时最大化边界值之间的差异。This method uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values. 折叠后,梯级数在整个步骤中保持为 200。The number of steps after collapsing ranges stays at 200 throughout this step.
  • 直方图合并:第三步中,如果未丢失大量信息,可能折叠更多范围。Histogram consolidation: In the third step, more ranges may be collapsed if a significant amount of information is not lost. 直方图梯级数可以少于非重复值的数目,即使对于边界点少于 200 的列也是如此。The number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points. 因此,即使列包含超过 200 个唯一值,直方图具有的梯级数可能少于 200。Therefore, even if the column has more than 200 unique values, the histogram may have less than 200 steps. 对于仅包含唯一值的列,合并的直方图具有三个梯级(最小梯级数)。For a column consisting of only unique values, then the consolidated histogram will have a minimum of three steps.

备注

如果是使用示例而非全扫描生成直方图,则估计 equal_rows 、range_rows 、distinct_range_rows 和 average_range_rows 的值,因此它们无需为整数。If the histogram has been built using a sample rather than fullscan, then the values of equal_rows, range_rows, and distinct_range_rows and average_range_rows are estimated, and therefore they do not need to be whole integers.

下面的关系图显示包含六个梯级的直方图。The following diagram shows a histogram with six steps. 第一个上限值左侧的区域是第一个梯级。The area to the left of the first upper boundary value is the first step.

对于以上每个直方图步骤:For each histogram step above:

  • 粗线表示上限值 (range_high_key ) 和上限值的出现次数 (equal_rows )Bold line represents the upper boundary value (range_high_key) and the number of times it occurs (equal_rows)

  • range_high_key 左侧的纯色区域表示列值范围和每个列值的平均出现次数 (average_range_rows )。Solid area left of range_high_key represents the range of column values and the average number of times each column value occurs (average_range_rows). 第一个直方图梯级的 average_range_rows 始终是 0。The average_range_rows for the first histogram step is always 0.

  • 点线表示用于估计范围中的非重复值总数 (distinct_range_rows ) 和范围中的总指数 (range_rows )。Dotted lines represent the sampled values used to estimate total number of distinct values in the range (distinct_range_rows) and total number of values in the range (range_rows). 查询优化器使用 range_rows 和 distinct_range_rows 计算 average_range_rows ,且不存储抽样值。The query optimizer uses range_rows and distinct_range_rows to compute average_range_rows and does not store the sampled values.

密度向量Density Vector

密度 是有关给定列或列组合中重复项数目的信息,其计算公式为 1/(非重复值数目)。Density is information about the number of duplicates in a given column or combination of columns and it is calculated as 1/(number of distinct values). 查询优化器使用密度提高根据相同表或索引视图返回多个列的查询的基数估计。The query optimizer uses densities to enhance cardinality estimates for queries that return multiple columns from the same table or indexed view. 密度与值的选择性成反比,密度越小,值的选择性越大。As density decreases, selectivity of a value increases. 例如,在一个代表汽车的表中,很多汽车出自同一制造商,但每辆车都有唯一的车牌号 (VIN)。For example, in a table representing cars, many cars have the same manufacturer, but each car has a unique vehicle identification number (VIN). 因为 VIN 的密度比制造商低,所以 VIN 索引比制造商索引更具选择性。An index on the VIN is more selective than an index on the manufacturer, because VIN has lower density than manufacturer.

备注

频率是有关统计信息对象第一个键列中每个非重复值出现次数的信息,其计算公式为行计数 * 密度。Frequency is information about the occurrence of each distinct value in the first key column of the statistics object, and is calculated as row count * density. 最大频率 1 出现在具有唯一值的列中。A maximum frequency of 1 can be found in columns with unique values.

密度向量针对统计信息对象中的列的每个前缀包含一个密度。The density vector contains one density for each prefix of columns in the statistics object. 例如,如果统计信息对象包含键列 CustomerIdItemIdPrice,则根据以下每个列前缀计算密度。For example, if a statistics object has the key columns CustomerId, ItemId and Price, density is calculated on each of the following column prefixes.

列前缀Column prefix 计算密度所基于的对象Density calculated on
(CustomerId)(CustomerId) 具有与 CustomerId 匹配的值的行Rows with matching values for CustomerId
(CustomerId, ItemId)(CustomerId, ItemId) 具有与 CustomerId 和 ItemId 匹配的值的行Rows with matching values for CustomerId and ItemId
(CustomerId, ItemId, Price)(CustomerId, ItemId, Price) 具有与 CustomerId、ItemId 和 Price 匹配的值的行Rows with matching values for CustomerId, ItemId, and Price

筛选的统计信息Filtered Statistics

筛选统计信息可以提高以下从定义完善的数据子集选择数据的查询的查询性能。Filtered statistics can improve query performance for queries that select from well-defined subsets of data. 筛选统计信息使用筛选器谓词来选择统计信息中包括的数据子集。Filtered statistics use a filter predicate to select the subset of data that is included in the statistics. 与全表统计信息相比,设计完美的筛选统计信息可以改进查询执行计划。Well-designed filtered statistics can improve the query execution plan compared with full-table statistics. 有关筛选器谓词的详细信息,请参阅 CREATE STATISTICS (Transact-SQL)For more information about the filter predicate, see CREATE STATISTICS (Transact-SQL). 有关何时创建筛选的统计信息的详细信息,请参阅本主题中的 何时创建统计信息 部分。For more information about when to create filtered statistics, see the When to Create Statistics section in this topic.

统计信息选项Statistics Options

可以设置三个选项来影响何时以及如何创建和更新统计信息。There are three options that you can set that affect when and how statistics are created and updated. 这些选项仅在数据库级别设置。These options are set at the database level only.

AUTO_CREATE_STATISTICS 选项AUTO_CREATE_STATISTICS Option

在自动创建统计信息选项 AUTO_CREATE_STATISTICS 为 ON 时,查询优化器将根据需要在查询谓词中的单独列上创建统计信息,以便改进查询计划的基数估计。When the automatic create statistics option, AUTO_CREATE_STATISTICS is ON, the Query Optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan. 这些单列统计信息在现有统计信息对象中尚未具有直方图的列上创建。These single-column statistics are created on columns that do not already have a histogram in an existing statistics object. AUTO_CREATE_STATISTICS 选项不确定是否为索引创建了统计信息。The AUTO_CREATE_STATISTICS option does not determine whether statistics get created for indexes. 此选项也不生成筛选统计信息。This option also does not generate filtered statistics. 它严格应用于全表的单列统计信息。It applies strictly to single-column statistics for the full table.

查询优化器通过使用 AUTO_CREATE_STATISTICS 选项创建统计信息时,统计信息名称以 _WA 开头。When the Query Optimizer creates statistics as a result of using the AUTO_CREATE_STATISTICS option, the statistics name starts with _WA. 可以使用下面的查询来确定查询优化器是否为查询谓词列创建了统计信息。You can use the following query to determine if the Query Optimizer has created statistics for a query predicate column.

SELECT OBJECT_NAME(s.object_id) AS object_name,  
    COL_NAME(sc.object_id, sc.column_id) AS column_name,  
    s.name AS statistics_name  
FROM sys.stats AS s 
INNER JOIN sys.stats_columns AS sc  
    ON s.stats_id = sc.stats_id AND s.object_id = sc.object_id  
WHERE s.name like '_WA%'  
ORDER BY s.name;  

AUTO_UPDATE_STATISTICS 选项AUTO_UPDATE_STATISTICS Option

在自动更新统计信息选项 AUTO_UPDATE_STATISTICS 为 ON 时,查询优化器将确定统计信息何时可能过期,然后在查询使用这些统计信息时更新它们。When the automatic update statistics option, AUTO_UPDATE_STATISTICS is ON, the Query Optimizer determines when statistics might be out-of-date and then updates them when they are used by a query. 统计信息将在插入、更新、删除或合并操作更改表或索引视图中的数据分布后过期。Statistics become out-of-date after insert, update, delete, or merge operations change the data distribution in the table or indexed view. 查询优化器通过计算自最后统计信息更新后数据修改的次数并且将这一修改次数与某一阈值进行比较,确定统计信息何时可能过期。The Query Optimizer determines when statistics might be out-of-date by counting the number of data modifications since the last statistics update and comparing the number of modifications to a threshold. 该阈值基于表中或索引视图中的行数。The threshold is based on the number of rows in the table or indexed view.

  • 直到 SQL Server 2014 (12.x)SQL Server 2014 (12.x)SQL ServerSQL Server 基于更改行的百分比使用阈值。Up to SQL Server 2014 (12.x)SQL Server 2014 (12.x), SQL ServerSQL Server uses a threshold based on the percent of rows changed. 这与表中的行数无关。This is regardless of the number of rows in the table. 阈值是:The threshold is:

    • 如果在评估时间统计信息时表基数为 500 或更低,则每达到 500 次修改时更新一次。If the table cardinality was 500 or less at the time statistics were evaluated, update for every 500 modifications.
    • 如果在评估时间统计信息时表基数大于 500,则每达到 500 + 修改次数的百分之二十时更新一次。If the table cardinality was above 500 at the time statistics were evaluated, update for every 500 + 20 percent of modifications.
  • SQL Server 2016 (13.x)SQL Server 2016 (13.x) 开始,如果数据库兼容性级别为 130,SQL ServerSQL Server 将使用递减的动态统计信息更新阈值,此阈值将根据表中的行数进行调整。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x) and under the database compatibility level 130, SQL ServerSQL Server uses a decreasing, dynamic statistics update threshold that adjusts according to the number of rows in the table. 它的计算方式为 1000 与当前的表基数乘积的平方根。This is calculated as the square root of the product of 1000 and the current table cardinality. 例如,如果表中包含 200 万行,则计算为 sqrt (1000 * 2000000) = 44721.359。For example if your table contains 2 million rows, then the calculation is? sqrt (1000 * 2000000) = 44721.359. 进行此更改后,将会更频繁地更新大型表的统计信息。With this change, statistics on large tables will be updated more often. 但是,如果数据库的兼容性级别低于 130,则 SQL Server 2014 (12.x)SQL Server 2014 (12.x) 阈值适用。However, if a database has a compatibility level below 130, then the SQL Server 2014 (12.x)SQL Server 2014 (12.x) threshold applies. ??

重要

SQL Server 2008 R2SQL Server 2008 R2 开始到 SQL Server 2014 (12.x)SQL Server 2014 (12.x),或者在 SQL Server 2016 (13.x)SQL Server 2016 (13.x)SQL Server 2017SQL Server 2017 中,如果数据库兼容性级别低于 130,且使用跟踪标志 2371SQL ServerSQL Server 将使用递减的动态统计信息更新阈值,此阈值将根据表中的行数进行调整。Starting with SQL Server 2008 R2SQL Server 2008 R2 through SQL Server 2014 (12.x)SQL Server 2014 (12.x), or in SQL Server 2016 (13.x)SQL Server 2016 (13.x) through SQL Server 2017SQL Server 2017 under database compatibility level lower than 130, use trace flag 2371 and SQL ServerSQL Server will use a decreasing, dynamic statistics update threshold that adjusts according to the number of rows in the table.

查询优化器在编译查询和执行缓存查询计划前,检查是否存在过期的统计信息。The Query Optimizer checks for out-of-date statistics before compiling a query and before executing a cached query plan. 在编译某一查询前,查询优化器使用查询谓词中的列、表和索引视图确定哪些统计信息可能过期。Before compiling a query, the Query Optimizer uses the columns, tables, and indexed views in the query predicate to determine which statistics might be out-of-date. 在执行缓存查询计划前, 数据库引擎Database Engine 确认该查询计划引用最新的统计信息。Before executing a cached query plan, the 数据库引擎Database Engine verifies that the query plan references up-to-date statistics.

AUTO_UPDATE_STATISTICS 选项适用于为索引创建的统计信息对象、查询谓词中的单列以及使用 CREATE STATISTICS 语句创建的统计信息。The AUTO_UPDATE_STATISTICS option applies to statistics objects created for indexes, single-columns in query predicates, and statistics created with the CREATE STATISTICS statement. 此选项也适用于筛选的统计信息。This option also applies to filtered statistics.

有关控制 AUTO_UPDATE_STATISTICS 的详细信息,请参阅控制 SQL Server 中的 Autostat (AUTO_UPDATE_STATISTICS) 行为For more information about controlling AUTO_UPDATE_STATISTICS, see Controlling Autostat (AUTO_UPDATE_STATISTICS) behavior in SQL Server.

AUTO_UPDATE_STATISTICS_ASYNCAUTO_UPDATE_STATISTICS_ASYNC

异步统计信息更新选项 AUTO_UPDATE_STATISTICS_ASYNC 将确定查询优化器是使用同步统计信息更新还是异步统计信息更新。The asynchronous statistics update option, AUTO_UPDATE_STATISTICS_ASYNC, determines whether the Query Optimizer uses synchronous or asynchronous statistics updates. 默认情况下,异步统计信息更新选项为 OFF 状态,并且查询优化器以同步方式更新统计信息。By default, the asynchronous statistics update option is OFF, and the Query Optimizer updates statistics synchronously. AUTO_UPDATE_STATISTICS_ASYNC 选项适用于为索引创建的统计信息对象、查询谓词中的单列以及使用 CREATE STATISTICS 语句创建的统计信息。The AUTO_UPDATE_STATISTICS_ASYNC option applies to statistics objects created for indexes, single columns in query predicates, and statistics created with the CREATE STATISTICS statement.

备注

若要在 SQL Server Management StudioSQL Server Management Studio 中设置异步统计信息更新选项,需在“数据库属性” 窗口的“选项” 页中同时将“自动更新统计信息” 和“自动异步更新统计信息” 选项设置为“True” 。To set the asynchronous statistics update option in SQL Server Management StudioSQL Server Management Studio, in the Options page of the Database Properties window, both Auto Update Statistics and Auto Update Statistics Asynchronously options need to be set to True.

统计信息更新可以是同步(默认设置)或异步的。Statistics updates can be either synchronous (the default) or asynchronous. 对于同步统计信息更新,查询将始终用最新的统计信息编译和执行;在统计信息过期时,查询优化器将在编译和执行查询前等待更新的统计信息。With synchronous statistics updates, queries always compile and execute with up-to-date statistics; When statistics are out-of-date, the Query Optimizer waits for updated statistics before compiling and executing the query. 对于异步统计信息更新,查询将用现有的统计信息编译,即使现有统计信息已过期。如果在查询编译时统计信息过期,查询优化器可以选择非最优查询计划。With asynchronous statistics updates, queries compile with existing statistics even if the existing statistics are out-of-date; The Query Optimizer could choose a suboptimal query plan if statistics are out-of-date when the query compiles. 在异步更新完成后编译的查询将从使用更新的统计信息中受益。Queries that compile after the asynchronous updates have completed will benefit from using the updated statistics.

执行更改数据分布的操作(例如截断表或对很大百分比的行执行大容量更新)时,考虑使用同步统计信息。Consider using synchronous statistics when you perform operations that change the distribution of data, such as truncating a table or performing a bulk update of a large percentage of the rows. 如果您在完成该操作后未更新统计信息,则使用同步统计信息将确保对更改的数据执行查询前统计信息是最新的。If you do not update the statistics after completing the operation, using synchronous statistics will ensure statistics are up-to-date before executing queries on the changed data.

在以下情况下,考虑使用异步统计信息来实现可预测性更高的查询响应时间:Consider using asynchronous statistics to achieve more predictable query response times for the following scenarios:

  • 您的应用程序频繁执行相同的查询、类似的查询或类似的缓存查询计划。Your application frequently executes the same query, similar queries, or similar cached query plans. 与同步统计信息更新相比,使用异步统计信息更新时你的查询响应时间可能具有更高的可预测性,因为查询优化器可以执行传入的查询而不必等待最新的统计信息。Your query response times might be more predictable with asynchronous statistics updates than with synchronous statistics updates because the Query Optimizer can execute incoming queries without waiting for up-to-date statistics. 这避免延迟某些查询,而不延迟其他查询。This avoids delaying some queries and not others.

  • 您的应用程序遇到了客户端请求超时,这些超时是由于一个或多个查询正在等待更新后的统计信息所导致的。Your application has experienced client request time outs caused by one or more queries waiting for updated statistics. 在某些情况下,等待同步统计信息可能会导致应用程序因过长超时而失败。In some cases, waiting for synchronous statistics could cause applications with aggressive time outs to fail.

INCREMENTALINCREMENTAL

CREATE STATISTICS 的 INCREMENTAL 选项为 ON 时,创建的统计信息为每个分区的统计信息。When INCREMENTAL option of CREATE STATISTICS is ON, the statistics created are per partition statistics. 为 OFF 时,删除统计信息树并且 SQL ServerSQL Server 重新计算统计信息。When OFF, the statistics tree is dropped and SQL ServerSQL Server re-computes the statistics. 默认为 OFF。The default is OFF. 此设置覆盖数据库级别 INCREMENTAL 属性。This setting overrides the database level INCREMENTAL property. 要深入了解如何创建增量统计信息,请参阅 CREATE STATISTICS (Transact-SQL)For more information about creating incremental statistics, see CREATE STATISTICS (Transact-SQL). 要深入了解如何自动创建每个分区的统计信息,请参阅数据库属性(“选项”页)ALTER DATABASE SET 选项 (Transact-SQL)For more information about creating per partition statistics automatically, see Database Properties (Options Page) and ALTER DATABASE SET Options (Transact-SQL).

在将新分区添加到某个大型表时,应更新统计信息以便包括这些新分区。When new partitions are added to a large table, statistics should be updated to include the new partitions. 但是,浏览整个表(FULLSCAN 或 SAMPLE 选项)所需的时间可能会相当长。However the time required to scan the entire table (FULLSCAN or SAMPLE option) might be quite long. 此外,因为可能只需针对新分区的统计信息,所以,扫描整个表不是必需的。Also, scanning the entire table isn't necessary because only the statistics on the new partitions might be needed. 该增量选项将在每个分区的基础上创建和存储统计信息,并且在更新时,只刷新需要新统计信息的那些分区上的统计信息The incremental option creates and stores statistics on a per partition basis, and when updated, only refreshes statistics on those partitions that need new statistics

如果不支持每个分区统计信息,将忽略该选项并生成警告。If per partition statistics are not supported the option is ignored and a warning is generated. 对于以下统计信息类型,不支持增量统计信息:Incremental stats are not supported for following statistics types:

  • 使用未与基表的分区对齐的索引创建的统计信息。Statistics created with indexes that are not partition-aligned with the base table.
  • 对 Always On 可读辅助数据库创建的统计信息。Statistics created on Always On readable secondary databases.
  • 对只读数据库创建的统计信息。Statistics created on read-only databases.
  • 对筛选的索引创建的统计信息。Statistics created on filtered indexes.
  • 对视图创建的统计信息。Statistics created on views.
  • 对内部表创建的统计信息。Statistics created on internal tables.
  • 使用空间索引或 XML 索引创建的统计信息。Statistics created with spatial indexes or XML indexes.

适用范围SQL Server 2014 (12.x)SQL Server 2014 (12.x)SQL Server 2017SQL Server 2017Applies to: SQL Server 2014 (12.x)SQL Server 2014 (12.x) through SQL Server 2017SQL Server 2017.

何时创建统计信息When to create statistics

查询优化器已通过以下方式创建统计信息:The Query Optimizer already creates statistics in the following ways:

  1. 在创建索引时,查询优化器为表或视图上的索引创建统计信息。The Query Optimizer creates statistics for indexes on tables or views when the index is created. 这些统计信息将创建在索引的键列上。These statistics are created on the key columns of the index. 如果索引是一个筛选索引,则查询优化器将在为该筛选索引指定的行的同一子集上创建筛选统计信息。If the index is a filtered index, the Query Optimizer creates filtered statistics on the same subset of rows specified for the filtered index. 有关筛选索引的详细信息,请参阅创建筛选索引CREATE INDEX (Transact-SQL)For more information about filtered indexes, see Create Filtered Indexes and CREATE INDEX (Transact-SQL).

  2. AUTO_CREATE_STATISTICS 为 ON 时,查询优化器为查询谓词中的单列创建统计信息。The Query Optimizer creates statistics for single columns in query predicates when AUTO_CREATE_STATISTICS is on.

期待您的反馈 :如果在本文中发现过时或不正确的内容(如步骤或代码示例),请告诉我们。We are listening: If you find something outdated or incorrect in this article, such as a step or a code example, please tell us. 可以单击此页底部的“反馈” 部分中的“本页” 按钮。You can click the This page button in the Feedback section at the bottom of this page. 我们通常在第二天阅读有关 SQL 的每项反馈。We read every item of feedback about SQL, typically the next day. 谢谢。Thanks.

对于大多数查询,用于创建统计信息的这两个方法就可以确保高质量的查询计划;但在很少的情况下,可以通过使用 CREATE STATISTICS 语句创建附加的统计信息来改进查询计划。For most queries, these two methods for creating statistics ensure a high-quality query plan; in a few cases, you can improve query plans by creating additional statistics with the CREATE STATISTICS statement. 这些附加的统计信息可以捕获查询优化器在为索引或单列创建统计信息时并未考虑的统计关联。These additional statistics can capture statistical correlations that the Query Optimizer does not account for when it creates statistics for indexes or single columns. 应用程序可能在表数据中具有附加的统计关联,如果在统计信息对象中计入这些关联,可能会令查询优化器改进查询计划。Your application might have additional statistical correlations in the table data that, if calculated into a statistics object, could enable the Query Optimizer to improve query plans. 例如,针对数据行子集的筛选统计信息或针对查询谓词列的多列统计信息可改进查询计划。For example, filtered statistics on a subset of data rows or multicolumn statistics on query predicate columns might improve the query plan.

在使用 CREATE STATISTICS 语句创建统计信息时,我们建议保持 AUTO_CREATE_STATISTICS 选项为 ON,以便查询优化器继续为查询谓词列定期创建单列统计信息。When creating statistics with the CREATE STATISTICS statement, we recommend keeping the AUTO_CREATE_STATISTICS option on so that the Query Optimizer continues to routinely create single-column statistics for query predicate columns. 有关查询谓词的详细信息,请参阅搜索条件 (Transact-SQL)For more information about query predicates, see Search Condition (Transact-SQL).

在以下任何情况适用时,考虑使用 CREATE STATISTICS 语句创建统计信息:Consider creating statistics with the CREATE STATISTICS statement when any of the following applies:

  • 数据库引擎Database Engine 优化顾问建议创建统计信息。The 数据库引擎Database Engine Tuning Advisor suggests creating statistics.
  • 查询谓词包含尚不位于相同索引中的多个相关列。The query predicate contains multiple correlated columns that are not already in the same index.
  • 查询从数据的子集中选择数据。The query selects from a subset of data.
  • 查询缺少统计信息。The query has missing statistics.

查询谓词包含多个相关列Query Predicate contains multiple correlated columns

在某一查询谓词包含具有跨列关系和相关性的多列时,针对多列的统计信息可能会改进查询计划。When a query predicate contains multiple columns that have cross-column relationships and dependencies, statistics on the multiple columns might improve the query plan. 针对多列的统计信息包含称作密度 的跨列相关统计信息,这些统计信息不可用于单列统计信息。Statistics on multiple columns contain cross-column correlation statistics, called densities, that are not available in single-column statistics. 在查询结果依赖于多列之间的数据关系时,密度可以改进基数估计。Densities can improve cardinality estimates when query results depend on data relationships among multiple columns.

如果多个列已处于同一索引中,则多列统计信息对象已存在并且不必手动创建它。If the columns are already in the same index, the multicolumn statistics object already exists and it is not necessary to create it manually. 如果这些列尚未处于同一索引中,则可以通过对多个列创建索引或通过使用 CREATE STATISTICS 语句,创建多列统计信息。If the columns are not already in the same index, you can create multicolumn statistics by creating an index on the columns or by using the CREATE STATISTICS statement. 与统计信息对象相比,它要求更多的系统资源来维护索引。It requires more system resources to maintain an index than a statistics object. 如果应用程序不要求多列索引,您可以通过创建统计信息对象但不创建索引,有效地利用系统资源。If the application does not require the multicolumn index, you can economize on system resources by creating the statistics object without creating the index.

在创建多列统计信息时,统计信息对象定义中列的顺序将影响生成基数估计的密度的效率。When creating multicolumn statistics, the order of the columns in the statistics object definition affects the effectiveness of densities for making cardinality estimates. 统计信息对象在统计信息对象定义中存储键列的每个前缀的密度。The statistics object stores densities for each prefix of key columns in the statistics object definition. 有关密度的详细信息,请参阅此页中的密度部分。For more information about densities, see Density section in this page.

为了创建用于基数估计的密度,查询谓词中的列必须匹配统计信息对象定义中列的前缀之一。To create densities that are useful for cardinality estimates, the columns in the query predicate must match one of the prefixes of columns in the statistics object definition. 例如,以下内容在列 LastNameMiddleNameFirstName上创建多列统计信息对象。For example, the following creates a multicolumn statistics object on the columns LastName, MiddleName, and FirstName.

USE AdventureWorks2012;  
GO  
IF EXISTS (SELECT name FROM sys.stats  
    WHERE name = 'LastFirst'  
    AND object_ID = OBJECT_ID ('Person.Person'))  
DROP STATISTICS Person.Person.LastFirst;  
GO  
CREATE STATISTICS LastFirst ON Person.Person (LastName, MiddleName, FirstName);  
GO  

在此示例中,统计信息对象 LastFirst 具有以下列前缀的密度:(LastName)(LastName, MiddleName)(LastName, MiddleName, FirstName)In this example, the statistics object LastFirst has densities for the following column prefixes: (LastName), (LastName, MiddleName), and (LastName, MiddleName, FirstName). 密度不可用于 (LastName, FirstName)The density is not available for (LastName, FirstName). 如果查询使用 LastNameFirstName ,并且没有使用 MiddleName,则密度不可用于基数估计。If the query uses LastName and FirstName without using MiddleName, the density is not available for cardinality estimates.

查询从数据的子集中选择数据Query Selects from a subset of data

在查询优化器为单个列和索引创建统计信息时,它为所有行中的值创建统计信息。When the Query Optimizer creates statistics for single columns and indexes, it creates the statistics for the values in all rows. 在查询从行的某一子集中选择数据时,这一行子集具有唯一的数据分布,筛选统计信息可以改进查询计划。When queries select from a subset of rows, and that subset of rows has a unique data distribution, filtered statistics can improve query plans. 可以通过使用 CREATE STATISTICS 语句并在此语句中用 WHERE 子句定义筛选器谓词表达式来创建筛选统计信息。You can create filtered statistics by using the CREATE STATISTICS statement with the WHERE clause to define the filter predicate expression.

例如,使用 AdventureWorks2012AdventureWorks2012 时,Production.Product 表中的每种产品都属于 Production.ProductCategory 表中的以下四个类别之一:自行车、组件、服装或附件。For example, using AdventureWorks2012AdventureWorks2012, each product in the Production.Product table belongs to one of four categories in the Production.ProductCategory table: Bikes, Components, Clothing, and Accessories. 上述每个类别在重量方面的数据分布均不同:自行车的重量范围为 13.77 到 30.0,部件的重量范围为 2.12 到 1050.00 且有些部件的重量为 NULL 值,服装的重量全部为 NULL,附件的重量也为 NULL。Each of the categories has a different data distribution for weight: bike weights range from 13.77 to 30.0, component weights range from 2.12 to 1050.00 with some NULL values, clothing weights are all NULL, and accessory weights are also NULL.

使用自行车为例,针对所有自行车重量的筛选统计信息将向查询优化器提供更精确的统计信息,并且与全表统计信息或者针对 Weight 列的不存在的统计信息相比,可以改进查询计划质量。Using Bikes as an example, filtered statistics on all bike weights will provide more accurate statistics to the Query Optimizer and can improve the query plan quality compared with full-table statistics or nonexistent statistics on the Weight column. 该自行车重量列很适合于筛选统计信息,但如果重量查找的数目相对较少,则不见得适合于筛选索引。The bike weight column is a good candidate for filtered statistics but not necessarily a good candidate for a filtered index if the number of weight lookups is relatively small. 筛选索引所提供的在查找方面的性能提升可能抵不上将筛选索引添加到数据库所导致的额外维护和存储成本。The performance gain for lookups that a filtered index provides might not outweigh the additional maintenance and storage cost for adding a filtered index to the database.

以下语句对自行车的所有子类别创建 BikeWeights 筛选统计信息。The following statement creates the BikeWeights filtered statistics on all of the subcategories for Bikes. 筛选的谓词表达式通过使用比较 Production.ProductSubcategoryID IN (1,2,3)枚举所有自行车子类别,对自行车进行定义。The filtered predicate expression defines bikes by enumerating all of the bike subcategories with the comparison Production.ProductSubcategoryID IN (1,2,3). 该谓词无法使用“自行车”类别名称,因为它存储于 Production.ProductCategory 表中,并且筛选表达式中的所有列都必须位于相同的表中。The predicate cannot use the Bikes category name because it is stored in the Production.ProductCategory table, and all columns in the filter expression must be in the same table.

USE AdventureWorks2012;
GO
IF EXISTS ( SELECT name FROM sys.stats
    WHERE name = 'BikeWeights'
    AND object_ID = OBJECT_ID ('Production.Product'))
DROP STATISTICS Production.Product.BikeWeights;
GO
CREATE STATISTICS BikeWeights
    ON Production.Product (Weight)
WHERE ProductSubcategoryID IN (1,2,3);
GO

查询优化器可使用 BikeWeights 筛选的统计信息来改进下面这个查询的查询计划,此查询选择重量超过 25 的所有自行车。The Query Optimizer can use the BikeWeights filtered statistics to improve the query plan for the following query that selects all of the bikes that weigh more than 25.

SELECT P.Weight AS Weight, S.Name AS BikeName  
FROM Production.Product AS P  
    JOIN Production.ProductSubcategory AS S   
    ON P.ProductSubcategoryID = S.ProductSubcategoryID  
WHERE P.ProductSubcategoryID IN (1,2,3) AND P.Weight > 25  
ORDER BY P.Weight;  
GO  

查询识别缺少的统计信息Query identifies missing statistics

如果错误或其他事件阻止查询优化器创建统计信息,则查询优化器将不使用统计信息创建查询计划。If an error or other event prevents the Query Optimizer from creating statistics, the Query Optimizer creates the query plan without using statistics. 查询优化器将统计信息标记为缺失,并且在下次执行查询时尝试重新生成统计信息。The Query Optimizer marks the statistics as missing and attempts to regenerate the statistics the next time the query is executed.

在使用 SQL Server Management StudioSQL Server Management Studio以图形方式显示查询的执行计划时,缺少的统计信息将予以警告显示(表名称以红色文本显示)。Missing statistics are indicated as warnings (table name in red text) when the execution plan of a query is graphically displayed using SQL Server Management StudioSQL Server Management Studio. 另外,使用 监视 Missing Column Statistics SQL Server ProfilerSQL Server Profiler 事件类可以指明何时缺少统计信息。Additionally, monitoring the Missing Column Statistics event class by using SQL Server ProfilerSQL Server Profiler indicates when statistics are missing. 有关详细信息,请参阅 Errors and Warnings 事件类别(数据库引擎)For more information, see Errors and Warnings Event Category (Database Engine).

如果缺少统计信息,则执行以下步骤:If statistics are missing, perform the following steps:

当有关只读数据库或只读快照的统计信息丢失或变得陈旧时, 数据库引擎Database Engine 将创建临时统计信息并在 tempdb中进行维护。When statistics on a read-only database or read-only snapshot are missing or stale, the 数据库引擎Database Engine creates and maintains temporary statistics in tempdb. 数据库引擎Database Engine 创建临时统计信息时,将在统计信息名称后追加后缀 _readonly_database_statistic ,以便将临时统计信息与永久统计信息加以区分。When the 数据库引擎Database Engine creates temporary statistics, the statistics name is appended with the suffix _readonly_database_statistic to differentiate the temporary statistics from the permanent statistics. 后缀 _readonly_database_statistic 是为 SQL ServerSQL Server 生成的统计信息预留的。The suffix _readonly_database_statistic is reserved for statistics generated by SQL ServerSQL Server. 可以在读写数据库上创建和重新生成临时统计信息的脚本。Scripts for the temporary statistics can be created and reproduced on a read-write database. 编写脚本时,Management StudioManagement Studio 将统计信息名称的后缀从 _readonly_database_statistic 更改为 _readonly_database_statistic_scripted 。When scripted, Management StudioManagement Studio changes the suffix of the statistics name from _readonly_database_statistic to _readonly_database_statistic_scripted.

只有 SQL ServerSQL Server 可以创建和更新临时统计信息。Only SQL ServerSQL Server can create and update temporary statistics. 但是,您可以使用用于永久统计信息的相同工具来删除临时统计信息和监视统计信息属性:However, you can delete temporary statistics and monitor statistics properties using the same tools that you use for permanent statistics:

  • 使用 DROP STATISTICS 语句删除临时统计信息。Delete temporary statistics using the DROP STATISTICS statement.
  • 使用 sys.statssys.stats_columns 目录视图监视统计信息 。Monitor statistics using the sys.stats and sys.stats_columns catalog views. sys_stats 包含 is_temporary 列,用于指示哪些统计信息是永久的,哪些统计信息是临时的。sys_stats includes the is_temporary column, to indicate which statistics are permanent and which are temporary.

因为临时统计信息存储于 tempdb中,所以重新启动 SQL ServerSQL Server 服务将导致所有临时统计信息消失。Because temporary statistics are stored in tempdb, a restart of the SQL ServerSQL Server service causes all temporary statistics to disappear.

何时更新统计信息When to update statistics

查询优化器确定统计信息何时可能过期,然后在查询计划需要统计信息时更新它们。The Query Optimizer determines when statistics might be out-of-date and then updates them when they are needed for a query plan. 在某些情况下,将 AUTO_UPDATE_STATISTICS 设置为 ON 时,可以通过更频繁地更新统计信息来优化查询计划,并因此提高查询性能。In some cases you can improve the query plan and therefore improve query performance by updating statistics more frequently than occur when AUTO_UPDATE_STATISTICS is on. 可以使用 UPDATE STATISTICS 语句或存储过程 sp_updatestats 来更新统计信息。You can update statistics with the UPDATE STATISTICS statement or the stored procedure sp_updatestats.

更新统计信息可确保查询使用最新的统计信息进行编译。Updating statistics ensures that queries compile with up-to-date statistics. 不过,更新统计信息会导致查询重新编译。However, updating statistics causes queries to recompile. 我们建议不要太频繁地更新统计信息,因为需要在改进查询计划和重新编译查询所用时间之间权衡性能。We recommend not updating statistics too frequently because there is a performance tradeoff between improving query plans and the time it takes to recompile queries. 具体的折衷方案取决于你的应用程序。The specific tradeoffs depend on your application.

在使用 UPDATE STATISTICS 或 sp_updatestats 更新统计信息时,我们建议保持将 AUTO_UPDATE_STATISTICS 设置为 ON,以便查询优化器继续定期更新统计信息。When updating statistics with UPDATE STATISTICS or sp_updatestats, we recommend keeping AUTO_UPDATE_STATISTICS set to ON so that the Query Optimizer continues to routinely update statistics. 有关如何更新列、索引、表或索引视图的统计信息的详细信息,请参阅 UPDATE STATISTICS (Transact-SQL)For more information about how to update statistics on a column, an index, a table, or an indexed view, see UPDATE STATISTICS (Transact-SQL). 有关如何为数据库中的所有用户定义表和内部表更新统计信息的信息,请参阅存储过程 sp_updatestats (Transact-SQL)For information about how to update statistics for all user-defined and internal tables in the database, see the stored procedure sp_updatestats (Transact-SQL).

若要确定最近一次更新统计信息的时间,请使用 sys.dm_db_stats_propertiesSTATS_DATE 函数。To determine when statistics were last updated, use the sys.dm_db_stats_properties or STATS_DATE functions.

在以下情况下考虑更新统计信息:Consider updating statistics for the following conditions:

  • 查询执行时间很长。Query execution times are slow.
  • 在升序或降序键列上发生插入操作。Insert operations occur on ascending or descending key columns.
  • 在维护操作后。After maintenance operations.

查询执行时间很长Query execution times are slow

如果查询响应时间很长或不可预知,则在执行其他故障排除步骤前,确保查询具有最新的统计信息。If query response times are slow or unpredictable, ensure that queries have up-to-date statistics before performing additional troubleshooting steps.

在升序或降序键列上发生插入操作Insert operations occur on ascending or descending key columns

与查询优化器执行的统计信息更新相比,升序或降序键列(例如 IDENTITY 或实时时间戳列)上的统计信息可能要求更频繁地更新。Statistics on ascending or descending key columns, such as IDENTITY or real-time timestamp columns, might require more frequent statistics updates than the Query Optimizer performs. 插入操作将新值追加到升序或降序键列上。Insert operations append new values to ascending or descending columns. 添加的行的数目可能过小,以致无法触发统计信息更新。The number of rows added might be too small to trigger a statistics update. 如果统计信息不是最新的并且查询从最频繁添加的行中选择数据,则当前统计信息将没有这些新值的基数估计。If statistics are not up-to-date and queries select from the most recently added rows, the current statistics will not have cardinality estimates for these new values. 这可能导致不精确的基数估计和查询性能低下。This can result in inaccurate cardinality estimates and slow query performance.

例如,如果统计信息未更新以包括最近销售订单日期的基数估计,则从最近销售订单日期选择的查询将具有不精确的基数估计。For example, a query that selects from the most recent sales order dates will have inaccurate cardinality estimates if the statistics are not updated to include cardinality estimates for the most recent sales order dates.

在维护操作后After maintenance operations

考虑在执行维护过程(例如截断表或对很大百分比的行执行大容量插入)后更新统计信息。Consider updating statistics after performing maintenance procedures that change the distribution of data, such as truncating a table or performing a bulk insert of a large percentage of the rows. 这可以避免在将来查询等待自动统计信息更新时在查询处理中出现延迟。This can avoid future delays in query processing while queries wait for automatic statistics updates.

索引的重新生成、碎片整理或重新组织之类的操作不会更改数据的分布。Operations such as rebuilding, defragmenting, or reorganizing an index do not change the distribution of data. 因此,在执行 ALTER INDEX REBUILDDBCC REINDEXDBCC INDEXDEFRAGALTER INDEX REORGANIZE 操作后,无需更新统计信息。Therefore, you do not need to update statistics after performing ALTER INDEX REBUILD, DBCC DBREINDEX, DBCC INDEXDEFRAG, or ALTER INDEX REORGANIZE operations. 查询优化器将在使用 ALTER INDEX REBUILD 或 DBCC DBREINDEX 对表或视图重新生成索引时更新统计信息,但是,此统计信息更新是重新创建索引的副产品。The Query Optimizer updates statistics when you rebuild an index on a table or view with ALTER INDEX REBUILD or DBCC DBREINDEX, however this statistics update is a byproduct of re-creating the index. 在 DBCC INDEXDEFRAG 或 ALTER INDEX REORGANIZE 操作后,查询优化器并不更新统计信息。The Query Optimizer does not update statistics after DBCC INDEXDEFRAG or ALTER INDEX REORGANIZE operations.

提示

SQL Server 2016 (13.x)SQL Server 2016 (13.x) SP1 CU4 开始,使用 CREATE STATISTICS (Transact-SQL)UPDATE STATISTICS (Transact-SQL) 的 PERSIST_SAMPLE_PERCENT 选项设置和保留未显式指定采样百分比的后续统计信息更新的特定采样百分比。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x) SP1 CU4, use the PERSIST_SAMPLE_PERCENT option of CREATE STATISTICS (Transact-SQL) or UPDATE STATISTICS (Transact-SQL), to set and retain a specific sampling percentage for subsequent statistic updates that do not explicitly specify a sampling percentage.

自动索引和统计信息管理Automatic index and statistics management

利用自适应索引碎片整理等解决方案,自动管理一个或多个数据库的索引碎片整理和统计信息更新。Leverage solutions such as Adaptive Index Defrag to automatically manage index defragmentation and statistics updates for one or more databases. 此过程根据碎片级别以及其他参数,自动选择是重新生成索引还是重新组织索引,并使用线性阈值更新统计信息。This procedure automatically chooses whether to rebuild or reorganize an index according to its fragmentation level, amongst other parameters, and update statistics with a linear threshold.

高效使用统计信息的查询Queries that use statistics effectively

某些查询实现(如查询谓词中的局部变量和复杂的表达式)可能导致查询计划不是最佳的。Certain query implementations, such as local variables and complex expressions in the query predicate, can lead to suboptimal query plans. 遵循有关高效使用统计信息的查询设计指导原则可以避免这种情况。Following query design guidelines for using statistics effectively can help to avoid this. 有关查询谓词的详细信息,请参阅搜索条件 (Transact-SQL)For more information about query predicates, see Search Condition (Transact-SQL).

您可以通过应用查询设计指导原则来改进查询计划,这些查询设计指导原则高效地使用统计信息,以便改进在查询谓词中使用的表达式、变量和函数的 基数估计You can improve query plans by applying query design guidelines that use statistics effectively to improve cardinality estimates for expressions, variables, and functions used in query predicates. 在查询优化器不知道表达式、变量或函数的值时,它并不知道在直方图中要查找的值,因此无法从直方图检索最佳的基数估计。When the Query Optimizer does not know the value of an expression, variable, or function, it does not know which value to lookup in the histogram and therefore cannot retrieve the best cardinality estimate from the histogram. 查询优化器而是为直方图中所有取样行,在每个不同值的平均行数的基础上执行基数估计。Instead, the Query Optimizer bases the cardinality estimate on the average number of rows per distinct value for all of the sampled rows in the histogram. 这将导致不是最佳的基数估计并且可能影响查询性能。This leads to suboptimal cardinality estimates and can hurt query performance. 有关直方图的详细信息,请参阅本页的直方图部分或 sys.dm_db_stats_histogramFor more information about histograms, see histogram section in this page or sys.dm_db_stats_histogram.

下面的指导原则描述如何编写查询以便通过改进基数估计来改进查询计划。The following guidelines describe how to write queries to improve query plans by improving cardinality estimates.

改进表达式的基数估计Improving cardinality estimates for expressions

要改进表达式的基数估计,请遵循以下指导原则:To improve cardinality estimates for expressions, follow these guidelines:

  • 只要可能,应简化其中含常量的表达式。Whenever possible, simplify expressions with constants in them. 查询优化器在确定基数估计前并不对包含常量的所有函数和表达式进行求值。The Query Optimizer does not evaluate all functions and expressions containing constants prior to determining cardinality estimates. 例如,将表达式 ABS(-100) 简化为 100For example, simplify the expression ABS(-100) to 100.

  • 如果表达式使用多个变量,则考虑为表达式创建一个计算列,然后对该计算列创建统计信息或索引。If the expression uses multiple variables, consider creating a computed column for the expression and then create statistics or an index on the computed column. 例如,如果您为表达式 WHERE PRICE + Tax > 100 创建计算列,则查询谓词 Price + Tax可能会具有更好的基数估计。For example, the query predicate WHERE PRICE + Tax > 100 might have a better cardinality estimate if you create a computed column for the expression Price + Tax.

改进变量和函数的基数估计Improving cardinality estimates for variables and functions

要改进变量和函数的基数估计,请遵循以下指导原则:To improve the cardinality estimates for variables and functions, follow these guidelines:

  • 如果查询谓词使用局部变量,则考虑重新编写查询以使用参数,而非局部变量。If the query predicate uses a local variable, consider rewriting the query to use a parameter instead of a local variable. 在查询优化器创建查询执行计划时,局部变量的值未知。The value of a local variable is not known when the Query Optimizer creates the query execution plan. 在查询使用某一参数时,查询优化器将基数估计用于传递到存储过程的第一个实际参数值。When a query uses a parameter, the Query Optimizer uses the cardinality estimate for the first actual parameter value that is passed to the stored procedure.

  • 考虑使用标准表或临时表来保存多语句表值函数 (mstvf) 的结果。Consider using a standard table or temporary table to hold the results of multi-statement table-valued functions (mstvf). 查询优化器并不为多语句表值函数创建统计信息。The Query Optimizer does not create statistics for multi-statement table-valued functions. 使用此方法,查询优化器可对表列创建统计信息并使用它们创建更好的查询计划。With this approach the Query Optimizer can create statistics on the table columns and use them to create a better query plan.

  • 考虑使用标准表或临时表来代替表变量。Consider using a standard table or temporary table as a replacement for table variables. 查询优化器不会为表变量创建统计信息。The Query Optimizer does not create statistics for table variables. 使用此方法,查询优化器可对表列创建统计信息并使用它们创建更好的查询计划。With this approach the Query Optimizer can create statistics on the table columns and use them to create a better query plan. 在确定是使用临时表还是表变量时需要进行一些权衡,与临时表相比,在存储过程中使用的表变量会导致更少的存储过程的重新编译。There are tradeoffs in determining whether to use a temporary table or a table variable; Table variables used in stored procedures cause fewer recompilations of the stored procedure than temporary tables. 根据应用程序,使用临时表来代替表变量可能不会改进性能。Depending on the application, using a temporary table instead of a table variable might not improve performance.

  • 如果某一存储过程包含使用某一传入的参数的查询,则在查询中使用该参数值之前,应避免在该存储过程内更改该参数值。If a stored procedure contains a query that uses a passed-in parameter, avoid changing the parameter value within the stored procedure before using it in the query. 查询的基数估计基于传入的参数值,而非更新的值。The cardinality estimates for the query are based on the passed-in parameter value and not the updated value. 为了避免更改参数值,您可以重新编写查询以使用两个存储过程。To avoid changing the parameter value, you can rewrite the query to use two stored procedures.

    例如,以下存储过程 Sales.GetRecentSales 将在 @date 为 NULL 时更改参数 @date 的值。For example, the following stored procedure Sales.GetRecentSales changes the value of the parameter @date when @date is NULL.

    USE AdventureWorks2012;  
    GO  
    IF OBJECT_ID ( 'Sales.GetRecentSales', 'P') IS NOT NULL  
        DROP PROCEDURE Sales.GetRecentSales;  
    GO  
    CREATE PROCEDURE Sales.GetRecentSales (@date datetime)  
    AS BEGIN  
        IF @date IS NULL  
            SET @date = DATEADD(MONTH, -3, (SELECT MAX(ORDERDATE) FROM Sales.SalesOrderHeader))  
        SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d  
        WHERE h.SalesOrderID = d.SalesOrderID  
        AND h.OrderDate > @date  
    END  
    GO  
    

    如果对存储过程 Sales.GetRecentSales 的首次调用为 @date 参数传递了 NULL,则查询优化器将使用针对 @date = NULL 的基数估计编译存储过程,即使查询谓词不是使用 @date = NULL调用的。If the first call to the stored procedure Sales.GetRecentSales passes a NULL for the @date parameter, the Query Optimizer will compile the stored procedure with the cardinality estimate for @date = NULL even though the query predicate is not called with @date = NULL. 此基数估计可能与实际查询结果中的行数差别很大。This cardinality estimate might be significantly different than the number of rows in the actual query result. 因此,查询优化器可能会选择非最佳查询计划。As a result, the Query Optimizer might choose a suboptimal query plan. 若要避免此情况发生,您可以按如下所示将存储过程重新编写成两个过程:To help avoid this, you can rewrite the stored procedure into two procedures as follows:

    USE AdventureWorks2012;  
    GO  
    IF OBJECT_ID ( 'Sales.GetNullRecentSales', 'P') IS NOT NULL  
        DROP PROCEDURE Sales.GetNullRecentSales;  
    GO  
    CREATE PROCEDURE Sales.GetNullRecentSales (@date datetime)  
    AS BEGIN  
        IF @date is NULL  
            SET @date = DATEADD(MONTH, -3, (SELECT MAX(ORDERDATE) FROM Sales.SalesOrderHeader))  
        EXEC Sales.GetNonNullRecentSales @date;  
    END  
    GO  
    IF OBJECT_ID ( 'Sales.GetNonNullRecentSales', 'P') IS NOT NULL  
        DROP PROCEDURE Sales.GetNonNullRecentSales;  
    GO  
    CREATE PROCEDURE Sales.GetNonNullRecentSales (@date datetime)  
    AS BEGIN  
        SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d  
        WHERE h.SalesOrderID = d.SalesOrderID  
        AND h.OrderDate > @date  
    END  
    GO  
    

使用查询提示改进基数估计Improving cardinality estimates with query hints

为了改进局部变量的基数估计,可以将 OPTIMIZE FOR <value>OPTIMIZE FOR UNKNOWN 查询提示与 RECOMPILE 一起使用。To improve cardinality estimates for local variables, you can use the OPTIMIZE FOR <value> or OPTIMIZE FOR UNKNOWN query hints with RECOMPILE. 有关详细信息,请参阅查询提示 (Transact-SQL)For more information, see Query Hints (Transact-SQL).

对于某些应用程序,每次执行查询时都重新编译查询可能会占用过多时间。For some applications, recompiling the query each time it executes might take too much time. OPTIMIZE FOR 查询提示可对此给予帮助,即使不使用 RECOMPILE 选项。The OPTIMIZE FOR query hint can help even if you don't use the RECOMPILE option. 例如,可以将 OPTIMIZE FOR 选项添加到存储过程 Sales.GetRecentSales,以便指定一个特定的日期。For example, you could add an OPTIMIZE FOR option to the stored procedure Sales.GetRecentSales to specify a specific date. 以下示例将 OPTIMIZE FOR 选项添加到 Sales.GetRecentSales 过程。The following example adds the OPTIMIZE FOR option to the Sales.GetRecentSales procedure.

USE AdventureWorks2012;  
GO  
IF OBJECT_ID ( 'Sales.GetRecentSales', 'P') IS NOT NULL  
    DROP PROCEDURE Sales.GetRecentSales;  
GO  
CREATE PROCEDURE Sales.GetRecentSales (@date datetime)  
AS BEGIN  
    IF @date is NULL  
        SET @date = DATEADD(MONTH, -3, (SELECT MAX(ORDERDATE) FROM Sales.SalesOrderHeader))  
    SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d  
    WHERE h.SalesOrderID = d.SalesOrderID  
    AND h.OrderDate > @date  
    OPTION ( OPTIMIZE FOR ( @date = '2004-05-01 00:00:00.000'))  
END;  
GO  

使用计划指南改进基数估计Improving cardinality estimates with Plan Guides

对于某些应用程序,查询计划指南可能不适用,因为您无法更改查询,或者使用 RECOMPILE 查询提示可能导致过多的重新编译。For some applications, query design guidelines might not apply because you cannot change the query or using the RECOMPILE query hint might be cause too many recompiles. 您可以使用计划指南来指定 USE PLAN 之类的其他提示,以便在向应用程序供应商调查应用程序变化的同时,控制查询的行为。You can use plan guides to specify other hints, such as USE PLAN, to control the behavior of the query while investigating application changes with the application vendor. 有关计划指南的详细信息,请参阅 Plan GuidesFor more information about plan guides, see Plan Guides.

另请参阅See Also

CREATE STATISTICS (Transact-SQL) CREATE STATISTICS (Transact-SQL)
UPDATE STATISTICS (Transact-SQL) UPDATE STATISTICS (Transact-SQL)
sp_updatestats (Transact-SQL) sp_updatestats (Transact-SQL)
DBCC SHOW_STATISTICS (Transact-SQL) DBCC SHOW_STATISTICS (Transact-SQL)
ALTER DATABASE SET 选项 (Transact-SQL) ALTER DATABASE SET Options (Transact-SQL)
DROP STATISTICS (Transact-SQL) DROP STATISTICS (Transact-SQL)
CREATE INDEX (Transact-SQL) CREATE INDEX (Transact-SQL)
ALTER INDEX (Transact-SQL) ALTER INDEX (Transact-SQL)
创建筛选索引 Create Filtered Indexes
控制 SQL Server 中的 Autostat (AUTO_UPDATE_STATISTICS) 行为 Controlling Autostat (AUTO_UPDATE_STATISTICS) behavior in SQL Server
STATS_DATE (Transact-SQL) STATS_DATE (Transact-SQL)
sys.dm_db_stats_properties (Transact-SQL) sys.dm_db_stats_properties (Transact-SQL)
sys.dm_db_stats_histogram (Transact-SQL)sys.dm_db_stats_histogram (Transact-SQL)
sys.statssys.stats
sys.stats_columns (Transact-SQL) sys.stats_columns (Transact-SQL)
自适应索引碎片整理Adaptive Index Defrag