# 统计信息Statistics

## 组件和概念Components and Concepts

### 统计信息Statistics

#### 直方图Histogram

SQL ServerSQL Server 中的直方图仅为单个列生成 - 统计信息对象键列集的第一列。Histograms in SQL ServerSQL Server are only built for a single column-the first column in the set of key columns of the statistics object.

• 直方图初始化：在第一步中，处理始于排序集开始处的一个值序列，并收集 range_high_key、equal_rows、range_rows 和 distinct_range_rows 的最多 200 个值（在此步骤中，range_rows 和 distinct_range_rows 始终为零） 。Histogram initialization: In the first step, a sequence of values starting at the beginning of the sorted set is processed, and up to 200 values of range_high_key, equal_rows, range_rows, and distinct_range_rows are collected (range_rows and distinct_range_rows are always zero during this step). 已用尽所有输入或已找到 200 个值时，结束第一步。The first step ends either when all input has been exhausted, or when 200 values have been found.
• 使用 Bucket 合并进行扫描：第二步中，按排序顺序处理从统计信息键前导列算起的每个额外值；将每个相继值添加到最后一个范围或在末尾创建一个新范围（这可能是因输入值已排序所致）。Scan with bucket merge: Each additional value from the leading column of the statistics key is processed in the second step, in sorted order; each successive value is either added to the last range or a new range at the end is created (this is possible because the input values are sorted). 如果创建了一个新范围，则一对现有相邻范围折叠为单个范围。If a new range is created, then one pair of existing, neighboring ranges is collapsed into a single range. 选择这对范围以最大限度减少信息丢失。This pair of ranges is selected in order to minimize information loss. 此方法使用最大差异 算法使直方图中的梯级数减至最少，并同时最大化边界值之间的差异。This method uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values. 折叠后，梯级数在整个步骤中保持为 200。The number of steps after collapsing ranges stays at 200 throughout this step.
• 直方图合并：第三步中，如果未丢失大量信息，可能折叠更多范围。Histogram consolidation: In the third step, more ranges may be collapsed if a significant amount of information is not lost. 直方图梯级数可以少于非重复值的数目，即使对于边界点少于 200 的列也是如此。The number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points. 因此，即使列包含超过 200 个唯一值，直方图具有的梯级数可能少于 200。Therefore, even if the column has more than 200 unique values, the histogram may have less than 200 steps. 对于仅包含唯一值的列，合并的直方图具有三个梯级（最小梯级数）。For a column consisting of only unique values, then the consolidated histogram will have a minimum of three steps.

• 粗线表示上限值 (range_high_key ) 和上限值的出现次数 (equal_rows )Bold line represents the upper boundary value (range_high_key) and the number of times it occurs (equal_rows)

• range_high_key 左侧的纯色区域表示列值范围和每个列值的平均出现次数 (average_range_rows )。Solid area left of range_high_key represents the range of column values and the average number of times each column value occurs (average_range_rows). 第一个直方图梯级的 average_range_rows 始终是 0。The average_range_rows for the first histogram step is always 0.

• 点线表示用于估计范围中的非重复值总数 (distinct_range_rows ) 和范围中的总指数 (range_rows )。Dotted lines represent the sampled values used to estimate total number of distinct values in the range (distinct_range_rows) and total number of values in the range (range_rows). 查询优化器使用 range_rows 和 distinct_range_rows 计算 average_range_rows ，且不存储抽样值。The query optimizer uses range_rows and distinct_range_rows to compute average_range_rows and does not store the sampled values.

#### 密度向量Density Vector

(CustomerId)(CustomerId) 具有与 CustomerId 匹配的值的行Rows with matching values for CustomerId
(CustomerId, ItemId)(CustomerId, ItemId) 具有与 CustomerId 和 ItemId 匹配的值的行Rows with matching values for CustomerId and ItemId
(CustomerId, ItemId, Price)(CustomerId, ItemId, Price) 具有与 CustomerId、ItemId 和 Price 匹配的值的行Rows with matching values for CustomerId, ItemId, and Price

### 统计信息选项Statistics Options

#### AUTO_CREATE_STATISTICS 选项AUTO_CREATE_STATISTICS Option

``````SELECT OBJECT_NAME(s.object_id) AS object_name,
COL_NAME(sc.object_id, sc.column_id) AS column_name,
s.name AS statistics_name
FROM sys.stats AS s
INNER JOIN sys.stats_columns AS sc
ON s.stats_id = sc.stats_id AND s.object_id = sc.object_id
WHERE s.name like '_WA%'
ORDER BY s.name;
``````

#### AUTO_UPDATE_STATISTICS 选项AUTO_UPDATE_STATISTICS Option

• 直到 SQL Server 2014 (12.x)SQL Server 2014 (12.x)SQL ServerSQL Server 基于更改行的百分比使用阈值。Up to SQL Server 2014 (12.x)SQL Server 2014 (12.x), SQL ServerSQL Server uses a threshold based on the percent of rows changed. 这与表中的行数无关。This is regardless of the number of rows in the table. 阈值是：The threshold is:

• 如果在评估时间统计信息时表基数为 500 或更低，则每达到 500 次修改时更新一次。If the table cardinality was 500 or less at the time statistics were evaluated, update for every 500 modifications.
• 如果在评估时间统计信息时表基数大于 500，则每达到 500 + 修改次数的百分之二十时更新一次。If the table cardinality was above 500 at the time statistics were evaluated, update for every 500 + 20 percent of modifications.
• SQL Server 2016 (13.x)SQL Server 2016 (13.x) 开始，如果数据库兼容性级别为 130，SQL ServerSQL Server 将使用递减的动态统计信息更新阈值，此阈值将根据表中的行数进行调整。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x) and under the database compatibility level 130, SQL ServerSQL Server uses a decreasing, dynamic statistics update threshold that adjusts according to the number of rows in the table. 它的计算方式为 1000 与当前的表基数乘积的平方根。This is calculated as the square root of the product of 1000 and the current table cardinality. 例如，如果表中包含 200 万行，则计算为 sqrt (1000 * 2000000) = 44721.359。For example if your table contains 2 million rows, then the calculation is? sqrt (1000 * 2000000) = 44721.359. 进行此更改后，将会更频繁地更新大型表的统计信息。With this change, statistics on large tables will be updated more often. 但是，如果数据库的兼容性级别低于 130，则 SQL Server 2014 (12.x)SQL Server 2014 (12.x) 阈值适用。However, if a database has a compatibility level below 130, then the SQL Server 2014 (12.x)SQL Server 2014 (12.x) threshold applies. ??

SQL Server 2008 R2SQL Server 2008 R2 开始到 SQL Server 2014 (12.x)SQL Server 2014 (12.x)，或者在 SQL Server 2016 (13.x)SQL Server 2016 (13.x) 及更高版本中，如果数据库兼容性级别 130，使用跟踪标志 2371 SQL ServerSQL Server 将使用递减的动态统计信息更新阈值，此阈值将根据表中的行数进行调整。Starting with SQL Server 2008 R2SQL Server 2008 R2 through SQL Server 2014 (12.x)SQL Server 2014 (12.x), or in SQL Server 2016 (13.x)SQL Server 2016 (13.x) and later under database compatibility level lower than 130, use trace flag 2371 and SQL ServerSQL Server will use a decreasing, dynamic statistics update threshold that adjusts according to the number of rows in the table.

AUTO_UPDATE_STATISTICS 选项适用于为索引创建的统计信息对象、查询谓词中的单列以及使用 CREATE STATISTICS 语句创建的统计信息。The AUTO_UPDATE_STATISTICS option applies to statistics objects created for indexes, single-columns in query predicates, and statistics created with the CREATE STATISTICS statement. 此选项也适用于筛选的统计信息。This option also applies to filtered statistics.

#### AUTO_UPDATE_STATISTICS_ASYNCAUTO_UPDATE_STATISTICS_ASYNC

• 您的应用程序频繁执行相同的查询、类似的查询或类似的缓存查询计划。Your application frequently executes the same query, similar queries, or similar cached query plans. 与同步统计信息更新相比，使用异步统计信息更新时你的查询响应时间可能具有更高的可预测性，因为查询优化器可以执行传入的查询而不必等待最新的统计信息。Your query response times might be more predictable with asynchronous statistics updates than with synchronous statistics updates because the Query Optimizer can execute incoming queries without waiting for up-to-date statistics. 这避免延迟某些查询，而不延迟其他查询。This avoids delaying some queries and not others.

• 您的应用程序遇到了客户端请求超时，这些超时是由于一个或多个查询正在等待更新后的统计信息所导致的。Your application has experienced client request time outs caused by one or more queries waiting for updated statistics. 在某些情况下，等待同步统计信息可能会导致应用程序因过长超时而失败。In some cases, waiting for synchronous statistics could cause applications with aggressive time outs to fail.

#### INCREMENTALINCREMENTAL

CREATE STATISTICS 的 INCREMENTAL 选项为 ON 时，创建的统计信息为每个分区的统计信息。When INCREMENTAL option of CREATE STATISTICS is ON, the statistics created are per partition statistics. 为 OFF 时，删除统计信息树并且 SQL ServerSQL Server 重新计算统计信息。When OFF, the statistics tree is dropped and SQL ServerSQL Server re-computes the statistics. 默认为 OFF。The default is OFF. 此设置覆盖数据库级别 INCREMENTAL 属性。This setting overrides the database level INCREMENTAL property. 要深入了解如何创建增量统计信息，请参阅 CREATE STATISTICS (Transact-SQL)For more information about creating incremental statistics, see CREATE STATISTICS (Transact-SQL). 要深入了解如何自动创建每个分区的统计信息，请参阅数据库属性（“选项”页）ALTER DATABASE SET 选项 (Transact-SQL)For more information about creating per partition statistics automatically, see Database Properties (Options Page) and ALTER DATABASE SET Options (Transact-SQL).

• 使用未与基表的分区对齐的索引创建的统计信息。Statistics created with indexes that are not partition-aligned with the base table.
• 对 Always On 可读辅助数据库创建的统计信息。Statistics created on Always On readable secondary databases.
• 对只读数据库创建的统计信息。Statistics created on read-only databases.
• 对筛选的索引创建的统计信息。Statistics created on filtered indexes.
• 对视图创建的统计信息。Statistics created on views.
• 对内部表创建的统计信息。Statistics created on internal tables.
• 使用空间索引或 XML 索引创建的统计信息。Statistics created with spatial indexes or XML indexes.

## 何时创建统计信息When to create statistics

1. 在创建索引时，查询优化器为表或视图上的索引创建统计信息。The Query Optimizer creates statistics for indexes on tables or views when the index is created. 这些统计信息将创建在索引的键列上。These statistics are created on the key columns of the index. 如果索引是一个筛选索引，则查询优化器将在为该筛选索引指定的行的同一子集上创建筛选统计信息。If the index is a filtered index, the Query Optimizer creates filtered statistics on the same subset of rows specified for the filtered index. 有关筛选索引的详细信息，请参阅创建筛选索引CREATE INDEX (Transact-SQL)For more information about filtered indexes, see Create Filtered Indexes and CREATE INDEX (Transact-SQL).

2. AUTO_CREATE_STATISTICS 为 ON 时，查询优化器为查询谓词中的单列创建统计信息。The Query Optimizer creates statistics for single columns in query predicates when AUTO_CREATE_STATISTICS is on.

• 数据库引擎Database Engine 优化顾问建议创建统计信息。The 数据库引擎Database Engine Tuning Advisor suggests creating statistics.
• 查询谓词包含尚不位于相同索引中的多个相关列。The query predicate contains multiple correlated columns that are not already in the same index.
• 查询从数据的子集中选择数据。The query selects from a subset of data.
• 查询缺少统计信息。The query has missing statistics.

### 查询谓词包含多个相关列Query Predicate contains multiple correlated columns

``````USE AdventureWorks2012;
GO
IF EXISTS (SELECT name FROM sys.stats
WHERE name = 'LastFirst'
AND object_ID = OBJECT_ID ('Person.Person'))
DROP STATISTICS Person.Person.LastFirst;
GO
CREATE STATISTICS LastFirst ON Person.Person (LastName, MiddleName, FirstName);
GO
``````

### 查询从数据的子集中选择数据Query Selects from a subset of data

``````USE AdventureWorks2012;
GO
IF EXISTS ( SELECT name FROM sys.stats
WHERE name = 'BikeWeights'
AND object_ID = OBJECT_ID ('Production.Product'))
DROP STATISTICS Production.Product.BikeWeights;
GO
CREATE STATISTICS BikeWeights
ON Production.Product (Weight)
WHERE ProductSubcategoryID IN (1,2,3);
GO
``````

``````SELECT P.Weight AS Weight, S.Name AS BikeName
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS S
ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE P.ProductSubcategoryID IN (1,2,3) AND P.Weight > 25
ORDER BY P.Weight;
GO
``````

### 查询识别缺少的统计信息Query identifies missing statistics

• Verify that AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are on.
• 请确保数据库不是只读的。Verify that the database is not read-only. 如果数据库是只读的，则无法保存新统计信息对象。If the database is read-only, a new statistics object cannot be saved.
• 通过使用 CREATE STATISTICS 语句创建缺少的统计信息。Create the missing statistics by using the CREATE STATISTICS statement.

• 使用 DROP STATISTICS 语句删除临时统计信息。Delete temporary statistics using the DROP STATISTICS statement.
• 使用 sys.statssys.stats_columns 目录视图监视统计信息 。Monitor statistics using the sys.stats and sys.stats_columns catalog views. sys_stats 包含 is_temporary 列，用于指示哪些统计信息是永久的，哪些统计信息是临时的。sys_stats includes the is_temporary column, to indicate which statistics are permanent and which are temporary.

## 何时更新统计信息When to update statistics

• 查询执行时间很长。Query execution times are slow.
• 在升序或降序键列上发生插入操作。Insert operations occur on ascending or descending key columns.
• 在维护操作后。After maintenance operations.

### 在维护操作后After maintenance operations

SQL Server 2016 (13.x)SQL Server 2016 (13.x) SP1 CU4 开始，使用 CREATE STATISTICS (Transact-SQL)UPDATE STATISTICS (Transact-SQL) 的 PERSIST_SAMPLE_PERCENT 选项设置和保留未显式指定采样百分比的后续统计信息更新的特定采样百分比。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x) SP1 CU4, use the PERSIST_SAMPLE_PERCENT option of CREATE STATISTICS (Transact-SQL) or UPDATE STATISTICS (Transact-SQL), to set and retain a specific sampling percentage for subsequent statistic updates that do not explicitly specify a sampling percentage.

## 高效使用统计信息的查询Queries that use statistics effectively

### 改进表达式的基数估计Improving cardinality estimates for expressions

• 只要可能，应简化其中含常量的表达式。Whenever possible, simplify expressions with constants in them. 查询优化器在确定基数估计前并不对包含常量的所有函数和表达式进行求值。The Query Optimizer does not evaluate all functions and expressions containing constants prior to determining cardinality estimates. 例如，将表达式 `ABS(-100)` 简化为 `100`For example, simplify the expression `ABS(-100)` to `100`.

• 如果表达式使用多个变量，则考虑为表达式创建一个计算列，然后对该计算列创建统计信息或索引。If the expression uses multiple variables, consider creating a computed column for the expression and then create statistics or an index on the computed column. 例如，如果您为表达式 `WHERE PRICE + Tax > 100` 创建计算列，则查询谓词 `Price + Tax`可能会具有更好的基数估计。For example, the query predicate `WHERE PRICE + Tax > 100` might have a better cardinality estimate if you create a computed column for the expression `Price + Tax`.

### 改进变量和函数的基数估计Improving cardinality estimates for variables and functions

• 如果查询谓词使用局部变量，则考虑重新编写查询以使用参数，而非局部变量。If the query predicate uses a local variable, consider rewriting the query to use a parameter instead of a local variable. 在查询优化器创建查询执行计划时，局部变量的值未知。The value of a local variable is not known when the Query Optimizer creates the query execution plan. 在查询使用某一参数时，查询优化器将基数估计用于传递到存储过程的第一个实际参数值。When a query uses a parameter, the Query Optimizer uses the cardinality estimate for the first actual parameter value that is passed to the stored procedure.

• 考虑使用标准表或临时表来保存多语句表值函数 (mstvf) 的结果。Consider using a standard table or temporary table to hold the results of multi-statement table-valued functions (mstvf). 查询优化器并不为多语句表值函数创建统计信息。The Query Optimizer does not create statistics for multi-statement table-valued functions. 使用此方法，查询优化器可对表列创建统计信息并使用它们创建更好的查询计划。With this approach the Query Optimizer can create statistics on the table columns and use them to create a better query plan.

• 考虑使用标准表或临时表来代替表变量。Consider using a standard table or temporary table as a replacement for table variables. 查询优化器不会为表变量创建统计信息。The Query Optimizer does not create statistics for table variables. 使用此方法，查询优化器可对表列创建统计信息并使用它们创建更好的查询计划。With this approach the Query Optimizer can create statistics on the table columns and use them to create a better query plan. 在确定是使用临时表还是表变量时需要进行一些权衡，与临时表相比，在存储过程中使用的表变量会导致更少的存储过程的重新编译。There are tradeoffs in determining whether to use a temporary table or a table variable; Table variables used in stored procedures cause fewer recompilations of the stored procedure than temporary tables. 根据应用程序，使用临时表来代替表变量可能不会改进性能。Depending on the application, using a temporary table instead of a table variable might not improve performance.

• 如果某一存储过程包含使用某一传入的参数的查询，则在查询中使用该参数值之前，应避免在该存储过程内更改该参数值。If a stored procedure contains a query that uses a passed-in parameter, avoid changing the parameter value within the stored procedure before using it in the query. 查询的基数估计基于传入的参数值，而非更新的值。The cardinality estimates for the query are based on the passed-in parameter value and not the updated value. 为了避免更改参数值，您可以重新编写查询以使用两个存储过程。To avoid changing the parameter value, you can rewrite the query to use two stored procedures.

例如，以下存储过程 `Sales.GetRecentSales` 将在 `@date` 为 NULL 时更改参数 `@date` 的值。For example, the following stored procedure `Sales.GetRecentSales` changes the value of the parameter `@date` when `@date` is NULL.

``````USE AdventureWorks2012;
GO
IF OBJECT_ID ( 'Sales.GetRecentSales', 'P') IS NOT NULL
DROP PROCEDURE Sales.GetRecentSales;
GO
CREATE PROCEDURE Sales.GetRecentSales (@date datetime)
AS BEGIN
IF @date IS NULL
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date
END
GO
``````

如果对存储过程 `Sales.GetRecentSales` 的首次调用为 `@date` 参数传递了 NULL，则查询优化器将使用针对 `@date = NULL` 的基数估计编译存储过程，即使查询谓词不是使用 `@date = NULL`调用的。If the first call to the stored procedure `Sales.GetRecentSales` passes a NULL for the `@date` parameter, the Query Optimizer will compile the stored procedure with the cardinality estimate for `@date = NULL` even though the query predicate is not called with `@date = NULL`. 此基数估计可能与实际查询结果中的行数差别很大。This cardinality estimate might be significantly different than the number of rows in the actual query result. 因此，查询优化器可能会选择非最佳查询计划。As a result, the Query Optimizer might choose a suboptimal query plan. 若要避免此情况发生，您可以按如下所示将存储过程重新编写成两个过程：To help avoid this, you can rewrite the stored procedure into two procedures as follows:

``````USE AdventureWorks2012;
GO
IF OBJECT_ID ( 'Sales.GetNullRecentSales', 'P') IS NOT NULL
DROP PROCEDURE Sales.GetNullRecentSales;
GO
CREATE PROCEDURE Sales.GetNullRecentSales (@date datetime)
AS BEGIN
IF @date is NULL
EXEC Sales.GetNonNullRecentSales @date;
END
GO
IF OBJECT_ID ( 'Sales.GetNonNullRecentSales', 'P') IS NOT NULL
DROP PROCEDURE Sales.GetNonNullRecentSales;
GO
CREATE PROCEDURE Sales.GetNonNullRecentSales (@date datetime)
AS BEGIN
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date
END
GO
``````

### 使用查询提示改进基数估计Improving cardinality estimates with query hints

``````USE AdventureWorks2012;
GO
IF OBJECT_ID ( 'Sales.GetRecentSales', 'P') IS NOT NULL
DROP PROCEDURE Sales.GetRecentSales;
GO
CREATE PROCEDURE Sales.GetRecentSales (@date datetime)
AS BEGIN
IF @date is NULL