統計資料Statistics

元件和概念Components and Concepts

統計資料Statistics

長條圖Histogram

「長條圖」 可測量資料集中每一個相異值的發生頻率。A histogram measures the frequency of occurrence for each distinct value in a data set. 查詢最佳化工具會計算有關統計資料物件之第一個索引鍵資料行中資料行值的長條圖，以統計方式取樣資料列或執行資料表或檢視表中所有資料列的完整掃描來選取資料行值。The query optimizer computes a histogram on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view. 如果長條圖是從一組取樣的資料列所建立，資料列數和相異值數的儲存總計會是預估值，而且不需要為整數。If the histogram is created from a sampled set of rows, the stored totals for number of rows and number of distinct values are estimates and do not need to be whole integers.

SQL ServerSQL Server 中的長條圖只會針對單一資料行建置；也就是統計資料物件的索引鍵資料行集合中第一個資料行。Histograms in SQL ServerSQL Server are only built for a single column-the first column in the set of key columns of the statistics object.

• 長條圖初始化：第一個步驟會從已排序的集合開頭處理一連串值，並收集最多 200 個 range_high_keyequal_rowsrange_rowsdistinct_range_rows 的值 (在此步驟中，range_rowsdistinct_range_rows 一律為零)。Histogram initialization: In the first step, a sequence of values starting at the beginning of the sorted set is processed, and up to 200 values of range_high_key, equal_rows, range_rows, and distinct_range_rows are collected (range_rows and distinct_range_rows are always zero during this step). 當所有的輸入都已用完，或已找到 200 個值時，就會結束第一個步驟。The first step ends either when all input has been exhausted, or when 200 values have been found.
• 使用貯體合併掃描：第二個步驟會依順序處理統計資料索引鍵之前置資料行的每一個額外值；每個後續的值可以新增到最後一個範圍，或在結束時建立新的範圍 (由於輸入的值會排序，因此這是可行的)。Scan with bucket merge: Each additional value from the leading column of the statistics key is processed in the second step, in sorted order; each successive value is either added to the last range or a new range at the end is created (this is possible because the input values are sorted). 建立新的範圍時，會將現有的一組相鄰範圍摺疊成單一範圍。If a new range is created, then one pair of existing, neighboring ranges is collapsed into a single range. 系統會選取這一組範圍，以將資訊遺失的機率降至最低。This pair of ranges is selected in order to minimize information loss. 此方法會使用「最大差異」 演算法，讓長條圖中的步驟數減至最少，同時讓界限值之間的差異最大化。This method uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values. 在這整個步驟期間，範圍摺疊之後的步驟數目仍然為 200。The number of steps after collapsing ranges stays at 200 throughout this step.
• 長條圖彙總：第三個步驟可能會摺疊更多範圍 (如果不會遺失大量資訊的話)。Histogram consolidation: In the third step, more ranges may be collapsed if a significant amount of information is not lost. 長條圖步驟的數目可以少於相異值數目，即使包含了少於 200 個界限點的資料行也是如此。The number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points. 因此，即使資料行具有超過 200 個唯一值，長條圖仍可能只需 200 個以下的步驟。Therefore, even if the column has more than 200 unique values, the histogram may have less than 200 steps. 若資料行都是由唯一值組成，則合併的長條圖將只有最少的三個步驟。For a column consisting of only unique values, then the consolidated histogram will have a minimum of three steps.

• 粗線代表上限值 (range_high_key) 以及其所發生的次數 (equal_rows)Bold line represents the upper boundary value (range_high_key) and the number of times it occurs (equal_rows)

• range_high_key 左邊的實線區域代表資料行值範圍，以及每一個資料行值發生的平均次數 (average_range_rows)。Solid area left of range_high_key represents the range of column values and the average number of times each column value occurs (average_range_rows). 第一個長條圖步驟的 average_range_rows 一定是 0。The average_range_rows for the first histogram step is always 0.

• 虛線代表用來預估範圍內相異值總數的取樣值 (distinct_range_rows) 以及範圍內的值總數 (range_rows)。Dotted lines represent the sampled values used to estimate total number of distinct values in the range (distinct_range_rows) and total number of values in the range (range_rows). 查詢最佳化工具會使用 range_rowsdistinct_range_rows 來計算 average_range_rows，而且不會儲存取樣值。The query optimizer uses range_rows and distinct_range_rows to compute average_range_rows and does not store the sampled values.

密度向量Density Vector

「頻率」是統計資料物件第一個索引鍵資料行中每一個相異值的發生次數資訊，其計算方式為資料列計數乘以密度。Frequency is information about the occurrence of each distinct value in the first key column of the statistics object, and is calculated as row count * density. 如果資料行具有唯一值，則其最大頻率為 1。A maximum frequency of 1 can be found in columns with unique values.

(CustomerId)(CustomerId) 與 CustomerId 的值相符的資料列Rows with matching values for CustomerId
(CustomerId, ItemId)(CustomerId, ItemId) 與 CustomerId 和 ItemId 的值相符的資料列Rows with matching values for CustomerId and ItemId
(CustomerId, ItemId, Price)(CustomerId, ItemId, Price) 與 CustomerId、ItemId 和 Price 的值相符的資料列Rows with matching values for CustomerId, ItemId, and Price

統計資料選項Statistics Options

AUTO_CREATE_STATISTICS 選項AUTO_CREATE_STATISTICS Option

``````SELECT OBJECT_NAME(s.object_id) AS object_name,
COL_NAME(sc.object_id, sc.column_id) AS column_name,
s.name AS statistics_name
FROM sys.stats AS s
INNER JOIN sys.stats_columns AS sc
ON s.stats_id = sc.stats_id AND s.object_id = sc.object_id
WHERE s.name like '_WA%'
ORDER BY s.name;
``````

AUTO_UPDATE_STATISTICS 選項AUTO_UPDATE_STATISTICS Option

• SQL ServerSQL Server (截至 SQL Server 2014 (12.x)SQL Server 2014 (12.x)) 會使用變更資料列的百分比作為臨界值。Up to SQL Server 2014 (12.x)SQL Server 2014 (12.x), SQL ServerSQL Server uses a threshold based on the percent of rows changed. 與資料表中的資料列數無關。This is regardless of the number of rows in the table. 臨界值是：The threshold is:

• 若資料表基數在評估統計資料時為 500 或更小的數值，將會在每 500 次修改之後更新。If the table cardinality was 500 or less at the time statistics were evaluated, update for every 500 modifications.
• 若資料表基數在評估統計資料時為超過 500 的數值，將會在每 500 + 20% 的修改次數之後更新。If the table cardinality was above 500 at the time statistics were evaluated, update for every 500 + 20 percent of modifications.
• SQL Server 2016 (13.x)SQL Server 2016 (13.x) 開始並 在資料庫相容性層級 130 之下，SQL ServerSQL Server 會使用降低、動態的統計資料更新臨界值。此臨界值會根據資料表中的資料列數目來做出調整。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x) and under the database compatibility level 130, SQL ServerSQL Server uses a decreasing, dynamic statistics update threshold that adjusts according to the number of rows in the table. 這是以 1000 乘以目前資料表基數的平方根來計算。This is calculated as the square root of the product of 1000 and the current table cardinality. 例如，如果您的資料表包含 2 百萬個資料列，則計算結果是 sqrt (1000 * 2000000) = 44721.359。For example if your table contains 2 million rows, then the calculation is? sqrt (1000 * 2000000) = 44721.359. 透過這項變更，大型資料表上的統計資料會經常更新。With this change, statistics on large tables will be updated more often. 不過，如果資料庫的相容性層級低於 130，便會套用 SQL Server 2014 (12.x)SQL Server 2014 (12.x) 臨界值。However, if a database has a compatibility level below 130, then the SQL Server 2014 (12.x)SQL Server 2014 (12.x) threshold applies. ??

AUTO_UPDATE_STATISTICS 選項會套用至針對索引所建立的統計資料物件、查詢述詞中的單一資料行，以及使用 CREATE STATISTICS 陳述式所建立的統計資料。The AUTO_UPDATE_STATISTICS option applies to statistics objects created for indexes, single-columns in query predicates, and statistics created with the CREATE STATISTICS statement. 此外，這個選項也會套用至篩選的統計資料。This option also applies to filtered statistics.

AUTO_UPDATE_STATISTICS_ASYNCAUTO_UPDATE_STATISTICS_ASYNC

• 您的應用程式經常會執行相同的查詢、相似的查詢或相似的快取查詢計劃。Your application frequently executes the same query, similar queries, or similar cached query plans. 相較於使用同步統計資料更新，使用非同步統計資料更新可能會讓您更容易預測查詢的回應時間，因為查詢最佳化工具不需要等候最新的統計資料，就可以執行傳入的查詢。Your query response times might be more predictable with asynchronous statistics updates than with synchronous statistics updates because the Query Optimizer can execute incoming queries without waiting for up-to-date statistics. 這樣會避免延遲某些查詢，但無法避免延遲其他查詢。This avoids delaying some queries and not others.

• 您的應用程式遇到等候更新統計資料之一或多個查詢所造成的用戶端要求逾時。Your application has experienced client request time outs caused by one or more queries waiting for updated statistics. 在某些情況下，等候同步統計資料可能會造成具有彙總逾時的應用程式失敗。In some cases, waiting for synchronous statistics could cause applications with aggressive time outs to fail.

INCREMENTALINCREMENTAL

• 建立統計資料時，所使用的索引未與基底資料表進行分割區對齊。Statistics created with indexes that are not partition-aligned with the base table.
• 在 AlwaysOn 可讀取次要資料庫上建立的統計資料。Statistics created on Always On readable secondary databases.
• 在唯讀資料庫上建立的統計資料。Statistics created on read-only databases.
• 在篩選的索引上建立的統計資料。Statistics created on filtered indexes.
• 在檢視上建立的統計資料。Statistics created on views.
• 在內部資料表上建立的統計資料。Statistics created on internal tables.
• 使用空間索引或 XML 索引建立的統計資料。Statistics created with spatial indexes or XML indexes.

何時建立統計資料When to create statistics

1. 建立索引時，查詢最佳化工具就會針對資料表或檢視表的索引建立統計資料。The Query Optimizer creates statistics for indexes on tables or views when the index is created. 這些統計資料是針對索引的索引鍵資料行所建立的。These statistics are created on the key columns of the index. 如果索引是篩選的索引，查詢最佳化工具就會在針對篩選索引所指定的相同資料列子集上建立篩選的統計資料。If the index is a filtered index, the Query Optimizer creates filtered statistics on the same subset of rows specified for the filtered index. 如需篩選索引的詳細資訊，請參閱建立篩選的索引CREATE INDEX (Transact-SQL)For more information about filtered indexes, see Create Filtered Indexes and CREATE INDEX (Transact-SQL).

2. 開啟 AUTO_CREATE_STATISTICS 時，查詢最佳化工具會針對查詢述詞中的單一資料行建立統計資料。The Query Optimizer creates statistics for single columns in query predicates when AUTO_CREATE_STATISTICS is on.

• Database EngineDatabase Engine Tuning Advisor 建議您建立統計資料。The Database EngineDatabase Engine Tuning Advisor suggests creating statistics.
• 查詢述詞包含多個尚未存在相同索引中的相互關聯資料行。The query predicate contains multiple correlated columns that are not already in the same index.
• 查詢會從資料子集中選取。The query selects from a subset of data.
• 查詢具有遺失的統計資料。The query has missing statistics.

查詢述詞包含多個相互關聯的資料行Query Predicate contains multiple correlated columns

``````USE AdventureWorks2012;
GO
IF EXISTS (SELECT name FROM sys.stats
WHERE name = 'LastFirst'
AND object_ID = OBJECT_ID ('Person.Person'))
DROP STATISTICS Person.Person.LastFirst;
GO
CREATE STATISTICS LastFirst ON Person.Person (LastName, MiddleName, FirstName);
GO
``````

查詢會從資料子集中選取Query Selects from a subset of data

``````USE AdventureWorks2012;
GO
IF EXISTS ( SELECT name FROM sys.stats
WHERE name = 'BikeWeights'
AND object_ID = OBJECT_ID ('Production.Product'))
DROP STATISTICS Production.Product.BikeWeights;
GO
CREATE STATISTICS BikeWeights
ON Production.Product (Weight)
WHERE ProductSubcategoryID IN (1,2,3);
GO
``````

``````SELECT P.Weight AS Weight, S.Name AS BikeName
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS S
ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE P.ProductSubcategoryID IN (1,2,3) AND P.Weight > 25
ORDER BY P.Weight;
GO
``````

查詢會識別遺失的統計資料Query identifies missing statistics

• Verify that AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are on.
• 確認資料庫不是唯讀的。Verify that the database is not read-only. 若資料庫為唯讀，將無法儲存新統計資料物件。If the database is read-only, a new statistics object cannot be saved.
• 使用 CREATE STATISTICS 陳述式來建立遺失的統計資料。Create the missing statistics by using the CREATE STATISTICS statement.

• 使用 DROP STATISTICS 陳述式刪除暫時統計資料。Delete temporary statistics using the DROP STATISTICS statement.
• 使用 sys.statssys.stats_columns 目錄檢視來監視統計資料。Monitor statistics using the sys.stats and sys.stats_columns catalog views. sys_stats 包含 is_temporary 資料行，以指示哪些統計資料為永久性及哪些統計資料為暫時性。sys_stats includes the is_temporary column, to indicate which statistics are permanent and which are temporary.

何時更新統計資料When to update statistics

• 查詢執行時間很慢。Query execution times are slow.
• 插入作業針對遞增或遞減索引鍵資料行進行。Insert operations occur on ascending or descending key columns.
• 在維護作業之後。After maintenance operations.

在維護作業之後After maintenance operations

SQL Server 2016 (13.x)SQL Server 2016 (13.x) SP1 CU4 開始，請使用 CREATE STATISTICS (Transact-SQL)UPDATE STATISTICS (Transact-SQL) 的 PERSIST_SAMPLE_PERCENT 選項，來針對後續不會明確指定取樣百分比的統計資料更新，設定並保留特定的取樣百分比。Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x) SP1 CU4, use the PERSIST_SAMPLE_PERCENT option of CREATE STATISTICS (Transact-SQL) or UPDATE STATISTICS (Transact-SQL), to set and retain a specific sampling percentage for subsequent statistic updates that do not explicitly specify a sampling percentage.

有效使用統計資料的查詢Queries that use statistics effectively

改善運算式的基數估計值Improving cardinality estimates for expressions

• 您應該盡可能簡化含有常數的運算式。Whenever possible, simplify expressions with constants in them. 在判斷基數估計值之前，查詢最佳化工具不會評估包含常數的所有函式和運算式。The Query Optimizer does not evaluate all functions and expressions containing constants prior to determining cardinality estimates. 例如，請將運算式 `ABS(-100)` 簡化為 `100`For example, simplify the expression `ABS(-100)` to `100`.

• 如果運算式使用多個變數，請考慮建立運算式的計算資料行，然後再針對計算資料行建立統計資料或索引。If the expression uses multiple variables, consider creating a computed column for the expression and then create statistics or an index on the computed column. 例如，如果您建立了 `WHERE PRICE + Tax > 100` 運算式的計算資料行， `Price + Tax`查詢述詞可能會具有較佳的基數估計值。For example, the query predicate `WHERE PRICE + Tax > 100` might have a better cardinality estimate if you create a computed column for the expression `Price + Tax`.

改善變數和函式的基數估計值Improving cardinality estimates for variables and functions

• 如果查詢述詞使用區域變數，請考慮將查詢重新撰寫成使用參數而非區域變數。If the query predicate uses a local variable, consider rewriting the query to use a parameter instead of a local variable. 當查詢最佳化工具建立查詢執行計畫時，並無法得知區域變數的值。The value of a local variable is not known when the Query Optimizer creates the query execution plan. 當查詢使用參數時，查詢最佳化工具就會使用傳遞給預存程序之第一個實際參數值的基數估計值。When a query uses a parameter, the Query Optimizer uses the cardinality estimate for the first actual parameter value that is passed to the stored procedure.

• 請考慮使用標準資料表或暫存資料表來保存多重陳述式資料表值函式 (mstvf) 的結果。Consider using a standard table or temporary table to hold the results of multi-statement table-valued functions (mstvf). 查詢最佳化工具不會針對多重陳述式資料表值函式建立統計資料。The Query Optimizer does not create statistics for multi-statement table-valued functions. 透過這種方法，查詢最佳化工具就可以建立資料表資料行的統計資料，然後使用它們來建立較佳的查詢計劃。With this approach the Query Optimizer can create statistics on the table columns and use them to create a better query plan.

• 請考慮使用標準資料表或暫存資料表當做資料表變數的取代項目。Consider using a standard table or temporary table as a replacement for table variables. 查詢最佳化工具不會針對資料表變數建立統計資料。The Query Optimizer does not create statistics for table variables. 透過這種方法，查詢最佳化工具就可以建立資料表資料行的統計資料，然後使用它們來建立較佳的查詢計劃。With this approach the Query Optimizer can create statistics on the table columns and use them to create a better query plan. 當您在判斷要使用暫存資料表或資料表變數時，存在權衡取捨。在預存程序中使用的資料表變數會讓預存程序重新編譯的次數比暫存資料表更少。There are tradeoffs in determining whether to use a temporary table or a table variable; Table variables used in stored procedures cause fewer recompilations of the stored procedure than temporary tables. 根據應用程式而定，使用暫存資料表來取代資料表變數可能不會改善效能。Depending on the application, using a temporary table instead of a table variable might not improve performance.

• 如果預存程序包含使用傳入參數的查詢，請避免在查詢中使用之前，變更預存程序中的參數值。If a stored procedure contains a query that uses a passed-in parameter, avoid changing the parameter value within the stored procedure before using it in the query. 查詢的基數估計值是以傳入參數而非更新的值為基礎。The cardinality estimates for the query are based on the passed-in parameter value and not the updated value. 若要避免變更參數值，您可以將查詢重新撰寫成使用兩個預存程序。To avoid changing the parameter value, you can rewrite the query to use two stored procedures.

例如，下列預存程序 `Sales.GetRecentSales` 會在 `@date` 為 NULL 時變更 `@date` 參數的值。For example, the following stored procedure `Sales.GetRecentSales` changes the value of the parameter `@date` when `@date` is NULL.

``````USE AdventureWorks2012;
GO
IF OBJECT_ID ( 'Sales.GetRecentSales', 'P') IS NOT NULL
DROP PROCEDURE Sales.GetRecentSales;
GO
CREATE PROCEDURE Sales.GetRecentSales (@date datetime)
AS BEGIN
IF @date IS NULL
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date
END
GO
``````

如果預存程序 `Sales.GetRecentSales` 的第一次呼叫傳遞 NULL 給 `@date` 參數，查詢最佳化工具就會使用 `@date = NULL` 的基數估計值來編譯此預存程序，即使沒有使用 `@date = NULL` 來呼叫查詢述詞也一樣。If the first call to the stored procedure `Sales.GetRecentSales` passes a NULL for the `@date` parameter, the Query Optimizer will compile the stored procedure with the cardinality estimate for `@date = NULL` even though the query predicate is not called with `@date = NULL`. 這個基數估計值可能會與實際查詢結果中的資料列數目具有大幅差異。This cardinality estimate might be significantly different than the number of rows in the actual query result. 因此，查詢最佳化工具可能會選擇到次佳的查詢計劃。As a result, the Query Optimizer might choose a suboptimal query plan. 為了協助避免這種情況發生，您可以將此預存程序重新撰寫成兩個程序，如下所示：To help avoid this, you can rewrite the stored procedure into two procedures as follows:

``````USE AdventureWorks2012;
GO
IF OBJECT_ID ( 'Sales.GetNullRecentSales', 'P') IS NOT NULL
DROP PROCEDURE Sales.GetNullRecentSales;
GO
CREATE PROCEDURE Sales.GetNullRecentSales (@date datetime)
AS BEGIN
IF @date is NULL
EXEC Sales.GetNonNullRecentSales @date;
END
GO
IF OBJECT_ID ( 'Sales.GetNonNullRecentSales', 'P') IS NOT NULL
DROP PROCEDURE Sales.GetNonNullRecentSales;
GO
CREATE PROCEDURE Sales.GetNonNullRecentSales (@date datetime)
AS BEGIN
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderID
AND h.OrderDate > @date
END
GO
``````

使用查詢提示來改善基數估計值Improving cardinality estimates with query hints

``````USE AdventureWorks2012;
GO
IF OBJECT_ID ( 'Sales.GetRecentSales', 'P') IS NOT NULL
DROP PROCEDURE Sales.GetRecentSales;
GO
CREATE PROCEDURE Sales.GetRecentSales (@date datetime)
AS BEGIN
IF @date is NULL