# 基數估計 (SQL Server)Cardinality Estimation (SQL Server)

SQL ServerSQL Server 查詢最佳化工具是以成本為基礎的查詢最佳化工具。The SQL ServerSQL Server Query Optimizer is a cost-based Query Optimizer. 這表示它會選取估計處理成本最低的查詢計畫來執行。This means that it selects query plans that have the lowest estimated processing cost to execute. 查詢最佳化工具根據兩個主要因素來判斷執行查詢計劃的成本：The Query Optimizer determines the cost of executing a query plan based on two main factors:

• 在查詢計畫的每一個層級進行處理的資料列總數，此稱為計畫的基數。The total number of rows processed at each level of a query plan, referred to as the cardinality of the plan.
• 查詢中使用的運算子所指定的演算法成本模型。The cost model of the algorithm dictated by the operators used in the query.

SQL ServerSQL Server 中的基數估計主要衍生自於建立索引或統計資料時，手動或自動建立的長條圖。Cardinality estimation (CE) in SQL ServerSQL Server is derived primarily from histograms that are created when indexes or statistics are created, either manually or automatically. 有時候，SQL ServerSQL Server 也會使用條件約束資訊及查詢的邏輯重寫來判斷基數。Sometimes, SQL ServerSQL Server also uses constraint information and logical rewrites of queries to determine cardinality.

• 查詢的述詞，在相同資料表的不同資料行之間使用比較運算子。Queries with predicates that use comparison operators between different columns of the same table.
• 查詢的述詞使用運算子，且下列任一情況為真：Queries with predicates that use operators, and any one of the following are true:
• 運算子任一邊所關聯的資料行中，沒有任何統計資料。There are no statistics on the columns involved on either side of the operators.
• 統計資料中的值分佈並不平均，但查詢會搜尋具有高度選擇性的值集。The distribution of values in the statistics is not uniform, but the query seeks a highly selective value set. 如果運算子不是等號 (=) 運算子，此情況會特別明顯。This situation can be especially true if the operator is anything other than the equality (=) operator.
• 述詞使用不等於 (!=) 比較運算子或 `NOT` 邏輯運算子。The predicate uses the not equal to (!=) comparison operator or the `NOT` logical operator.
• 查詢，其使用任一個 SQL Server 內建函式，或引數不是常數值之純量值的使用者定義函式。Queries that use any of the SQL Server built-in functions or a scalar-valued, user-defined function whose argument is not a constant value.
• 查詢透過算術或字串串連運算子，與聯結資料行相關聯。Queries that involve joining columns through arithmetic or string concatenation operators.
• 查詢所比較的變數，在編譯及最佳化查詢時其值不明。Queries that compare variables whose values are not known when the query is compiled and optimized.

• 執行頻繁很高而經常同時執行多個執行個體的 OLTP (線上交易處理) 查詢。An OLTP (online transaction processing) query that runs so frequently that multiple instance of it often run concurrently.
• 在您的 OLTP 營業期間執行大量彙總的 SELECT。A SELECT with substantial aggregation that runs during your OLTP business hours.

## CE 的版本Versions of the CE

• 獨立性： 在不同資料行散發的資料會假設為各自獨立，除非提供可用的相互關聯資訊。Independence: Data distributions on different columns are assumed to be independent of each other, unless correlation information is available and usable.
• 一致性： 相異值會平均分佈，使其全都具有相同頻率。Uniformity: Distinct values are evenly spaced and that they all have the same frequency. 更明確地說，在每個長條圖步驟中，相異值會平均分布且各值都具有相同頻率。More precisely, within each histogram step, distinct values are evenly spread and each value has same frequency.
• 內含項目 (簡單)： 使用者查詢存在的資料。Containment (Simple): Users query for data that exists. 舉例來說，就兩個資料表間的相等聯結而言，會在聯結長條圖以預估聯結選擇性前，將各輸入長條圖中的述詞選擇性1 納入考量。For example, for an equality join between two tables, factor in the predicates selectivity1 in each input histogram, before joining histograms to estimate the join selectivity.
• 包含詞/句： 對於 `Column = Constant` 的篩選述詞而言，會假設相關聯資料行的常數實際存在。Inclusion: For filter predicates where `Column = Constant`, the constant is assumed to actually exist for the associated column. 若對應的長條圖步驟為非空白，則其中一個步驟的相異值會假設為符合來自述詞的值。If a corresponding histogram step is non-empty, one of the step's distinct values is assumed to match the value from the predicate.

1 滿足述詞的資料列計數。1 Row count that satisfies the predicate.

• 獨立性變成相互關聯：不同資料行值的結合不需要獨立。Independence becomes Correlation: The combination of the different column values are not necessarily independent. 這可能會更類似實際資料查詢。This may resemble more real-life data querying.
• 簡單內含項目變成基本內含項目：使用者可查詢不存在的資料。Simple Containment becomes Base Containment: Users might query for data that does not exist. 舉例來說，就兩個資料表間的相等聯結而言，我們會使用基底資料表長條圖來預估聯結選擇性，然後將述詞選擇性納入考量。For example, for an equality join between two tables, we use the base tables histograms to estimate the join selectivity, and then factor in the predicates selectivity.

``````SELECT ServerProperty('ProductVersion');
GO

ALTER DATABASE <yourDatabase>
SET COMPATIBILITY_LEVEL = 130;
GO

SELECT d.name, d.compatibility_level
FROM sys.databases AS d
WHERE d.name = 'yourDatabase';
GO
``````

``````ALTER DATABASE SCOPED CONFIGURATION
SET LEGACY_CARDINALITY_ESTIMATION = ON;
GO

SELECT name, value
FROM sys.database_scoped_configurations
WHERE name = 'LEGACY_CARDINALITY_ESTIMATION';
GO
``````

``````SELECT CustomerId, OrderAddedDate
FROM OrderTable
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
``````

``````ALTER DATABASE <yourDatabase>
SET QUERY_STORE = ON;
GO

SELECT q.actual_state_desc AS [actual_state_desc_of_QueryStore],
q.desired_state_desc,
q.query_capture_mode_desc
FROM sys.database_query_store_options AS q;
GO

ALTER DATABASE <yourDatabase>
SET QUERY_STORE CLEAR;
``````

``````DROP EVENT SESSION Test_the_CE_qoec_1 ON SERVER;
go

CREATE EVENT SESSION Test_the_CE_qoec_1
ON SERVER
(
ACTION (sqlserver.sql_text)
WHERE (
sql_text LIKE '%yourTable%'
and sql_text LIKE '%SUM(%'
)
)
(SET
filename = 'c:\temp\xe_qoec_1.xel',
);
GO

ALTER EVENT SESSION Test_the_CE_qoec_1
ON SERVER
STATE = START;  --STOP;
GO
``````

## 評估 CE 版本的步驟Steps to assess the CE version

1. 開啟 Management StudioManagement StudioOpen Management StudioManagement Studio. 確定您的 SQL ServerSQL Server 資料庫已設定為最高可用的相容性層級。Ensure your SQL ServerSQL Serverdatabase is set to the highest available compatibility level.

2. 執行下列預備步驟：Perform the following preliminary steps:

1. 開啟 Management StudioManagement StudioOpen Management StudioManagement Studio.

2. 執行 T-SQL，確定您的 SQL ServerSQL Server 資料庫已設定為最高可用的相容性層級。Run the T-SQL to ensure that your SQL ServerSQL Server database is set to the highest available compatibility level.

3. 確定您的資料庫已關閉其 `LEGACY_CARDINALITY_ESTIMATION` 組態。Ensure that your database has its `LEGACY_CARDINALITY_ESTIMATION` configuration turned OFF.

4. 「清除」您的查詢存放區。CLEAR your query store. 當然，請確定您的查詢存放區處於開啟狀態。Of course, ensure your query store is ON.

5. 執行陳述式：`SET NOCOUNT OFF;`Run the statement: `SET NOCOUNT OFF;`

3. 執行陳述式：`SET STATISTICS XML ON;`Run the statement: `SET STATISTICS XML ON;`

5. 記下結果窗格中 [訊息] 索引標籤上實際受影響的資料列數目。In the results pane, on the Messages tab, note the actual number of rows affected.

6. 在結果窗格的 [結果] 索引標籤上，按兩下包含 XML 格式統計資料的資料格。In the results pane on the Results tab, double-click the cell that contains the statistics in XML format. 圖形查詢計劃隨即顯示。A graphic query plan is displayed.

7. 以滑鼠右鍵按一下圖形查詢計劃中的第一個方塊，然後按一下 [屬性] 。Right-click the first box in the graphic query plan, and then click Properties.

8. 為了在稍後比較不同的組態，請記下下列屬性的值：For later comparison with a different configuration, note the values for the following properties:

• [CardinalityEstimationModelVersion] 。CardinalityEstimationModelVersion.

• [估計的資料列數目] 。Estimated Number of Rows.

• [估計的 I/O 成本] ，以及涉及實際效能 (而不是資料列計數預測) 的幾個類似 [估計] 屬性。Estimated I/O Cost, and several similar Estimated properties that involve actual performance rather than row count predictions.

• [邏輯作業] 和 [實體作業] 。Logical Operation and Physical Operation. [平行處理原則] 是正確值。Parallelism is a good value.

• [實際的執行模式] 。Actual Execution Mode. [批次] 是正確值，優於 [資料列] 。Batch is a good value, better than Row.

9. 比較估計的資料列數目與實際的資料列數目。Compare the estimated number of rows to the actual number of rows. CE 誤差是 1% (高或低)，或是 10%？Is the CE inaccurate by 1% (high or low), or by 10%?

10. 執行：`SET STATISTICS XML OFF;`Run: `SET STATISTICS XML OFF;`

11. 執行 T-SQL 將您的資料庫相容性層級降低一個層級 (例如從 130 降低到 120)。Run the T-SQL to decrease the compatibility level of your database by one level (such as from 130 down to 120).

12. 重新執行所有非預備步驟。Rerun all the non-preliminary steps.

13. 比較兩次執行的 CE 屬性值。Compare the CE property values from the two runs.

• 最新 CE 的誤差百分比是否小於舊版 CE？Is the inaccuracy percentage under the newest CE less than under the older CE?
14. 最後，比較這兩次執行的各種效能屬性值。Finally, compare the various performance property values from the two runs.

• 您的查詢是否在這兩個不同的 CE 估計下使用不同的計劃？Did your query use a different plan under the two differing CE estimations?

• 您的查詢在最新的 CE 下是否執行得較慢？Did your query run slower under the latest CE?

• 除非您的查詢在舊版 CE 下的執行效果更佳且使用不同的計劃，否則幾乎可以確定您需要最新的 CE。Unless your query runs better and with a different plan under the older CE, you almost certainly want the latest CE.

• 不過，如果您的查詢在舊版 CE 下執行更快速的計劃，請考慮強制系統使用更快速的計劃並忽略此 CE。However, if your query runs with a faster plan under the older CE, consider forcing the system to use the faster plan and to ignore the CE. 如此一來，您不但可以針對所有項目使用最新的 CE，同時也可以在偶爾的情況下保留更快速的計劃。This way you can have the latest CE on for everything, while keeping the faster plan in the one odd case.

## 如何啟用最佳查詢計劃How to activate the best query plan

1. 您可以將整個資料庫的相容性層級設定成比最新可用的值更低。You could set the compatibility level to a value lower than the latest available, for your whole database.

• 例如，將相容性層級設為 110 或更低來啟用 CE 70，不過這會使所有查詢受限於先前的 CE 模型。For example, setting the compatibility level 110 or lower activates CE 70, but it makes all queries subject to the previous CE model.

• 此外，設定較低的相容性層級，也會錯過最新版本中查詢最佳化工具的一些改善。Further, setting a lower compatibility level also misses a number of improvements in the query optimizer for latest versions.

2. 您可以使用 `LEGACY_CARDINALITY_ESTIMATION` 資料庫選項讓整個資料庫使用舊版 CE，同時保留查詢最佳化工具的改善。You could use `LEGACY_CARDINALITY_ESTIMATION` database option, to have the whole database use the older CE, while retaining other improvements in the query optimizer.

3. 您可以使用 `LEGACY_CARDINALITY_ESTIMATION` 查詢提示，讓單一資料庫使用舊版 CE，同時保留查詢最佳化工具的改善。You could use `LEGACY_CARDINALITY_ESTIMATION` query hint, to have a single query use the older CE, while retaining other improvements in the query optimizer.

### 如何強制執行特定查詢計劃How to force a particular query plan

• 執行 sp_query_store_force_planExecute sp_query_store_force_plan.

• Management StudioManagement Studio 中，展開您的 [查詢存放區] 節點，以滑鼠右鍵按一下 [Top Resource Consuming Nodes] (資源耗用量排名在前的節點) ，然後按一下 [View Top Resource Consuming Nodes] (檢視資源耗用量排名在前的節點) 。In Management StudioManagement Studio, expand your Query Store node, right-click Top Resource Consuming Nodes, and then click View Top Resource Consuming Nodes. 這會顯示標示為 [強制執行計畫][取消強制執行計畫] 的按鈕。The display shows buttons labeled Force Plan and Unforce Plan.

## CE 改善範例Examples of CE improvements

### 範例 A：CE 認為最大值可能比上次收集統計資料時還要高Example A. CE understands maximum value might be higher than when statistics were last gathered

``````SELECT CustomerId, OrderAddedDate
FROM OrderTable
``````

### 範例 B：CE 認為相同資料表上的篩選述詞通常相互關聯Example B. CE understands that filtered predicates on the same table are often correlated

CE 120 起，SQL ServerSQL Server 可認知相同資料表上的兩個資料行 [`Model`] 和 [`ModelVariant`] 之間可能會相互關聯。Starting with CE 120, SQL ServerSQL Server understands there might be a correlation between the two columns on the same table, `Model` and `ModelVariant`. CE 可更準確地的估計查詢所要傳回的資料列數目，而且查詢最佳化工具能產生更佳的計劃。The CE makes a more accurate estimation of how many rows will be returned by the query, and the query optimizer generates a more optimal plan.

``````SELECT Model, Purchase_Price
FROM dbo.Hardware
WHERE Model = 'Xbox' AND
ModelVariant = 'One';
``````

### 範例 C：CE 不會再假設來自不同資料表的篩選述詞之間有任何相互關聯Example C. CE no longer assumes any correlation between filtered predicates from different tables

``````SELECT s.ticket, s.customer, r.store
FROM dbo.Sales    AS s
CROSS JOIN dbo.Returns  AS r
WHERE s.ticket = r.ticket AND
s.type = 'toy' AND
r.date = '2016-05-11';
``````