# 基数估计 (SQL Server)Cardinality Estimation (SQL Server)

SQL ServerSQL Server 查询优化器是基于成本的查询优化器。The SQL ServerSQL Server Query Optimizer is a cost-based Query Optimizer. 也就是说，它选择估计处理成本最低的查询计划。This means that it selects query plans that have the lowest estimated processing cost to execute. 查询优化器基于以下两个主要因素来确定执行查询计划的开销：The Query Optimizer determines the cost of executing a query plan based on two main factors:

• 查询计划每个级别上处理的总行数，称为该计划的基数。The total number of rows processed at each level of a query plan, referred to as the cardinality of the plan.
• 由查询中所使用的运算符规定的算法的开销模式。The cost model of the algorithm dictated by the operators used in the query.

SQL ServerSQL Server 中的基数估计 (CE)主 要派生自创建索引或统计信息时所创建的直方图（以手动或自动方式）。Cardinality estimation (CE) in SQL ServerSQL Server is derived primarily from histograms that are created when indexes or statistics are created, either manually or automatically. 有时，SQL ServerSQL Server 还使用查询的约束信息和逻辑重写来确定基数。Sometimes, SQL ServerSQL Server also uses constraint information and logical rewrites of queries to determine cardinality.

• 带谓词的查询，这些查询在同一表的不同列之间使用比较运算符。Queries with predicates that use comparison operators between different columns of the same table.
• 带谓词的查询，这些查询使用运算符且下列任何一种情况为 True：Queries with predicates that use operators, and any one of the following are true:
• 运算符两侧所涉及的列中没有统计信息。There are no statistics on the columns involved on either side of the operators.
• 统计信息中值的分布不均匀，但查询将查找高选择性的值集。The distribution of values in the statistics is not uniform, but the query seeks a highly selective value set. 特别是，当运算符是除相等 (=) 运算符以外的任何其他运算符时，这种情况可能为 True。This situation can be especially true if the operator is anything other than the equality (=) operator.
• 谓词使用不等于 (!=) 比较运算符或 `NOT` 逻辑运算符。The predicate uses the not equal to (!=) comparison operator or the `NOT` logical operator.
• 使用任意 SQL Server 内置函数或标量值用户定义函数（其参数不是常量值）的查询。Queries that use any of the SQL Server built-in functions or a scalar-valued, user-defined function whose argument is not a constant value.
• 包含通过算术或字符串串联运算符联接的列的查询。Queries that involve joining columns through arithmetic or string concatenation operators.
• 比较在编译或优化查询时其值未知的变量的查询。Queries that compare variables whose values are not known when the query is compiled and optimized.

• OLTP（联机事务处理）查询，由于该查询运行相当频繁，因此它的多个实例经常同时运行。An OLTP (online transaction processing) query that runs so frequently that multiple instance of it often run concurrently.
• 在 OLTP 工作期间运行的具有大量聚合函数的 SELECT 查询。A SELECT with substantial aggregation that runs during your OLTP business hours.

## CE 的版本Versions of the CE

• 独立性： 假设不同列上的数据分布是独立于彼此的，除非可获取相关性信息且信息可用。Independence: Data distributions on different columns are assumed to be independent of each other, unless correlation information is available and usable.
• 一致性： 不同值均匀分布且具有相同的频率。Uniformity: Distinct values are evenly spaced and that they all have the same frequency. 更确切地说，是在每个直方图步骤中，不同值均匀分布，并且每个值都具有相同的频率。More precisely, within each histogram step, distinct values are evenly spread and each value has same frequency.
• 包容（简单）： 用户查询已存在的数据。Containment (Simple): Users query for data that exists. 例如，对于两个表之间的等值联接，在联接直方图以评估联接选择性之前，考虑每个输入直方图中的谓词选择性1For example, for an equality join between two tables, factor in the predicates selectivity1 in each input histogram, before joining histograms to estimate the join selectivity.
• 包含： 对于 `Column = Constant` 的筛选器谓词，通常假定事实上关联列存在常数。Inclusion: For filter predicates where `Column = Constant`, the constant is assumed to actually exist for the associated column. 如果相应的直方图步骤非空，则假定该步骤的其中一个不同值匹配谓词的值。If a corresponding histogram step is non-empty, one of the step's distinct values is assumed to match the value from the predicate.

1满足谓词的行计数。1 Row count that satisfies the predicate.

• 独立性转变为相关性 ：不同列值的组合不一定独立。Independence becomes Correlation: The combination of the different column values are not necessarily independent. 这可能类似于更真实的数据查询。This may resemble more real-life data querying.
• 简单包含转变为基本包含 ：用户可能查询不存在的数据。Simple Containment becomes Base Containment: Users might query for data that does not exist. 例如，对于两个表之间的等值联接，我们使用基本表直方图来评估联接选择性，然后考虑谓词选择性因素。For example, for an equality join between two tables, we use the base tables histograms to estimate the join selectivity, and then factor in the predicates selectivity.

``````SELECT ServerProperty('ProductVersion');
GO

ALTER DATABASE <yourDatabase>
SET COMPATIBILITY_LEVEL = 130;
GO

SELECT d.name, d.compatibility_level
FROM sys.databases AS d
WHERE d.name = 'yourDatabase';
GO
``````

``````ALTER DATABASE SCOPED CONFIGURATION
SET LEGACY_CARDINALITY_ESTIMATION = ON;
GO

SELECT name, value
FROM sys.database_scoped_configurations
WHERE name = 'LEGACY_CARDINALITY_ESTIMATION';
GO
``````

``````SELECT CustomerId, OrderAddedDate
FROM OrderTable
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
``````

``````ALTER DATABASE <yourDatabase>
SET QUERY_STORE = ON;
GO

SELECT q.actual_state_desc AS [actual_state_desc_of_QueryStore],
q.desired_state_desc,
q.query_capture_mode_desc
FROM sys.database_query_store_options AS q;
GO

ALTER DATABASE <yourDatabase>
SET QUERY_STORE CLEAR;
``````

``````DROP EVENT SESSION Test_the_CE_qoec_1 ON SERVER;
go

CREATE EVENT SESSION Test_the_CE_qoec_1
ON SERVER
(
ACTION (sqlserver.sql_text)
WHERE (
sql_text LIKE '%yourTable%'
and sql_text LIKE '%SUM(%'
)
)
(SET
filename = 'c:\temp\xe_qoec_1.xel',
);
GO

ALTER EVENT SESSION Test_the_CE_qoec_1
ON SERVER
STATE = START;  --STOP;
GO
``````

## 评估 CE 版本的步骤Steps to assess the CE version

1. 打开 Management StudioManagement StudioOpen Management StudioManagement Studio. 确保将 SQL ServerSQL Server 数据库设为最高可用兼容性级别。Ensure your SQL ServerSQL Serverdatabase is set to the highest available compatibility level.

2. 执行以下初始步骤：Perform the following preliminary steps:

1. 打开 Management StudioManagement StudioOpen Management StudioManagement Studio.

2. 运行 T-SQL，确保将 SQL ServerSQL Server 数据库设为最高可用兼容性级别。Run the T-SQL to ensure that your SQL ServerSQL Server database is set to the highest available compatibility level.

3. 确保数据库已关闭其 `LEGACY_CARDINALITY_ESTIMATION` 配置。Ensure that your database has its `LEGACY_CARDINALITY_ESTIMATION` configuration turned OFF.

4. 清除查询存储。CLEAR your query store. 当然，要确保查询存储已打开。Of course, ensure your query store is ON.

5. 运行语句：`SET NOCOUNT OFF;`Run the statement: `SET NOCOUNT OFF;`

3. 运行语句：`SET STATISTICS XML ON;`Run the statement: `SET STATISTICS XML ON;`

5. 在结果窗格的“消息” 选项卡上，记下实际受影响的行数。In the results pane, on the Messages tab, note the actual number of rows affected.

6. 在结果窗格的“结果” 选项卡上，双击包含 XML 格式的统计信息的单元格。In the results pane on the Results tab, double-click the cell that contains the statistics in XML format. 将显示图形查询计划。A graphic query plan is displayed.

7. 在图形查询计划的第一个框中右键单击，然后单击“属性” 。Right-click the first box in the graphic query plan, and then click Properties.

8. 针对后面的与不同配置的比较，请记下以下属性的值：For later comparison with a different configuration, note the values for the following properties:

• CardinalityEstimationModelVersionCardinalityEstimationModelVersion.

• 估计的行数Estimated Number of Rows.

• 估计的 I/O 成本，以及一些涉及实际性能而不是行数预测的类似的估计 属性。Estimated I/O Cost, and several similar Estimated properties that involve actual performance rather than row count predictions.

• 逻辑操作物理操作Logical Operation and Physical Operation. “并行”是一个不错的选择。Parallelism is a good value.

• 实际执行模式Actual Execution Mode. “批处理”是一个不错的选择，优于“行” 。Batch is a good value, better than Row.

9. 将估计的行数与实际行数进行比较。Compare the estimated number of rows to the actual number of rows. CE 的不准确率偏高或偏低 1% 还是 10%？Is the CE inaccurate by 1% (high or low), or by 10%?

10. 运行：`SET STATISTICS XML OFF;`Run: `SET STATISTICS XML OFF;`

11. 运行 T-SQL 以便将数据库的兼容性级别降低一个级别（如从 130 到 120）。Run the T-SQL to decrease the compatibility level of your database by one level (such as from 130 down to 120).

12. 重新运行所有非初始步骤。Rerun all the non-preliminary steps.

13. 比较这两次运行的 CE 属性值。Compare the CE property values from the two runs.

• 最新 CE 下的不准确率是不是小于较旧 CE 下的不准确率？Is the inaccuracy percentage under the newest CE less than under the older CE?
14. 最后，比较这两次运行中的各个性能属性值。Finally, compare the various performance property values from the two runs.

• 在这两个不同的 CE 估计下你的查询是否使用了不同的计划？Did your query use a different plan under the two differing CE estimations?

• 在最新 CE 下你的查询是否运行较缓慢？Did your query run slower under the latest CE?

• 除非查询在较旧 CE 下运行地更好，并且使用不同的计划，否则几乎可以肯定地想要最新的 CE。Unless your query runs better and with a different plan under the older CE, you almost certainly want the latest CE.

• 但是，如果在较旧的 CE 下查询使用较快的计划运行，则将考虑强制系统使用较快计划而忽略 CE。However, if your query runs with a faster plan under the older CE, consider forcing the system to use the faster plan and to ignore the CE. 这种方式可以让你在任何情况下拥有最新 CE，同时在一种独特的情况下保持使用较快计划。This way you can have the latest CE on for everything, while keeping the faster plan in the one odd case.

## 如何激活最佳查询计划How to activate the best query plan

1. 可以将整个数据库的兼容性级别设置为低于最新级别的值。You could set the compatibility level to a value lower than the latest available, for your whole database.

• 例如，将兼容性级别设置为 110 或更低会激活 CE 70，但这会使所有查询都受制于以前的 CE 模型。For example, setting the compatibility level 110 or lower activates CE 70, but it makes all queries subject to the previous CE model.

• 此外，设置较低的兼容性级别也会遗漏最新版本中查询优化器的大量改进。Further, setting a lower compatibility level also misses a number of improvements in the query optimizer for latest versions.

2. 可以使用 `LEGACY_CARDINALITY_ESTIMATION` 数据库选项使整个数据库使用较旧 CE，同时保留查询优化器中的其他改进。You could use `LEGACY_CARDINALITY_ESTIMATION` database option, to have the whole database use the older CE, while retaining other improvements in the query optimizer.

3. 可以使用 `LEGACY_CARDINALITY_ESTIMATION` 查询提示，让单个查询使用较旧 CE，同时保留查询优化器中的其他改进。You could use `LEGACY_CARDINALITY_ESTIMATION` query hint, to have a single query use the older CE, while retaining other improvements in the query optimizer.

### 如何强制使用特定的查询计划How to force a particular query plan

• 执行 sp_query_store_force_planExecute sp_query_store_force_plan.

• Management StudioManagement Studio 中，展开“查询存储”节点，右键单击“资源使用排名靠前的节点”，然后单击“查看资源使用排名靠前的节点” 。In Management StudioManagement Studio, expand your Query Store node, right-click Top Resource Consuming Nodes, and then click View Top Resource Consuming Nodes. 此时将显示“强制使用计划”和“取消强制使用计划”按钮。 The display shows buttons labeled Force Plan and Unforce Plan.

## CE 改进示例Examples of CE improvements

### 示例 A. CE 认为最大值可能大于最近收集统计信息时的值Example A. CE understands maximum value might be higher than when statistics were last gathered

`OrderAddedDate` 的最大值为 `2016-04-30` 时，假定上次在 `2016-04-30` 收集 `OrderTable` 的统计信息。Suppose statistics were last gathered for `OrderTable` on `2016-04-30`, when the maximum `OrderAddedDate` was `2016-04-30`. CE 120（和更高版本）认为数据按升序排序的 `OrderTable` 中的列的值可能大于由统计信息记录的最大值 。The CE 120 (and above version) understands that columns in `OrderTable` which have ascending data might have values larger than the maximum recorded by the statistics. 这种假设改进了 Transact-SQLTransact-SQL SELECT 语句的查询计划，如下所示。This understanding improves the query plan for Transact-SQLTransact-SQL SELECT statements such as the following.

``````SELECT CustomerId, OrderAddedDate
FROM OrderTable
``````

### 示例 B. CE 认为同一个表的筛选谓词通常是相关的Example B. CE understands that filtered predicates on the same table are often correlated

``````SELECT Model, Purchase_Price
FROM dbo.Hardware
WHERE Model = 'Xbox' AND
ModelVariant = 'One';
``````

### 示例 C. CE 不再假设不同表的筛选谓词之间存在任何相关性Example C. CE no longer assumes any correlation between filtered predicates from different tables

``````SELECT s.ticket, s.customer, r.store
FROM dbo.Sales    AS s
CROSS JOIN dbo.Returns  AS r
WHERE s.ticket = r.ticket AND
s.type = 'toy' AND
r.date = '2016-05-11';
``````