基数估计 (SQL Server)Cardinality Estimation (SQL Server)

适用对象:是SQL Server 是Azure SQL 数据库 否Azure Synapse Analytics (SQL DW) 否并行数据仓库APPLIES TO: yesSQL Server yesAzure SQL Database noAzure Synapse Analytics (SQL DW) noParallel Data Warehouse

SQL ServerSQL Server 查询优化器是基于成本的查询优化器。The SQL ServerSQL Server Query Optimizer is a cost-based Query Optimizer. 也就是说,它选择估计处理成本最低的查询计划。This means that it selects query plans that have the lowest estimated processing cost to execute. 查询优化器基于以下两个主要因素来确定执行查询计划的开销:The Query Optimizer determines the cost of executing a query plan based on two main factors:

  • 查询计划每个级别上处理的总行数,称为该计划的基数。The total number of rows processed at each level of a query plan, referred to as the cardinality of the plan.
  • 由查询中所使用的运算符规定的算法的开销模式。The cost model of the algorithm dictated by the operators used in the query.

第一个因素(基数)用作第二个因素(开销模式)的输入参数。The first factor, cardinality, is used as an input parameter of the second factor, the cost model. 因此,增大基数将减少估计开销,从而加快执行计划。Therefore, improved cardinality leads to better estimated costs and, in turn, faster execution plans.

SQL ServerSQL Server 中的基数估计 (CE)主 要派生自创建索引或统计信息时所创建的直方图(以手动或自动方式)。Cardinality estimation (CE) in SQL ServerSQL Server is derived primarily from histograms that are created when indexes or statistics are created, either manually or automatically. 有时,SQL ServerSQL Server 还使用查询的约束信息和逻辑重写来确定基数。Sometimes, SQL ServerSQL Server also uses constraint information and logical rewrites of queries to determine cardinality.

在下列情况下,SQL ServerSQL Server 无法精确计算基数。In the following cases, SQL ServerSQL Server cannot accurately calculate cardinalities. 这会导致成本计算不准确,进而可能导致查询计划欠佳。This causes inaccurate cost calculations that may cause suboptimal query plans. 避免在查询中使用这些构造可以提高查询性能。Avoiding these constructs in queries may improve query performance. 有时,使用查询表达式或其他措施也可以提高查询性能,如下所述:Sometimes, alternative query formulations or other measures are possible and these are pointed out:

  • 带谓词的查询,这些查询在同一表的不同列之间使用比较运算符。Queries with predicates that use comparison operators between different columns of the same table.
  • 带谓词的查询,这些查询使用运算符且下列任何一种情况为 True:Queries with predicates that use operators, and any one of the following are true:
    • 运算符两侧所涉及的列中没有统计信息。There are no statistics on the columns involved on either side of the operators.
    • 统计信息中值的分布不均匀,但查询将查找高选择性的值集。The distribution of values in the statistics is not uniform, but the query seeks a highly selective value set. 特别是,当运算符是除相等 (=) 运算符以外的任何其他运算符时,这种情况可能为 True。This situation can be especially true if the operator is anything other than the equality (=) operator.
    • 谓词使用不等于 (!=) 比较运算符或 NOT 逻辑运算符。The predicate uses the not equal to (!=) comparison operator or the NOT logical operator.
  • 使用任意 SQL Server 内置函数或标量值用户定义函数(其参数不是常量值)的查询。Queries that use any of the SQL Server built-in functions or a scalar-valued, user-defined function whose argument is not a constant value.
  • 包含通过算术或字符串串联运算符联接的列的查询。Queries that involve joining columns through arithmetic or string concatenation operators.
  • 比较在编译或优化查询时其值未知的变量的查询。Queries that compare variables whose values are not known when the query is compiled and optimized.

本文将阐释如何评估和选择系统的最佳 CE 配置。This article illustrates how you can assess and choose the best CE configuration for your system. 大多数系统受益于最新的 CE,因为它最准确。Most systems benefit from the latest CE because it is the most accurate. CE 将预测查询可能返回的行数。The CE predicts how many rows your query will likely return. 查询优化器使用基数预测来生成最佳查询计划。The cardinality prediction is used by the Query Optimizer to generate the optimal query plan. 通过更准确的估计,查询优化器通常可以更好地生成更优查询计划。With more accurate estimations, the Query Optimizer can usually do a better job of producing a more optimal query plan.

你的应用程序系统可能具有重要的查询,其计划由于新的 CE 而更改为较慢计划。Your application system could possibly have an important query whose plan is changed to a slower plan due to the new CE. 此类查询可能为以下任一种:Such a query might be like one of the following:

  • OLTP(联机事务处理)查询,由于该查询运行相当频繁,因此它的多个实例经常同时运行。An OLTP (online transaction processing) query that runs so frequently that multiple instance of it often run concurrently.
  • 在 OLTP 工作期间运行的具有大量聚合函数的 SELECT 查询。A SELECT with substantial aggregation that runs during your OLTP business hours.

你可以使用技术来识别因新的 CE 而执行变慢的查询。You have techniques for identifying a query that performs slower with the new CE. 你也可以选择如何解决此性能问题。And you have options for how to address the performance issue.

CE 的版本Versions of the CE

在 1998 年,CE 的重大更新是 SQL ServerSQL Server 7.0 的一部分,其兼容级别为 70。In 1998, a major update of the CE was part of SQL ServerSQL Server 7.0, for which the compatibility level was 70. 此版本的 CE 模型建立在四个基本假设之上:This version of the CE model is set on four basic assumptions:

  • 独立性: 假设不同列上的数据分布是独立于彼此的,除非可获取相关性信息且信息可用。Independence: Data distributions on different columns are assumed to be independent of each other, unless correlation information is available and usable.
  • 一致性: 不同值均匀分布且具有相同的频率。Uniformity: Distinct values are evenly spaced and that they all have the same frequency. 更确切地说,是在每个直方图步骤中,不同值均匀分布,并且每个值都具有相同的频率。More precisely, within each histogram step, distinct values are evenly spread and each value has same frequency.
  • 包容(简单): 用户查询已存在的数据。Containment (Simple): Users query for data that exists. 例如,对于两个表之间的等值联接,在联接直方图以评估联接选择性之前,考虑每个输入直方图中的谓词选择性1For example, for an equality join between two tables, factor in the predicates selectivity1 in each input histogram, before joining histograms to estimate the join selectivity.
  • 包含: 对于 Column = Constant 的筛选器谓词,通常假定事实上关联列存在常数。Inclusion: For filter predicates where Column = Constant, the constant is assumed to actually exist for the associated column. 如果相应的直方图步骤非空,则假定该步骤的其中一个不同值匹配谓词的值。If a corresponding histogram step is non-empty, one of the step's distinct values is assumed to match the value from the predicate.

1满足谓词的行计数。1 Row count that satisfies the predicate.

后续更新从 SQL Server 2014 (12.x)SQL Server 2014 (12.x) 开始,意味着兼容性级别为 120 及以上。Subsequent updates started with SQL Server 2014 (12.x)SQL Server 2014 (12.x), meaning compatibility levels 120 and above. 级别 120 及以上的 CE 更新中引入了已更新的假设和算法,非常适用于现代数据仓库和 OLTP 工作负荷。The CE updates for levels 120 and above incorporate updated assumptions and algorithms that work well on modern data warehousing and on OLTP workloads. 从 CE 70 假设开始,以下模型假设已自 CE 120 起更改:From the CE 70 assumptions, the following model assumptions were changed starting with CE 120:

  • 独立性转变为相关性 :不同列值的组合不一定独立。Independence becomes Correlation: The combination of the different column values are not necessarily independent. 这可能类似于更真实的数据查询。This may resemble more real-life data querying.
  • 简单包含转变为基本包含 :用户可能查询不存在的数据。Simple Containment becomes Base Containment: Users might query for data that does not exist. 例如,对于两个表之间的等值联接,我们使用基本表直方图来评估联接选择性,然后考虑谓词选择性因素。For example, for an equality join between two tables, we use the base tables histograms to estimate the join selectivity, and then factor in the predicates selectivity.

兼容性级别: 通过使用以下 COMPATIBILITY_LEVELTransact-SQLTransact-SQL 代码,可以确保数据库位于特定级别。Compatibility level: You can ensure your database is at a particular level by using the following Transact-SQLTransact-SQL code for COMPATIBILITY_LEVEL.

SELECT ServerProperty('ProductVersion');  
GO  
  
ALTER DATABASE <yourDatabase>  
SET COMPATIBILITY_LEVEL = 130;  
GO  
  
SELECT d.name, d.compatibility_level  
FROM sys.databases AS d  
WHERE d.name = 'yourDatabase';  
GO  

对于在兼容级别 120 及以上设置的 SQL ServerSQL Server 数据库,激活跟踪标志 9481 会强制系统使用 CE 版本 70。For a SQL ServerSQL Server database set at compatibility level 120 or above, activation of the trace flag 9481 forces the system to use the CE version 70.

旧版 CE: 对于在兼容级别 120 及以上设置的 SQL ServerSQL Server 数据库,CE 版本 70 可通过在数据库级别使用 ALTER DATABASE SCOPED CONFIGURATION 来激活。Legacy CE: For a SQL ServerSQL Server database set at compatibility level 120 and above, the CE version 70 can be can be activated by using the at the database level by using the ALTER DATABASE SCOPED CONFIGURATION.

ALTER DATABASE SCOPED CONFIGURATION 
SET LEGACY_CARDINALITY_ESTIMATION = ON;  
GO  
  
SELECT name, value  
FROM sys.database_scoped_configurations  
WHERE name = 'LEGACY_CARDINALITY_ESTIMATION';  
GO

或者从 SQL Server 2016 (13.x)SQL Server 2016 (13.x) SP1,查询提示 USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION') 开始。Or starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x) SP1, the Query Hint USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION').

SELECT CustomerId, OrderAddedDate  
FROM OrderTable  
WHERE OrderAddedDate >= '2016-05-01'
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));  

查询存储:SQL Server 2016 (13.x)SQL Server 2016 (13.x) 开始,查询存储是用于检查查询性能的一种方便的工具。Query store: Starting with SQL Server 2016 (13.x)SQL Server 2016 (13.x), the query store is a handy tool for examining the performance of your queries. Management StudioManagement Studio 中的数据库节点下的对象资源管理器中,当查询存储启用时,显示“查询存储”节点 。In Management StudioManagement Studio, in the Object Explorer under your database node, a Query Store node is displayed when the query store is enabled.

ALTER DATABASE <yourDatabase>  
SET QUERY_STORE = ON;  
GO  
  
SELECT q.actual_state_desc AS [actual_state_desc_of_QueryStore],  
        q.desired_state_desc,  
        q.query_capture_mode_desc  
FROM sys.database_query_store_options AS q;  
GO  
  
ALTER DATABASE <yourDatabase>  
SET QUERY_STORE CLEAR;  

提示

建议安装最新版本的 Management Studio 并且经常更新。We recommend that you install the latest release of Management Studio and update it often.

跟踪基数估计过程的另一种方法是使用名为 query_optimizer_estimate_cardinality 的扩展事件 。Another option for tracking the cardinality estimation process is to use the extended event named query_optimizer_estimate_cardinality. 以下 Transact-SQLTransact-SQL 代码示例在 SQL ServerSQL Server 上运行。The following Transact-SQLTransact-SQL code sample runs on SQL ServerSQL Server. 它将 .xel 文件写入 C:\Temp\(尽管可以更改路径)。It writes a .xel file to C:\Temp\ (although you can change the path). Management StudioManagement Studio 中打开此 .xel 文件时,其详细信息将以用户友好的方式显示。When you open the .xel file in Management StudioManagement Studio, its detailed information is displayed in a user friendly manner.

DROP EVENT SESSION Test_the_CE_qoec_1 ON SERVER;  
go  
  
CREATE EVENT SESSION Test_the_CE_qoec_1  
ON SERVER  
ADD EVENT sqlserver.query_optimizer_estimate_cardinality  
    (  
        ACTION (sqlserver.sql_text)  
            WHERE (  
                sql_text LIKE '%yourTable%'  
                and sql_text LIKE '%SUM(%'  
            )  
    )  
ADD TARGET package0.asynchronous_file_target   
        (SET  
            filename = 'c:\temp\xe_qoec_1.xel',  
            metadatafile = 'c:\temp\xe_qoec_1.xem'  
        );  
GO  
  
ALTER EVENT SESSION Test_the_CE_qoec_1  
ON SERVER  
STATE = START;  --STOP;  
GO  

有关为 SQL 数据库SQL Database 定制的扩展事件的信息,请参阅 SQL 数据库中的扩展事件For information about extended events as tailored for SQL 数据库SQL Database, see Extended events in SQL Database.

评估 CE 版本的步骤Steps to assess the CE version

以下步骤用于评估当使用最新 CE 时最重要的查询的执行是否不佳。Next are steps you can use to assess whether any of your most important queries perform less well under the latest CE. 其中一些步骤通过运行上一节中提供的代码示例来执行。Some of the steps are performed by running a code sample presented in a preceding section.

  1. 打开 Management StudioManagement StudioOpen Management StudioManagement Studio. 确保将 SQL ServerSQL Server 数据库设为最高可用兼容性级别。Ensure your SQL ServerSQL Serverdatabase is set to the highest available compatibility level.

  2. 执行以下初始步骤:Perform the following preliminary steps:

    1. 打开 Management StudioManagement StudioOpen Management StudioManagement Studio.

    2. 运行 T-SQL,确保将 SQL ServerSQL Server 数据库设为最高可用兼容性级别。Run the T-SQL to ensure that your SQL ServerSQL Server database is set to the highest available compatibility level.

    3. 确保数据库已关闭其 LEGACY_CARDINALITY_ESTIMATION 配置。Ensure that your database has its LEGACY_CARDINALITY_ESTIMATION configuration turned OFF.

    4. 清除查询存储。CLEAR your query store. 当然,要确保查询存储已打开。Of course, ensure your query store is ON.

    5. 运行语句:SET NOCOUNT OFF;Run the statement: SET NOCOUNT OFF;

  3. 运行语句:SET STATISTICS XML ON;Run the statement: SET STATISTICS XML ON;

  4. 运行重要的查询。Run your important query.

  5. 在结果窗格的“消息” 选项卡上,记下实际受影响的行数。In the results pane, on the Messages tab, note the actual number of rows affected.

  6. 在结果窗格的“结果” 选项卡上,双击包含 XML 格式的统计信息的单元格。In the results pane on the Results tab, double-click the cell that contains the statistics in XML format. 将显示图形查询计划。A graphic query plan is displayed.

  7. 在图形查询计划的第一个框中右键单击,然后单击“属性” 。Right-click the first box in the graphic query plan, and then click Properties.

  8. 针对后面的与不同配置的比较,请记下以下属性的值:For later comparison with a different configuration, note the values for the following properties:

    • CardinalityEstimationModelVersionCardinalityEstimationModelVersion.

    • 估计的行数Estimated Number of Rows.

    • 估计的 I/O 成本,以及一些涉及实际性能而不是行数预测的类似的估计 属性。Estimated I/O Cost, and several similar Estimated properties that involve actual performance rather than row count predictions.

    • 逻辑操作物理操作Logical Operation and Physical Operation. “并行”是一个不错的选择。Parallelism is a good value.

    • 实际执行模式Actual Execution Mode. “批处理”是一个不错的选择,优于“行” 。Batch is a good value, better than Row.

  9. 将估计的行数与实际行数进行比较。Compare the estimated number of rows to the actual number of rows. CE 的不准确率偏高或偏低 1% 还是 10%?Is the CE inaccurate by 1% (high or low), or by 10%?

  10. 运行:SET STATISTICS XML OFF;Run: SET STATISTICS XML OFF;

  11. 运行 T-SQL 以便将数据库的兼容性级别降低一个级别(如从 130 到 120)。Run the T-SQL to decrease the compatibility level of your database by one level (such as from 130 down to 120).

  12. 重新运行所有非初始步骤。Rerun all the non-preliminary steps.

  13. 比较这两次运行的 CE 属性值。Compare the CE property values from the two runs.

    • 最新 CE 下的不准确率是不是小于较旧 CE 下的不准确率?Is the inaccuracy percentage under the newest CE less than under the older CE?
  14. 最后,比较这两次运行中的各个性能属性值。Finally, compare the various performance property values from the two runs.

    • 在这两个不同的 CE 估计下你的查询是否使用了不同的计划?Did your query use a different plan under the two differing CE estimations?

    • 在最新 CE 下你的查询是否运行较缓慢?Did your query run slower under the latest CE?

    • 除非查询在较旧 CE 下运行地更好,并且使用不同的计划,否则几乎可以肯定地想要最新的 CE。Unless your query runs better and with a different plan under the older CE, you almost certainly want the latest CE.

    • 但是,如果在较旧的 CE 下查询使用较快的计划运行,则将考虑强制系统使用较快计划而忽略 CE。However, if your query runs with a faster plan under the older CE, consider forcing the system to use the faster plan and to ignore the CE. 这种方式可以让你在任何情况下拥有最新 CE,同时在一种独特的情况下保持使用较快计划。This way you can have the latest CE on for everything, while keeping the faster plan in the one odd case.

如何激活最佳查询计划How to activate the best query plan

假设在使用 CE 120 的或更高版本情况下,针对查询生成了效率较低的查询计划。Suppose that with CE 120 or above, a less efficient query plan is generated for your query. 以下是一些可以用来激活更好计划的方法:Here are some options you have to activate the better plan:

  1. 可以将整个数据库的兼容性级别设置为低于最新级别的值。You could set the compatibility level to a value lower than the latest available, for your whole database.

    • 例如,将兼容性级别设置为 110 或更低会激活 CE 70,但这会使所有查询都受制于以前的 CE 模型。For example, setting the compatibility level 110 or lower activates CE 70, but it makes all queries subject to the previous CE model.

    • 此外,设置较低的兼容性级别也会遗漏最新版本中查询优化器的大量改进。Further, setting a lower compatibility level also misses a number of improvements in the query optimizer for latest versions.

  2. 可以使用 LEGACY_CARDINALITY_ESTIMATION 数据库选项使整个数据库使用较旧 CE,同时保留查询优化器中的其他改进。You could use LEGACY_CARDINALITY_ESTIMATION database option, to have the whole database use the older CE, while retaining other improvements in the query optimizer.

  3. 可以使用 LEGACY_CARDINALITY_ESTIMATION 查询提示,让单个查询使用较旧 CE,同时保留查询优化器中的其他改进。You could use LEGACY_CARDINALITY_ESTIMATION query hint, to have a single query use the older CE, while retaining other improvements in the query optimizer.

为了实现更好的控制,可以强制系统在测试期间使用通过 CE 70 生成的计划 。For the finest control, you could force the system to use the plan that was generated with CE 70 during your testing. 固定首选计划后,可以将整个数据库设置为使用最新兼容性级别和 CE。 After you pin your preferred plan, you can set your whole database to use the latest compatibility level and CE. 该方法将在后面详细说明。The option is elaborated next.

如何强制使用特定的查询计划How to force a particular query plan

查询存储提供了不同方式来强制系统使用特定的查询计划:The query store gives you different ways that you can force the system to use a particular query plan:

  • 执行 sp_query_store_force_planExecute sp_query_store_force_plan.

  • Management StudioManagement Studio 中,展开“查询存储”节点,右键单击“资源使用排名靠前的节点”,然后单击“查看资源使用排名靠前的节点” 。In Management StudioManagement Studio, expand your Query Store node, right-click Top Resource Consuming Nodes, and then click View Top Resource Consuming Nodes. 此时将显示“强制使用计划”和“取消强制使用计划”按钮。 The display shows buttons labeled Force Plan and Unforce Plan.

有关查询存储的详细信息,请参阅 《Monitoring Performance By Using the Query Store》(使用查询存储监控性能)。For more information about the query store, see Monitoring Performance By Using the Query Store.

CE 改进示例Examples of CE improvements

本节介绍了从最新版本的 CE 中实施的改进中获益的示例查询。This section describes example queries that benefit from the enhancements implemented in the CE in recent releases. 这是背景信息,不需要你的具体操作。This is background information that does not call for specific action on your part.

示例 A. CE 认为最大值可能大于最近收集统计信息时的值Example A. CE understands maximum value might be higher than when statistics were last gathered

OrderAddedDate 的最大值为 2016-04-30 时,假定上次在 2016-04-30 收集 OrderTable 的统计信息。Suppose statistics were last gathered for OrderTable on 2016-04-30, when the maximum OrderAddedDate was 2016-04-30. CE 120(和更高版本)认为数据按升序排序的 OrderTable 中的列的值可能大于由统计信息记录的最大值 。The CE 120 (and above version) understands that columns in OrderTable which have ascending data might have values larger than the maximum recorded by the statistics. 这种假设改进了 Transact-SQLTransact-SQL SELECT 语句的查询计划,如下所示。This understanding improves the query plan for Transact-SQLTransact-SQL SELECT statements such as the following.

SELECT CustomerId, OrderAddedDate  
FROM OrderTable  
WHERE OrderAddedDate >= '2016-05-01';  

示例 B. CE 认为同一个表的筛选谓词通常是相关的Example B. CE understands that filtered predicates on the same table are often correlated

在下面的 SELECT 语句中我们看到 ModelModelVariant 的筛选谓词。In the following SELECT we see filtered predicates on Model and ModelVariant. 我们直观地了解到,当 Model 是“Xbox”时,ModelVariant 有可能是“One”,因为 Xbox 有一个名为 One 的变体。We intuitively understand that when Model is 'Xbox' there is a chance the ModelVariant is 'One', given that Xbox has a variant called One.

从 CE 120 开始,SQL ServerSQL Server 认为同一表中 ModelModelVariant 两个列之间存在相关性。Starting with CE 120, SQL ServerSQL Server understands there might be a correlation between the two columns on the same table, Model and ModelVariant. CE 对于查询将返回多少行进行更准确的估计,并且查询优化器将生成更优的计划。The CE makes a more accurate estimation of how many rows will be returned by the query, and the query optimizer generates a more optimal plan.

SELECT Model, Purchase_Price  
FROM dbo.Hardware  
WHERE Model = 'Xbox' AND  
      ModelVariant = 'One';  

示例 C. CE 不再假设不同表的筛选谓词之间存在任何相关性Example C. CE no longer assumes any correlation between filtered predicates from different tables

对现代工作负荷和实际业务数据的延伸性的新研究表明,从不同表中筛选的谓词通常没有相互关联性。With extense new research on modern workloads and actual business data reveal that predicate filters from different tables usually do not correlate with each other. 在下面的查询中,CE 假设 s.typer.date 之间没有相关性。In the following query, the CE assumes there is no correlation between s.type and r.date. 因此,CE 对于返回的行数有一个偏低的估计值。Therefore the CE makes a lower estimate of the number of rows returned.

SELECT s.ticket, s.customer, r.store  
FROM dbo.Sales    AS s  
CROSS JOIN dbo.Returns  AS r  
WHERE s.ticket = r.ticket AND  
      s.type = 'toy' AND  
      r.date = '2016-05-11';  

另请参阅See Also

监视和优化性能 Monitor and Tune for Performance
使用 SQL Server 2014 基数估算器优化查询计划Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator
查询提示 Query Hints
USE HINT 查询提示 USE HINT Query Hints
使用查询优化助手升级数据库 Upgrading Databases by using the Query Tuning Assistant
相关视图、函数和过程 Monitoring Performance By Using the Query Store
查询处理体系结构指南Query Processing Architecture Guide