标量 UDF 内联Scalar UDF Inlining

适用于:是SQL Server是Azure SQL 数据库否Azure SQL 数据仓库否并行数据仓库APPLIES TO: yesSQL Server yesAzure SQL Database noAzure SQL Data Warehouse noParallel Data Warehouse

本文介绍了标量 UDF 内联,这是智能查询处理功能套件下的一项功能。This article introduces Scalar UDF inlining, a feature under the intelligent query processing suite of features. 此功能提高了在 SQL ServerSQL Server(从 SQL Server 2019(预览版)SQL Server 2019 preview 开始)和 SQL 数据库SQL Database 中调用标量 UDF 的查询性能。This feature improves the performance of queries that invoke scalar UDFs in SQL ServerSQL Server (starting with SQL Server 2019(预览版)SQL Server 2019 preview) and SQL 数据库SQL Database.

基于用户定义的 T-SQL 标量T-SQL Scalar User-Defined Functions

在 Transact-SQL 中实现并返回单个数据值的用户定义函数称为 T-SQL 标量用户定义函数。User-Defined Functions that are implemented in Transact-SQL and return a single data value are referred to as T-SQL Scalar User-Defined Functions. T-SQL 的 UDF 是所有 SQL 查询实现代码重用和模块化的简洁方法。T-SQL UDFs are an elegant way to achieve code reuse and modularity across SQL queries. 某些计算(如复杂的业务规则)在命令性 UDF 窗体中更易表示。Some computations (such as complex business rules) are easier to express in imperative UDF form. UDF 有助于构建复杂的逻辑,而无需编写复杂 SQL 查询的专业知识。UDFs help in building up complex logic without requiring expertise in writing complex SQL queries.

标量 UDF 的性能Performance of Scalar UDFs

标量 UDF 通常会由于以下原因造成性能欠佳。Scalar UDFs typically end up performing poorly due to the following reasons.

  • 迭代调用: 以迭代方式调用 UDF,每个符合条件的元组一次。Iterative invocation: UDFs are invoked in an iterative manner, once per qualifying tuple. 由于函数调用,这会导致重复上下文切换的额外成本。This incurs additional costs of repeated context switching due to function invocation. 尤其是在其定义中执行 SQL 查询的 UDF 会受到严重影响。Especially, UDFs that execute SQL queries in their definition are severely affected.
  • 缺乏成本计算: 在优化期间,只有关系运算符会计算成本,标量运算符则不计算成本。Lack of costing: During optimization, only relational operators are costed, while scalar operators are not. 在引入标量 UDF 之前,其他标量运算符通常很便宜并且不需要成本计算。Prior to the introduction of scalar UDFs, other scalar operators were generally cheap and did not require costing. 为标量运算添加的小 CPU 成本就足够了。A small CPU cost added for a scalar operation was enough. 有些情况下实际成本很高,但仍然没有得到充分代表。There are scenarios where the actual cost is significant, and yet still remains underrepresented.
  • 解释型执行: UDF 以一批语句的形式进行计算,并按逐个语句执行。Interpreted execution: UDFs are evaluated as a batch of statements, executed statement-by-statement. 编译每个语句本身,并缓存编译的计划。Each statement itself is compiled, and the compiled plan is cached. 尽管此缓存策略能够通过避免重新编译节省一些时间,但每个语句仍需单独执行。Although this caching strategy saves some time as it avoids recompilations, each statement executes in isolation. 不执行跨语句优化。No cross-statement optimizations are carried out.
  • 串行执行: SQL Server 不允许在调用 UDF 的查询中进行查询内并行操作。Serial execution: SQL Server does not allow intra-query parallelism in queries that invoke UDFs.

标量 UDF 自动内联Automatic Inlining of Scalar UDFs

标量 UDF 内联功能的目标是提高调用 T-SQL 标量 UDF 的查询性能,其中 UDF 执行是主要瓶颈。The goal of the Scalar UDF inlining feature is to improve performance of queries that invoke T-SQL scalar UDFs, where UDF execution is the main bottleneck.

使用此新功能,标量 UDF 会自动转换为标量表达式或在调用查询中替换 UDF 运算符的标量子查询。With this new feature, scalar UDFs are automatically transformed into scalar expressions or scalar subqueries that are substituted in the calling query in place of the UDF operator. 然后优化这些表达式和子查询。These expressions and subqueries are then optimized. 因此,查询计划将不再具有用户定义的函数运算符,但其效果将在计划中观察到,如视图或内联 TVF。As a result, the query plan will no longer have a user-defined function operator, but its effects will be observed in the plan, like views or inline TVFs.

示例 1 - 单个语句标量 UDFExample 1 - Single statement scalar UDF

请考虑以下查询Consider the following query

SELECT L_SHIPDATE, O_SHIPPRIORITY, SUM (L_EXTENDEDPRICE *(1 - L_DISCOUNT)) 
FROM LINEITEM
INNER JOIN ORDERS
  ON O_ORDERKEY = L_ORDERKEY 
GROUP BY L_SHIPDATE, O_SHIPPRIORITY ORDER BY L_SHIPDATE;

此查询计算订单项的折扣价格总和,并显示按发货日期和发货优先级分组的结果。This query computes the sum of discounted prices for line items and presents the results grouped by the shipping date and shipping priority. 表达式 L_EXTENDEDPRICE *(1 - L_DISCOUNT) 是给定行项的折扣价格的公式。The expression L_EXTENDEDPRICE *(1 - L_DISCOUNT) is the formula for the discounted price for a given line item. 可以将这些公式提取到函数中以利用模块化和重用优势。Such formulas can be extracted into functions for the benefit of modularity and reuse.

CREATE FUNCTION dbo.discount_price(@price DECIMAL(12,2), @discount DECIMAL(12,2)) 
RETURNS DECIMAL (12,2) AS
BEGIN
  RETURN @price * (1 - @discount);
END

现在可以修改查询以调用此 UDF。Now the query can be modified to invoke this UDF.

SELECT L_SHIPDATE, O_SHIPPRIORITY, SUM (dbo.discount_price(L_EXTENDEDPRICE, L_DISCOUNT)) 
FROM LINEITEM
INNER JOIN ORDERS
  ON O_ORDERKEY = L_ORDERKEY 
GROUP BY L_SHIPDATE, O_SHIPPRIORITY ORDER BY L_SHIPDATE

由于之前所述的原因,在查询中使用 UDF 表现较差。Due to the reasons outlined earlier, the query with the UDF performs poorly. 现在,使用标量 UDF 内联,UDF 主体中的标量表达式将直接替换为查询。Now, with scalar UDF inlining, the scalar expression in the body of the UDF is substituted directly in the query. 运行此查询的结果显示在下表:The results of running this query are shown in the below table:

查询:Query: 不使用 UDF 进行查询Query without UDF 使用 UDF 进行查询(不内联)Query with UDF (without inlining) 使用标量 UDF 内联进行查询Query with scalar UDF inlining
执行时间:Execution time: 1.6 秒1.6 seconds 29 分 11 秒29 minutes 11 seconds 1.6 秒1.6 seconds

这些数字基于 10 GB CCI 数据库(使用 TPC-H 架构),在具有双处理器(12 核)、96 GB RAM且由 SSD 支持的计算机上运行。These numbers are based on a 10-GB CCI database (using the TPC-H schema), running on a machine with dual processor (12 core), 96-GB RAM, backed by SSD. 这些数字包括冷程序缓存和缓冲池的编译和执行时间。The numbers include compilation and execution time with a cold procedure cache and buffer pool. 使用了默认配置,且未创建其他索引。The default configuration was used, and no other indexes were created.

示例 2 - 多语句标量 UDFExample 2 - Multi-statement scalar UDF

使用多个 T-SQL 语句(如变量赋值和条件分支)实现的标量 UDF 也可以进行内联。Scalar UDFs that are implemented using multiple T-SQL statements such as variable assignments and conditional branching can also be inlined. 考虑以下给定客户密钥并且确定该客户的服务类别的标量 UDF。Consider the following scalar UDF that, given a customer key, determines the service category for that customer. 它首先使用 SQL 查询计算客户所下订单的总价来确定类别。It arrives at the category by first computing the total price of all orders placed by the customer using a SQL query. 然后,使用 IF-ELSE 逻辑确定基于总价的类别。Then, it uses an IF-ELSE logic to decide the category based on the total price.

CREATE OR ALTER FUNCTION dbo.customer_category(@ckey INT) 
RETURNS CHAR(10) AS
BEGIN
  DECLARE @total_price DECIMAL(18,2);
  DECLARE @category CHAR(10);

  SELECT @total_price = SUM(O_TOTALPRICE) FROM ORDERS WHERE O_CUSTKEY = @ckey;

  IF @total_price < 500000
    SET @category = 'REGULAR';
  ELSE IF @total_price < 1000000
    SET @category = 'GOLD';
  ELSE 
    SET @category = 'PLATINUM';

  RETURN @category;
END

现在,请考虑使用调用此 UDF 的查询。Now, consider a query that invokes this UDF.

SELECT C_NAME, dbo.customer_category(C_CUSTKEY) FROM CUSTOMER;

SQL Server 2017 (兼容级别 140 及更早版本)中此查询的执行计划如下:The execution plan for this query in SQL Server 2017 (compatibility level 140 and earlier) is as follows:

没有内联的查询计划

正如计划所示,SQL Server 在这里采用了一个简单的策略:对于 CUSTOMER 表中的每个元组,调用 UDF 并输出结果。As the plan shows, SQL Server adopts a simple strategy here: for every tuple in the CUSTOMER table, invoke the UDF and output the results. 此策略还不成熟且效率低下。This strategy is naïve and inefficient. 通过内联,这些 UDF 被转换为等效的标量子查询,它们在调用查询中代替 UDF。With inlining, such UDFs are transformed into equivalent scalar subqueries, which are substituted in the calling query in place of the UDF.

对于相同的查询,UDF 内联计划如下所示。For the same query, the plan with the UDF inlined looks as below.

使用内联的查询计划

如同之前提到的,查询计划不再具有用户定义的函数运算符,但其效果能在计划中观察到,如视图或内联 TVF。As mentioned earlier, the query plan no longer has a user-defined function operator, but its effects are now observable in the plan, like views or inline TVFs. 以下是一些来自上述计划的关键观测值:Here are some key observations from the above plan:

  1. SQL Server 推断出了 CUSTOMERORDERS 之间的隐式联接,并通过联接运算符将其显式化。SQL Server has inferred the implicit join between CUSTOMER and ORDERS and made that explicit via a join operator.
  2. SQL Server 也推断出了隐式 GROUP BY O_CUSTKEY on ORDERS 并使用 IndexSpool + StreamAggregate 实现了它。SQL Server has also inferred the implicit GROUP BY O_CUSTKEY on ORDERS and has used the IndexSpool + StreamAggregate to implement it.
  3. SQL Server 现在在所有运算符中都使用并行。SQL Server is now using parallelism across all operators.

根据 UDF 中逻辑的复杂性,所生成的查询计划也可能变得更大更复杂。Depending upon the complexity of the logic in the UDF, the resulting query plan might also get bigger and more complex. 我们可以看到,UDF 中的操作现在不再是黑盒,因此查询优化器能够降低成本并优化这些操作。As we can see, the operations inside the UDF are now no longer a black box, and hence the query optimizer is able to cost and optimize those operations. 此外,由于 UDF 不再在计划中,因此将用完全避免函数调用开销的计划来取代迭代 UDF 调用。Also, since the UDF is no longer in the plan, iterative UDF invocation is replaced by a plan that completely avoids function call overhead.

可内联标量 UDF 要求Inlineable Scalar UDFs requirements

如果满足所有以下条件,则标量 T-SQL UDF 可以内联:A scalar T-SQL UDF can be inline if all of the following conditions are true:

  • UDF 使用以下构造编写:The UDF is written using the following constructs:
    • DECLARESET:变量声明和赋值。DECLARE, SET: Variable declaration and assignments.
    • SELECT设置用户帐户 :具有单个/多个变量赋值的 SQL 查询1SELECT: SQL query with single/multiple variable assignments1.
    • IF/ELSE:具有任意级别嵌套的分支。IF/ELSE: Branching with arbitrary levels of nesting.
    • RETURN设置用户帐户 :单个或多个返回语句。RETURN: Single or multiple return statements.
    • UDF设置用户帐户 :嵌套/递归函数调用2UDF: Nested/recursive function calls2.
    • 其他:关系操作,例如 EXISTSISNULLOthers: Relational operations such as EXISTS, ISNULL.
  • UDF 不会调用任何与时间相关的内部函数(例如 GETDATE())或具有副作用的函数3(例如 NEWSEQUENTIALID())。The UDF does not invoke any intrinsic function that is either time-dependent (such as GETDATE()) or has side effects3 (such as NEWSEQUENTIALID()).
  • UDF使用 EXECUTE AS CALLER 子句(如果未指定 EXECUTE AS 子句,则为默认行为)。The UDF uses the EXECUTE AS CALLER clause (the default behavior if the EXECUTE AS clause is not specified).
  • UDF 不引用表变量或表值参数。The UDF does not reference table variables or table-valued parameters.
  • 调用标量 UDF 的查询不会在其 GROUP BY 子句中引用标量 UDF 调用。The query invoking a scalar UDF does not reference a scalar UDF call in its GROUP BY clause.
  • 使用 DISTINCT 子句在其 SELECT 列表中调用标量 UDF 的查询不会在其 ORDER BY 子句中引用标量 UDF 调用。The query invoking a scalar UDF in its select list with DISTINCT clause does not reference a scalar UDF call in its ORDER BY clause.
  • UDF 不是本机编译的(支持互操作)。The UDF is not natively compiled (interop is supported).
  • UDF 不用于计算列或检查约束定义。The UDF is not used in a computed column or a check constraint definition.
  • UDF 不引用用户定义类型。The UDF does not reference user-defined types.
  • 没有签名添加到 UDF。There are no signatures added to the UDF.
  • UDF 不是配分函数。The UDF is not a partition function.

1 带变量累计/聚合(例如 SELECT @val += col1 FROM table1)的 SELECT 不支持内联。1 SELECT with variable accumulation/aggregation (for example, SELECT @val += col1 FROM table1) is not supported for inlining.

2 递归 UDF 将仅内联到某个深度。2 Recursive UDFs will be inlined to a certain depth only.

3 结果取决于当前系统时间的内部函数与时间相关。3 Intrinsic functions whose results depend upon the current system time are time-dependent. 举例来说,可以更新某些内部全局状态的内部函数就是具有副作用的函数。An intrinsic function that may update some internal global state is an example of a function with side effects. 每次调用此类函数都会根据内部状态返回不同的结果。Such functions return different results each time they are called, based on the internal state.

检查 UDF 是否可以内联Checking whether or not a UDF can be inlined

对于每个 T-SQL 标量 UDF,sys.sql_modules 目录视图都包含一个名为 is_inlineable 的属性,指示 UDF 是否可内联。For every T-SQL scalar UDF, the sys.sql_modules catalog view includes a property called is_inlineable, which indicates whether a UDF is inlineable or not. 值为 1 则表明可内联,0 则表示不可内联。A value of 1 indicates that it is inlineable, and 0 indicates otherwise. 对于所有内联 TVF,此属性的值均为 1。This property will have a value of 1 for all inline TVFs as well. 对于所有其他模块,此值都将为 0。For all other modules, the value will be 0.

备注

如果标量 UDF 可内联,这并不意味着它将始终进行内联。If a scalar UDF is inlineable, it does not imply that it will always be inlined. SQL Server 将基于每个查询和每个 UDF 决定是否内联 UDF。SQL Server will decide (on a per-query, per-UDF basis) whether to inline a UDF or not. 例如,如果 UDF 定义遇到数千行代码,SQL Server 可能会选择不内联它。For instance, if the UDF definition runs into thousands of lines of code, SQL Server might choose not to inline it. 另一个例子是 GROUP BY 子句中的 UDF - 不会内联。Another example is a UDF in a GROUP BY clause - which will not be inlined. 在编译引用标量 UDF 的查询时做出这种决定。This decision is made when the query referencing a scalar UDF is compiled.

检查是否发生内联Checking whether inlining has happened or not

如果满足所有前置条件,且 SQL Server 决定执行内联,则它会将 UDF 转换为关系表达式。If all the preconditions are satisfied and SQL Server decides to perform inlining, it transforms the UDF into a relational expression. 可以很容易从查询计划中弄清楚是否内联:From the query plan, it is easy to figure out whether inlining has happened or not:

  • 对于已成功内联的 UDF,计划 XML 将没有 <UserDefinedFunction> XML 节点。The plan xml will not have a <UserDefinedFunction> xml node for a UDF that has been inlined successfully.
  • 发出某些 XEvent。Certain XEvents are emitted.

启用标量 UDF 内联Enabling scalar UDF inlining

可以通过对数据库启用兼容性级别 150 使工作负荷自动符合标量 UDF 内联。You can make workloads automatically eligible for scalar UDF inlining by enabling compatibility level 150 for the database.  可使用 Transact-SQL 进行此设置。  You can set this using Transact-SQL. 例如: For example:

ALTER DATABASE [WideWorldImportersDW] SET COMPATIBILITY_LEVEL = 150;

除此之外,不需要对 UDF 或查询进行其他更改即可利用此功能。Apart from this, there are no other changes required to be made to UDFs or queries to take advantage of this feature.

在不更改兼容级别的情况下禁用标量 UDF 内联Disabling Scalar UDF inlining without changing the compatibility level

可在数据库、语句或 UDF 范围内禁用标量 UDF 内联,同时将数据库兼容性级别维持在 150 或更高。Scalar UDF inlining can be disabled at the database, statement, or UDF scope while still maintaining database compatibility level 150 and higher. 要在数据库范围内禁用标量 UDF 内联,请在适用数据库的上下文中执行以下语句:To disable scalar UDF inlining at the database scope, execute the following statement within the context of the applicable database:

ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;

要在数据库范围内重新启用标量 UDF 内联,请在适用数据库的上下文中执行以下语句:To re-enable scalar UDF inlining for the database, execute the following statement within the context of the applicable database:

ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;

启用此选项后,此设置将在 sys.database_scoped_configurations 中显示为已启用。When ON, this setting will appear as enabled in sys.database_scoped_configurations. 还可通过将 DISABLE_TSQL_SCALAR_UDF_INLINING 指定为 USE HINT 查询提示来禁用特定查询的标量 UDF 内联。You can also disable scalar UDF inlining for a specific query by designating DISABLE_TSQL_SCALAR_UDF_INLINING as a USE HINT query hint. 例如:For example:

SELECT L_SHIPDATE, O_SHIPPRIORITY, SUM (dbo.discount_price(L_EXTENDEDPRICE, L_DISCOUNT)) 
FROM LINEITEM
INNER JOIN ORDERS
  ON O_ORDERKEY = L_ORDERKEY 
GROUP BY L_SHIPDATE, O_SHIPPRIORITY ORDER BY L_SHIPDATE
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));

USE HINT 查询提示优先于数据库范围配置或兼容级别设置。A USE HINT query hint takes precedence over the database scoped configuration or compatibility level setting.

也可使用 CREATE FUNCTIONALTER FUNCTION 语句中的 INLINE 子句为特定 UDF 禁用标量 UDF 内联。Scalar UDF inlining can also be disabled for a specific UDF using the INLINE clause in the CREATE FUNCTION or ALTER FUNCTION statement. 例如:For example:

CREATE OR ALTER FUNCTION dbo.discount_price(@price DECIMAL(12,2), @discount DECIMAL(12,2))
RETURNS DECIMAL (12,2)
WITH INLINE = OFF
AS
BEGIN
    RETURN @price * (1 - @discount);
END

执行上述语句后,此 UDF 永远不会被内联到任何调用它的查询中。Once the above statement is executed, this UDF will never be inlined into any query that invokes it. 要重新启用此 UDF 的内联,请执行以下语句:To re-enable inlining for this UDF, execute the following statement:

CREATE OR ALTER FUNCTION dbo.discount_price(@price DECIMAL(12,2), @discount DECIMAL(12,2))
RETURNS DECIMAL (12,2)
WITH INLINE = ON
AS
BEGIN
    RETURN @price * (1 - @discount);
END

备注

INLINE 子句不是强制性的。The INLINE clause is not mandatory. 如果未指定 INLINE 子句,则会根据是否可以内联 UDF 将其自动设置为 ON/OFFIf INLINE clause is not specified, it is automatically set to ON/OFF based on whether the UDF can be inlined. 如果指定了 INLINE=ON 但发现 UDF 不符合内联条件,则会引发错误。If INLINE=ON is specified but the UDF is found ineligible for inlining, an error will be thrown.

重要说明Important Notes

如本文所述,标量 UDF 内联将带有标量 UDF 的查询转换为具有等效标量子查询的查询。As described in this article, scalar UDF inlining transforms a query with scalar UDFs into a query with an equivalent scalar subquery. 由于此转换,用户可能会注意到以下情况中的某些行为差异:Due to this transformation, users may notice some differences in behavior in the following scenarios:

  1. 内联将导致相同查询文本的不同查询哈希。Inlining will result in a different query hash for the same query text.
  2. UDF 内部语句中某些可能已被隐藏的警告(例如除以零等),可能会因内联而显示出来。Certain warnings in statements inside the UDF (such as divide by zero etc.) which might have been hidden earlier, might show up due to inlining.
  3. 查询级别联接提示可能不再有效,因为内联可能会引入新联接。Query level join hints might not be valid anymore, as inlining may introduce new joins. 必须改为使用本地联接提示。Local join hints will have to be used instead.
  4. 无法索引引用内联标量UDF的视图。Views that reference inline scalar UDFs cannot be indexed. 如果需要在此类视图上创建索引,请禁用对引用的 UDF 的内联。If you need to create an index on such views, disable inlining for the referenced UDFs.
  5. 动态数据屏蔽与 UDF 内联的行为可能存在一些差异。There might be some differences in the behavior of Dynamic Data masking with UDF inlining. 在某些情况下(取决于 UDF 中的逻辑),内联可能是更保守的 w.r.t 屏蔽输出列。In certain situations (depending upon the logic in the UDF), inlining might be more conservative w.r.t masking output columns. 如果 UDF 中引用的列不是输出列,则它们不会被屏蔽。In scenarios where the columns referenced in a UDF are not output columns, they will not be masked.
  6. 如果 UDF 引用内置函数(如 SCOPE_IDENTITY()),内置函数返回的值将随内联而变化。If a UDF references built-in functions such as SCOPE_IDENTITY(), the value returned by the built-in function will change with inlining. 这种行为上的更改是因为内联更改了 UDF 中语句的范围。This change in behavior is because inlining changes the scope of statements inside the UDF.

另请参阅See Also

SQL Server 数据库引擎和 Azure SQL 数据库的性能中心Performance Center for SQL Server Database Engine and Azure SQL Database

查询处理体系结构指南Query Processing Architecture Guide

Showplan 逻辑运算符和物理运算符参考Showplan Logical and Physical Operators Reference

联接Joins

演示自适应查询处理Demonstrating Adaptive Query Processing