联接 (SQL Server)Joins (SQL Server)

适用对象:yesSQL ServeryesAzure SQL 数据库yesAzure SQL 数据仓库yes并行数据仓库APPLIES TO: yesSQL Server yesAzure SQL Database yesAzure SQL Data Warehouse yesParallel Data Warehouse

SQL ServerSQL Server 使用内存中的排序和哈希联接技术执行排序、交集、并集、差分等操作。performs sort, intersect, union, and difference operations using in-memory sorting and hash join technology. SQL ServerSQL Server 利用这种类型的查询计划支持垂直表分区(有时称为分列存储)。Using this type of query plan, SQL ServerSQL Server supports vertical table partitioning, sometimes called columnar storage.

SQL ServerSQL Server 使用三种类型的联接操作:employs three types of join operations:

  • 嵌套循环联接Nested Loops joins
  • 合并联接Merge joins
  • 哈希联接Hash joins

联接基础知识Join Fundamentals

通过联接,可以从两个或多个表中根据各个表之间的逻辑关系来检索数据。By using joins, you can retrieve data from two or more tables based on logical relationships between the tables. 联接指明了 Microsoft SQL Server 应如何使用一个表中的数据来选择另一个表中的行。Joins indicate how Microsoft SQL Server should use data from one table to select the rows in another table.

联接条件可通过以下方式定义两个表在查询中的关联方式:A join condition defines the way two tables are related in a query by:

  • 指定每个表中要用于联接的列。Specifying the column from each table to be used for the join. 典型的联接条件在一个表中指定一个外键,而在另一个表中指定与其关联的键。A typical join condition specifies a foreign key from one table and its associated key in the other table.
  • 指定用于比较各列的值的逻辑运算符(例如 = 或 <>)。Specifying a logical operator (for example, = or <>,) to be used in comparing values from the columns.

可以在 FROMWHERE 子句中指定内部联接。Inner joins can be specified in either the FROM or WHERE clauses. 只能在 FROM 子句中指定外部联接。Outer joins can be specified in the FROM clause only. 联接条件与 WHEREHAVING 搜索条件相结合,用于控制从 FROM 子句所引用的基表中选定的行。The join conditions combine with the WHERE and HAVING search conditions to control the rows that are selected from the base tables referenced in the FROM clause.

FROM 子句中指定联接条件有助于将这些联接条件与 WHERE 子句中可能指定的其他任何搜索条件分开,建议用这种方法来指定联接。Specifying the join conditions in the FROM clause helps separate them from any other search conditions that may be specified in a WHERE clause, and is the recommended method for specifying joins. 简化的 ISO FROM 子句联接语法如下:A simplified ISO FROM clause join syntax is:

FROM first_table join_type second_table [ON (join_condition)]

join_type 指定执行的联接类型:内部、外部或交叉联接。join_type specifies what kind of join is performed: an inner, outer, or cross join. join_condition 定义用于对每一对联接行进行求值的谓词。join_condition defines the predicate to be evaluated for each pair of joined rows. 下面是 FROM 子句联接规范示例:The following is an example of a FROM clause join specification:

FROM Purchasing.ProductVendor JOIN Purchasing.Vendor
     ON (ProductVendor.BusinessEntityID = Vendor.BusinessEntityID)

下面是使用此联接的一个简单 SELECT 语句:The following is a simple SELECT statement using this join:

SELECT ProductID, Purchasing.Vendor.BusinessEntityID, Name
FROM Purchasing.ProductVendor JOIN Purchasing.Vendor
    ON (Purchasing.ProductVendor.BusinessEntityID = Purchasing.Vendor.BusinessEntityID)
WHERE StandardPrice > $10
  AND Name LIKE N'F%'
GO

此选择会返回某个公司所提供的一组产品以及供应商信息,该公司名以字母 F 开头,并且产品价格在 10 美元以上。The select returns the product and supplier information for any combination of parts supplied by a company for which the company name starts with the letter F and the price of the product is more than $10.

当在单个查询中引用多个表时,所有列引用都必须是明确的。When multiple tables are referenced in a single query, all column references must be unambiguous. 在上面的示例中,ProductVendor 和 Vendor 表都具有一个名为 BusinessEntityID 的列。In the previous example, both the ProductVendor and Vendor table have a column named BusinessEntityID. 在查询所引用的两个或多个表中,任何重复的列名都必须用表名加以限定。Any column name that is duplicated between two or more tables referenced in the query must be qualified with the table name. 此示例中对 Vendor 列的所有引用均已限定。All references to the Vendor columns in the example are qualified.

如果某个列名在查询用到的两个或多个表中不重复,则对该列的引用就必用表名加以限定。When a column name is not duplicated in two or more tables used in the query, references to it do not have to be qualified with the table name. 如上例所示。This is shown in the previous example. 由于没有指明提供每个列的表,因此这样的 SELECT 语句有时会难以理解。Such a SELECT statement is sometimes difficult to understand because there is nothing to indicate the table that provided each column. 如果所有的列都用它们的表名加以限定,将会提高查询的可读性。The readability of the query is improved if all columns are qualified with their table names. 如果使用了表的别名,将会进一步提高可读性,尤其是当表名自身必须用数据库名和所有者名加以限定时。The readability is further improved if table aliases are used, especially when the table names themselves must be qualified with the database and owner names. 下例与上例相同,只不过分配了表的别名并且用表的别名对列加以限定,从而提高了可读性:The following is the same example, except that table aliases have been assigned and the columns qualified with table aliases to improve readability:

SELECT pv.ProductID, v.BusinessEntityID, v.Name
FROM Purchasing.ProductVendor AS pv 
JOIN Purchasing.Vendor AS v
    ON (pv.BusinessEntityID = v.BusinessEntityID)
WHERE StandardPrice > $10
    AND Name LIKE N'F%';

上例是在 FROM 子句中指定联接条件的,这是首选的方法。The previous examples specified the join conditions in the FROM clause, which is the preferred method. 下列查询包含相同的联接条件,该联接条件在 WHERE 子句中指定:The following query contains the same join condition specified in the WHERE clause:

SELECT pv.ProductID, v.BusinessEntityID, v.Name
FROM Purchasing.ProductVendor AS pv, Purchasing.Vendor AS v
WHERE pv.BusinessEntityID=v.BusinessEntityID
    AND StandardPrice > $10
    AND Name LIKE N'F%';

联接选择列表可以引用联接表中的所有列或任意一部分列。The select list for a join can reference all the columns in the joined tables, or any subset of the columns. 选择列表不必包含联接中每个表的列。The select list is not required to contain columns from every table in the join. 例如,在三表联接中,只能用一个表作为中间表来联接另外两个表,而选择列表不必引用该中间表的任何列。For example, in a three-table join, only one table can be used to bridge from one of the other tables to the third table, and none of the columns from the middle table have to be referenced in the select list.

虽然联接条件通常使用相等比较 (=),但也可以像指定其他谓词一样指定其他比较运算符或关系运算符。Although join conditions usually have equality comparisons (=), other comparison or relational operators can be specified, as can other predicates. 有关详细信息,请参阅比较运算符 (Transact-SQL)WHERE (Transact-SQL)For more information, see Comparison Operators (Transact-SQL) and WHERE (Transact-SQL).

SQL ServerSQL Server 处理联接时,查询引擎会从多种可行的方法中选择最有效的方法来处理联接。When SQL ServerSQL Server processes joins, the query engine chooses the most efficient method (out of several possibilities) of processing the join. 由于各种联接的实际执行过程会采用多种不同的优化,因此无法可靠地预测。The physical execution of various joins can use many different optimizations and therefore cannot be reliably predicted.

联接条件中用到的列不必具有相同的名称或相同的数据类型。Columns used in a join condition are not required to have the same name or be the same data type. 但如果数据类型不相同,则必须兼容,或者是可由 SQL Server 进行隐式转换的类型。However, if the data types are not identical, they must be compatible, or be types that SQL Server can implicitly convert. 如果数据类型不能进行隐式转换,则联接条件必须使用 CAST 函数显式转换数据类型。If the data types cannot be implicitly converted, the join condition must explicitly convert the data type using the CAST function. 有关隐式和显式转换的详细信息,请参阅数据类型转换(数据库引擎)For more information about implicit and explicit conversions, see Data Type Conversion (Database Engine).

大多数使用联接的查询可以用子查询(嵌套在其他查询中的查询)重写,并且大多数子查询可以重写为联接。Most queries using a join can be rewritten using a subquery (a query nested within another query), and most subqueries can be rewritten as joins. 有关子查询的详细信息,请参阅子查询For more information about subqueries, see Subqueries.

备注

不能在 ntext、text 或 image 列上直接联接表。Tables cannot be joined directly on ntext, text, or image columns. 但可以使用 SUBSTRING 在 ntext、text 或 image 列上间接联接表。However, tables can be joined indirectly on ntext, text, or image columns by using SUBSTRING.
例如,SELECT * FROM t1 JOIN t2 ON SUBSTRING(t1.textcolumn, 1, 20) = SUBSTRING(t2.textcolumn, 1, 20) 可对表 t1 和 t2 中每个文本列的前 20 个字符进行两表内部联接。For example, SELECT * FROM t1 JOIN t2 ON SUBSTRING(t1.textcolumn, 1, 20) = SUBSTRING(t2.textcolumn, 1, 20) performs a two-table inner join on the first 20 characters of each text column in tables t1 and t2.
此外,另一种可以采用的比较两个表中 ntext 或 text 列的方法是用 WHERE 子句比较这些列的长度,例如:WHERE DATALENGTH(p1.pr_info) = DATALENGTH(p2.pr_info)In addition, another possibility for comparing ntext or text columns from two tables is to compare the lengths of the columns with a WHERE clause, for example: WHERE DATALENGTH(p1.pr_info) = DATALENGTH(p2.pr_info)

了解嵌套循环联接Understanding Nested Loops joins

如果一个联接输入很小(不到 10 行),而另一个联接输入很大而且已在其联接列上创建了索引,则索引 Nested Loops 连接是最快的联接操作,因为它们需要的 I/O 和比较都最少。If one join input is small (fewer than 10 rows) and the other join input is fairly large and indexed on its join columns, an index nested loops join is the fastest join operation because they require the least I/O and the fewest comparisons.

嵌套循环联接也称为嵌套迭代,它将一个联接输入用作外部输入表(显示为图形执行计划中的顶端输入),将另一个联接输入用作内部(底端)输入表。The nested loops join, also called nested iteration, uses one join input as the outer input table (shown as the top input in the graphical execution plan) and one as the inner (bottom) input table. 外部循环逐行处理外部输入表。The outer loop consumes the outer input table row by row. 内部循环会针对每个外部行执行,在内部输入表中搜索匹配行。The inner loop, executed for each outer row, searches for matching rows in the inner input table.

最简单的情况是,搜索时扫描整个表或索引;这称为单纯嵌套循环联接。In the simplest case, the search scans an entire table or index; this is called a naive nested loops join. 如果搜索时使用索引,则称为索引嵌套循环联接。If the search exploits an index, it is called an index nested loops join. 如果将索引生成为查询计划的一部分(并在查询完成后立即将索引破坏),则称为临时索引嵌套循环联接。If the index is built as part of the query plan (and destroyed upon completion of the query), it is called a temporary index nested loops join. 查询优化器考虑了所有这些不同情况。All these variants are considered by the Query Optimizer.

如果外部输入较小而内部输入较大且预先创建了索引,则嵌套循环联接尤其有效。A nested loops join is particularly effective if the outer input is small and the inner input is preindexed and large. 在许多小事务中(如那些只影响较小的一组行的事务),索引嵌套循环联接优于合并联接和哈希联接。In many small transactions, such as those affecting only a small set of rows, index nested loops joins are superior to both merge joins and hash joins. 但在大型查询中,嵌套循环联接通常不是最佳选择。In large queries, however, nested loops joins are often not the optimal choice.

了解合并联接Understanding Merge joins

如果两个联接输入并不小但已在二者联接列上排序(例如,如果它们是通过扫描已排序的索引获得的),则合并联接是最快的联接操作。If the two join inputs are not small but are sorted on their join column (for example, if they were obtained by scanning sorted indexes), a merge join is the fastest join operation. 如果两个联接输入都很大,而且这两个输入的大小差不多,则预先排序的合并联接提供的性能与哈希联接相近。If both join inputs are large and the two inputs are of similar sizes, a merge join with prior sorting and a hash join offer similar performance. 但是,如果这两个输入的大小相差很大,则哈希联接操作通常快得多。However, hash join operations are often much faster if the two input sizes differ significantly from each other.

合并联接要求两个输入都在合并列上排序,而合并列由联接谓词的等效 (ON) 子句定义。The merge join requires both inputs to be sorted on the merge columns, which are defined by the equality (ON) clauses of the join predicate. 通常,查询优化器扫描索引(如果在适当的一组列上存在索引),或在合并联接的下面放一个排序运算符。The query optimizer typically scans an index, if one exists on the proper set of columns, or it places a sort operator below the merge join. 在极少数情况下,虽然可能有多个等效子句,但只用其中一些可用的等效子句获得合并列。In rare cases, there may be multiple equality clauses, but the merge columns are taken from only some of the available equality clauses.

由于每个输入都已排序,因此 Merge Join 运算符将从每个输入获取一行并将其进行比较。Because each input is sorted, the Merge Join operator gets a row from each input and compares them. 例如,对于内联接操作,如果行相等则返回。For example, for inner join operations, the rows are returned if they are equal. 如果行不相等,则废弃值较小的行并从该输入获得另一行。If they are not equal, the lower-value row is discarded and another row is obtained from that input. 这一过程将重复进行,直到处理完所有的行为止。This process repeats until all rows have been processed.

合并联接操作可以是常规操作,也可以是多对多操作。The merge join operation may be either a regular or a many-to-many operation. 多对多合并联接使用临时表存储行。A many-to-many merge join uses a temporary table to store rows. 如果每个输入中有重复值,则在处理其中一个输入中的每个重复项时,另一个输入必须重绕到重复项的开始位置。If there are duplicate values from each input, one of the inputs will have to rewind to the start of the duplicates as each duplicate from the other input is processed.

如果存在驻留谓词,则所有满足合并谓词的行都将对该驻留谓词取值,而只返回那些满足该驻留谓词的行。If a residual predicate is present, all rows that satisfy the merge predicate evaluate the residual predicate, and only those rows that satisfy it are returned.

合并联接本身的速度很快,但如果需要排序操作,选择合并联接就会非常费时。Merge join itself is very fast, but it can be an expensive choice if sort operations are required. 然而,如果数据量很大且能够从现有 B 树索引中获得预排序的所需数据,则合并联接通常是最快的可用联接算法。However, if the data volume is large and the desired data can be obtained presorted from existing B-tree indexes, merge join is often the fastest available join algorithm.

了解哈希联接Understanding Hash joins

哈希联接可以有效处理未排序的大型非索引输入。Hash joins can efficiently process large, unsorted, nonindexed inputs. 它们对复杂查询的中间结果很有用,因为:They are useful for intermediate results in complex queries because:

  • 中间结果未经索引(除非已经显式保存到磁盘上然后创建索引),而且通常不为查询计划中的下一个操作进行适当的排序。Intermediate results are not indexed (unless explicitly saved to disk and then indexed) and often are not suitably sorted for the next operation in the query plan.
  • 查询优化器只估计中间结果的大小。Query optimizers estimate only intermediate result sizes. 由于对于复杂查询,估计可能有很大的误差,因此如果中间结果比预期的大得多,则处理中间结果的算法不仅必须有效而且必须适度弱化。Because estimates can be very inaccurate for complex queries, algorithms to process intermediate results not only must be efficient, but also must degrade gracefully if an intermediate result turns out to be much larger than anticipated.

哈希联接可以减少使用非规范化。The hash join allows reductions in the use of denormalization. 非规范化一般通过减少联接操作获得更好的性能,尽管这样做有冗余之险(如不一致的更新)。Denormalization is typically used to achieve better performance by reducing join operations, in spite of the dangers of redundancy, such as inconsistent updates. 哈希联接则减少使用非规范化的需要。Hash joins reduce the need to denormalize. 哈希联接使垂直分区(用单独的文件或索引代表单个表中的几组列)得以成为物理数据库设计的可行选项。Hash joins allow vertical partitioning (representing groups of columns from a single table in separate files or indexes) to become a viable option for physical database design.

哈希联接有两种输入:生成输入和探测输入。The hash join has two inputs: the build input and probe input. 查询优化器指派这些角色,使两个输入中较小的那个作为生成输入。The query optimizer assigns these roles so that the smaller of the two inputs is the build input.

哈希联接用于多种设置匹配操作:内部联接;左外部联接、右外部联接和完全外部联接;左半联接和右半联接;交集;并集和差异。Hash joins are used for many types of set-matching operations: inner join; left, right, and full outer join; left and right semi-join; intersection; union; and difference. 此外,哈希联接的某种变形可以进行重复删除和分组,例如 SUM(salary) GROUP BY departmentMoreover, a variant of the hash join can do duplicate removal and grouping, such as SUM(salary) GROUP BY department. 这些修改对生成和探测角色只使用一个输入。These modifications use only one input for both the build and probe roles.

以下几节介绍了不同类型的哈希联接:内存中的哈希联接、Grace 哈希联接和递归哈希联接。The following sections describe different types of hash joins: in-memory hash join, grace hash join, and recursive hash join.

内存中的哈希联接In-Memory Hash Join

哈希联接先扫描或计算整个生成输入,然后在内存中生成哈希表。The hash join first scans or computes the entire build input and then builds a hash table in memory. 根据计算得出的哈希键的哈希值,将每行插入哈希存储桶。Each row is inserted into a hash bucket depending on the hash value computed for the hash key. 如果整个生成输入小于可用内存,则可以将所有行都插入哈希表中。If the entire build input is smaller than the available memory, all rows can be inserted into the hash table. 生成阶段之后是探测阶段。This build phase is followed by the probe phase. 一次一行地对整个探测输入进行扫描或计算,并为每个探测行计算哈希键的值,扫描相应的哈希存储桶并生成匹配项。The entire probe input is scanned or computed one row at a time, and for each probe row, the hash key's value is computed, the corresponding hash bucket is scanned, and the matches are produced.

Grace 哈希联接Grace Hash Join

如果生成输入大于内存,哈希联接将分为几步进行。If the build input does not fit in memory, a hash join proceeds in several steps. 这称为“Grace 哈希联接”。This is known as a grace hash join. 每一步都分为生成阶段和探测阶段。Each step has a build phase and probe phase. 首先,消耗整个生成和探测输入并将其分区(使用哈希键上的哈希函数)为多个文件。Initially, the entire build and probe inputs are consumed and partitioned (using a hash function on the hash keys) into multiple files. 对哈希键使用哈希函数可以保证任意两个联接记录一定位于相同的文件对中。Using the hash function on the hash keys guarantees that any two joining records must be in the same pair of files. 因此,联接两个大输入的任务简化为相同任务的多个较小的实例。Therefore, the task of joining two large inputs has been reduced to multiple, but smaller, instances of the same tasks. 然后将哈希联接应用于每对分区文件。The hash join is then applied to each pair of partitioned files.

递归哈希联接Recursive Hash Join

如果生成输入非常大,以至于标准外部合并的输入需要多个合并级别,则需要多个分区步骤和多个分区级别。If the build input is so large that inputs for a standard external merge would require multiple merge levels, multiple partitioning steps and multiple partitioning levels are required. 如果只有某些分区较大,则只需对那些分区使用附加的分区步骤。If only some of the partitions are large, additional partitioning steps are used for only those specific partitions. 为了使所有分区步骤尽可能快,将使用大的异步 I/O 操作以便单个线程就能使多个磁盘驱动器繁忙工作。In order to make all partitioning steps as fast as possible, large, asynchronous I/O operations are used so that a single thread can keep multiple disk drives busy.

备注

如果生成输入仅稍大于可用内存,则内存中的哈希联接和 Grace 哈希联接的元素将结合在一个步骤中,生成混合哈希联接。If the build input is only slightly larger than the available memory, elements of in-memory hash join and grace hash join are combined in a single step, producing a hybrid hash join.

在优化过程中不能始终确定使用哪种哈希联接。It is not always possible during optimization to determine which hash join is used. 因此,SQL Server 开始时使用内存中的哈希联接,然后根据生成输入的大小逐渐转换到 Grace 哈希联接和递归哈希联接。Therefore, SQL Server starts by using an in-memory hash join and gradually transitions to grace hash join, and recursive hash join, depending on the size of the build input.

如果查询优化器错误地预计两个输入中哪个较小并由此确定哪个作为生成输入,生成角色和探测角色将动态反转。If the Query Optimizer anticipates wrongly which of the two inputs is smaller and, therefore, should have been the build input, the build and probe roles are reversed dynamically. 哈希联接确保使用较小的溢出文件作为生成输入。The hash join makes sure that it uses the smaller overflow file as build input. 这一技术称为角色反转。This technique is called role reversal. 至少一个文件溢出到磁盘后,哈希联接中才会发生角色反转。Role reversal occurs inside the hash join after at least one spill to the disk.

备注

角色反转的发生独立于任何查询提示或结构。Role reversal occurs independent of any query hints or structure. 角色反转不会显示在查询计划中;角色反转对于用户是透明的。Role reversal does not display in your query plan; when it occurs, it is transparent to the user.

哈希援助Hash Bailout

术语“哈希援助”有时用于描述 Grace 哈希联接或递归哈希联接。The term hash bailout is sometimes used to describe grace hash joins or recursive hash joins.

备注

递归哈希联接或哈希援助会导致服务器性能降低。Recursive hash joins or hash bailouts cause reduced performance in your server. 如果跟踪中显示许多哈希警告事件,请更新正在联接的列上的统计信息。If you see many Hash Warning events in a trace, update statistics on the columns that are being joined.

有关哈希援助的详细信息,请参阅 Hash Warning 事件类For more information about hash bailout, see Hash Warning Event Class.

NULL 值和联接Null Values and Joins

联接表的列中的 null 值(如果有)互相不匹配。When there are null values in the columns of the tables being joined, the null values do not match each other. 如果其中一个联接表的列中出现空值,只能通过外部联接返回这些空值(除非 WHERE 子句不包括空值)。The presence of null values in a column from one of the tables being joined can be returned only by using an outer join (unless the WHERE clause excludes null values).

下面的两个表中,每个表中要参与联接的列中均包含 NULL 值:Here are two tables that each have NULL in the column that will participate in the join:

table1                          table2
a           b                   c            d
-------     ------              -------      ------
      1        one                 NULL         two
   NULL      three                    4        four
      4      join4

将列 a 中的值与列 c 中的值进行比较的联接在包含 NULL 值的列上不会获得匹配项:A join that compares the values in column a against column c does not get a match on the columns that have values of NULL:

SELECT *
FROM table1 t1 JOIN table2 t2
   ON t1.a = t2.c
ORDER BY t1.a;
GO

而是只返回列 a 和 c 中具有 4 的一行:Only one row with 4 in column a and c is returned:

a           b      c           d      
----------- ------ ----------- ------ 
4           join4  4           four   

(1 row(s) affected)

另外,从基表返回的空值与从外部联接返回的空值很难区分开。Null values returned from a base table are also difficult to distinguish from the null values returned from an outer join. 例如,下面的 SELECT 语句对这两个表执行左向外部联接:For example, the following SELECT statement does a left outer join on these two tables:

SELECT *
FROM table1 t1 LEFT OUTER JOIN table2 t2
   ON t1.a = t2.c
ORDER BY t1.a;
GO

下面是结果集:Here is the result set.

a           b      c           d      
----------- ------ ----------- ------ 
NULL        three  NULL        NULL 
1           one    NULL        NULL 
4           join4  4           four   

(3 row(s) affected)

从结果中很难区分数据中的 NULL 值和表示联接失败的 NULL 值。The results do not make it easy to distinguish a NULL in the data from a NULL that represents a failure to join. 如果联接的数据有空值,最好用常规联接从结果中删除这些空值。When null values are present in data being joined, it is usually preferable to omit them from the results by using a regular join.

另请参阅See Also

Showplan 逻辑运算符和物理运算符参考 Showplan Logical and Physical Operators Reference
比较运算符 (Transact-SQL) Comparison Operators (Transact-SQL)
数据类型转换(数据库引擎) Data Type Conversion (Database Engine)
子查询 Subqueries
自适应联接Adaptive Joins