聯結 (SQL Server)Joins (SQL Server)

適用於: 是SQL Server 是Azure SQL Database 是Azure SQL 資料倉儲 是平行處理資料倉儲 APPLIES TO: yesSQL Server yesAzure SQL Database yesAzure SQL Data Warehouse yesParallel Data Warehouse

SQL ServerSQL Server 會使用記憶體內排序和雜湊聯結技術,來執行排序、交叉、聯集及差異作業。performs sort, intersect, union, and difference operations using in-memory sorting and hash join technology. 使用這類查詢計畫,SQL ServerSQL Server 可支援資料表垂直分割 (有時候稱為分欄儲存)。Using this type of query plan, SQL ServerSQL Server supports vertical table partitioning, sometimes called columnar storage.

SQL ServerSQL Server 採用四種聯結作業:employs four types of join operations:

  • 巢狀迴圈聯結Nested Loops joins
  • 合併聯結Merge joins
  • 雜湊聯結Hash joins
  • 自適性聯結 (開頭為 SQL Server 2017 (14.x)SQL Server 2017 (14.x))Adaptive joins (Starting with SQL Server 2017 (14.x)SQL Server 2017 (14.x))

聯結基本概念Join Fundamentals

透過使用聯結,您可以根據資料表之間的邏輯關聯性從二或多個資料表中擷取資料。By using joins, you can retrieve data from two or more tables based on logical relationships between the tables. 聯結指出 SQL ServerSQL Server 如何使用一個資料表的資料,以選取另一個資料表的資料列。Joins indicate how SQL ServerSQL Server should use data from one table to select the rows in another table.

聯結條件定義兩個資料表在查詢中相關的方式:A join condition defines the way two tables are related in a query by:

  • 指定每個資料表中要用於聯結的資料行。Specifying the column from each table to be used for the join. 典型的聯結條件會指定一個資料表的外部索引鍵,以及其在另一個資料表關聯的索引鍵。A typical join condition specifies a foreign key from one table and its associated key in the other table.
  • 指定邏輯運算子 (例如 = 或 <>),以便用來比較資料行的數值。Specifying a logical operator (for example, = or <>,) to be used in comparing values from the columns.

您可以在 FROMWHERE 子句中指定內部聯結。Inner joins can be specified in either the FROM or WHERE clauses. 外部聯結則只能在 FROM 字句中指定。Outer joins can be specified in the FROM clause only. 聯結條件會與 WHEREHAVING 搜尋條件結合,以控制從 FROM 子句所參考之基底資料表中選取的資料列。The join conditions combine with the WHERE and HAVING search conditions to control the rows that are selected from the base tables referenced in the FROM clause.

FROM 子句中指定聯結條件有助於將它們與 WHERE 子句中可能指定的任何其他搜尋條件分開,建議您以此方式指定聯結。Specifying the join conditions in the FROM clause helps separate them from any other search conditions that may be specified in a WHERE clause, and is the recommended method for specifying joins. 簡化的 ISO FROM 子句聯結語法為:A simplified ISO FROM clause join syntax is:

FROM first_table join_type second_table [ON (join_condition)]

join_type 會指定所執行聯結的種類:內部、外部或交叉聯結。join_type specifies what kind of join is performed: an inner, outer, or cross join. join_condition 會定義要針對每一對聯結資料列評估的述詞。join_condition defines the predicate to be evaluated for each pair of joined rows. 下列是 FROM 子句聯結規格的範例:The following is an example of a FROM clause join specification:

FROM Purchasing.ProductVendor JOIN Purchasing.Vendor
     ON (ProductVendor.BusinessEntityID = Vendor.BusinessEntityID)

下列範例則是使用此聯結的簡單 SELECT 陳述式:The following is a simple SELECT statement using this join:

SELECT ProductID, Purchasing.Vendor.BusinessEntityID, Name
FROM Purchasing.ProductVendor JOIN Purchasing.Vendor
    ON (Purchasing.ProductVendor.BusinessEntityID = Purchasing.Vendor.BusinessEntityID)
WHERE StandardPrice > $10
  AND Name LIKE N'F%'
GO

此選取將傳回公司名稱以字母 F 開頭之公司所供應的零件,並且產品的價格大於 $10 之任何組合的產品與供應商資訊。The select returns the product and supplier information for any combination of parts supplied by a company for which the company name starts with the letter F and the price of the product is more than $10.

若在單一查詢中參考多個資料表,所有的資料行參考都不可以模擬兩可。When multiple tables are referenced in a single query, all column references must be unambiguous. 在先前的範例中,ProductVendor 和 Vendor 資料表都有名稱為 BusinessEntityID 的資料行。In the previous example, both the ProductVendor and Vendor table have a column named BusinessEntityID. 查詢所參考的二或多個資料表中,若有任何資料行名稱重複,則必須以資料表名稱來加以限定。Any column name that is duplicated between two or more tables referenced in the query must be qualified with the table name. 在範例中,所有對 Vendor 資料行的參考都有加以限定。All references to the Vendor columns in the example are qualified.

當查詢所使用的二或多個資料表中,資料行名稱都沒有重複,對資料行的參考就不需要以資料表名稱加以限定。When a column name is not duplicated in two or more tables used in the query, references to it do not have to be qualified with the table name. 如先前的範例所示。This is shown in the previous example. 這樣的 SELECT 陳述式有時候很難了解,因為無法指出提供每個資料行的資料表。Such a SELECT statement is sometimes difficult to understand because there is nothing to indicate the table that provided each column. 如果以資料表名稱限定所有資料行,可增進查詢的可讀性。The readability of the query is improved if all columns are qualified with their table names. 如果還使用資料表別名,特別是當資料表名稱本身還需要以資料庫和擁有者名稱加以限定時,可進一步增進可讀性。The readability is further improved if table aliases are used, especially when the table names themselves must be qualified with the database and owner names. 下列是相同的範例,但是指定了資料表別名,並且以資料表別名限定資料行,以增進可讀性:The following is the same example, except that table aliases have been assigned and the columns qualified with table aliases to improve readability:

SELECT pv.ProductID, v.BusinessEntityID, v.Name
FROM Purchasing.ProductVendor AS pv 
JOIN Purchasing.Vendor AS v
    ON (pv.BusinessEntityID = v.BusinessEntityID)
WHERE StandardPrice > $10
    AND Name LIKE N'F%';

上個範例在 FROM 子句中指定聯結條件,這是較好的方式。The previous examples specified the join conditions in the FROM clause, which is the preferred method. 下列查詢包含指定於 WHERE 子句中的相同聯結條件:The following query contains the same join condition specified in the WHERE clause:

SELECT pv.ProductID, v.BusinessEntityID, v.Name
FROM Purchasing.ProductVendor AS pv, Purchasing.Vendor AS v
WHERE pv.BusinessEntityID=v.BusinessEntityID
    AND StandardPrice > $10
    AND Name LIKE N'F%';

聯結的選取清單可參考聯結資料表中的所有資料行,或是資料行的任何子集。The select list for a join can reference all the columns in the joined tables, or any subset of the columns. 若要包含聯結中每個資料表的資料行,您並不需要用到選取清單。The select list is not required to contain columns from every table in the join. 例如,在三個資料表的聯結中,一個資料表可作為第二個資料表與第三個資料表的橋樑,並且中間的資料表不需要有任何資料行參考於選取清單中。For example, in a three-table join, only one table can be used to bridge from one of the other tables to the third table, and none of the columns from the middle table have to be referenced in the select list.

雖然聯結條件通常擁有相等比較 (=),您也可以指定其他的比較或關聯式運算子,就如同其他述詞一樣。Although join conditions usually have equality comparisons (=), other comparison or relational operators can be specified, as can other predicates. 如需詳細資訊,請參閱比較運算子 (Transact-SQL)WHERE (Transact-SQL)For more information, see Comparison Operators (Transact-SQL) and WHERE (Transact-SQL).

SQL ServerSQL Server 處理聯結時,查詢引擎將從多種可能性選擇最有效率的處理聯結方式。When SQL ServerSQL Server processes joins, the query engine chooses the most efficient method (out of several possibilities) of processing the join. 不同聯結的實際執行可能會使用許多不同的最佳化方式,因此無法可靠地預測。The physical execution of various joins can use many different optimizations and therefore cannot be reliably predicted.

用於聯結條件的資料行,並不需要擁有相同的名稱或相同的資料類型。Columns used in a join condition are not required to have the same name or be the same data type. 不過,如果資料類型不一致,則必須是相容的類型,或是 SQL 可用隱含方式轉換的類型。However, if the data types are not identical, they must be compatible, or be types that SQL Server can implicitly convert. 若無法以隱含方式轉換資料類型,聯結條件就必須使用 CAST 函數來明確轉換資料類型。If the data types cannot be implicitly converted, the join condition must explicitly convert the data type using the CAST function. 如需有關隱含及明確轉換的詳細資訊,請參閱資料類型轉換 (資料庫引擎)For more information about implicit and explicit conversions, see Data Type Conversion (Database Engine).

大多數使用聯結的查詢都可使用子查詢重新改寫 (查詢巢狀於另一個查詢之內),而大多數的子查詢也可重新改寫成聯結。Most queries using a join can be rewritten using a subquery (a query nested within another query), and most subqueries can be rewritten as joins. 如需有關子查詢的詳細資訊,請參閱子查詢For more information about subqueries, see Subqueries.

注意

您無法直接在 ntext、text 或 image 資料行上聯結資料表。Tables cannot be joined directly on ntext, text, or image columns. 不過,可以藉由使用 SUBSTRING,以間接方式在 ntext、text 或 image 資料行上聯結資料表。However, tables can be joined indirectly on ntext, text, or image columns by using SUBSTRING.
例如,SELECT * FROM t1 JOIN t2 ON SUBSTRING(t1.textcolumn, 1, 20) = SUBSTRING(t2.textcolumn, 1, 20) 會對資料表 t1 和 t2 中每個 text 資料行的前 20 個字元執行兩個資料表的內部聯結。For example, SELECT * FROM t1 JOIN t2 ON SUBSTRING(t1.textcolumn, 1, 20) = SUBSTRING(t2.textcolumn, 1, 20) performs a two-table inner join on the first 20 characters of each text column in tables t1 and t2.
此外,另一個比較兩個資料表之 ntext 或 text 資料行的可能方式,就是使用 WHERE 子句來比較資料行的長度,例如:WHERE DATALENGTH(p1.pr_info) = DATALENGTH(p2.pr_info)In addition, another possibility for comparing ntext or text columns from two tables is to compare the lengths of the columns with a WHERE clause, for example: WHERE DATALENGTH(p1.pr_info) = DATALENGTH(p2.pr_info)

認識巢狀迴圈聯結Understanding Nested Loops joins

如果一個聯結輸入相當小 (少於 10 個資料列),但另一個聯結輸入相當大而且在聯結資料行建有索引的話,索引巢狀迴圈是最快速的聯結作業,因為它們需要的 I/O 最少,需要的比較也最少。If one join input is small (fewer than 10 rows) and the other join input is fairly large and indexed on its join columns, an index nested loops join is the fastest join operation because they require the least I/O and the fewest comparisons.

巢狀迴圈聯結 (也稱為「巢狀反覆運算」 ) 會使用一個聯結輸入作為外部輸入資料表 (在圖形化執行計劃中顯示為上方輸入),另一個聯結作為內部 (下方) 輸入資料表。The nested loops join, also called nested iteration, uses one join input as the outer input table (shown as the top input in the graphical execution plan) and one as the inner (bottom) input table. 外部迴圈會逐列消耗外部輸入資料表。The outer loop consumes the outer input table row by row. 內部迴圈 (針對外部每一列各執行一次) 會在內部輸入資料表中搜尋符合的資料列。The inner loop, executed for each outer row, searches for matching rows in the inner input table.

在最簡單的情況下,搜尋會掃描整個資料表或索引;這稱為「單純巢狀迴圈聯結」 。In the simplest case, the search scans an entire table or index; this is called a naive nested loops join. 如果搜尋會利用索引,就稱為「索引巢狀迴圈聯結」 。If the search exploits an index, it is called an index nested loops join. 如果索引是隨著查詢計劃一起建立 (並且在查詢完成時終結),則稱為「暫存索引巢狀迴圈聯結」 。If the index is built as part of the query plan (and destroyed upon completion of the query), it is called a temporary index nested loops join. 「查詢最佳化工具」會將所有這些變化都列入考慮。All these variants are considered by the Query Optimizer.

如果外部輸入相當小,內部輸入已預先建立索引而且很大的話,巢狀迴圈聯結會特別有效率。A nested loops join is particularly effective if the outer input is small and the inner input is preindexed and large. 在許多小型交易中 (如只影響一小組資料列的交易),索引巢狀迴圈聯結比合併聯結與雜湊聯結要好得多。In many small transactions, such as those affecting only a small set of rows, index nested loops joins are superior to both merge joins and hash joins. 然而在大型查詢中,巢狀迴圈聯結多半不是最佳選擇。In large queries, however, nested loops joins are often not the optimal choice.

當巢狀迴圈聯結運算子的 OPTIMIZED 屬性設定為 True 時,代表會在內部資料表過大時使用巢狀迴圈聯結 (或稱批次排序) 來將 I/O 最小化,而不論資料表平行與否。When the OPTIMIZED attribute of a Nested Loops join operator is set to True, it means that an Optimized Nested Loops (or Batch Sort) is used to minimize I/O when the inner side table is large, regardless of it being parallelized or not. 因為排序本身是隱藏作業,所以在分析執行計劃時,此最佳化在指定計劃內可能不會很明顯。The presence of this optimization in a given plan may not be very obvious when analyzing an execution plan, given the sort itself is a hidden operation. 但如果仔細在計劃 XML 中尋找屬性 OPTIMIZED 的話,能看出來巢狀迴圈聯結可能在嘗試重新排序輸入資料列來改善 I/O 效能。But by looking in the plan XML for the attribute OPTIMIZED, this indicates the Nested Loops join may try to reorder the input rows to improve I/O performance.

認識合併聯結Understanding Merge joins

如果兩個聯結輸入都不小,而且依聯結資料行排序 (例如,是由掃描排序的索引所取得) 的話,合併聯結就是最快速的聯結作業。If the two join inputs are not small but are sorted on their join column (for example, if they were obtained by scanning sorted indexes), a merge join is the fastest join operation. 如果兩個聯結輸入都相當大,而且兩個輸入的大小類似,先行排序再進行的合併聯結所提供的效能大致類似雜湊聯結。If both join inputs are large and the two inputs are of similar sizes, a merge join with prior sorting and a hash join offer similar performance. 然而,如果兩個輸入的大小差異極大的話,雜湊聯結作業多半快得多。However, hash join operations are often much faster if the two input sizes differ significantly from each other.

合併聯結需要在合併資料行上將兩項輸入都加以排序,這是由聯結述詞的相等 (ON) 子句來定義的。The merge join requires both inputs to be sorted on the merge columns, which are defined by the equality (ON) clauses of the join predicate. 查詢最佳化工具通常會掃描索引 (如果適當的資料行集合建有索引的話),或在合併聯結下放置排序運算子。The query optimizer typically scans an index, if one exists on the proper set of columns, or it places a sort operator below the merge join. 在極少的情況下,可以有多個相等子句,但只會從部份可用的相等子句中取得合併資料行。In rare cases, there may be multiple equality clauses, but the merge columns are taken from only some of the available equality clauses.

因為每個輸入都會經過排序,所以合併聯結運算子會從每個輸入各取得一列資料列,然後加以比較。Because each input is sorted, the Merge Join operator gets a row from each input and compares them. 例如,對於內部聯結作業,會傳回是否相等的資料列。For example, for inner join operations, the rows are returned if they are equal. 如果不相等,就丟棄數值較小的資料列,再從該輸入取得另一個資料列。If they are not equal, the lower-value row is discarded and another row is obtained from that input. 這個處理程序會一再重複,直到處理過所有資料列為止。This process repeats until all rows have been processed.

合併聯結作業可能是一般的,也可能是多對多的作業。The merge join operation may be either a regular or a many-to-many operation. 多對多合併聯結會使用暫存資料表來儲存資料列。A many-to-many merge join uses a temporary table to store rows. 如果每個輸入有重複的值,在處理其他輸入的每個重複項時,必須將輸入之一倒回重複項的開始。If there are duplicate values from each input, one of the inputs will have to rewind to the start of the duplicates as each duplicate from the other input is processed.

如果出現殘餘述詞的話,滿足合併述詞的所有資料列都會計算殘餘述詞,只有滿足的資料列才會傳回。If a residual predicate is present, all rows that satisfy the merge predicate evaluate the residual predicate, and only those rows that satisfy it are returned.

合併聯結本身非常快,但如果需要排序作業的話,有時候代價會很高。Merge join itself is very fast, but it can be an expensive choice if sort operations are required. 但如果資料量很大,而且可以從現有的 B-tree 索引取得已排序的預期資料,那麼合併聯結往往是可用的最快速聯結演算法。However, if the data volume is large and the desired data can be obtained presorted from existing B-tree indexes, merge join is often the fastest available join algorithm.

認識雜湊聯結Understanding Hash joins

雜湊聯結可以有效率地處理大型、未排序、無索引的輸入。Hash joins can efficiently process large, unsorted, nonindexed inputs. 它們在做為複雜查詢的中繼結果方面很有用,因為:They are useful for intermediate results in complex queries because:

  • 中繼結果沒有索引 (除非明確地儲存到磁碟,然後建立索引),而且通常產生時也不會做適當的排序供查詢計畫的下一個作業使用。Intermediate results are not indexed (unless explicitly saved to disk and then indexed) and often are not suitably sorted for the next operation in the query plan.
  • 查詢最佳化工具只估計中繼結果的大小。Query optimizers estimate only intermediate result sizes. 因為複雜查詢的估計值可能非常不準確,所以處理中繼結果的演算法不僅必須要有效率,而且萬一中繼結果顯著大於預期時,它的效能還不得惡化得太明顯。Because estimates can be very inaccurate for complex queries, algorithms to process intermediate results not only must be efficient, but also must degrade gracefully if an intermediate result turns out to be much larger than anticipated.

雜湊聯結允許少使用反正規化。The hash join allows reductions in the use of denormalization. 反正規化一般是利用減少聯結作業,來達成較佳的效能,但它會有備援性的危險,例如不一致的更新。Denormalization is typically used to achieve better performance by reducing join operations, in spite of the dangers of redundancy, such as inconsistent updates. 雜湊聯結降低了反正規化的需要。Hash joins reduce the need to denormalize. 雜湊聯結讓垂直分割 (以不同的檔案或索引來呈現一個資料表中的資料行群組) 可以成為實體資料庫設計可實行的選項。Hash joins allow vertical partitioning (representing groups of columns from a single table in separate files or indexes) to become a viable option for physical database design.

雜湊聯結有兩個輸入:建置輸入與探查輸入。The hash join has two inputs: the build input and probe input. 查詢最佳化工具會指派這些角色,使兩個輸入中比較小的一個做為建置輸入。The query optimizer assigns these roles so that the smaller of the two inputs is the build input.

雜湊聯結用於許多種集合比對作業:內部聯結;左方、右方與完整外部聯結;左方、右方半聯結;交集;聯合;與差異。Hash joins are used for many types of set-matching operations: inner join; left, right, and full outer join; left and right semi-join; intersection; union; and difference. 此外,雜湊聯結的變體還可以執行移除重複項及進行群組的作業,例如 SUM(salary) GROUP BY departmentMoreover, a variant of the hash join can do duplicate removal and grouping, such as SUM(salary) GROUP BY department. 這些修改方式只使用一個輸入,兼做組建與探查角色。These modifications use only one input for both the build and probe roles.

下列章節描述不同類型的雜湊聯結:In-Memory 雜湊聯結、寬限雜湊聯結和遞迴雜湊聯結。The following sections describe different types of hash joins: in-memory hash join, grace hash join, and recursive hash join.

In-Memory 雜湊聯結In-Memory Hash Join

雜湊聯結會先掃描或計算整個建置輸入,然後在記憶體中建立雜湊表。The hash join first scans or computes the entire build input and then builds a hash table in memory. 根據以雜湊鍵計算所得的雜湊值,將每一列插入雜湊桶中。Each row is inserted into a hash bucket depending on the hash value computed for the hash key. 如果整個建置輸入小於可用記憶體,就可以將所有資料列都插入雜湊表。If the entire build input is smaller than the available memory, all rows can be inserted into the hash table. 這個組建階段後跟著探查階段。This build phase is followed by the probe phase. 這時候會以逐列方式掃描或計算整個探查輸入,針對每個探查列計算雜湊鍵值、掃描對應的雜湊桶,並產生符合項目。The entire probe input is scanned or computed one row at a time, and for each probe row, the hash key's value is computed, the corresponding hash bucket is scanned, and the matches are produced.

寬限雜湊聯結Grace Hash Join

如果記憶體放不下建置輸入,就會以幾個步驟進行雜湊聯結。If the build input does not fit in memory, a hash join proceeds in several steps. 這就是所謂的寬限雜湊聯結。This is known as a grace hash join. 每個步驟分別有組建階段與探查階段。Each step has a build phase and probe phase. 最開始,會消耗整個組建與探查輸入,並分割 (對雜湊鍵使用雜湊函數) 成多個檔案。Initially, the entire build and probe inputs are consumed and partitioned (using a hash function on the hash keys) into multiple files. 對雜湊鍵使用雜湊函數會保證任何兩個聯結資料錄一定在同一對檔案中。Using the hash function on the hash keys guarantees that any two joining records must be in the same pair of files. 因此,本來是聯結兩個大型輸入的工作,變成是做好幾個相同的工作,但每個工作都變得比較小。Therefore, the task of joining two large inputs has been reduced to multiple, but smaller, instances of the same tasks. 然後再對每對分割檔案套用雜湊聯結。The hash join is then applied to each pair of partitioned files.

遞迴雜湊聯結Recursive Hash Join

如果建置輸入太大,使得標準外部聯結排序的輸入需要多個合併層級,就需要多個分割步驟與多個分割層級。If the build input is so large that inputs for a standard external merge would require multiple merge levels, multiple partitioning steps and multiple partitioning levels are required. 如果只有一些分割很大,就僅對這部分使用額外的分割步驟。If only some of the partitions are large, additional partitioning steps are used for only those specific partitions. 為了使所有分割步驟都盡可能達到最快的速度,所以使用大型的非同步 I/O 作業,使單一執行緒可以讓多部磁碟機保持忙碌。In order to make all partitioning steps as fast as possible, large, asynchronous I/O operations are used so that a single thread can keep multiple disk drives busy.

注意

如果建置輸入只比可用記憶體稍微大一點,就會將 In-memory 雜湊聯結和寬限雜湊聯結的元素合併到單一步驟中,而產生混合型雜湊聯結。If the build input is only slightly larger than the available memory, elements of in-memory hash join and grace hash join are combined in a single step, producing a hybrid hash join.

在最佳化期間不一定能夠判斷要使用哪一個雜湊聯結。It is not always possible during optimization to determine which hash join is used. 因此,SQL Server 會先從使用 In-Memory 雜湊聯結開始,然後再依據建置輸入的大小,逐漸轉換成寬限雜湊聯結和遞迴雜湊聯結。Therefore, SQL Server starts by using an in-memory hash join and gradually transitions to grace hash join, and recursive hash join, depending on the size of the build input.

如果「查詢最佳化工具」誤判兩個輸入中哪個較小,因而選錯建置輸入,則建置和探查角色會動態地顛倒。If the Query Optimizer anticipates wrongly which of the two inputs is smaller and, therefore, should have been the build input, the build and probe roles are reversed dynamically. 雜湊聯結會確定使用溢位較小的檔案做為建置輸入。The hash join makes sure that it uses the smaller overflow file as build input. 這項技術稱為角色反轉。This technique is called role reversal. 當至少有一個雜湊聯結蔓延至磁碟之後,角色反轉就會在雜湊聯結內發生。Role reversal occurs inside the hash join after at least one spill to the disk.

注意

角色反轉的發生與任何查詢提示或結構無關。Role reversal occurs independent of any query hints or structure. 角色反轉不會顯示在查詢計畫中;當它發生時,使用者就會看到。Role reversal does not display in your query plan; when it occurs, it is transparent to the user.

雜湊釋出Hash Bailout

雜湊釋出一詞有時候是用來描述寬限雜湊聯結或遞迴雜湊聯結。The term hash bailout is sometimes used to describe grace hash joins or recursive hash joins.

注意

遞迴雜湊聯結或 Hash Bailout 會導致伺服器的效能降低。Recursive hash joins or hash bailouts cause reduced performance in your server. 如果您在追蹤內看到許多 「雜湊警告」事件,請在要聯結的資料行上更新統計資料。If you see many Hash Warning events in a trace, update statistics on the columns that are being joined.

如需有關雜湊釋出的詳細資訊,請參閱雜湊警告事件類別For more information about hash bailout, see Hash Warning Event Class.

了解自適性聯結Understanding Adaptive joins

批次模式自適性聯結可讓選擇的雜湊聯結巢狀迴圈聯結方法,延後到已掃描的第一個輸入之後Batch mode Adaptive Joins enable the choice of a Hash Join or Nested Loops join method to be deferred until after the first input has been scanned. 自適性聯結運算子定義的閾值是用於決定何時要切換至巢狀迴圈計劃。The Adaptive Join operator defines a threshold that is used to decide when to switch to a Nested Loops plan. 因此,查詢計劃可在執行期間動態切換至較佳的聯結策略,而不需經過重新編譯。A query plan can therefore dynamically switch to a better join strategy during execution without having to be recompiled.

提示

經常在小型和大型聯結輸入掃描間變動的工作負載,由此功能獲益最大。Workloads with frequent oscillations between small and large join input scans will benefit most from this feature.

執行階段決策以下列步驟為基礎:The runtime decision is based on the following steps:

  • 如果組建聯結輸入的資料列計數小到巢狀迴圈聯結會比雜湊聯結更佳的情況,則計劃就會切換成巢狀迴圈演算法。If the row count of the build join input is small enough that a Nested Loops join would be more optimal than a Hash join, the plan switches to a Nested Loops algorithm.
  • 如果組建聯結輸入超過特定的資料列計數閾值,則不會切換,且您的計劃會繼續執行雜湊聯結。If the build join input exceeds a specific row count threshold, no switch occurs and your plan continues with a Hash join.

下列查詢用來說明自適性聯結範例:The following query is used to illustrate an Adaptive Join example:

SELECT [fo].[Order Key], [si].[Lead Time Days], [fo].[Quantity]
FROM [Fact].[Order] AS [fo]
INNER JOIN [Dimension].[Stock Item] AS [si]
       ON [fo].[Stock Item Key] = [si].[Stock Item Key]
WHERE [fo].[Quantity] = 360;

此查詢會傳回 336 個資料列。The query returns 336 rows. 透過啟用即時查詢統計資料,會顯示下列計劃:Enabling Live Query Statistics displays the following plan:

查詢結果 336 個資料列

在計劃中,請注意:In the plan, note the following:

  1. 使用了資料行存放區索引掃描,為雜湊聯結建置階段提供資料列。A columnstore index scan used to provide rows for the Hash join build phase.
  2. 新的自適性聯結運算子。The new Adaptive Join operator. 此運算子定義的閾值是用於決定何時要切換至巢狀迴圈計劃。This operator defines a threshold that is used to decide when to switch to a Nested Loops plan. 本例中的閾值是 78 個資料列。For this example, the threshold is 78 rows. 凡是 >= 78 個資料列的計劃都會使用雜湊聯結。Anything with >= 78 rows will use a Hash join. 如果小於該閾值,則會使用巢狀迴圈聯結。If less than the threshold, a Nested Loops join will be used.
  3. 因為查詢傳回 336 個資料列 (超過閾值),所以第二個分支會表示標準雜湊聯結作業的探查階段。Since the query returns 336 rows, this exceeded the threshold and so the second branch represents the probe phase of a standard Hash join operation. 請注意,即時查詢統計資料會顯示流經運算子的資料列,本例中為 "672 of 672"。Notice that Live Query Statistics shows rows flowing through the operators - in this case "672 of 672".
  4. 而最後一個分支是叢集索引搜尋,供未超過閾值的巢狀迴圈聯結所使用。And the last branch is a Clustered Index Seek for use by the Nested Loops join had the threshold not been exceeded. 請注意,我們看到的顯示是"0 of 336" 資料列 (分支未使用)。Notice that we see "0 of 336" rows displayed (the branch is unused).

現在對比使用相同查詢的計劃,但當 Quantity 值在資料表中只有一個資料列時:Now contrast the plan with the same query, but when the Quantity value only has one row in the table:

SELECT [fo].[Order Key], [si].[Lead Time Days], [fo].[Quantity]
FROM [Fact].[Order] AS [fo]
INNER JOIN [Dimension].[Stock Item] AS [si]
       ON [fo].[Stock Item Key] = [si].[Stock Item Key]
WHERE [fo].[Quantity] = 361;

查詢會傳回一個資料列。The query returns one row. 啟用即時查詢統計資料會顯示下列計劃:Enabling Live Query Statistics displays the following plan:

查詢結果一個資料列

在計劃中,請注意:In the plan, note the following:

  • 因為傳回一個資料列,現在叢集索引搜尋中有資料列通過。With one row returned, the Clustered Index Seek now has rows flowing through it.
  • 而且,因為雜湊聯結建置階段並未繼續,所以不會有任何資料列通過第二個分支。And since the Hash Join build phase did not continue, there are no rows flowing through the second branch.

自適性聯結備註Adaptive Join remarks

自調性聯結導入的記憶體需求,會比索引巢狀的迴圈聯結相等計劃更高。Adaptive joins introduce a higher memory requirement than an indexed Nested Loops Join equivalent plan. 系統會以巢狀迴圈有如雜湊聯結一般的方式要求額外的記憶體。The additional memory is requested as if the Nested Loops was a Hash join. 在時快時慢作業的建置階段與巢狀迴圈資料流相等聯結的比較中,會另外產生額外負荷。There is also overhead for the build phase as a stop-and-go operation versus a Nested Loops streaming equivalent join. 加上額外的成本,隨組建輸入資料列計數浮動的案例而變動。With that additional cost comes flexibility for scenarios where row counts may fluctuate in the build input.

批次模式自適性聯結適合初次執行的陳述式使用,而且一旦編譯,連續執行仍會根據編譯的自適性聯結閾值和流經外部輸入建置階段的執行階段資料列聯結自動調整。Batch mode Adaptive joins work for the initial execution of a statement, and once compiled, consecutive executions will remain adaptive based on the compiled Adaptive Join threshold and the runtime rows flowing through the build phase of the outer input.

如果自適性聯結切換成巢狀迴圈作業,便會使用已由雜湊聯結組建讀取的資料列。If an Adaptive Join switches to a Nested Loops operation, it uses the rows already read by the Hash Join build. 運算子「不會」 再次重新讀取外部參考資料列。The operator does not re-read the outer reference rows again.

追蹤自適性聯結活動Tracking Adaptive join activity

自適性聯結運算子有下列計劃運算子屬性:The Adaptive Join operator has the following plan operator attributes:

計劃屬性Plan attribute DescriptionDescription
AdaptiveThresholdRowsAdaptiveThresholdRows 顯示從雜湊聯結切換至巢狀迴圈聯結所使用的閾值。Shows the threshold use to switch from a hash join to nested loop join.
EstimatedJoinTypeEstimatedJoinType 可能的聯結類型。What the join type is likely to be.
ActualJoinTypeActualJoinType 在實際的計劃中,顯示根據閾值最後選擇的聯結演算法。In an actual plan, shows what join algorithm was ultimately chosen based on the threshold.

評估計劃會顯示自適性聯結計劃圖形,以及定義的自適性聯結閾值和預估的聯結類型。The estimated plan shows the Adaptive Join plan shape, along with a defined Adaptive Join threshold and estimated join type.

提示

查詢存放區擷取並可強制執行批次模式自適性聯結計劃。Query Store captures and is able to force a batch mode Adaptive Join plan.

符合自適性聯結的陳述式Adaptive join eligible statements

讓邏輯聯結符合批次模式自適性聯結有幾個條件:A few conditions make a logical join eligible for a batch mode Adaptive Join:

  • 資料庫相容性層級為 140 以上。The database compatibility level is 140 or higher.
  • 查詢是 SELECT 陳述式 (資料修改陳述式目前不適合)。The query is a SELECT statement (data modification statements are currently ineligible).
  • 聯結能夠由索引巢狀迴圈聯結或雜湊聯結實體演算法執行。The join is eligible to be executed both by an indexed Nested Loops join or a Hash join physical algorithm.
  • 雜湊聯結使用批次模式,不論是透過存在於整體查詢中的資料行存放區索引,或是由聯結直接參考的資料行存放區索引資料表。The Hash join uses Batch mode - either through the presence of a Columnstore index in the query overall or a Columnstore indexed table being referenced directly by the join.
  • 產生的巢狀迴圈聯結和雜湊聯結替代解決方案應該有相同的第一個子系 (外部參考)。The generated alternative solutions of the Nested Loops join and Hash join should have the same first child (outer reference).

自適性閾值資料列Adaptive threshold rows

下圖顯示雜湊聯結成本與巢狀迴圈聯結替代方案成本之間的交集範例。The following chart shows an example intersection between the cost of a Hash join versus the cost of a Nested Loops join alternative. 在此交集點決定的閾值,會隨之決定用於聯結作業的實際演算法。At this intersection point, the threshold is determined that in turn determines the actual algorithm used for the join operation.

聯結閾值

停用自適性聯結而不變更相容性層級Disabling Adaptive joins without changing the compatibility level

您可以在資料庫或陳述式的範圍停用自適性聯結,同時仍將資料庫相容性層級維持在 140 以上。Adaptive joins can be disabled at the database or statement scope while still maintaining database compatibility level 140 and higher.
若要針對源自資料庫的所有查詢執行停用自適性聯結,請在適用資料庫的內容中執行下列程式碼:To disable Adaptive joins for all query executions originating from the database, execute the following within the context of the applicable database:

-- SQL Server 2017
ALTER DATABASE SCOPED CONFIGURATION SET DISABLE_BATCH_MODE_ADAPTIVE_JOINS = ON;

-- Azure SQL Database, SQL Server 2019 and higher
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ADAPTIVE_JOINS = OFF;

啟用時,此設定在 sys.database_scoped_configurations 中會顯示為已啟用。When enabled, this setting will appear as enabled in sys.database_scoped_configurations. 若要針對源自資料庫的所有查詢執行重新啟用自適性聯結,請在適用資料庫的內容中執行下列程式碼:To re-enable adaptive joins for all query executions originating from the database, execute the following within the context of the applicable database:

-- SQL Server 2017
ALTER DATABASE SCOPED CONFIGURATION SET DISABLE_BATCH_MODE_ADAPTIVE_JOINS = OFF;

-- Azure SQL Database, SQL Server 2019 and higher
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ADAPTIVE_JOINS = ON;

您也可以將 DISABLE_BATCH_MODE_ADAPTIVE_JOINS 指定為 USE HINT 查詢提示,以針對特定查詢停用自適性聯結。Adaptive joins can also be disabled for a specific query by designating DISABLE_BATCH_MODE_ADAPTIVE_JOINS as a USE HINT query hint. 例如:For example:

SELECT s.CustomerID,
       s.CustomerName,
       sc.CustomerCategoryName
FROM Sales.Customers AS s
LEFT OUTER JOIN Sales.CustomerCategories AS sc
       ON s.CustomerCategoryID = sc.CustomerCategoryID
OPTION (USE HINT('DISABLE_BATCH_MODE_ADAPTIVE_JOINS')); 

注意

USE HINT 查詢提示的優先順序高於資料庫範圍設定或追蹤旗標設定。A USE HINT query hint takes precedence over a database scoped configuration or trace flag setting.

Null 值與聯結Null Values and Joins

當資料表的資料行中有 Null 值時,Null 值彼此並不相符。When there are null values in the columns of the tables being joined, the null values do not match each other. 若所要聯結之其中一個資料表的資料行中出現 Null 值,將只能藉由使用外部聯結來傳回該值 (除非 WHERE 子句會排除 Null 值)。The presence of null values in a column from one of the tables being joined can be returned only by using an outer join (unless the WHERE clause excludes null values).

以下列出兩個資料表,而每個資料表都有 NULL 值出現在參與於聯結的資料行中:Here are two tables that each have NULL in the column that will participate in the join:

table1                          table2
a           b                   c            d
-------     ------              -------      ------
      1        one                 NULL         two
   NULL      three                    4        four
      4      join4

比較資料行 a 與資料行 c 之數值的聯結並不會針對擁有 NULL 值的資料行進行比對:A join that compares the values in column a against column c does not get a match on the columns that have values of NULL:

SELECT *
FROM table1 t1 JOIN table2 t2
   ON t1.a = t2.c
ORDER BY t1.a;
GO

唯一傳回的只有在資料行 a 與 c 中有 4 的資料列:Only one row with 4 in column a and c is returned:

a           b      c           d      
----------- ------ ----------- ------ 
4           join4  4           four   

(1 row(s) affected)

基底資料表傳回的 Null 值,也很難與外部聯結傳回的 Null 值作區分。Null values returned from a base table are also difficult to distinguish from the null values returned from an outer join. 例如,下列 SELECT 陳述式將對這兩個資料表進行左方外部聯結:For example, the following SELECT statement does a left outer join on these two tables:

SELECT *
FROM table1 t1 LEFT OUTER JOIN table2 t2
   ON t1.a = t2.c
ORDER BY t1.a;
GO

以下為結果集:Here is the result set.

a           b      c           d      
----------- ------ ----------- ------ 
NULL        three  NULL        NULL 
1           one    NULL        NULL 
4           join4  4           four   

(3 row(s) affected)

此結果並無法讓您輕鬆地區分資料中的 NULL 與代表聯結失敗的 NULL。The results do not make it easy to distinguish a NULL in the data from a NULL that represents a failure to join. 當 Null 值出現在被聯結的資料中時,您最好透過使用一般聯結將它們從結果中刪除。When null values are present in data being joined, it is usually preferable to omit them from the results by using a regular join.

另請參閱See Also

執行程序邏輯和實體運算子參考 Showplan Logical and Physical Operators Reference
比較運算子 (Transact-SQL) Comparison Operators (Transact-SQL)
資料類型轉換 (資料庫引擎) Data Type Conversion (Database Engine)
子查詢 Subqueries
自適性聯結Adaptive Joins