合并到(Azure Databricks 上的 Delta Lake)Merge Into (Delta Lake on Azure Databricks)

将基于源表的一组更新、插入和删除操作合并到目标增量表中。Merge a set of updates, insertions, and deletions based on a source table into a target Delta table.

MERGE INTO [db_name.]target_table [AS target_alias]
USING [db_name.]source_table [<time_travel_version>] [AS source_alias]
ON <merge_condition>
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ]  THEN <not_matched_action> ]

wherewhere

<matched_action>  =
  DELETE  |
  UPDATE SET *  |
  UPDATE SET column1 = value1 [, column2 = value2 ...]

<not_matched_action>  =
  INSERT *  |
  INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])

<time_travel_version>  =
  TIMESTAMP AS OF timestamp_expression |
  VERSION AS OF version
  • 可以有任意数量的 WHEN MATCHEDWHEN NOT MATCHED 子句。There can be any number of WHEN MATCHED and WHEN NOT MATCHED clauses.

备注

在7.2 和更低 Databricks Runtime 中, MERGE 最多可以有2个 WHEN MATCHED 子句和最多1个 WHEN NOT MATCHED 子句。In Databricks Runtime 7.2 and below, MERGE can have at most 2 WHEN MATCHED clauses and at most 1 WHEN NOT MATCHED clause.

  • WHEN MATCHED 当源行与目标表行根据匹配条件匹配时,将执行子句。WHEN MATCHED clauses are executed when a source row matches a target table row based on the match condition. 这些子句具有以下语义。These clauses have the following semantics.
    • WHEN MATCHED 子句最多可以有 UPDATE 和一个 DELETE 操作。WHEN MATCHED clauses can have at most on UPDATE and one DELETE action. UPDATE中的操作 merge 仅更新匹配目标行的指定列。The UPDATE action in merge only updates the specified columns of the matched target row. DELETE操作将删除匹配的行。The DELETE action will delete the matched row.
    • 每个 WHEN MATCHED 子句都可以有一个可选条件。Each WHEN MATCHED clause can have an optional condition. 如果存在此子句条件,则 UPDATE DELETE 只有当子句条件为 true 时,才对任何匹配的源目标行对行执行或操作。If this clause condition exists, the UPDATE or DELETE action is executed for any matching source-target row pair row only when when the clause condition is true.
    • 如果有多个 WHEN MATCHED 子句,则按顺序对其进行计算, (即子句的顺序) 。If there are multiple WHEN MATCHED clauses, then they are evaluated in order they are specified (that is, the order of the clauses matter). WHEN MATCHED除最后一个子句之外的所有子句都必须有条件。All WHEN MATCHED clauses, except the last one, must have conditions.
    • 如果两个 WHEN MATCHED 子句都具有条件,并且两个条件都不满足匹配的源目标行对,则匹配的目标行将保持不变。If both WHEN MATCHED clauses have conditions and neither of the conditions are true for a matching source-target row pair, then the matched target row is left unchanged.
    • 若要使用源数据集的相应列更新目标增量表的所有列,请使用 UPDATE SET *To update all the columns of the target Delta table with the corresponding columns of the source dataset, use UPDATE SET *. UPDATE SET col1 = source.col1 [, col2 = source.col2 ...]对于目标增量表的所有列,这等效于。This is equivalent to UPDATE SET col1 = source.col1 [, col2 = source.col2 ...] for all the columns of the target Delta table. 因此,此操作假定源表与目标表中的列具有相同的列,否则,查询将引发分析错误。Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.
      • 启用自动架构迁移时,此行为会发生更改。This behavior changes when automatic schema migration is enabled. 有关详细信息,请参阅 自动架构演变See Automatic schema evolution for details.
  • WHEN NOT MATCHED 当源行与基于匹配条件的任何目标行都不匹配时,将执行子句。WHEN NOT MATCHED clauses are executed when a source row does not match any target row based on the match condition. 这些子句具有以下语义。These clauses have the following semantics.
    • WHEN NOT MATCHED 子句只能具有 INSERT 操作。WHEN NOT MATCHED clauses can only have the INSERT action. 新行是根据指定的列和对应的表达式生成的。The new row is generated based on the specified column and corresponding expressions. 不需要指定目标表中的所有列。All the columns in the target table do not need to be specified. 对于未指定的目标列, NULL 将插入。For unspecified target columns, NULL will be inserted.

      备注

      在6.5 和更低 Databricks Runtime 中,必须提供操作的目标表中的所有列 INSERTIn Databricks Runtime 6.5 and below, you must provide all columns in the target table for the INSERT action.

    • 每个 WHEN NOT MATCHED 子句都可以有一个可选条件。Each WHEN NOT MATCHED clause can have an optional condition. 如果子句条件存在,则只有当该行的条件为 true 时,才插入源行。If the clause condition is present, a source row is inserted only if that condition is true for that row. 否则,将忽略源列。Otherwise, the source column is ignored.

    • 如果有多个 WHEN NOT MATCHED 子句,则按顺序对其进行计算, (即子句的顺序) 。If there are multiple WHEN NOT MATCHED clauses, then they are evaluated in order they are specified (that is, the order of the clauses matter). WHEN NOT MATCHED除最后一个子句之外的所有子句都必须有条件。All WHEN NOT MATCHED clauses, except the last one, must have conditions.

    • 若要插入目标增量表的所有列以及源数据集的相应列,请使用 INSERT *To insert all the columns of the target Delta table with the corresponding columns of the source dataset, use INSERT *. INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...])对于目标增量表的所有列,这等效于。This is equivalent to INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...]) for all the columns of the target Delta table. 因此,此操作假定源表与目标表中的列具有相同的列,否则,查询将引发分析错误。Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.

      备注

      启用自动架构迁移时,此行为会发生更改。This behavior changes when automatic schema migration is enabled. 有关详细信息,请参阅 自动架构演变See Automatic schema evolution for details.

重要

MERGE如果源数据集的多个行匹配并尝试更新目标增量表的相同行,则操作可能会失败。A MERGE operation can fail if multiple rows of the source dataset match and attempt to update the same rows of the target Delta table. 根据合并的 SQL 语义,此类更新操作是不明确的,因为不确定应该使用哪个源行来更新匹配的目标行。According to the SQL semantics of merge, such an update operation is ambiguous as it is unclear which source row should be used to update the matched target row. 您可以预处理源表以消除多个匹配项的可能。You can preprocess the source table to eliminate the possibility of multiple matches. 请参阅 变更数据捕获示例-它将变更数据集预处理 (也就是说,源数据集) 在将更改应用到目标增量表之前,只保留每个密钥的最新更改。See the Change data capture example—it preprocesses the change dataset (that is, the source dataset) to retain only the latest change for each key before applying that change into the target Delta table.

备注

在 Databricks Runtime 7.3 LTS 及更高版本中,无条件删除匹配时允许多个匹配项 (,因为即使存在多个匹配项) ,无条件删除也不是不明确的。In Databricks Runtime 7.3 LTS and above, multiple matches are allowed when matches are unconditionally deleted (since unconditional delete is not ambiguous even if there are multiple matches).

示例Examples

可用于 MERGE 复杂的操作,例如删除重复数据、插入更改数据、应用 SCD 类型2操作等。有关一些示例,请参阅 合并示例You can use MERGE for complex operations like deduplicating data, upserting change data, applying SCD Type 2 operations, etc. See Merge examples for a few examples.