Merge Into (Delta Lake on Azure Databricks)
Merge a set of updates, insertions, and deletions based on a source table into a target Delta table.
MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [<time_travel_version>] [AS source_alias] ON <merge_condition> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ] [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ] [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
<matched_action> = DELETE | UPDATE SET * | UPDATE SET column1 = value1 [, column2 = value2 ...] <not_matched_action> = INSERT * | INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...]) <time_travel_version> = TIMESTAMP AS OF timestamp_expression | VERSION AS OF version
- There can be 1, 2, or 3
WHENclauses. Of these, at most 2 can be
WHEN MATCHEDclauses, and at most 1 can be
WHEN NOT MATCHEDclause.
WHEN NOT MATCHEDmust be the last clause.
WHEN MATCHEDclauses are executed when a source row matches a target table row based on the match condition. These clauses have the following semantics.
WHEN MATCHEDclauses can have at most on
mergeonly updates the specified columns of the matched target row. The
DELETEaction will delete the matched row.
WHEN MATCHEDclause can have an optional condition. If this clause condition exists, the
DELETEaction is executed for any matching source-target row pair row only when when the clause condition is true.
- If there are two
WHEN MATCHEDclauses, then they are evaluated in order they are specified (that is, the order of the clauses matter). The first clause must have a clause condition (otherwise the second clause will never be executed).
- If both
WHEN MATCHEDclauses have conditions and neither of the conditions are true for a matching source-target row pair, then the matched target row is left unchanged.
- To update all the columns of the target Delta table with the corresponding columns of the source dataset, use
UPDATE SET *. This is equivalent to
UPDATE SET col1 = source.col1 [, col2 = source.col2 ...]for all the columns of the target Delta table. Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.
WHEN NOT MATCHEDclause is executed when a source rows does not match any target row based on the match condition. This clause has the following semantics.
whenNotMatchedclause can only have the
INSERTaction. The new row is generated based on the specified column and corresponding expressions. All the columns in the target table do not need to be specified. For unspecified target columns,
NULLwill be inserted.
Prior to Databricks Runtime 6.6,
INSERTaction required all columns in the target table to be provided.
WHEN MATCHEDclause can have an optional condition. If the clause condition is present, a source row is inserted only if that condition is true for that row. Otherwise, the source column is ignored.
To insert all the columns of the target Delta table with the corresponding columns of the source dataset, use
INSERT *. This is equivalent to
INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...])for all the columns of the target Delta table. Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.
You can use
MERGE for complex operations like deduplicating data, upserting change data, applying SCD Type 2 operations, etc. See Merge examples for a few examples.