
MicheleAndrico-8674 asked MarkKromer-MSFT answered

How can I incrementally load data into Delta Lake with Azure Data Factory?

Hello everyone,

I'm trying to incrementally ingest data from a Delta table (Delta Lake format) stored in Azure Data Lake Storage Gen2 into another Delta table in a different Azure Data Lake Storage Gen2 account, using an Azure Data Factory data flow.

Using a Filter transformation, I retrieve only the new data to ingest (based on technical fields), and, as the attached snapshot shows, the source data is correctly filtered in the data flow.
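The question doesn't name the technical fields used by the filter. A common pattern is a high-water mark on a last-modified timestamp, so the minimal sketch below assumes a hypothetical `last_modified` column: it keeps rows newer than the previous load and then advances the watermark. The function names are illustrative only; in the data flow itself this logic would live in the Filter transformation's expression.

```python
from datetime import datetime

# Hypothetical source rows; "last_modified" stands in for the unnamed technical fields.
def filter_incremental(rows, watermark):
    """Keep only rows modified strictly after the last successful load."""
    return [r for r in rows if r["last_modified"] > watermark]

def next_watermark(rows, watermark):
    """Advance the watermark to the newest timestamp loaded, or keep the old one."""
    return max((r["last_modified"] for r in rows), default=watermark)

rows = [
    {"id": 1, "value": "a", "last_modified": datetime(2021, 3, 1)},
    {"id": 2, "value": "b", "last_modified": datetime(2021, 3, 5)},
    {"id": 3, "value": "c", "last_modified": datetime(2021, 3, 9)},
]
last_load = datetime(2021, 3, 4)
new_rows = filter_incremental(rows, last_load)
print([r["id"] for r in new_rows])          # [2, 3]
print(next_watermark(new_rows, last_load))  # 2021-03-09 00:00:00
```

Persisting the returned watermark between runs (for example as a pipeline parameter value) is what makes the next load incremental.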

The data flow produces a new version of the Delta table, but the newly created Parquet files contain both updated and unchanged data. Is it possible to create new table versions that contain only new or updated data?

[Screenshot: data flow monitor (87355-data-flow-monitor-snapshot.png)]


azure-data-factory

1 Answer

MarkKromer-MSFT answered

Just use an Alter Row transformation with Update/Upsert rules, and allow the matching update method in the Delta sink settings. Delta Lake maintains the ledger of table versions, so you can then access past versions of the table through the Delta source transformation using "time travel" options such as a version number or a timestamp.
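Delta Lake records every write as a numbered commit in its transaction log (the `_delta_log` folder), and a reader rebuilds any version by replaying the log's "add" and "remove" file actions. The sketch below is a simplified in-memory model of that idea, not the real on-disk format; `ToyDeltaTable` and its methods are hypothetical names. Reading an older version here plays the role of time travel (in Spark, the `versionAsOf` read option).

```python
class ToyDeltaTable:
    """Toy model of a Delta table's transaction log (not the real on-disk format)."""

    def __init__(self):
        self.files = {}  # file name -> rows; stands in for immutable Parquet files
        self.log = []    # self.log[v] = list of actions committed as table version v

    def commit(self, actions, new_files=None):
        """Write any new files, then append one commit; returns its version number."""
        self.files.update(new_files or {})
        self.log.append(list(actions))
        return len(self.log) - 1

    def active_files(self, version=None):
        """Replay "add"/"remove" actions up to `version` (latest if None)."""
        last = len(self.log) - 1 if version is None else version
        active = set()
        for actions in self.log[: last + 1]:
            for action, name in actions:
                if action == "add":
                    active.add(name)
                elif action == "remove":
                    active.discard(name)
        return sorted(active)

    def read(self, version=None):
        """Return all rows visible at `version`, similar in spirit to versionAsOf."""
        return [row for name in self.active_files(version) for row in self.files[name]]


table = ToyDeltaTable()
v0 = table.commit([("add", "part-0")], {"part-0": [(1, "a"), (2, "b")]})
# An upsert of key 2 marks the old file as removed and adds a rewritten one.
v1 = table.commit([("remove", "part-0"), ("add", "part-1")],
                  {"part-1": [(1, "a"), (2, "B")]})
print(table.read())    # [(1, 'a'), (2, 'B')]
print(table.read(v0))  # [(1, 'a'), (2, 'b')]
```

Note that "part-0" is still present in `table.files` after the second commit: as in real Delta Lake, a removed file stays in storage (until a VACUUM) so that older versions remain readable.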


Thank you for the answer. I already tried your suggestion, but I still have a question: when a new table version is created, is it possible to avoid writing Parquet files that contain unchanged data already present in the Parquet files of older Delta table versions? For example, the old table version in the sink folder contains 10 million rows, only 10 rows are updated in the source table, and the new Parquet files created in the sink folder should contain only those 10 updated rows.

Thank you in advance


Not that I am aware of. The Delta Lake logging and version control handles that.
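To see why the new Parquet files also contain unchanged rows, here is a simplified model assuming Delta's classic copy-on-write behaviour (i.e. without features such as deletion vectors): Parquet files are immutable, so any file that contains at least one updated key is rewritten in full, and the old file is only marked as removed in the log. The function `upsert_copy_on_write` and the file names are hypothetical, for illustration only.

```python
def upsert_copy_on_write(files, updates):
    """Simplified copy-on-write upsert over immutable files.

    files:   dict of file name -> list of (key, value) rows in the current version
    updates: dict of key -> new value (updated or newly inserted rows)
    Returns (removed, added): names of files dropped from the new version, and
    the new files (name -> rows) written for it.
    """
    removed, added = [], {}
    remaining = dict(updates)
    for name, rows in files.items():
        if any(key in remaining for key, _ in rows):
            # A touched file is rewritten in full, unchanged rows included.
            new_rows = [(k, remaining.pop(k, v)) for k, v in rows]
            removed.append(name)
            added[f"{name}-v2"] = new_rows
    if remaining:
        # Keys not found in any file are genuine inserts and go to a fresh file.
        added["inserts-v2"] = sorted(remaining.items())
    return removed, added


files = {
    "part-0": [(i, "old") for i in range(5)],
    "part-1": [(i, "old") for i in range(5, 10)],
}
removed, added = upsert_copy_on_write(files, {3: "new", 42: "inserted"})
print(removed)                                         # ['part-0']
print({name: len(rows) for name, rows in added.items()})  # {'part-0-v2': 5, 'inserts-v2': 1}
```

In this model, updating one row rewrites all five rows of its file, while untouched files ("part-1") are simply carried over to the new version. Scaled to the 10-million-row example, the amount of data rewritten depends on how many files the 10 updated keys fall into, not on the total table size.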
