Hi,
I understand the concept of an incremental load to a data lake, with each day's data stored as a separate file in the data lake storage.
My question is how to handle records from the source that are updated, rather than inserted, during the incremental load to data lake storage.
For example, say I have a record from the requests table in an on-premises SQL Server database with the status open.
When the ADF pipeline runs today, this record is stored in the data lake storage in a CSV file.
Tomorrow the status of the record changes to pending, and when the ADF pipeline runs again, the modified record is read and copied to a new file in the data lake storage with the status pending.
Now in the data lake storage, I have 2 files containing the same request record but with different statuses.
How do I handle such scenarios in data lake storage so that I end up with a single record, without duplication?
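
To make the expected outcome concrete, here is a minimal sketch of the result I am after, not a claim about how ADF or the data lake should do it. The column names (request_id, status, modified_date), the sample rows, and the merge_latest helper are all assumptions made up for illustration; it simply keeps the most recently modified row per request across the daily files.

```python
import csv
import io

# Hypothetical contents of the two daily files (column names are assumptions).
DAY1_CSV = (
    "request_id,status,modified_date\n"
    "101,open,2024-01-01\n"
    "102,open,2024-01-01\n"
)
DAY2_CSV = (
    "request_id,status,modified_date\n"
    "101,pending,2024-01-02\n"
)


def merge_latest(csv_texts, key="request_id", version="modified_date"):
    """Keep one row per key: the row with the greatest version value.

    ISO-formatted dates (YYYY-MM-DD) compare correctly as plain strings.
    """
    latest = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            current = latest.get(row[key])
            # Replace the stored row when this one is as new or newer.
            if current is None or row[version] >= current[version]:
                latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


merged = merge_latest([DAY1_CSV, DAY2_CSV])
for row in merged:
    print(row["request_id"], row["status"])
```

With the sample data above, request 101 should appear once with status pending, and request 102 should remain with status open.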