
BT-HM asked:

How to reset the change data capture information? (Azure synapse dataflow)

Hello,

I am using the ADLS Gen2 source with the "Enable change data capture" option set to true.

  • At what level is the information stored to determine which files have changed?

  • What do I need to change to initiate a new full load?

  • Does this feature work with parameterized sources? I would like to use it in a ForEach activity.

Tags: azure-data-factory, azure-synapse-analytics, azure-data-lake-storage

ShaikMaheer-MSFT answered:

Hi @BT-HM,

Thank you for posting your query on the Microsoft Q&A platform.

Q. At what level is the information stored to determine which files have changed?
When change data capture is enabled on the source transformation, Azure Data Factory remembers the files it processed, together with their last-modified times, from the previous run. On the next run it processes only files that are new to the folder or have been modified since. The video below walks through a detailed demo:
https://www.youtube.com/watch?v=Y9J5J2SRt5k
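Conceptually, the bookkeeping resembles a checkpoint of file paths and their last-modified times. Below is a minimal plain-Python sketch of that idea; the checkpoint file name and helper names are made up for illustration, and ADF keeps this state internally rather than in your storage account:

```python
import json
import os

# Hypothetical local checkpoint store; ADF manages the real one internally.
CHECKPOINT_FILE = "cdc_checkpoint.json"

def load_checkpoint():
    """Return {file_path: last_modified_time} recorded by the previous run, if any."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {}

def incremental_load(folder):
    """Process only files that are new or modified since the last run."""
    checkpoint = load_checkpoint()
    processed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        mtime = os.path.getmtime(path)
        # Skip files already seen with an unchanged last-modified time.
        if checkpoint.get(path) == mtime:
            continue
        processed.append(path)  # stand-in for the actual copy/transform step
        checkpoint[path] = mtime
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump(checkpoint, f)
    return processed
```

Running this twice over an unchanged folder processes everything on the first pass and nothing on the second, which mirrors the full-load-then-incremental behavior you see with CDC enabled.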

Q. What do I need to change to initiate a new full load?
You can disable the change data capture option on the source transformation and publish the change before the next run.
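In effect, disabling CDC and republishing discards the stored checkpoint, so the next run (and the first run after re-enabling CDC) is a full load. A rough plain-Python illustration of that reset logic, assuming the same made-up checkpoint file as above (this is not an ADF API):

```python
import json
import os

# Made-up checkpoint name; ADF manages this state internally.
CHECKPOINT_FILE = "cdc_checkpoint.json"

def run_source(all_files, cdc_enabled):
    """Return the files the next run would load, modeling the CDC toggle."""
    if not cdc_enabled:
        # Disabling CDC drops the stored state: this run is a full load,
        # and re-enabling CDC later also starts from a full load.
        if os.path.exists(CHECKPOINT_FILE):
            os.remove(CHECKPOINT_FILE)
        return list(all_files)
    seen = set()
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            seen = set(json.load(f))
    new_files = [f for f in all_files if f not in seen]
    # Record everything seen so the next CDC run is incremental again.
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump(sorted(seen | set(all_files)), f)
    return new_files
```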

Q. Does this function work with parameterized sources? I would like to use it in a foreach activity.
Do you mean parameterizing the source dataset to point at different folders in storage? If so, I would expect it to work; please share any errors you encounter.

Hope this helps. Please let us know if you have any further queries. Thank you.


Please consider hitting the Accept Answer button; accepted answers help the community as well.


BT-HM answered:

Hello @ShaikMaheer-MSFT,

Thank you very much for your reply. I am still not quite sure how this works. For example:

I have a parameterized data flow with a Common Data Model inline source. I add two data flow activities to a new pipeline; when I run the pipeline, I get a full load twice.
When I clone the pipeline, the first run of the new pipeline does not copy any files. If I clone the pipeline and also change the names of the data flow activities, it starts with a full load.
However, if I change the activity names in the pipeline that has already run, no files are copied.

Using the data flow activity in a ForEach loop also does not work: the first item gets a full load, but the subsequent items do not.



