Hi @MRD
Welcome to Microsoft Q&A platform and thanks for posting your question here.
To avoid duplicate inserts in ADF data flows with SCD2 scenarios during retries, there are a few approaches you can take:
Firstly, you can utilize ADF's built-in retry functionality for the data flow activity within your pipeline. This involves defining the number of retries and the wait interval between attempts. You can also enable state persistence on your Integration Runtime to store the data flow's execution state between retries. However, state persistence only works for certain data flow types and might not guarantee complete avoidance of duplicates for SCD2 inserts in all cases, especially if the failure happened after some inserts.
Secondly, you can implement a custom retry logic with a high water mark table. This involves creating a separate table to store the "high water mark" of the data flow execution. This table would have a single column that keeps track of the last successfully processed record identifier. In your data flow, before performing the insert operation, you would query the high water mark table and filter the incoming data to exclude records processed before the last successful run. You would then update the high water mark table after a successful insert batch. This approach offers more control over retry behavior and avoids duplicate inserts for SCD2 scenarios.
Thirdly, you could consider implementing CDC at the source level if feasible. This allows ADF to capture only the changes in the source data, eliminating the need for full data transfers and reducing the risk of duplicates during retries. Alternatively, you could evaluate if switching to a different data integration tool might be a viable option for your specific needs.
Choosing the right approach depends on your specific data flow type, the nature of transient errors, and the complexity of your SCD2 logic. It's important to consider the trade-offs between development effort, performance overhead, and the level of control you require over retry behavior.
Hope this helps. Do let us know if you any further queries.
If this answers your query, do click Accept Answer
and Yes
for was this answer helpful. And, if you have any further query do let us know.