Understanding the Structure of Incremental D365 FO data in Data Lake Gen2

Arbuckle, Grant 0 Reputation points
2024-04-16T12:19:26.2366667+00:00

I am a data engineer new to Azure, and I have set up an ETL process that reads incremental data out of Data Lake Gen2 storage and pushes it to Azure SQL Database. I am using Azure Synapse Link to expose Dynamics 365 FO tables to the data lake. At a regular interval I use Python and DuckDB to query the data lake Parquet files and return only the new records that do not yet exist in the Azure SQL DB, then Azure Data Factory to upsert the equivalent tables in Azure SQL.
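For context, the incremental pull described above can be sketched roughly like this. All of the identifiers (the glob pattern, `RECID`, `existing_keys`) are placeholders I made up for illustration, not the asker's real pipeline:

```python
# Hypothetical sketch of the incremental pull described above.
def build_incremental_query(parquet_glob: str, key_column: str,
                            existing_keys_table: str) -> str:
    """Build a DuckDB SQL statement that returns only lake rows whose key
    is not yet present in the target (an anti-join via NOT EXISTS)."""
    return (
        f"SELECT src.* FROM read_parquet('{parquet_glob}') AS src "
        f"WHERE NOT EXISTS ("
        f"SELECT 1 FROM {existing_keys_table} AS tgt "
        f"WHERE src.{key_column} = tgt.{key_column})"
    )

sql = build_incremental_query("custtable/*.parquet", "RECID", "existing_keys")
```

The resulting statement would be run through `duckdb` against the lake files, with the new rows handed to Data Factory for the upsert.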

My question is: how is the Synapse SQL pool querying the data lake for the FO table data? I have noticed that records deleted in our D365 FO environment do not show as deleted when querying the Delta Lake tables in the Synapse workspace, well after the regular Spark pool synchronization has refreshed everything. Sometimes I see records with an "IsDelete" column set to true to signify that the record is deleted, but often I don't. Can someone please help me understand how Synapse SQL is querying the data, and why it seems so inconsistent? It's really difficult to reconcile my row counts and values in the SQL DB when Synapse SQL is acting unexpectedly.


1 answer

  1. Smaran Thoomu 9,685 Reputation points Microsoft Vendor
    2024-04-17T07:25:53.13+00:00

    Hi @Arbuckle, Grant

    Thanks for the question and using MS Q&A platform.

    It's great to hear that you're working with Azure Synapse Link to expose Dynamics 365 FO tables to the data lake and using Python and DuckDB to query the data lake parquet files. I'll do my best to help you understand how Synapse SQL is querying the data and why it seems inconsistent.

    When you use Azure Synapse Link to expose Dynamics 365 FO tables to the data lake, it creates a set of Delta Lake tables there. These tables store the incremental changes to the data in Dynamics 365 FO and are organized into a folder structure that mirrors the table hierarchy in Dynamics 365 FO.

    When you query the Delta Lake tables through the Synapse serverless SQL pool, the pool reads the Delta transaction log and the underlying Parquet files directly with its own distributed query engine; your query does not run through Spark (the Spark pool is what the Synapse Link export uses to write and maintain the Delta tables). The engine uses a technique called predicate pushdown to optimize performance: filter conditions in the SQL query are evaluated against file- and row-group-level statistics, so only the relevant data is read from the files.
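    As a toy illustration of predicate pushdown (not Synapse internals): each Parquet row group stores min/max statistics per column, and a filter such as `value > 150` lets the reader skip whole groups whose range cannot match, without reading a single row from them:

```python
# Toy model of predicate pushdown via row-group statistics.
row_groups = [
    {"min": 1,   "max": 100},
    {"min": 101, "max": 200},
    {"min": 201, "max": 300},
]

def groups_to_scan(groups, predicate_min):
    """Keep only row groups that could contain a value > predicate_min."""
    return [g for g in groups if g["max"] > predicate_min]

kept = groups_to_scan(row_groups, 150)  # the first group is pruned entirely
```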

    Regarding the deleted records: Synapse Link exports deletes as soft deletes, so a deleted row should eventually appear in the Delta table with its IsDelete column set to true rather than disappearing. If you query before the Spark conversion job has processed that change, the row still shows its last pre-delete version with no delete marker, which matches the inconsistency you are seeing. It is also worth confirming that change tracking is enabled for each exported table, since tables without it will not receive incremental changes (including delete markers) at all.
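    On the downstream side, a soft-delete marker should normally translate into a hard delete in Azure SQL. A minimal sketch of that upsert logic (the column names `RECID` and `IsDelete` mirror the ones mentioned in the thread, but verify them against your actual export schema):

```python
# Illustrative only: a dict stands in for the Azure SQL target table.
def apply_changes(target: dict, changes: list) -> dict:
    """Apply a batch of change rows to a keyed target. Rows flagged
    IsDelete remove the key; everything else is an upsert."""
    for row in changes:
        key = row["RECID"]
        if row.get("IsDelete"):
            target.pop(key, None)   # soft-delete marker -> delete downstream
        else:
            target[key] = row       # insert or update
    return target

state = {1: {"RECID": 1, "Name": "old"}}
state = apply_changes(state, [
    {"RECID": 1, "IsDelete": True},   # delete record 1
    {"RECID": 2, "Name": "new"},      # insert record 2
])
```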

    To troubleshoot this issue, check the sync status of the affected tables in your Synapse Link configuration and review the logs of the Spark conversion runs for errors or warnings related to the Delta tables. Comparing the Delta table's latest commit time against the time of the delete in FO will also tell you whether you are simply querying ahead of the synchronization.

    Hope this helps. Do let us know if you have any further queries.

