Copying complex JSON data dynamically using Mapping Data Flows in Azure

Curiel, H. (Hosé) 20 Reputation points
2024-04-02T09:01:14.78+00:00

In my current data engineering project, I am tasked with building a pipeline that can efficiently handle various JSON structures. One of the key requirements is the ability to dynamically flatten nested arrays within the JSON data using parameters, and subsequently map these flattened arrays to target columns automatically. Is it possible to do this with Mapping Data Flows?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. AnnuKumari-MSFT 31,061 Reputation points Microsoft Employee
    2024-04-19T11:45:37.3333333+00:00

    Curiel, H. (Hosé),

    Thank you for posting your query on the Microsoft Q&A platform.

    Looking at the schema of your input data, it seems there are multiple arrays that need to be flattened. In a Flatten transformation you can select only one array to unroll by at a time; alternatively, if there is commonality between the column names, types, etc., you can use rule-based mapping to flatten arrays based on rules and flatten structures based on hierarchy levels.

    Flatten pattern

    Additionally, parameterizing the array to flatten would be too complex for this input data.

    I would suggest you write your custom code in Python or C# using Azure Functions and execute it with a custom activity in ADF, as sketched below. Hope it helps. Kindly accept the answer by clicking on the Accept answer button. Thank you.
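
    For illustration only, here is a minimal sketch of such a function in Python, using the HTTP-triggered Azure Functions programming model and pandas.json_normalize. The payload shape and the record_path parameter are assumptions, not taken from your actual data:

    ```python
    import azure.functions as func
    import pandas as pd


    def main(req: func.HttpRequest) -> func.HttpResponse:
        """Flatten a nested JSON payload; the array to unroll is passed as a query parameter."""
        body = req.get_json()
        # Hypothetical parameter: which nested array to unroll (default "items").
        record_path = req.params.get("record_path", "items")

        # json_normalize unrolls the named array into rows and flattens nested objects.
        flat = pd.json_normalize(body, record_path=record_path, sep="_")

        return func.HttpResponse(flat.to_json(orient="records"),
                                 mimetype="application/json")
    ```

    The pipeline can pass the array name as a parameter when it calls the function, and then write the returned rows to whichever sink you need.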


1 additional answer

  1. Amira Bedhiafi 15,521 Reputation points
    2024-04-02T11:15:25.55+00:00

    Yes, it is possible to handle complex JSON structures, including dynamically flattening nested arrays and mapping them to target columns automatically, using Mapping Data Flows in Azure Data Factory or Azure Synapse Analytics.

    But keep in mind that without seeing your data, I can only guide you in a general way.

    Here are the six main steps to follow in your case:

    1. First, you need to ingest your JSON data. You can do this by creating a linked service to the store where your JSON resides and then creating a dataset pointing to the JSON files (a quick way to peek at a raw file is sketched below).
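
    If you want to sanity-check what that dataset will point at, here is a minimal Python sketch using the azure-storage-blob SDK; the connection string, container, and blob names are placeholders, not your actual values:

    ```python
    import json

    from azure.storage.blob import BlobServiceClient

    # Placeholder connection details; replace with your own.
    service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    blob = service.get_blob_client(container="raw", blob="input/sample.json")

    doc = json.loads(blob.download_blob().readall())
    print(list(doc.keys()))  # top-level fields, including any nested arrays
    ```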

    2. Within your data factory or Synapse workspace, create a new Mapping Data Flow.

    3. Add a Source transformation where you select your JSON dataset. Here, you can specify the input schema. For your case, you might initially treat it as a single column of type string if the structure is highly dynamic or nested.
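
    As a rough Python illustration of that idea (the column and key names are assumptions): when the whole document arrives as one string column, you parse it first and only then decide which array to unroll.

    ```python
    import json

    # Hypothetical row whose entire payload sits in a single string column.
    row = {"payload": '{"order_id": 1, "items": [{"sku": "A", "qty": 2}]}'}

    doc = json.loads(row["payload"])  # parse the string into a dict
    array_fields = [k for k, v in doc.items() if isinstance(v, list)]
    print(array_fields)  # -> ['items']: candidate arrays to unroll
    ```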

    4. To dynamically flatten nested arrays, you can use the Flatten transformation. The Flatten transformation allows you to unroll nested arrays in your JSON data into a tabular format. If the structure of your nested arrays varies, you can parameterize the column or path you want to flatten based on pipeline parameters or dataset parameters, making the process dynamic.
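
    The Data Flow expression syntax itself is not shown here, but conceptually a parameterized flatten behaves like this pandas sketch (the sample document and the "items" path are assumptions):

    ```python
    import pandas as pd

    doc = {
        "order_id": 1,
        "customer": {"name": "Ada"},
        "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
    }

    # The array to unroll arrives as a parameter, like a pipeline parameter in ADF.
    unroll_by = "items"

    flat = pd.json_normalize(
        doc,
        record_path=unroll_by,                     # nested array to turn into rows
        meta=["order_id", ["customer", "name"]],   # parent fields kept on every row
        sep="_",
    )
    print(flat)  # columns: sku, qty, order_id, customer_name
    ```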

    5. Use the Derived Column transformation to dynamically generate or transform columns based on your requirements. If you need to map these flattened columns to target columns automatically, you can use expressions in the Derived Column transformation.
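
    In Python terms, that automatic mapping amounts to applying a rename lookup or a naming rule to the flattened columns (continuing from the frame above; the target names are assumptions):

    ```python
    # Explicit lookup from source columns to target columns (hypothetical targets).
    target_mapping = {"sku": "product_code", "qty": "quantity"}
    flat = flat.rename(columns=target_mapping)

    # Or a rule-based alternative: normalize every column name the same way.
    flat.columns = [c.lower().replace(".", "_") for c in flat.columns]
    ```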

    6. Finally, use a Sink transformation to map the transformed data to your target system. In the sink, you can map the columns from your data flow to the target columns. If the mapping needs to be dynamic, consider using dataset parameters or linked service parameters to modify the sink behavior based on the input data.
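
    A minimal Python sketch of the same idea on the sink side, with the target table supplied as a parameter (continuing from the mapped frame above; the connection string and table name are placeholders):

    ```python
    from sqlalchemy import create_engine

    # Target supplied as a parameter, analogous to a parameterized sink dataset.
    target_table = "flattened_orders"                              # hypothetical table name
    engine = create_engine("mssql+pyodbc://<connection-details>")  # placeholder connection

    # Write the mapped columns; to_sql appends to (or creates) the target table.
    flat.to_sql(target_table, engine, if_exists="append", index=False)
    ```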

    Expert recommendations:

    • Enable schema drift in your source and sink transformations if you expect the structure of your JSON to change over time. This allows ADF to automatically manage changes in the schema.
    • Consider optimizing your data flows for performance, especially when dealing with large volumes of data. Techniques include partitioning your data and tuning the Data Flow performance settings.
    • Use the Debug feature within Mapping Data Flows to test and iterate on your design. This allows you to see sample data output and ensure your transformations are working as expected.