How can I dynamically flatten a complex nested JSON structure in ADF?

Noman Qureshi 20 Reputation points
2024-04-29T05:22:57.75+00:00

I am in the process of designing a solution. I need to load complex, REST API-based JSON data into a flat file. I want to use the same pipeline for fetching different API result objects. My question is:

  1. How can I dynamically flatten varying JSON files into their respective flat files using ADF, so that I have a generic, metadata-driven pattern that calls different APIs and loads the flattened results into a table? I was thinking of using Python/Databricks to handle files with different schemas and having ADF only make the API calls and return the results.

The Data Flows approach ties you to a fixed JSON schema, so you can't really have one generic pipeline handling every kind of API request, can you?


Accepted answer
  AnnuKumari-MSFT 31,561 Reputation points Microsoft Employee
    2024-04-29T17:30:21.4233333+00:00

    Hi Noman Qureshi ,

    Welcome to Microsoft Q&A platform and thanks for posting your query here.

    As per your query, it seems you are looking for a way to dynamically flatten complex nested JSON data from REST APIs into flat files using Azure Data Factory.

    Your approach of making the API calls with Azure Data Factory (ADF) and then using Databricks/Python to dynamically parse and flatten the different JSON schemas is a viable solution. Otherwise, you would need to consider a Mapping Data Flow, which is not feasible for dynamically flattening the JSON into multiple files.

    Kindly consider the below high-level steps to design your pipeline:

    • Set up a parameterized pipeline in ADF that takes the API endpoint URLs and any necessary parameters as inputs.
    • Use the Web activity (or a Copy activity with the REST/HTTP connector) to make the API calls and land the JSON responses in an intermediate storage location, such as Azure Blob Storage or Azure Data Lake Storage.
    • Create a Databricks notebook or a Python script that can handle various JSON structures.
    • The notebook/script reads the JSON file from Blob Storage/ADLS.
    • Use Python's flexibility with pandas and its json_normalize function to flatten the JSON data dynamically. You can determine the structure on the fly or use metadata-driven logic to map the JSON fields to the flat-file format (a minimal notebook sketch follows this list).
    • After processing, the notebook/script writes the flattened data in the final flat-file format to the desired location.
    • Maintain a metadata store (this could be a database table, a JSON file, etc.) that maps each API to its schema definition and the logic needed to flatten it (an illustrative metadata entry is also shown below).
    • Use ADF's Databricks Notebook activity to invoke the notebook, or a Custom activity if you're using a standalone Python script.
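
    For the flattening step, here is a minimal sketch of what the Databricks notebook could look like, assuming the ADF pipeline has already landed the raw API response in Blob/ADLS and passes the input and output paths as notebook parameters. The widget names, mount paths, and the flatten helper below are illustrative assumptions, not a prescribed implementation; adapt them to your own metadata-driven design.

```python
# Minimal sketch of the notebook step (runs on Databricks, where dbutils is available).
# Widget names and mount paths ("input_path", "output_path", /mnt/raw, /mnt/curated)
# are hypothetical examples.
import json
import pandas as pd

# Parameters passed in from the ADF Databricks Notebook activity.
dbutils.widgets.text("input_path", "/mnt/raw/api_response.json")
dbutils.widgets.text("output_path", "/mnt/curated/flattened.csv")
input_path = dbutils.widgets.get("input_path")
output_path = dbutils.widgets.get("output_path")

def flatten(df: pd.DataFrame) -> pd.DataFrame:
    """Repeatedly expand dict columns and explode list columns until nothing nested remains."""
    while True:
        dict_cols = [c for c in df.columns if df[c].map(lambda v: isinstance(v, dict)).any()]
        list_cols = [c for c in df.columns if df[c].map(lambda v: isinstance(v, list)).any()]
        if not dict_cols and not list_cols:
            return df
        # One row per list element; empty lists become NaN in the exploded column.
        for c in list_cols:
            df = df.explode(c, ignore_index=True)
        # Expand nested objects into dotted column names, e.g. "address.city".
        for c in dict_cols:
            expanded = pd.json_normalize(
                df[c].map(lambda v: v if isinstance(v, dict) else {}).tolist()
            ).add_prefix(f"{c}.")
            df = pd.concat([df.drop(columns=[c]).reset_index(drop=True), expanded], axis=1)

# The raw response can be a single object or a list of objects; json_normalize handles both.
with open(f"/dbfs{input_path}") as f:
    payload = json.load(f)

flat_df = flatten(pd.json_normalize(payload))
flat_df.to_csv(f"/dbfs{output_path}", index=False)
```

    Because the notebook takes the paths (and, if needed, column mappings) as parameters, the same notebook can be reused for every API the pipeline calls.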
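
    The metadata store from the steps above can start as a simple JSON file or control table. The entry shape below (api_name, endpoint, raw_path, output_path) and the file path are assumptions for illustration only; in ADF, a Lookup activity would typically read the metadata and a ForEach activity would pass each entry to the Web activity and the Databricks Notebook activity.

```python
# Illustrative only: reading a hand-rolled metadata file that drives the generic pipeline.
# The file path and field names are assumptions, not an ADF or Databricks convention.
import json

with open("/dbfs/mnt/config/api_metadata.json") as f:
    api_metadata = json.load(f)

# Each entry is expected to look like:
# {
#   "api_name": "orders",
#   "endpoint": "https://api.example.com/v1/orders",
#   "raw_path": "/mnt/raw/orders.json",
#   "output_path": "/mnt/curated/orders.csv"
# }
for entry in api_metadata:
    print(f"{entry['api_name']}: {entry['endpoint']} -> {entry['raw_path']} -> {entry['output_path']}")
```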

    Reference: Dynamically processing complex json file in pyspark

    Hope it helps. Kindly accept the answer by clicking on the "Accept answer" button. Thank you.


0 additional answers
