Hi Noman Qureshi,
Welcome to the Microsoft Q&A platform and thanks for posting your query here.
As per your query, it seems you are looking for a way to dynamically flatten complex nested JSON data from REST APIs into flat files using Azure Data Factory.
Your approach of making the API calls with Azure Data Factory (ADF) and then using Databricks/Python to dynamically parse and flatten the different JSON schemas is a viable solution. Otherwise, you would need to consider a Mapping Data Flow, which would not be feasible for dynamically flattening the JSONs into multiple files.
Kindly consider the high-level steps below to design your pipeline:
- Set up a parameterized pipeline in ADF that can take API endpoint URLs and any necessary parameters as inputs.
- Use a Web activity, or a Copy activity with the REST/HTTP connector, to make the API calls and land the JSON response in an intermediate storage location, such as Azure Blob Storage or Azure Data Lake Storage (ADLS).
- Create a Databricks notebook or a Python script that can handle various JSON structures.
- The notebook/script reads the JSON file from Blob/ADLS.
- Use Python's flexibility with libraries like `pandas` and its `json_normalize` function to flatten the JSON data dynamically. You can determine the structure on the fly or use metadata-driven logic to map the JSON fields to the flat-file format (see the first sketch after these steps).
- After processing, the notebook/script writes the flattened data in the final flat-file format to the desired location.
- Maintain a metadata store (this could be a database table, a JSON file, etc.) that maps each API to its schema definition and the logic needed to flatten it (see the second sketch below).
- Use ADF's Databricks Notebook activity to invoke the Databricks notebook, or a Custom activity if you're using a standalone Python script, passing the API name and landing path as parameters (see the last sketch below).
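To make the flattening step concrete, below is a minimal sketch of what the notebook/script could look like. It assumes the ADF pipeline has already landed the API response at a DBFS-mounted path and that CSV is the target flat-file format; the paths are placeholders:

```python
import json

import pandas as pd

# Hypothetical landing path written by the ADF pipeline (adjust to your mount).
input_path = "/dbfs/mnt/raw/api_response.json"
output_path = "/dbfs/mnt/curated/api_response_flat.csv"

with open(input_path) as f:
    payload = json.load(f)

# The API may return a single object or a list of records.
records = payload if isinstance(payload, list) else [payload]

# json_normalize flattens nested objects into prefixed columns without
# needing to know the schema up front.
df = pd.json_normalize(records, sep="_")

# Serialize any columns that are still list-valued so the output stays flat;
# explode/normalize them instead if they should become extra rows.
for col in df.columns:
    if df[col].map(lambda v: isinstance(v, list)).any():
        df[col] = df[col].map(json.dumps)

df.to_csv(output_path, index=False)
```

Because `json_normalize` works on whatever structure it receives, the same notebook can serve multiple APIs without per-API code.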
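Where an API needs specific handling (for example, which nested array holds the records), the metadata store from the steps above can drive the same code. Here is a sketch assuming the metadata is a simple dictionary; the API name, record paths, fields, and output path are all illustrative:

```python
import json

import pandas as pd

# Illustrative metadata: which nested array holds the records, which
# parent-level fields to carry onto each row, and where to write the output.
# In practice this would live in a control table or config file.
metadata = {
    "orders_api": {
        "record_path": ["data", "orders"],
        "meta_fields": [["data", "customer_id"], ["data", "region"]],
        "output_path": "/dbfs/mnt/curated/orders_flat.csv",
    }
}

def flatten_for_api(api_name: str, raw_path: str) -> pd.DataFrame:
    """Flatten one API's landed JSON file using its metadata entry."""
    cfg = metadata[api_name]
    with open(raw_path) as f:
        payload = json.load(f)
    df = pd.json_normalize(
        payload,
        record_path=cfg["record_path"],
        meta=cfg["meta_fields"],
        sep="_",
    )
    df.to_csv(cfg["output_path"], index=False)
    return df
```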
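Finally, a sketch of how the notebook can pick up its inputs when invoked from ADF; the widget names `api_name` and `raw_path` are assumptions, not required names, and would be supplied as base parameters on the Databricks Notebook activity:

```python
# Inside the Databricks notebook: read the parameters passed by the
# ADF Databricks Notebook activity (dbutils is provided by the notebook
# runtime). Reuses flatten_for_api from the previous sketch.
api_name = dbutils.widgets.get("api_name")
raw_path = dbutils.widgets.get("raw_path")

flatten_for_api(api_name, raw_path)
```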
Reference: Dynamically processing complex json file in pyspark
Hope it helps. Kindly accept the answer by clicking on the Accept answer button. Thank you.