Question

DibyenduBose-8539 asked:

Parse big JSON (nested JSON) files in Azure Data Lake Gen2 and save the data into separate normalized tables in Azure SQL

We have big JSON files (complex nested JSON, up to 10 levels deep) stored in Azure Data Lake Storage Gen2. We need to read the JSON files, parse them, and store the data in separate tables in an Azure SQL database.
So it is NOT a matter of parsing the JSON into one single flattened, denormalized table in SQL DW.
Can we do this with Azure Data Factory? What is the best way to achieve this?

Tags: azure-data-factory, azure-synapse-analytics, azure-databricks

1 Answer

MartinJaffer-MSFT answered:

Hello @DibyenduBose-8539 and welcome to Microsoft Q&A.

If I understand you right, you want to not only flatten a very deeply nested JSON, but you want to split it into multiple tables. This is beyond the ability of Copy Activity, but I think Data Flow can accomplish this. This task sounds non-trivial, especially if the nesting forks.

In the tags, you included Data Factory, Synapse, and Databricks. Can I take this to mean you are comfortable using a code-based solution, not just a visual interface? If so, more options become available, such as writing custom code to do the work in Synapse or Databricks notebooks / JARs. There is also custom code via the Custom Activity.
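To make the custom-code route concrete, here is a minimal sketch in plain Python of what such code would do (in a Databricks notebook you would more likely use PySpark, but the normalization logic is the same). The document shape and the table names implied by it (orders and line items) are hypothetical: the point is splitting one nested document into parent and child row sets linked by a generated surrogate key, ready to be bulk-loaded into separate Azure SQL tables.

```python
import itertools
import json

# Hypothetical nested document: an order with a nested customer
# object and an array of line items.
doc = json.loads("""
{
  "orderId": "A-100",
  "customer": {"name": "Contoso", "city": "Seattle"},
  "lines": [
    {"sku": "X1", "qty": 2},
    {"sku": "X2", "qty": 5}
  ]
}
""")

_key = itertools.count(1)  # surrogate key generator

def normalize(doc):
    """Split one nested order into rows for two tables,
    linked by a surrogate key, instead of flattening it
    into a single wide, denormalized row."""
    order_key = next(_key)
    order_row = {
        "order_key": order_key,
        "order_id": doc["orderId"],
        "customer_name": doc["customer"]["name"],
        "customer_city": doc["customer"]["city"],
    }
    line_rows = [
        {"order_key": order_key, "sku": ln["sku"], "qty": ln["qty"]}
        for ln in doc["lines"]
    ]
    return order_row, line_rows

order_row, line_rows = normalize(doc)
print(order_row)   # one row for a hypothetical Orders table
print(line_rows)   # rows for a hypothetical OrderLines table, carrying the key
```

With deeper nesting you would repeat the same pattern level by level, each child row set carrying its parent's key; the per-table row sets are then written with whatever bulk-load mechanism you prefer.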

If all else fails, there are the SQL JSON functions (e.g. OPENJSON).

Whether to go with Data Flow or custom code depends upon volume and complexity.

As I don't know the details of your JSON structure, nor the criteria or desired output, I can't provide more precise advice.


Thanks for your response @MartinJaffer-MSFT. Yes, I have ADF, ADB, and Synapse as options, but would definitely prefer ADF over the other two as it is UI-based. I have been exploring ADF Data Flows and have figured out how to flatten the JSON into one single denormalized table through a series of Flatten transformations, unrolling one node after another.
What I am not able to comprehend is how to populate the data into separate Azure SQL DB tables. Do I need to write the data out after each Flatten transformation? Is that even possible?


After you flatten the data in Data Flows, you can add as many sinks as you like by using "New branch", and each sink can write to a different table. If you need to intelligently split the rows across different sinks, use a Conditional Split transformation.
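The branch-and-split pattern described above can be pictured in plain Python (the predicate and the row/sink names are illustrative, not Data Flow syntax): a Conditional Split routes each flattened row to exactly one stream by a predicate, and each stream then feeds its own sink table.

```python
# Illustrative stand-in for Data Flow's Conditional Split:
# route flattened rows to different "sinks" (plain lists here,
# separate SQL tables in ADF) based on a predicate.
rows = [
    {"sku": "X1", "qty": 2},
    {"sku": "X2", "qty": 50},
    {"sku": "X3", "qty": 7},
]

bulk_sink, regular_sink = [], []
for row in rows:
    # Hypothetical split condition: qty >= 10 goes to the bulk sink.
    (bulk_sink if row["qty"] >= 10 else regular_sink).append(row)

print(len(bulk_sink), len(regular_sink))  # 1 2
```

Note the difference from "New branch": a branch copies the whole stream to each sink, while a Conditional Split partitions it, so each row lands in exactly one sink.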
