question

KailashRavuri-2642 avatar image
0 Votes"
KailashRavuri-2642 asked AbheeshSKurup-2191 commented

ADF - Dataflow - flatten json with drifting schema

I have a requirement to convert the json into csv(or a SQL table) or any other flatten structure using Data Flow in Azure Data Factory. I need to take the property names at some hierarchy and values of the child properties at lower of hierrarchy from the source json and add them both as column/row values in csv or any other flatten structure.

Is there a way to flatten the following rather inconvenient type of JSON object in Azure Data Factory (V2)? Top-level keys in the data represent dynamically generated dates, which are different in each file. Hence, the schema is drifting.

Source Data Rules/Constraints :

 Parent level data property names will change dynamically (e.g. ABCDataPoints,CementUse, CoalUse, ABCUseIndicators names are dynamic)
 The hierarchy always remains same as in below sample json.

I need some help in defining Json path/expression to get the names ABCDataPoints,CementUse, CoalUse, ABCUseIndicators etc. I am able to figure out how to retrieve the values for the properties Value,ValueDate,ValueScore,AsReported.

Source Data sample
{
"ABCDataPoints": {
"CementUse": {
"Value": null,
"ValueDate": null,
"ValueScore": null,
"AsReported": [],
"Sources": []
},
"CoalUse": {
"Value": null,
"ValueDate": null,
"AsReported": [],
"Sources": []
}
},
"ABCUseIndicators": {
"EnvironmentalControversies": {
"Value": false,
"ValueDate": "2021-03-06T23:22:49.870Z"
},
"RenewableEnergyUseRatio": {
"Value": null,
"ValueDate": null,
"ValueScore": null
}
},
"XYZDataPoints": {
"AccountingControversiesCount": {
"Value": null,
"ValueDate": null,
"AsReported": [],
"Sources": []
},
"AdvanceNotices": {
"Value": null,
"ValueDate": null,
"Sources": []
}
},
"XYXIndicators": {
"AccountingControversies": {
"Value": false,
"ValueDate": "2021-03-06T23:22:49.870Z"
},
"AntiTakeoverDevicesAboveTwo": {
"Value": 4,
"ValueDate": "2021-03-06T23:22:49.870Z",
"ValueScore": "0.8351945854483925"
}
}

Destination Sample
94453-rq345.png


azure-data-factory
rq345.png (15.0 KiB)
· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello @KailashRavuri-2642 and welcome to Microsoft Q&A.

There is a flatten transformation. Are you having difficulties with it?


0 Votes 0 ·

Hi
The Flatten transformation only works on only statically specified single unroll root and Unroll root. In my case it have multiple level of child items and properties names are dynamic as well. So the built-in Flatten transformation didn't worked out.

0 Votes 0 ·
MartinJaffer-MSFT avatar image
0 Votes"
MartinJaffer-MSFT answered

I see. I've uploaded your sample data, and been trying different things for several hours. I'm not sure your ask is possible.

@MarkKromer-MSFT do you have any ideas for this dynamic flatten? I'd like to just pivot, but column names aren't exactly available.

· 4
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

We are making the unroll properties in the Flatten transformation configurable with expressions and parameters. The work is almost done, going through final testing. That will help you to solve this use case in a more dynamic way.

1 Vote 1 ·

@MarkKromer-MSFT Thanks for the update. Does the new changes solves the structure I mention above. If you look at my sample Json the child objects are not arrays they are just objects inside objects and also I need to take Parent property names

1 Vote 1 ·

Flatten only unrolls arrays. To turn a hierarchy into relational structures, use a derived column and set your columns to the json properties in your incoming json source.

1 Vote 1 ·

Thanks @MartinJaffer-MSFT for trying out, yes we also tried in lot of ways and it is not currently possible in features supported by ADF at this moment.

1 Vote 1 ·
AbheeshSKurup-2191 avatar image
0 Votes"
AbheeshSKurup-2191 answered RadhikaNair-3461 commented

@MarkKromer-MSFT Is Flatten transformation now configurable with expressions and parameters. I have used Flattening but now I understand that I have a dynamic data coming so ..just checking.

· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Yes. Use the rule-based mapping feature in the Flatten transformation.

1 Vote 1 ·

I have a similar issue where I have dynamic json and enabled schema drift in my source, To flatten the json I am coding it as array(byName('Columnname')) in unroll by dynamic content section. Here columnname is the array which need to be unrolled. I am getting this error -

{"message":"Job failed due to reason: at : scala.MatchError: array(byName('Columname')) (of class com.microsoft.dataflow.FunctionNode). Details:scala.MatchError: array(byName('Columname')) (of class com.microsoft.dataflow.FunctionNode)\n\tat com.microsoft.dataflow.transformers.FoldDownUtils$.unstackUnrollByRecurse(FoldDown.scala:79)\n\tat

Kindly help as to how to unroll array of dynamic json with schema drift option enabled at source

0 Votes 0 ·

Thanks it worked

0 Votes 0 ·