DanielHeilig-7461 asked SaurabhSharma-msft commented

ADF data flow cannot parse JSON schema correctly: FAILFAST exception error

Hi everyone,

I am using the YouTube Analytics and Reporting API to pull performance data for a YouTube channel and store it programmatically via Azure Data Factory (ADF). From the YouTube API, I get the following JSON output format:

 {
   "kind": "youtubeAnalytics#resultTable",
   "columnHeaders": [
     {
       "name": string,
       "dataType": string,
       "columnType": string
     },
   ],
   "rows": [
     [
       {value}, {value}, ...
     ]
   ]
 }



See link here: https://developers.google.com/youtube/analytics/reference/reports/query#response

In a first step, I used the copy activity to copy the JSON into blob storage. Everything fine so far. Next, I wanted to create a data flow, flatten the JSON, and write the values into derived columns before saving them in a data sink.

Issue #1
In the first step of my data flow - defining the source - I select the JSON from blob storage, and when I click on data preview, I get this error message: SparkException: Malformed records are detected in schema inference. Parse Mode: FAILFAST, see screenshot:

84882-unbenannt.png

I checked the JSON's schema on https://jsonformatter.curiousconcept.com/ and generally it seems to be fine, so I don't understand why ADF cannot read it properly.
I realized that ADF is putting my whole JSON into [ ] brackets, which is possibly causing the Spark exception error. When I delete the [ ] brackets manually by editing the JSON in blob storage (to get the native JSON structure of the API back again), the data preview works as expected. In a nutshell: ADF is adding something which it later on does not like any more, hmm hmmm. Any ideas what can be done here?
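To illustrate why those brackets matter, here is a minimal Python sketch (outside of ADF, with the payload shortened): wrapping a single JSON object in [ ] changes the top-level type from an object to an array, which is exactly the kind of difference schema inference trips over:

```python
import json

# The object the YouTube API actually returns (payload shortened for illustration)
original = '{"kind": "youtubeAnalytics#resultTable", "columnHeaders": [], "rows": []}'

# The same payload after it has been wrapped in [ ] brackets
wrapped = "[" + original + "]"

print(type(json.loads(original)).__name__)  # -> dict  (a single object)
print(type(json.loads(wrapped)).__name__)   # -> list  (an array of objects)
```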

Issue #2
As soon as the parsing of the JSON is successful, I noticed that rows is not defined as an object but as an array (in the data source linked service it is even defined as a string), see both screenshots. Somehow it should be defined as objects so that I am able to access the values within the rows, as is possible for the "columnHeaders". I hope you get my pain point. Can anyone please tell me what I need to do here?

84848-bildschirmfoto-2021-04-06-um-112001.png
84835-bildschirmfoto-2021-04-06-um-112023.png
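For illustration, a small Python sketch of the asymmetry (header names and values are invented, following the documented result-table shape): each columnHeaders entry is an object with named fields, while each rows entry is a plain positional array with no field names at all:

```python
import json

# Hypothetical payload in the documented result-table shape (values invented)
sample = """
{
  "columnHeaders": [
    {"name": "day", "dataType": "STRING", "columnType": "DIMENSION"},
    {"name": "views", "dataType": "INTEGER", "columnType": "METRIC"}
  ],
  "rows": [["2021-04-01", 123], ["2021-04-02", 456]]
}
"""
doc = json.loads(sample)

# Each columnHeaders entry is an object with named fields ...
print(type(doc["columnHeaders"][0]).__name__)  # -> dict
# ... but each rows entry is a positional array without field names
print(type(doc["rows"][0]).__name__)           # -> list
```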

Looking forward to your replies. Thanks in advance!


azure-data-factory

DanielHeilig-7461 answered

Okay, I am one step closer to the overall solution:

  1. I realized that I screwed up one thing when copying the JSON into blob storage. In the sink settings I used the file pattern "Array of objects", which I think caused the additional [ ] brackets. When I change it to "Set of objects", the brackets are not added and the data flow can read the JSON schema properly.

  2. The only thing still pending: how can I make the "rows" processible for ADF? I have a pretty ugly workaround where I just parse the whole row "as is", but ADF complains that this is a complex data structure. I would really like to define rows as an object or complex data type, but I don't know how. I have the impression that the "mapping" settings in the copy activity could be helpful, but I honestly have no idea how to apply them.
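As a sketch of what such a mapping would have to do (plain Python, with an invented sample payload in the documented result-table shape): the positional row values only get names by zipping them with the columnHeaders names:

```python
# Hypothetical sample in the documented result-table shape (values invented)
sample = {
    "columnHeaders": [
        {"name": "day", "dataType": "STRING", "columnType": "DIMENSION"},
        {"name": "views", "dataType": "INTEGER", "columnType": "METRIC"},
    ],
    "rows": [["2021-04-01", 123], ["2021-04-02", 456]],
}

# Name each positional row value after the corresponding column header
names = [h["name"] for h in sample["columnHeaders"]]
records = [dict(zip(names, row)) for row in sample["rows"]]
print(records)  # -> [{'day': '2021-04-01', 'views': 123}, {'day': '2021-04-02', 'views': 456}]
```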

Thanks,
Daniel

84818-bildschirmfoto-2021-04-06-um-140832.png


Kiran-MSFT answered

You can use a derived column transformation or a flatten transformation in the data flow to reshape the JSON structure.


DanielHeilig-7461 answered SaurabhSharma-msft commented

Hi Kiran,

Thanks for your feedback. Yes, I want to use the flatten and derived column steps in the data flow, and I am actually able to do it for JSONs with a clear schema, BUT obviously my JSON example above is not recognized as having a clear schema.

The value of the columnHeaders key is an array of strings, see screenshot attached.
I would expect that the rows key is also an array of strings, showing me the values in the same way as for columnHeaders, but this is not the case and I don't understand why. Any ideas? As long as the rows key is not defined as an array of strings, I cannot flatten and derive it into columns.

85123-bildschirmfoto-2021-04-06-um-112001.png




If you look closer, the schema indeed mirrors the JSON: rows is an array of arrays, not an array of strings, and columnHeaders is an array of structures.

Dataflow is showing the right schema. I am not sure what you are seeing in the dataset or copy activity.
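To illustrate that schema in plain Python (header names and values invented): because rows is positional, pairing it with the columnHeaders names is enough to produce a flat table, which is what the flatten/derived-column steps ultimately have to reconstruct:

```python
import csv
import io

# Hypothetical payload (header names and values invented for illustration)
payload = {
    "columnHeaders": [{"name": "day"}, {"name": "views"}],
    "rows": [["2021-04-01", 123], ["2021-04-02", 456]],
}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(h["name"] for h in payload["columnHeaders"])  # header row from columnHeaders
writer.writerows(payload["rows"])                             # one CSV line per positional row
print(buf.getvalue())
```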


Thank you @Kiran-MSFT ,
@danielheilig-7461 Please let us know if you still have any issues.
