
KoenvanWielink-0539 asked

Azure Data Factory HTTP connector not returning XML output

For one of our clients we are trying to extract data from a somewhat old-fashioned REST API that returns an XML response. We had already discovered that the REST connector cannot be used for this purpose, as it only handles JSON responses, so we switched to the HTTP connector instead. That connector does return data, which we were able to write to an Azure SQL Database sink. The strange thing is that the ADF output is always JSON, even though the API response is XML (confirmed in Postman).

At first we didn't think this was a problem, but I just discovered that array handling is not done correctly. If multiple records are returned, everything is fine, but when the API returns only a single record, the square brackets around the array are dropped and ADF no longer recognizes it in the mapping, because it expects an array. As a result, the single record is not written to the SQL database.
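To make the symptom concrete, here is a minimal Python sketch of the two JSON shapes this kind of XML-to-JSON conversion produces, along with the normalization a consumer would need. The element and field names are made up for illustration, not taken from the real API:

```python
# Sketch of the two shapes an XML-to-JSON conversion typically produces.
# Element/field names here are hypothetical, not from the real API.
multi  = {"Orders": {"Order": [{"Id": 1}, {"Id": 2}]}}  # two records: an array
single = {"Orders": {"Order": {"Id": 3}}}               # one record: brackets dropped

def ensure_list(value):
    """Wrap a lone object in a list so downstream code that expects
    an array also handles the single-record case."""
    return value if isinstance(value, list) else [value]

print(len(ensure_list(multi["Orders"]["Order"])))   # 2
print(len(ensure_list(single["Orders"]["Order"])))  # 1
```

The mapping failure described above is exactly the absence of such a normalization step inside the copy activity.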

We have tried adding Content-Type: application/xml and Accept: application/xml headers to force XML output, but that didn't work either. Since the API really does return XML, it seems that ADF is converting the response to JSON of its own accord, and not doing it correctly. Is there any way to handle an XML API in ADF using standard connectors? We don't have programmers on our team who can create custom connectors, so that's not an option. Any insights would be greatly appreciated.

azure-data-factory

SathyamoorthyVijayakumar-MSFT answered

Hello @KoenvanWielink-0539

Welcome to the Microsoft Q&A platform.

You are right: the XML that is read is converted to JSON before the service uses it. You could use the @xml() expression function to convert the string back to an XML representation, but that may not meet your goal, as the result cannot be directly consumed by other activities.

134900-image.png

I did a small repro on my end and was able to replicate the behavior you observed.

For testing purposes, I created two XML inputs.

134935-image.png

I was able to work around the behavior by using two copy activities with different mappings: one for multiple rows and one for a single row.

134962-image.png

For XML1, the schema is BookStore --> Book[].
You will have to create an explicit mapping for Copy Activity 1 (multiple rows, as in the screenshot above).
[You can use Import Schema in both scenarios, but you need both a multiple-row and a single-row response available for it to work automatically.]

134850-image.png

The above is just an illustration based on my source; you can adjust it to match your XML structure.

Note: Book here is an array.
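For reference, a hierarchical-to-tabular mapping like the one in the screenshot can also be written directly in the copy activity's translator JSON. This is only an illustrative sketch for the BookStore/Book example; the column names are assumptions:

```json
{
  "type": "TabularTranslator",
  "mappings": [
    { "source": { "path": "['Title']" }, "sink": { "name": "Title" } },
    { "source": { "path": "['Price']" }, "sink": { "name": "Price" } }
  ],
  "collectionReference": "$['BookStore']['Book']"
}
```

The collectionReference tells the copy activity to iterate over the Book array, with the source paths resolved relative to each array element.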

Now for Copy Activity 2 ("Single Row"): this executes on failure of Copy Activity 1 (Multiple Rows).

134936-on-failure.gif

When only one row is returned, Book is treated as an object rather than an array, so the Multiple Rows copy activity fails because it cannot find Book[]. The Single Row copy activity, with the mapping below, then executes.

For XML2, the JSON schema is BookStore --> Book.

134877-image.png

Note: Book here is an object.
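The single-row case can likewise be sketched in translator JSON; absolute paths replace the collection reference, since there is no array to iterate (column names again assumed):

```json
{
  "type": "TabularTranslator",
  "mappings": [
    { "source": { "path": "$['BookStore']['Book']['Title']" }, "sink": { "name": "Title" } },
    { "source": { "path": "$['BookStore']['Book']['Price']" }, "sink": { "name": "Price" } }
  ]
}
```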

Execution:

  1. When multiple rows are returned by the HTTP connector, the Multiple Rows copy activity succeeds and the Single Row copy activity is skipped.
    134955-image.png

  2. When a single row is returned by the HTTP connector, the Multiple Rows copy activity fails on the schema mapping and the Single Row copy activity executes.
    134899-image.png

In both cases the row(s) are written to the same sink (the SQL DB in your case).

Hope this helps. Do let us know if you have any further queries.




KoenvanWielink-0539 answered

Good morning Sathyamoorthy,

Thanks for the detailed response. Unfortunately this solution won't work for us. The first problem is that we have multiple nested arrays within the datasets, all of which could present the same problem. It is not feasible to create separate success/failure flows for each of the arrays in all of our datasets.
Secondly, the odd thing is that the flow does not fail; it simply doesn't write any records. So even if we set up the success/failure flows as you describe, the failure path would never be reached because no error is generated.
I'm afraid reading this API directly from ADF is not going to work. I therefore spent last night setting up a Logic App that makes the same calls and writes the XML output to files in an ADLS Gen2 container. From there I should hopefully be able to process the XML files correctly via ADF.

Best regards,

Koen
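If a pre-processing step ends up being needed anyway, the single-record collapse can be fixed before the files ever reach ADF. Below is a rough Python sketch of one approach: converting the XML to JSON while forcing known repeating elements to stay arrays. The ALWAYS_LIST contents are assumptions for illustration, and the converter ignores XML attributes for brevity:

```python
import json
import xml.etree.ElementTree as ET

# Element names that must always become JSON arrays, even when only
# one occurrence is present. These names are illustrative assumptions.
ALWAYS_LIST = {"Book"}

def to_json(elem):
    """Recursively convert an Element to a JSON-compatible structure.
    Attributes are ignored in this sketch."""
    children = list(elem)
    if not children:
        return elem.text
    out = {}
    for child in children:
        value = to_json(child)
        if child.tag in ALWAYS_LIST:
            out.setdefault(child.tag, []).append(value)
        elif child.tag in out:  # repeated element: promote to a list
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out

one_book = "<BookStore><Book><Title>A</Title></Book></BookStore>"
print(json.dumps(to_json(ET.fromstring(one_book))))  # Book stays an array
```

Writing JSON normalized this way to the ADLS Gen2 container would let a single ADF mapping handle both the single-record and multi-record cases.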


@KoenvanWielink-0539 - Thanks for getting back. Sorry to hear that the above might not work. From my testing, the failure path is triggered only when there is an explicit mapping.

Also, I'm glad that you're working on an alternative. One thing I did not understand: are you planning to write the XML response to an XML file in ADLS Gen2 and consume it from there? If so, won't the same XML structure be preserved, resulting in the same behavior when it is fed into ADF (when you read an XML file directly, if I'm not mistaken, it is processed as JSON again)? Or are you storing the response files, processing them further, and then using ADF to move the data to SQL?

KoenvanWielink-0539 commented

Hi Sathya,

Unfortunately you are correct. I was under the impression that this would work if the source were ADLS, but it results in the same behavior when I read the XML file. I guess I have to transform the data first.
Thanks again for the help.

Best regards,

Koen
