question

tnw004-3504 avatar image
0 Votes"
tnw004-3504 asked RenatoMedeiros-5042 commented

Data Factory: Multiple collection reference for XML copy to SQL

I am attempting to set up a data factory pipeline which will ingest XML data to an Azure SQL Database. The XML follows this structure:

 <schools>
     <school>
         <students>
             <student></student>
             <student></student>
             <student></student>
         </students>
     </school>
     <school>
         <students>
             <student></student>
             <student></student>
             <student></student>
         </students>
     </school>
 <schools>

I've set up multiple tables in SQL to accept this data. Put simply, there is a schools table which will take all the schools and a students table to accept all the students.

I set up the copy task in data factory and had to set the "Collection reference" to <school> in order to get it to iterate over the schools. If I don't do this it only loads in the first school and ignores the rest.

This works just fine for loading in schools. The problem is with the next copy task which looks at the same XML and tries to copy all the students from all the schools into the students table.

If I set the collection reference to <school> it will only copy in the first student from every school and ignores the rest of the students. If I set the collection reference to <student> it will copy all the students in the first school but ignore the rest of the schools and students.

I'd like to iterate over ALL schools AND students in order to load in all students from all schools but I don't see any easy way to do this. Is there some way to set multiple collection references to both schools and students?

azure-data-factory
· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello @tnw004-3504 and welcome to Microsoft Q&A.

I feel you difficulty. I took your framework and added some values. I was unable to make it work with Copy Activity. Copy Activity only does one collection reference.

I am confident this can be done with Mapping Data Flow. Would you be willing to use Data Flow? If so, let me know and I'll work out the steps in Data Flow.

I am pondering whether this can be accomplished by using multiple copy activities. I am 60% confident it can be done.

Thank you for your patience. Let me know which path you prefer.
Martin

0 Votes 0 ·

Hi Martin, thanks for your response. I should have included that I am working in gov cloud where, as far as I know, Data Flow is not available yet.

I'm thinking that I'll have to bulk load the XML into SQL and shred it from there instead.

1 Vote 1 ·

1 Answer

MartinJaffer-MSFT avatar image
1 Vote"
MartinJaffer-MSFT answered RenatoMedeiros-5042 commented

@tnw004-3504 I am not sure if this will help, but you can split the shredding, so some is done by data factory, and the rest in your SQL.

I took your sample, and added values to look like:

<schools>
         <school name="school1">
             <students>
                 <student>abc</student>
                 <student>def</student>
                 <student>ghi</student>
             </students>
         </school>
         <school name="school2">
             <students>
                 <student>123</student>
                 <student>456</student>
                 <student>789</student>
             </students>
         </school>
    </schools>


and mapped as
20985-image.png

  • I disabled detect data type in source

and the result is

 "[\"abc\",\"def\",\"ghi\"]","school1",
 "[\"123\",\"456\",\"789\"]","school2",

Then, all you need do is replace \" with " and shred the array.





image.png (105.7 KiB)
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@MartinJaffer-MSFT hello my friend.

I'm facing similar issue like this one, but my case is a bit different ( I ll use the same SCHOOL) example.

When there are more then 1 school, it works fine...it brings all the shcool and copy to SQL DB perfectly...but when I have only 1 school, it doesen t work..the copy activity ignores that file. How could I do that using mapping data flow? Could you please show me? I have no idea :(


1 Vote 1 ·