MariyaSusnerwala-3039 asked SaurabhSharma-msft commented

Copy files of different formats into different folders using Azure Data Factory

I am new to Azure Data Factory and I am trying to solve a particular use case. I have to copy files from a source folder to a target folder, both of which are in the same storage account. The files in the source folder have different formats (csv, txt, xml) and have a date appended to the name, e.g. addresses_2020-11-01.csv (date format: yyyy-mm-dd).

I have to create a pipeline that sorts the files and stores them in dynamically created folders with this hierarchy, e.g. csv->yyyy->mm->dd. My understanding is that I first have to filter the files by format, then use the split function to split the filename at the _ character, and then dynamically create the folders from the year, month, and day in the filename. Below is a screenshot of the pipeline I have created so far. I could not embed the screenshot inline, but the link opens it.


![78879-image.png][1]


What I have done:

1. Use Get Metadata to extract childItems.
2. Filter the output from Get Metadata into csv, txt, and xml files.
3. Use a ForEach activity that contains a Copy activity. This activity copies the files from the Filter activity into their respective folders (csv, txt, xml), since the wildcards are *.csv, *.txt, and *.xml.

I am not sure what the correct way forward is once the files are filtered, so that the dynamic folders are created from the dates in the filenames. I think I need to use a Set Variable activity along with the Copy activity, but I am not sure how to accomplish this. Any help will be appreciated.
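
The Get Metadata and Filter steps described above can be sketched outside ADF to make the intended behaviour concrete. This is only an illustration in Python, not ADF syntax: the `child_items` sample and the `filter_files` helper are hypothetical, and the sample assumes each child item carries a `name` and a `type`, as the Get Metadata childItems output does.

```python
# Hypothetical sample shaped like the childItems output of Get Metadata:
# a list of objects with "name" and "type".
child_items = [
    {"name": "addresses_2020-11-01.csv", "type": "File"},
    {"name": "notes_2020-11-02.txt", "type": "File"},
    {"name": "orders_2020-12-15.xml", "type": "File"},
    {"name": "archive", "type": "Folder"},
]

# The three formats mentioned in the question.
WANTED_EXTENSIONS = (".csv", ".txt", ".xml")


def filter_files(items):
    """Keep only files whose name ends with one of the wanted extensions."""
    return [
        item
        for item in items
        if item["type"] == "File" and item["name"].endswith(WANTED_EXTENSIONS)
    ]


for item in filter_files(child_items):
    print(item["name"])
```

A single filter that accepts all three extensions is enough here, because the format can be read back from each filename when the target folder is built.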

Thanks!!

azure-data-factory

NandanHegde-7720 answered MariyaSusnerwala-3039 commented

Hey @MariyaSusnerwala-3039,

If the files only need to be transferred as-is, use the Binary format for both the source and sink datasets.
You can use the Get Metadata activity to get the child items and pass them to a ForEach activity that contains the Copy activity.
You can parameterize the sink dataset so that the folder and file names can be supplied for each file as needed.


Hi Nandan,

Thanks for your quick response. I am doing exactly what you are saying. My pipeline:
Get Metadata->Filter files based on format ->For each loop which contains Copy activity.

79218-screenshot-507.png


I am stuck at the Copy activity, where I think I need to use the split function to extract the yyyy-mm-dd substring, create the folder hierarchy from it, and then store the file in that folder.
79180-screenshot-509.png

I understand that I will need concat to build the folder hierarchy and split to get the date, but I have not managed to write that expression successfully. It would be very helpful if you could provide an example of concat and split that accomplishes this.

Thanks a lot for your help!


NandanHegde-7720 answered SaurabhSharma-msft commented

Hey @MariyaSusnerwala-3039,
Sorry for the delayed response.

You can use the expression below to build the folder path from the file name:

 @concat(split(pipeline().parameters.FileName,'.')[1],'/',split(split(pipeline().parameters.FileName,'_')[1],'-')[0],'/',split(split(pipeline().parameters.FileName,'_')[1],'-')[1],'/',split(split(split(pipeline().parameters.FileName,'_')[1],'-')[2],'.')[0])

Inside the ForEach activity, replace pipeline().parameters.FileName with item().name, since each child item returned by Get Metadata exposes its file name through the name property.
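
To see what each nested split in the expression contributes, here is a rough line-by-line translation into Python. It is a sketch for checking the logic, not something ADF runs; the function name `target_folder` is made up for illustration.

```python
def target_folder(file_name: str) -> str:
    """Mirror the concat/split expression for names like addresses_2020-11-01.csv."""
    # split(FileName, '.')[1] -> the extension, e.g. "csv"
    extension = file_name.split(".")[1]
    # split(FileName, '_')[1] -> the date plus extension, e.g. "2020-11-01.csv"
    date_and_ext = file_name.split("_")[1]
    # split(..., '-')[0] and [1] -> the year and the month
    year = date_and_ext.split("-")[0]
    month = date_and_ext.split("-")[1]
    # split(split(..., '-')[2], '.')[0] -> the day with the extension removed
    day = date_and_ext.split("-")[2].split(".")[0]
    # concat(..., '/', ...) -> the folder path
    return "/".join([extension, year, month, day])


print(target_folder("addresses_2020-11-01.csv"))  # csv/2020/11/01
```

Like the expression, this assumes exactly one underscore and one dot in the file name. A name with an extra underscore or dot would need the splits adjusted, for example by taking the last segment instead of index 1.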

79946-res.png




Hey @MariyaSusnerwala-3039,
Did it resolve your query?


@mariyasusnerwala-3039 Following up as we have not heard back from you.
