question

pfiadeiro asked pfiadeiro commented

Copy activity data lake merge copy behavior

Hi,

I have a question about the merge copy behavior when a data lake is used as the sink in a copy activity.

Let's assume we have the following two CSV files:

File 1:
1, John
2, Sarah

File 2:
3, Bob
4, Janet

If I pick these two files as my source and merge them into a file named FinalFile.csv, I understand that the rows may not end up in sequential order. For example, the result could be:

1, John
3, Bob
2, Sarah
4, Janet

My understanding is that the merge behavior doesn't guarantee that files are merged sequentially, so records from different files may be interleaved.

Let's now say that I pick File2 as my source and File1 as my sink. I'd expect the final output to be:

1, John
2, Sarah
3, Bob
4, Janet

since the sink file already has content in it and we're merging just one source file. However, it seems that only the rows from File2 end up in the output and the rows that were already in the file are gone. Basically, it seems to do an overwrite and not an append.

Question: based on the above, is there any way to guarantee the order of the records using the copy activity? I'm aware that I could use a data flow or an Azure Function to achieve this, but I don't want to go down that route unless there's no other option. One potential option (which I haven't tried yet) is to give the copy activity source a file containing a list of files and set the degree of parallelism to 1, hoping it would merge the files in the order they appear in that list. However, I'm not really keen on this option.

Thanks
Pedro
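
For reference, the output the question is hoping for can be illustrated outside of Data Factory. The following is only a minimal Python sketch of an ordered merge, not something the copy activity does: the function name `merge_in_order` is hypothetical, and the file contents are passed in as in-memory strings rather than read from a data lake.

```python
import csv
import io


def merge_in_order(sources):
    """Concatenate CSV texts in the given order.

    `sources` is a list of CSV contents (strings). Rows keep their
    original order within each source, and sources are appended in
    list order, so the result is deterministic.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for text in sources:
        # skipinitialspace drops the blank after each comma ("1, John")
        for row in csv.reader(io.StringIO(text), skipinitialspace=True):
            if row:  # ignore empty lines
                writer.writerow(row)
    return out.getvalue()


# Sample data from the question (assumed to be the full file contents)
file1 = "1, John\n2, Sarah\n"
file2 = "3, Bob\n4, Janet\n"

merged = merge_in_order([file1, file2])
print(merged)
```

Because the sources are processed one at a time in list order, rows from File 2 can never land between rows from File 1, which is the guarantee the merge copy behavior does not appear to give.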



azure-data-factory

1 Answer

SaurabhSharma-msft answered pfiadeiro commented

Hi @pfiadeiro,

Thanks for using Microsoft Q&A!
Unfortunately, this is not possible with the copy activity, and you may need to use a data flow or an Azure Function to sort the records before writing to the sink. Also, you could try your copy activity with parallelism if you know the files will be picked up in order and the records in them are already sorted.

Thanks
Saurabh
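
As a rough illustration of the "sort the records before writing to sink" step, here is a minimal Python sketch of what such logic might look like, for example inside an Azure Function. It assumes the first column is a numeric ID, as in the sample files; the function name `sort_records` is hypothetical, and reading from and writing to the data lake is left out.

```python
import csv
import io


def sort_records(text):
    """Return the CSV text with rows sorted by the numeric first column.

    Assumes every non-empty row starts with an integer ID.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [row for row in reader if row]  # skip blank lines
    rows.sort(key=lambda row: int(row[0]))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# The interleaved result described in the question
mixed = "1, John\n3, Bob\n2, Sarah\n4, Janet\n"

ordered = sort_records(mixed)
print(ordered)
```

This restores the expected order regardless of how the merge interleaved the rows, at the cost of loading the merged file into memory.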


Hi @SaurabhSharma-msft

Thanks for confirming this. I wanted to make sure I wasn't missing any possible option and that the alternatives I mentioned are the ones available.

Cheers,
Pedro
