HarithaMaddi-4180 asked:

Approach to copy dynamically growing files and containers from one ADLS Gen2 storage account to another

Hi Team,

We are looking for an approach to copy data on a daily basis from one ADLS Gen2 storage account to another ADLS Gen2 storage account. The containers in our storage account grow dynamically, and the folder structure is not pre-defined. We need to append a timestamp to each file name during the copy operation, because we need to retain only 30 copies of every file (the last 30 days).

We considered the approaches below but ran into challenges:

AzCopy - Recursive file copy is supported, but renaming files during the copy is not.

ADF - Limited support for recursive copy: we would need to decide on the sub-folder depth up front in order to trigger the Copy Data activity for files inside ForEach loops over containers.

Snapshots - A snapshot can only be created in the same storage account, but we need the copy in a different storage account.

Please suggest a suitable workaround.

Thanks,
Haritha

azure-data-factory, azure-data-lake-storage

Hello @HarithaMaddi-4180 and welcome to Microsoft Q&A.

There is one more option you have not considered: Azure Data Share. Unfortunately, I don't think it supports renaming either.

However, if we combine it with Event Grid, we can do the renaming on the sink side, triggered by file creation/copy/write events.

That is, use one mechanism to copy to the other storage account, and another mechanism to rename. This works under the assumption that the date to append is tightly coupled with the date of copy.
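As a minimal sketch of the rename step, the function below shows the kind of timestamp-appending logic an Event Grid-triggered handler on the sink could apply to each incoming blob path. The name `timestamped_name` and the `yyyyMMdd` format are assumptions for illustration, not part of any Azure API:

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional

def timestamped_name(blob_path: str, when: Optional[datetime] = None) -> str:
    """Append a yyyyMMdd timestamp before the extension, e.g.
    'daily/report.csv' -> 'daily/report_20240101.csv'.
    (Hypothetical helper; the actual rename would be issued via the
    storage SDK inside the event-triggered function.)"""
    when = when or datetime.now(timezone.utc)
    p = PurePosixPath(blob_path)
    stamp = when.strftime("%Y%m%d")
    return str(p.with_name(f"{p.stem}_{stamp}{p.suffix}"))

print(timestamped_name("daily/report.csv", datetime(2024, 1, 1)))
# -> daily/report_20240101.csv
```

Keeping the stamp date-only (rather than a full time) also gives you a natural key for the 30-day retention check later.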


Thanks @MartinJaffer-MSFT for the response.

I will look into the details of Azure Data Share and Event Grid, but I would like to understand the performance of this approach, as we have more than 500 files today and the number keeps increasing. We would also like to look at the cost, since we will not be using these components for any other purpose in our ecosystem.

Can you please share more insights on this approach from these perspectives as well?


@HarithaMaddi-4180 I had another thought, one that does not require additional services.

What if, instead of adding the date to the file name, we add the date to the container name? Each time you want to make a copy, first create a new container with the date in its name, then use AzCopy to do the recursive copy into that new container.

This also makes retiring old data as simple as deleting a container.

Downstream consumers would not need to worry about changing file names. The date would be in the container name, keeping each day's data in a separate container.

There is no limit on the number of containers you can have.
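To sketch this container-per-day scheme, the snippet below computes the dated container name and picks out containers older than the 30-day window for deletion. The `dailycopy-` prefix and the helper names are assumptions for illustration; the recursive copy itself would still be done with AzCopy (`--recursive`) targeting the new container, and the actual deletes via the portal, CLI, or SDK:

```python
from datetime import date, datetime, timedelta

PREFIX = "dailycopy-"      # hypothetical container-name prefix
RETENTION_DAYS = 30

def container_for(day: date) -> str:
    """Name of the dated container, e.g. 'dailycopy-20240131'."""
    return f"{PREFIX}{day:%Y%m%d}"

def containers_to_delete(existing, today):
    """Return dated containers strictly older than the retention window;
    names that don't match the convention are left alone."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    stale = []
    for name in existing:
        if not name.startswith(PREFIX):
            continue
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y%m%d").date()
        except ValueError:
            continue
        if day < cutoff:
            stale.append(name)
    return stale
```

Because the date lives in the container name, retention becomes a cheap list-and-compare over container names instead of a walk over every file.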


0 Answers