Copying data from Amazon S3 to Azure blob storage.

Vinay5 46 Reputation points
2021-03-17T14:30:35.477+00:00

Hello,

I have a requirement to copy data from Amazon S3 to Azure Blob Storage.
The folder structure in Amazon S3 is as below.
Year folder 📁 - Month folder 📁 - Day folder 📁 - Date folder 📁 (e.g. 2018-03-10). There is data from 2017 to 2021 for almost every day. I need to create an ADF pipeline that copies the data to Azure for the scenarios below.
1. Loading data between two dates (e.g. 2018-10-10 to 2018-12-10).
2. Once the historical data is loaded, I need to implement logic for delta loading. New data arrives in S3 daily.

Could someone please assist with achieving the above?

Thank you in advance.

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. HimanshuSinha-msft 19,381 Reputation points Microsoft Employee
    2021-03-18T21:22:01.763+00:00

    Hello @Vinay5,
    Thanks for the question and for using the Microsoft Q&A platform.

    1. Loading data between two dates (e.g. 2018-10-10 to 2018-12-10).

    Since the dates in your case are known, I would create an array and add the dates to it (I used Excel, but you can use any text editor). Then use a ForEach loop (FE loop) and, inside it, add a Copy activity with a parameterized dataset. The parameter we pass is the date element from the array.
    The animation and snapshot below should help.

    79257-s3-issue.png

    79318-image.png
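As an alternative to typing the dates by hand, here is a minimal sketch of how the array could be generated, assuming Python is available. The function name `date_range_array` is hypothetical, and the `YYYY-MM-DD` format is taken from the date folder example in the question. The printed JSON could then be pasted as the default value of an Array-type pipeline parameter for the ForEach loop to iterate over.

```python
import json
from datetime import date, timedelta
from typing import List


def date_range_array(start: date, end: date) -> List[str]:
    """Return every date from start to end (inclusive) as 'YYYY-MM-DD' strings."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


# Date range from the example in the question.
dates = date_range_array(date(2018, 10, 10), date(2018, 12, 10))

# JSON array text, one element per date folder to copy.
print(json.dumps(dates))
```

Inside the ForEach loop, each element of the array is then what gets passed to the dataset parameter.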

    2. When you say "delta loading", I am assuming that you want to copy the files every day on a fixed schedule. If that is the case, you can use the same pipeline: remove the array and the ForEach loop, and build the date (like 2021-03-18) with a dynamic date expression. You will still have to use the parameterized dataset.
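To illustrate the daily case, here is a rough Python sketch of the value the scheduled run would compute. It assumes a folder layout of `<year>/<month>/<day>/<yyyy-mm-dd>/` with zero-padded month and day, which is only one reading of the structure described in the question; the function name `daily_folder_path` is hypothetical. In ADF itself, the date string would typically come from a dynamic expression such as `formatDateTime(utcnow(), 'yyyy-MM-dd')`.

```python
from datetime import date, datetime, timezone
from typing import Optional


def daily_folder_path(run_date: Optional[date] = None) -> str:
    """Build the S3 folder path for one day (assumed layout, see above)."""
    if run_date is None:
        # Default to the current UTC date, like utcnow() in an ADF expression.
        run_date = datetime.now(timezone.utc).date()
    return f"{run_date:%Y}/{run_date:%m}/{run_date:%d}/{run_date.isoformat()}/"


print(daily_folder_path(date(2021, 3, 18)))  # 2021/03/18/2021-03-18/
```

If the real folder names are not zero-padded or use month names, only the format string would need to change.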

    Please let me know how it goes.
    Thanks,
    Himanshu
    Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.


  2. Vinay5 46 Reputation points
    2021-03-19T09:42:26.843+00:00

    Hi Himanshu,

    Thanks for your answer. While I was waiting for a reply to my question, I tried the logic given in the link below and it worked.

    https://learn.microsoft.com/en-us/answers/questions/162895/adf-pipeline-to-increment-the-date-back-to-six-mon.html

    However, I got stuck at delta loading.
    Files get added to the S3 bucket at random times every day, and I have to copy them to Azure as and when they arrive.
    Could you please elaborate on how to achieve this requirement?