Azure data factory - Copy filterred bulk data from one container to another container in Azure blob storage

Kumar, Dinesh 26 Reputation points
2022-04-07T20:12:27.64+00:00

Hello,

I would like to create Azure data factory pipeline to copy the filtered bulk data from one container to another container in the same Azure blob storage.

Here are the high level details:

  1. In Excel file one of the column contains list of some 600+ random number pattern Eg. 123, 345, 444, etc...
  2. In pipeline, I used 'Lookup' activity to load and store the number pattern column data from Excel into an Array variable.
  3. Now I have to copy the bulk data from one of the source container (which stored millions of the files) and filter the files name which only 'contains' the number pattern matching (stored in Array step#1 & #2) and copy the matching files to another container in Azure blob storage.

I created the "Metadata" activity to copy the files using 'Copy Data' activity but unable to filter out the files which contains the number stored in the Array Variable.

Can you please assist and advise on how to filter the files which matched the pattern stored into Array ?

Example:
Excel File/Array with some number:
123
345
444
809

Source Container [with millions of files] :
00180_345_20_1106_41111.html
00180_555_20_1106_41111.html
00180_867_20_1103_41111.html
00180_809_20_1103_41111.html

Destination Container [should only contains the files which matches the number pattern listed in Excel/Array]:
00180_345_20_1106_41111.html
00180_809_20_1103_41111.html

-Thank you

Azure Notification Hubs
Azure Notification Hubs
An Azure service that is used to send push notifications to all major platforms from the cloud or on-premises environments.
264 questions
Azure Blob Storage
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
2,436 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,599 questions
0 comments No comments
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 31,061 Reputation points Microsoft Employee
    2022-04-11T15:51:26.217+00:00

    Hi @Kumar, Dinesh ,
    Welcome to Microsoft Q&A platform and thankyou for posting your query.

    As I understand your requirement, it seems you want to copy only those files whose filename contains the number present in an excel sheet. Please correct me if my understanding is wrong.

    To achieve this , you need to use two loops one inside another : One to iterate through the list of files present in the storage account and another to iterate through the filenames present in the excel sheet. As it's not possible to use foreach inside foreach, we need to create two pipelines and inside foreach block of first pipeline, we need to call another pipeline using execute pipeline activity and use another foreach block in the second pipeline.

    Step1 : Use LookUp activity to point to the Excel containing randomNumbers.
    Step2: Use Foreach activity with Items as @activity('Lookup1').output.value .
    Step3: Create another pipeline, inside of which create a parameter Number .
    Step4: Inside Foreach of First pipeline , call Execute pipeline Activity to invoke second pipeline where we need to pass the contents of the excel sheet as parameter Number=@item().Prop_0
    Step5: Use GetMetadata activity in the second pipeline where dataset is pointing to the container/Folder in the storage account containing all the files. Select FieldList as 'Child Items'
    Step6: Use Foreach block with Items as @activity('Get Metadata1').output.childItems
    Step7: Inside Foreach, use an If block with condition as @contains(item().name,pipeline().parameters.Number) .
    Step8: Inside If block, Add a CopyData activity where in source settings, use a dataset pointing to the container/Folder which contains the files . However,Select File Path Type as 'Wildcard file path' and provide file name as @item().name
    Step9: In the sink , provide the target folder path. Execute the pipeline

    Post running the pipeline , the expected result is only those files would be copied whose name contains randomNumber present in the excel sheet.

    Please refer to the gif below for your reference:

    191877-filterfilesbasedonname.gif

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful