
bhushangawale asked:

Sequential file processing in ADF

Hi,

We have been working on a use case where we need to process files sequentially using ADF. Sharing the context and an overview of the use case below.

A business process generates and pushes files to a storage account. The number of files and their sizes can vary, and the files need to be processed in the sequence of their arrival; that is where ADF comes into the picture, to apply transformations before pushing the content to the sink.

To elaborate, assume the files below arrive in this sequence:
file1.csv, file2.csv, file3.csv, file4.csv, file5.csv

So file1.csv needs to be fully processed before processing of file2.csv starts, and so on. The sequencing also has to be maintained in case of failures. For example, if file1.csv and file2.csv were processed fine and an error is encountered while processing file3.csv, the pipeline should stop. When the rectified file3.csv is uploaded by the upstream process again, the pipeline needs to process only the pending files, in the original sequence: the updated file3.csv, then file4.csv and file5.csv. That is where the problem lies, as we do not see any orchestration built into ADF as a platform to handle such a scenario.

Currently, we are leveraging blob triggers, and we can see the processing ADF pipeline getting triggered multiple times as files are dropped into the storage account. However, we have to write a lot of custom logic to record the sequence in some persistent storage, look it up on every run, and handle failures and re-runs so that the original sequence is honored (as explained above).
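The custom logic described above amounts to a persistent "watermark" that records the last successfully processed file, so a re-run resumes from the first pending file in arrival order and halts at the first failure. A minimal Python sketch of that pattern follows; the watermark file, the file names, and the `process_file` callback are hypothetical placeholders (in practice the watermark would live in durable storage such as a blob or a table):

```python
import json
from pathlib import Path

WATERMARK = Path("watermark.json")  # stand-in for durable state storage


def load_watermark():
    """Return the name of the last successfully processed file, or None."""
    if WATERMARK.exists():
        return json.loads(WATERMARK.read_text())["last_done"]
    return None


def save_watermark(name):
    """Persist the last successfully processed file name."""
    WATERMARK.write_text(json.dumps({"last_done": name}))


def run(files_in_arrival_order, process_file):
    """Process files sequentially; stop at the first failure.

    files_in_arrival_order: file names already sorted by arrival time.
    process_file: callable returning True on success, False on failure.
    Returns the name of the failed file, or None if all files succeeded.
    """
    pending = list(files_in_arrival_order)
    last = load_watermark()
    if last in pending:
        pending = pending[pending.index(last) + 1:]  # skip already-done files
    for name in pending:
        if not process_file(name):
            return name  # sequence halts here until the file is fixed
        save_watermark(name)
    return None
```

On a re-run after the upstream process re-uploads the corrected file, `run` picks up from the file after the watermark, which matches the "process only pending files in the original sequence" requirement.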

Looking for inputs on whether there is a better way to handle such orchestration in ADF. Is there anything in mapping data flows that can be leveraged to address the original problem and help process files in a sequence?

Thanks in advance.

azure-data-factory

Hi @bhushangawale,

Just checking in to see if the suggestion below from Nandan was helpful. If it answers your query, please do click "Accept Answer" and/or Up-Vote, as it might be beneficial to other community members reading this thread. And if you have any further queries, do let us know.


Hi @bhushangawale,

We still have not heard back from you. Just wanted to check if the suggestion below was helpful. If it answers your query, please do click "Accept Answer" and/or Up-Vote, as it might be beneficial to other community members reading this thread. And if you have any further queries, do let us know.


I am not able to implement this. I have the same scenario, and I saw this article:

https://docs.microsoft.com/en-us/answers/questions/136815/lates-file-extraction-azure-data-factory.html

It would be great if you could explain it like that article does. In one execution, the pipeline should process all files in the order they arrived, e.g. file (1 pm), file (2 pm), file (3 pm), file (4 pm):
the file that came in at 1 pm should be processed first and the 4 pm file last.

@HimanshuSinha-MSFT, @KranthiPakala-MSFT, @ShaikMaheer-MSFT, could you implement this scenario and explain it? Please reply.


1 Answer

NandanHegde-7720 answered:

Hey,
You can use a combination of the Get Metadata activity and a ForEach activity to achieve your goal.
Via the Get Metadata activity, you can point at your folder, and the list of child items acts as the input to the ForEach activity (which can be set to sequential execution).
Since ForEach proceeds through all iterations regardless of whether an earlier one failed, you can add a variable / custom logic at the beginning of the ForEach body to check whether the previous file was processed successfully.
And at the end of the ForEach body, you can have file archival logic.
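Since ADF pipeline JSON is verbose, here is a minimal Python sketch of the pattern the answer describes. The sorted list stands in for the Get Metadata child items (note that `childItems` itself only returns names, so the last-modified timestamps here assume an extra per-file lookup), the plain loop stands in for the sequential ForEach, and the `failed` flag mimics the suggested variable that skips remaining iterations after a failure. The `process` and `archive` callbacks are hypothetical:

```python
def run_pipeline(child_items, process, archive):
    """Simulate a sequential ForEach over files ordered by arrival.

    child_items: list of (name, last_modified) pairs.
    process: callable returning True on success, False on failure.
    archive: callable invoked for each successfully processed file.
    Returns the list of files processed before any failure.
    """
    ordered = sorted(child_items, key=lambda item: item[1])  # arrival order
    failed = False  # the variable checked at the top of each iteration
    processed = []
    for name, _ts in ordered:
        if failed:
            continue  # ForEach keeps iterating, so later files are skipped
        if process(name):
            archive(name)  # end-of-iteration archival step
            processed.append(name)
        else:
            failed = True
    return processed
```

Archiving each file as it succeeds is what makes re-runs safe: on the next execution, Get Metadata only sees the files that are still pending, so the original sequence is preserved without a separate watermark.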
