Hi,
We have been working on a use case where we need to process files sequentially using ADF. Sharing the context and an overview of the use case below.
A business process generates and pushes files to a storage account. The number and size of the files can vary, and the files need to be processed in the sequence of their arrival. That is where ADF comes into the picture, to apply transformations before pushing the content to the sink.
To elaborate more - assume that the files below arrive in the following sequence:
file1.csv, file2.csv, file3.csv, file4.csv and file5.csv
So file1.csv needs to be processed before processing of file2.csv starts, and so on. The sequencing also has to be maintained in case of failures. E.g. if file1.csv and file2.csv were processed fine and an error is encountered while processing file3.csv, then the pipeline should stop. When a rectified file3.csv is uploaded again by the upstream process, the pipeline needs to process only the pending files in the original sequence, i.e. the updated file3.csv, then file4.csv and file5.csv. That is where the problem lies, as we do not see any orchestration built into ADF as a platform to handle such a scenario.
Currently, we are leveraging blob triggers and can see the processing ADF pipeline getting triggered multiple times as multiple files are dropped into the storage account. However, we have to write a lot of custom logic to maintain the sequence in some persistent storage, look it up on every trigger, and handle failures and re-runs so that the original sequence is honored (as explained above).
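To make the custom logic concrete, below is a minimal sketch of the sequencing/watermark approach we are describing. Note this is not an ADF feature - it is illustrative pseudologic where an in-memory dict stands in for the persistent store (e.g. an Azure Table or SQL row), `process` stands in for invoking the ADF pipeline run, and the assumption is that each file name carries its arrival order (file1.csv, file2.csv, ...):

```python
import re

# Stand-in for persistent storage: the next expected sequence number
# (the "watermark") plus any files that arrived out of order.
state = {"next_seq": 1, "pending": {}}

def seq_of(name: str) -> int:
    """Extract the sequence number embedded in the file name (assumption)."""
    return int(re.search(r"(\d+)", name).group(1))

def on_blob_arrived(name: str, process) -> list[str]:
    """Called once per blob trigger. Buffers out-of-order arrivals and
    processes files strictly in sequence. Stops at the first failure,
    leaving the watermark unchanged so a re-uploaded file resumes from
    the same point in the original sequence."""
    state["pending"][seq_of(name)] = name
    done = []
    # Drain every file that is now contiguous with the watermark.
    while state["next_seq"] in state["pending"]:
        current = state["pending"][state["next_seq"]]
        try:
            process(current)            # e.g. trigger the ADF pipeline run
        except Exception:
            return done                 # watermark not advanced; retry later
        done.append(current)
        del state["pending"][state["next_seq"]]
        state["next_seq"] += 1          # persist the watermark after success
    return done
```

This is the lookup-and-resume behavior we currently have to hand-roll around every trigger invocation, which is what we are hoping ADF can handle natively.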
Looking for inputs on whether there is a better way to handle such orchestration in ADF. Is there anything in mapping data flows that can be leveraged to address the original problem and help process files in sequence?
Thanks in advance.