Hi all, can someone suggest the best possible design for the problem statement below?
Problem statement:
We receive a file that typically has more than 5k records; the minimum volume is 3k and the maximum is 700k.
An ADF pipeline needs to call an API to process each record.
Our current design:
The ADF pipeline uses a Copy activity to load the records into a staging table, then a Lookup activity (which returns at most 5k rows) and a ForEach activity that calls the API once per record.
The ForEach activity also limits parallelism: its batch count ranges from 20 to 50.
Because of the 5k Lookup limit, we cannot process more than 5k records per run, and even those 5k take a long time since we have to call the API for each record.
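For reference, the per-record fan-out the ForEach activity performs can be sketched like this (the record shape is a hypothetical stand-in, and the real single-record API call is stubbed out):

```python
# Minimal sketch of the per-record processing pattern described above:
# a pool of at most 50 workers (the ForEach batch-count cap), each
# handling exactly one record, since the API only accepts one record
# per call.
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLELISM = 50  # ForEach batch count caps out at 50

def call_api(record):
    # Placeholder for the real single-record API call; in the actual
    # pipeline this is a Web activity invoking the API per record.
    return {"id": record["id"], "status": "processed"}

def process_batch(records):
    # Bounded parallelism: at most MAX_PARALLELISM in-flight calls,
    # mirroring what the ForEach activity does inside the pipeline.
    with ThreadPoolExecutor(max_workers=MAX_PARALLELISM) as pool:
        return list(pool.map(call_api, records))

# One Lookup's worth of records: 5k rows, one API call each.
results = process_batch([{"id": i} for i in range(5000)])
print(len(results))  # one result per record
```

This is only an illustration of the current pattern, not a proposed fix; the point is that every record still costs one API round trip.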
Possible solutions we have tried:
We asked the source team to split the data across multiple files. For example, for 20k records they send 4 files (test_1_ddmmyyyy.txt, test_2_ddmmyyyy.txt, etc.), each containing 5k records.
We created 4 triggers and route the files by name, so that if we get 4 files, 4 pipeline instances fire and process the records in parallel. But when we checked the execution time, each pipeline still took about 3 hours.
The upstream team said the maximum record count would be 65k, so we decided to create 13 triggers to cover those files.
Is there a better way to design this solution? In some cases we will get a file with 700k records, which would mean splitting it into 140 files of 5k each and creating 140 triggers. That seems like a crazy volume and design, but we don't see any alternative because our pipeline has to make a web API call for each record.
The API does not accept multiple records per call, so we have no other approach.
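To make the scale concrete, here is the back-of-envelope arithmetic implied by the figures above (one pipeline processes 5k records in about 3 hours):

```python
# Throughput implied by the observed runs: 5,000 records in ~3 hours.
records_per_file = 5_000
hours_per_file = 3
seconds_per_record = hours_per_file * 3600 / records_per_file  # ~2.16 s/record

# At the same rate, a 700k-record load split into 5k files needs:
total_records = 700_000
files_needed = total_records // records_per_file          # 140 files
total_pipeline_hours = files_needed * hours_per_file      # 420 pipeline-hours

print(seconds_per_record, files_needed, total_pipeline_hours)
```

So each record costs roughly 2 seconds end to end, and the 700k case burns about 420 pipeline-hours of compute even if the 140 pipelines run fully in parallel.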
Kindly review the problem and share your valuable suggestions.