amikm asked NandanHegde-7720 answered

Creating a dependency pipeline to check file is latest in ADF

I am trying to create a dependency pipeline for files. Before executing my model refresh (a Web activity), I want to make sure all the related files are present in their respective folders and that all of them are the latest.

Suppose my model refresh uses the following files in ADLS:
1. myadls/raw/master/file1.csv
2. myadls/raw/dim/file2.csv
3. myadls/raw/dim/file3.csv
4. myadls/master/reporting/file4.csv

We need to compare each file's last-modified date with today's date. If they are equal, the file is the latest. If any file is not the latest, I need an email with the name of that file, and the Web activity that does the model refresh should not be triggered.
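
A minimal sketch of this date check, written in Python purely to illustrate the logic (the pipeline itself would use ADF expressions). It assumes the last-modified value is an ISO-8601 UTC timestamp and that "today" means the current UTC calendar day. The function name `is_latest` and the activity name in the comment are hypothetical.

```python
from datetime import datetime, timezone


def is_latest(last_modified_iso, now):
    """Return True when the file was last modified on the same UTC day as `now`."""
    # Assumed input shape: "2021-07-28T10:15:30Z" (whole seconds, UTC).
    modified = datetime.fromisoformat(last_modified_iso.replace("Z", "+00:00"))
    # Compare calendar dates only, so the time of day is ignored.
    return modified.astimezone(timezone.utc).date() == now.astimezone(timezone.utc).date()


# Roughly equivalent If Condition expression in ADF
# ('Get Metadata1' is a placeholder activity name):
# @equals(formatDateTime(activity('Get Metadata1').output.lastModified, 'yyyy-MM-dd'),
#         formatDateTime(utcnow(), 'yyyy-MM-dd'))
```

Comparing formatted dates rather than full timestamps matters here, because a file written earlier the same day should still count as the latest.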

I have created this pipeline using Get Metadata, ForEach, If Condition, Web, and Set Variable activities. The problem is that I cannot get an email naming the file that is not the latest. Can anyone help me redesign the pipeline below so that I get an email for the file that is not the latest?

Please find my current design for the dependency pipeline below. In the last If Condition, the True branch runs a Web activity that does the model refresh, and the False branch runs another Web activity that reports that one of the files is not the latest, so the model refresh can't be done. But it cannot tell which file it is.

117957-adf.png


azure-data-factory

Hey,
Are the number of files and their names fixed in your requirement?

amikm (replying to NandanHegde-7720):

Hi, the file names are fixed, but over time we can expect more files in those folders and subfolders. These folders/subfolders contain other files as well, and we always look for the same file name and its last-modified date. If the last-modified date of a file in the first flow is not equal to today's date, it should send an email saying file1.csv is not the latest.

If files in the second or third flow are not the latest, it should send an email saying the files are not the latest, or it can list which files are not the latest.

If you look at the second ForEach activity, I am using Get Metadata and an If Condition, and under the If Condition I set a variable to True with Set Variable. The same goes for the second and third flows.

In the last If Condition activity, I check whether all of those variables are True using an `and` condition. If all values are true, the True branch executes; otherwise the False branch sends an email saying one of the files is not the latest.

But I am trying to get the names of all files that are not the latest.

HimanshuSinha-MSFT answered amikm edited

Hello @amikm,
Thanks for the ask and for using the Microsoft Q&A platform.

I think the ask is how to send an email from ADF. ADF by itself does not have an activity for this. You can implement it using Logic Apps.

https://www.mssqltips.com/sqlservertip/5718/azure-data-factory-pipeline-email-notification-part-1/

One other, "not a clean" way is to fail the pipeline itself, which should trigger a pipeline failure email. I think you can use a Web activity with a non-existent URL, which will make the activity fail.

Please do let me know how it goes.

Thanks
Himanshu



Hi @HimanshuSinha-MSFT, my question is specific to my scenario, where I want to get an email if a file is not the latest. Suppose that, in my file list above, file1.csv is not the latest; the email should say that file1.csv is not the latest. The problem is that the files are in different folders/subfolders. Please see my current design (screenshot). My question is more about this specific scenario than a general technical question.

I am not able to configure the logic. For example, if file1.csv in my first flow is not the latest, it should send an email saying file1.csv is not the latest. But the way I designed it, the last If Condition checks the variables for all files: if all of them are true, the True branch executes the Web activity for the model refresh. In the False branch I have added a Web activity that sends an email saying one of the files is not the latest.

I think the problem is in the second activity, the ForEach, which loops through all the files in the raw/master folders. I would like a proper solution at that point.

I am not sure whether this scenario can be achieved with ADF pipelines.

NandanHegde-7720 answered

Hey,
With your current architecture, you can create one variable per ForEach activity to store file names.
Within each ForEach, if a file is not the latest, use an Append Variable activity
to save its name.
Then, in the final validation, you can concatenate the variables from all the ForEach loops to get the final list of files that were not modified.
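
As a rough illustration of this first option, the Python sketch below models three per-loop arrays (one per ForEach, filled the way an Append Variable activity would fill them) and merges them at the end. The flow names and the latest/not-latest flags are made up for the example. In ADF the merge could be done with an expression such as `union(...)` on the array variables, which also removes duplicates.

```python
# Each flow maps a file path to True if the file was modified today.
# The flags are made-up sample data, not real results.
flows = {
    "master": {"myadls/raw/master/file1.csv": False},
    "dim": {
        "myadls/raw/dim/file2.csv": True,
        "myadls/raw/dim/file3.csv": False,
    },
    "reporting": {"myadls/master/reporting/file4.csv": True},
}


def stale_in_flow(files):
    """Model one ForEach loop: append the name of every file that is not the latest."""
    stale = []
    for path, is_latest in files.items():
        if not is_latest:
            stale.append(path)  # stands in for the Append Variable activity
    return stale


# One list per ForEach loop, like one array variable per loop.
per_flow = {name: stale_in_flow(files) for name, files in flows.items()}

# Final validation: combine the per-loop lists into one list for the email.
all_stale = [path for name in flows for path in per_flow[name]]
print(all_stale)
```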

But ideally I would suggest the approach below:
1) Produce the list of files as the output of a Lookup activity.
2) Pass that list to a single ForEach activity with sequential execution.
3) Within the ForEach, use a Get Metadata activity and an If Condition to check whether the file is the latest.
If not, append the file name with an Append Variable activity.
4) After the ForEach, use an If Condition to check whether the file name variable is empty or has values.
If it has values, you can send the email; the variable holds the names of all the files that were not updated.
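
The steps above can be sketched as follows. This is Python used only to model the control flow, not ADF code; the sample dates, the function names, and the message wording are assumptions. In the pipeline, the final check would be an If Condition on something like `@empty(variables('staleFiles'))` (variable name hypothetical), and `join()` can turn the array into text for the email body.

```python
from datetime import date


def find_stale(files, today):
    """Steps 2-3: walk the list one by one and collect names not modified today."""
    stale = []
    for path, last_modified in files:
        if last_modified != today:
            stale.append(path)  # stands in for the Append Variable activity
    return stale


def next_action(stale):
    """Step 4: refresh only when nothing is stale; otherwise build the email text."""
    if not stale:
        return ("refresh", "")
    return ("email", "Files not updated today: " + ", ".join(stale))


# Step 1: the file list a Lookup activity might return (dates are sample values).
today = date(2021, 7, 28)
files = [
    ("myadls/raw/master/file1.csv", date(2021, 7, 28)),
    ("myadls/raw/dim/file2.csv", date(2021, 7, 27)),
    ("myadls/raw/dim/file3.csv", date(2021, 7, 28)),
    ("myadls/master/reporting/file4.csv", date(2021, 7, 26)),
]

action, message = next_action(find_stale(files, today))
print(action, message)
```

Keeping the list of files in one place means a new file only needs a new row in the lookup source, not a new branch in the pipeline.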
