pankajchaturvedi-6530 asked JohnKilleen-7378 commented

Latest file extraction (Azure Data Factory)

Hi Team,

I want to extract the latest file from Azure Data Lake Store based on its last modified date and process it in another folder. Could someone please let me know what needs to be done here, if anyone has an idea or has implemented the same?

For example, I have 3 files:

"lastModified": "2020-10-22T08:21:53Z",
"fileName": "people.csv",

"lastModified": "2020-10-22T07:51:42Z",
"fileName": "Product.csv",

"lastModified": "2020-10-22T14:48:51Z",
"fileName": "Address.csv",

So it should process only the "Address.csv" file.

I am looking forward to your response.

Thanks,
Pankaj

azure-data-factory azure-data-lake-storage

HimanshuSinha-MSFT answered pankajchaturvedi-6530 commented

Hello @pankajchaturvedi-6530,
Thanks for the question and for using Microsoft Q&A.

We have created the pipeline below, and our tests show that it should do the trick.

Logic:
1. We use two Get Metadata activities: one to list the files in the folder, and the other to get the metadata (in this case the lastModified date) of a specific file. The second one is parameterized, and we pass the file name to it.

2. We use an If Condition to compare the lastModified dates. I am using the ticks function (as it returns an integer; anything else can be used if you wish):

@greater(ticks(activity('get file details').output.lastModified),ticks(formatDateTime(variables('TakeAnyStartDate'))))


3. All the variables are self-explanatory, but I want to call out TakeAnyStartDate: I am using it to hold an arbitrary initial value to start the comparison with.
I am using the value "2000-10-27".
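
For readers who want to check the comparison logic outside Data Factory, here is a minimal Python sketch of the same idea. It is not the pipeline itself: it assumes the loop runs sequentially and that the True branch of the If updates the variable with the newer date. The helper names (parse_utc, ticks, latest_file) are made up for illustration, and the sample data is the three files from the question.

```python
from datetime import datetime, timezone

# Day zero for ticks: 0001-01-01T00:00:00Z
EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)

def parse_utc(value):
    """Parse '2020-10-22T08:21:53Z' or a bare date such as '2000-10-27'."""
    fmt = "%Y-%m-%dT%H:%M:%SZ" if "T" in value else "%Y-%m-%d"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)

def ticks(value):
    """Number of 100-nanosecond intervals since 0001-01-01 (integer)."""
    delta = parse_utc(value) - EPOCH
    return (delta.days * 86400 + delta.seconds) * 10_000_000 + delta.microseconds * 10

def latest_file(files, start_date="2000-10-27"):
    """Loop over the files and keep the one that beats the running maximum."""
    latest_date = start_date  # plays the role of TakeAnyStartDate
    latest_name = None
    for item in files:
        # Same test as the If Condition: is this file newer than the stored date?
        if ticks(item["lastModified"]) > ticks(latest_date):
            latest_date = item["lastModified"]
            latest_name = item["fileName"]
    return latest_name

files = [
    {"lastModified": "2020-10-22T08:21:53Z", "fileName": "people.csv"},
    {"lastModified": "2020-10-22T07:51:42Z", "fileName": "Product.csv"},
    {"lastModified": "2020-10-22T14:48:51Z", "fileName": "Address.csv"},
]

print(latest_file(files))  # Address.csv
```

The start date only needs to be older than every file in the folder; if no file is newer than it, the sketch returns None.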



Thanks,
Himanshu
Please do consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.




[Attached images: l1.gif, l2.gif, l3.gif]

Could someone who has already come across the same kind of requirement please help me here? Thanks!


Hi Team,

Could you please respond here? Thanks!


Thanks a lot, Himanshu. The given solution worked for me.

VaibhavChaudhari answered pankajchaturvedi-6530 commented

Hi Vaibhav,

Thanks for your response.
I followed the above solution and implemented it, but unfortunately it did not work for me.
Link: https://stackoverflow.com/questions/60558731/get-the-latest-added-file-in-a-folder-azure-data-factory

It processes all the files in the target, not only the latest file. As for the link https://stackoverflow.com/questions/64374678/only-copying-the-latest-folder-files-for-a-specific-day-with-azure-data-factory, I am unable to understand what the values of the two variables (max_date, Current_date) should be. If I provide the same values as in that example, it copies all the files to the target.

Could you please verify it once and let me know.

Thanks,
Pankaj

ashokgupta-8213 answered JohnKilleen-7378 commented

I have explained this in a detailed video. Please watch:
https://youtu.be/9YU0RiGLijY


Unfortunately this approach is not much use where there are hundreds of files in the folder. It takes up to 10 seconds to read the last modified date for each file, so it can take hours to find the latest one. There must be a simpler approach. I have tried returning the last modified date in the first Get Metadata task so I could filter by that and reduce the number of files, but that fails if there is a folder in my target folder (on an FTP site).
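
One possible way around the per-file lookups, sketched in Python rather than as a pipeline: if a single listing call already returns a modified time for every entry, the newest file can be picked in one pass, with folders skipped up front. This is only an illustration under that assumption. The Entry type and newest_file function are hypothetical, not part of Data Factory or any SDK, and the "archive" folder is made-up sample data.

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional

class Entry(NamedTuple):
    """One row of a folder listing (hypothetical shape)."""
    name: str
    last_modified: datetime
    is_directory: bool = False

def newest_file(entries: Iterable[Entry]) -> Optional[Entry]:
    """Return the most recently modified file, ignoring folders."""
    files = (e for e in entries if not e.is_directory)
    # default=None covers an empty folder or one that holds only subfolders
    return max(files, key=lambda e: e.last_modified, default=None)

def utc(*parts: int) -> datetime:
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(*parts, tzinfo=timezone.utc)

listing = [
    Entry("people.csv", utc(2020, 10, 22, 8, 21, 53)),
    Entry("Product.csv", utc(2020, 10, 22, 7, 51, 42)),
    Entry("Address.csv", utc(2020, 10, 22, 14, 48, 51)),
    Entry("archive", utc(2020, 10, 23, 0, 0, 0), is_directory=True),
]

print(newest_file(listing).name)  # Address.csv
```

The folder is newer than every file here, so filtering on is_directory before taking the maximum is what keeps it from being chosen.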
