Question asked by TalZadok-4082 (edited by AbhayVerma-2987)

ADF: copy last modified blob

Hi,
I'd like to copy the last modified blob from an Azure container using the Copy activity (the sink is Azure Data Explorer, but that isn't relevant to the question :)).
Note: more than one blob (N > 1) may have been added since the last pipeline run, but I'm only interested in the most recently modified one.
How can I achieve this?
I was thinking about one of the two directions below:
1 - Is it possible to configure the Copy activity to retrieve only the last modified blob in the "Source" linked service?
2 - If I use a "Get Metadata" activity that outputs the blob names and modification dates, how can I configure a Filter activity to filter by modification date and output the blob name?

Other suggestions are welcome.


Thanks,
Tal

azure-data-factory

1 Answer

KranthiPakala-MSFT answered (edited by AbhayVerma-2987)

Hi @TalZadok-4082,

Thanks for reaching out. You can use a Get Metadata activity to get the list of files, then loop through each file to get its name and last modified date, and store those values with two Set Variable activities: one for the file's modified date and one for the file name, which the actual Copy activity then uses to process the file.

To do this, first declare two variables. The first, varReferenceDateTime, gets the default value 1900-01-01 00:00:00; each file's modified date is compared against it, and if the file's date is greater, the file's modified date is assigned to this variable. The second, varLatestFileName, starts empty; when the date check passes inside the If Condition activity, the current file name is assigned to it. The ForEach activity iterates through all the files, and after the last iteration completes, the two variables hold the latest modified date and the corresponding file name, which the Copy activity outside the ForEach then uses.
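The variable-tracking logic is essentially a running maximum. The sketch below simulates it in Python, assuming timezone-aware UTC datetimes stand in for the timestamp strings ADF actually passes around; `ticks()` here is a local re-implementation of the .NET-style tick count (100-nanosecond intervals since 0001-01-01) used by ADF's `ticks()` function, and the file names and dates are hypothetical sample data.

```python
from datetime import datetime, timezone

# Local sketch of ADF's ticks(): 100-nanosecond intervals since
# 0001-01-01 00:00:00 UTC. Assumes timezone-aware UTC datetimes.
_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)


def ticks(ts: datetime) -> int:
    delta = ts - _EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 10_000_000 + delta.microseconds * 10  # 1 microsecond = 10 ticks


def latest_file(files):
    """Mimic the sequential ForEach + If Condition + two Set Variable activities.

    files: (item_name, last_modified) pairs, i.e. what the inner Get Metadata
    activity would return for each child item (hypothetical input).
    """
    var_reference_datetime = datetime(1900, 1, 1, tzinfo=timezone.utc)  # varReferenceDateTime default
    var_latest_file_name = ""  # varLatestFileName starts empty
    for item_name, last_modified in files:  # ForEach, one file at a time
        if ticks(last_modified) > ticks(var_reference_datetime):  # If Condition
            var_reference_datetime = last_modified  # setFileLastModifiedDate
            var_latest_file_name = item_name  # setLatestFileName
    return var_latest_file_name, var_reference_datetime


# Hypothetical sample: three files listed in arbitrary order
files = [
    ("sales_01.csv", datetime(2021, 4, 6, 13, 0, tzinfo=timezone.utc)),
    ("sales_03.csv", datetime(2021, 4, 6, 15, 30, tzinfo=timezone.utc)),
    ("sales_02.csv", datetime(2021, 4, 6, 14, 0, tzinfo=timezone.utc)),
]
print(latest_file(files)[0])  # sales_03.csv
```

Because the check is strictly greater-than, the first file seen wins a tie on timestamps. The result also relies on each iteration seeing the previous iteration's variable values, which is why the loop has to run sequentially.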

86146-image.png

  1. Declare variables -> varReferenceDateTime = 1900-01-01 00:00:00 and varLatestFileName (empty)

  2. getListOfFileNames -> Get Metadata activity with the Child items argument, which returns the list of file names

  3. loopThroughAllTheFiles -> ForEach activity to loop through each file -> items = @activity('getListOfFileNames').output.childItems. Make sure the Sequential box is checked, because every iteration reads and updates the same pipeline variables.

  4. Inside the ForEach -> getLastModifiedDateOfTheCurrentIterationFile -> Get Metadata activity that returns the current file's name and modified date (we use the Item name and Last modified arguments)

  5. conditionToCheckIfFileDateGreaterThanSetDate -> If Condition activity that checks whether the file's modified date is greater than varReferenceDateTime. Here is the condition: @greater(ticks(activity('getLastModifiedDateOfTheCurrentIterationFile').output.lastModified),ticks(formatDateTime(variables('varReferenceDateTime'))))

  6. If the condition is true -> setFileLastModifiedDate -> Set Variable activity that stores the current file's last modified value -> varReferenceDateTime = @activity('getLastModifiedDateOfTheCurrentIterationFile').output.lastModified

  7. Next, another Set Variable activity stores the current file name -> setLatestFileName -> varLatestFileName = @activity('getLastModifiedDateOfTheCurrentIterationFile').output.itemName

  8. Once all ForEach iterations have completed, the two variables hold the latest file name and its last modified date.

  9. Then, outside the ForEach, add a subsequent Copy activity -> copyLatestFileToDestination. In its source settings, pass the varLatestFileName value to the dataset's file name field.
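For reference, the nine steps above roughly correspond to a pipeline definition like the one below, built as a Python dict and printed as JSON. This is a hedged sketch of the pipeline JSON, not an export of the pipeline in the GIF: the pipeline name `plCopyLatestFile` and the datasets `ds_folder` (the folder being listed) and `ds_inputFile` (a dataset with a `ds_inputFileName` parameter) are hypothetical, and the Copy activity's source/sink settings are omitted.

```python
import json


def expr(value: str) -> dict:
    # ADF wraps dynamic content in an Expression object
    return {"value": value, "type": "Expression"}


def after(activity: str) -> list:
    # Run only once the named activity has succeeded
    return [{"activity": activity, "dependencyConditions": ["Succeeded"]}]


def set_var(name: str, variable: str, value: str, depends_on=None) -> dict:
    act = {"name": name, "type": "SetVariable",
           "typeProperties": {"variableName": variable, "value": expr(value)}}
    if depends_on:
        act["dependsOn"] = after(depends_on)
    return act


def file_ref(file_name_expr: str) -> dict:
    # Reference to the hypothetical parameterized dataset ds_inputFile
    return {"referenceName": "ds_inputFile", "type": "DatasetReference",
            "parameters": {"ds_inputFileName": expr(file_name_expr)}}


META = "getLastModifiedDateOfTheCurrentIterationFile"

inner_activities = [
    {"name": META, "type": "GetMetadata",
     "typeProperties": {"dataset": file_ref("@item().name"),
                        "fieldList": ["itemName", "lastModified"]}},
    {"name": "conditionToCheckIfFileDateGreaterThanSetDate", "type": "IfCondition",
     "dependsOn": after(META),
     "typeProperties": {
         "expression": expr(
             f"@greater(ticks(activity('{META}').output.lastModified),"
             "ticks(formatDateTime(variables('varReferenceDateTime'))))"),
         "ifTrueActivities": [
             set_var("setFileLastModifiedDate", "varReferenceDateTime",
                     f"@activity('{META}').output.lastModified"),
             set_var("setLatestFileName", "varLatestFileName",
                     f"@activity('{META}').output.itemName",
                     depends_on="setFileLastModifiedDate"),
         ]}},
]

pipeline = {
    "name": "plCopyLatestFile",  # hypothetical name
    "properties": {
        "variables": {
            "varReferenceDateTime": {"type": "String", "defaultValue": "1900-01-01 00:00:00"},
            "varLatestFileName": {"type": "String"},
        },
        "activities": [
            {"name": "getListOfFileNames", "type": "GetMetadata",
             "typeProperties": {
                 "dataset": {"referenceName": "ds_folder", "type": "DatasetReference"},  # hypothetical
                 "fieldList": ["childItems"]}},
            {"name": "loopThroughAllTheFiles", "type": "ForEach",
             "dependsOn": after("getListOfFileNames"),
             "typeProperties": {
                 "isSequential": True,  # the "Sequential" checkbox
                 "items": expr("@activity('getListOfFileNames').output.childItems"),
                 "activities": inner_activities}},
            {"name": "copyLatestFileToDestination", "type": "Copy",
             "dependsOn": after("loopThroughAllTheFiles"),
             # source/sink typeProperties omitted in this sketch
             "inputs": [file_ref("@variables('varLatestFileName')")]},
        ],
    },
}

print(json.dumps(pipeline, indent=2))
```

The helpers only cut down on repetition; in the authoring UI the same structure is produced by dragging the activities onto the canvas and filling in the fields listed in the steps.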

Here is the demonstration GIF:

86149-getlastmodifiedfile.gif

Hope this helps. Do let us know if you have any questions.



Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you; this can benefit other community members.





@KranthiPakala-MSFT can you please share how to define the dataset inside the second Get Metadata activity (the one within the ForEach)? It should point to the current file in the iteration; I'm just not sure how to do that.

Thanks


Hi @TalZadok-4082,

Thanks for getting back.

  1. Create a new dataset with a parameter to pass in the file name (I created ds_inputFileName).
    87062-image.png

  2. Then use that parameter in the dataset -> Connection -> File path settings -> File name, as shown below.
    87034-image.png

  3. In the Get Metadata activity, under Dataset settings, map the '@item().name' value (the file name of the current iteration) to the ds_inputFileName parameter.
    87014-image.png
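In JSON terms, the three steps amount to roughly the following. This is a sketch assuming a delimited-text (CSV) dataset on Azure Blob Storage; only the `ds_inputFileName` parameter and the `@item().name` mapping come from the steps above, while `ds_inputFile`, `ls_blobStorage`, `mycontainer`, and `input` are hypothetical placeholders.

```python
import json

# Step 1: a dataset that declares a ds_inputFileName parameter.
# Names other than ds_inputFileName are hypothetical placeholders.
dataset = {
    "name": "ds_inputFile",
    "properties": {
        "linkedServiceName": {"referenceName": "ls_blobStorage", "type": "LinkedServiceReference"},
        "parameters": {"ds_inputFileName": {"type": "string"}},
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "mycontainer",
                "folderPath": "input",
                # Step 2: Connection -> File path -> File name reads the parameter
                "fileName": {"value": "@dataset().ds_inputFileName", "type": "Expression"},
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

# Step 3: the Get Metadata activity inside the ForEach maps the current
# item's name to the dataset parameter.
get_metadata_dataset_ref = {
    "referenceName": "ds_inputFile",
    "type": "DatasetReference",
    "parameters": {"ds_inputFileName": {"value": "@item().name", "type": "Expression"}},
}

print(json.dumps(get_metadata_dataset_ref, indent=2))
```

The parameter name must match on both sides: the key under the dataset's `parameters` and the key the activity passes in its dataset reference.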


Please see below GIF:
87053-germetadataparameterfilename.gif

Hope this clarifies.






@KranthiPakala-MSFT
I followed your instructions and was able to copy the latest file in the storage account. Thanks!

I have two questions.

Question 1: How can we copy the last batch (group of files) dropped into the container?
1.a: For example, a user drops 10 CSV files at 1 PM EST. How can we copy all of them?
1.b: Another user then drops 10 new CSV files at 3 PM EST. How can we copy only the 10 files dropped at 3 PM EST and skip the ones the previous user uploaded to the source at 1 PM EST?

Question 2:
How can we automate this process?
Can we use Add trigger -> Trigger now -> Blob Created?

