I think this question may have come up before, but I have not been able to locate a clear answer that properly addresses the situation. Given how common this operation must be, I can't help but feel there must be a good way to do it without layers of complexity.
Scenario:
I have a storage account container into which an application writes data files at varying rates. When the application is in demand, it may push a data file into the container once a minute; when it is not, files may arrive only once every couple of minutes. In either case the file metadata is the source of truth: the latest file, the one I need to process, has the most recent Last Modified timestamp.
The patterns I have seen for getting the latest file all use a Get Metadata activity to return the container's childItems, linked to a ForEach that runs a second Get Metadata on each file in turn, at which point the lastModified property becomes available. It is also possible to narrow the results using the 'Filter by last modified' Start time and End time settings.
This pattern does not seem to work for me, because if the application is busy and I use that filter to return files from, e.g., the last two minutes, I may well get back two or even three files. Yet I need a window of a couple of minutes, because if the application has not been busy the window must still catch the last single file (which might have arrived two minutes ago).
I had thought the Filter activity would provide a sort function, so that I could at least sort the files returned from the two-minute window and then select the latest one, but that does not seem to be possible.
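To make concrete what I am missing, here is a minimal sketch (plain Python, with made-up file names and timestamps) of the two steps I want: the time window that 'Filter by last modified' already gives me, followed by the sort/select step that I cannot find in the Filter activity:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical metadata, as the Get Metadata + ForEach pattern would
# collect it: (file name, last-modified timestamp) pairs.
files = [
    ("data_0001.json", datetime(2024, 1, 1, 11, 58, tzinfo=timezone.utc)),
    ("data_0002.json", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)),
    ("data_0003.json", datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc)),
]

now = datetime(2024, 1, 1, 12, 2, tzinfo=timezone.utc)  # fixed "now" for the example
window_start = now - timedelta(minutes=2)

# Step 1: the two-minute window (what the last-modified filter gives me).
recent = [f for f in files if f[1] >= window_start]

# Step 2: the missing step: keep only the single newest file.
latest_name, _ = max(recent, key=lambda f: f[1])
print(latest_name)  # data_0003.json
```

Step 1 can return several files when the application is busy; step 2 is the part I cannot express with the activities I have found so far.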
Fundamentally, all I am looking for is a solid pattern for getting the single newest file in that container - surely that is a common use case?
At the moment, I am thinking the pipeline may need to call a Function App that does the work of locating the file I need and renaming it; then all Data Factory would need is a dataset pointing at that fixed filename. But I can't help feeling that something this simple should be doable in Data Factory itself, without having to build a Function App just for that.
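If I do end up going the Function App route, I imagine its core would be something like the sketch below. The `newest_blob` helper and `_Blob` stub are hypothetical names of my own; in a real Function App the `blobs` iterable would come from something like `ContainerClient.list_blobs()` in the azure-storage-blob SDK, whose items carry `name` and `last_modified` properties:

```python
from datetime import datetime, timezone

def newest_blob(blobs):
    """Return the name of the blob with the most recent last_modified.

    `blobs` is any iterable of objects exposing .name and .last_modified,
    such as the BlobProperties items yielded by ContainerClient.list_blobs()
    (SDK call not shown here). Returns None for an empty container.
    """
    latest = max(blobs, key=lambda b: b.last_modified, default=None)
    return latest.name if latest else None

# Stub standing in for real blob properties, for illustration only.
class _Blob:
    def __init__(self, name, last_modified):
        self.name = name
        self.last_modified = last_modified

blobs = [
    _Blob("a.csv", datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)),
    _Blob("b.csv", datetime(2024, 5, 1, 9, 4, tzinfo=timezone.utc)),
]
print(newest_blob(blobs))  # b.csv
```

The function could then rename (or return the name of) that blob so the pipeline's dataset always points at a known path, but that is exactly the extra moving part I was hoping to avoid.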
I would be very grateful for any suggestions on how this can be accomplished.