question

AbdullaMahammadKhanDawood-5577 asked:

Azure Data Factory: copy new and modified files incrementally from a folder, selecting only files whose names start with a specific prefix, into ADLS Gen2

We have a requirement to copy new and modified files incrementally from an ADLS folder, selecting only files whose names start with a specific prefix, into ADLS Gen2.

There are two different file name prefixes:

BackendREQ_
BackendRESP_

We need to copy only the BackendRESP_ files, using an incremental approach that picks up newly created and recently modified files.

Can someone please help us write the Data Factory ingestion pipeline? Both source and sink are ADLS Gen2, and the file format is JSON (see attached sourcejsonfiles.jpg).



Thank you in anticipation!!

Regards,
Mahammad Khan

azure-data-factory
sourcejsonfiles.jpg (61.3 KiB)

1 Answer

KranthiPakala-MSFT answered:

Hi @AbdullaMahammadKhanDawood-5577,

Thanks for using Microsoft Q&A forum and posting your query.


You can achieve this by using a Storage Events Trigger in ADF. It fires the linked/associated pipeline whenever a new blob is created or an existing blob is modified.

102197-image.png

Please refer to this doc for more info: Create a trigger that runs a pipeline in response to a storage event

Demonstration video: Event based Triggers in Azure Data Factory
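As a rough sketch, a storage event trigger that fires only for the BackendRESP_ prefix could be defined along these lines. The trigger name, pipeline name, parameter names, and subscription scope below are hypothetical placeholders; the container and folder are taken from the error path you shared. Note that `Microsoft.Storage.BlobCreated` also fires when an existing blob is overwritten, which covers the "modified" case for most ADLS Gen2 workloads.

```json
{
  "name": "TriggerOnBackendRESP",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/thirdparty/blobs/AUTOREPORT/BackendRESP_",
      "blobPathEndsWith": ".json",
      "ignoreEmptyBlobs": true,
      "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
      "events": [ "Microsoft.Storage.BlobCreated" ]
    },
    "pipelines": [
      {
        "pipelineReference": { "referenceName": "CopyBackendRESP", "type": "PipelineReference" },
        "parameters": {
          "sourceFolder": "@triggerBody().folderPath",
          "sourceFile": "@triggerBody().fileName"
        }
      }
    ]
  }
}
```

The `blobPathBeginsWith` filter is what restricts the trigger to BackendRESP_ files; BackendREQ_ blobs in the same folder will not fire it.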


Hope this helps. Do let us know if you have any further queries.



Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, as this can benefit other community members.



image.png (45.8 KiB)

Hi @AbdullaMahammadKhanDawood-5577,

Just checking in to see if the above suggestion was helpful. If it answers your query, please click “Accept Answer” and/or Up-Vote, as it might be beneficial to other community members reading this thread. And if you have any further queries, do let us know.


Hi Kranthi,

I tried creating the pipeline as suggested, with a Storage Event Based Trigger assigned to a pipeline containing a copy activity. After publishing the trigger, the pipeline fires whenever a blob is created in ADLS, but it fails for the reason below.

Operation on target Copy data1 failed: ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'NotFound'. Account: 'dataplatformadlsdevqa'. FileSystem: 'thirdparty'. Path: 'thirdparty/AUTOREPORT/BackendRESP_VeensflCm4nssd3j2LSXfg=='. ErrorCode: 'PathNotFound'. Message: 'The specified path does not exist.'. RequestId: 'e8a2ff6b-901f-008d-528e-5b0c75000000'.


Hi Kranthi,

Please find enclosed the pipeline and trigger JSON files, and a jpg of the blob created in ADLS Gen2 after publishing the Storage Event Trigger.

103052-pl-storageevent.txt
102998-blobcreated.jpg

KranthiPakala-MSFT replied to AbdullaMahammadKhanDawood-5577:

Hi @AbdullaMahammadKhanDawood-5577 ,

It looks like something is wrong with the sourceFolder path value passed from your trigger. Could you please check the folder path in the parameter values shown in the Monitor hub, as below, and verify that a valid path is being passed:

103498-image.png


Workaround 1: In the trigger configuration, set Blob path ends with to .jpg and check whether the blobs are listed as expected.

Workaround 2: Instead of using the trigger parameter in your dataset folder path, try hardcoding the folder path and passing only the file name parameter from the trigger to a pipeline parameter, and from there to a dataset parameter. This should avoid the issue.
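To illustrate Workaround 2, the dataset could hardcode the container and folder and expose only a file name parameter; a sketch of such a dataset definition follows (the dataset and linked service names, and the `sourceFile` parameter, are hypothetical; the file system and folder are taken from the error path above):

```json
{
  "name": "SourceJsonDataset",
  "properties": {
    "type": "Json",
    "linkedServiceName": {
      "referenceName": "ADLSGen2LinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "sourceFile": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "thirdparty",
        "folderPath": "AUTOREPORT",
        "fileName": { "value": "@dataset().sourceFile", "type": "Expression" }
      }
    }
  }
}
```

In the trigger's pipeline parameter mapping you would then pass `@triggerBody().fileName` into a pipeline parameter, and forward that pipeline parameter into `sourceFile`, so only the file name flows through the chain while the folder path stays fixed.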

I have tested this implementation and it works fine.

103573-image.png

Since you are processing .jpg files, please use a binary dataset for both source and sink.
Do let us know how it goes.
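For reference, a binary source dataset for that scenario might look like the sketch below (names are again hypothetical placeholders; a Binary dataset copies the file bytes as-is, without parsing):

```json
{
  "name": "SourceBinaryDataset",
  "properties": {
    "type": "Binary",
    "linkedServiceName": {
      "referenceName": "ADLSGen2LinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "sourceFile": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "thirdparty",
        "folderPath": "AUTOREPORT",
        "fileName": { "value": "@dataset().sourceFile", "type": "Expression" }
      }
    }
  }
}
```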


image.png (102.7 KiB)
image.png (26.5 KiB)