How to maintain the same folder structure as the source when sinking processed files

Venkatesh Srinivasan 0 Reputation points
2024-04-26T08:36:30.47+00:00

I have a requirement to convert JSON files to Parquet on a daily basis.

I have three folders, A, B, and C, and the processed files need to be sunk to another container with the same A/B/C structure. For example, a file processed from folder A should sink to folder A of the output container, and likewise for B and C. I need to achieve this in a mapping data flow.
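For example (container and file names are illustrative), a file read from folder A of the input container should land in folder A of the output container:

```
input container                     output container
├── A/daily.json         ──►        ├── A/daily.parquet
├── B/daily.json         ──►        ├── B/daily.parquet
└── C/daily.json         ──►        └── C/daily.parquet
```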

[screenshot attached]

Tags: Azure Data Lake Storage, Azure Synapse Analytics, Azure Data Factory

1 answer

  1. Smaran Thoomu 9,845 Reputation points Microsoft Vendor
    2024-04-30T11:36:19.9033333+00:00

    @Venkatesh Srinivasan Thank you for sharing the images. Based on the images, it seems that you are using a wildcard path to read the files from the source container. To pass the folder path as a parameter to the Mapping Data Flow, you can create a pipeline parameter and use it in the source dataset's wildcard path.

    Here are the steps to achieve this (consolidated sketches of both sides follow the list):

    1. Create a pipeline parameter in your pipeline and name it "FolderPath" (or any other name you prefer).
    2. In the source dataset, replace the folder path with the pipeline parameter using the following expression: @concat('containername/folderpath/', pipeline().parameters.FolderPath, '/*.json')
    3. In the Mapping Data Flow, add a "Derived Column" transformation to your data flow.
    4. In the "Derived Column" transformation, add a new column with the folder path using the following expression: substring(input_file_path(), 1, lastIndexOf(input_file_path(), '/'))
    5. Connect the "Derived Column" transformation to the "Sink" transformation.
    6. In the "Sink" transformation, select the output container and set the file path to the new column you created in the "Derived Column" transformation.

    By doing this, you can pass the folder path as a parameter to the pipeline and use it in the source dataset's wildcard path. The Mapping Data Flow will then use the folder path to create the same folder structure in the output container.

    For more information, you can refer to the official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expression-functions#parameters

    I hope this helps! Please let me know if you have any further questions.
