How to maintain the same folder structure as the source when sinking processed files

Venkatesh Srinivasan 0 Reputation points
2024-04-26T08:36:30.47+00:00

I have a requirement to convert JSON files to Parquet on a daily basis.

I have three folders, A, B, and C, and the processed files need to be sunk to another container with the same A/B/C structure. For example, a file processed from folder A should sink to folder A of the output container, and likewise for B and C. I need to achieve this in a mapping data flow.
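For example (container and file names are illustrative), a file read from folder A of the input container should land in folder A of the output container:

```
input container                     output container
├── A/daily.json         ──►        ├── A/daily.parquet
├── B/daily.json         ──►        ├── B/daily.parquet
└── C/daily.json         ──►        └── C/daily.parquet
```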

[screenshot attached]

Tags: Azure Data Lake Storage, Azure Synapse Analytics, Azure Data Factory

1 answer

  1. Smaran Thoomu 9,845 Reputation points Microsoft Vendor
    2024-04-30T11:36:19.9033333+00:00

    @Venkatesh Srinivasan Thank you for sharing the images. Based on the images, it seems that you are using a wildcard path to read the files from the source container. To pass the folder path as a parameter to the Mapping Data Flow, you can create a pipeline parameter and use it in the source dataset's wildcard path.

    Here are the steps to achieve this (consolidated sketches of both sides follow the list):

    1. Create a pipeline parameter in your pipeline and name it "FolderPath" (or any other name you prefer).
    2. In the source dataset, replace the folder path with the pipeline parameter using the following expression: @concat('containername/folderpath/', pipeline().parameters.FolderPath, '/*.json')
    3. In the Mapping Data Flow, add a "Derived Column" transformation to your data flow.
    4. In the "Derived Column" transformation, add a new column with the folder path using the following expression: substring(input_file_path(), 1, lastIndexOf(input_file_path(), '/'))
    5. Connect the "Derived Column" transformation to the "Sink" transformation.
    6. In the "Sink" transformation, select the output container and set the file path to the new column you created in the "Derived Column" transformation.

    By doing this, you can pass the folder path as a parameter to the pipeline and use it in the source dataset's wildcard path. The Mapping Data Flow will then use the folder path to create the same folder structure in the output container.

    For more information, you can refer to the official documentation: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expression-functions#parameters

    I hope this helps! Please let me know if you have any further questions.
