How do I parameterize a dataset so I can pass multiple Excel files from a blob folder to Power Query one by one?

Mayank Dhami 20 Reputation points
2024-05-09T08:24:14.3866667+00:00

I am working in Data Factory and need help parameterizing a dataset so that I can process multiple files from a blob folder in Power Query. I do not want to create 100 datasets just to process 100 Excel files in the same way. I have successfully executed a pipeline as follows:

A Get Metadata activity reads all files (.xlsx) in a blob folder, then a ForEach loop iterates over the files, and inside the loop a Copy activity loads each file into an Azure SQL Database.

Now I want to see if I can do the following:

Get a list of files in a blob folder (or the full blob addresses of the files), then use a loop to pass one file or file address per iteration and process it through Power Query.

Is that even possible right now? I am really stuck on this. Basically, I want to process the data in Power Query before loading it to a sink.

Azure Data Factory

Accepted answer
  1. Amira Bedhiafi 16,856 Reputation points
    2024-05-09T20:21:08.5066667+00:00

    You need to start by configuring a Get Metadata Activity to list all files in the blob folder, selecting Child Items in the field list.
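
    As a rough sketch, the Get Metadata activity JSON could look like the following; the dataset name BlobFolderDataset is just a placeholder for a dataset pointing at your folder of .xlsx files:

    {
        "name": "GetMetadataActivity",
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {
                "referenceName": "BlobFolderDataset",
                "type": "DatasetReference"
            },
            "fieldList": [
                "childItems"
            ]
        }
    }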

    Next, use a ForEach Activity to iterate over the files, setting the Items property to @activity('GetMetadataActivity').output.childItems.
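
    A minimal ForEach sketch, assuming the Get Metadata activity above is named GetMetadataActivity (the activity name ForEachFile is a placeholder, and the inner activities array would hold the Copy Data activity shown further down):

    {
        "name": "ForEachFile",
        "type": "ForEach",
        "dependsOn": [
            {
                "activity": "GetMetadataActivity",
                "dependencyConditions": [ "Succeeded" ]
            }
        ],
        "typeProperties": {
            "items": {
                "value": "@activity('GetMetadataActivity').output.childItems",
                "type": "Expression"
            },
            "activities": []
        }
    }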

    Create a parameter named FilePath in the dataset that points to the blob storage, and set the dataset's file name to use this parameter (e.g., @dataset().FilePath).
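
    For reference, a parameterized Excel dataset could be sketched like this; the linked service name, container, folder path, and sheet name are placeholders you would replace with your own values:

    {
        "name": "ParameterizedBlobDataset",
        "properties": {
            "type": "Excel",
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "FilePath": {
                    "type": "string"
                }
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "mycontainer",
                    "folderPath": "excel-input",
                    "fileName": {
                        "value": "@dataset().FilePath",
                        "type": "Expression"
                    }
                },
                "sheetName": "Sheet1",
                "firstRowAsHeader": true
            }
        }
    }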

    Inside the ForEach activity, add a Copy Data Activity and configure its source to use the parameterized dataset, passing @item().name to the FilePath parameter.

    To apply the Power Query transformations, add a Power Query activity inside the ForEach and design the transformations in the mashup editor; Power Query runs as its own activity in Data Factory rather than as an option on the Copy Data activity's Source tab.

    Finally, configure the sink to point to your Azure SQL Database, ensuring proper column mapping. This setup lets each file be processed through Power Query and loaded into the Azure SQL Database without creating multiple datasets, streamlining the data integration process.

    Here is how your pipeline JSON might look for the Copy Data activity:

    {
        "name": "CopyDataActivity",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "ParameterizedBlobDataset",
                "type": "DatasetReference",
                "parameters": {
                    "FilePath": "@item().name"
                }
            }
        ],
        "outputs": [
            {
                "referenceName": "AzureSqlDatabaseDataset",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "ExcelSource",
                "storeSettings": {
                    "type": "AzureBlobStorageReadSettings"
                }
            },
            "sink": {
                "type": "AzureSqlSink"
            }
        }
    }
    
    
