PowerShell script - copy multiple tables in bulk by using Azure Data Factory

This sample PowerShell script copies data from multiple tables in an Azure SQL database to an Azure SQL data warehouse.

This sample requires the Azure PowerShell module version 5.1.1 or later. Run Get-Module -ListAvailable AzureRM to find the version. If you need to install or upgrade, see Install Azure PowerShell module.

Run Login-AzureRmAccount to create a connection with Azure.

See tutorial: bulk copy for the prerequisites for running this sample.

Sample script

Important

This script creates JSON files that define Data Factory entities (linked service, dataset, and pipeline) on your hard drive in the c:\ folder.

# Set variables with your own values
$resourceGroupName = "<resource group name>"
$dataFactoryName = "<data factory name>" # Name of the data factory must be globally unique
$dataFactoryNameLocation = "East US"

$azureSqlServer = "<azure sql server name>"
$azureSqlServerUser = "<azure sql server user>"
$azureSqlServerUserPassword = "<azure sql server password>"
$azureSqlDatabase = "<source azure sql database name>"
$azureSqlDataWarehouse = "<sink azure sql data warehouse name>"

$azureStorageAccount = "<azure storage accoutn name>"
$azureStorageAccountKey = "<azure storage account key>"

# No need to change values for these variables
$azureSqlDatabaseLinkedService = "AzureSqlDatabaseLinkedService"
$azureSqlDataWarehouseLinkedService = "AzureSqlDataWarehouseLinkedService"
$azureStorageLinkedService = "AzureStorageLinkedService"
$azureSqlDatabaseDataset = "AzureSqlDatabaseDataset"
$azureSqlDataWarehouseDataset = "AzureSqlDataWarehouseDataset"
$IterateAndCopySQLTablesPipeline = "IterateAndCopySQLTables"
$pipelineGetTableListAndTriggerCopyData = "GetTableListAndTriggerCopyData"


# create a resource gorup
New-AzureRmResourceGroup -Name $resourceGroupName -Location $dataFactoryNameLocation

# create a data factory
$df = Set-AzureRmDataFactoryV2 -ResourceGroupName $resourceGroupName -Location $dataFactoryNameLocation -Name $dataFactoryName

# create a linked service for Azure SQL Database (source)
$azureSQLDatabaseLinkedServiceDefinition = @"
{
    "name": "$azureSqlDatabaseLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:$azureSqlServer.database.windows.net,1433;Database=$azureSqlDatabase;User ID=$azureSqlServerUser;Password=$azureSqlServerUserPassword;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
            }
        }
    }
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzureRmDataFactoryV2LinkedService command. 
$azureSQLDatabaseLinkedServiceDefinition | Out-File c:\$azureSqlDatabaseLinkedService.json

## Creates an Azure Storage linked service
Set-AzureRmDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDatabaseLinkedService -File c:\$azureSqlDatabaseLinkedService.json


# create a linked service for Azure SQL Datawarehouse (sink)
$azureSQLDataWarehouseLinkedServiceDefinition = @"
{
    "name": "$azureSqlDataWarehouseLinkedService",
    "properties": {
        "type": "AzureSqlDW",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:$azureSqlServer.database.windows.net,1433;Database=$azureSqlDataWarehouse;User ID=$azureSqlServerUser;Password=$azureSqlServerUserPassword;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
            }
        }
    }
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzureRmDataFactoryV2LinkedService command. 
$azureSQLDataWarehouseLinkedServiceDefinition | Out-File c:\$azureSqlDataWarehouseLinkedService.json

## Creates an linked service for Azure Storage Account. Interim storage to enable PolyBase
Set-AzureRmDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDataWarehouseLinkedService -File c:\$azureSqlDataWarehouseLinkedService.json

$storageLinkedServiceDefinition = @"
{
    "name": "$azureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": {
                "value": "DefaultEndpointsProtocol=https;AccountName=$azureStorageAccount;AccountKey=$azureStorageAccountKey",
                "type": "SecureString"
            }
        }
    }
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzureRmDataFactoryV2LinkedService command. 
$storageLinkedServiceDefinition | Out-File c:\$azureStorageLinkedService.json

## Creates an Azure Storage linked service
Set-AzureRmDataFactoryV2LinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureStorageLinkedService -File c:\$azureStorageLinkedService.json


# create the input dataset (Azure SQL Database)
$azureSqlDatabaseDatasetDefiniton = @"
{
    "name": "$azureSqlDatabaseDataset",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "$azureSqlDatabaseLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "dummy"
        }
    }
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzureRmDataFactoryV2Dataset command. 
$azureSqlDatabaseDatasetDefiniton | Out-File c:\$azureSqlDatabaseDataset.json

## Create a dataset in the data factory
Set-AzureRmDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDatabaseDataset -File "c:\$azureSqlDatabaseDataset.json"


# create the output dataset (Azure SQL Data Warehouse)
$azureSqlDataWarehouseDatasetDefiniton = @"
{
    "name": "$azureSqlDataWarehouseDataset",
    "properties": {
        "type": "AzureSqlDWTable",
        "linkedServiceName": {
            "referenceName": "$azureSqlDataWarehouseLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": {
                "value": "@{dataset().DWTableName}",
                "type": "Expression"
            }
        },
        "parameters":{
            "DWTableName":{
                "type":"String"
            }
        }
    }
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzureRmDataFactoryV2Dataset command. 
$azureSqlDataWarehouseDatasetDefiniton | Out-File c:\$azureSqlDataWarehouseDataset.json

## Create a dataset in the data factory
Set-AzureRmDataFactoryV2Dataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDataWarehouseDataset -File "c:\$azureSqlDataWarehouseDataset.json"

# Create a pipeline in the data factory that copies data from source SQL Database to sink SQL Data Warehouse
$pipelineDefinition = @"
{
    "name": "$IterateAndCopySQLTablesPipeline",
    "properties": {
        "activities": [
            {
                "name": "IterateSQLTables",
                "type": "ForEach",
                "typeProperties": {
                    "isSequential": "false",
                    "items": {
                        "value": "@pipeline().parameters.tableList",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "CopyData",
                            "description": "Copy data from SQL database to SQL DW",
                            "type": "Copy",
                            "inputs": [
                                {
                                    "referenceName": "$azureSqlDatabaseDataset",
                                    "type": "DatasetReference"
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "$azureSqlDataWarehouseDataset",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "DWTableName": "[@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
                                    }
                                }
                            ],
                            "typeProperties": {
                                "source": {
                                    "type": "SqlSource",
                                    "sqlReaderQuery": "SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
                                },
                                "sink": {
                                    "type": "SqlDWSink",
                                    "preCopyScript": "TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]",
                                    "allowPolyBase": true
                                },
                                "enableStaging": true,
                                "stagingSettings": {
                                    "linkedServiceName": {
                                        "referenceName": "$azureStorageLinkedService",
                                        "type": "LinkedServiceReference"
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "parameters": {
            "tableList": {
                "type": "Object"
            }
        }
    }
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzureRmDataFactoryV2Pipeline command. 
$pipelineDefinition | Out-File c:\$IterateAndCopySQLTablesPipeline.json

## Create a pipeline in the data factory
Set-AzureRmDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $IterateAndCopySQLTablesPipeline -File "c:\$IterateAndCopySQLTablesPipeline.json"


# Create a pipeline in the data factory that retrieves a list of tables and invokes the above pipeline for each table to be copied
$pipeline2Definition = @"
{
    "name":"$pipelineGetTableListAndTriggerCopyData",
    "properties":{
        "activities":[
            { 
                "name": "LookupTableList",
                "description": "Retrieve the table list from Azure SQL dataabse",
                "type": "Lookup",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource",
                        "sqlReaderQuery": "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = 'BASE TABLE' and TABLE_SCHEMA = 'SalesLT' and TABLE_NAME <> 'ProductModel'"
                    },
                    "dataset": {
                        "referenceName": "$azureSqlDatabaseDataset",
                        "type": "DatasetReference"
                    },
                    "firstRowOnly": false
                }
            },
            {
                "name": "TriggerCopy",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "parameters": {
                        "tableList": {
                            "value": "@activity('LookupTableList').output.value",
                            "type": "Expression"
                        }
                    },
                    "pipeline": {
                        "referenceName": "$IterateAndCopySQLTablesPipeline",
                        "type": "PipelineReference"
                    },
                    "waitOnCompletion": true
                },
                "dependsOn": [
                    {
                        "activity": "LookupTableList",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ]
            }
        ]
    }
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzureRmDataFactoryV2Pipeline command. 
$pipeline2Definition | Out-File c:\$pipelineGetTableListAndTriggerCopyData.json

## Create a pipeline in the data factory
Set-AzureRmDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $pipelineGetTableListAndTriggerCopyData -File "c:\$pipelineGetTableListAndTriggerCopyData.json"



# Create a pipeline run 

## JSON definition for dummy pipeline parameters
$pipelineParameters = @"
{
    "dummy":  "b"
}
"@

## IMPORTANT: store the JSON definition in a file that will be used by the Invoke-AzureRmDataFactoryV2Pipeline command. 
$pipelineParameters | Out-File c:\PipelineParameters.json

# Create a pipeline run by using parameters
$runId = Invoke-AzureRmDataFactoryV2Pipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineName $pipelineGetTableListAndTriggerCopyData -ParameterFile c:\PipelineParameters.json

# Check the pipeline run status until it finishes the copy operation
Start-Sleep -Seconds 30
while ($True) {
    $result = Get-AzureRmDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineRunId $runId -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)

    if (($result | Where-Object { $_.Status -eq "InProgress" } | Measure-Object).count -ne 0) {
        Write-Host "Pipeline run status: In Progress" -foregroundcolor "Yellow"
        Start-Sleep -Seconds 30
    }
    else {
        Write-Host "Pipeline '$pipelineGetTableListAndTriggerCopyData' run finished. Result:" -foregroundcolor "Yellow"
        $result
        break
    }
}

# Get the activity run details 
    $result = Get-AzureRmDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName `
        -PipelineRunId $runId `
        -RunStartedAfter (Get-Date).AddMinutes(-10) `
        -RunStartedBefore (Get-Date).AddMinutes(10) `
        -ErrorAction Stop

    $result

    if ($result.Status -eq "Succeeded") {`
        $result.Output -join "`r`n"`
    }`
    else {`
        $result.Error -join "`r`n"`
    }

# To remove the data factory from the resource gorup
# Remove-AzureRmDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName
# 
# To remove the whole resource group
# Remove-AzureRmResourceGroup  -Name $resourceGroupName

Clean up deployment

After you run the sample script, you can use the following command to remove the resource group and all resources associated with it:

Remove-AzureRmResourceGroup -ResourceGroupName $resourceGroupName

To remove the data factory from the resource group, run the following command:

Remove-AzureRmDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName

Script explanation

This script uses the following commands:

Command Notes
New-AzureRmResourceGroup Creates a resource group in which all resources are stored.
Set-AzureRmDataFactoryV2 Create a data factory.
Set-AzureRmDataFactoryV2LinkedService Creates a linked service in the data factory. A linked service links a data store or compute to a data factory.
Set-AzureRmDataFactoryV2Dataset Creates a dataset in the data factory. A dataset represents input/output for an activity in a pipeline.
Set-AzureRmDataFactoryV2Pipeline Creates a pipeline in the data factory. A pipeline contains one or more activities that performs a certain operation. In this pipeline, a copy activity copies data from one location to another location in an Azure Blob Storage.
Invoke-AzureRmDataFactoryV2Pipeline Creates a run for the pipeline. In other words, runs the pipeline.
Get-AzureRmDataFactoryV2ActivityRun Gets details about the run of the activity (activity run) in the pipeline.
Remove-AzureRmResourceGroup Deletes a resource group including all nested resources.

Next steps

For more information on the Azure PowerShell, see Azure PowerShell documentation.

Additional Azure Data Factory PowerShell script samples can be found in the Azure Data Factory PowerShell scripts.