Skrip PowerShell - menyalin beberapa tabel secara massal dengan menggunakan Microsoft Azure Data Factory
Contoh skrip PowerShell ini menyalin data dari beberapa tabel dalam Microsoft Azure SQL Database ke Microsoft Azure Synapse Analytics.
Catatan
Artikel ini menggunakan modul Azure Az PowerShell, yang merupakan modul PowerShell yang direkomendasikan untuk berinteraksi dengan Azure. Untuk mulai menggunakan modul Az PowerShell, lihat Menginstal Azure PowerShell. Untuk mempelajari cara bermigrasi ke modul Az PowerShell, lihat Memigrasikan Azure PowerShell dari AzureRM ke Az.
Sampel ini memerlukan Azure PowerShell. Jalankan Get-Module -ListAvailable Az untuk menemukan versinya.
Jika Anda perlu menginstal atau meningkatkan, lihat Menginstal modul Azure PowerShell.
Jalankan cmdlet Connect-AzAccount untuk tersambung ke Azure.
Lihat tutorial: penyalinan bertahap untuk prasyarat menjalankan contoh ini.
Sampel skrip
Penting
Skrip ini membuat file JSON yang menentukan entitas Azure Data Factory (layanan tertaut, himpunan data, dan alur) pada hard drive Anda di folder c:\.
# Set variables with your own values
$resourceGroupName = "<resource group name>"
$dataFactoryName = "<data factory name>" # Name of the data factory must be globally unique
$dataFactoryNameLocation = "East US"
$azureSqlServer = "<azure sql server name>"
$azureSqlServerUser = "<azure sql server user>"
$azureSqlServerUserPassword = "<azure sql server password>"
$azureSqlDatabase = "<source azure sql database name>"
$azureSqlDataWarehouse = "<sink azure sql data warehouse name>"
$azureStorageAccount = "<Az.Storage accoutn name>"
$azureStorageAccountKey = "<Az.Storage account key>"
# No need to change values for these variables
$azureSqlDatabaseLinkedService = "AzureSqlDatabaseLinkedService"
$azureSqlDataWarehouseLinkedService = "AzureSqlDataWarehouseLinkedService"
$azureStorageLinkedService = "AzureStorageLinkedService"
$azureSqlDatabaseDataset = "AzureSqlDatabaseDataset"
$azureSqlDataWarehouseDataset = "AzureSqlDataWarehouseDataset"
$IterateAndCopySQLTablesPipeline = "IterateAndCopySQLTables"
$pipelineGetTableListAndTriggerCopyData = "GetTableListAndTriggerCopyData"
# create a resource gorup
New-AzResourceGroup -Name $resourceGroupName -Location $dataFactoryNameLocation
# create a data factory
$df = Set-AzDataFactory -ResourceGroupName $resourceGroupName -Location $dataFactoryNameLocation -Name $dataFactoryName
# create a linked service for Azure SQL Database (source)
$azureSQLDatabaseLinkedServiceDefinition = @"
{
"name": "$azureSqlDatabaseLinkedService",
"properties": {
"type": "AzureSqlDatabase",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "Server=tcp:$azureSqlServer.database.windows.net,1433;Database=$azureSqlDatabase;User ID=$azureSqlServerUser;Password=$azureSqlServerUserPassword;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryLinkedService command.
$azureSQLDatabaseLinkedServiceDefinition | Out-File c:\$azureSqlDatabaseLinkedService.json
## Creates an Az.Storage linked service
Set-AzDataFactoryLinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDatabaseLinkedService -File c:\$azureSqlDatabaseLinkedService.json
# create a linked service for Azure SQL Datawarehouse (sink)
$azureSQLDataWarehouseLinkedServiceDefinition = @"
{
"name": "$azureSqlDataWarehouseLinkedService",
"properties": {
"type": "AzureSqlDW",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "Server=tcp:$azureSqlServer.database.windows.net,1433;Database=$azureSqlDataWarehouse;User ID=$azureSqlServerUser;Password=$azureSqlServerUserPassword;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryLinkedService command.
$azureSQLDataWarehouseLinkedServiceDefinition | Out-File c:\$azureSqlDataWarehouseLinkedService.json
## Creates an linked service for Az.Storage Account. Interim storage to enable PolyBase
Set-AzDataFactoryLinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDataWarehouseLinkedService -File c:\$azureSqlDataWarehouseLinkedService.json
$storageLinkedServiceDefinition = @"
{
"name": "$azureStorageLinkedService",
"properties": {
"type": "AzureStorage",
"typeProperties": {
"connectionString": {
"value": "DefaultEndpointsProtocol=https;AccountName=$azureStorageAccount;AccountKey=$azureStorageAccountKey",
"type": "SecureString"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryLinkedService command.
$storageLinkedServiceDefinition | Out-File c:\$azureStorageLinkedService.json
## Creates an Az.Storage linked service
Set-AzDataFactoryLinkedService -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureStorageLinkedService -File c:\$azureStorageLinkedService.json
# create the input dataset (Azure SQL Database)
$azureSqlDatabaseDatasetDefiniton = @"
{
"name": "$azureSqlDatabaseDataset",
"properties": {
"type": "AzureSqlTable",
"linkedServiceName": {
"referenceName": "$azureSqlDatabaseLinkedService",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": "dummy"
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryDataset command.
$azureSqlDatabaseDatasetDefiniton | Out-File c:\$azureSqlDatabaseDataset.json
## Create a dataset in the data factory
Set-AzDataFactoryDataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDatabaseDataset -File "c:\$azureSqlDatabaseDataset.json"
# create the output dataset (Azure SQL Data Warehouse)
$azureSqlDataWarehouseDatasetDefiniton = @"
{
"name": "$azureSqlDataWarehouseDataset",
"properties": {
"type": "AzureSqlDWTable",
"linkedServiceName": {
"referenceName": "$azureSqlDataWarehouseLinkedService",
"type": "LinkedServiceReference"
},
"typeProperties": {
"tableName": {
"value": "@{dataset().DWTableName}",
"type": "Expression"
}
},
"parameters":{
"DWTableName":{
"type":"String"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryDataset command.
$azureSqlDataWarehouseDatasetDefiniton | Out-File c:\$azureSqlDataWarehouseDataset.json
## Create a dataset in the data factory
Set-AzDataFactoryDataset -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $azureSqlDataWarehouseDataset -File "c:\$azureSqlDataWarehouseDataset.json"
# Create a pipeline in the data factory that copies data from source SQL Database to sink SQL Data Warehouse
$pipelineDefinition = @"
{
"name": "$IterateAndCopySQLTablesPipeline",
"properties": {
"activities": [
{
"name": "IterateSQLTables",
"type": "ForEach",
"typeProperties": {
"isSequential": "false",
"items": {
"value": "@pipeline().parameters.tableList",
"type": "Expression"
},
"activities": [
{
"name": "CopyData",
"description": "Copy data from SQL database to SQL DW",
"type": "Copy",
"inputs": [
{
"referenceName": "$azureSqlDatabaseDataset",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "$azureSqlDataWarehouseDataset",
"type": "DatasetReference",
"parameters": {
"DWTableName": "[@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
}
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]"
},
"sink": {
"type": "SqlDWSink",
"preCopyScript": "TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]",
"allowPolyBase": true
},
"enableStaging": true,
"stagingSettings": {
"linkedServiceName": {
"referenceName": "$azureStorageLinkedService",
"type": "LinkedServiceReference"
}
}
}
}
]
}
}
],
"parameters": {
"tableList": {
"type": "Object"
}
}
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryPipeline command.
$pipelineDefinition | Out-File c:\$IterateAndCopySQLTablesPipeline.json
## Create a pipeline in the data factory
Set-AzDataFactoryPipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $IterateAndCopySQLTablesPipeline -File "c:\$IterateAndCopySQLTablesPipeline.json"
# Create a pipeline in the data factory that retrieves a list of tables and invokes the above pipeline for each table to be copied
$pipeline2Definition = @"
{
"name":"$pipelineGetTableListAndTriggerCopyData",
"properties":{
"activities":[
{
"name": "LookupTableList",
"description": "Retrieve the table list from Azure SQL dataabse",
"type": "Lookup",
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = 'BASE TABLE' and TABLE_SCHEMA = 'SalesLT' and TABLE_NAME <> 'ProductModel'"
},
"dataset": {
"referenceName": "$azureSqlDatabaseDataset",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "TriggerCopy",
"type": "ExecutePipeline",
"typeProperties": {
"parameters": {
"tableList": {
"value": "@activity('LookupTableList').output.value",
"type": "Expression"
}
},
"pipeline": {
"referenceName": "$IterateAndCopySQLTablesPipeline",
"type": "PipelineReference"
},
"waitOnCompletion": true
},
"dependsOn": [
{
"activity": "LookupTableList",
"dependencyConditions": [
"Succeeded"
]
}
]
}
]
}
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Set-AzDataFactoryPipeline command.
$pipeline2Definition | Out-File c:\$pipelineGetTableListAndTriggerCopyData.json
## Create a pipeline in the data factory
Set-AzDataFactoryPipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -Name $pipelineGetTableListAndTriggerCopyData -File "c:\$pipelineGetTableListAndTriggerCopyData.json"
# Create a pipeline run
## JSON definition for dummy pipeline parameters
$pipelineParameters = @"
{
"dummy": "b"
}
"@
## IMPORTANT: store the JSON definition in a file that will be used by the Invoke-AzDataFactoryPipeline command.
$pipelineParameters | Out-File c:\PipelineParameters.json
# Create a pipeline run by using parameters
$runId = Invoke-AzDataFactoryPipeline -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineName $pipelineGetTableListAndTriggerCopyData -ParameterFile c:\PipelineParameters.json
# Check the pipeline run status until it finishes the copy operation
Start-Sleep -Seconds 30
while ($True) {
$result = Get-AzDataFactoryActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineRunId $runId -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)
if (($result | Where-Object { $_.Status -eq "InProgress" } | Measure-Object).count -ne 0) {
Write-Host "Pipeline run status: In Progress" -foregroundcolor "Yellow"
Start-Sleep -Seconds 30
}
else {
Write-Host "Pipeline '$pipelineGetTableListAndTriggerCopyData' run finished. Result:" -foregroundcolor "Yellow"
$result
break
}
}
# Get the activity run details
$result = Get-AzDataFactoryActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName `
-PipelineRunId $runId `
-RunStartedAfter (Get-Date).AddMinutes(-10) `
-RunStartedBefore (Get-Date).AddMinutes(10) `
-ErrorAction Stop
$result
if ($result.Status -eq "Succeeded") {`
$result.Output -join "`r`n"`
}`
else {`
$result.Error -join "`r`n"`
}
# To remove the data factory from the resource gorup
# Remove-AzDataFactory -Name $dataFactoryName -ResourceGroupName $resourceGroupName
#
# To remove the whole resource group
# Remove-AzResourceGroup -Name $resourceGroupName
Membersihkan penyebaran
Setelah menjalankan sampel skrip, Anda dapat menggunakan perintah berikut untuk menghapus grup sumber daya dan semua sumber daya yang terkait:
Remove-AzResourceGroup -ResourceGroupName $resourceGroupName
Untuk menghapus pabrik data dari grup sumber daya, jalankan perintah berikut:
Remove-AzDataFactoryV2 -Name $dataFactoryName -ResourceGroupName $resourceGroupName
Penjelasan skrip
Skrip ini menggunakan perintah berikut:
| Perintah | Catatan |
|---|---|
| Baru-AzResourceGroup | Membuat grup sumber daya tempat semua sumber daya disimpan. |
| Set-AzDataFactoryV2 | Membuat pabrik data. |
| Set-AzDataFactoryV2LinkedService | Buat layanan tertaut di pabrik data. Layanan tertaut menautkan penyimpanan data atau komputasi ke pabrik data. |
| Set-AzDataFactoryV2Dataset | Membuat himpunan data di pabrik data. Himpunan data yang mewakili input/output untuk aktivitas dalam alur. |
| Set-AzDataFactoryV2Pipeline | Membuat alur di pabrik data. Alur memuat satu atau beberapa aktivitas yang melakukan operasi tertentu. Dalam alur ini, aktivitas salin menyalin data dari satu lokasi ke lokasi lain dalam Microsoft Azure Blob Storage. |
| Invoke-AzDataFactoryV2Pipeline | Membuat eksekusi untuk alur. Dengan kata lain, menjalankan alur. |
| Get-AzDataFactoryV2ActivityRun | Mendapatkan detail tentang jalannya aktivitas (eksekusi aktivitas) di alur. |
| Remove-AzResourceGroup | Menghapus grup sumber daya termasuk semua sumber daya berlapis. |
Langkah berikutnya
Untuk informasi selengkapnya tentang Azure PowerShell, lihat dokumentasi Azure PowerShell.
Sampel skrip PowerShell Azure Data Factory tambahan dapat ditemukan di Skrip PowerShell Azure Data Factory.