
Continuous integration and delivery (CI/CD) in Azure Data Factory

Overview

Continuous integration is the practice of testing each change made to your codebase automatically and as early as possible. Continuous delivery follows the testing that happens during continuous integration and pushes changes to a staging or production system.

In Azure Data Factory, continuous integration and delivery means moving Data Factory pipelines from one environment (development, test, production) to another. To do continuous integration and delivery, you can use Data Factory UX integration with Azure Resource Manager templates. The Data Factory UX can generate a Resource Manager template from the ARM template dropdown. When you select Export ARM template, the portal generates the Resource Manager template for the data factory and a configuration file that includes all your connection strings and other parameters. Then you have to create one configuration file for each environment (development, test, production). The main Resource Manager template file remains the same for all the environments.

For a nine-minute introduction and demonstration of this feature, watch the following video:

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Continuous integration lifecycle

Below is a sample overview of the continuous integration and delivery lifecycle in an Azure data factory that's configured with Azure Repos Git. For more information on how to configure a Git repository, see Source control in Azure Data Factory.

  1. A development data factory is created and configured with Azure Repos Git, where all developers have permission to author Data Factory resources such as pipelines and datasets.

  2. As the developers make changes in their feature branches, they debug their pipeline runs with their most recent changes. For more information on how to debug a pipeline run, see Iterative development and debugging with Azure Data Factory.

  3. Once the developers are satisfied with their changes, they create a pull request from their feature branch to the master or collaboration branch to get their changes reviewed by peers.

  4. After the pull request is approved and changes are merged in the master branch, the changes can be published to the development factory.

  5. When the team is ready to deploy the changes to the test factory and then to the production factory, they export the Resource Manager template from the master branch.

  6. The exported Resource Manager template is deployed with different parameter files to the test factory and the production factory.
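
Step 6 can also be scripted with the Az PowerShell module. Below is a minimal sketch, assuming the exported template and a per-environment copy of the parameter file have been saved locally; the resource group, factory, and file names are placeholders to adapt:

# Select the subscription that hosts the target (test) factory
Connect-AzAccount
Set-AzContext -Subscription "<test-subscription-id>"

# Deploy the same exported template with the test environment's parameter file
New-AzResourceGroupDeployment `
    -ResourceGroupName "adf-test-rg" `
    -TemplateFile ".\ArmTemplateForFactory.json" `
    -TemplateParameterFile ".\ArmTemplateParametersForFactory.test.json" `
    -Mode Incremental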

Create a Resource Manager template for each environment

From the ARM template dropdown, select Export ARM template to export the Resource Manager template for your data factory in the development environment.

In your test and production data factories, select Import ARM template. This action takes you to the Azure portal, where you can import the exported template. Select Build your own template in the editor to open the Resource Manager template editor.

单击 "加载文件",然后选择生成的资源管理器模板。Click Load file and select the generated Resource Manager template.

在 "设置" 窗格中,输入配置值,例如链接的服务凭据。In the settings pane, enter the configuration values such as linked service credentials. 完成后,单击 "购买" 以部署资源管理器模板。Once you're done, click Purchase to deploy the Resource Manager template.

Connection strings

Information on how to configure connection strings can be found in each connector's article. For example, for Azure SQL Database, see Copy data to or from Azure SQL Database by using Azure Data Factory. To verify a connection string, you can open the code view for the resource in the Data Factory UX. In code view, the password or account key portion of the connection string is removed. To open code view, select the icon highlighted in the following screenshot.

Open code view to see the connection string

Automate continuous integration with Azure Pipelines releases

Below is a guide to setting up an Azure Pipelines release, which automates the deployment of a data factory to multiple environments.

Diagram of continuous integration with Azure Pipelines

Requirements

Set up an Azure Pipelines release

  1. In the Azure DevOps user experience, open the project that's configured with your data factory.

  2. On the left side of the page, click Pipelines, and then select Releases.

  3. Select New pipeline, or, if you have existing pipelines, select New and then New release pipeline.

  4. Select the Empty job template.

  5. In the Stage name field, enter the name of your environment.

  6. Select Add an artifact, and select the same repository that's configured with your data factory. Choose adf_publish as the default branch, with the latest default version.

  7. Add an Azure Resource Manager Deployment task:

    a. In the stage view, click the View stage tasks link.

    b. Create a new task. Search for Azure Resource Group Deployment, and then click Add.

    c. In the Deployment task, choose the subscription, resource group, and location for the target data factory, and provide credentials if necessary.

    d. In the Action dropdown, select Create or update resource group.

    e. Select … in the Template field. Browse for the Azure Resource Manager template created via the Import ARM template step in Create a Resource Manager template for each environment. Look for this file in the folder <FactoryName> of the adf_publish branch.

    f. Select … in the Template parameters field to choose the parameters file. Choose the correct file depending on whether you created a copy or you're using the default file, ARMTemplateParametersForFactory.json.

    g. Select … next to the Override template parameters field, and fill in the information for the target data factory. For credentials that come from key vault, enter the secret's name between double quotes. For example, if the secret's name is cred1, enter "$(cred1)" for its value. (See the override example after this list.)

    h. Select the Incremental deployment mode.

    Warning

    If you select Complete deployment mode, existing resources may be deleted, including all resources in the target resource group that aren't defined in the Resource Manager template.

  8. Save the release pipeline.

  9. To trigger a release, click Create release.
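
As an illustration of step 7g, the override string is a sequence of -parameterName value pairs. Below is a hypothetical example that sets the target factory name and pulls a connection string from a pipeline secret named sqlconn; both names are placeholders:

-factoryName "testfactory" -AzureSqlDatabase1_connectionString "$(sqlconn)"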

Get secrets from Azure Key Vault

If you have secrets to pass in an Azure Resource Manager template, we recommend using Azure Key Vault with the Azure Pipelines release.

There are two ways to handle secrets:

  1. Add the secrets to the parameters file. For more info, see Use Azure Key Vault to pass secure parameter value during deployment.

    • Create a copy of the parameters file that's uploaded to the publish branch, and set the values of the parameters that you want to get from key vault with the following format:
    {
        "parameters": {
            "azureSqlReportingDbPassword": {
                "reference": {
                    "keyVault": {
                        "id": "/subscriptions/<subId>/resourceGroups/<resourcegroupId> /providers/Microsoft.KeyVault/vaults/<vault-name> "
                    },
                    "secretName": " < secret - name > "
                }
            }
        }
    }
    
    • When you use this method, the secret is pulled from the key vault automatically.

    • The parameters file needs to be in the publish branch as well.

  2. Add an Azure Key Vault task before the Azure Resource Manager Deployment task described in the previous section:

    • Select the Tasks tab, create a new task, and then search for Azure Key Vault and add it.

    • In the Key Vault task, choose the subscription in which you created the key vault, provide credentials if necessary, and then choose the key vault.

Grant permissions to the Azure Pipelines agent

The Azure Key Vault task may fail with an Access Denied error if the proper permissions aren't present. Download the logs for the release, and locate the .ps1 file that contains the command to give permissions to the Azure Pipelines agent. You can run the command directly, or you can copy the principal ID from the file and add the access policy manually in the Azure portal. Get and List are the minimum permissions required.
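
If you prefer to add the policy from PowerShell rather than the portal, below is a minimal sketch, assuming the vault name and the principal ID copied from the log file (both placeholders here):

# Grant the release agent's service principal the minimum secret permissions
Set-AzKeyVaultAccessPolicy -VaultName "<vault-name>" `
    -ObjectId "<principal-id-from-log>" `
    -PermissionsToSecrets Get,List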

Update active triggers

Deployment can fail if you try to update active triggers. To update active triggers, you need to manually stop them and then restart them after the deployment. You can do this via an Azure PowerShell task.

  1. In the Tasks tab of the release, add an Azure PowerShell task.

  2. Choose Azure Resource Manager as the connection type, and select your subscription.

  3. Choose Inline Script as the script type, and then provide your code. The following example stops the triggers:

    $triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName
    
    $triggersADF | ForEach-Object { Stop-AzDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name $_.name -Force }
    

You can follow similar steps (with the Start-AzDataFactoryV2Trigger function) to restart the triggers after deployment.
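
For completeness, a matching post-deployment snippet, using the same variables as the stop example above:

$triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName

$triggersADF | ForEach-Object { Start-AzDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name $_.name -Force }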

Important

In continuous integration and deployment scenarios, the integration runtime type across different environments must be the same. For example, if you have a self-hosted integration runtime (IR) in the development environment, the same IR must also be of type self-hosted in the other environments, such as test and production. Similarly, if you're sharing integration runtimes across multiple stages, you have to configure the integration runtimes as linked self-hosted in all environments, such as development, test, and production.

Sample pre- and post-deployment script

Below is a sample script to stop triggers before deployment and to restart them afterward. The script also includes code to delete resources that have been removed. To install the latest version of Azure PowerShell, see Install Azure PowerShell on Windows with PowerShellGet.

param
(
    [parameter(Mandatory = $false)] [String] $rootFolder,
    [parameter(Mandatory = $false)] [String] $armTemplate,
    [parameter(Mandatory = $false)] [String] $ResourceGroupName,
    [parameter(Mandatory = $false)] [String] $DataFactoryName,
    [parameter(Mandatory = $false)] [Bool] $predeployment=$true,
    [parameter(Mandatory = $false)] [Bool] $deleteDeployment=$false
)

$templateJson = Get-Content $armTemplate | ConvertFrom-Json
$resources = $templateJson.resources

#Triggers 
Write-Host "Getting triggers"
$triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName
$triggersTemplate = $resources | Where-Object { $_.type -eq "Microsoft.DataFactory/factories/triggers" }
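# Template resource names look like "[concat(parameters('factoryName'), '/<name>')]";
# Substring(37, Length-40) strips that 37-character prefix and the trailing "')]", leaving just <name>.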
$triggerNames = $triggersTemplate | ForEach-Object {$_.name.Substring(37, $_.name.Length-40)}
$activeTriggerNames = $triggersTemplate | Where-Object { $_.properties.runtimeState -eq "Started" -and ($_.properties.pipelines.Count -gt 0 -or $_.properties.pipeline.pipelineReference -ne $null)} | ForEach-Object {$_.name.Substring(37, $_.name.Length-40)}
$deletedtriggers = $triggersADF | Where-Object { $triggerNames -notcontains $_.Name }
$triggerstostop = $triggerNames | where { ($triggersADF | Select-Object name).name -contains $_ }

if ($predeployment -eq $true) {
    #Stop all triggers
    Write-Host "Stopping deployed triggers"
    $triggerstostop | ForEach-Object { 
        Write-host "Disabling trigger " $_
        Stop-AzDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name $_ -Force 
    }
}
else {
    #Deleted resources
    #pipelines
    Write-Host "Getting pipelines"
    $pipelinesADF = Get-AzDataFactoryV2Pipeline -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName
    $pipelinesTemplate = $resources | Where-Object { $_.type -eq "Microsoft.DataFactory/factories/pipelines" }
    $pipelinesNames = $pipelinesTemplate | ForEach-Object {$_.name.Substring(37, $_.name.Length-40)}
    $deletedpipelines = $pipelinesADF | Where-Object { $pipelinesNames -notcontains $_.Name }
    #datasets
    Write-Host "Getting datasets"
    $datasetsADF = Get-AzDataFactoryV2Dataset -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName
    $datasetsTemplate = $resources | Where-Object { $_.type -eq "Microsoft.DataFactory/factories/datasets" }
    $datasetsNames = $datasetsTemplate | ForEach-Object {$_.name.Substring(37, $_.name.Length-40) }
    $deleteddataset = $datasetsADF | Where-Object { $datasetsNames -notcontains $_.Name }
    #linkedservices
    Write-Host "Getting linked services"
    $linkedservicesADF = Get-AzDataFactoryV2LinkedService -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName
    $linkedservicesTemplate = $resources | Where-Object { $_.type -eq "Microsoft.DataFactory/factories/linkedservices" }
    $linkedservicesNames = $linkedservicesTemplate | ForEach-Object {$_.name.Substring(37, $_.name.Length-40)}
    $deletedlinkedservices = $linkedservicesADF | Where-Object { $linkedservicesNames -notcontains $_.Name }
    #Integrationruntimes
    Write-Host "Getting integration runtimes"
    $integrationruntimesADF = Get-AzDataFactoryV2IntegrationRuntime -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName
    $integrationruntimesTemplate = $resources | Where-Object { $_.type -eq "Microsoft.DataFactory/factories/integrationruntimes" }
    $integrationruntimesNames = $integrationruntimesTemplate | ForEach-Object {$_.name.Substring(37, $_.name.Length-40)}
    $deletedintegrationruntimes = $integrationruntimesADF | Where-Object { $integrationruntimesNames -notcontains $_.Name }

    #Delete resources
    Write-Host "Deleting triggers"
    $deletedtriggers | ForEach-Object { 
        Write-Host "Deleting trigger "  $_.Name
        $trig = Get-AzDataFactoryV2Trigger -name $_.Name -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName
        if ($trig.RuntimeState -eq "Started") {
            Stop-AzDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name $_.Name -Force 
        }
        Remove-AzDataFactoryV2Trigger -Name $_.Name -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Force 
    }
    Write-Host "Deleting pipelines"
    $deletedpipelines | ForEach-Object { 
        Write-Host "Deleting pipeline " $_.Name
        Remove-AzDataFactoryV2Pipeline -Name $_.Name -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Force 
    }
    Write-Host "Deleting datasets"
    $deleteddataset | ForEach-Object { 
        Write-Host "Deleting dataset " $_.Name
        Remove-AzDataFactoryV2Dataset -Name $_.Name -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Force 
    }
    Write-Host "Deleting linked services"
    $deletedlinkedservices | ForEach-Object { 
        Write-Host "Deleting Linked Service " $_.Name
        Remove-AzDataFactoryV2LinkedService -Name $_.Name -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Force 
    }
    Write-Host "Deleting integration runtimes"
    $deletedintegrationruntimes | ForEach-Object { 
        Write-Host "Deleting integration runtime " $_.Name
        Remove-AzDataFactoryV2IntegrationRuntime -Name $_.Name -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Force 
    }

    if ($deleteDeployment -eq $true) {
        Write-Host "Deleting ARM deployment ... under resource group: " $ResourceGroupName
        $deployments = Get-AzResourceGroupDeployment -ResourceGroupName $ResourceGroupName
        $deploymentsToConsider = $deployments | Where { $_.DeploymentName -like "ArmTemplate_master*" -or $_.DeploymentName -like "ArmTemplateForFactory*" } | Sort-Object -Property Timestamp -Descending
        $deploymentName = $deploymentsToConsider[0].DeploymentName

       Write-Host "Deployment to be deleted: " $deploymentName
        $deploymentOperations = Get-AzResourceGroupDeploymentOperation -DeploymentName $deploymentName -ResourceGroupName $ResourceGroupName
        $deploymentsToDelete = $deploymentOperations | Where { $_.properties.targetResource.id -like "*Microsoft.Resources/deployments*" }

        $deploymentsToDelete | ForEach-Object { 
            Write-host "Deleting inner deployment: " $_.properties.targetResource.id
            Remove-AzResourceGroupDeployment -Id $_.properties.targetResource.id
        }
        Write-Host "Deleting deployment: " $deploymentName
        Remove-AzResourceGroupDeployment -ResourceGroupName $ResourceGroupName -Name $deploymentName
    }

    #Start Active triggers - After cleanup efforts
    Write-Host "Starting active triggers"
    $activeTriggerNames | ForEach-Object { 
        Write-host "Enabling trigger " $_
        Start-AzDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name $_ -Force 
    }
}
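
As a usage sketch, the script above might be saved as adf-cicd-helpers.ps1 (a hypothetical name) and called from Azure PowerShell tasks around the deployment, using the parameters declared in its param block; the resource group and factory names here are placeholders:

# Pre-deployment: stop all triggers that the template will redeploy
.\adf-cicd-helpers.ps1 -armTemplate ".\ArmTemplateForFactory.json" -ResourceGroupName "adf-test-rg" -DataFactoryName "testfactory" -predeployment $true

# Post-deployment: clean up removed resources, then restart active triggers
.\adf-cicd-helpers.ps1 -armTemplate ".\ArmTemplateForFactory.json" -ResourceGroupName "adf-test-rg" -DataFactoryName "testfactory" -predeployment $false -deleteDeployment $true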

Use custom parameters with the Resource Manager template

If you're in GIT mode, you can override the default properties in your Resource Manager template to set properties that are parameterized in the template and properties that are hard-coded. You might want to override the default parameterization template in these scenarios:

  • You use automated CI/CD and you want to change some properties during Resource Manager deployment, but the properties aren't parameterized by default.
  • Your factory is so large that the default Resource Manager template is invalid because it has more than the maximum allowed number of parameters (256).

Under these conditions, to override the default parameterization template, create a file named arm-template-parameters-definition.json in the root folder of the repository. The file name must match exactly. Data Factory tries to read this file from whichever branch you're currently on in the Azure Data Factory portal, not just from the collaboration branch. You can create or edit the file from a private branch, where you can test your changes by using Export ARM template in the UI. Then you can merge the file into the collaboration branch. If no file is found, the default template is used.

Syntax of a custom parameters file

Here are some guidelines to follow when you author the custom parameters file. The file consists of a section for each entity type: trigger, pipeline, linked service, dataset, integration runtime, and so on.

  • Enter the property path under the relevant entity type.
  • When you set a property name to *, you indicate that you want to parameterize all properties under it (only down to the first level, not recursively). You can also provide exceptions to this.
  • When you set the value of a property as a string, you indicate that you want to parameterize the property. Use the format <action>:<name>:<stype>.
    • <action> can be one of the following characters:
      • = means keep the current value as the default value for the parameter.
      • - means don't keep the default value for the parameter.
      • | is a special case for secrets from Azure Key Vault, for connection strings or keys.
    • <name> is the name of the parameter. If it's blank, it takes the name of the property. If the value starts with a - character, the name is shortened. For example, AzureStorage1_properties_typeProperties_connectionString would be shortened to AzureStorage1_connectionString.
    • <stype> is the type of the parameter. If <stype> is blank, the default type is string. Supported values: string, bool, number, object, and securestring.
  • When you specify an array in the definition file, you indicate that the matching property in the template is an array. Data Factory iterates through all the objects in the array by using the definition that's specified in the integration runtime object of the array. The second object, a string, becomes the name of the property, which is used as the name of the parameter for each iteration.
  • It's not possible to have a definition that's specific to a resource instance. Any definition applies to all resources of that type.
  • By default, all secure strings, such as Key Vault secrets, and secure strings, such as connection strings, keys, and tokens, are parameterized.

Sample parameterization template

Below is an example of what a parameterization template might look like:

{
    "Microsoft.DataFactory/factories/pipelines": {
        "properties": {
            "activities": [{
                "typeProperties": {
                    "waitTimeInSeconds": "-::number",
                    "headers": "=::object"
                }
            }]
        }
    },
    "Microsoft.DataFactory/factories/integrationRuntimes": {
        "properties": {
            "typeProperties": {
                "*": "="
            }
        }
    },
    "Microsoft.DataFactory/factories/triggers": {
        "properties": {
            "typeProperties": {
                "recurrence": {
                    "*": "=",
                    "interval": "=:triggerSuffix:number",
                    "frequency": "=:-freq"
                },
                "maxConcurrency": "="
            }
        }
    },
    "Microsoft.DataFactory/factories/linkedServices": {
        "*": {
            "properties": {
                "typeProperties": {
                    "accountName": "=",
                    "username": "=",
                    "connectionString": "|:-connectionString:secureString",
                    "secretAccessKey": "|"
                }
            }
        },
        "AzureDataLakeStore": {
            "properties": {
                "typeProperties": {
                    "dataLakeStoreUri": "="
                }
            }
        }
    },
    "Microsoft.DataFactory/factories/datasets": {
        "properties": {
            "typeProperties": {
                "*": "="
            }
        }
    }
}

Below is an explanation of how the preceding template is constructed, broken down by resource type.

Pipelines

  • Any property in the path activities/typeProperties/waitTimeInSeconds is parameterized. Any activity in a pipeline that has a code-level property named waitTimeInSeconds (for example, the Wait activity) is parameterized as a number, with a default name. But it won't have a default value in the Resource Manager template. It will be a mandatory input during the Resource Manager deployment.
  • Similarly, a property called headers (for example, in a Web activity) is parameterized with type object (JObject). It has a default value, which is the same value as in the source factory.

IntegrationRuntimes

  • All properties under the path typeProperties are parameterized with their respective default values. For example, there are two properties under IntegrationRuntimes type properties: computeProperties and ssisProperties. Both property types are created with their respective default values and types (Object).

Triggers

  • Under typeProperties, two properties are parameterized. The first one is maxConcurrency, which is specified to have a default value and is of type string. It has the default parameter name <entityName>_properties_typeProperties_maxConcurrency.
  • The recurrence property also is parameterized. Under it, all properties at that level are specified to be parameterized as strings, with default values and parameter names. An exception is the interval property, which is parameterized as type number, with the parameter name suffixed with <entityName>_properties_typeProperties_recurrence_triggerSuffix. Similarly, the freq property is a string and is parameterized as a string. However, the freq property is parameterized without a default value. The name is shortened and suffixed, for example, <entityName>_freq.

LinkedServices

  • Linked services are unique. Because linked services and datasets have a wide range of types, you can provide type-specific customization. In this example, for all linked services of type AzureDataLakeStore, a specific template will be applied, and for all others (via *) a different template will be applied.
  • The connectionString property will be parameterized as a securestring value, it won't have a default value, and it will have a shortened parameter name that's suffixed with connectionString.
  • The property secretAccessKey happens to be an AzureKeyVaultSecret (for example, in an Amazon S3 linked service). It's automatically parameterized as an Azure Key Vault secret and fetched from the configured key vault. You can also parameterize the key vault itself.

Datasets

  • Although type-specific customization is available for datasets, configuration can be provided without explicitly having a *-level configuration. In the preceding example, all dataset properties under typeProperties are parameterized.

Default parameterization template

Below is the current default parameterization template. If you only need to add one or a few parameters, editing this directly may be helpful, because you won't lose the existing parameterization structure.

{
    "Microsoft.DataFactory/factories/pipelines": {
    },
    "Microsoft.DataFactory/factories/integrationRuntimes":{
        "properties": {
            "typeProperties": {
                "ssisProperties": {
                    "catalogInfo": {
                        "catalogServerEndpoint": "=",
                        "catalogAdminUserName": "=",
                        "catalogAdminPassword": {
                            "value": "-::secureString"
                        }
                    },
                    "customSetupScriptProperties": {
                        "sasToken": {
                            "value": "-::secureString"
                        }
                    }
                },
                "linkedInfo": {
                    "key": {
                        "value": "-::secureString"
                    },
                    "resourceId": "="
                }
            }
        }
    },
    "Microsoft.DataFactory/factories/triggers": {
        "properties": {
            "pipelines": [{
                    "parameters": {
                        "*": "="
                    }
                },  
                "pipelineReference.referenceName"
            ],
            "pipeline": {
                "parameters": {
                    "*": "="
                }
            },
            "typeProperties": {
                "scope": "="
            }

        }
    },
    "Microsoft.DataFactory/factories/linkedServices": {
        "*": {
            "properties": {
                "typeProperties": {
                    "accountName": "=",
                    "username": "=",
                    "userName": "=",
                    "accessKeyId": "=",
                    "servicePrincipalId": "=",
                    "userId": "=",
                    "clientId": "=",
                    "clusterUserName": "=",
                    "clusterSshUserName": "=",
                    "hostSubscriptionId": "=",
                    "clusterResourceGroup": "=",
                    "subscriptionId": "=",
                    "resourceGroupName": "=",
                    "tenant": "=",
                    "dataLakeStoreUri": "=",
                    "baseUrl": "=",
                    "database": "=",
                    "serviceEndpoint": "=",
                    "batchUri": "=",
                    "databaseName": "=",
                    "systemNumber": "=",
                    "server": "=",
                    "url":"=",
                    "aadResourceId": "=",
                    "connectionString": "|:-connectionString:secureString"
                }
            }
        },
        "Odbc": {
            "properties": {
                "typeProperties": {
                    "userName": "=",
                    "connectionString": {
                        "secretName": "="
                    }
                }
            }
        }
    },
    "Microsoft.DataFactory/factories/datasets": {
        "*": {
            "properties": {
                "typeProperties": {
                    "folderPath": "=",
                    "fileName": "="
                }
            }
        }}
}

Below is an example of how to add a single value to the default parameterization template. We only want to add an existing Databricks interactive cluster ID for a Databricks linked service to the parameters file. Note that the file below is the same as the preceding file, except that existingClusterId is included under the properties field of Microsoft.DataFactory/factories/linkedServices.

{
    "Microsoft.DataFactory/factories/pipelines": {
    },
    "Microsoft.DataFactory/factories/integrationRuntimes":{
        "properties": {
            "typeProperties": {
                "ssisProperties": {
                    "catalogInfo": {
                        "catalogServerEndpoint": "=",
                        "catalogAdminUserName": "=",
                        "catalogAdminPassword": {
                            "value": "-::secureString"
                        }
                    },
                    "customSetupScriptProperties": {
                        "sasToken": {
                            "value": "-::secureString"
                        }
                    }
                },
                "linkedInfo": {
                    "key": {
                        "value": "-::secureString"
                    },
                    "resourceId": "="
                }
            }
        }
    },
    "Microsoft.DataFactory/factories/triggers": {
        "properties": {
            "pipelines": [{
                    "parameters": {
                        "*": "="
                    }
                },  
                "pipelineReference.referenceName"
            ],
            "pipeline": {
                "parameters": {
                    "*": "="
                }
            },
            "typeProperties": {
                "scope": "="
            }
 
        }
    },
    "Microsoft.DataFactory/factories/linkedServices": {
        "*": {
            "properties": {
                "typeProperties": {
                    "accountName": "=",
                    "username": "=",
                    "userName": "=",
                    "accessKeyId": "=",
                    "servicePrincipalId": "=",
                    "userId": "=",
                    "clientId": "=",
                    "clusterUserName": "=",
                    "clusterSshUserName": "=",
                    "hostSubscriptionId": "=",
                    "clusterResourceGroup": "=",
                    "subscriptionId": "=",
                    "resourceGroupName": "=",
                    "tenant": "=",
                    "dataLakeStoreUri": "=",
                    "baseUrl": "=",
                    "database": "=",
                    "serviceEndpoint": "=",
                    "batchUri": "=",
                    "databaseName": "=",
                    "systemNumber": "=",
                    "server": "=",
                    "url":"=",
                    "aadResourceId": "=",
                    "connectionString": "|:-connectionString:secureString",
                    "existingClusterId": "-"
                }
            }
        },
        "Odbc": {
            "properties": {
                "typeProperties": {
                    "userName": "=",
                    "connectionString": {
                        "secretName": "="
                    }
                }
            }
        }
    },
    "Microsoft.DataFactory/factories/datasets": {
        "*": {
            "properties": {
                "typeProperties": {
                    "folderPath": "=",
                    "fileName": "="
                }
            }
        }}
}

Linked Resource Manager templates

If you've set up continuous integration and deployment (CI/CD) for your data factories, you may run into the Azure Resource Manager template limits as your factory grows bigger. An example of a limit is the maximum number of resources in a Resource Manager template. To accommodate large factories, along with generating the full Resource Manager template for a factory, Data Factory now also generates linked Resource Manager templates. With this feature, the entire factory payload is broken down into several files, so you don't run into the limits.

If you've configured Git, the linked templates are generated and saved alongside the full Resource Manager templates in the adf_publish branch, under a new folder called linkedTemplates.

Linked Resource Manager templates folder

The linked Resource Manager templates usually consist of a master template and a set of child templates that are linked to the master. The parent template is called ArmTemplate_master.json, and the child templates are named with the pattern ArmTemplate_0.json, ArmTemplate_1.json, and so on. To use linked templates instead of the full Resource Manager template, update your CI/CD task to point to ArmTemplate_master.json instead of ArmTemplateForFactory.json (the full Resource Manager template). Resource Manager also requires you to upload the linked templates to a storage account so that Azure can access them during deployment. For more info, see Deploying linked ARM templates with VSTS.
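
Below is a minimal sketch of that flow with the Az module. The storage account, container, and file paths are placeholders, and the containerUri and containerSasToken parameter names are assumptions to verify against the master template your factory actually generates:

# Upload the generated child templates to a blob container
$ctx = New-AzStorageContext -StorageAccountName "<storage-account>" -StorageAccountKey "<key>"
Get-ChildItem ".\linkedTemplates\*.json" | ForEach-Object {
    Set-AzStorageBlobContent -File $_.FullName -Container "adf-templates" -Blob $_.Name -Context $ctx -Force
}

# Create a short-lived, read-only SAS token for the container
$sas = New-AzStorageContainerSASToken -Name "adf-templates" -Permission r -ExpiryTime (Get-Date).AddHours(2) -Context $ctx

# Deploy the master template, pointing it at the uploaded child templates
New-AzResourceGroupDeployment -ResourceGroupName "<resource-group>" `
    -TemplateFile ".\linkedTemplates\ArmTemplate_master.json" `
    -TemplateParameterFile ".\ArmTemplateParametersForFactory.json" `
    -containerUri "https://<storage-account>.blob.core.windows.net/adf-templates" `
    -containerSasToken $sas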

Remember to add the Data Factory scripts in your CI/CD pipeline before and after the deployment task.

If you don't have Git configured, the linked templates are accessible via the Export ARM template gesture.

Hot-fix production branch

If you deploy a factory to production and realize there's a bug that needs to be fixed right away, but you can't deploy the current collaboration branch, you may need to deploy a hot-fix. This approach is known as quick-fix engineering or QFE.

  1. In Azure DevOps, go to the release that was deployed to production, and find the last commit that was deployed.

  2. From the commit message, get the commit ID of the collaboration branch.

  3. Create a new hot-fix branch from that commit.

  4. Go to the Azure Data Factory UX, and switch to this branch.

  5. Using the Azure Data Factory UX, fix the bug. Test your changes.

  6. Once the fix has been verified, click Export ARM template to get the hot-fix Resource Manager template.

  7. Manually check this build in to the adf_publish branch.

  8. If you've configured your release pipeline to automatically trigger based on adf_publish check-ins, a new release will start automatically. Otherwise, manually queue a release.

  9. Deploy the hot-fix release to the test and production factories. This release contains the previous production payload plus the fix that you made in step 5.

  10. Add the changes from the hot-fix to the development branch so that later releases won't run into the same bug.

Best practices for CI/CD

If you're using Git integration with your data factory, and you have a CI/CD pipeline that moves your changes from development into test and then to production, we recommend these best practices:

  • Git integration. You're only required to configure your development data factory with Git integration. Changes to test and production are deployed via CI/CD and don't need Git integration.

  • Data Factory CI/CD script. Before the Resource Manager deployment step in CI/CD, certain tasks are required, such as stopping and starting triggers and performing cleanup. We recommend using PowerShell scripts before and after deployment. For more information, see Update active triggers.

  • Integration runtimes and sharing. Integration runtimes don't change often and are similar across all stages in your CI/CD. As a result, Data Factory expects you to have the same name and same type of integration runtime across all stages of CI/CD. If you're looking to share integration runtimes across all stages, consider using a ternary factory just for containing the shared integration runtimes. You can use this shared factory in all of your environments as a linked integration runtime type.

  • Key Vault. When you use Azure Key Vault-based linked services, you can take advantage of them further by keeping separate key vaults for different environments. You can also configure separate permission levels for each of them. For example, you may not want your team members to have permissions to production secrets. If you follow this approach, we recommend that you keep the same secret names across all stages. If you keep the same names, you don't have to change your Resource Manager templates across CI/CD environments, because the only thing that changes is the key vault name, which is one of the Resource Manager template parameters.

Unsupported features

  • By design, ADF does not allow cherry-picking commits or selective publishing of resources. Publishes will include all changes made in the data factory.

    • Data factory entities depend on each other. For example, triggers depend on pipelines, and pipelines depend on datasets and other pipelines. Selective publishing of a subset of resources may lead to unexpected behaviors and errors.
    • On rare occasions where selective publishing is required, you may consider a hot-fix. For more information, see Hot-fix production branch.
  • You cannot publish from private branches.

  • As of now, you cannot host projects on Bitbucket.