Integration runtime in Azure Data Factory

The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide the following data integration capabilities across different network environments:

  • Data movement: Move data between data stores in a public network and data stores in a private network (on-premises or virtual private network). The IR supports built-in connectors, format conversion, column mapping, and performant, scalable data transfer.
  • Activity dispatch: Dispatch and monitor transformation activities running on a variety of compute services such as Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more.
  • SSIS package execution: Natively execute SQL Server Integration Services (SSIS) packages in a managed Azure compute environment.

In Data Factory, an activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and linked services. It is referenced by the linked service and provides the compute environment where the activity either runs or is dispatched from. This way, the activity can be performed in the region closest to the target data store or compute service, in the most performant way, while meeting security and compliance needs.
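The relationship between activity, linked service, and integration runtime can be pictured with a small model. This is only an illustrative sketch, not Data Factory's object model: the class names, field names, and example objects below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationRuntime:
    name: str
    kind: str  # "Azure", "SelfHosted", or "AzureSSIS" (labels chosen for this sketch)


@dataclass(frozen=True)
class LinkedService:
    name: str
    connect_via: IntegrationRuntime  # the IR this linked service references


@dataclass(frozen=True)
class Activity:
    name: str
    linked_service: LinkedService


def runtime_for(activity: Activity) -> IntegrationRuntime:
    """Return the IR on which the activity runs or from which it is dispatched."""
    return activity.linked_service.connect_via


# Hypothetical example objects.
on_prem_ir = IntegrationRuntime("MySelfHostedIR", "SelfHosted")
sql_server = LinkedService("OnPremSqlServer", connect_via=on_prem_ir)
lookup = Activity("LookupCustomers", linked_service=sql_server)

print(runtime_for(lookup).name)
```

The activity never names an IR directly in this model; it reaches one only through its linked service, which mirrors the "bridge" role described above.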

Integration runtime types

Data Factory offers three types of integration runtime. Choose the type that best serves the data integration capabilities and network environment you need. The three types are:

  • Azure
  • Self-hosted
  • Azure-SSIS

The following table describes the capabilities and network support for each integration runtime type:

IR type | Public network | Private network
Azure | Data movement, Activity dispatch | (none)
Self-hosted | Data movement, Activity dispatch | Data movement, Activity dispatch
Azure-SSIS | SSIS package execution | SSIS package execution
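
As a rough aid for choosing a type, the capability table can be expressed as a lookup. This is a sketch only: the dictionary layout and the function name `ir_types_for` are hypothetical, and the entries simply restate the table.

```python
# Capabilities per IR type and network, restating the table above.
CAPABILITIES = {
    "Azure": {
        "public": {"data movement", "activity dispatch"},
        "private": set(),
    },
    "Self-hosted": {
        "public": {"data movement", "activity dispatch"},
        "private": {"data movement", "activity dispatch"},
    },
    "Azure-SSIS": {
        "public": {"SSIS package execution"},
        "private": {"SSIS package execution"},
    },
}


def ir_types_for(capability: str, network: str) -> list:
    """List the IR types that offer a capability in a 'public' or 'private' network."""
    return [
        ir_type
        for ir_type, networks in CAPABILITIES.items()
        if capability in networks[network]
    ]


print(ir_types_for("data movement", "private"))
print(ir_types_for("SSIS package execution", "public"))
```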

The following diagram shows how the different integration runtimes can be used in combination to offer rich data integration capabilities and network support:

[Diagram: Different types of integration runtimes]

Azure integration runtime

An Azure integration runtime is capable of:

  • Running the copy activity between cloud data stores.
  • Dispatching the following transformation activities in a public network: HDInsight Hive activity, HDInsight Pig activity, HDInsight MapReduce activity, HDInsight Spark activity, HDInsight Streaming activity, Machine Learning Batch Execution activity, Machine Learning Update Resource activity, Stored Procedure activity, Data Lake Analytics U-SQL activity, .NET custom activity, Web activity, Lookup activity, and Get Metadata activity.

Azure IR network environment

The Azure integration runtime supports connecting to data stores and compute services in a public network with publicly accessible endpoints. For an Azure Virtual Network environment, use a self-hosted integration runtime.

Azure IR compute resource and scaling

The Azure integration runtime provides fully managed, serverless compute in Azure. You don't have to worry about infrastructure provisioning, software installation, patching, or capacity scaling. In addition, you pay only for the duration of actual use.

The Azure integration runtime provides the native compute to move data between cloud data stores in a secure, reliable, and high-performance manner. You can set how many data integration units to use on the copy activity, and the compute size of the Azure IR scales up elastically to match, without you having to explicitly adjust the size of the Azure integration runtime.
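
To illustrate where the data integration unit setting might live, the sketch below builds a copy activity definition as a Python dictionary. It assumes the setting is exposed as a `dataIntegrationUnits` property under the activity's `typeProperties`; that property name, the activity name, and the `BlobSource`/`SqlDWSink` type names are assumptions for illustration, so check them against the copy activity reference before relying on them.

```python
import json
from typing import Optional


def copy_activity(name: str, source_type: str, sink_type: str,
                  data_integration_units: Optional[int] = None) -> dict:
    """Build a copy activity definition, optionally setting data integration units."""
    type_properties = {
        "source": {"type": source_type},
        "sink": {"type": sink_type},
    }
    if data_integration_units is not None:
        # Assumed property name; it is left out of the definition when not set.
        type_properties["dataIntegrationUnits"] = data_integration_units
    return {"name": name, "type": "Copy", "typeProperties": type_properties}


# Hypothetical activity name and unit count.
activity = copy_activity("CopyBlobToSqlDW", "BlobSource", "SqlDWSink",
                         data_integration_units=32)
print(json.dumps(activity, indent=2))
```

Note that the setting belongs to the activity, not to the integration runtime definition, which matches the point above that the Azure IR itself is never resized explicitly.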

Activity dispatch is a lightweight operation that routes the activity to the target compute service, so there is no need to scale up the compute size for this scenario.

For information about creating and configuring an Azure IR, see How to create and configure Azure IR under the how-to guides.

Self-hosted integration runtime

A self-hosted IR is capable of:

  • Running the copy activity between a cloud data store and a data store in a private network.
  • Dispatching the following transformation activities against compute resources on-premises or in an Azure Virtual Network: HDInsight Hive activity (BYOC), HDInsight Pig activity (BYOC), HDInsight MapReduce activity (BYOC), HDInsight Spark activity (BYOC), HDInsight Streaming activity (BYOC), Machine Learning Batch Execution activity, Machine Learning Update Resource activity, Stored Procedure activity, Data Lake Analytics U-SQL activity, .NET custom activity, Lookup activity, and Get Metadata activity.

Note

Use a self-hosted integration runtime to support data stores that require a bring-your-own driver, such as SAP HANA and MySQL. For more information, see supported data stores.

Self-hosted IR network environment

If you want to perform data integration securely in a private network environment that does not have a direct line of sight from the public cloud environment, you can install a self-hosted IR on-premises behind your corporate firewall, or inside a virtual private network. The self-hosted integration runtime makes only outbound HTTP-based connections to the open internet.

Self-hosted IR compute resource and scaling

A self-hosted IR needs to be installed on an on-premises machine or on a virtual machine inside a private network. Currently, the self-hosted IR is supported only on the Windows operating system.

For high availability and scalability, you can scale out the self-hosted IR by associating the logical instance with multiple on-premises machines in active-active mode. For details, see the How to create and configure self-hosted IR article under the how-to guides.

Azure-SSIS integration runtime

To lift and shift an existing SSIS workload, you can create an Azure-SSIS IR to natively execute SSIS packages.

Azure-SSIS IR network environment

An Azure-SSIS IR can be provisioned in either a public network or a private network. On-premises data access is supported by joining the Azure-SSIS IR to a virtual network that is connected to your on-premises network.

Azure-SSIS IR compute resource and scaling

An Azure-SSIS IR is a fully managed cluster of Azure VMs dedicated to running your SSIS packages. You can bring your own Azure SQL Database or Managed Instance (Preview) server to host the catalog of SSIS projects/packages (SSISDB) that will be attached to it. You can scale up the compute power by specifying the node size, and scale it out by specifying the number of nodes in the cluster. You can manage the cost of running your Azure-SSIS integration runtime by stopping and starting it as you see fit.
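
The difference between scaling up and scaling out can be sketched against an IR definition held as a Python dictionary. The layout below (a `Managed` type with `computeProperties` and `ssisProperties`), the property names `nodeSize`, `numberOfNodes`, and `catalogServerEndpoint`, and all values are assumptions made for illustration; the helper `resized` is hypothetical and only edits the dictionary, it does not call any Azure API.

```python
import copy
from typing import Optional

# Hypothetical Azure-SSIS IR definition; property names are assumed.
ssis_ir = {
    "name": "MyAzureSsisIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "East US",
                "nodeSize": "Standard_D4_v3",
                "numberOfNodes": 2,
            },
            "ssisProperties": {
                "catalogInfo": {
                    "catalogServerEndpoint": "<server>.database.windows.net",
                },
            },
        },
    },
}


def resized(ir: dict, node_size: Optional[str] = None,
            number_of_nodes: Optional[int] = None) -> dict:
    """Return a copy of the definition with a new node size and/or node count."""
    result = copy.deepcopy(ir)
    compute = result["properties"]["typeProperties"]["computeProperties"]
    if node_size is not None:
        compute["nodeSize"] = node_size  # scale up: bigger nodes
    if number_of_nodes is not None:
        if number_of_nodes < 1:
            raise ValueError("number_of_nodes must be at least 1")
        compute["numberOfNodes"] = number_of_nodes  # scale out: more nodes
    return result


scaled_up = resized(ssis_ir, node_size="Standard_D8_v3")
scaled_out = resized(ssis_ir, number_of_nodes=4)
print(scaled_up["properties"]["typeProperties"]["computeProperties"])
print(scaled_out["properties"]["typeProperties"]["computeProperties"])
```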

For more information, see the How to create and configure Azure-SSIS IR article under the how-to guides. Once the IR is created, you can deploy and manage your existing SSIS packages with little to no change using familiar tools such as SQL Server Data Tools (SSDT) and SQL Server Management Studio (SSMS), just like using SSIS on-premises.

For more information about the Azure-SSIS runtime, see the following articles:

  • Tutorial: Deploy SSIS packages to Azure. This article provides step-by-step instructions to create an Azure-SSIS IR and use an Azure SQL database to host the SSIS catalog.
  • How to: Create an Azure-SSIS integration runtime. This article expands on the tutorial and provides instructions on using Azure SQL Managed Instance (Preview) and joining the IR to a virtual network.
  • Monitor an Azure-SSIS IR. This article shows how to retrieve information about an Azure-SSIS IR and describes the statuses in the returned information.
  • Manage an Azure-SSIS IR. This article shows how to stop, start, or remove an Azure-SSIS IR. It also shows how to scale out your Azure-SSIS IR by adding more nodes to it.
  • Join an Azure-SSIS IR to a virtual network. This article provides conceptual information about joining an Azure-SSIS IR to an Azure virtual network. It also provides steps to configure the virtual network in the Azure portal so that the Azure-SSIS IR can join it.

Integration runtime location

The Data Factory location is where the metadata of the data factory is stored and where the triggering of the pipeline is initiated. Meanwhile, a data factory can access data stores and compute services in other Azure regions to move data between data stores or to process data using compute services. This behavior is realized through the globally available IR to ensure data compliance, efficiency, and reduced network egress costs.

The IR location defines the location of its back-end compute, and essentially the location where data movement, activity dispatch, and SSIS package execution are performed. The IR location can be different from the location of the data factory it belongs to.

Azure IR location

You can set a specific location for an Azure IR, in which case the data movement or activity dispatch happens in that region.

If you choose to use the auto-resolve Azure IR, which is the default, the following applies:

  • For the copy activity, ADF makes a best effort to automatically detect your sink and source data stores and choose the best location: the same region if available, otherwise the closest one in the same geography. If the location is not detectable, ADF uses the data factory region instead.
  • For Lookup/GetMetadata activity execution and transformation activity dispatching, ADF uses the IR in the data factory region.
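
The auto-resolve rule for the copy activity can be sketched as a fallback chain. This is not ADF's actual algorithm: the region-to-geography map and the set of regions where an IR is available are made-up sample data, "closest" is approximated by picking the alphabetically first available region in the same geography, and falling back to the data factory region when the geography has no available region is an added assumption.

```python
from typing import Optional

# Hypothetical sample data for the sketch.
GEOGRAPHY = {
    "UK South": "United Kingdom",
    "UK West": "United Kingdom",
    "East US": "United States",
    "West US": "United States",
}
AVAILABLE_IR_REGIONS = {"UK South", "East US"}


def resolve_copy_location(detected_region: Optional[str], factory_region: str) -> str:
    """Pick the region where an auto-resolve Azure IR would run a copy."""
    if detected_region is None:
        # Data store location not detectable: use the data factory region.
        return factory_region
    if detected_region in AVAILABLE_IR_REGIONS:
        # Same region as the data store is available.
        return detected_region
    geography = GEOGRAPHY.get(detected_region)
    same_geography = sorted(
        region for region in AVAILABLE_IR_REGIONS
        if geography is not None and GEOGRAPHY.get(region) == geography
    )
    if same_geography:
        # Simplification: stands in for "the closest one in the same geography".
        return same_geography[0]
    # Assumption: no candidate in the geography, so use the data factory region.
    return factory_region


print(resolve_copy_location("UK West", factory_region="East US"))
print(resolve_copy_location(None, factory_region="East US"))
```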

You can see which IR location took effect during activity execution in the pipeline activity monitoring view in the UI, or in the activity monitoring payload.

Tip

If you have strict data compliance requirements and need to ensure that data does not leave a certain geography, you can explicitly create an Azure IR in a certain region and point the linked service to this IR by using the ConnectVia property. For example, if you want to copy data from Blob storage in UK South to SQL DW in UK South and want to ensure that data does not leave the UK, create an Azure IR in UK South and link both linked services to this IR.
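
The UK South example can be sketched as JSON-style definitions built in Python. The shapes are assumptions for illustration: an Azure IR expressed as a `Managed` runtime with a `computeProperties.location`, and a `connectVia` reference of type `IntegrationRuntimeReference` on each linked service. The resource names are hypothetical and the linked service `typeProperties` are left empty on purpose.

```python
import copy
import json

# Hypothetical regional Azure IR definition (assumed layout).
azure_ir_uk_south = {
    "name": "AzureIR-UKSouth",
    "properties": {
        "type": "Managed",
        "typeProperties": {"computeProperties": {"location": "UK South"}},
    },
}


def connect_via(linked_service: dict, ir_name: str) -> dict:
    """Return a copy of the linked service that references the named IR."""
    pinned = copy.deepcopy(linked_service)
    pinned["properties"]["connectVia"] = {
        "referenceName": ir_name,
        "type": "IntegrationRuntimeReference",
    }
    return pinned


# Connection details are omitted; only the IR reference matters here.
blob_ls = {"name": "BlobUKSouth",
           "properties": {"type": "AzureBlobStorage", "typeProperties": {}}}
sqldw_ls = {"name": "SqlDwUKSouth",
            "properties": {"type": "AzureSqlDW", "typeProperties": {}}}

pinned = [connect_via(ls, azure_ir_uk_south["name"]) for ls in (blob_ls, sqldw_ls)]
print(json.dumps(pinned, indent=2))
```

Because both linked services reference the same regional IR, the copy between them has no reason to be resolved to another region.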

Self-hosted IR location

The self-hosted IR is logically registered to the data factory, and you provide the compute that supports its functionality. Therefore, there is no explicit location property for a self-hosted IR.

When used to perform data movement, the self-hosted IR extracts data from the source and writes it to the destination.

Azure-SSIS IR location

Selecting the right location for your Azure-SSIS IR is essential to achieving high performance in your extract-transform-load (ETL) workflows.

  • The location of your Azure-SSIS IR does not need to be the same as the location of your data factory, but it should be the same as the location of your own Azure SQL Database/Managed Instance (Preview) server where SSISDB is to be hosted. This way, your Azure-SSIS integration runtime can easily access SSISDB without incurring excessive traffic between different locations.
  • If you do not have an existing Azure SQL Database/Managed Instance (Preview) server to host SSISDB, but you have on-premises data sources/destinations, create a new Azure SQL Database/Managed Instance (Preview) server in the same location as a virtual network connected to your on-premises network. This way, you can create your Azure-SSIS IR using the new Azure SQL Database/Managed Instance (Preview) server and join that virtual network, all in the same location, effectively minimizing data movement across different locations.
  • If the location of your existing Azure SQL Database/Managed Instance (Preview) server where SSISDB is hosted is not the same as the location of a virtual network connected to your on-premises network, first create your Azure-SSIS IR using the existing Azure SQL Database/Managed Instance (Preview) server and join another virtual network in the same location, and then configure a virtual network-to-virtual network connection between the different locations.
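
The three cases above can be condensed into a small decision helper. This is a sketch of the guidance only: the function name, the result keys, and the choice to raise an error for the case the guidance does not cover (no SSISDB server and no on-premises virtual network) are all hypothetical.

```python
from typing import Optional


def plan_ssis_ir_location(ssisdb_location: Optional[str],
                          onprem_vnet_location: Optional[str]) -> dict:
    """Suggest where to place an Azure-SSIS IR and what networking it needs.

    ssisdb_location: region of an existing server hosting SSISDB, or None.
    onprem_vnet_location: region of a virtual network connected to the
        on-premises network, or None if there is no such network.
    """
    if ssisdb_location is None:
        if onprem_vnet_location is None:
            raise ValueError("case not covered by the guidance above")
        # No server yet: create it next to the on-premises-connected network.
        return {
            "ir_location": onprem_vnet_location,
            "create_ssisdb_server_in": onprem_vnet_location,
            "join_vnet_in": onprem_vnet_location,
            "vnet_to_vnet_connection": False,
        }
    has_vnet = onprem_vnet_location is not None
    return {
        # Keep the IR with SSISDB to avoid traffic between locations.
        "ir_location": ssisdb_location,
        "create_ssisdb_server_in": None,
        "join_vnet_in": ssisdb_location if has_vnet else None,
        # Different regions: bridge the two virtual networks.
        "vnet_to_vnet_connection": has_vnet and onprem_vnet_location != ssisdb_location,
    }


print(plan_ssis_ir_location("West Europe", "North Europe"))
print(plan_ssis_ir_location(None, "North Europe"))
```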

The following diagram shows the location settings of Data Factory and its integration runtimes:

[Diagram: Integration runtime location]

Determining which IR to use

Copy activity

The Copy activity requires source and sink linked services to define the direction of data flow. The following logic determines which integration runtime instance performs the copy:

  • Copying between two cloud data sources: When both the source and sink linked services use an Azure IR, ADF uses the regional Azure IR if you specified one, or automatically determines the location of the Azure IR if you chose the auto-resolve IR (the default), as described in the Integration runtime location section.
  • Copying between a cloud data source and a data source in a private network: If either the source or the sink linked service points to a self-hosted IR, the copy activity is executed on that self-hosted integration runtime.
  • Copying between two data sources in a private network: Both the source and sink linked services must point to the same integration runtime instance, and that integration runtime is used to execute the copy activity.
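
The selection logic above can be sketched as a function over the IRs referenced by the source and sink linked services. The `IR` class and the example names are hypothetical. The text does not say what happens when the two linked services reference different regional Azure IRs, so this sketch arbitrarily prefers the sink's; treat that as an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IR:
    name: str
    kind: str                     # "Azure" or "SelfHosted"
    region: Optional[str] = None  # None on an Azure IR means auto-resolve


def copy_runtime(source_ir: IR, sink_ir: IR) -> IR:
    """Return the IR that executes a copy between the two linked services."""
    self_hosted = [ir for ir in (source_ir, sink_ir) if ir.kind == "SelfHosted"]
    if len(self_hosted) == 2:
        # Private to private: both sides must reference the same instance.
        if source_ir != sink_ir:
            raise ValueError("source and sink must point to the same self-hosted IR")
        return source_ir
    if len(self_hosted) == 1:
        # Cloud to private (or the reverse): the self-hosted IR does the copy.
        return self_hosted[0]
    # Cloud to cloud: use a regional Azure IR if one was specified
    # (sink first, by assumption); otherwise the auto-resolve IR.
    regional = [ir for ir in (sink_ir, source_ir) if ir.region is not None]
    return regional[0] if regional else sink_ir


# Hypothetical runtimes.
auto = IR("AutoResolveIR", "Azure")
uk = IR("AzureIR-UKSouth", "Azure", region="UK South")
shir = IR("MySelfHostedIR", "SelfHosted")

print(copy_runtime(auto, shir).name)
print(copy_runtime(uk, auto).name)
```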

Lookup and GetMetadata activity

The Lookup and GetMetadata activities are executed on the integration runtime associated with the data store linked service.

Transformation activity

Each transformation activity has a target compute linked service, which points to an integration runtime. This integration runtime instance is where the transformation activity is dispatched from.

Next steps

See the following articles: