Integration runtime in Azure Data Factory

The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide the following data integration capabilities across different network environments:

  • Data movement: Move data between data stores in a public network and data stores in a private network (on-premises or virtual private network). It provides support for built-in connectors, format conversion, column mapping, and performant, scalable data transfer.
  • Activity dispatch: Dispatch and monitor transformation activities running on a variety of compute services such as Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more.
  • SSIS package execution: Natively execute SQL Server Integration Services (SSIS) packages in a managed Azure compute environment.

Note

This article applies to version 2 of Data Factory, which is currently in preview. If you are using version 1 of the Data Factory service, which is generally available (GA), see the Data Factory version 1 documentation.

In Data Factory, an activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and linked services. It is referenced by the linked service and provides the compute environment where the activity either runs or gets dispatched from. This way, the activity can be performed in the region closest to the target data store or compute service, in the most performant way, while meeting security and compliance needs.

Integration runtime types

Data Factory offers three types of integration runtime. Choose the type that best serves the data integration capabilities and network environment you need. The three types are:

  • Azure
  • Self-hosted
  • Azure-SSIS

The following table describes the capabilities and network support for each of the integration runtime types:

IR type | Public network | Private network
Azure | Data movement, Activity dispatch | -
Self-hosted | Data movement, Activity dispatch | Data movement, Activity dispatch
Azure-SSIS | SSIS package execution | SSIS package execution

The following diagram shows how the different integration runtimes can be used in combination to offer rich data integration capabilities and network support:

[Figure: Different types of integration runtimes]

Azure integration runtime

An Azure integration runtime is capable of:

  • Running the copy activity between cloud data stores.
  • Dispatching the following transformation activities in a public network: HDInsight Hive activity, HDInsight Pig activity, HDInsight MapReduce activity, HDInsight Spark activity, HDInsight Streaming activity, Machine Learning Batch Execution activity, Machine Learning Update Resource activity, Stored Procedure activity, Data Lake Analytics U-SQL activity, .Net custom activity, Web activity, Lookup activity, and Get Metadata activity.

Network environment

The Azure integration runtime supports connecting to data stores and compute services in a public network with publicly accessible endpoints. Use a self-hosted integration runtime for an Azure Virtual Network environment.

Compute resource and scaling

The Azure integration runtime provides fully managed, serverless compute in Azure. You don’t have to worry about infrastructure provisioning, software installation, patching, or capacity scaling. In addition, you pay only for the duration of actual utilization.

The Azure integration runtime provides the native compute to move data between cloud data stores in a secure, reliable, and high-performance manner. You can set how many data movement units to use on the copy activity, and the compute size of the Azure IR is elastically scaled up accordingly, without you having to explicitly adjust the size of the Azure integration runtime.

Activity dispatch is a lightweight operation that routes the activity to the target compute service, so there is no need to scale up the compute size for this scenario.

For information about creating and configuring an Azure IR, see How to create and configure Azure IR under the how-to guides.

Self-hosted integration runtime

A self-hosted IR is capable of:

  • Running the copy activity between a cloud data store and a data store in a private network.
  • Dispatching the following transformation activities against compute resources on-premises or in an Azure Virtual Network: HDInsight Hive activity (BYOC), HDInsight Pig activity (BYOC), HDInsight MapReduce activity (BYOC), HDInsight Spark activity (BYOC), HDInsight Streaming activity (BYOC), Machine Learning Batch Execution activity, Machine Learning Update Resource activity, Stored Procedure activity, Data Lake Analytics U-SQL activity, .Net custom activity, Lookup activity, and Get Metadata activity.

Note

Use a self-hosted integration runtime to support data stores that require a bring-your-own driver, such as SAP HANA and MySQL. For more information, see supported data stores.

Network environment

If you want to perform data integration securely in a private network environment that does not have a direct line of sight from the public cloud environment, you can install a self-hosted IR in an on-premises environment behind your corporate firewall, or inside a virtual private network. The self-hosted integration runtime makes only outbound HTTP-based connections to the open internet.

Compute resource and scaling

A self-hosted IR needs to be installed on an on-premises machine or a virtual machine inside a private network. Currently, the self-hosted IR is supported only on the Windows operating system.

For high availability and scalability, you can scale out the self-hosted IR by associating the logical instance with multiple on-premises machines in active-active mode. For more information, see the How to create and configure self-hosted IR article under the how-to guides.

Azure-SSIS integration runtime

To lift and shift existing SSIS workloads, you can create an Azure-SSIS IR to natively execute SSIS packages.

Network environment

An Azure-SSIS IR can be provisioned in either a public network or a private network. On-premises data access is supported by joining the Azure-SSIS IR to a virtual network that is connected to your on-premises network.

Compute resource and scaling

The Azure-SSIS IR is a fully managed cluster of Azure VMs dedicated to running your SSIS packages. You can bring your own Azure SQL Database or Managed Instance (Preview) server to host the catalog of SSIS projects/packages (SSISDB) that will be attached to it. You can scale up the compute power by specifying the node size, and scale it out by specifying the number of nodes in the cluster. You can manage the cost of running your Azure-SSIS integration runtime by stopping and starting it as you see fit.

For more information, see the How to create and configure Azure-SSIS IR article under the how-to guides. Once the IR is created, you can deploy and manage your existing SSIS packages with little to no change using familiar tools such as SQL Server Data Tools (SSDT) and SQL Server Management Studio (SSMS), just as you would when using SSIS on-premises.

For more information about the Azure-SSIS runtime, see the following articles:

  • Tutorial: Deploy SSIS packages to Azure. This article provides step-by-step instructions to create an Azure-SSIS IR that uses an Azure SQL database to host the SSIS catalog.
  • How to: Create an Azure-SSIS integration runtime. This article expands on the tutorial and provides instructions on using Azure SQL Managed Instance (Preview) and joining the IR to a virtual network.
  • Monitor an Azure-SSIS IR. This article shows you how to retrieve information about an Azure-SSIS IR and describes the statuses in the returned information.
  • Manage an Azure-SSIS IR. This article shows you how to stop, start, or remove an Azure-SSIS IR. It also shows you how to scale out your Azure-SSIS IR by adding more nodes to the IR.
  • Join an Azure-SSIS IR to a virtual network. This article provides conceptual information about joining an Azure-SSIS IR to an Azure virtual network. It also provides steps for using the Azure portal to configure the virtual network so that the Azure-SSIS IR can join it.

Determining which IR to use

Each transformation activity has a target compute linked service, which points to an integration runtime. This integration runtime instance is where the transformation activity is dispatched from.

The copy activity requires source and sink linked services to define the direction of data flow. The following logic determines which integration runtime instance performs the copy:

  • Copying between two cloud data sources: when both the source and sink linked services use an Azure IR, the integration runtime used by the sink linked service executes the copy activity.
  • Copying between a cloud data source and a data source in a private network: if either the source or the sink linked service points to a self-hosted IR, the copy activity is executed on that self-hosted integration runtime.
  • Copying between two data sources in a private network: both the source and sink linked services must point to the same integration runtime instance, and that integration runtime executes the copy activity.

The following diagram shows two copy activity samples:

  • For copy activity 1, the source is a SQL Server linked service referencing self-hosted IR A, and the sink is an Azure Storage linked service referencing Azure IR B. When the copy activity runs, it is executed on self-hosted IR A.
  • For copy activity 2, the source is an Azure SQL Database linked service referencing Azure IR C, and the sink is an Azure Storage linked service referencing Azure IR B. When the copy activity runs, it is executed on Azure IR B, because that is the integration runtime used by the sink linked service.
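The selection rules and the two samples above can be sketched in a few lines of Python. This is only an illustration of the described logic, not Data Factory code: the `IntegrationRuntime` class, the `kind` strings, and the `resolve_copy_ir` function are hypothetical names, and the sketch assumes that a data store in a private network is always reached through a self-hosted IR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationRuntime:
    """Hypothetical stand-in for an IR referenced by a linked service."""
    name: str
    kind: str  # assumed values: "Azure" or "SelfHosted"


def resolve_copy_ir(source_ir, sink_ir):
    """Return the IR that would execute a copy, following the three rules above."""
    source_self = source_ir.kind == "SelfHosted"
    sink_self = sink_ir.kind == "SelfHosted"

    if source_self and sink_self:
        # Private network to private network: both sides must reference the same instance.
        if source_ir != sink_ir:
            raise ValueError("source and sink must point to the same self-hosted IR")
        return source_ir
    if source_self:
        # Cloud <-> private network: the self-hosted IR runs the copy.
        return source_ir
    if sink_self:
        return sink_ir
    # Both sides use an Azure IR: the sink's IR runs the copy.
    return sink_ir


ir_a = IntegrationRuntime("A", "SelfHosted")
ir_b = IntegrationRuntime("B", "Azure")
ir_c = IntegrationRuntime("C", "Azure")

print(resolve_copy_ir(ir_a, ir_b).name)  # copy activity 1 -> A
print(resolve_copy_ir(ir_c, ir_b).name)  # copy activity 2 -> B
```

Raising an error for two different self-hosted IRs is a choice made for this sketch; the article only states that both linked services must point to the same instance.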

[Figure: Which IR to use]

Integration runtime location

The Data Factory location is where the metadata of the data factory is stored and where the triggering of the pipeline is initiated from. Currently, the supported Data Factory locations are East US, East US 2, Southeast Asia, and West Europe. However, a data factory can access data stores and compute services in other Azure regions to move data between data stores or process data using compute services. This behavior is realized through the IR, which is available globally in multiple regions to ensure data compliance, efficiency, and reduced network egress costs.

The IR location defines the location of its back-end compute, and essentially the location where the data movement, activity dispatching, and SSIS package execution are performed. The IR location can be different from the location of the data factory it belongs to. The following diagram shows the location settings of Data Factory and its integration runtimes:

[Figure: Integration runtime location]

Azure IR

To move data, Data Factory uses an Azure IR in the region that is closest to the sink within the same geography. Refer to the following table for the mapping:

Geography of the sink data store | Location of the sink data store | Location used for Azure integration runtime
United States | East US | East US
United States | East US 2 | East US 2
United States | Central US | Central US
United States | North Central US | North Central US
United States | South Central US | South Central US
United States | West Central US | West Central US
United States | West US | West US
United States | West US 2 | West US 2
Canada | Canada East | Canada Central
Canada | Canada Central | Canada Central
Brazil | Brazil South | Brazil South
Europe | North Europe | North Europe
Europe | West Europe | West Europe
United Kingdom | UK West | UK South
United Kingdom | UK South | UK South
Asia Pacific | Southeast Asia | Southeast Asia
Asia Pacific | East Asia | Southeast Asia
Australia | Australia East | Australia East
Australia | Australia Southeast | Australia Southeast
Japan | Japan East | Japan East
Japan | Japan West | Japan East
Korea | Korea Central | Korea Central
Korea | Korea South | Korea Central
India | Central India | Central India
India | West India | Central India
India | South India | Central India

You can also set the location of an Azure IR to auto-resolve, which means Data Factory makes a best effort to automatically detect the best location to use based on the linked service definition.

Note

If the region of the destination data store is not in the list or cannot be detected, the activity fails instead of going through an alternative region, for compliance reasons. In this case, explicitly specify the alternative location to use to perform the copy.

The following picture shows an example of the effective location when the location of the Azure IR is set to auto-resolve. When a copy activity is executed, it detects the location of the data destination, which is Japan West in this example. Based on the table, an Azure IR in Japan East is used to perform the actual data copy. When the same IR is used to connect to HDInsight for a Spark activity, the Spark application is submitted from the Data Factory location (East US in this example), and the Spark application actually executes in the location of the HDInsight server.
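The mapping table and the fail-instead-of-fallback behavior can be modeled as a plain lookup. The sketch below is illustrative only: `AZURE_IR_LOCATION` and `resolve_azure_ir_location` are hypothetical names, the data is copied from the table in this article, and the real service performs this resolution internally.

```python
# Sink locations that are served by an Azure IR in the same region (per the table).
_SAME_REGION = [
    "East US", "East US 2", "Central US", "North Central US",
    "South Central US", "West Central US", "West US", "West US 2",
    "Canada Central", "Brazil South", "North Europe", "West Europe",
    "UK South", "Southeast Asia", "Australia East", "Australia Southeast",
    "Japan East", "Korea Central", "Central India",
]

# Sink locations that are served from another region in the same geography.
_REMAPPED = {
    "Canada East": "Canada Central",
    "UK West": "UK South",
    "East Asia": "Southeast Asia",
    "Japan West": "Japan East",
    "Korea South": "Korea Central",
    "West India": "Central India",
    "South India": "Central India",
}

AZURE_IR_LOCATION = {**{region: region for region in _SAME_REGION}, **_REMAPPED}


def resolve_azure_ir_location(sink_location):
    """Return the Azure IR location for a sink, or fail if the sink is not listed."""
    try:
        return AZURE_IR_LOCATION[sink_location]
    except KeyError:
        # Mirrors the note above: no silent fallback to another region.
        raise LookupError(
            f"no Azure IR mapping for {sink_location!r}; specify a location explicitly"
        ) from None


print(resolve_azure_ir_location("Japan West"))  # Japan East
```

Keeping the remapped regions in a separate dictionary makes the few cross-region cases easy to see at a glance.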

[Figure: Effective location]

Self-hosted IR

The self-hosted IR is logically registered to the data factory, and you provide the compute that supports its functionality. Therefore, there is no explicit location property for a self-hosted IR.

When used to perform data movement, the self-hosted IR extracts data from the source and writes it to the destination.

Azure-SSIS IR

Selecting the right location for your Azure-SSIS IR is essential to achieving high performance in your extract-transform-load (ETL) workflows. Six locations are initially available for preview (East US, East US 2, Central US, Australia East, North Europe, and West Europe).

  • The location of your Azure-SSIS IR does not need to be the same as the location of your data factory, but it should be the same as the location of your own Azure SQL Database/Managed Instance (Preview) server where SSISDB is to be hosted. This way, your Azure-SSIS integration runtime can easily access SSISDB without incurring excessive traffic between different locations.
  • If you do not have an existing Azure SQL Database/Managed Instance (Preview) server to host SSISDB, but you have on-premises data sources/destinations, create a new Azure SQL Database/Managed Instance (Preview) server in the same location as a virtual network connected to your on-premises network. This way, you can create your Azure-SSIS IR using the new Azure SQL Database/Managed Instance (Preview) server and join it to that virtual network, all in the same location, effectively minimizing data movement across different locations.
  • If the location of your existing Azure SQL Database/Managed Instance (Preview) server where SSISDB is hosted is not the same as the location of a virtual network connected to your on-premises network, first create your Azure-SSIS IR using the existing Azure SQL Database/Managed Instance (Preview) server and join it to another virtual network in the same location, and then configure a virtual network to virtual network connection between the different locations.
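As a rough illustration, the three recommendations above can be condensed into a small decision helper. Everything here is hypothetical (`SsisIrPlan`, `plan_ssis_ir_location`, and the use of `None` for "not present"); it only restates the guidance and does not call any Azure API.

```python
from typing import NamedTuple, Optional


class SsisIrPlan(NamedTuple):
    """Hypothetical result: where to place the IR and whether VNets must be linked."""
    ir_location: Optional[str]   # None means the guidance imposes no constraint
    needs_vnet_to_vnet: bool


def plan_ssis_ir_location(ssisdb_server_location, onprem_vnet_location):
    """Suggest an Azure-SSIS IR location from the existing resources.

    ssisdb_server_location: location of an existing server hosting SSISDB, or None.
    onprem_vnet_location: location of a VNet connected to on-premises, or None.
    """
    if ssisdb_server_location is not None:
        # Keep the IR next to SSISDB; link the VNets if they sit in different locations.
        needs_link = (
            onprem_vnet_location is not None
            and onprem_vnet_location != ssisdb_server_location
        )
        return SsisIrPlan(ssisdb_server_location, needs_link)
    # No existing server: create the new server and the IR where the VNet is.
    return SsisIrPlan(onprem_vnet_location, False)


print(plan_ssis_ir_location("East US", "West Europe"))
print(plan_ssis_ir_location(None, "North Europe"))
```

In the first call the IR stays in East US with a virtual network to virtual network connection to West Europe; in the second, the new server and the IR are both placed in North Europe.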

Next steps

See the following articles: