Integration runtime in Azure Data Factory

The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide the following data integration capabilities across different network environments:

  • Data Flow: Execute a data flow in a managed Azure compute environment.
  • Data movement: Copy data between data stores in a public network and data stores in a private network (on-premises or virtual private network). It supports built-in connectors, format conversion, column mapping, and performant, scalable data transfer.
  • Activity dispatch: Dispatch and monitor transformation activities running on a variety of compute services, such as Azure Databricks, Azure HDInsight, Azure Machine Learning, Azure SQL Database, and SQL Server.
  • SSIS package execution: Natively execute SQL Server Integration Services (SSIS) packages in a managed Azure compute environment.

In Data Factory, an activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and linked services: it is referenced by the linked service or activity, and it provides the compute environment where the activity either runs or is dispatched from. This way, the activity can be performed in the region closest to the target data store or compute service, in the most performant way, while meeting security and compliance needs.

Integration runtime types

Data Factory offers three types of integration runtime. Choose the type that best serves the data integration capabilities and network environment you need. The three types are:

  • Azure
  • Self-hosted
  • Azure-SSIS

The following table describes the capabilities and network support for each integration runtime type:

IR type     | Public network                              | Private network
------------|---------------------------------------------|---------------------------------
Azure       | Data Flow, Data movement, Activity dispatch | -
Self-hosted | Data movement, Activity dispatch            | Data movement, Activity dispatch
Azure-SSIS  | SSIS package execution                      | SSIS package execution
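The table can also be read as a lookup: given a capability and a network, which IR types qualify? The Python sketch below encodes the table as a dictionary. `CAPABILITIES` and `candidate_ir_types` are hypothetical helpers for illustration, not part of any Azure SDK.

```python
# The capability table above, encoded as data (illustrative only).
CAPABILITIES = {
    "Azure": {
        "public": {"Data Flow", "Data movement", "Activity dispatch"},
        "private": set(),
    },
    "Self-hosted": {
        "public": {"Data movement", "Activity dispatch"},
        "private": {"Data movement", "Activity dispatch"},
    },
    "Azure-SSIS": {
        "public": {"SSIS package execution"},
        "private": {"SSIS package execution"},
    },
}


def candidate_ir_types(capability: str, network: str) -> list:
    """Return the IR types whose table row lists `capability` for `network`."""
    if network not in ("public", "private"):
        raise ValueError("network must be 'public' or 'private'")
    return sorted(
        ir_type
        for ir_type, networks in CAPABILITIES.items()
        if capability in networks[network]
    )


if __name__ == "__main__":
    print(candidate_ir_types("Data movement", "private"))  # ['Self-hosted']
    print(candidate_ir_types("Data movement", "public"))   # ['Azure', 'Self-hosted']
```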

The following diagram shows how the different integration runtimes can be combined to offer rich data integration capabilities and network support:

[Figure: Different types of integration runtimes]

Azure integration runtime

An Azure integration runtime is capable of:

  • Running Data Flows in Azure
  • Running copy activity between cloud data stores
  • Dispatching the following transform activities in a public network: Databricks Notebook/Jar/Python activity, HDInsight Hive activity, HDInsight Pig activity, HDInsight MapReduce activity, HDInsight Spark activity, HDInsight Streaming activity, Machine Learning Batch Execution activity, Machine Learning Update Resource activity, Stored Procedure activity, Data Lake Analytics U-SQL activity, .NET custom activity, Web activity, Lookup activity, and Get Metadata activity.

Azure IR network environment

Azure integration runtime supports connecting to data stores and compute services with publicly accessible endpoints. For an Azure Virtual Network environment, use a self-hosted integration runtime.

Azure IR compute resource and scaling

Azure integration runtime provides fully managed, serverless compute in Azure. You don't have to worry about infrastructure provisioning, software installation, patching, or capacity scaling. In addition, you pay only for the duration of actual utilization.

Azure integration runtime provides the native compute to move data between cloud data stores in a secure, reliable, and high-performance manner. You can set how many data integration units (DIUs) the copy activity uses, and the compute size of the Azure IR scales up elastically to match, without you having to explicitly adjust the size of the Azure integration runtime.
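To show where the DIU setting lives, the sketch below builds a copy activity definition as a Python dictionary that mirrors the pipeline JSON. It assumes the `dataIntegrationUnits` property under `typeProperties`. The helper name and dataset names are hypothetical, and the source/sink types are examples only; check the copy activity performance documentation for the allowed DIU values.

```python
import json
from typing import Optional


def build_copy_activity(name: str, input_dataset: str, output_dataset: str,
                        source_type: str, sink_type: str,
                        diu: Optional[int] = None) -> dict:
    """Return a copy activity definition (hypothetical helper)."""
    type_properties = {
        "source": {"type": source_type},
        "sink": {"type": sink_type},
    }
    if diu is not None:
        if isinstance(diu, bool) or not isinstance(diu, int) or diu < 1:
            raise ValueError("diu must be a positive integer")
        # An explicit DIU count; when the key is omitted, the service picks one.
        type_properties["dataIntegrationUnits"] = diu
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": input_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": output_dataset, "type": "DatasetReference"}],
        "typeProperties": type_properties,
    }


if __name__ == "__main__":
    activity = build_copy_activity("CopyBlobToSql", "BlobInput", "SqlOutput",
                                   "BlobSource", "SqlSink", diu=32)
    print(json.dumps(activity, indent=2))
```

Note that nothing in the activity sizes the Azure IR itself; the DIU count is the only knob, which matches the elastic scaling described above.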

Activity dispatch is a lightweight operation that routes the activity to the target compute service, so there is no need to scale up the compute size for this scenario.

For information about creating and configuring an Azure IR, see How to create and configure Azure IR in the how-to guides.

Note

Azure integration runtime has properties related to the Data Flow runtime, which define the underlying compute infrastructure used to run data flows.
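For illustration, the sketch below builds an Azure IR definition as a Python dictionary that mirrors the IR JSON, with the Data Flow runtime settings nested under `computeProperties`. The property names (`type: Managed`, `location`, `dataFlowProperties`, `computeType`, `coreCount`, `timeToLive`) reflect our understanding of the Azure IR JSON and should be verified against the Data Factory REST reference; the IR name and default values are hypothetical.

```python
import json


def azure_ir_definition(name: str, location: str = "AutoResolve",
                        compute_type: str = "General", core_count: int = 8,
                        time_to_live_minutes: int = 10) -> dict:
    """Return an Azure IR definition with Data Flow runtime settings."""
    return {
        "name": name,
        "properties": {
            "type": "Managed",  # Azure IRs are "Managed"; self-hosted IRs are "SelfHosted"
            "typeProperties": {
                "computeProperties": {
                    # "AutoResolve" lets ADF pick the region; a region name pins it.
                    "location": location,
                    "dataFlowProperties": {
                        "computeType": compute_type,  # e.g. "General", "MemoryOptimized"
                        "coreCount": core_count,
                        "timeToLive": time_to_live_minutes,
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(azure_ir_definition("DataFlowIR-UKSouth", location="UK South"), indent=2))
```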

Self-hosted integration runtime

A self-hosted IR is capable of:

  • Running copy activity between a cloud data store and a data store in a private network.
  • Dispatching the following transform activities against compute resources on-premises or in an Azure Virtual Network: HDInsight Hive activity (BYOC, Bring Your Own Cluster), HDInsight Pig activity (BYOC), HDInsight MapReduce activity (BYOC), HDInsight Spark activity (BYOC), HDInsight Streaming activity (BYOC), Machine Learning Batch Execution activity, Machine Learning Update Resource activity, Stored Procedure activity, Data Lake Analytics U-SQL activity, Custom activity (runs on Azure Batch), Lookup activity, and Get Metadata activity.

Note

Use a self-hosted integration runtime to support data stores that require a bring-your-own driver, such as SAP HANA and MySQL. For more information, see supported data stores.

Self-hosted IR network environment

If you want to perform data integration securely in a private network environment that has no direct line of sight from the public cloud, you can install a self-hosted IR in your on-premises environment behind your corporate firewall, or inside a virtual private network. The self-hosted integration runtime makes only outbound HTTP-based connections to the open internet.

Self-hosted IR compute resource and scaling

The self-hosted IR must be installed on an on-premises machine or a virtual machine inside a private network. Currently, we support running the self-hosted IR only on a Windows operating system.

For high availability and scalability, you can scale out the self-hosted IR by associating the logical instance with multiple on-premises machines in active-active mode. For details, see the How to create and configure self-hosted IR article in the how-to guides.

Azure-SSIS Integration Runtime

To lift and shift existing SSIS workloads, you can create an Azure-SSIS IR to natively execute SSIS packages.

Azure-SSIS IR network environment

Azure-SSIS IR can be provisioned in either a public network or a private network. On-premises data access is supported by joining the Azure-SSIS IR to a virtual network that is connected to your on-premises network.

Azure-SSIS IR compute resource and scaling

Azure-SSIS IR is a fully managed cluster of Azure VMs dedicated to running your SSIS packages. You can bring your own Azure SQL Database or Managed Instance server to host the catalog of SSIS projects/packages (SSISDB) that will be attached to it. You can scale up the compute power by specifying the node size, and scale it out by specifying the number of nodes in the cluster. You can manage the cost of running your Azure-SSIS integration runtime by stopping and starting it as you see fit.
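The two scaling dimensions can be sketched as follows: node size scales each node up, and node count scales the cluster out. The sketch assumes the Azure-SSIS IR compute properties `nodeSize`, `numberOfNodes`, and `maxParallelExecutionsPerNode`, and assumes that the cluster's maximum number of concurrent package executions is the node count multiplied by the per-node limit. The class name is hypothetical, and the VM size and region are examples only; check the supported node sizes before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SsisIrCompute:
    location: str
    node_size: str                         # scale up: a larger VM size per node
    number_of_nodes: int                   # scale out: more nodes in the cluster
    max_parallel_executions_per_node: int

    def to_json(self) -> dict:
        """Mirror the computeProperties section of an Azure-SSIS IR definition."""
        return {
            "location": self.location,
            "nodeSize": self.node_size,
            "numberOfNodes": self.number_of_nodes,
            "maxParallelExecutionsPerNode": self.max_parallel_executions_per_node,
        }

    def max_concurrent_packages(self) -> int:
        """Upper bound on packages running at once across the whole cluster."""
        return self.number_of_nodes * self.max_parallel_executions_per_node


if __name__ == "__main__":
    small = SsisIrCompute("UK South", "Standard_D4_v3", 2, 4)
    scaled_out = SsisIrCompute("UK South", "Standard_D4_v3", 4, 4)
    print(small.max_concurrent_packages(), scaled_out.max_concurrent_packages())  # 8 16
```

Because cost accrues while the nodes are running, doubling `number_of_nodes` doubles the concurrency bound in this model but also the running cost, which is why stopping the IR when idle matters.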

For more information, see the How to create and configure Azure-SSIS IR article in the how-to guides. Once the IR is created, you can deploy and manage your existing SSIS packages with little or no change, using familiar tools such as SQL Server Data Tools (SSDT) and SQL Server Management Studio (SSMS), just as you would with SSIS on premises.

For more information about the Azure-SSIS runtime, see the following articles:

  • Tutorial: deploy SSIS packages to Azure. This article provides step-by-step instructions to create an Azure-SSIS IR and uses an Azure SQL database to host the SSIS catalog.
  • How to: Create an Azure-SSIS integration runtime. This article expands on the tutorial and provides instructions on using Azure SQL Database Managed Instance and joining the IR to a virtual network.
  • Monitor an Azure-SSIS IR. This article shows you how to retrieve information about an Azure-SSIS IR, and describes the statuses in the returned information.
  • Manage an Azure-SSIS IR. This article shows you how to stop, start, or remove an Azure-SSIS IR. It also shows you how to scale out your Azure-SSIS IR by adding more nodes to the IR.
  • Join an Azure-SSIS IR to a virtual network. This article provides conceptual information about joining an Azure-SSIS IR to an Azure virtual network. It also provides steps to use the Azure portal to configure the virtual network so that the Azure-SSIS IR can join it.

Integration runtime location

The Data Factory location is where the metadata of the data factory is stored and where pipeline triggering is initiated. Meanwhile, a data factory can access data stores and compute services in other Azure regions to move data between data stores or process data using compute services. This behavior is realized through the globally available IR to ensure data compliance, efficiency, and reduced network egress costs.

The IR location defines the location of its back-end compute, and essentially the location where data movement, activity dispatching, and SSIS package execution are performed. The IR location can differ from the location of the data factory it belongs to.

Azure IR location

You can set a specific location for an Azure IR, in which case data movement or activity dispatch happens in that region.

If you choose to use the auto-resolve Azure IR, which is the default:

  • For copy activity, ADF makes a best effort to automatically detect your sink and source data stores and choose the best location: the same region if available, otherwise the closest region in the same geography. If the location can't be detected, ADF uses the data factory region instead.

  • For Lookup/GetMetadata/Delete activity execution (also known as pipeline activities), transformation activity dispatching (also known as external activities), and authoring operations (test connection, browse folder and table lists, preview data), ADF uses the IR in the data factory region.

  • For Data Flow, ADF uses the IR in the data factory region.

    Tip

    A good practice is to ensure that data flows run in the same region as the corresponding data stores, if possible. You can achieve this either by using the auto-resolve Azure IR (if the data store location is the same as the Data Factory location), or by creating a new Azure IR instance in the same region as your data stores and executing the data flow on it.
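The auto-resolve rules above can be summarized as a small decision function. This is a simplified model for reasoning about where an activity will run, not ADF's actual implementation: the detected data store region and the closest region in the same geography are assumed to be supplied by the caller, and the function and parameter names are hypothetical.

```python
from typing import FrozenSet, Optional


def resolve_auto_ir_region(activity_type: str, factory_region: str,
                           detected_store_region: Optional[str] = None,
                           supported_regions: FrozenSet[str] = frozenset(),
                           closest_in_geography: Optional[str] = None) -> str:
    """Approximate where an auto-resolve Azure IR runs an activity."""
    if activity_type == "Copy" and detected_store_region is not None:
        if detected_store_region in supported_regions:
            return detected_store_region      # same region as the data store
        if closest_in_geography is not None:
            return closest_in_geography       # closest region in the same geography
    # Store location not detectable, or a pipeline/external activity,
    # an authoring operation, or a data flow: use the data factory region.
    return factory_region


if __name__ == "__main__":
    print(resolve_auto_ir_region("Copy", "West US", "UK South", frozenset({"UK South"})))    # UK South
    print(resolve_auto_ir_region("Lookup", "West US", "UK South", frozenset({"UK South"})))  # West US
```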

You can monitor which IR location takes effect during activity execution in the pipeline activity monitoring view in the UI, or in the activity monitoring payload.

Tip

If you have strict data compliance requirements and need to ensure that data does not leave a certain geography, you can explicitly create an Azure IR in a certain region and point the linked service to this IR using the connectVia property. For example, to copy data from Blob storage in UK South to SQL DW in UK South while ensuring that the data does not leave the UK, create an Azure IR in UK South and link both linked services to this IR.
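A minimal sketch of the UK South example: both linked services get a `connectVia` reference to the same regional Azure IR. The linked service and IR names are hypothetical, connection details are omitted, and the `AzureBlobStorage`/`AzureSqlDW` type names and the `IntegrationRuntimeReference` shape are assumed from the linked service JSON format.

```python
import copy


def pin_to_ir(linked_service: dict, ir_name: str) -> dict:
    """Return a copy of the linked service that routes through `ir_name`."""
    pinned = copy.deepcopy(linked_service)
    pinned["properties"]["connectVia"] = {
        "referenceName": ir_name,
        "type": "IntegrationRuntimeReference",
    }
    return pinned


def all_pinned_to(linked_services: list, ir_name: str) -> bool:
    """True if every linked service explicitly references `ir_name`."""
    return all(
        ls["properties"].get("connectVia", {}).get("referenceName") == ir_name
        for ls in linked_services
    )


# Connection details (typeProperties) are left empty on purpose.
blob = {"name": "UkSouthBlob", "properties": {"type": "AzureBlobStorage", "typeProperties": {}}}
dw = {"name": "UkSouthSqlDW", "properties": {"type": "AzureSqlDW", "typeProperties": {}}}

pinned = [pin_to_ir(ls, "AzureIR-UKSouth") for ls in (blob, dw)]
print(all_pinned_to(pinned, "AzureIR-UKSouth"))     # True
print(all_pinned_to([blob, dw], "AzureIR-UKSouth"))  # False
```

A check like `all_pinned_to` could serve as a simple pre-deployment guard that no linked service in a compliance-sensitive pipeline silently falls back to the auto-resolve IR.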

Self-hosted IR location

The self-hosted IR is logically registered to the data factory, and the compute used to support its functionality is provided by you. Therefore, there is no explicit location property for a self-hosted IR.

When used to perform data movement, the self-hosted IR extracts data from the source and writes it to the destination.

Azure-SSIS IR location

Selecting the right location for your Azure-SSIS IR is essential to achieving high performance in your extract-transform-load (ETL) workflows.

  • The location of your Azure-SSIS IR does not need to be the same as the location of your data factory, but it should be the same as the location of your own Azure SQL Database/Managed Instance server where SSISDB is to be hosted. This way, your Azure-SSIS integration runtime can easily access SSISDB without incurring excessive traffic between different locations.
  • If you do not have an existing Azure SQL Database/Managed Instance server to host SSISDB, but you have on-premises data sources/destinations, create a new Azure SQL Database/Managed Instance server in the same location as a virtual network connected to your on-premises network. This way, you can create your Azure-SSIS IR using the new server and join it to that virtual network, all in the same location, which minimizes data movement across locations.
  • If your existing Azure SQL Database/Managed Instance server where SSISDB is hosted is not in the same location as the virtual network connected to your on-premises network, first create your Azure-SSIS IR using the existing server and join it to another virtual network in the same location, and then configure a virtual network to virtual network (VNet-to-VNet) connection between the two locations.
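The three location rules can be encoded as a small planning helper. This is a hypothetical sketch of the guidance above, not an Azure API: regions are plain strings, and `None` means the SSISDB server or the on-premises-connected virtual network does not exist. The guidance does not cover the case where neither exists, so the sketch rejects it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SsisIrPlan:
    ir_region: str
    create_new_ssisdb_server: bool
    join_vnet_region: Optional[str]   # region of the VNet the IR joins, if any
    needs_vnet_to_vnet: bool          # link to the on-premises-connected VNet


def plan_ssis_ir_location(ssisdb_server_region: Optional[str],
                          onprem_vnet_region: Optional[str]) -> SsisIrPlan:
    if ssisdb_server_region is None and onprem_vnet_region is None:
        raise ValueError("need an SSISDB server or an on-premises-connected VNet")
    if ssisdb_server_region is None:
        # No server yet: create it next to the on-premises-connected VNet.
        return SsisIrPlan(onprem_vnet_region, True, onprem_vnet_region, False)
    if onprem_vnet_region is None:
        # Cloud-only: co-locate the IR with the existing SSISDB server.
        return SsisIrPlan(ssisdb_server_region, False, None, False)
    if ssisdb_server_region == onprem_vnet_region:
        # Server and VNet already share a region: join that VNet directly.
        return SsisIrPlan(ssisdb_server_region, False, onprem_vnet_region, False)
    # Different regions: join another VNet beside the server, then VNet-to-VNet.
    return SsisIrPlan(ssisdb_server_region, False, ssisdb_server_region, True)


if __name__ == "__main__":
    print(plan_ssis_ir_location("North Europe", "West Europe"))
```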

The following diagram shows the location settings of Data Factory and its integration runtimes:

[Figure: Integration runtime location]

Determining which IR to use

Copy activity

Copy activity requires source and sink linked services to define the direction of data flow. The following logic determines which integration runtime instance performs the copy:

  • Copying between two cloud data sources: when both source and sink linked services use an Azure IR, ADF uses the regional Azure IR if you specified one, or automatically determines the location of the Azure IR if you chose the auto-resolve IR (the default), as described in the Integration runtime location section.
  • Copying between a cloud data source and a data source in a private network: if either the source or the sink linked service points to a self-hosted IR, the copy activity is executed on that self-hosted integration runtime.
  • Copying between two data sources in a private network: both the source and sink linked services must point to the same integration runtime instance, and that integration runtime is used to execute the copy activity.
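The three copy rules can be expressed as a selection function. This is a simplified model: `IntegrationRuntime` and `choose_copy_ir` are hypothetical names, `location="AutoResolve"` stands for the auto-resolve Azure IR, and two different regional Azure IRs on source and sink is a case the rules above do not cover, so the sketch rejects it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrationRuntime:
    name: str
    kind: str                      # "Azure" or "SelfHosted"
    location: str = "AutoResolve"  # only meaningful for Azure IRs


def choose_copy_ir(source: IntegrationRuntime, sink: IntegrationRuntime) -> IntegrationRuntime:
    """Pick the IR that runs a copy, given the IRs of the source and sink linked services."""
    self_hosted = [ir for ir in (source, sink) if ir.kind == "SelfHosted"]
    if len(self_hosted) == 2:
        # Private network to private network: both must use the same IR instance.
        if source.name != sink.name:
            raise ValueError("source and sink must point to the same self-hosted IR")
        return source
    if len(self_hosted) == 1:
        # Cloud to private network (either direction): the self-hosted IR runs it.
        return self_hosted[0]
    # Cloud to cloud: both linked services use an Azure IR.
    if source == sink:
        return source
    regional = [ir for ir in (source, sink) if ir.location != "AutoResolve"]
    if len(regional) == 1:
        return regional[0]  # the regional Azure IR you specified
    if not regional:
        return source       # auto-resolve: the region is chosen at run time
    raise ValueError("two different regional Azure IRs are not covered by this sketch")


if __name__ == "__main__":
    onprem = IntegrationRuntime("OnPremIR", "SelfHosted")
    auto = IntegrationRuntime("AutoResolveIntegrationRuntime", "Azure")
    uk = IntegrationRuntime("AzureIR-UKSouth", "Azure", "UK South")
    print(choose_copy_ir(auto, onprem).name)  # OnPremIR
    print(choose_copy_ir(auto, uk).name)      # AzureIR-UKSouth
```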

Lookup and GetMetadata activity

The Lookup and GetMetadata activities are executed on the integration runtime associated with the data store linked service.

Transformation activity

Each transformation activity has a target compute linked service, which points to an integration runtime. The transformation activity is dispatched from this integration runtime instance.

Data Flow activity

A Data Flow activity is executed on the integration runtime associated with it.
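The rules for the remaining activity types can be sketched the same way. The sketch uses a simplified, hypothetical activity shape (`linkedService` names the data store or compute linked service directly, instead of going through a dataset), and assumes that a linked service without `connectVia`, or a Data Flow activity without an explicit IR, falls back to the default auto-resolve Azure IR, `AutoResolveIntegrationRuntime`.

```python
DEFAULT_IR = "AutoResolveIntegrationRuntime"


def ir_of_linked_service(linked_service: dict) -> str:
    """IR referenced by connectVia, or the default auto-resolve IR."""
    ref = linked_service.get("properties", {}).get("connectVia")
    return ref["referenceName"] if ref else DEFAULT_IR


def ir_for_activity(activity: dict, linked_services: dict) -> str:
    if activity["type"] == "ExecuteDataFlow":
        # Data Flow activity: the IR associated with the activity itself.
        ref = activity.get("integrationRuntime")
        return ref["referenceName"] if ref else DEFAULT_IR
    # Lookup/GetMetadata: the data store linked service's IR.
    # Transformation activities: dispatched from the compute linked service's IR.
    return ir_of_linked_service(linked_services[activity["linkedService"]])


if __name__ == "__main__":
    services = {
        "OnPremSql": {"properties": {"type": "SqlServer",
                                     "connectVia": {"referenceName": "OnPremIR",
                                                    "type": "IntegrationRuntimeReference"}}},
        "Databricks": {"properties": {"type": "AzureDatabricks"}},
    }
    print(ir_for_activity({"type": "Lookup", "linkedService": "OnPremSql"}, services))  # OnPremIR
    print(ir_for_activity({"type": "DatabricksNotebook", "linkedService": "Databricks"}, services))  # AutoResolveIntegrationRuntime
```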

Next steps

See the following articles: