Configure a self-hosted IR as a proxy for an Azure-SSIS IR in Azure Data Factory

APPLIES TO: yesAzure Data Factory yesAzure Synapse Analytics (Preview)

This article describes how to run SQL Server Integration Services (SSIS) packages on an Azure-SSIS Integration Runtime (Azure-SSIS IR) in Azure Data Factory with a self-hosted integration runtime (self-hosted IR) configured as a proxy.

With this feature, you can access data on-premises without having to join your Azure-SSIS IR to a virtual network. The feature is useful when your corporate network has a configuration too complex or a policy too restrictive for you to inject your Azure-SSIS IR into it.

This feature splits any SSIS data flow task that has an on-premises data source into two staging tasks:

  • The first task, which runs on your self-hosted IR, first moves data from the on-premises data source into a staging area in your Azure Blob storage.
  • The second task, which runs on your Azure-SSIS IR, then moves data from the staging area into the intended data destination.

Other benefits and capabilities of this feature allow you to, for example, set up your self-hosted IR in regions that are not yet supported by an Azure-SSIS IR, and allow the public static IP address of your self-hosted IR on the firewall of your data sources.

Prepare the self-hosted IR

To use this feature, you first create a data factory and set up an Azure-SSIS IR in it. If you have not already done so, follow the instructions in Set up an Azure-SSIS IR.

You then set up your self-hosted IR in the same data factory where your Azure-SSIS IR is set up. To do so, see Create a self-hosted IR.

Finally, you download and install the latest version of the self-hosted IR, as well as the additional drivers and runtime, on your on-premises machine or Azure virtual machine (VM), as follows:

  • Download and install the latest version of the self-hosted IR.

  • If you use Object Linking and Embedding Database (OLEDB) connectors in your packages, download and install the relevant OLEDB drivers on the same machine where your self-hosted IR is installed, if you haven't already done so.

    If you use the earlier version of the OLEDB driver for SQL Server (SQL Server Native Client [SQLNCLI]), download the 64-bit version.

    If you use the latest version of OLEDB driver for SQL Server (MSOLEDBSQL), download the 64-bit version.

    If you use OLEDB drivers for other database systems, such as PostgreSQL, MySQL, Oracle, and so on, you can download the 64-bit versions from their websites.

  • If you haven't already done so, download and install the 64-bit version of Visual C++ (VC) runtime on the same machine where your self-hosted IR is installed.

Prepare the Azure Blob storage-linked service for staging

If you haven't already done so, create an Azure Blob storage-linked service in the same data factory where your Azure-SSIS IR is set up. To do so, see Create an Azure data factory-linked service. Be sure to do the following:

  • For Data Store, select Azure Blob Storage.

  • For Connect via integration runtime, select AutoResolveIntegrationRuntime (not your Azure-SSIS IR nor your self-hosted IR), because we use the default Azure IR to fetch access credentials for your Azure Blob Storage.

  • For Authentication method, select Account key, SAS URI, or Service Principal.

    Tip

    If you select the Service Principal method, grant your service principal at least a Storage Blob Data Contributor role. For more information, refer to Azure Blob storage connector.

Prepare the Azure Blob storage-linked service for staging

Configure an Azure-SSIS IR with your self-hosted IR as a proxy

Having prepared your self-hosted IR and Azure Blob storage-linked service for staging, you can now configure your new or existing Azure-SSIS IR with the self-hosted IR as a proxy in your data factory portal or app. Before you do so, though, if your existing Azure-SSIS IR is already running, stop it and then restart it.

  1. In the Integration runtime setup pane, skip past the General Settings and SQL Settings sections by selecting Next.

  2. In the Advanced settings section, do the following:

    1. Select the Set up Self-Hosted Integration Runtime as a proxy for your Azure-SSIS Integration Runtime check box.

    2. In the Self-Hosted Integration Runtime drop-down list, select your existing self-hosted IR as a proxy for the Azure-SSIS IR.

    3. In the Staging storage linked service drop-down list, select your existing Azure Blob storage-linked service or create a new one for staging.

    4. In the Staging path box, specify a blob container in your selected Azure Blob storage account or leave it empty to use a default one for staging.

    5. Select Continue.

    Advanced settings with a self-hosted IR

You can also configure your new or existing Azure-SSIS IR with the self-hosted IR as a proxy by using PowerShell.

$ResourceGroupName = "[your Azure resource group name]"
$DataFactoryName = "[your data factory name]"
$AzureSSISName = "[your Azure-SSIS IR name]"
# Self-hosted integration runtime info - This can be configured as a proxy for on-premises data access 
$DataProxyIntegrationRuntimeName = "" # OPTIONAL to configure a proxy for on-premises data access 
$DataProxyStagingLinkedServiceName = "" # OPTIONAL to configure a proxy for on-premises data access 
$DataProxyStagingPath = "" # OPTIONAL to configure a proxy for on-premises data access 

# Add self-hosted integration runtime parameters if you configure a proxy for on-premises data accesss
if(![string]::IsNullOrEmpty($DataProxyIntegrationRuntimeName) -and ![string]::IsNullOrEmpty($DataProxyStagingLinkedServiceName))
{
    Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName `
        -DataFactoryName $DataFactoryName `
        -Name $AzureSSISName `
        -DataProxyIntegrationRuntimeName $DataProxyIntegrationRuntimeName `
        -DataProxyStagingLinkedServiceName $DataProxyStagingLinkedServiceName

    if(![string]::IsNullOrEmpty($DataProxyStagingPath))
    {
        Set-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName `
            -DataFactoryName $DataFactoryName `
            -Name $AzureSSISName `
            -DataProxyStagingPath $DataProxyStagingPath
    }
}
Start-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $DataFactoryName `
    -Name $AzureSSISName `
    -Force

Enable SSIS packages to connect by proxy

By using either the latest SSDT with SSIS Projects extension for Visual Studio or a standalone installer, you can find a new ConnectByProxy property that has been added in OLEDB or Flat File connection managers.

When you design new packages that contain data flow tasks with OLEDB or Flat File sources, which let you access databases or files on-premises, you can enable this property by setting it to True in the Properties pane of the relevant connection managers.

Enable ConnectByProxy property

You can also enable this property when you run existing packages, without having to manually change them one by one. There are two options:

  • Option A: Open, rebuild, and redeploy the project containing those packages with the latest SSDT to run on your Azure-SSIS IR. You can then enable the property by setting it to True for the relevant connection managers. When they're running packages from SSMS, these connection managers appear on the Connection Managers tab of the Execute Package pop-up window.

    Enable ConnectByProxy property2

    You can also enable the property by setting it to True for the relevant connection managers that appear on the Connection Managers tab of Execute SSIS Package activity when they're running packages in Data Factory pipelines.

    Enable ConnectByProxy property3

  • Option B: Redeploy the project containing those packages to run on your SSIS IR. You can then enable the property by providing its property path, \Package.Connections[YourConnectionManagerName].Properties[ConnectByProxy], and setting it to True as a property override on the Advanced tab of the Execute Package pop-up window when you're running packages from SSMS.

    Enable ConnectByProxy property4

    You can also enable the property by providing its property path, \Package.Connections[YourConnectionManagerName].Properties[ConnectByProxy], and setting it to True as a property override on the Property Overrides tab of Execute SSIS Package activity when you're running packages in Data Factory pipelines.

    Enable ConnectByProxy property5

Debug the first and second staging tasks

On your self-hosted IR, you can find the runtime logs in the C:\ProgramData\SSISTelemetry folder and the execution logs of first staging tasks in the C:\ProgramData\SSISTelemetry\ExecutionLog folder. You can find the execution logs of second staging tasks in your SSISDB or specified logging paths, depending on whether you store your packages in SSISDB or file system, file shares, or Azure Files. You can also find the unique IDs of first staging tasks in the execution logs of second staging tasks.

Unique ID of the first staging task

Use Windows authentication in staging tasks

If the staging tasks on your self-hosted IR require Windows authentication, configure your SSIS packages to use the same Windows authentication.

Your staging tasks will be invoked with the self-hosted IR service account (NT SERVICE\DIAHostService, by default), and your data stores will be accessed with the Windows authentication account. Both accounts require certain security policies to be assigned to them. On the self-hosted IR machine, go to Local Security Policy > Local Policies > User Rights Assignment, and then do the following:

  1. Assign the Adjust memory quotas for a process and Replace a process level token policies to the self-hosted IR service account. This should occur automatically when you install your self-hosted IR with the default service account. If it doesn't, assign those policies manually. If you use a different service account, assign the same policies to it.

  2. Assign the Log on as a service policy to the Windows Authentication account.

Billing for the first and second staging tasks

The first staging tasks that run on your self-hosted IR are billed separately, just as any data movement activities that run on a self-hosted IR are billed. This is specified in the Azure Data Factory data pipeline pricing article.

The second staging tasks that run on your Azure-SSIS IR are not be billed separately, but your running Azure-SSIS IR is billed as specified in the Azure-SSIS IR pricing article.

Enabling TLS 1.2

If you need to use strong cryptography/more secure network protocol (TLS 1.2) and disable older SSL/TLS versions on your self-hosted IR, you can download and run the main.cmd script that can be found in the CustomSetupScript/UserScenarios/TLS 1.2 folder of our public preview container. Using Azure Storage Explorer, you can connect to our public preview container by entering the following SAS URI:

https://ssisazurefileshare.blob.core.windows.net/publicpreview?sp=rl&st=2020-03-25T04:00:00Z&se=2025-03-25T04:00:00Z&sv=2019-02-02&sr=c&sig=WAD3DATezJjhBCO3ezrQ7TUZ8syEUxZZtGIhhP6Pt4I%3D

Current limitations

  • Only data flow tasks with Open Database Connectivity (ODBC)/OLEDB/Flat File sources or OLEDB destination are currently supported.
  • Only Azure Blob storage-linked services that are configured with Account key, Shared Access Signature (SAS) URI, or Service Principal authentication are currently supported.
  • ParameterMapping in OLEDB Source is not supported yet. As a workaround, please use SQL Command From Variable as the AccessMode and use Expression to insert your variables/parameters in a SQL command. As an illustration, see the ParameterMappingSample.dtsx package that can be found in the SelfHostedIRProxy/Limitations folder of our public preview container. Using Azure Storage Explorer, you can connect to our public preview container by entering the above SAS URI.

Next steps

After you've configured your self-hosted IR as a proxy for your Azure-SSIS IR, you can deploy and run your packages to access data on-premises as Execute SSIS Package activities in Data Factory pipelines. To learn how, see Run SSIS packages as Execute SSIS Package activities in Data Factory pipelines.