question

Fritter13-0717 avatar image
0 Votes"
Fritter13-0717 asked ChristofferMartinsson-3946 commented

Connecting ADF to on-premise SQL Server: Is IR absolutely required?

I want to execute an ADF copy task to extract data from my on-premise MS SQL Server DB into a blob object on my Azure storage account. Everything I read suggests that I need to install a self-hosted Integration Runtime (IR) on a host in my on-premise network; but I have done a test where I successfully extracted data from the SQL Server instance doing the following:
1. adding (allow) whitelist for the appropriate ADF IPs for the region where my ADF exists (east-us)
2. creating my copy source using the fully qualified domain information for the on-premise SQL Server and entering appropriate SQL Server authentication credentials

What am I missing? Why does all the documentation suggest that this can only be done using an on-premise, self-hosted IR?

Finally, if I would decide to utilize the on-premise, self-hosted IR how does the IR avoid the need for firewall (allow) whitelist entries? Does the IR initiate the communication with ADF? If so, how do I avoid running up ADF costs if there is this constant (keep alive) communication?

Thanks for any assistance clearing up these mysteries.




azure-data-factory
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

1 Answer

MartinJaffer-MSFT avatar image
0 Votes"
MartinJaffer-MSFT answered ChristofferMartinsson-3946 commented

Hello @Fritter13-0717 and welcome to Microsoft Q&A. Please allow me to explain and educate.

The Self-Hosted Integration Runtime (SHIR) is required in cases where a network is not open to the internet, or does not allow incoming connections. The network might be isolated for security reasons, and only allow outgoing connections.

The SHIR accomplishes a work-around by sitting in your network, and making outbound requests to Azure Data Factory (ADF), asking whether there are any jobs it needs to perform. If there are tasks to perform, the details are retrieved and executed inside your network, only making outbound connections. The SHIR also checks for updates. In this manner, ADF does not need to make inbound connections to the network.

In your case, it sounds like your network, or at least your Database is open to the internet, by dint of having a Fully Qualified Domain Name (FQDN). For any data source accessible via the internet, a SHIR is not required. In your case, as you have already discovered, a SHIR is not necessary.

Yes, connections are initiated by Integration Runtimes. This is true for both SHIR and Azure Integration Runtimes.

· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thank you! This makes perfect sense.

One follow up: since there is an ongoing communication, albeit maybe limited (SHIR checking for work on some scheduled basis), is there any Azure account charges associated with this communication?

0 Votes 0 ·

@Fritter13-0717 T
o my knowledge there are no Data Factory costs associated with a SHIR outside of an actively running job. The checking for work, to my knowledge, has no associated costs. You are only charged for activity/work.

Pricing Calculator

93733-image.png


0 Votes 0 ·
image.png (134.1 KiB)

Picking up on an old thread.

We use a hub and spoke network model with the hub connected to on-prem via VPN and hybrid DNS configuration for cross-premise name resolution. (Custom DNS).


We use network peerings and private endpoints so and have route all Azure traffic through a firewall.

This model is very much like running a second datacenter from a networking perspective which we like since it omits lot's of the "behind the scenes" stuff in Azure and brings visibility into the solution.

However, with ADF, Synapse and Purview the concept of "managed virtual networks" and integration runtimes comes in.
IWhy is it not possible from within the ADF to just use the private endpoint i've assigned to it and access the on-prem SQL server through the on-prem VPN gateway?
Im confused as to how im even able to assign a private endpoint to it if ADF cant use it. Is it only for studio access on isolated networks?




0 Votes 0 ·