How to Reduce SQL Server On-Premise Connection Latency from Windows Azure Web Role – First Impressions of Azure Connect

In this post I describe my experience from implementing Azure Connect and key findings regarding factors that impact connection latency in a scenario where SQL Server uses integrated authentication and the Azure web role instance is domain joined.

Azure Connect Why Bother?

We have introduced Azure Connect in PDC 2010 Conference. It is currently it is available as a beta (CTP2) technology. You can request access to Azure beta programs on the Windows Azure developer portal. After you log in click Home and then Beta Programs on the left panel. You will see Azure Connect listed there. You can now push the button Apply for Access to request participation. It can take some days before you get approval and this feature is enabled for your subscription.

Azure Connect opens the possibility of new scenarios – building hybrid solutions that include Azure virtual machines (role instances) and on-premise machines. You can join the Azure virtual machines to your domain and access them on your local network just as you access your physical servers.

There are some limitations of course. The biggest one is that the Azure roles do not preserve local state. This means that when the Azure host OS is upgraded you will loose any configuration you did manually. Consequently, we need to use Startup task to automate any configuration changes (pre-requisite components, configuration related to IIS, etc). For a scenario where the Azure role instances are domain joined we need to remember that the identity (machine name) of the role instance may change so we cannot really manage the Azure virtual machines as if they local domain resources.

Some customers object to having a solution hosted in Azure because they do not want to move their data to the Cloud because of legal requirements. Azure Connect now enables a scenario where you can keep your SQL server on your corporate network but move to Azure other servers (web servers). Event better, you do not have to compromise on security. You can domain join the Azure web role instances so that they use windows integrated security when accessing a SQL Server!

Azure Connect Basics

The latest Windows Azure Platform Training Kit contains a lab Connecting an Azure Web Role to an External SQL Server Database with Windows Azure Connect. It describes a scenario where the Azure web role instance contains an ASP.Net page configured for anonymous access. When the page loads it connects to the SQL Server that is located in the on-premise network and uses SQL Authentication. The Azure web role instance is not domain joined but rather works in the workgroup. The instructions in this lab are pretty exhaustive there are just two important points to emphasize:

  • The firewall rule to allow incoming traffic on port 1433 on the SQL server machine should be applied not just to Domain or Private but also Public profiles because your Azure role instance connects from the internet.
  • Almost certainly we need to control the Connection Timeout property in the SQL connection string. The Azure Connect Relays are now available in US, Europe and Asia so you should select the closes datacenter location.However, there may be cases where so the default connection timeout of 15 seconds is not sufficient and you may need to increase the default value.
 <connectionStrings>
  <add name="NorthwindConnectionString" connectionString="Data Source=SQLServer.CONTOSO.COM,1433;Initial Catalog=Northwind;Persist Security Info=True;User ID=labUser;Password=Passw0rd!;Connection Timeout=60" providerName="System.Data.SqlClient" />
</connectionStrings>

A more interesting scenario is when the Azure role instance VM is joined to the local, on-premise domain. In this scenario IIS application pool of the Azure web role can be configured with the domain account instead of the Network Service. Consequently, the on-premise SQL server can be locked down with Windows integrated authentication. So lets have a look what would be required. Essentially, there are two parts of the technical solution:

  1. Joining the Azure web role to the domain
  2. Configuring the IIS application pool

I should mention that in my case the on-premise components (SQL Server and Active Directory) were located in Madrid, Spain and that I was deploying Azure web roles to Dublin datacenter.

Lets have a look at these two points in turn.

1. The procedure to join an Azure role to the on-premise domain is well documented in MSDN article Overview of Windows Azure Connect When Roles Are Joined to a Domain. and in the product team blog Domain Joining Azure Roles.  Perhaps the point that is worthwhile to call out some practical points.

The first is regarding the certificate that you need to encrypt the domain password (required to join the VM to domain).  it is possible to use the same certificate you use to secure the Remote Desktop password. Most likely RD access will be enabled anyway so if security policy of the customer allow it we can reuse the certificate we already have configured. 

It is also beneficial to configure a domain account for the Microsoft.WindowsAzure.Plugins.Connect.Administrators Connect setting. You can then Remote Desktop into your Web role with the domain account rather than local machine account. Obviously, that you give use the access to the domain resources from the RD session.

The second practical point is that although the Connect agent icon may be telling you that there is some issue do not rely on it. For example on my SQL server I see the agent apparently in the eternal connecting state

image

but actually it is working fine and this is a UI refresh issue that we expect to be corrected. What is worthwhile to do is to open a firewall for ping so that you can test that the communication between the Web Role VM and the SQL server actually works. You can do this using command

 netsh advfirewall firewall add rule name="ICMPv6" dir=in action=allow enable=yes protocol=icmpv6

For the web role you would put this into a startup task of course. Than you can ping your Azure VMs to verify connectivity. Following is a ping from SQL server on-premise to a Web Role VM (in this case the Azure Relay was in US and the solution in Europe hence large latencies for ping here).

image

2. What is not documented currently on MSDN is how you can set an account for the IIS application pool programmatically from the Web Role. Because the role instances can be recycled in Azure this has to be automated with code. Fortunately, the product team has just published a blog Windows Azure Connect Use Case: Web Role / Application Pool Access SQL Server using Windows Authentication. This code is incorporated into the OnStart() method of the WebRole and has some nice features.

  • It will detect if Connect network and DNS are available and if the role is domain joined. If not, it will keep on waiting until they are. Obviously, you will have to manually set up the Connect policy to allow communication between the Connect endpoint on the Web Role and the machines (SQL server, Domain controller) that are on-premise. Once you do so the Connect plug-in will perform the domain join and eventually your role will be in Ready state.
  • Only after you are domain joined and have network connectivity will the code set the domain account for the application pool. The code uses AppCmd.exe which is better than other approaches based on writing to the IIS metabase that may require you to configure additional Windows features on the machine.
  • The associated IIS application pool password is encrypted with the certificate in the same way as the domain join password and is stored in the configuration file. So it is not being hardcoded into your code.

When you deploy the solution to Azure your role will be Busy. This is expected because although the role has the Connect Agent installed the Azure Connect policy does not exist that would allow it to communicate with the domain controller – making the Azure domain join impossible.

We need to click the Virtual Network in the left panel in the Azure management Portal, expand the Connect node , the subscription and lookup the list of activated endpoints:

image

Here you can see the Azure web role instance RD00155D361666 listed confirming that it has an active Connect Agent. We now have to define the Azure Connect policy that will allow it to communicate with Active Directory and the SQLServer machine on-premise.

This is easily done by clicking on the Groups and Roles node and Create Group button

image 

In the scenario where we use integrated authentication it is important NOT to check “Interconnected:” check box. What is the reason? More about it below in the section “Reduction of Connection Latency”.

When we save this Azure Connect policy, the Azure connect agents will pick up the policy change automatically (it should happen automatically every 5 minutes or you can force the update right clicking on the Azure Connect icon in the systray and selecting “Refresh Policy”.

You also want to make sure that you use the Relay in the datacenter that is closest to your location (in my case it is Europe). You can change the Relay location clicking the Relay Region button.

image

With the Azure Connect policy in place the solution deployment will proceed to domain join the web role and eventually the deployment status of the solution will be Ready.

So finally we get the result. We can connect to the WebRole and pull the data from the SQL Server on-premise protected with the Windows integrated authentication

image

Reduction of Connection Latency

When I made the initial tests with this Web Role I was disappointed by the initial SQL Server connection latency which was about 20 seconds. Investigation with the help of the product group turned out a number of additional issues (sources of latency) that need attention.

The Obvious Ones

In CTP2 of Azure Connect Azure Connect Relays have been deployed to all Microsoft datacenter locations. In most cases it makes sense select the Relay closes to your intranet geographical location. Also the size of the Azure web role matters with higher spec roles having higher performance.

Below you can see ping times from machine in Spain to Azure web role in Dublin (Relay in Europe):

image

The same ping times for relay in US take 3 times longer :

image

SQL Server Native Client Connection Latency

When I made the initial tests with this Web Role I was disappointed by the latency which for opening the connection first time which was about 20 seconds. After talking to the product group it turns out Azure Connect will assign both the IPv4 as well as IPV6 address to the SQL Server.

 Resolving HostName: [sqlserver] ....
IP: 2a01:111:f102:140:4099:2a08:7370:e519
IP: 10.10.10.3

Because Azure Connect works with IPV6 addresses attempting to contact SQL Server on IPv4 address will lead to a timeout. The SQL Server documentation comments that SQL Server Native client will try first to connect on IPv4 and then on IPv6 address if both are configured. The workaround to this problem is to configure Azure web role instance firewall to block IPv4 traffic. With IPv4 traffic blocked by firewall the SQL Server Native client will try the IPv6 address right away.

Because my “intranet” is on 10.X.X.X sub network following rule worked for me – the initial latency dropped from 20 seconds to 3-4 seconds.

 netsh advfirewall firewall add rule name="BlockIPv4SQL" dir=out action=block protocol=tcp remoteport=1433 remoteip=10.0.0.0-10.255.255.255 >> %SystemRoot%\logs\Startup.log

Azure Connect team blog has now published a post that details the configuration required.

SQL Server to Active Directory Latency (Azure Connect Policy induced)

In the scenario where SQL Server uses integrated authentication it is important NOT to check “Interconnected:” check box in Azure Connect policy mentioned previously. What is the reason? If this checkbox is selected then Azure Connect is used for network communication between all endpoints in the group. This means that when Azure web role (WebRole1) attempts to query SQL Server machine then the SQL Server will use Azure Connect to contact Active Directory (rather than local network) to validate the credentials (these are IIS application pool credentials). So your traffic goes from SQL Server machine to Azure Connect Relay (in Europe it may be Dublin) and then to Active Directory-clearly increasing the latency. If the check box in NOT selected then Azure Connect will only be used for communication between WebRole1 and SQL Server and WebRole1 and Active Directory (in both directions). Therefore, SQL Server will contact Active directory using the local intranet. That configuration took off another second from the latency I was experiencing. See following blog from the Azure Connect team to get full understanding.

Web Site Related Latencies

There are additional potential sources of latency. Some of them are located in the web server (Azure web role) itself. One very good source of information is the document “Best Practices for Speeding up Your Website” on Yahoo Developer Network. This document gives a number of tips on speeding up the loading times for the web page ranging from the use of: Content Delivery Network, caching, compression and optimization of java script and CSS.

I used two tips from this document: ” No 404s” and “Make favicon.ico Small and Cacheable” as they are fairly generic and easy to implement. If you Remote Desktop to Azure web role you can look up the IIS log – you will notice the requests browser makes for favicon.ico file. Just aggregate one to your website and it will load a fraction faster.

image

Summary

The key experience from my experience with Azure Connect implementation was that additional sometimes, non-trivial configuration required in order to reduce the connection latency. The factors or potentials sources of latency are:

  1. Azure Connect Relay location and Azure web role size
  2. Physical distance of your intranet machines from the datacenter
  3. SQL Server Native Client related latency
  4. SQL Server to Active Directory latency (Azure Connect policy)
  5. Web site related latencies

I commented on how to tackle all of them except for 2 which we cannot do anything about.