Connecting to Availability Group Listener in Hybrid IT

In this blog, I'd like to explore client connectivity strategy when you have an availability group that spans both your on-premise servers and Windows Azure VMs. From the outset, I should clarify that the availability group listener is not supported in Windows Azure at the moment, which contributes to the interesting scenario in hybrid IT where the availability group listener is useful part of the time (i.e. when an on-premise replica is the primary replica). In a simple scenario, you have a primary replica and synchronous secondary replica running in your on-premise network for high availability. You then have an asynchronous secondary replica running in a Windows Azure VM for disaster recovery. The Windows Azure virtual network is connected to your on-premise network through a site-to-site VPN tunnel.

If you configure an availability group listener for this hybrid-IT scenario, as a matter of course you will specify two IP addresses for the listener's network name, one for your on-premise subnet and one for the Windows Azure subnet. Purely as a behavior of AlwaysOn Availability Groups, the multi-subnet cluster will always assign the listener IP for the availability group to be the one with same subnet mask as the primary replica. For example, if you create a listener called AG1 with the IP addresses 10.1.1.111/10.2.2.222 for the respective subnets, as shown below, the listener IP address 10.1.1.111 is active because the primary replica is in the 10.1.1.0/24 subnet.

The IP address 10.1.1.111 will be the active IP as long as the availability group fails over between the two on-premise replicas. Clients can connect to the listener name AG1 and be automatically redirected to the correct primary replica. However, when the availability group fails over to the asynchronous secondary in Windows Azure, the listener IP 10.2.2.222 becomes active instead. You can observe this in Failover Cluster Manager, where 10.1.1.111 is brought offline and 10.2.2.222 is brought online during the failover. At this point, if clients are configured to connect to the listener name AG1, the IP address resolves to 10.2.2.222 but the clients cannot connect to that IP address (again, because the listener is not supported in Windows Azure). Failover Cluster Manager may indicate that 10.2.2.222 is online, but actually no traffic can be routed to it in Windows Azure.

While you cannot connect to the availability group listener (yet), you can still connect directly to the SQL Server instance on the Windows Azure VM. So whenever you perform a failover from an on-premise replica to a Windows Azure replica, you must also change the client connection strings to connect directly to the Windows Azure VM. In a production environment, a failover to the Windows Azure VM should only happen in a disaster recovery scenario, and you should have a disaster recovery plan that is fully tested. As part of your disaster recovery procedure you want to do the following:

  1. bring the cluster back online by forcing quorum on the Windows Azure VM,
  2. perform a forced failover with data loss,
  3. change the client applications (e.g. SharePoint) to connect to the Windows Azure VM directly,
  4. recover your on-premise network/servers,
  5. fail over back to one of the on-premise replicas,
  6. change the client database connection to use the availability group listener again.

Simplifying Client Connectivity During Disaster Recovery

Reconfiguring client applications during disaster recovery is a complex process that can be simplified by creating dedicated DNS records. In the example above, you can create an A record that points to 10.1.1.111 or a CNAME record that points to AG1, then point the client applications to use this DNS record to connect to the availability database. Your disaster recovery procedure then may look as follows:

  1. bring the cluster back online by forcing quorum on the Windows Azure VM,
  2. perform a forced failover with data loss,
  3. change the DNS record to point to the Windows Azure VM (if an A record, it's the VM's IP address; if a CNAME record, it's the VM's hostname),
  4. recover your on-premise network/servers,
  5. fail over back to one of the on-premise replicas,
  6. change the DNS record to point to the availability group listener again.

This technique is particularly helpful if you have multiple clients that point to the same availability group. Instead of reconfiguring client applications, you only need to make sure that the one DNS record points to the proper IP address (A record) or network name (CNAME record).

When Replicas are in Named Instances with Different Names

If the replicas in your availability group are located in named instances that do not have the same instance names, you can still use the same technique, but you must make sure that the named instance on the Windows Azure VM listens on the same static port that the availability group listener is configured with and that client applications connect directly to the server name and port number. In other words, the availability group listener and the named instance on the Windows Azure VM need to be configured such that clients can use the same connection string to connect to either of them, as long as the DNS record points to the correct network name. In the example below, you can configure AG1 and NamedInst both to use port number 10000. The port numbers for the on-premise replicas can remain dynamically allocated because you can the availability databases through the listener.

Once you have configured your solution as shown above, configure all client applications to AG-VNN and port number 10000. For example: "Data Source=AG-VNN,10000;Network Library=dbmssocn;Initial Catalog=MyDataBaseName;Integrated Security=SSPI". This way, the on-premise.

For information on specifying a static port number for a named SQL Server instance, see Configure a Server to Listen on a Specific TCP Port (SQL Server Configuration Manager).