Troubleshooting Availability Group Listener in Azure
Configuring an availability group listener in Azure involves more steps than creating one on-premises. This topic helps you troubleshoot your availability group listener, whether your AlwaysOn Availability Groups deployment is in Azure only or in a hybrid IT environment that uses a site-to-site VPN.
Some steps in the listener configuration involve configuration of Azure itself, such as the load-balanced virtual machine (VM) endpoint and direct server return. However, Azure currently does not provide any tools to help you verify that your configuration is working as expected. Therefore, you need a network analyzer both to verify your configuration and to troubleshoot any problems. This topic shows you how to use Microsoft Network Monitor to troubleshoot your availability group listener.
Availability Group Listener Configuration Summary
This section provides a list of configuration options to check while troubleshooting your availability group listener.
|Load-balanced Endpoint (Configured in Azure)||Configuration inside VMs||Configuration of Client Connectivity|
Verify Probe Pings in Availability Group Listener Configuration
Note: Azure Load Balancers don’t support the ping (ICMP) protocol.
To determine whether the probe port on the load-balanced endpoint is working properly on the Azure VMs, you use Network Monitor to filter your packet capture on the probe port.
When the load-balanced endpoint is configured properly, the Azure load balancer continuously pings each of the VMs to determine whether it hosts the primary replica, so that it can route client connections to the correct VM. If the VM hosts the primary replica, the cluster service is configured with the probe port and responds to the probe pings. You can see this traffic in Network Monitor by capturing packets with the following filter applied in the Display Filter pane:
TCP.DstPort == 59999 OR TCP.SrcPort == 59999
The first clause captures the incoming pings from the Azure load balancer and the second clause captures the reply by the primary replica. The screenshot below shows what it looks like when you do a packet capture on the primary replica in Azure.
The Con Id column shows you packets related to the same probe ping (shown in yellow). The response message from your VM acknowledges each ping message: its Ack value is the ping's Seq value, incremented. You can also see the load balancer’s source IP address (shown in orange).
If you see SynReTransmit messages from the load balancer instead of response messages from the VM, the VM is not responding to the load balancer. This can mean either that the VM does not host the primary replica or that the probe pings are not working as expected.
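The reasoning above can be sketched as a small classifier over captured packets. This is only an illustration: the `Packet` class and the `diagnose_probe` function are hypothetical names, not part of Network Monitor, and the sketch assumes you have already reduced each captured packet to its TCP source and destination ports.

```python
from dataclasses import dataclass

PROBE_PORT = 59999  # probe port used in this topic's display filter


@dataclass
class Packet:
    """Hypothetical, simplified view of one captured TCP packet."""
    src_port: int
    dst_port: int


def diagnose_probe(packets, probe_port=PROBE_PORT):
    """Classify a capture using the same two clauses as the display filter."""
    # Clause 1 (TCP.DstPort): pings arriving from the load balancer.
    incoming = [p for p in packets if p.dst_port == probe_port]
    # Clause 2 (TCP.SrcPort): replies sent by the VM.
    replies = [p for p in packets if p.src_port == probe_port]
    if not incoming:
        return "no probe traffic"
    if not replies:
        return "probes arrive but the VM does not reply"
    return "probes answered"


# Example: one ping and its reply, with a made-up load balancer source port.
capture = [Packet(src_port=50123, dst_port=59999),
           Packet(src_port=59999, dst_port=50123)]
print(diagnose_probe(capture))
```

A result of "probes arrive but the VM does not reply" corresponds to the SynReTransmit case described above.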
Verify Listener Connectivity in Availability Group Listener Configuration
To determine whether the availability group listener is working properly on the Azure VMs, you use Network Monitor to filter your packet capture on the listener port.
When the client tries to connect to the availability group listener using the cloud service’s IP address and the listener port, Azure verifies that the connection port is the same as the one configured in the load-balanced endpoint and lets the TCP connection through to the primary replica, which has been responding to the probe pings. If the VM’s firewall has a corresponding rule, the client access point is configured properly, and the listener port is configured in your availability group, the availability group listener accepts the connection and the client can perform updates and queries. This traffic can be seen in Network Monitor by performing packet capture with the following filter applied in the Display Filter pane (assuming that the listener port is 10000):
TCP.DstPort == 10000 OR TCP.SrcPort == 10000
The screenshot below shows what it looks like when you successfully connect to the availability group listener in Azure and perform a simple query.
The Con Id column shows you packets related to the same client connection (shown in yellow). In the packets sent from the VM, you can see information regarding the client’s hostname, domain, and username (shown in red). You can also see the client’s IP address on the internet (shown in orange). In this case, it is a client VM in a different cloud service, so the IP address shown is the client VM’s cloud service IP address.
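Because the load-balanced endpoint accepts only TCP connections, a quick client-side check of the connection path described above is to attempt a TCP connection to the listener port. The following is a minimal sketch, not a substitute for the packet capture: `can_connect` is a hypothetical helper, and the host name in the usage comment is a placeholder for your own cloud service address.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (placeholder host; 10000 is the listener port assumed in this topic):
#   can_connect("mycloudservice.cloudapp.net", 10000)
```

A successful connection shows only that the TCP handshake completed. It does not confirm that SQL Server accepted a login, so a capture on the listener port is still the more complete check.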
Troubleshoot Availability Group Listener Configuration in Azure
The table below lists some of the common symptoms when troubleshooting availability group listeners in Azure, and possible causes for each symptom.
Tip: The Windows Ping.exe command does not work on the availability group listener in Azure. The load-balanced endpoint only accepts TCP connections, while Ping.exe uses ICMP.
|No traffic on probe port (59999)||
|Probe port receives pings and SynReTransmit packets, but no replies||
||This symptom indicates that the probe port is configured properly in the load-balanced endpoint and that the VM firewall has allowed the incoming packet. To test whether the clustered service is listening on the intended port, run netstat -ab in a command prompt on the primary replica and search for rhs.exe in the list.|
|No traffic on listener port||
||The listener port should match the public/local port specified on the load-balanced endpoint.|
|Listener port receives incoming traffic and SynReTransmit packets, but no replies||
||This symptom indicates that the load-balanced endpoint is configured properly and that the Azure load balancer has successfully routed the client’s connection request to the primary replica, but no listener is actively listening on that port. To test whether the listener is listening on the intended port, run netstat -ab in a command prompt on the primary replica and search for sqlservr.exe. A common mistake in configuring dependencies for the client access point is to set the availability group resource to depend on the IP address resource(s). Instead, you should configure the listener name to depend on the IP address resource(s) and configure the availability group resource to depend on the listener name.|
|Listener only accessible from the primary replica node itself||
|Client lost connectivity to listener after failover||
|All IP address resources in client access point are offline, but listener name is online||
|Listener name is offline, but availability group resource is online||
|At least one IP address is online, but listener name is offline||
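The netstat -ab checks mentioned in the table can be tedious to read by eye. The following sketch groups listening TCP ports by owning executable so that you can look up rhs.exe (probe port) and sqlservr.exe (listener port) directly. It assumes the usual netstat -ab layout, in which the owning executable appears in square brackets on a line after the connection line; the function name and the sample output are illustrative only.

```python
import re

CONN_RE = re.compile(r"^\s*(TCP|UDP)\s+(\S+)\s+(\S+)(?:\s+(\S+))?\s*$")
OWNER_RE = re.compile(r"^\s*\[(.+?)\]\s*$")


def listening_ports_by_owner(netstat_text):
    """Map executable name (lower case) to the set of TCP ports it listens on."""
    owners = {}
    pending = None  # port of the latest LISTENING line still awaiting an owner
    for line in netstat_text.splitlines():
        conn = CONN_RE.match(line)
        if conn:
            proto, local, _foreign, state = conn.groups()
            if proto == "TCP" and state == "LISTENING":
                pending = int(local.rsplit(":", 1)[1])
            else:
                pending = None  # ignore non-listening connections
            continue
        owner = OWNER_RE.match(line)
        if owner and pending is not None:
            owners.setdefault(owner.group(1).lower(), set()).add(pending)
            pending = None
    return owners


# Made-up sample in the assumed netstat -ab layout.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:10000          0.0.0.0:0              LISTENING
 [sqlservr.exe]
  TCP    0.0.0.0:59999          0.0.0.0:0              LISTENING
 [rhs.exe]
  TCP    10.0.0.4:50555         10.0.0.5:445           ESTABLISHED
 Can not obtain ownership information
"""

ports = listening_ports_by_owner(SAMPLE)
print(59999 in ports.get("rhs.exe", set()))
print(10000 in ports.get("sqlservr.exe", set()))
```

To use it on a real server, pass in the text captured from netstat -ab, which needs an elevated command prompt to show executable names.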