Troubleshoot SQL Server Always On issues

This article helps you resolve the common problem about Always On configuration on SQL Server.

Note

For a guided walk through experience of this article, see Troubleshooting SQL Server Always On Issues.

Original product version:   SQL Server 2012 Enterprise, SQL Server 2014 Enterprise, SQL Server 2016 Enterprise
Original KB number:   10179

Important notes

I need pointers on setting up and configuring Always On Availability groups

If you are looking for documentation on setting up Always On configuration, please see the following documents:

Getting Started with Always On Availability Groups (SQL Server) - The document provides answers to many questions you may have about Availability groups and setup. Following all the steps in this article and reviewing Prerequisites, Restrictions, and Recommendations for Always On Availability Groups (SQL Server) will help prevent many issues that you may run into with setting up and maintaining availability groups in your environment.

Additional resources

If this information isn't helpful, see More information about Always On Availability Groups.

I am having problems configuring Always On Availability groups

Typical configuration problems include Always On Availability Groups are disabled, accounts are incorrectly configured, the database mirroring endpoint doesn't exist, the endpoint is inaccessible (SQL Server Error 1418), network access doesn't exist, and a join database command fails (SQL Server Error 35250). Review the following document for help on troubleshooting these issues:

Troubleshoot Always On Availability Groups Configuration (SQL Server)

Additional link: Fix: Error 41009 when you try to create multiple availability groups

If the issue still exists, see More information about Always On Availability Groups.

I am having issues with Listener configuration (19471, 19476, and other errors)

One of the most common configuration issues customers encounter is availability group listener creation. The errors are similar to the following:

  • Msg 19471, Level 16, State 0, Line 2The WSFC cluster could not bring the Network Name resource with DNS name '' online. The DNS name may have been taken or have a conflict with existing name services, or the WSFC cluster service may not be running or may be inaccessible. Use a different DNS name to resolve name conflicts, or check the WSFC cluster log for more information.

  • Msg 19476, Level 16, State 4, Line 2The attempt to create the network name and IP address for the listener failed. The WSFC service may not be running or may be inaccessible in its current state, or the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and validate the network name and IP address with the network administrator.

The majority of time, listener creation failure resulting in the previous messages are due to a lack of permissions for the Cluster Name Object (CNO) in Active Directory to create and read the listener computer object. For troubleshooting this problem, please review the following articles:

If the issue still exists, see More information about Always On Availability Groups.

Automatic Failover isn't working as expected

If you notice that the automatic failover isn't working as expected either during testing or in production, see: Troubleshooting automatic failover problems in SQL Server 2012 Always On environments.

Improper configuration of Maximum failures in the specified period is one of the leading causes for primary not automatically failing over to the secondary. The default value for this setting is N-1, where N is the number of replicas. For more information, see Failover cluster (group) maximum failures limit.

If the issue still exists, see More information about Always On Availability Groups.

I am having issues connecting to Always On Availability groups

After you configure the availability group listener for an Always On Availability Group in SQL Server 2012, you may be unable to ping the listener or connect to it from an application. You may get an error that's similar to the following:

Sqlcmd: Error: Microsoft SQL Native Client : Login timeout expired.

To troubleshoot this and similar errors, review the following:

More information links:

If the issue still exists, see More information about Always On Availability Groups.

I am having issues configuring Always On Availability groups in my Azure VM (IaaS)

  1. Lot of issues related to Always On occur due to improper configuration of the listener. If you are having connection issues to the listener,

    1. Make sure you read all the limitations of ILB listener and followed all the steps documented in the following article paying particular attention to dependency configuration, IP address, and various other parameters in the PowerShell script.

    2. If unsure, you may want to delete and recreate the listener as per the above document.

  2. If you recently moved your VM to a different service or if the IP addresses changed, you need to update the value of the IP address resource to reflect the new address and you need to recreate the load balanced endpoint for your AG. You can update the IP address using the Get or Set commands as follows:

    Get-ClusterResource "IPResourceName" | Set-ClusterParameter -name Address -value "w.x.y.z"
    

Recommended documents:

If the issue still exists, see More information about Always On Availability Groups.

It takes a long time to failover from primary to secondary or vice-versa

After an automatic failover or a planned manual failover without data loss on an availability group, you may find that the failover time exceeds your recovery time objective (RTO). To troubleshoot the causes and potential resolutions, see Troubleshoot: Availability Group Exceeded RTO.

If the issue still exists, see More information about Always On Availability Groups.

Changes on the Primary Replica are either not reflected on or slow to replicate to the Secondary Replica

You may notice that changes on primary replica are not getting propagated to secondary in a timely manner. To troubleshoot and resolve these problems, try the following:

If the issue still exists, see More information about Always On Availability Groups.

How to manage the size of transaction log for my AG databases

You can reduce the transaction log sizes by configuring regular backups at either primary or secondary servers.

Review the following topics for additional information:

If this information isn't helpful, see More information about Always On Availability Groups.

Primary or Secondary Servers struck in Resolving State or you experience unexpected failovers

If the issue still exists, see More information about Always On Availability Groups.

Not able to bring resources online

Check if the databases are taking a long time to recover by reviewing the messages in the SQL ErrorLog.

If the issue still exists, see More information about Always On Availability Groups.

Frequently asked questions

  1. Is it possible to have two Listeners for one availability group?

    Yes, you can set up multiple listeners for the same availability group. See How to create multiple listeners for same availability group (Goden Yao).

  2. Is it possible to have a separate NIC card for always on traffic and Client connectivity?

    Yes, you can have dedicated NIC card for Always On traffic. See Configure Availability Group to Communicate on a Dedicated Network.

  3. What editions support Always On failover cluster instances?

    This topic in SQL Server Books Online has more information: Editions and Supported Features for SQL Server 2016.

  4. How to recover in case of a failure on all nodes of your cluster?

    See WSFC Disaster Recovery through Forced Quorum (SQL Server).

  5. Where can I find information on support for distributed transactions in AG configurations?

    See Transactions - availability groups and database mirroring.

  6. How to update Always On configurations?

    See Upgrading Always On Availability Group Replica Instances.

  7. How to add TDE (Transparent Data Encryption) enabled database to AG configuration?

    To add TDE enabled DB to AG, see How to configure Always On for a TDE database.

  8. How to configure alerts for checking if the secondary is lagging behind the primary?

    You can use the following script:

    SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server,
    dr_state.database_id AS database_id,
    is_ag_replica_local = CASE
        WHEN ar_state.is_local = 1 THEN N'LOCAL'
        ELSE 'REMOTE'
        END,
    ag_replica_role = CASE
        WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'
        ELSE ar_state.role_desc
        END,
    dr_state.last_hardened_lsn, dr_state.last_hardened_time,
    datediff(s,last_hardened_time, getdate()) AS 'seconds behind primary'
    FROM (( sys.availability_groups AS ag
    JOIN sys.availability_replicas AS ar
        ON ag.group_id = ar.group_id)
    JOIN sys.dm_hadr_availability_replica_states AS ar_state
        ON ar.replica_id = ar_state.replica_id)
    JOIN sys.dm_hadr_database_replica_states dr_state
        ON ag.group_id = dr_state.group_id AND dr_state.replica_id = ar_state.replica_id
    
  9. How to get alerted if the state of the database is other than synchronized?

    You can use the following script:

    SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server,
    dr_state.database_id AS database_id,
    is_ag_replica_local = CASE
        WHEN ar_state.is_local = 1 THEN N'LOCAL'
        ELSE 'REMOTE'
        END,
    ag_replica_role = CASE
        WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'
        ELSE ar_state.role_desc
        END,
    ar_state.connected_state_desc, ar.availability_mode_desc, dr_state.synchronization_state_desc
    FROM (( sys.availability_groups AS ag
    JOIN sys.availability_replicas AS ar
        ON ag.group_id = ar.group_id )
    JOIN sys.dm_hadr_availability_replica_states AS ar_state
        ON ar.replica_id = ar_state.replica_id)
    JOIN sys.dm_hadr_database_replica_states dr_state
        ON ag.group_id = dr_state.group_id AND dr_state.replica_id = ar_state.replica_id
    

    You can also review the following links for additional methods to monitor Always On groups:

More information about Always On Availability Groups