Troubleshoot SQL Server AlwaysOn issues

This article helps you resolve the common problem about AlwaysOn configuration on SQL server.

Note

For a guided walk through experience of this article, see Troubleshooting SQL Server AlwaysOn Issues.

Original product version:   SQL Server 2012 Enterprise, SQL Server 2014 Enterprise, SQL Server 2016 Enterprise
Original KB number:   10179

Important notes

I need pointers on setting up and configuring AlwaysOn Availability groups

If you are looking for documentation on setting up AlwaysOn configuration, please review the following documents:

Getting Started with AlwaysOn Availability Groups (SQL Server) - The document provides answers to many questions you may have related to Availability groups, setup etc. Following all the steps in this doc and reviewing Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (SQL Server) will help prevent many issues that you may run into with setting up and maintaining availability groups in your environment.

Additional resources

If this information is not helpful, see the More information about AlwaysOn Availability Groups.

I am having problems configuring AlwaysOn Availability groups

Typical configuration problems include AlwaysOn Availability Groups are disabled, accounts are incorrectly configured, the database mirroring endpoint does not exist, the endpoint is inaccessible (SQL Server Error 1418), network access does not exist, and a join database command fails (SQL Server Error 35250). Review the following document for help on troubleshooting these issues:

Troubleshoot AlwaysOn Availability Groups Configuration (SQL Server)

Additional links: Fix: Error 41009 when you try to create multiple availability groups

If the issue still exists, see the More information about AlwaysOn Availability Groups.

I am having issues with Listener configuration (19471, 19476 and other errors)

One of the most common configuration issues customers encounter is availability group listener creation. The errors are similar to the following:

  • Msg 19471, Level 16, State 0, Line 2The WSFC cluster could not bring the Network Name resource with DNS name '' online. The DNS name may have been taken or have a conflict with existing name services, or the WSFC cluster service may not be running or may be inaccessible. Use a different DNS name to resolve name conflicts, or check the WSFC cluster log for more information.

  • Msg 19476, Level 16, State 4, Line 2The attempt to create the network name and IP address for the listener failed. The WSFC service may not be running or may be inaccessible in its current state, or the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and validate the network name and IP address with the network administrator.

The majority of time, listener creation failure resulting in the messages above are due to a lack of permissions for the Cluster Name Object (CNO) in Active Directory to create and read the listener computer object. For troubleshooting this problem, please review the following articles:

If the issue still exists, see the More information about AlwaysOn Availability Groups.

Automatic Failover is not working as expected

If you notice that the automatic failover is not working as expected either during testing or in production, see: Troubleshooting automatic failover problems in SQL Server 2012 AlwaysOn environments.

Improper configuration of Maximum failures in the specified period is one of the leading causes for primary not automatically failing over to the secondary. The default value for this setting is N-1, where N is the number of replicas. For more information, see: Failover cluster (group) maximum failures limit.

If the issue still exists, see the More information about AlwaysOn Availability Groups.

I am having issues connecting to AlwaysOn Availability groups

After you configure the availability group listener for an AlwaysOn Availability Group in SQL Server 2012, you may be unable to ping the listener or connect to it from an application. You may get an error that is similar to the following:

Sqlcmd: Error: Microsoft SQL Native Client : Login timeout expired.

To troubleshoot this and similar errors review the following:

More information links:

If the issue still exists, see the More information about AlwaysOn Availability Groups.

I am having issues configuring AlwaysOn Availability groups in my Azure VM (IaaS)

  1. Lot of issues related to AlwaysOn occur due to improper configuration of the listener. If you are having connection issues to the listener,

    1. Ensure you read all the limitations of ILB listener and followed all the steps documented in the following article paying particular attention to dependency configuration, IP address, and various other parameters in the PowerShell script.

    2. If unsure, you may want to delete and recreate the listener as per the above document.

  2. If you recently moved your VM to a different service or if the IP addresses changed, you need to update the value of the IP address resource to reflect the new address and you need to recreate the load balanced endpoint for your AG. You can update the IP address using the Get/Set commands as follows:

    Get-ClusterResource "IPResourceName" | Set-ClusterParameter -name Address -value "w.x.y.z"
    

Recommended docs:

If the issue still exists, see the More information about AlwaysOn Availability Groups.

It takes a long time to failover from primary to secondary or vice-versa

After an automatic failover or a planned manual failover without data loss on an availability group, you may find that the failover time exceeds your recovery time objective (RTO). To troubleshoot the causes and potential resolutions, see: Troubleshoot: Availability Group Exceeded RTO.

If the issue still exists, see the More information about AlwaysOn Availability Groups.

Changes on the Primary Replica are either not reflected on or slow to replicate to the Secondary Replica

You may notice that changes on primary replica are not getting propagated to secondary in a timely manner. To troubleshoot and resolve these problems, try the following:

If the issue still exists, see the More information about AlwaysOn Availability Groups.

How to manage the size of transaction log for my AG databases

You can reduce the transaction log sizes by configuring regular Backups at either primary or secondary servers.

Review the following topics for additional information:

If this information is not helpful, see the More information about AlwaysOn Availability Groups.

Primary or Secondary Servers struck in Resolving State or you experience unexpected failovers

If the issue still exists, see the More information about AlwaysOn Availability Groups.

Not able to bring resources online

Check if the databases are taking a long time to recover by reviewing the messages in the SQL ErrorLog.

If the issue still exists, see the More information about AlwaysOn Availability Groups.

Frequently asked questions

  1. Is it possible to have two Listeners for one availability group?

    Yes, you can set up multiple listeners for the same availability group. See: How to create multiple listeners for same availability group (Goden Yao).

  2. Is it possible to have a separate NIC card for always on traffic and Client connectivity?

    Yes, you can have dedicated NIC card for AlwaysOn traffic. See: Configure Availability Group to Communicate on a Dedicated Network.

  3. What editions support Always On failover cluster instances?

    This topic in SQL Server Books Online has more information: Editions and Supported Features for SQL Server 2016.

  4. How to recover in case of a failure on all nodes of your cluster?

    See: WSFC Disaster Recovery through Forced Quorum (SQL Server).

  5. Where can I find information on support for distributed transactions in AG configurations?

    See: Transactions - availability groups and database mirroring.

  6. How to update AlwaysOn configurations?

    See: Upgrading Always On Availability Group Replica Instances.

  7. How to add TDE (Transparent Data Encryption) enabled database to AG configuration?

    To add TDE enabled DB to AG, see: How to configure Always On for a TDE database.

  8. How to configure alerts for checking if the secondary is lagging behind the primary?

    You can use the following script:

    SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server,
    dr_state.database_id as database_id,
    is_ag_replica_local = CASE
        WHEN ar_state.is_local = 1 THEN N'LOCAL'
        ELSE 'REMOTE'
        END,
    ag_replica_role = CASE
        WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'
        ELSE ar_state.role_desc
        END,
    dr_state.last_hardened_lsn, dr_state.last_hardened_time,
    datediff(s,last_hardened_time, getdate()) as 'seconds behind primary'
    FROM (( sys.availability_groups AS ag
    JOIN sys.availability_replicas AS ar
        ON ag.group_id = ar.group_id)
    JOIN sys.dm_hadr_availability_replica_states AS ar_state
        ON ar.replica_id = ar_state.replica_id)
    JOIN sys.dm_hadr_database_replica_states dr_state
        on ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id
    
  9. How to get alerted if the state of the database is other than synchronized?

    You can use the following script:

    SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server,
    dr_state.database_id as database_id,
    is_ag_replica_local = CASE
        WHEN ar_state.is_local = 1 THEN N'LOCAL'
        ELSE 'REMOTE'
        END,
    ag_replica_role = CASE
        WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'
        ELSE ar_state.role_desc
        END,
    ar_state.connected_state_desc, ar.availability_mode_desc, dr_state.synchronization_state_desc
    FROM (( sys.availability_groups AS ag
    JOIN sys.availability_replicas AS ar
        ON ag.group_id = ar.group_id )
    JOIN sys.dm_hadr_availability_replica_states AS ar_state
        ON ar.replica_id = ar_state.replica_id)
    JOIN sys.dm_hadr_database_replica_states dr_state
        ON ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id
    

    You can also review the following links for additional methods to monitor AlwaysOn groups:

More information about AlwaysOn Availability Groups