Configure RHEL Cluster for SQL Server Availability Group

THIS TOPIC APPLIES TO: noSQL Server on WindowsyesSQL Server on LinuxnoAzure SQL DatabasenoAzure SQL Data WarehousenoParallel Data Warehouse

This document explains how to create a three-node availability group cluster for SQL Server on Red Hat Enterprise Linux. For high availability, an availability group on Linux requires three nodes - see High availability and data protection for availability group configurations. The clustering layer is based on Red Hat Enterprise Linux (RHEL) HA add-on built on top of Pacemaker.

Note

Access to Red Hat full documentation requires a valid subscription.

For more details on cluster configuration, resource agents options, and management, visit RHEL reference documentation.

Note

SQL Server is not as tightly integrated with Pacemaker on Linux as it is with Windows Server failover clustering. A SQL Server instance is not aware of the cluster. Pacemaker provides cluster resource orchestration. Also, the virtual network name is specific to Windows Server failover clustering - there is no equivalent in Pacemaker. Availability group dynamic management views (DMVs) that query cluster information return empty rows on Pacemaker clusters. To create a listener for transparent reconnection after failover, manually register the listener name in DNS with the IP used to create the virtual IP resource.

The following sections walk through the steps to set up a Pacemaker cluster and add an availability group as resource in the cluster for high availability.

Roadmap

The steps to create an availability group on Linux servers for high availability are different from the steps on a Windows Server failover cluster. The following list describes the high-level steps:

  1. Configure SQL Server on the cluster nodes.

  2. Create the availability group.

  3. Configure a cluster resource manager, like Pacemaker. These instructions are in this document.

    The way to configure a cluster resource manager depends on the specific Linux distribution.

    Important

    Production environments require a fencing agent, like STONITH for high availability. The demonstrations in this documentation do not use fencing agents. The demonstrations are for testing and validation only.

    A Linux cluster uses fencing to return the cluster to a known state. The way to configure fencing depends on the distribution and the environment. Currently, fencing is not available in some cloud environments. For more information, see Support Policies for RHEL High Availability Clusters - Virtualization Platforms.

  4. Add the availability group as a resource in the cluster.

Configure high availability for RHEL

To configure high availability for RHEL, enable the high availability subscription and then configure Pacemaker.

Enable the high availability subscription for RHEL

Each node in the cluster must have an appropriate subscription for RHEL and the High Availability Add on. Review the requirements at How to install High Availability cluster packages in Red Hat Enterprise Linux. Follow these steps to configure the subscription and repos:

  1. Register the system.

    sudo subscription-manager register
    

    Provide your user name and password.

  2. List the available pools for registration.

    sudo subscription-manager list --available
    

    From the list of available pools, note the pool ID for the high availability subscription.

  3. Update the following script. Replace <pool id> with the pool ID for high availability from the preceding step. Run the script to attach the subscription.

    sudo subscription-manager attach --pool=<pool id>
    
  4. Enable the repository.

    sudo subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
    

For more information, see Pacemaker – The Open Source, High Availability Cluster.

After you have configured the subscription, complete the following steps to configure Pacemaker:

Configure Pacemaker

After you register the subscription, complete the following steps to configure Pacemaker:

  1. On all cluster nodes, open the Pacemaker firewall ports. To open these ports with firewalld, run the following command:

    sudo firewall-cmd --permanent --add-service=high-availability
    sudo firewall-cmd --reload
    

    If the firewall doesn’t have a built-in high-availability configuration, open the following ports for Pacemaker.

    • TCP: Ports 2224, 3121, 21064
    • UDP: Port 5405
  2. Install Pacemaker packages on all nodes.

    sudo yum install pacemaker pcs fence-agents-all resource-agents
    
  3. Set the password for the default user that is created when installing Pacemaker and Corosync packages. Use the same password on all nodes.

    sudo passwd hacluster
    
  4. To allow nodes to rejoin the cluster after the reboot, enable and start pcsd service and Pacemaker. Run the following command on all nodes.

    sudo systemctl enable pcsd
    sudo systemctl start pcsd
    sudo systemctl enable pacemaker
    
  5. Create the Cluster. To create the cluster, run the following command:

    sudo pcs cluster auth <node1> <node2> <node3> -u hacluster -p <password for hacluster>
    sudo pcs cluster setup --name <clusterName> <node1> <node2> <node3> 
    sudo pcs cluster start --all
    
    Note

    If you previously configured a cluster on the same nodes, you need to use --force option when running pcs cluster setup. This option is equivalent to running pcs cluster destroy. To re-enable pacemaker, run sudo systemctl enable pacemaker.

  6. Install SQL Server resource agent for SQL Server. Run the following commands on all nodes.

    sudo yum install mssql-server-ha
    

After Pacemaker is configured, use pcs to interact with the cluster. Execute all commands on one node from the cluster.

Configure fencing (STONITH)

Pacemaker cluster vendors require STONITH to be enabled and a fencing device configured for a supported cluster setup. STONITH stands for "shoot the other node in the head." When the cluster resource manager cannot determine the state of a node or of a resource on a node, fencing brings the cluster to a known state again.

Resource level fencing ensures that there is no data corruption in case of an outage by configuring a resource. For example, you can use resource level fencing to mark the disk on a node as outdated when the communication link goes down.

Node level fencing ensures that a node does not run any resources. This is done by resetting the node. Pacemaker supports a great variety of fencing devices. Examples include an uninterruptible power supply or management interface cards for servers.

For information about STONITH, and fencing, see the following articles:

Because the node level fencing configuration depends heavily on your environment, disable it for this tutorial (it can be configured later). The following script disables node level fencing:

sudo pcs property set stonith-enabled=false
Important

Disabling STONITH is just for testing purposes. If you plan to use Pacemaker in a production environment, you should plan a STONITH implementation depending on your environment and keep it enabled. RHEL does not provide fencing agents for any cloud environments (including Azure) or Hyper-V. Consequentially, the cluster vendor does not offer support for running production clusters in these environments. We are working on a solution for this gap that will be available in future releases.

Set cluster property start-failure-is-fatal to false

start-failure-is-fatal indicates whether a failure to start a resource on a node prevents further start attempts on that node. When set to false, the cluster decides whether to try starting on the same node again based on the resource's current failure count and migration threshold. After failover occurs, Pacemaker retries starting the availability group resource on the former primary once the SQL instance is available. Pacemaker demotes the replica to secondary and it automatically rejoins the availability group.

To update the property value to false run:

sudo pcs property set start-failure-is-fatal=false
Warning

After an automatic failover, when start-failure-is-fatal = true the resource manager will attempt to start the resource. If it fails on the first attempt, manually run pcs resource cleanup <resourceName> to clean up the resource failure count and reset the configuration.

For information on Pacemaker cluster properties, see Pacemaker Clusters Properties.

Create a SQL Server login for Pacemaker

  1. On all SQL Servers, create a Server login for Pacemaker. The following Transact-SQL creates a login:

    USE [master]
    GO
    CREATE LOGIN [pacemakerLogin] with PASSWORD= N'ComplexP@$$w0rd!'
    
    ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin]
    

    Alternatively, you can set the permissions at a more granular level. The Pacemaker login requires ALTER, CONTROL, and VIEW DEFINITION PERMISSION for managing the availability group as well as VIEW SERVER STATE for the login to be able to run sp_server_diagnostics. For more information, see GRANT Availability Group Permissions (Transact-SQL) and sp_server_diagnostic permissions.

    The following Transact-SQL grants only the required permission to the Pacemaker login. In the statement below 'ag1' is the name of the availability group that will be added as a cluster resource.

    GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::ag1 TO pacemakerLogin
    GRANT VIEW SERVER STATE TO pacemakerLogin
    
  2. On all SQL Servers, save the credentials for the SQL Server login.

    echo 'pacemakerLogin' >> ~/pacemaker-passwd
    echo 'ComplexP@$$w0rd!' >> ~/pacemaker-passwd
    sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd
    sudo chown root:root /var/opt/mssql/secrets/passwd
    sudo chmod 400 /var/opt/mssql/secrets/passwd # Only readable by root
    

Create availability group resource

To create the availability group resource, use pcs resource create command and set the resource properties. The following command creates a ocf:mssql:ag master/slave type resource for availability group with name ag1.

sudo pcs resource create ag_cluster ocf:mssql:ag ag_name=ag1 --master meta notify=true
Note

When you create the resource, and periodically afterwards, the Pacemaker resource agent automatically sets the value of REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT on the availability group based on the availability group's configuration. For example, if the availability group has three synchronous replicas, the agent will set REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT to 1. For details and additional configuration options, see High availability and data protection for availability group configurations.

Create virtual IP resource

To create the virtual IP address resource, run the following command on one node. Use an available static IP address from the network. Replace the IP address between **<10.128.16.240>** with a valid IP address.

sudo pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=**<10.128.16.240>**

There is no virtual server name equivalent in Pacemaker. To use a connection string that points to a string server name instead of an IP address, register the virtual IP resource address and desired virtual server name in DNS. For DR configurations, register the desired virtual server name and IP address with the DNS servers on both primary and DR site.

Add colocation constraint

Almost every decision in a Pacemaker cluster, like choosing where a resource should run, is done by comparing scores. Scores are calculated per resource. The cluster resource manager chooses the node with the highest score for a particular resource. If a node has a negative score for a resource, the resource cannot run on that node.

On a pacemaker cluster, you can manipulate the decisions of the cluster with constraints. Constraints have a score. If a constraint has a score lower than INFINITY, Pacemaker regards it as recommendation. A score of INFINITY is mandatory.

To ensure that primary replica and the virtual ip resources run on the same host, define a colocation constraint with a score of INFINITY. To add the colocation constraint, run the following command on one node.

sudo pcs constraint colocation add virtualip ag_cluster-master INFINITY with-rsc-role=Master

Add ordering constraint

The colocation constraint has an implicit ordering constraint. It moves the virtual IP resource before it moves the availability group resource. By default the sequence of events is:

  1. User issues pcs resource move to the availability group primary from node1 to node2.
  2. The virtual IP resource stops on node 1.
  3. The virtual IP resource starts on node 2.

    Note

    At this point, the IP address temporarily points to node 2 while node 2 is still a pre-failover secondary.

  4. The availability group primary on node 1 is demoted to secondary.

  5. The availability group secondary on node 2 is promoted to primary.

To prevent the IP address from temporarily pointing to the node with the pre-failover secondary, add an ordering constraint.

To add an ordering constraint, run the following command on one node:

sudo pcs constraint order promote ag_cluster-master then start virtualip
Important

After you configure the cluster and add the availability group as a cluster resource, you cannot use Transact-SQL to fail over the availability group resources. SQL Server cluster resources on Linux are not coupled as tightly with the operating system as they are on a Windows Server Failover Cluster (WSFC). SQL Server service is not aware of the presence of the cluster. All orchestration is done through the cluster management tools. In RHEL or Ubuntu use pcs and in SLES use crm tools.

Manually fail over the availability group with pcs. Do not initiate failover with Transact-SQL. For instructions, see Failover.

Next steps

Operate HA availability group