Configure SLES Cluster for SQL Server Availability Group

This guide provides instructions to create a two-node cluster for SQL Server on SUSE Linux Enterprise Server (SLES) 12 SP2. The clustering layer is based on SUSE High Availability Extension (HAE) built on top of Pacemaker.

For more details on cluster configuration, resource agent options, management, best practices, and recommendations, see the SUSE Linux Enterprise High Availability Extension 12 SP2 documentation.

Note

At this point, SQL Server's integration with Pacemaker on Linux is not as coupled as it is with WSFC on Windows. The SQL Server service on Linux is not cluster aware; Pacemaker controls all of the orchestration of the cluster resources as if SQL Server were a standalone instance. Also, the virtual network name is specific to WSFC; there is no equivalent in Pacemaker. On Linux, the Always On Availability Group dynamic management views (DMVs) that query cluster information return empty rows. You can still create a listener and use it for transparent reconnection after failover, but you have to manually register the listener name in the DNS server with the IP address used to create the virtual IP resource (as explained below).
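
The listener name itself is just a DNS entry that points at the virtual IP address. As a minimal sketch, you can map the name with a static hosts entry on each node and client; in production you would normally create an A record on your DNS server instead. The listener name aglistener and the address 10.128.18.200 below are hypothetical placeholders; use the address you later assign to the virtual IP resource.

    # Hypothetical example: map a listener name to the virtual IP address.
    # Replace aglistener and 10.128.18.200 with your own values, or create
    # an equivalent A record on your DNS server.
    echo '10.128.18.200 aglistener' | sudo tee -a /etc/hosts

    # Verify that the name resolves to the virtual IP
    getent hosts aglistener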

Prerequisites

To complete the end-to-end scenario below, you need two machines to deploy the two-node cluster. The steps below outline how to configure these servers.

Set up and configure the operating system on each cluster node

The first step is to configure the operating system on the cluster nodes. For this walkthrough, use SLES 12 SP2 with a valid subscription for the HA add-on.
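
If the HA add-on is not yet activated on a node, you can register it with SUSEConnect. The following is only a sketch; the product identifier shown is an assumption for SLES 12 SP2 on x86_64, so confirm the exact identifier on your system and use your own registration code.

    # List available extensions and the commands to activate them
    sudo SUSEConnect --list-extensions

    # Hypothetical example: activate the High Availability extension
    # (replace the product identifier and registration code with your own values)
    sudo SUSEConnect -p sle-ha/12.2/x86_64 -r <REGISTRATION_CODE>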

Install and configure SQL Server service on each cluster node

  1. Install and set up the SQL Server service on both nodes. For detailed instructions, see Install SQL Server on Linux.

  2. Designate one node as primary and the other as secondary, for purposes of configuration. These terms are used throughout this guide.

  3. Make sure the nodes that are going to be part of the cluster can communicate with each other.

    The following example shows /etc/hosts with additions for two nodes named SLES1 and SLES2.

    127.0.0.1   localhost
    10.128.18.128 SLES1
    10.128.16.77 SLES2
    

    All cluster nodes must be able to access each other via SSH. Tools like hb_report or crm_report (for troubleshooting) and Hawk's History Explorer require passwordless SSH access between the nodes; otherwise they can only collect data from the current node. If you use a non-standard SSH port, use the -X option (see the man page). For example, if your SSH port is 3479, invoke crm_report with:

    crm_report -X "-p 3479" [...]
    

    For additional information, see the SLES Administration Guide - Miscellaneous section.
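
A minimal sketch for enabling passwordless SSH between the two nodes follows. It assumes the root account and the default SSH port; adjust it to your environment and security policies.

    # On SLES1, as root: generate a key pair if one does not already exist
    ssh-keygen -t rsa

    # Copy the public key to the other node (repeat in the opposite direction on SLES2)
    ssh-copy-id root@SLES2

    # Verify that the login no longer prompts for a password
    ssh root@SLES2 hostname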

Create a SQL Server login for Pacemaker

  1. On all SQL Servers, create a server login for Pacemaker (a sqlcmd sketch for running it from the shell follows this procedure). The following Transact-SQL creates the login:

    USE [master]
    GO
    CREATE LOGIN [pacemakerLogin] WITH PASSWORD = N'ComplexP@$$w0rd!'
    
    ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin]
    

    Alternatively, you can set the permissions at a more granular level. The Pacemaker login requires ALTER, CONTROL, and VIEW DEFINITION permissions. For more information, see GRANT Availability Group Permissions (Transact-SQL).

    The following Transact-SQL grants only the required permissions to the Pacemaker login. In the statement below, 'ag1' is the name of the availability group that will be added as a cluster resource.

    GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::ag1 TO pacemakerLogin
    
  2. On all SQL Servers, save the credentials for the SQL Server login.

    echo 'pacemakerLogin' >> ~/pacemaker-passwd
    echo 'ComplexP@$$w0rd!' >> ~/pacemaker-passwd
    sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd
    sudo chown root:root /var/opt/mssql/secrets/passwd
    sudo chmod 400 /var/opt/mssql/secrets/passwd # Only readable by root
    
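One way to run the Transact-SQL from step 1 on each node is through sqlcmd. This is only a sketch; the SA credentials and the script file name are placeholders for your own values.

    # Hypothetical example: run the login script with sqlcmd on each node.
    # create-pacemaker-login.sql contains the Transact-SQL shown in step 1.
    sqlcmd -S localhost -U SA -P '<SAPassword>' -i create-pacemaker-login.sql

    # Confirm the credentials file from step 2 is readable by root only
    sudo ls -l /var/opt/mssql/secrets/passwd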

Configure an Always On Availability Group

On Linux servers, configure the availability group and then configure the cluster resources. To configure the availability group, see Configure Always On Availability Group for SQL Server on Linux.

Install and configure Pacemaker on each cluster node

  1. Install the High Availability extension

    For reference, see Installing SUSE Linux Enterprise Server and High Availability Extension.

  2. Install SQL Server resource agent package on both nodes.

    sudo zypper install mssql-server-ha
    

Set up the first node

Refer to the SLES installation instructions.

  1. Log in as root to the physical or virtual machine you want to use as cluster node.
  2. Start the bootstrap script by executing:

    ha-cluster-init
    

    If NTP has not been configured to start at boot time, a message appears.

    If you decide to continue anyway, the script will automatically generate keys for SSH access and for the Csync2 synchronization tool and start the services needed for both.

  3. To configure the cluster communication layer (Corosync):

    a. Enter a network address to bind to. By default, the script will propose the network address of eth0. Alternatively, enter a different network address, for example the address of bond0.

    b. Enter a multicast address. The script proposes a random address that you can use as default.

    c. Enter a multicast port. The script proposes 5405 as default.

    d. To configure SBD (STONITH Block Device), enter a persistent path to the partition of your block device that you want to use for SBD (one way to find a persistent path is sketched after this procedure). The path must be consistent across all nodes in the cluster. Finally, the script starts the Pacemaker service to bring the one-node cluster online and enables the Hawk2 web management interface. The URL to use for Hawk2 is displayed on the screen.

  4. For details of the setup process, check /var/log/sleha-bootstrap.log. You now have a running one-node cluster. Check the cluster status with crm status:

    crm status
    

    You can also see cluster configuration with crm configure show xml or crm configure show.

  5. The bootstrap procedure creates a Linux user named hacluster with the password linux. Replace the default password with a secure one as soon as possible:

    passwd hacluster
    
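If you plan to use SBD in step 3d, you need a device path that stays stable across reboots and is identical on all nodes. The sketch below only shows one common way to find such a path; the device name in the comment is hypothetical.

    # List persistent identifiers for the shared block device (names will differ)
    ls -l /dev/disk/by-id/

    # A persistent path passed to the bootstrap script typically looks like
    # /dev/disk/by-id/scsi-<device-identifier>   (hypothetical)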

Add nodes to the existing cluster

If you have a cluster running with one or more nodes, add more cluster nodes with the ha-cluster-join bootstrap script. The script only needs access to an existing cluster node and will complete the basic setup on the current machine automatically. Follow the steps below:

If you have configured the existing cluster nodes with the YaST cluster module, make sure the following prerequisites are fulfilled before you run ha-cluster-join:

  • The root user on the existing nodes has SSH keys in place for passwordless login.
  • Csync2 is configured on the existing nodes. For details, refer to Configuring Csync2 with YaST.
  1. Log in as root to the physical or virtual machine that is supposed to join the cluster.
  2. Start the bootstrap script by executing:

    ha-cluster-join
    

    If NTP has not been configured to start at boot time, a message appears.

  3. If you decide to continue anyway, you will be prompted for the IP address of an existing node. Enter the IP address.
  4. If you have not already configured passwordless SSH access between both machines, you will also be prompted for the root password of the existing node. After logging in to the specified node, the script copies the Corosync configuration, configures SSH and Csync2, and brings the current machine online as a new cluster node. It also starts the services needed for Hawk. If you have configured shared storage with OCFS2, it also automatically creates the mountpoint directory for the OCFS2 file system.
  5. Repeat the steps above for all machines you want to add to the cluster.

  6. For details of the process, check /var/log/ha-cluster-bootstrap.log.

  7. Check the cluster status with crm status. If you have successfully added a second node, the output will be similar to the following:

    crm status
    
    2 nodes configured
    1 resource configured
    Online: [ SLES1 SLES2 ]
    Full list of resources:
     admin_addr     (ocf::heartbeat:IPaddr2):       Started SLES1
    
    Note

    admin_addr is the virtual IP cluster resource that was configured during the initial one-node cluster setup.

After adding all nodes, check if you need to adjust the no-quorum-policy in the global cluster options. This is especially important for two-node clusters. For more information, refer to Section 4.1.2, Option no-quorum-policy.
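
For example, on a two-node cluster the surviving node cannot have quorum after the other node fails, so you may need to change the policy from its default. The value below is only an illustration; use the setting recommended for your SLE HA version and fencing configuration.

    # Illustrative only: relax the quorum policy for a two-node cluster
    crm configure property no-quorum-policy=ignore

    # Review the resulting cluster configuration
    crm configure show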

Configure the cluster resources for SQL Server

Refer to the SLES Administration Guide.

Create availability group resource

The following commands create and configure the availability group resource for two replicas of availability group [ag1]. Run the commands on one of the nodes in the cluster:

  1. Run crm configure to open the crm prompt:

    crm configure 
    
  2. In the crm prompt, run the command below to configure the resource properties.

    primitive ag_cluster \
       ocf:mssql:ag \
       params ag_name="ag1" \
       op monitor interval="10" role="Master" \
       op monitor interval="11" role="Slave" 
    ms ms-ag_cluster ag_cluster \
       meta master-max="1" master-node-max="1" clone-max="2" \
       clone-node-max="1" notify="true"
    commit
    

Create virtual IP resource

If you did not create the virtual IP resource when you ran ha-cluster-init, you can create this resource now. The following command creates a virtual IP resource. Replace <**0.0.0.0**> with an available address from your network and <**24**> with the number of bits in the CIDR subnet mask. Run it on one node.

crm configure \
primitive admin_addr \
   ocf:heartbeat:IPaddr2 \
   params ip=<**0.0.0.0**> \
      cidr_netmask=<**24**> \
   op monitor interval="12s"
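
For example, assuming a hypothetical free address 10.128.18.200 on the same /24 network as the cluster nodes, the command would look like this:

    crm configure \
    primitive admin_addr \
       ocf:heartbeat:IPaddr2 \
       params ip=10.128.18.200 \
          cidr_netmask=24 \
       op monitor interval="12s"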

Add colocation constraint

To set a colocation constraint so that the virtual IP runs on the same node as the master, run the following command on one node:

crm configure
colocation vip_on_master inf: \
    admin_addr ms-ag_cluster:Master
commit

Add ordering constraint

The colocation constraint has an implicit ordering constraint. It moves the virtual IP resource before it moves the availability group resource. By default the sequence of events is:

  1. The user issues crm resource migrate to move the availability group master from node1 to node2.
  2. The virtual IP resource stops on node 1.
  3. The virtual IP resource starts on node 2. At this point, the IP address temporarily points to node 2 while node 2 is still a pre-failover secondary.
  4. The availability group master on node 1 is demoted to slave.
  5. The availability group slave on node 2 is promoted to master.

To prevent the IP address from temporarily pointing to the node with the pre-failover secondary, add an ordering constraint. To add an ordering constraint, run the following command on one node:

crm configure \
   order ag_first inf: ms-ag_cluster:promote admin_addr:start

Manual failover

Important

After you configure the cluster and add the availability group as a cluster resource, you cannot use Transact-SQL to fail over the availability group resources. SQL Server cluster resources on Linux are not coupled as tightly with the operating system as they are on a Windows Server Failover Cluster (WSFC). The SQL Server service is not aware of the presence of the cluster. All orchestration is done through the cluster management tools. In SLES, use crm.

Important

If the availability group is a cluster resource, there is a known issue in the current release where manual failover to an asynchronous replica does not work. This will be fixed in an upcoming release. Manual or automatic failover to a synchronous replica will succeed.

Manage failover of the availability group with crm. Do not initiate failover with Transact-SQL. To manually fail over the availability group, run crm resource migrate with the name of the target node. For example, the following command moves the availability group resource to the node sles1:

crm resource migrate ms-ag_cluster sles1
Note

At this time manual failover to an asynchronous replica does not work properly. This will be fixed in a future release.

During a manual move, the migrate command adds a location constraint for the resource to be placed on the new target node. To see the new constraint, run the following command after manually moving the resource:

crm configure show

To remove the constraint run the following command. In the following command ms-ag_cluster is the name of the resource that was moved. Replace this name with the name of your resource:

crm resource clear ms-ag_cluster

Alternatively, you can run the following command to remove the location constraint. In the following command, cli-prefer-ms-ag_cluster is the ID of the constraint. crm configure show returns this ID.

crm configure
delete cli-prefer-ms-ag_cluster 
commit
Note

Automatic failover does not add a location constraint, so no cleanup is necessary.

For more information, see SLES Administration Guide - Resources.

Managing synchronous commit mode

Warning

In some cases, a SQL Server availability group in synchronous commit mode on Linux may be vulnerable to data loss. See the following details on the root cause and the workaround to make sure data loss is avoided. This issue will be fixed in upcoming releases.

Pacemaker notification for Availability Group resource promotion

Before the CTP 1.4 release, the Pacemaker resource agent for availability groups could not know whether a replica marked as SYNCHRONOUS_COMMIT was really up to date or not. It was possible that the replica had stopped synchronizing with the primary without this being detected. Thus the agent could promote an out-of-date replica to primary, which, if successful, would cause data loss.

SQL Server vNext CTP 1.4 adds sequence_number to sys.availability_groups to solve this issue. sequence_number is a monotonically increasing BIGINT that represents how up-to-date the local AG replica is with respect to the rest of the system. Performing failovers, adding/removing replicas and other AG operations update this number. The number is updated on the primary, then pushed to secondaries. Thus a secondary that is up-to-date will have the same number as the primary.

When Pacemaker decides to promote a replica to primary, it first sends a notification to all replicas to extract the sequence number and store it. Next, when Pacemaker actually tries to promote a replica to primary, the replica only promotes itself if its sequence number is the highest of all the sequence numbers and rejects the promote operation otherwise. In this way only the replica with the highest sequence number can be promoted to primary, ensuring no data loss.

Note that this is only guaranteed to work as long as at least one replica available for promotion has the same sequence number as the previous primary. Because the resource agent requires Pacemaker to send notifications to all replicas, the availability group resource needs to be configured with notify=true. For every existing ocf:mssql:ag resource, you need to update the resource configuration with notify=true before updating the mssql-server-ha package to CTP 1.4; otherwise the resource will go into an error state.
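
To inspect this value on a given replica, you can query the new column directly, and you can set notify=true on an existing resource from the crm shell. The following is a sketch; the SA credentials are placeholders, and the resource name ms-ag_cluster matches the example used earlier in this guide.

    # Hypothetical example: compare sequence_number across replicas
    sqlcmd -S localhost -U SA -P '<SAPassword>' \
      -Q "SELECT name, sequence_number FROM sys.availability_groups"

    # One way to add notify=true to an existing multi-state resource
    crm resource meta ms-ag_cluster set notify true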

How to avoid potential for data loss

In the following case, an availability group may still be vulnerable to data loss. In a cluster that includes five nodes, if there is an availability group with replicas on three instances of SQL Server, it is possible for the primary replica and one secondary replica to be ahead of the other secondary replica. When they are ahead, their sequence_number in sys.availability_groups will be higher than on the other replica. In this condition, if the two up-to-date replicas lose connectivity with the rest of the cluster, Pacemaker may see the remaining three nodes as a quorum and allow the latent replica to promote itself to primary. In this situation, there is potential for data loss.

SQL Server vNext introduces a new feature to force a certain number of secondaries to be available before any transactions can be committed on the primary. REQUIRED_COPIES_TO_COMMIT allows you to set the number of secondary replicas that must commit a transaction to their database transaction logs before the transaction can proceed. You can use this option with CREATE AVAILABILITY GROUP or ALTER AVAILABILITY GROUP. See CREATE AVAILABILITY GROUP.

To avoid the potential for data loss described above, set REQUIRED_COPIES_TO_COMMIT to the maximum number of synchronized replicas. Upcoming releases will have built-in logic to handle setting REQUIRED_COPIES_TO_COMMIT automatically. When REQUIRED_COPIES_TO_COMMIT is set, transactions at the primary replica databases wait until the transaction is committed on the required number of synchronous secondary replica database transaction logs. If enough synchronous secondary replicas are not online, transactions stop until communication with sufficient secondary replicas resumes.

The following example sets REQUIRED_COPIES_TO_COMMIT = 2 for an availability group named [ag1].

ALTER AVAILABILITY GROUP [ag1]
SET (REQUIRED_COPIES_TO_COMMIT = 2)

In the example above, if the availability group has two secondary replicas, it will wait until both secondary replicas acknowledge commits. If one becomes unresponsive, transactions will be blocked.

Removing nodes from an existing cluster

If you have a cluster running (with at least two nodes), you can remove individual nodes from the cluster with the ha-cluster-remove bootstrap script. You need to know the IP address or host name of the node you want to remove from the cluster. Follow the steps below:

  1. Log in as root to one of the cluster nodes.
  2. Start the bootstrap script by executing:

    ha-cluster-remove -c IP_ADDR_OR_HOSTNAME
    

    The script enables the sshd service, stops the Pacemaker service on the specified node, and propagates the files to synchronize with Csync2 across the remaining nodes.

    If you specified a host name and the node to remove cannot be contacted (or the host name cannot be resolved), the script will inform you and ask whether to remove the node anyway. If you specified an IP address and the node cannot be contacted, you will be asked to enter the host name and to confirm whether to remove the node anyway.

  3. To remove more nodes, repeat the step above.

  4. For details of the process, check /var/log/ha-cluster-bootstrap.log.

Removing the High Availability Extension software from a machine

To remove the High Availability Extension software from a machine that you no longer need as cluster node, proceed as follows:

  1. Stop the cluster service:

    systemctl stop pacemaker
    
  2. Remove the High Availability Extension add-on:

    zypper rm -t products sle-hae
    

Next steps

Create SQL Server Availability Group