Setting up Pacemaker on Red Hat Enterprise Linux in Azure

Read the following SAP Notes and papers first:

Cluster installation

Pacemaker on RHEL overview

Note

Red Hat doesn't support a software-emulated watchdog. Red Hat doesn't support SBD on cloud platforms. For details, see Support Policies for RHEL High Availability Clusters - sbd and fence_sbd. The only supported fencing mechanism for Pacemaker clusters on Red Hat Enterprise Linux in Azure is the Azure fence agent.

The following items are prefixed with either [A] - applicable to all nodes, [1] - only applicable to node 1 or [2] - only applicable to node 2.

  1. [A] Register

    Register your virtual machines and attach them to a pool that contains repositories for RHEL 7.

    sudo subscription-manager register
    # List the available pools
    sudo subscription-manager list --available --matches '*SAP*'
    sudo subscription-manager attach --pool=<pool id>
    

    Note that by attaching a pool to an Azure Marketplace pay-as-you-go (PAYG) RHEL image, you are effectively billed twice for your RHEL usage: once for the PAYG image, and once for the RHEL entitlement in the pool you attach. To mitigate this, Azure now provides bring-your-own-subscription (BYOS) RHEL images. More information is available here.

  2. [A] Enable RHEL for SAP repos

    In order to install the required packages, enable the following repositories.

    sudo subscription-manager repos --disable "*"
    sudo subscription-manager repos --enable=rhel-7-server-rpms
    sudo subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
    sudo subscription-manager repos --enable=rhel-sap-for-rhel-7-server-rpms
    sudo subscription-manager repos --enable=rhel-ha-for-rhel-7-server-eus-rpms
    
  3. [A] Install RHEL HA Add-On

    sudo yum install -y pcs pacemaker fence-agents-azure-arm nmap-ncat
    

    Important

    We recommend the following versions of the Azure fence agent (or later) so that customers benefit from a faster failover time if a resource stop fails or the cluster nodes can no longer communicate with each other:
    RHEL 7.6: fence-agents-4.2.1-11.el7_6.8
    RHEL 7.5: fence-agents-4.0.11-86.el7_5.8
    RHEL 7.4: fence-agents-4.0.11-66.el7_4.12
    For more information, see Azure VM running as a RHEL High Availability cluster member take a very long time to be fenced, or fencing fails / times-out before the VM shuts down.

    Check the version of the Azure fence agent. If necessary, update it to a version equal to or later than those stated above.

    # Check the version of the Azure fence agent
    sudo yum info fence-agents-azure-arm
    

    Important

    If you need to update the Azure fence agent and you use a custom role, make sure to update the custom role to include the action powerOff. For details, see Create a custom role for the fence agent.

  4. [A] Set up host name resolution

    You can either use a DNS server or modify the /etc/hosts file on all nodes. This example shows how to use the /etc/hosts file. Replace the IP address and the hostname in the following commands. The benefit of using /etc/hosts is that your cluster becomes independent of DNS, which could be a single point of failure too.

    sudo vi /etc/hosts
    

    Insert the following lines into /etc/hosts. Change the IP address and hostname to match your environment.

    # IP address of the first cluster node
    10.0.0.6 prod-cl1-0
    # IP address of the second cluster node
    10.0.0.7 prod-cl1-1
    
  5. [A] Set the hacluster password to the same password on all nodes

    sudo passwd hacluster
    
  6. [A] Add firewall rules for pacemaker

    Add the following firewall rules to allow cluster communication between the cluster nodes.

    sudo firewall-cmd --add-service=high-availability --permanent
    sudo firewall-cmd --add-service=high-availability
    
  7. [A] Enable basic cluster services

    Run the following commands to start the pcsd service, which pcs uses to configure the cluster, and to enable it at boot.

    sudo systemctl start pcsd.service
    sudo systemctl enable pcsd.service
    
  8. [1] Create Pacemaker cluster

    Run the following commands to authenticate the nodes and create the cluster. Set the token to 30000 to allow memory-preserving maintenance. For more information, see this article for Linux.

    sudo pcs cluster auth prod-cl1-0 prod-cl1-1 -u hacluster
    sudo pcs cluster setup --name nw1-azr prod-cl1-0 prod-cl1-1 --token 30000
    sudo pcs cluster start --all
    
    # Run the following command until the status of both nodes is online
    sudo pcs status
    
    # Cluster name: nw1-azr
    # WARNING: no stonith devices and stonith-enabled is not false
    # Stack: corosync
    # Current DC: prod-cl1-1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
    # Last updated: Fri Aug 17 09:18:24 2018
    # Last change: Fri Aug 17 09:17:46 2018 by hacluster via crmd on prod-cl1-1
    #
    # 2 nodes configured
    # 0 resources configured
    #
    # Online: [ prod-cl1-0 prod-cl1-1 ]
    #
    # No resources
    #
    #
    # Daemon Status:
    #   corosync: active/disabled
    #   pacemaker: active/disabled
    #   pcsd: active/enabled
    
  9. [A] Set Expected Votes

    sudo pcs quorum expected-votes 2
    
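The version check in step 3 can be scripted. The following is a minimal sketch, not an official tool: the helper names version_ge and min_fence_version are made up for this example, and it assumes that GNU sort -V orders these version strings the same way RPM does. That holds for the builds listed in step 3 but is not guaranteed for every RPM version string.

```shell
#!/usr/bin/env bash

# Succeeds when version $1 is equal to or later than version $2.
# Assumption: GNU `sort -V` ordering matches RPM ordering for these strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Prints the minimum recommended fence-agents build for a RHEL minor release,
# using the values listed in step 3. Fails for releases not listed there.
min_fence_version() {
    case "$1" in
        7.6) echo "4.2.1-11.el7_6.8" ;;
        7.5) echo "4.0.11-86.el7_5.8" ;;
        7.4) echo "4.0.11-66.el7_4.12" ;;
        *)   return 1 ;;
    esac
}

# Example use on a cluster node (assumes the package is installed):
#   installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' fence-agents-azure-arm)
#   version_ge "$installed" "$(min_fence_version 7.6)" || echo "update fence-agents-azure-arm"
```

If the check fails, update the package with yum as described in step 3.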

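Step 8 asks you to rerun pcs status until both nodes are online. A minimal sketch of how that wait could be automated follows. The function name all_nodes_online is hypothetical, and the parsing assumes the "Online: [ ... ]" line format shown in the sample output of step 8.

```shell
#!/usr/bin/env bash

# Reads `pcs status` output on standard input and succeeds only when every
# node name passed as an argument appears on the "Online: [ ... ]" line.
all_nodes_online() {
    local online node
    # Extract the text between the brackets; a leading "#" or spaces are tolerated.
    online=$(sed -n 's/^[[:space:]#]*Online: \[\(.*\)\][[:space:]]*$/\1/p')
    [ -n "$online" ] || return 1
    for node in "$@"; do
        case " $online " in
            *" $node "*) ;;        # node is listed as online
            *) return 1 ;;         # node is missing from the list
        esac
    done
}

# Example use on node 1 (polls every 5 seconds until both nodes are online):
#   until sudo pcs status | all_nodes_online prod-cl1-0 prod-cl1-1; do sleep 5; done
```
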
Create STONITH device

The STONITH device uses a Service Principal to authorize against Microsoft Azure. Follow these steps to create a Service Principal.

  1. Go to https://portal.azure.com
  2. Open the Azure Active Directory blade
    Go to Properties and write down the Directory ID. This is the tenant ID.
  3. Click App registrations
  4. Click New Registration
  5. Enter a Name, select "Accounts in this organization directory only"
  6. Select Application Type "Web", enter a sign-on URL (for example http://localhost) and click Add
    The sign-on URL is not used and can be any valid URL
  7. Select Certificates and Secrets, then click New client secret
  8. Enter a description for a new key, select "Never expires" and click Add
  9. Write down the Value. It is used as the password for the Service Principal
  10. Select Overview. Write down the Application ID. It is used as the username (login ID in the steps below) of the Service Principal
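
The Directory ID and the Application ID noted in the steps above are GUIDs, and a copy-and-paste slip is easy to miss until the fence agent fails to log in. As an optional sanity check, a small helper like the following could be used before the values are entered into the cluster configuration. The function name is_guid is an assumption made for this sketch; it only checks the format, not whether the ID exists in Azure.

```shell
#!/usr/bin/env bash

# Succeeds when $1 has the 8-4-4-4-12 hexadecimal GUID format.
# This is a format check only; it does not contact Azure.
is_guid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Example use with a value copied from the portal:
#   is_guid "$tenant_id" || echo "tenant ID does not look like a GUID"
```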

[1] Create a custom role for the fence agent

The Service Principal does not have permissions to access your Azure resources by default. You need to give the Service Principal permissions to start and stop (power-off) all virtual machines of the cluster. If you have not already created the custom role, you can create it using PowerShell or Azure CLI.

Use the following content for the input file. You need to adapt the content to your subscriptions; that is, replace c276fc76-9cd4-44c9-99a7-4fd71546436e and e91d47c4-76f3-4271-a796-21b4ecfe3624 with the IDs of your subscriptions. If you only have one subscription, remove the second entry in AssignableScopes.

{
  "Name": "Linux Fence Agent Role",
  "Id": null,
  "IsCustom": true,
  "Description": "Allows to power-off and start virtual machines",
  "Actions": [
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/powerOff/action",
    "Microsoft.Compute/virtualMachines/start/action"
  ],
  "NotActions": [
  ],
  "AssignableScopes": [
    "/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e",
    "/subscriptions/e91d47c4-76f3-4271-a796-21b4ecfe3624"
  ]
}
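
To avoid editing the subscription IDs by hand, the input file can be generated. The sketch below prints the role definition shown above for any number of subscription IDs. The function name fence_role_json and the file name fence-role.json are assumptions for this example. The commented az call assumes that the Azure CLI is installed and logged in, and relies on az role definition create accepting a path to a JSON file for --role-definition.

```shell
#!/usr/bin/env bash

# Prints the custom role definition with one AssignableScopes entry per
# subscription ID passed as an argument. Requires at least one ID.
fence_role_json() {
    [ "$#" -ge 1 ] || return 1
    local scopes
    # One quoted scope per line, comma-separated; drop the final comma.
    scopes=$(printf '    "/subscriptions/%s",\n' "$@")
    scopes=${scopes%,}
    cat <<EOF
{
  "Name": "Linux Fence Agent Role",
  "Id": null,
  "IsCustom": true,
  "Description": "Allows to power-off and start virtual machines",
  "Actions": [
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/powerOff/action",
    "Microsoft.Compute/virtualMachines/start/action"
  ],
  "NotActions": [
  ],
  "AssignableScopes": [
$scopes
  ]
}
EOF
}

# Example use (the subscription ID is a placeholder):
#   fence_role_json "<subscription id>" > fence-role.json
#   az role definition create --role-definition fence-role.json
```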

[A] Assign the custom role to the Service Principal

Assign the custom role "Linux Fence Agent Role" that was created in the previous section to the Service Principal. Do not use the Owner role anymore!

  1. Go to https://portal.azure.com
  2. Open the All resources blade
  3. Select the virtual machine of the first cluster node
  4. Click Access control (IAM)
  5. Click Add role assignment
  6. Select the role "Linux Fence Agent Role"
  7. Enter the name of the application you created above
  8. Click Save

Repeat the steps above for the second cluster node.
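
The same assignment can be scripted instead of using the portal. The following is a sketch under stated assumptions: the Azure CLI is installed and logged in, the helper name vm_scope is invented for this example, and the subscription, resource group, and virtual machine names are placeholders. The helper only builds the resource ID of a virtual machine, which is then passed as the scope of the role assignment.

```shell
#!/usr/bin/env bash

# Prints the Azure resource ID of a virtual machine.
# Arguments: subscription ID, resource group name, virtual machine name.
vm_scope() {
    [ "$#" -eq 3 ] || return 1
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s\n' \
        "$1" "$2" "$3"
}

# Example use, once per cluster node (all values are placeholders):
#   az role assignment create \
#       --assignee "<application id>" \
#       --role "Linux Fence Agent Role" \
#       --scope "$(vm_scope "<subscription id>" "<resource group>" "<vm name>")"
```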

[1] Create the STONITH devices

After you have edited the permissions for the virtual machines, you can configure the STONITH devices in the cluster.


sudo pcs property set stonith-timeout=900

Use the following command to configure the fence device.

Note

Option 'pcmk_host_map' is required in the command only if the RHEL host names and the Azure node names are not identical. Refer to the pcmk_host_map parameter in the command.

sudo pcs stonith create rsc_st_azure fence_azure_arm login="login ID" passwd="password" resourceGroup="resource group" tenantId="tenant ID" subscriptionId="subscription id" pcmk_host_map="prod-cl1-0:10.0.0.6;prod-cl1-1:10.0.0.7" power_timeout=240 pcmk_reboot_timeout=900
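
The rule in the note above can be expressed as a small helper that builds the pcmk_host_map value and prints nothing when every host name already matches its Azure node name. This is only a sketch: the function name build_host_map and its calling convention (alternating host name and mapped name) are assumptions made for this example, and the sample values are the ones used in the command above.

```shell
#!/usr/bin/env bash

# Arguments: alternating RHEL host name and the name it maps to.
# Prints "host:name;host:name" when at least one pair differs,
# and prints nothing when all pairs are identical (no map needed).
build_host_map() {
    local map="" needed=0
    while [ "$#" -ge 2 ]; do
        if [ "$1" != "$2" ]; then
            needed=1
        fi
        map="${map:+$map;}$1:$2"
        shift 2
    done
    if [ "$needed" -eq 1 ]; then
        printf '%s\n' "$map"
    fi
}

# Example use:
#   build_host_map prod-cl1-0 10.0.0.6 prod-cl1-1 10.0.0.7
```

An empty result means the pcmk_host_map option can be left out of the pcs stonith create command.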

[1] Enable the use of a STONITH device

sudo pcs property set stonith-enabled=true

Next steps