Setting up Pacemaker on SUSE Linux Enterprise Server in Azure

There are two options to set up a Pacemaker cluster in Azure. You can either use a fencing agent, which restarts a failed node via the Azure APIs, or you can use an SBD device.

The SBD device requires at least one additional virtual machine that acts as an iSCSI target server and provides an SBD device. These iSCSI target servers can, however, be shared with other Pacemaker clusters. The advantage of using an SBD device is a faster failover time and, if you already use SBD devices on-premises, no change to how you operate the Pacemaker cluster. You can use up to three SBD devices for a Pacemaker cluster to allow an SBD device to become unavailable, for example during OS patching of the iSCSI target server. If you want to use more than one SBD device per Pacemaker cluster, make sure to deploy multiple iSCSI target servers and connect one SBD device from each iSCSI target server. We recommend using either one SBD device or three. Pacemaker can't automatically fence a cluster node if you configure only two SBD devices and one of them is unavailable. If you want to be able to fence when one iSCSI target server is down, you have to use three SBD devices and therefore three iSCSI target servers.

If you don't want to invest in one additional virtual machine, you can also use the Azure Fence agent. The downside is that a failover can take between 10 and 15 minutes if a resource stop fails or if the cluster nodes can no longer communicate with each other.

Pacemaker on SLES overview

Important

When planning and deploying Linux Pacemaker clustered nodes and SBD devices, it is essential for the overall reliability of the cluster configuration that the routing between the VMs involved and the VM(s) hosting the SBD device(s) does not pass through any other devices, such as network virtual appliances (NVAs). Otherwise, issues and maintenance events affecting the NVA can have a negative impact on the stability and reliability of the overall cluster configuration. To avoid such obstacles, don't define NVA routing rules or user-defined routes that send traffic between clustered nodes and SBD devices through NVAs and similar devices.

SBD fencing

Follow these steps if you want to use an SBD device for fencing.

Set up iSCSI target servers

You first need to create the iSCSI target virtual machines. iSCSI target servers can be shared with multiple Pacemaker clusters.

  1. Deploy new SLES 12 SP1 or higher virtual machines and connect to them via ssh. The machines don't need to be large. A virtual machine size like Standard_E2s_v3 or Standard_D2s_v3 is sufficient. Make sure to use Premium storage for the OS disk.
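
    If you prefer to script the deployment, you can create such a VM with the Azure CLI. The following is a minimal sketch; the resource group, VM name, admin user name, and image URN are assumptions that you need to adapt and verify (for example, with az vm image list) for your environment.

    # Minimal sketch (resource names and the image URN are assumptions; adjust to your environment)
    az vm create \
      --resource-group MyResourceGroup \
      --name iscsi-target-0 \
      --image SUSE:sles-sap-12-sp5:gen1:latest \
      --size Standard_D2s_v3 \
      --storage-sku Premium_LRS \
      --admin-username azureuser \
      --generate-ssh-keys
    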

Run the following commands on all iSCSI target virtual machines.

  1. Update SLES

    sudo zypper update
    
  2. Remove packages

    To avoid a known issue with targetcli and SLES 12 SP3, uninstall the following packages. You can ignore errors about packages that cannot be found.

    sudo zypper remove lio-utils python-rtslib python-configshell targetcli
    
  3. Install iSCSI target packages

    sudo zypper install targetcli-fb dbus-1-python
    
  4. Enable the iSCSI target service

    sudo systemctl enable targetcli
    sudo systemctl start targetcli
    

Create iSCSI device on iSCSI target server

Run the following commands on all iSCSI target virtual machines to create the iSCSI disks for the clusters used by your SAP systems. In the following example, SBD devices for multiple clusters are created. It shows you how you would use one iSCSI target server for multiple clusters. The SBD devices are placed on the OS disk. Make sure that you have enough space.

In this example, nfs identifies the NFS cluster, ascsnw1 identifies the ASCS cluster of NW1, and dbnw1 identifies the database cluster of NW1. nfs-0 and nfs-1 are the hostnames of the NFS cluster nodes, nw1-xscs-0 and nw1-xscs-1 are the hostnames of the NW1 ASCS cluster nodes, and nw1-db-0 and nw1-db-1 are the hostnames of the database cluster nodes. Replace them with the hostnames of your cluster nodes and the SID of your SAP system.

# Create the root folder for all SBD devices
sudo mkdir /sbd

# Create the SBD device for the NFS server
sudo targetcli backstores/fileio create sbdnfs /sbd/sbdnfs 50M write_back=false
sudo targetcli iscsi/ create iqn.2006-04.nfs.local:nfs
sudo targetcli iscsi/iqn.2006-04.nfs.local:nfs/tpg1/luns/ create /backstores/fileio/sbdnfs
sudo targetcli iscsi/iqn.2006-04.nfs.local:nfs/tpg1/acls/ create iqn.2006-04.nfs-0.local:nfs-0
sudo targetcli iscsi/iqn.2006-04.nfs.local:nfs/tpg1/acls/ create iqn.2006-04.nfs-1.local:nfs-1

# Create the SBD device for the ASCS server of SAP System NW1
sudo targetcli backstores/fileio create sbdascsnw1 /sbd/sbdascsnw1 50M write_back=false
sudo targetcli iscsi/ create iqn.2006-04.ascsnw1.local:ascsnw1
sudo targetcli iscsi/iqn.2006-04.ascsnw1.local:ascsnw1/tpg1/luns/ create /backstores/fileio/sbdascsnw1
sudo targetcli iscsi/iqn.2006-04.ascsnw1.local:ascsnw1/tpg1/acls/ create iqn.2006-04.nw1-xscs-0.local:nw1-xscs-0
sudo targetcli iscsi/iqn.2006-04.ascsnw1.local:ascsnw1/tpg1/acls/ create iqn.2006-04.nw1-xscs-1.local:nw1-xscs-1

# Create the SBD device for the database cluster of SAP System NW1
sudo targetcli backstores/fileio create sbddbnw1 /sbd/sbddbnw1 50M write_back=false
sudo targetcli iscsi/ create iqn.2006-04.dbnw1.local:dbnw1
sudo targetcli iscsi/iqn.2006-04.dbnw1.local:dbnw1/tpg1/luns/ create /backstores/fileio/sbddbnw1
sudo targetcli iscsi/iqn.2006-04.dbnw1.local:dbnw1/tpg1/acls/ create iqn.2006-04.nw1-db-0.local:nw1-db-0
sudo targetcli iscsi/iqn.2006-04.dbnw1.local:dbnw1/tpg1/acls/ create iqn.2006-04.nw1-db-1.local:nw1-db-1

# save the targetcli changes
sudo targetcli saveconfig

You can check whether everything was set up correctly with the following command:

sudo targetcli ls

o- / .......................................................................................................... [...]
  o- backstores ............................................................................................... [...]
  | o- block ................................................................................... [Storage Objects: 0]
  | o- fileio .................................................................................. [Storage Objects: 3]
  | | o- sbdascsnw1 ................................................ [/sbd/sbdascsnw1 (50.0MiB) write-thru activated]
  | | | o- alua .................................................................................... [ALUA Groups: 1]
  | | |   o- default_tg_pt_gp ........................................................ [ALUA state: Active/optimized]
  | | o- sbddbnw1 .................................................... [/sbd/sbddbnw1 (50.0MiB) write-thru activated]
  | | | o- alua .................................................................................... [ALUA Groups: 1]
  | | |   o- default_tg_pt_gp ........................................................ [ALUA state: Active/optimized]
  | | o- sbdnfs ........................................................ [/sbd/sbdnfs (50.0MiB) write-thru activated]
  | |   o- alua .................................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ........................................................ [ALUA state: Active/optimized]
  | o- pscsi ................................................................................... [Storage Objects: 0]
  | o- ramdisk ................................................................................. [Storage Objects: 0]
  o- iscsi ............................................................................................. [Targets: 3]
  | o- iqn.2006-04.ascsnw1.local:ascsnw1 .................................................................. [TPGs: 1]
  | | o- tpg1 ................................................................................ [no-gen-acls, no-auth]
  | |   o- acls ........................................................................................... [ACLs: 2]
  | |   | o- iqn.2006-04.nw1-xscs-0.local:nw1-xscs-0 ............................................... [Mapped LUNs: 1]
  | |   | | o- mapped_lun0 ............................................................ [lun0 fileio/sbdascsnw1 (rw)]
  | |   | o- iqn.2006-04.nw1-xscs-1.local:nw1-xscs-1 ............................................... [Mapped LUNs: 1]
  | |   |   o- mapped_lun0 ............................................................ [lun0 fileio/sbdascsnw1 (rw)]
  | |   o- luns ........................................................................................... [LUNs: 1]
  | |   | o- lun0 .......................................... [fileio/sbdascsnw1 (/sbd/sbdascsnw1) (default_tg_pt_gp)]
  | |   o- portals ..................................................................................... [Portals: 1]
  | |     o- 0.0.0.0:3260 ...................................................................................... [OK]
  | o- iqn.2006-04.dbnw1.local:dbnw1 ...................................................................... [TPGs: 1]
  | | o- tpg1 ................................................................................ [no-gen-acls, no-auth]
  | |   o- acls ........................................................................................... [ACLs: 2]
  | |   | o- iqn.2006-04.nw1-db-0.local:nw1-db-0 ................................................... [Mapped LUNs: 1]
  | |   | | o- mapped_lun0 .............................................................. [lun0 fileio/sbddbnw1 (rw)]
  | |   | o- iqn.2006-04.nw1-db-1.local:nw1-db-1 ................................................... [Mapped LUNs: 1]
  | |   |   o- mapped_lun0 .............................................................. [lun0 fileio/sbddbnw1 (rw)]
  | |   o- luns ........................................................................................... [LUNs: 1]
  | |   | o- lun0 .............................................. [fileio/sbddbnw1 (/sbd/sbddbnw1) (default_tg_pt_gp)]
  | |   o- portals ..................................................................................... [Portals: 1]
  | |     o- 0.0.0.0:3260 ...................................................................................... [OK]
  | o- iqn.2006-04.nfs.local:nfs .......................................................................... [TPGs: 1]
  |   o- tpg1 ................................................................................ [no-gen-acls, no-auth]
  |     o- acls ........................................................................................... [ACLs: 2]
  |     | o- iqn.2006-04.nfs-0.local:nfs-0 ......................................................... [Mapped LUNs: 1]
  |     | | o- mapped_lun0 ................................................................ [lun0 fileio/sbdnfs (rw)]
  |     | o- iqn.2006-04.nfs-1.local:nfs-1 ......................................................... [Mapped LUNs: 1]
  |     |   o- mapped_lun0 ................................................................ [lun0 fileio/sbdnfs (rw)]
  |     o- luns ........................................................................................... [LUNs: 1]
  |     | o- lun0 .................................................. [fileio/sbdnfs (/sbd/sbdnfs) (default_tg_pt_gp)]
  |     o- portals ..................................................................................... [Portals: 1]
  |       o- 0.0.0.0:3260 ...................................................................................... [OK]
  o- loopback .......................................................................................... [Targets: 0]
  o- vhost ............................................................................................. [Targets: 0]
  o- xen-pvscsi ........................................................................................ [Targets: 0]

Set up SBD device

From the cluster nodes, connect to the iSCSI device that you created in the previous step. Run the following commands on the nodes of the new cluster you want to create. The following items are prefixed with either [A] (applicable to all nodes), [1] (only applicable to node 1), or [2] (only applicable to node 2).

  1. [A] Connect to the iSCSI devices

    First, enable the iSCSI and SBD services.

    sudo systemctl enable iscsid
    sudo systemctl enable iscsi
    sudo systemctl enable sbd
    
  2. [1] Change the initiator name on the first node

    sudo vi /etc/iscsi/initiatorname.iscsi
    

    Change the content of the file to match the ACLs you used when creating the iSCSI device on the iSCSI target server, for example for the NFS server.

    InitiatorName=iqn.2006-04.nfs-0.local:nfs-0
    
  3. [2] Change the initiator name on the second node

    sudo vi /etc/iscsi/initiatorname.iscsi
    

    Change the content of the file to match the ACLs you used when creating the iSCSI device on the iSCSI target server

    InitiatorName=iqn.2006-04.nfs-1.local:nfs-1
    
  4. [A] Restart the iSCSI service

    Now restart the iSCSI service to apply the change

    sudo systemctl restart iscsid
    sudo systemctl restart iscsi
    

    Connect the iSCSI devices. In the example below, 10.0.0.17 is the IP address of the iSCSI target server and 3260 is the default port. iqn.2006-04.nfs.local:nfs is one of the target names that is listed when you run the first command below (iscsiadm -m discovery).

    sudo iscsiadm -m discovery --type=st --portal=10.0.0.17:3260   
    sudo iscsiadm -m node -T iqn.2006-04.nfs.local:nfs --login --portal=10.0.0.17:3260
    sudo iscsiadm -m node -p 10.0.0.17:3260 --op=update --name=node.startup --value=automatic
    
    # If you want to use multiple SBD devices, also connect to the second iSCSI target server
    sudo iscsiadm -m discovery --type=st --portal=10.0.0.18:3260   
    sudo iscsiadm -m node -T iqn.2006-04.nfs.local:nfs --login --portal=10.0.0.18:3260
    sudo iscsiadm -m node -p 10.0.0.18:3260 --op=update --name=node.startup --value=automatic
    
    # If you want to use multiple SBD devices, also connect to the third iSCSI target server
    sudo iscsiadm -m discovery --type=st --portal=10.0.0.19:3260   
    sudo iscsiadm -m node -T iqn.2006-04.nfs.local:nfs --login --portal=10.0.0.19:3260
    sudo iscsiadm -m node -p 10.0.0.19:3260 --op=update --name=node.startup --value=automatic
    

    Make sure that the iSCSI devices are available and note down the device names (/dev/sdd, /dev/sde, and /dev/sdf in the following example).

    lsscsi
    
    # [2:0:0:0]    disk    Msft     Virtual Disk     1.0   /dev/sda
    # [3:0:1:0]    disk    Msft     Virtual Disk     1.0   /dev/sdb
    # [5:0:0:0]    disk    Msft     Virtual Disk     1.0   /dev/sdc
    # [5:0:0:1]    disk    Msft     Virtual Disk     1.0   /dev/sdd
    # [6:0:0:0]    disk    LIO-ORG  sbdnfs           4.0   /dev/sdd
    # [7:0:0:0]    disk    LIO-ORG  sbdnfs           4.0   /dev/sde
    # [8:0:0:0]    disk    LIO-ORG  sbdnfs           4.0   /dev/sdf
    

    Now, retrieve the IDs of the iSCSI devices.

    ls -l /dev/disk/by-id/scsi-* | grep sdd
    
    # lrwxrwxrwx 1 root root  9 Aug  9 13:20 /dev/disk/by-id/scsi-1LIO-ORG_sbdnfs:afb0ba8d-3a3c-413b-8cc2-cca03e63ef42 -> ../../sdd
    # lrwxrwxrwx 1 root root  9 Aug  9 13:20 /dev/disk/by-id/scsi-36001405afb0ba8d3a3c413b8cc2cca03 -> ../../sdd
    # lrwxrwxrwx 1 root root  9 Aug  9 13:20 /dev/disk/by-id/scsi-SLIO-ORG_sbdnfs_afb0ba8d-3a3c-413b-8cc2-cca03e63ef42 -> ../../sdd
    
    ls -l /dev/disk/by-id/scsi-* | grep sde
    
    # lrwxrwxrwx 1 root root  9 Feb  7 12:39 /dev/disk/by-id/scsi-1LIO-ORG_cl1:3fe4da37-1a5a-4bb6-9a41-9a4df57770e4 -> ../../sde
    # lrwxrwxrwx 1 root root  9 Feb  7 12:39 /dev/disk/by-id/scsi-360014053fe4da371a5a4bb69a419a4df -> ../../sde
    # lrwxrwxrwx 1 root root  9 Feb  7 12:39 /dev/disk/by-id/scsi-SLIO-ORG_cl1_3fe4da37-1a5a-4bb6-9a41-9a4df57770e4 -> ../../sde
    
    ls -l /dev/disk/by-id/scsi-* | grep sdf
    
    # lrwxrwxrwx 1 root root  9 Aug  9 13:32 /dev/disk/by-id/scsi-1LIO-ORG_sbdnfs:f88f30e7-c968-4678-bc87-fe7bfcbdb625 -> ../../sdf
    # lrwxrwxrwx 1 root root  9 Aug  9 13:32 /dev/disk/by-id/scsi-36001405f88f30e7c9684678bc87fe7bf -> ../../sdf
    # lrwxrwxrwx 1 root root  9 Aug  9 13:32 /dev/disk/by-id/scsi-SLIO-ORG_sbdnfs_f88f30e7-c968-4678-bc87-fe7bfcbdb625 -> ../../sdf
    

    The command lists three device IDs for every SBD device. We recommend using the ID that starts with scsi-3. In the example above, these are:

    • /dev/disk/by-id/scsi-36001405afb0ba8d3a3c413b8cc2cca03
    • /dev/disk/by-id/scsi-360014053fe4da371a5a4bb69a419a4df
    • /dev/disk/by-id/scsi-36001405f88f30e7c9684678bc87fe7bf
  5. [1] Create the SBD device

    Use the device ID of the iSCSI devices to create the new SBD devices on the first cluster node.

    sudo sbd -d /dev/disk/by-id/scsi-36001405afb0ba8d3a3c413b8cc2cca03 -1 60 -4 120 create
    
    # Also create the second and third SBD devices if you want to use more than one.
    sudo sbd -d /dev/disk/by-id/scsi-360014053fe4da371a5a4bb69a419a4df -1 60 -4 120 create
    sudo sbd -d /dev/disk/by-id/scsi-36001405f88f30e7c9684678bc87fe7bf -1 60 -4 120 create
    
  6. [A] Adapt the SBD config

    Open the SBD config file

    sudo vi /etc/sysconfig/sbd
    

    Change the SBD device property, enable the Pacemaker integration, and change the start mode of SBD.

    [...]
    SBD_DEVICE="/dev/disk/by-id/scsi-36001405afb0ba8d3a3c413b8cc2cca03;/dev/disk/by-id/scsi-360014053fe4da371a5a4bb69a419a4df;/dev/disk/by-id/scsi-36001405f88f30e7c9684678bc87fe7bf"
    [...]
    SBD_PACEMAKER="yes"
    [...]
    SBD_STARTMODE="always"
    [...]
    SBD_WATCHDOG="yes"
    

    Create the softdog configuration file

    echo softdog | sudo tee /etc/modules-load.d/softdog.conf
    

    Now load the module

    sudo modprobe -v softdog
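    
    # Optional check (assumption, not part of the original steps): verify that the softdog module is loaded
    lsmod | grep softdog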
    

Cluster installation

The following items are prefixed with either [A] (applicable to all nodes), [1] (only applicable to node 1), or [2] (only applicable to node 2).

  1. [A] Update SLES

    sudo zypper update
    
  2. [A] Install the component needed for cluster resources

    sudo zypper in socat
    
  3. [A] Configure the operating system

    In some cases, Pacemaker creates many processes and thereby exhausts the allowed number of processes. In such a case, a heartbeat between the cluster nodes might fail and lead to failover of your resources. We recommend increasing the maximum allowed processes by setting the following parameter.

    # Edit the configuration file
    sudo vi /etc/systemd/system.conf
    
    # Change the DefaultTasksMax
    #DefaultTasksMax=512
    DefaultTasksMax=4096
    
    #and to activate this setting
    sudo systemctl daemon-reload
    
    # test if the change was successful
    sudo systemctl --no-pager show | grep DefaultTasksMax
    

    Reduce the size of the dirty cache. For more information, see Low write performance on SLES 11/12 servers with large RAM.

    sudo vi /etc/sysctl.conf
    
    # Change/set the following settings
    vm.dirty_bytes = 629145600
    vm.dirty_background_bytes = 314572800
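    
    # Optionally apply the new values immediately without a reboot (assumption, not part of the original steps)
    sudo sysctl -p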
    
  4. [A] Configure cloud-netconfig-azure for HA Cluster

    Change the configuration file for the network interface as shown below to prevent the cloud network plugin from removing the virtual IP address (Pacemaker must control the VIP assignment). For more information, see SUSE KB 7023633.

    # Edit the configuration file
    sudo vi /etc/sysconfig/network/ifcfg-eth0 
    
    # Change CLOUD_NETCONFIG_MANAGE
    # CLOUD_NETCONFIG_MANAGE="yes"
    CLOUD_NETCONFIG_MANAGE="no"
    
  5. [1] Enable ssh access

    sudo ssh-keygen
    
    # Enter file in which to save the key (/root/.ssh/id_rsa): -> Press ENTER
    # Enter passphrase (empty for no passphrase): -> Press ENTER
    # Enter same passphrase again: -> Press ENTER
    
    # copy the public key
    sudo cat /root/.ssh/id_rsa.pub
    
  6. [2] Enable ssh access

    
    sudo ssh-keygen
    
    # Enter file in which to save the key (/root/.ssh/id_rsa): -> Press ENTER
    # Enter passphrase (empty for no passphrase): -> Press ENTER
    # Enter same passphrase again: -> Press ENTER
    
    # insert the public key you copied in the last step into the authorized keys file on the second server
    sudo vi /root/.ssh/authorized_keys   
    
    # copy the public key
    sudo cat /root/.ssh/id_rsa.pub
    
  7. [1] Enable ssh access

    # insert the public key you copied in the last step into the authorized keys file on the first server
    sudo vi /root/.ssh/authorized_keys
    
  8. [A] Install Fence agents

    sudo zypper install fence-agents
    

    Important

    If you are using SUSE Linux Enterprise Server for SAP 15, you need to activate an additional module and install an additional package, which are prerequisites for using the Azure Fence Agent. To learn more about SUSE modules and extensions, see Modules and Extensions Explained. Follow the instructions below to install the Azure Python SDK.

    The following instructions on how to install the Azure Python SDK are only applicable to SUSE Linux Enterprise Server for SAP 15.

    • If you are using a Bring-Your-Own-Subscription image, follow these instructions:

     #Activate module PackageHub/15/x86_64
     sudo SUSEConnect -p PackageHub/15/x86_64
     #Install Azure Python SDK
     sudo zypper in python3-azure-sdk

    • If you are using a Pay-As-You-Go image, follow these instructions:

     #Add the PackageHub repository
     zypper ar https://download.opensuse.org/repositories/openSUSE:/Backports:/SLE-15/standard/ SLE15-PackageHub
     #Install Azure Python SDK
     sudo zypper in python3-azure-sdk
     
  9. [A] Set up host name resolution

    You can either use a DNS server or modify /etc/hosts on all nodes. This example shows how to use the /etc/hosts file. Replace the IP addresses and the hostnames in the following commands. The benefit of using /etc/hosts is that your cluster becomes independent of DNS, which could be a single point of failure too.

    sudo vi /etc/hosts
    

    Insert the following lines into /etc/hosts. Change the IP addresses and hostnames to match your environment.

    # IP address of the first cluster node
    10.0.0.6 prod-cl1-0
    # IP address of the second cluster node
    10.0.0.7 prod-cl1-1
    
  10. [1] Install Cluster

    sudo ha-cluster-init -u
    
    # ! NTP is not configured to start at system boot.
    # Do you want to continue anyway (y/n)? y
    # /root/.ssh/id_rsa already exists - overwrite (y/n)? n
    # Address for ring0 [10.0.0.6] Press ENTER
    # Port for ring0 [5405] Press ENTER
    # SBD is already configured to use /dev/disk/by-id/scsi-36001405639245768818458b930abdf69;/dev/disk/by-id/scsi-36001405afb0ba8d3a3c413b8cc2cca03;/dev/disk/by-id/scsi-36001405f88f30e7c9684678bc87fe7bf - overwrite (y/n)? n
    # Do you wish to configure an administration IP (y/n)? n
    
  11. [2] Add node to cluster

    sudo ha-cluster-join
    
    # ! NTP is not configured to start at system boot.
    # Do you want to continue anyway (y/n)? y
    # IP address or hostname of existing node (e.g.: 192.168.1.1) []10.0.0.6
    # /root/.ssh/id_rsa already exists - overwrite (y/n)? n
    
  12. [A] Change hacluster password to the same password

    sudo passwd hacluster
    
  13. [A] Adjust corosync settings.

    sudo vi /etc/corosync/corosync.conf
    

    Add the following content to the file if the values are not there or are different. Make sure to change the token to 30000 to allow memory-preserving maintenance. For more information, see the corresponding article for Linux or Windows.

    [...]
      token:          30000
      token_retransmits_before_loss_const: 10
      join:           60
      consensus:      36000
      max_messages:   20
    
      interface { 
         [...] 
      }
      transport:      udpu
    } 
    nodelist {
      node {
       ring0_addr:10.0.0.6
      }
      node {
       ring0_addr:10.0.0.7
      } 
    }
    logging {
      [...]
    }
    quorum {
         # Enable and configure quorum subsystem (default: off)
         # see also corosync.conf.5 and votequorum.5
         provider: corosync_votequorum
         expected_votes: 2
         two_node: 1
    }
    

    Then restart the corosync service

    sudo service corosync restart
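    
    # Optional check (assumption, not part of the original steps): confirm the runtime token value used by corosync
    sudo corosync-cmapctl | grep totem.token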
    

Create Azure Fence agent STONITH device

The STONITH device uses a Service Principal to authorize against Microsoft Azure. Follow these steps to create a Service Principal.

  1. Go to https://portal.azure.com
  2. Open the Azure Active Directory blade
    Go to Properties and write down the Directory ID. This is the tenant ID.
  3. Click App registrations
  4. Click New Registration
  5. Enter a Name, select "Accounts in this organizational directory only"
  6. Select Application Type "Web", enter a sign-on URL (for example http://localhost) and click Add
    The sign-on URL is not used and can be any valid URL
  7. Select Certificates and Secrets, then click New client secret
  8. Enter a description for a new key, select "Never expires" and click Add
  9. Write down the Value. It is used as the password for the Service Principal
  10. Select Overview. Write down the Application ID. It is used as the username (login ID in the steps below) of the Service Principal
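
If you prefer the Azure CLI over the portal, a Service Principal can also be created with a command like the following sketch. The display name is an assumption; the output contains the appId (login), password, and tenant values referenced below. Depending on your CLI version, a default role assignment may be created; the custom role is assigned in a later step, so remove any default assignment you don't want.

# Hedged sketch: create the Service Principal with the Azure CLI (the display name is an assumption)
az ad sp create-for-rbac --name "pacemaker-fence-agent"
# Note down appId (login), password, and tenant from the JSON output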

[1] Create a custom role for the fence agent

The Service Principal doesn't have permissions to access your Azure resources by default. You need to give the Service Principal permissions to start and stop (deallocate) all virtual machines of the cluster. If you did not already create the custom role, you can create it using PowerShell or the Azure CLI.

Use the following content for the input file. You need to adapt the content to your subscriptions, that is, replace c276fc76-9cd4-44c9-99a7-4fd71546436e and e91d47c4-76f3-4271-a796-21b4ecfe3624 with the IDs of your subscriptions. If you only have one subscription, remove the second entry in AssignableScopes.

{
  "Name": "Linux Fence Agent Role",
  "Id": null,
  "IsCustom": true,
  "Description": "Allows to deallocate and start virtual machines",
  "Actions": [
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/deallocate/action",
    "Microsoft.Compute/virtualMachines/start/action"
  ],
  "NotActions": [
  ],
  "AssignableScopes": [
    "/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e",
    "/subscriptions/e91d47c4-76f3-4271-a796-21b4ecfe3624"
  ]
}
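
If you'd rather create the role with the Azure CLI, save the JSON above to a file and create the role definition from it, for example as follows (the file name is an assumption):

# Hedged sketch: create the custom role from the JSON definition (file name is an assumption)
az role definition create --role-definition @fence-agent-role.json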

[A] Assign the custom role to the Service Principal

Assign the custom role "Linux Fence Agent Role" that was created in the last section to the Service Principal. Don't use the Owner role anymore!

  1. Go to https://portal.azure.com
  2. Open the All resources blade
  3. Select the virtual machine of the first cluster node
  4. Click Access control (IAM)
  5. Click Add role assignment
  6. Select the role "Linux Fence Agent Role"
  7. Enter the name of the application you created above
  8. Click Save

Repeat the steps above for the second cluster node.
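
The same assignment can also be done with the Azure CLI. A sketch, assuming the application (client) ID of your Service Principal and the example VM names prod-cl1-0 and prod-cl1-1; replace the placeholders with your own values:

# Hedged sketch: assign the custom role to the Service Principal for each cluster node VM
az role assignment create --assignee "<appId>" --role "Linux Fence Agent Role" \
   --scope "/subscriptions/<subscription ID>/resourceGroups/<resource group>/providers/Microsoft.Compute/virtualMachines/prod-cl1-0"
az role assignment create --assignee "<appId>" --role "Linux Fence Agent Role" \
   --scope "/subscriptions/<subscription ID>/resourceGroups/<resource group>/providers/Microsoft.Compute/virtualMachines/prod-cl1-1"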

[1] Create the STONITH devices

After you have edited the permissions for the virtual machines, you can configure the STONITH devices in the cluster.

# replace the placeholder values with your subscription ID, resource group, tenant ID, service principal login ID, and password
sudo crm configure primitive rsc_st_azure stonith:fence_azure_arm \
   params subscriptionId="subscription ID" resourceGroup="resource group" tenantId="tenant ID" login="login ID" passwd="password"

sudo crm configure property stonith-timeout=900
sudo crm configure property stonith-enabled=true
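
To confirm that the fencing resource was created correctly, you can inspect its definition and check the cluster status, for example:

# Show the fencing resource definition and the overall cluster state
sudo crm configure show rsc_st_azure
sudo crm status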

Default Pacemaker configuration for SBD

  1. [1] Enable the use of a STONITH device and set the fence delay

    sudo crm configure property stonith-timeout=144
    sudo crm configure property stonith-enabled=true
    
    # List the resources to find the name of the SBD device
    sudo crm resource list
    sudo crm resource stop stonith-sbd
    sudo crm configure delete stonith-sbd
    sudo crm configure primitive stonith-sbd stonith:external/sbd \
       params pcmk_delay_max="15" \
       op monitor interval="15" timeout="15"
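    
    To verify that a cluster node can reach the SBD device, you can dump its header and list the message slots as a quick check (this check is not part of the original steps). The device ID below is the example ID from earlier; replace it with one of your own device IDs.
    
    sudo sbd -d /dev/disk/by-id/scsi-36001405afb0ba8d3a3c413b8cc2cca03 dump
    sudo sbd -d /dev/disk/by-id/scsi-36001405afb0ba8d3a3c413b8cc2cca03 list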

Pacemaker configuration for Azure scheduled events

Azure offers scheduled events. Scheduled events are provided via the metadata service and allow time for the application to prepare for events like VM shutdown or VM redeployment. The resource agent azure-events monitors for scheduled Azure events. If events are detected, the agent attempts to stop all resources on the impacted VM and move them to another node in the cluster. To achieve that, additional Pacemaker resources must be configured.

  1. [A] Install the azure-events agent.

    sudo zypper install resource-agents
    
  2. [1] Configure the resources in Pacemaker.

    #Place the cluster in maintenance mode
    sudo crm configure property maintenance-mode=true
    
    #Create Pacemaker resources for the Azure agent
    sudo crm configure primitive rsc_azure-events ocf:heartbeat:azure-events op monitor interval=10s
    sudo crm configure clone cln_azure-events rsc_azure-events
    
    #Take the cluster out of maintenance mode
    sudo crm configure property maintenance-mode=false
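    
    To check that the azure-events clone resource is running on both nodes, and optionally to look at the scheduled events endpoint the agent polls, you can run commands like the following. The metadata API version shown is an assumption; adjust it if needed.
    
    # Verify that the azure-events clone is started on all nodes
    sudo crm status
    
    # Optional: query the scheduled events endpoint directly (API version is an assumption)
    curl -s -H "Metadata:true" "http://169.254.169.254/metadata/scheduledevents?api-version=2017-11-01"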

Note

After you configure the Pacemaker resources for the azure-events agent, when you place the cluster in or out of maintenance mode, you may get warning messages like:
WARNING: cib-bootstrap-options: unknown attribute 'hostName_ hostname'
WARNING: cib-bootstrap-options: unknown attribute 'azure-events_globalPullState'
WARNING: cib-bootstrap-options: unknown attribute 'hostName_ hostname'
These warning messages can be ignored.

Next steps