High availability of IBM Db2 LUW on Azure VMs on Red Hat Enterprise Linux Server

IBM Db2 for Linux, UNIX, and Windows (LUW) in high availability and disaster recovery (HADR) configuration consists of one node that runs a primary database instance and at least one node that runs a secondary database instance. Changes to the primary database instance are replicated to a secondary database instance synchronously or asynchronously, depending on your configuration.

Note

This article contains references to the terms master and slave, terms that Microsoft no longer uses. When these terms are removed from the software, we’ll remove them from this article.

This article describes how to deploy and configure the Azure virtual machines (VMs), install the cluster framework, and install the IBM Db2 LUW with HADR configuration.

The article doesn't cover how to install and configure IBM Db2 LUW with HADR or SAP software installation. To help you accomplish these tasks, we provide references to SAP and IBM installation manuals. This article focuses on parts that are specific to the Azure environment.

The supported IBM Db2 versions are 10.5 and later, as documented in SAP note 1928533.

Before you begin an installation, see the following SAP notes and documentation:

SAP note Description
1928533 SAP applications on Azure: Supported products and Azure VM types
2015553 SAP on Azure: Support prerequisites
2178632 Key monitoring metrics for SAP on Azure
2191498 SAP on Linux with Azure: Enhanced monitoring
2243692 Linux on Azure (IaaS) VM: SAP license issues
2002167 Red Hat Enterprise Linux 7.x: Installation and Upgrade
2694118 Red Hat Enterprise Linux HA Add-On on Azure
1999351 Troubleshooting enhanced Azure monitoring for SAP
2233094 DB6: SAP applications on Azure that use IBM Db2 for Linux, UNIX, and Windows - additional information
1612105 DB6: FAQ on Db2 with HADR
Documentation
SAP Community Wiki: Has all of the required SAP Notes for Linux
Azure Virtual Machines planning and implementation for SAP on Linux guide
Azure Virtual Machines deployment for SAP on Linux (this article)
Azure Virtual Machines database management system(DBMS) deployment for SAP on Linux guide
SAP workload on Azure planning and deployment checklist
Overview of the High Availability Add-On for Red Hat Enterprise Linux 7
High Availability Add-On Administration
High Availability Add-On Reference
Support Policies for RHEL High Availability Clusters - Microsoft Azure Virtual Machines as Cluster Members
Installing and Configuring a Red Hat Enterprise Linux 7.4 (and later) High-Availability Cluster on Microsoft Azure
IBM Db2 Azure Virtual Machines DBMS deployment for SAP workload
IBM Db2 HADR 11.1
IBM Db2 HADR 10.5
Support Policy for RHEL High Availability Clusters - Management of IBM Db2 for Linux, Unix, and Windows in a Cluster

Overview

To achieve high availability, IBM Db2 LUW with HADR is installed on at least two Azure virtual machines, which are deployed in an Azure availability set or across Azure Availability Zones.

The following graphics display a setup of two database server Azure VMs. Both database server Azure VMs have their own storage attached and are up and running. In HADR, one database instance in one of the Azure VMs has the role of the primary instance. All clients are connected to primary instance. All changes in database transactions are persisted locally in the Db2 transaction log. As the transaction log records are persisted locally, the records are transferred via TCP/IP to the database instance on the second database server, the standby server, or standby instance. The standby instance updates the local database by rolling forward the transferred transaction log records. In this way, the standby server is kept in sync with the primary server.

HADR is only a replication functionality. It has no failure detection and no automatic takeover or failover facilities. A takeover or transfer to the standby server must be initiated manually by a database administrator. To achieve an automatic takeover and failure detection, you can use the Linux Pacemaker clustering feature. Pacemaker monitors the two database server instances. When the primary database server instance crashes, Pacemaker initiates an automatic HADR takeover by the standby server. Pacemaker also ensures that the virtual IP address is assigned to the new primary server.

IBM Db2 high availability overview

To have SAP application servers connect to primary database, you need a virtual host name and a virtual IP address. In the event of a failover, the SAP application servers will connect to new primary database instance. In an Azure environment, an Azure load balancer is required to use a virtual IP address in the way that's required for HADR of IBM Db2.

To help you fully understand how IBM Db2 LUW with HADR and Pacemaker fits into a highly available SAP system setup, the following image presents an overview of a highly available setup of an SAP system based on IBM Db2 database. This article covers only IBM Db2, but it provides references to other articles about how to set up other components of an SAP system.

IBM DB2 high availability full environment overview

High-level overview of the required steps

To deploy an IBM Db2 configuration, you need to follow these steps:

  • Plan your environment.
  • Deploy the VMs.
  • Update RHEL Linux and configure file systems.
  • Install and configure Pacemaker.
  • Setup glusterfs cluster or Azure NetApp Files
  • Install ASCS/ERS on a separate cluster.
  • Install IBM Db2 database with Distributed/High Availability option (SWPM).
  • Install and create a secondary database node and instance, and configure HADR.
  • Confirm that HADR is working.
  • Apply the Pacemaker configuration to control IBM Db2.
  • Configure Azure Load Balancer.
  • Install primary and dialog application servers.
  • Check and adapt the configuration of SAP application servers.
  • Perform failover and takeover tests.

Plan Azure infrastructure for hosting IBM Db2 LUW with HADR

Complete the planning process before you execute the deployment. Planning builds the foundation for deploying a configuration of Db2 with HADR in Azure. Key elements that need to be part of planning for IMB Db2 LUW (database part of SAP environment) are listed in the following table:

Topic Short description
Define Azure resource groups Resource groups where you deploy VM, VNet, Azure Load Balancer, and other resources. Can be existing or new.
Virtual network / Subnet definition Where VMs for IBM Db2 and Azure Load Balancer are being deployed. Can be existing or newly created.
Virtual machines hosting IBM Db2 LUW VM size, storage, networking, IP address.
Virtual host name and virtual IP for IBM Db2 database The virtual IP or host name that's used for connection of SAP application servers. db-virt-hostname, db-virt-ip.
Azure fencing Method to avoid split brain situations is prevented.
Azure Load Balancer Usage of Basic or Standard (recommended), probe port for Db2 database (our recommendation 62500) probe-port.
Name resolution How name resolution works in the environment. DNS service is highly recommended. Local hosts file can be used.

For more information about Linux Pacemaker in Azure, see Setting up Pacemaker on Red Hat Enterprise Linux in Azure.

Deployment on Red Hat Enterprise Linux

The resource agent for IBM Db2 LUW is included in Red Hat Enterprise Linux Server HA Addon. For the setup that's described in this document, you should use Red Hat Enterprise Linux for SAP. The Azure Marketplace contains an image for Red Hat Enterprise Linux 7.4 for SAP or higher that you can use to deploy new Azure virtual machines. Be aware of the various support or service models that are offered by Red Hat through the Azure Marketplace when you choose a VM image in the Azure VM Marketplace.

Hosts: DNS updates

Make a list of all host names, including virtual host names, and update your DNS servers to enable proper IP address to host-name resolution. If a DNS server doesn't exist or you can't update and create DNS entries, you need to use the local host files of the individual VMs that are participating in this scenario. If you're using host files entries, make sure that the entries are applied to all VMs in the SAP system environment. However, we recommend that you use your DNS that, ideally, extends into Azure

Manual deployment

Make sure that the selected OS is supported by IBM/SAP for IBM Db2 LUW. The list of supported OS versions for Azure VMs and Db2 releases is available in SAP note 1928533. The list of OS releases by individual Db2 release is available in the SAP Product Availability Matrix. We highly recommend a minimum of Red Hat Enterprise Linux 7.4 for SAP because of Azure-related performance improvements in this or later Red Hat Enterprise Linux versions.

  1. Create or select a resource group.
  2. Create or select a virtual network and subnet.
  3. Create an Azure availability set or deploy an availability zone.
    • For the availability set, set the maximum update domains to 2.
  4. Create Virtual Machine 1.
    • Use Red Hat Enterprise Linux for SAP image in the Azure Marketplace.
    • Select the Azure availability set you created in step 3, or select Availability Zone.
  5. Create Virtual Machine 2.
    • Use Red Hat Enterprise Linux for SAP image in the Azure Marketplace.
    • Select the Azure availability set you in created in step 3, or select Availability Zone (not the same zone as in step 3).
  6. Add data disks to the VMs, and then check the recommendation of a file system setup in the article IBM Db2 Azure Virtual Machines DBMS deployment for SAP workload.

Install the IBM Db2 LUW and SAP environment

Before you start the installation of an SAP environment based on IBM Db2 LUW, review the following documentation:

  • Azure documentation
  • SAP documentation
  • IBM documentation

Links to this documentation are provided in the introductory section of this article.

Check the SAP installation manuals about installing NetWeaver-based applications on IBM Db2 LUW. You can find the guides on the SAP Help portal by using the SAP Installation Guide Finder.

You can reduce the number of guides displayed in the portal by setting the following filters:

  • I want to: "Install a new system"
  • My Database: "IBM Db2 for Linux, Unix, and Windows"
  • Additional filters for SAP NetWeaver versions, stack configuration, or operating system

Red Hat firewall rules

Red Hat Enterprise Linux has firewall enabled by default.

#Allow access to SWPM tool. Rule is not permanent.
sudo firewall-cmd --add-port=4237/tcp

Installation hints for setting up IBM Db2 LUW with HADR

To set up the primary IBM Db2 LUW database instance:

  • Use the high availability or distributed option.
  • Install the SAP ASCS/ERS and Database instance.
  • Take a backup of the newly installed database.

Important

Write down the "Database Communication port" that's set during installation. It must be the same port number for both database instances. SAP SWPM Port Definition

IBM Db2 HADR settings for Azure

When you use an Azure Pacemaker fencing agent, set the following parameters:

  • HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240
  • HADR timeout value (HADR_TIMEOUT) = 45

We recommend the preceding parameters based on initial failover/takeover testing. It is mandatory that you test for proper functionality of failover and takeover with these parameter settings. Because individual configurations can vary, the parameters might require adjustment.

Note

Specific to IBM Db2 with HADR configuration with normal startup: The secondary or standby database instance must be up and running before you can start the primary database instance.

Note

For installation and configuration that's specific to Azure and Pacemaker: During the installation procedure through SAP Software Provisioning Manager, there is an explicit question about high availability for IBM Db2 LUW:

  • Do not select IBM Db2 pureScale.
  • Do not select Install IBM Tivoli System Automation for Multiplatforms.
  • Do not select Generate cluster configuration files. SAP SWPM - DB2 HA options

To set up the Standby database server by using the SAP homogeneous system copy procedure, execute these steps:

  1. Select the System copy option > Target systems > Distributed > Database instance.
  2. As a copy method, select Homogeneous System so that you can use backup to restore a backup on the standby server instance.
  3. When you reach the exit step to restore the database for homogeneous system copy, exit the installer. Restore the database from a backup of the primary host. All subsequent installation phases have already been executed on the primary database server.

Red Hat firewall rules for DB2 HADR

Add firewall rules to allow traffic to DB2 and between DB2 for HADR to work:

  • Database communication port. If using partitions, add those ports too.
  • HADR port (value of DB2 parameter HADR_LOCAL_SVC)
  • Azure probe port
sudo firewall-cmd --add-port=<port>/tcp --permanent
sudo firewall-cmd --reload

IBM Db2 HADR check

For demonstration purposes and the procedures described in this article, the database SID is ID2.

After you've configured HADR and the status is PEER and CONNECTED on the primary and standby nodes, perform the following check:


Execute command as db2<sid> db2pd -hadr -db <SID>

#Primary output:
Database Member 0 -- Database ID2 -- Active -- Up 1 days 15:45:23 -- Date 2019-06-25-10.55.25.349375

                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = NEARSYNC
                           STANDBY_ID = 1
                        LOG_STREAM_ID = 0
                           HADR_STATE = PEER
                           HADR_FLAGS =
                  PRIMARY_MEMBER_HOST = az-idb01
                     PRIMARY_INSTANCE = db2id2
                       PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = az-idb02
                     STANDBY_INSTANCE = db2id2
                       STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED
             HADR_CONNECT_STATUS_TIME = 06/25/2019 10:55:05.076494 (1561460105)
          HEARTBEAT_INTERVAL(seconds) = 7
                     HEARTBEAT_MISSED = 5
                   HEARTBEAT_EXPECTED = 52
                HADR_TIMEOUT(seconds) = 30
        TIME_SINCE_LAST_RECV(seconds) = 5
             PEER_WAIT_LIMIT(seconds) = 0
           LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 598.000027
   LOG_HADR_WAIT_ACCUMULATED(seconds) = 598.000
                  LOG_HADR_WAIT_COUNT = 1
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 46080
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 369280
            PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
            STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
                  HADR_LOG_GAP(bytes) = 132242668
     STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
       STANDBY_RECV_REPLAY_GAP(bytes) = 0
                     PRIMARY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
                     STANDBY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
              STANDBY_REPLAY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
         STANDBY_RECV_BUF_SIZE(pages) = 2048
             STANDBY_RECV_BUF_PERCENT = 0
           STANDBY_SPOOL_LIMIT(pages) = 1000
                STANDBY_SPOOL_PERCENT = 0
                   STANDBY_ERROR_TIME = NULL
                 PEER_WINDOW(seconds) = 300
                      PEER_WINDOW_END = 06/25/2019 11:12:03.000000 (1561461123)
             READS_ON_STANDBY_ENABLED = N


#Secondary output:
Database Member 0 -- Database ID2 -- Standby -- Up 1 days 15:45:18 -- Date 2019-06-25-10.56.19.820474

                            HADR_ROLE = STANDBY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = NEARSYNC
                           STANDBY_ID = 0
                        LOG_STREAM_ID = 0
                           HADR_STATE = PEER
                           HADR_FLAGS =
                  PRIMARY_MEMBER_HOST = az-idb01
                     PRIMARY_INSTANCE = db2id2
                       PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = az-idb02
                     STANDBY_INSTANCE = db2id2
                       STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED
             HADR_CONNECT_STATUS_TIME = 06/25/2019 10:55:05.078116 (1561460105)
          HEARTBEAT_INTERVAL(seconds) = 7
                     HEARTBEAT_MISSED = 0
                   HEARTBEAT_EXPECTED = 10
                HADR_TIMEOUT(seconds) = 30
        TIME_SINCE_LAST_RECV(seconds) = 1
             PEER_WAIT_LIMIT(seconds) = 0
           LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 598.000027
   LOG_HADR_WAIT_ACCUMULATED(seconds) = 598.000
                  LOG_HADR_WAIT_COUNT = 1
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 46080
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 367360
            PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
            STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
                  HADR_LOG_GAP(bytes) = 0
     STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
       STANDBY_RECV_REPLAY_GAP(bytes) = 0
                     PRIMARY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
                     STANDBY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
              STANDBY_REPLAY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
         STANDBY_RECV_BUF_SIZE(pages) = 2048
             STANDBY_RECV_BUF_PERCENT = 0
           STANDBY_SPOOL_LIMIT(pages) = 1000
                STANDBY_SPOOL_PERCENT = 0
                   STANDBY_ERROR_TIME = NULL
                 PEER_WINDOW(seconds) = 1000
                      PEER_WINDOW_END = 06/25/2019 11:12:59.000000 (1561461179)
             READS_ON_STANDBY_ENABLED = N

Configure Azure Load Balancer

To configure Azure Load Balancer, we recommend that you use the Azure Standard Load Balancer SKU and then do the following;

Note

The Standard Load Balancer SKU has restrictions accessing public IP addresses from the nodes underneath the Load Balancer. The article Public endpoint connectivity for Virtual Machines using Azure Standard Load Balancer in SAP high-availability scenarios is describing ways on how to enable those nodes to access public IP addresses

Important

Floating IP is not supported on a NIC secondary IP configuration in load-balancing scenarios. For details see Azure Load balancer Limitations. If you need additional IP address for the VM, deploy a second NIC.

  1. Create a front-end IP pool:

    a. In the Azure portal, open the Azure Load Balancer, select frontend IP pool, and then select Add.

    b. Enter the name of the new front-end IP pool (for example, Db2-connection).

    c. Set the Assignment to Static, and enter the IP address Virtual-IP defined at the beginning.

    d. Select OK.

    e. After the new front-end IP pool is created, note the pool IP address.

  2. Create a back-end pool:

    a. In the Azure portal, open the Azure Load Balancer, select backend pools, and then select Add.

    b. Enter the name of the new back-end pool (for example, Db2-backend).

    c. Select Add a virtual machine.

    d. Select the availability set or the virtual machines hosting IBM Db2 database created in the preceding step.

    e. Select the virtual machines of the IBM Db2 cluster.

    f. Select OK.

  3. Create a health probe:

    a. In the Azure portal, open the Azure Load Balancer, select health probes, and select Add.

    b. Enter the name of the new health probe (for example, Db2-hp).

    c. Select TCP as the protocol and port 62500. Keep the Interval value set to 5, and keep the Unhealthy threshold value set to 2.

    d. Select OK.

  4. Create the load-balancing rules:

    a. In the Azure portal, open the Azure Load Balancer, select Load balancing rules, and then select Add.

    b. Enter the name of the new Load Balancer rule (for example, Db2-SID).

    c. Select the front-end IP address, the back-end pool, and the health probe that you created earlier (for example, Db2-frontend).

    d. Keep the Protocol set to TCP, and enter port Database Communication port.

    e. Increase the idle timeout to 30 minutes.

    f. Make sure to enable Floating IP.

    g. Select OK.

[A] Add firewall rule for probe port:

sudo firewall-cmd --add-port=<probe-port>/tcp --permanent
sudo firewall-cmd --reload

Create the Pacemaker cluster

To create a basic Pacemaker cluster for this IBM Db2 server, see Setting up Pacemaker on Red Hat Enterprise Linux in Azure.

Db2 Pacemaker configuration

When you use Pacemaker for automatic failover in the event of a node failure, you need to configure your Db2 instances and Pacemaker accordingly. This section describes this type of configuration.

The following items are prefixed with either:

  • [A]: Applicable to all nodes
  • [1]: Applicable only to node 1
  • [2]: Applicable only to node 2

[A] Prerequisite for Pacemaker configuration:

  1. Shut down both database servers with user db2<sid> with db2stop.
  2. Change the shell environment for db2<sid> user to /bin/ksh:
# Install korn shell:
sudo yum install ksh
# Change users shell:
sudo usermod -s /bin/ksh db2<sid>

Pacemaker configuration

[1] IBM Db2 HADR-specific Pacemaker configuration:

# Put Pacemaker into maintenance mode
sudo pcs property set maintenance-mode=true	

[1] Create IBM Db2 resources:

If building a cluster on RHEL 7.x, use the following commands:

# Replace bold strings with your instance name db2sid, database SID, and virtual IP address/Azure Load Balancer.
sudo pcs resource create Db2_HADR_ID2 db2 instance='db2id2' dblist='ID2' master meta notify=true resource-stickiness=5000

#Configure resource stickiness and correct cluster notifications for master resoruce
sudo pcs resource update Db2_HADR_ID2-master meta notify=true resource-stickiness=5000

# Configure virtual IP - same as Azure Load Balancer IP
sudo pcs resource create vip_db2id2_ID2 IPaddr2 ip='10.100.0.40'

# Configure probe port for Azure load Balancer
sudo pcs resource create nc_db2id2_ID2 azure-lb port=62500

#Create a group for ip and Azure loadbalancer probe port
sudo pcs resource group add g_ipnc_db2id2_ID2 vip_db2id2_ID2 nc_db2id2_ID2

#Create colocation constrain - keep Db2 HADR Master and Group on same node
sudo pcs constraint colocation add g_ipnc_db2id2_ID2 with master Db2_HADR_ID2-master

#Create start order constrain
sudo pcs constraint order promote Db2_HADR_ID2-master then g_ipnc_db2id2_ID2

If building a cluster on RHEL 8.x, use the following commands:

# Replace bold strings with your instance name db2sid, database SID, and virtual IP address/Azure Load Balancer.
sudo pcs resource create Db2_HADR_ID2 db2 instance='db2id2' dblist='ID2' promotable meta notify=true resource-stickiness=5000

#Configure resource stickiness and correct cluster notifications for master resoruce
sudo pcs resource update Db2_HADR_ID2-clone meta notify=true resource-stickiness=5000

# Configure virtual IP - same as Azure Load Balancer IP
sudo pcs resource create vip_db2id2_ID2 IPaddr2 ip='10.100.0.40'

# Configure probe port for Azure load Balancer
sudo pcs resource create nc_db2id2_ID2 azure-lb port=62500

#Create a group for ip and Azure loadbalancer probe port
sudo pcs resource group add g_ipnc_db2id2_ID2 vip_db2id2_ID2 nc_db2id2_ID2

#Create colocation constrain - keep Db2 HADR Master and Group on same node
sudo pcs constraint colocation add g_ipnc_db2id2_ID2 with master Db2_HADR_ID2-clone

#Create start order constrain
sudo pcs constraint order promote Db2_HADR_ID2-clone then g_ipnc_db2id2_ID2

[1] Start IBM Db2 resources:

  • Put Pacemaker out of maintenance mode.
# Put Pacemaker out of maintenance-mode - that start IBM Db2
sudo pcs property set maintenance-mode=false

[1] Make sure that the cluster status is OK and that all of the resources are started. It's not important which node the resources are running on.

sudo pcs status
2 nodes configured
5 resources configured

Online: [ az-idb01 az-idb02 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started az-idb01
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb01 ]
     Slaves: [ az-idb02 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb01
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Important

You must manage the Pacemaker clustered Db2 instance by using Pacemaker tools. If you use db2 commands such as db2stop, Pacemaker detects the action as a failure of resource. If you're performing maintenance, you can put the nodes or resources in maintenance mode. Pacemaker suspends monitoring resources, and you can then use normal db2 administration commands.

Make changes to SAP profiles to use virtual IP for connection

To connect to the primary instance of the HADR configuration, the SAP application layer needs to use the virtual IP address that you defined and configured for the Azure Load Balancer. The following changes are required:

/sapmnt/<SID>/profile/DEFAULT.PFL

SAPDBHOST = db-virt-hostname
j2ee/dbhost = db-virt-hostname

/sapmnt/<SID>/global/db6/db2cli.ini

Hostname=db-virt-hostname

Install primary and dialog application servers

When you install primary and dialog application servers against an Db2 HADR configuration, use the virtual host name that you picked for the configuration.

If you performed the installation before you created the Db2 HADR configuration, make the changes as described in the preceding section and as follows for SAP Java stacks.

ABAP+Java or Java stack systems JDBC URL check

Use the J2EE Config tool to check or update the JDBC URL. Because the J2EE Config tool is a graphical tool, you need to have X server installed:

  1. Sign in to the primary application server of the J2EE instance and execute:

    sudo /usr/sap/*SID*/*Instance*/j2ee/configtool/configtool.sh
  2. In the left frame, choose security store.

  3. In the right frame, choose the key jdbc/pool/\<SAPSID>/url.

  4. Change the host name in the JDBC URL to the virtual host name.

    jdbc:db2://db-virt-hostname:5912/TSP:deferPrepares=0
  5. Select Add.

  6. To save your changes, select the disk icon at the upper left.

  7. Close the configuration tool.

  8. Restart the Java instance.

Configure log archiving for HADR setup

To configure the Db2 log archiving for HADR setup, we recommend that you configure both the primary and the standby database to have automatic log retrieval capability from all log archive locations. Both the primary and standby database must be able to retrieve log archive files from all the log archive locations to which either one of the database instances might archive log files.

The log archiving is performed only by the primary database. If you change the HADR roles of the database servers or if a failure occurs, the new primary database is responsible for log archiving. If you've set up multiple log archive locations, your logs might be archived twice. In the event of a local or remote catch-up, you might also have to manually copy the archived logs from the old primary server to the active log location of the new primary server.

We recommend configuring a common NFS share or GlusterFS, where logs are written from both nodes. The NFS share or GlusterFS has to be highly available.

You can use existing highly available NFS shares or GlusterFS for transports or a profile directory. For more information, see:

Test the cluster setup

This section describes how you can test your Db2 HADR setup. Every test assumes IBM Db2 primary is running on the az-idb01 virtual machine. User with sudo privileges or root (not recommended) must be used.

The initial status for all test cases is explained here: (crm_mon -r or pcs status)

  • pcs status is a snapshot of Pacemaker status at execution time
  • crm_mon -r is continuous output of Pacemaker status
2 nodes configured
5 resources configured

Online: [ az-idb01 az-idb02 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started az-idb01
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb01 ]
     Slaves: [ az-idb02 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb01
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb01

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

The original status in an SAP system is documented in Transaction DBACOCKPIT > Configuration > Overview, as shown in the following image:

DBACockpit - Pre Migration

Test takeover of IBM Db2

Important

Before you start the test, make sure that:

  • Pacemaker doesn't have any failed actions (pcs status).
  • There are no location constraints (leftovers of migration test)
  • The IBM Db2 HADR synchronization is working. Check with user db2<sid>
    db2pd -hadr -db <DBSID>

Migrate the node that's running the primary Db2 database by executing following command:

# On RHEL 7.x
sudo pcs resource move Db2_HADR_ID2-master
# On RHEL 8.x
sudo pcs resource move Db2_HADR_ID2-clone --master

After the migration is done, the crm status output looks like:

2 nodes configured
5 resources configured

Online: [ az-idb01 az-idb02 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started az-idb01
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb02 ]
     Stopped: [ az-idb01 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb02
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb02

The original status in an SAP system is documented in Transaction DBACOCKPIT > Configuration > Overview, as shown in the following image:

DBACockpit - Post Migration

Resource migration with "pcs resource move" creates location constraints. Location constraints in this case are preventing running IBM Db2 instance on az-idb01. If location constraints are not deleted, the resource cannot fail back.

Remove the location constrain and standby node will be started on az-idb01.

# On RHEL 7.x
sudo pcs resource clear Db2_HADR_ID2-master
# On RHEL 8.x
sudo pcs resource clear Db2_HADR_ID2-clone

And cluster status changes to:

2 nodes configured
5 resources configured

Online: [ az-idb01 az-idb02 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started az-idb01
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb02 ]
     Slaves: [ az-idb01 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb02
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb02

DBACockpit - Removed location constrain

Migrate the resource back to az-idb01 and clear the location constraints

# On RHEL 7.x
sudo pcs resource move Db2_HADR_ID2-master az-idb01
sudo pcs resource clear Db2_HADR_ID2-master
# On RHEL 8.x
sudo pcs resource move Db2_HADR_ID2-clone --master
sudo pcs resource clear Db2_HADR_ID2-clone
  • On RHEL 7.x - pcs resource move <res_name> : Creates location constraints and can cause issues with takeover
  • On RHEL 8.x - pcs resource move <res_name> --master: Creates location constraints and can cause issues with takeover
  • pcs resource clear <res_name>: Clears location constraints
  • pcs resource cleanup <res_name>: Clears all errors of the resource

Test a manual takeover

You can test a manual takeover by stopping the Pacemaker service on az-idb01 node:

systemctl stop pacemaker

status on az-ibdb02

2 nodes configured
5 resources configured

Node az-idb01: pending
Online: [ az-idb02 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started az-idb02
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb02 ]
     Stopped: [ az-idb01 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb02
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb02

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

After the failover, you can start the service again on az-idb01.

systemctl start  pacemaker

Kill the Db2 process on the node that runs the HADR primary database

#Kill main db2 process - db2sysc
[sapadmin@az-idb02 ~]$ sudo ps -ef|grep db2sysc
db2ptr    34598  34596  8 14:21 ?        00:00:07 db2sysc 0
[sapadmin@az-idb02 ~]$ sudo kill -9 34598

The Db2 instance is going to fail, and Pacemaker will move master node and report following status:

2 nodes configured
5 resources configured

Online: [ az-idb01 az-idb02 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started az-idb02
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb02 ]
     Stopped: [ az-idb01 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb02
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb02

Failed Actions:
* Db2_HADR_ID2_demote_0 on az-idb01 'unknown error' (1): call=49, status=complete, exitreason='none',
    last-rc-change='Wed Jun 26 09:57:35 2019', queued=0ms, exec=362ms

Pacemaker will restart the Db2 primary database instance on the same node, or it will fail over to the node that's running the secondary database instance and an error is reported.

Kill the Db2 process on the node that runs the secondary database instance

[sapadmin@az-idb02 ~]$ sudo ps -ef|grep db2sysc
db2id2    23144  23142  2 09:53 ?        00:00:13 db2sysc 0
[sapadmin@az-idb02 ~]$ sudo kill -9 23144

The node gets into failed stated and error reported

2 nodes configured
5 resources configured

Online: [ az-idb01 az-idb02 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started az-idb02
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb01 ]
     Slaves: [ az-idb02 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb01
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb01

Failed Actions:
* Db2_HADR_ID2_monitor_20000 on az-idb02 'not running' (7): call=144, status=complete, exitreason='none',
    last-rc-change='Wed Jun 26 10:02:09 2019', queued=0ms, exec=0ms

The Db2 instance gets restarted in the secondary role it had assigned before.

Stop DB via db2stop force on the node that runs the HADR primary database instance

As user db2<sid> execute command db2stop force:

az-idb01:db2ptr> db2stop force

Failure detected:

2 nodes configured
5 resources configured

Online: [ az-idb01 az-idb02 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started az-idb02
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Slaves: [ az-idb02 ]
     Stopped: [ az-idb01 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Stopped
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Stopped

Failed Actions:
* Db2_HADR_ID2_demote_0 on az-idb01 'unknown error' (1): call=110, status=complete, exitreason='none',
    last-rc-change='Wed Jun 26 14:03:12 2019', queued=0ms, exec=355ms

The Db2 HADR secondary database instance got promoted into the primary role.

2 nodes configured
5 resources configured

Online: [ az-idb01 az-idb02 ]

Full list of resources:

 rsc_st_azure   (stonith:fence_azure_arm):      Started az-idb02
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb02 ]
     Slaves: [ az-idb01 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb02
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb02

Failed Actions:
* Db2_HADR_ID2_demote_0 on az-idb01 'unknown error' (1): call=110, status=complete, exitreason='none',
    last-rc-change='Wed Jun 26 14:03:12 2019', queued=0ms, exec=355ms

Crash the VM that runs the HADR primary database instance with "halt"

#Linux kernel panic.
sudo echo b > /proc/sysrq-trigger

In such a case, Pacemaker will detect that the node that's running the primary database instance isn't responding.

2 nodes configured
5 resources configured

Node az-idb01: UNCLEAN (online)
Online: [ az-idb02 ]

Full list of resources:

rsc_st_azure    (stonith:fence_azure_arm):      Started az-idb02
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb01 ]
     Slaves: [ az-idb02 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb01
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb01

The next step is to check for a Split brain situation. After the surviving node has determined that the node that last ran the primary database instance is down, a failover of resources is executed.

2 nodes configured
5 resources configured

Online: [ az-idb02 ]
OFFLINE: [ az-idb01 ]

Full list of resources:

rsc_st_azure    (stonith:fence_azure_arm):      Started az-idb02
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb02 ]
     Stopped: [ az-idb01 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb02
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb02 

In the event of a kernel panic, the failed node will be restared by fencing agent. After the failed node is back online, you must start pacemaker cluster by

sudo pcs cluster start
it starts the Db2 instance into the secondary role.
2 nodes configured
5 resources configured

Online: [ az-idb01 az-idb02 ]

Full list of resources:

rsc_st_azure    (stonith:fence_azure_arm):      Started az-idb02
 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
     Masters: [ az-idb02 ]
     Slaves: [ az-idb01 ]
 Resource Group: g_ipnc_db2id2_ID2
     vip_db2id2_ID2     (ocf::heartbeat:IPaddr2):       Started az-idb02
     nc_db2id2_ID2      (ocf::heartbeat:azure-lb):      Started az-idb02

Next steps