Disaster recovery and high availability for chatbots in Azure

To set up disaster recovery for an enterprise-grade conversational bot (chatbot), first review the service level agreement (SLA) that covers the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) for the chatbot. Then implement the disaster recovery patterns in this article to build highly available, disaster-resistant chatbot solutions that meet the SLA.

For a description of the core components of a typical enterprise-grade chatbot solution in Azure, see Enterprise-grade conversational bot.

Architecture

Enterprise-grade chatbot in Azure

This diagram shows deployment of a chatbot solution in active-passive failover mode in two different Azure regions for disaster recovery.

Download a Visio file of this architecture.
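
As a rough illustration of the active-passive pattern, the following Python sketch picks the healthy endpoint with the lowest priority number, which is the behavior you would expect from a priority-based routing method. It is a simplified model for reasoning about failover, not how Azure Traffic Manager or Azure Front Door is implemented. The `Endpoint` class, the `pick_endpoint` function, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    """One regional deployment of the bot API (all values are hypothetical)."""
    name: str
    url: str
    priority: int   # lower number is preferred; 1 = primary region
    healthy: bool   # outcome of the most recent health probe


def pick_endpoint(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, or None."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


if __name__ == "__main__":
    # Simulate an outage in the primary region.
    primary = Endpoint("primary", "https://bot-primary.example.com/api/messages", 1, healthy=False)
    secondary = Endpoint("secondary", "https://bot-secondary.example.com/api/messages", 2, healthy=True)
    chosen = pick_endpoint([primary, secondary])
    print(chosen.name if chosen else "no healthy endpoint")
```

While the primary region is healthy, it receives all traffic. The secondary region is used only when the primary fails its health probe.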

Components

Disaster recovery solutions vary depending on your SLA and the Azure services you use.

Non-regional services

Azure Active Directory (Azure AD), Azure Traffic Manager, Azure Front Door, and Azure Bot Service registration are non-regional services that are always available in Azure geographies, regardless of specific region availability or outage.

Regional services with automatic failover

Although you provision Azure Key Vault and Language Understanding Intelligent Service (LUIS) in a specific Azure region, these services provide automatic failover to a different Azure region.

Regional services without automatic failover

These services don't fail over automatically, so you need to plan and configure high availability and disaster recovery for them yourself.

Keep all deployment and source code artifacts in a source code repository, and use Azure paired regions to deploy them in parallel. You can automate all the following deployment tasks and save them as part of your deployment artifacts. When you deploy these services in the two paired regions, configure your bot API's environment variables to match the specific services in each Azure region.

  • Keep the primary and secondary Azure Search indexes in sync. For a sample app that backs up and restores Azure Search indexes, see QnAMakerBackupRestore on GitHub.
  • Back up Application Insights by using continuous export. Although you can't currently import the exported telemetry to another Application Insights resource, you can export into a storage account for further analysis.
  • To set up high availability and disaster recovery for Azure Storage accounts, see Disaster recovery and storage account failover.
  • Deploy the bot API and QnA Maker into an Azure App Service Plan in both regions.
  • After you set up the primary and secondary stacks, use Azure Traffic Manager or Azure Front Door to configure the endpoints and set up a routing method for both QnA Maker and the bot API.
  • Create a Secure Sockets Layer (SSL) certificate for your Traffic Manager endpoint, and bind the SSL certificate in your App Services.
  • Finally, use the Traffic Manager or Azure Front Door endpoint of QnA Maker in your bot, and use the Traffic Manager endpoint of the bot API as the bot endpoint in Azure Bot Service registration.
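
One way to keep each regional deployment pointed at its own services is to have the bot API read its settings from environment variables and fail fast when one is missing. The Python sketch below assumes that approach. The variable names (`BOT_REGION`, `STORAGE_ACCOUNT_NAME`, `SEARCH_SERVICE_NAME`, `QNA_ENDPOINT`) and the `BotSettings` and `load_settings` names are hypothetical, not names defined by any Azure service.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; substitute whatever your bot API actually reads.
REGIONAL_VARS = ("BOT_REGION", "STORAGE_ACCOUNT_NAME", "SEARCH_SERVICE_NAME")
# Shared across regions: the Traffic Manager or Front Door endpoint for QnA Maker.
SHARED_VARS = ("QNA_ENDPOINT",)


@dataclass(frozen=True)
class BotSettings:
    region: str
    storage_account: str
    search_service: str
    qna_endpoint: str


def load_settings(env: Mapping[str, str] = os.environ) -> BotSettings:
    """Build settings from the environment, raising if any value is missing or empty."""
    missing = [name for name in REGIONAL_VARS + SHARED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return BotSettings(
        region=env["BOT_REGION"],
        storage_account=env["STORAGE_ACCOUNT_NAME"],
        search_service=env["SEARCH_SERVICE_NAME"],
        qna_endpoint=env["QNA_ENDPOINT"],
    )
```

With this layout, the same deployment artifact runs in both paired regions. Only the regional variables differ between the two App Service configurations, while the QnA Maker endpoint stays the same because it points at the shared routing endpoint.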

Next steps