To plan disaster recovery for an enterprise-grade conversational bot (chatbot), start by reviewing the service level agreement (SLA). The SLA should state the chatbot's Recovery Point Objective (RPO) and Recovery Time Objective (RTO). The RPO is the maximum acceptable data loss, measured in time. The RTO is the maximum acceptable time to restore service. Then implement the patterns in this article to build a highly available, disaster-resilient chatbot solution that meets the SLA.
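The following Python sketch compares a design's rough estimates against the RPO and RTO targets. It assumes failover is driven by Azure Traffic Manager health probes, as described later in this article. Worst-case failover time is approximated as detection time plus DNS TTL:

- Detection time is the probing interval × (tolerated failures + 1).
- The DNS TTL is the period during which clients may still resolve to the failed region.

The class and function names and the sample values are hypothetical, not service defaults. The estimate ignores probe timeouts, time to promote data stores, and clients that cache DNS longer than the TTL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeSettings:
    """Hypothetical model of DNS-based failover settings (all values in seconds)."""
    probing_interval_s: int   # time between health probes
    tolerated_failures: int   # failed probes tolerated before the endpoint is marked unhealthy
    dns_ttl_s: int            # how long resolvers may cache the old DNS answer


def estimated_failover_s(settings: ProbeSettings) -> int:
    """Rough worst-case time until clients reach the secondary region.

    Assumption: the endpoint is marked unhealthy on the first failed probe
    beyond the tolerated count, and cached DNS answers then expire within
    one TTL.
    """
    detection_s = settings.probing_interval_s * (settings.tolerated_failures + 1)
    return detection_s + settings.dns_ttl_s


def check_objectives(failover_s: int, replication_lag_s: int,
                     rto_s: int, rpo_s: int) -> dict[str, bool]:
    """Compare estimated downtime and data-loss windows with the SLA targets."""
    return {
        "rto_met": failover_s <= rto_s,         # downtime within the RTO
        "rpo_met": replication_lag_s <= rpo_s,  # data-loss window within the RPO
    }


if __name__ == "__main__":
    # Hypothetical example values, not service defaults.
    settings = ProbeSettings(probing_interval_s=10, tolerated_failures=1, dns_ttl_s=30)
    failover = estimated_failover_s(settings)  # 10 * 2 + 30 = 50 seconds
    print(failover, check_objectives(failover, replication_lag_s=5, rto_s=300, rpo_s=60))
```

If the estimate exceeds the RTO, shorter probing intervals, fewer tolerated failures, or a lower TTL reduce it, at the cost of more sensitivity to transient probe failures.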
The core components of a typical enterprise-grade chatbot solution in Azure are discussed in Enterprise-grade conversational bot.
The following diagram shows a chatbot solution deployed for disaster recovery, with active-passive failover between two Azure regions.
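In active-passive mode, all traffic goes to the primary region while it's healthy and moves to the secondary region only when the primary fails its health checks. The following Python sketch is a simplified, hypothetical model of priority-based routing, such as Traffic Manager's priority routing method, in which a lower priority value is preferred. The `Endpoint` class, the `record_probe` and `route` functions, and the endpoint names are illustrative only, not an Azure API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    """Hypothetical regional deployment of the bot API."""
    name: str
    priority: int                 # lower value = preferred, as in priority routing
    consecutive_failures: int = 0

    def is_healthy(self, tolerated_failures: int) -> bool:
        # The endpoint becomes unhealthy once failures exceed the tolerated count.
        return self.consecutive_failures <= tolerated_failures


def record_probe(endpoint: Endpoint, succeeded: bool) -> None:
    """Update health state after one probe; a success resets the failure count."""
    endpoint.consecutive_failures = 0 if succeeded else endpoint.consecutive_failures + 1


def route(endpoints: list[Endpoint], tolerated_failures: int) -> Optional[str]:
    """Return the highest-priority healthy endpoint, or None in this simplified model."""
    for endpoint in sorted(endpoints, key=lambda e: e.priority):
        if endpoint.is_healthy(tolerated_failures):
            return endpoint.name
    return None


if __name__ == "__main__":
    primary = Endpoint("bot-api-primary", priority=1)
    secondary = Endpoint("bot-api-secondary", priority=2)
    pair = [primary, secondary]

    print(route(pair, tolerated_failures=1))   # primary is active
    record_probe(primary, succeeded=False)
    record_probe(primary, succeeded=False)     # second failure exceeds the tolerance
    print(route(pair, tolerated_failures=1))   # traffic fails over to the secondary
    record_probe(primary, succeeded=True)
    print(route(pair, tolerated_failures=1))   # primary is preferred again once healthy
```

Because the passive stack is already deployed, failover in this model is only a routing change. Recovering regional data stores is a separate concern, covered by the service-specific guidance that follows.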
Disaster recovery solutions vary depending on your SLA and the Azure services you use.
Azure Active Directory (Azure AD), Azure Traffic Manager, Azure Front Door, and Azure Bot Service registration are non-regional services. They remain available across Azure geographies regardless of the availability of, or an outage in, any specific region.
Regional services with automatic failover
Although you provision Azure Key Vault and Language Understanding Intelligent Service (LUIS) in a specific Azure region, these services provide automatic failover to a different Azure region. For more information, see:
- Azure Key Vault availability and redundancy
- High availability with Azure Cosmos DB (how to set up high availability for Azure Cosmos DB)
- LUIS regions and endpoints
Regional services without automatic failover
These services don't fail over automatically, so you need to configure high availability and disaster recovery for them yourself.
Keep all deployment and source code artifacts in a source code repository, and deploy them in parallel to both regions of an Azure region pair. You can automate all of the following deployment tasks and save them as part of your deployment artifacts. When you deploy these services in the two paired regions, configure the bot API environment variables in each region to point to the services in that region.
- Keep the primary and secondary Azure Cognitive Search indexes in sync. For a sample app that backs up and restores Azure Cognitive Search indexes, see QnAMakerBackupRestore on GitHub.
- Back up Application Insights by using continuous export, which exports telemetry to an Azure Storage account for further analysis. You can't currently import the exported telemetry into another Application Insights resource.
- To set up high availability and disaster recovery for Azure Storage accounts, see Disaster recovery and storage account failover.
- Deploy the bot API and QnA Maker to an Azure App Service plan in each region.
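- After you set up the primary and secondary stacks, use Azure Traffic Manager or Azure Front Door to configure endpoints for both QnA Maker and the bot API, and set a routing method for each. For active-passive failover, priority-based routing sends traffic to the secondary region only when the primary endpoint is unhealthy.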
- Create a Transport Layer Security (TLS/SSL) certificate for your Traffic Manager endpoint, and bind the certificate to your App Service apps.
- Finally, configure your bot to call QnA Maker through its Traffic Manager or Azure Front Door endpoint. Then set the Traffic Manager or Azure Front Door endpoint of the bot API as the messaging endpoint in the Azure Bot Service registration.
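The following Python sketch illustrates two of these steps together. First, each regional deployment of the bot API reads its settings from environment variables, with a region-specific Azure Cognitive Search endpoint. Second, QnA Maker is always reached through the shared Traffic Manager or Azure Front Door hostname. The request follows the shape of the QnA Maker generateAnswer REST call: `POST /qnamaker/knowledgebases/{kb-id}/generateAnswer` with an `EndpointKey` authorization header. The environment variable names, the `BotSettings` class, and all hostnames and IDs are hypothetical. The sketch only builds the request and doesn't send it.

```python
import json
import os
import urllib.request
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class BotSettings:
    """Hypothetical settings for one regional deployment of the bot API."""
    region: str            # region this instance runs in
    search_endpoint: str   # region-specific Cognitive Search service
    qna_hostname: str      # global Traffic Manager or Front Door hostname for QnA Maker
    qna_kb_id: str
    qna_endpoint_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> BotSettings:
    """Read settings from hypothetically named environment variables.

    Raises KeyError if a variable is missing, so a misconfigured region fails fast.
    """
    return BotSettings(
        region=env["BOT_REGION"],
        search_endpoint=env["SEARCH_ENDPOINT"],
        qna_hostname=env["QNA_HOSTNAME"],
        qna_kb_id=env["QNA_KB_ID"],
        qna_endpoint_key=env["QNA_ENDPOINT_KEY"],
    )


def build_generate_answer_request(settings: BotSettings, question: str) -> urllib.request.Request:
    """Build (but don't send) a generateAnswer call routed through the global hostname."""
    url = (f"https://{settings.qna_hostname}/qnamaker/knowledgebases/"
           f"{settings.qna_kb_id}/generateAnswer")
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"EndpointKey {settings.qna_endpoint_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical values for the deployment in the secondary region.
    example_env = {
        "BOT_REGION": "westus",
        "SEARCH_ENDPOINT": "https://contoso-search-westus.search.windows.net",
        "QNA_HOSTNAME": "contoso-qna.trafficmanager.net",
        "QNA_KB_ID": "00000000-0000-0000-0000-000000000000",
        "QNA_ENDPOINT_KEY": "<endpoint-key>",
    }
    request = build_generate_answer_request(load_settings(example_env), "How do I reset my password?")
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```

Only the regional values differ between the two deployments. The QnA Maker hostname stays the same in both, so a failover changes where requests land without redeploying the bot.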
List of services
Key technologies used to implement this architecture:
- Azure Bot Service
- Azure Active Directory
- Azure Traffic Manager
- Azure Front Door
- Azure App Service Web Apps
- Azure Cognitive Services QnA Maker
- Application Insights, a feature of Azure Monitor
- Azure Cognitive Services Language Understanding
- Azure Cosmos DB
- Azure Key Vault
- Azure Cognitive Search
Article on availability:
- Cognitive Services - Authoring and publishing regions and the associated keys
- Cosmos DB - High availability with Azure Cosmos DB
- Key Vault - Azure Key Vault availability and redundancy
- Storage - Disaster recovery and account failover
Azure Architecture Center: