Overview of business continuity with Azure SQL Database

Business continuity in Azure SQL Database refers to the mechanisms, policies, and procedures that enable your business to continue operating in the face of disruption, particularly to its computing infrastructure. In most cases, Azure SQL Database handles the disruptive events that might happen in the cloud environment and keeps your applications and business processes running. However, there are some disruptive events that SQL Database cannot handle automatically, such as:

  • A user accidentally deletes or updates a row in a table.
  • A malicious attacker succeeds in deleting data or dropping a database.
  • An earthquake causes a power outage and temporarily disables a datacenter.

Azure SQL Database cannot prevent these events, so you need to use the business continuity features in SQL Database that enable you to recover your data and keep your applications running.

This overview describes the capabilities that Azure SQL Database provides for business continuity and disaster recovery. Learn about options, recommendations, and tutorials for recovering from disruptive events that could cause data loss or cause your database and application to become unavailable. Learn what to do when a user or application error affects data integrity, an Azure region has an outage, or your application requires maintenance.

SQL Database features that you can use to provide business continuity

From a database perspective, there are four major potential disruption scenarios:

  • Local hardware or software failures that affect the database node, such as a disk-drive failure.
  • Data corruption or deletion, typically caused by an application bug or human error. Such failures are intrinsically application-specific and, as a rule, cannot be detected or mitigated automatically by the infrastructure.
  • Datacenter outage, possibly caused by a natural disaster. This scenario requires some level of geo-redundancy with application failover to an alternate datacenter.
  • Upgrade or maintenance errors. Unanticipated issues that occur during planned upgrades or maintenance to an application or database may require rapid rollback to a prior database state.

SQL Database provides several business continuity features, including automated backups and optional database replication, that can mitigate these scenarios. First, you need to understand how the SQL Database high availability architecture provides 99.99% availability and resiliency to some disruptive events that might affect your business processes. Then, you can learn about the additional mechanisms that you can use to recover from the disruptive events that the high availability architecture cannot handle, such as automated backups with point-in-time restore, geo-restore, long-term retention, active geo-replication, and auto-failover groups.

Each mechanism has different characteristics for estimated recovery time (ERT) and potential data loss for recent transactions. Once you understand these options, you can choose among them and, in most cases, use them together to cover different scenarios. As you develop your business continuity plan, you need to understand the maximum acceptable time before the application fully recovers after the disruptive event. The time required for an application to fully recover is known as the recovery time objective (RTO). You also need to understand the maximum period of recent data updates (time interval) that the application can tolerate losing when recovering after the disruptive event. The time period of updates that you can afford to lose is known as the recovery point objective (RPO).

The following table compares the ERT and RPO of each service tier for the most common scenarios.

| Capability | Basic | Standard | Premium | General Purpose | Business Critical |
| --- | --- | --- | --- | --- | --- |
| Point in Time Restore from backup | Any restore point within 7 days | Any restore point within 35 days | Any restore point within 35 days | Any restore point within configured period (up to 35 days) | Any restore point within configured period (up to 35 days) |
| Geo-restore from geo-replicated backups | ERT < 12 h, RPO < 1 h | ERT < 12 h, RPO < 1 h | ERT < 12 h, RPO < 1 h | ERT < 12 h, RPO < 1 h | ERT < 12 h, RPO < 1 h |
| Auto-failover groups | RTO = 1 h, RPO < 5 s | RTO = 1 h, RPO < 5 s | RTO = 1 h, RPO < 5 s | RTO = 1 h, RPO < 5 s | RTO = 1 h, RPO < 5 s |
| Manual database failover | ERT = 30 s, RPO < 5 s | ERT = 30 s, RPO < 5 s | ERT = 30 s, RPO < 5 s | ERT = 30 s, RPO < 5 s | ERT = 30 s, RPO < 5 s |

Note

Manual database failover refers to failover of a single database to its geo-replicated secondary using the unplanned mode.
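One way to apply the comparison table is to filter the capabilities by your own RTO and RPO targets. The following Python sketch does that with the table's figures treated as upper bounds. The names `Capability`, `CAPABILITIES`, and `candidates` are hypothetical and exist only for this illustration. The ERT values are estimates rather than guarantees, so the result is a starting point for planning, not a commitment.

```python
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass(frozen=True)
class Capability:
    name: str
    recovery_seconds: float   # ERT/RTO bound taken from the table
    data_loss_seconds: float  # RPO bound taken from the table


# Figures transcribed from the table above; they are the same for every tier.
CAPABILITIES = [
    Capability("Geo-restore from geo-replicated backups", 12 * HOUR, 1 * HOUR),
    Capability("Auto-failover groups", 1 * HOUR, 5),
    Capability("Manual database failover", 30, 5),
]


def candidates(rto_seconds, rpo_seconds):
    """Return the names of capabilities whose table bounds fit both targets."""
    return [
        c.name
        for c in CAPABILITIES
        if c.recovery_seconds <= rto_seconds and c.data_loss_seconds <= rpo_seconds
    ]


# Example: the application tolerates 2 hours of downtime and 1 minute of data loss.
print(candidates(2 * HOUR, 60))
```

With a 2-hour RTO and a 1-minute RPO, geo-restore drops out because both of its bounds exceed the targets, which leaves the two replication-based options.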

Recover a database to the existing server

To protect your business from data loss, SQL Database automatically performs a combination of weekly full database backups, differential database backups (generally every 12 hours), and transaction log backups every 5 to 10 minutes. The backups are stored in RA-GRS storage for 35 days for all service tiers except the Basic DTU service tier, where the backups are stored for 7 days. For more information, see automatic database backups. You can restore an existing database from the automated backups to an earlier point in time as a new database on the same SQL Database server by using the Azure portal, PowerShell, or the REST API. For more information, see Point-in-time restore.

If the maximum supported point-in-time restore (PITR) retention period is not sufficient for your application, you can extend it by configuring a long-term retention (LTR) policy for the database or databases. For more information, see Long-term backup retention.
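The choice between PITR and LTR comes down to how old the restore point is relative to the retention period. The Python sketch below expresses that rule using the retention limits stated in this article. The function name `restore_source`, the dictionary `MAX_PITR_RETENTION_DAYS`, and the `configured_days` parameter are hypothetical names for this illustration, and the sketch assumes an LTR policy was already in place when the older backup was taken.

```python
# Maximum PITR retention per service tier, in days, as stated in this article.
MAX_PITR_RETENTION_DAYS = {
    "Basic": 7,
    "Standard": 35,
    "Premium": 35,
    "General Purpose": 35,
    "Business Critical": 35,
}


def restore_source(tier, age_days, configured_days=None):
    """Say which backup mechanism can serve a restore point of the given age.

    configured_days models a retention period set below the tier maximum.
    """
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    limit = MAX_PITR_RETENTION_DAYS[tier]
    retention = limit if configured_days is None else min(configured_days, limit)
    if age_days <= retention:
        return "point-in-time restore"
    # Only possible if a long-term retention policy kept a backup that old.
    return "long-term retention"


print(restore_source("Basic", 3))    # within the 7-day window
print(restore_source("Basic", 10))   # older than the 7-day window
```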

You can use these automatic database backups to recover a database from various disruptive events, both within your datacenter and to another datacenter. The recovery time is usually less than 12 hours, but it may take longer to recover a very large or very active database. With automatic database backups, the estimated recovery time depends on several factors, including the total number of databases recovering in the same region at the same time, the database size, the transaction log size, and network bandwidth. For more information about recovery time, see database recovery time. When you recover to another region, the use of geo-redundant backups limits the potential data loss to 1 hour.

Use automated backups and point-in-time restore as your business continuity and recovery mechanism if your application:

  • Is not considered mission critical.
  • Doesn't have a binding SLA, so that a downtime of 24 hours or longer does not result in financial liability.
  • Has a low rate of data change (low transactions per hour), and losing up to an hour of changes is an acceptable data loss.
  • Is cost sensitive.

If you need faster recovery, use active geo-replication or auto-failover groups. If you need to be able to recover data from a period older than 35 days, use Long-term retention.

Recover a database to another region

Although rare, an Azure datacenter can have an outage. When an outage occurs, it causes a business disruption that might last only a few minutes or might last for hours.

  • One option is to wait for your database to come back online when the datacenter outage is over. This works for applications that can afford to have the database offline, for example, a development project or a free trial that you don't need to work on constantly. When a datacenter has an outage, you do not know how long the outage might last, so this option only works if you don't need your database for a while.
  • Another option is to restore a database on any server in any Azure region using geo-redundant database backups (geo-restore). Geo-restore uses a geo-redundant backup as its source and can be used to recover a database even if the database or datacenter is inaccessible due to an outage.
  • Finally, you can quickly recover from an outage if you have configured either a geo-secondary using active geo-replication or an auto-failover group for your database or databases. Depending on which technology you choose, you can use either manual or automatic failover. While the failover itself takes only a few seconds, the service waits at least 1 hour before it activates an automatic failover. This delay is necessary to ensure that the failover is justified by the scale of the outage. Also, the failover may result in a small data loss due to the nature of asynchronous replication. See the table earlier in this article for details of the auto-failover RTO and RPO.
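To see why asynchronous replication can lose recent transactions at failover, consider a toy model in which every commit on the primary reaches the secondary after a fixed lag. The Python sketch below is only an illustration under that assumption. Real replication lag varies, and the function name `lost_transactions` and its parameters are hypothetical.

```python
def lost_transactions(commit_times, failover_time, replication_lag):
    """Return the commits that had not reached the secondary at failover.

    All values are seconds on one shared clock. A commit made at time t is
    assumed to arrive at the secondary at t + replication_lag.
    """
    return [
        t
        for t in commit_times
        if t <= failover_time and t + replication_lag > failover_time
    ]


# Commits at these times, failover at t = 30 s, constant 5-second lag.
print(lost_transactions([0, 10, 20, 27, 29], failover_time=30, replication_lag=5))
```

In this model, only the commits made during the last `replication_lag` seconds before the failover are lost, which is why the RPO for the replication-based options is measured in seconds.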

Important

To use active geo-replication and auto-failover groups, you must either be the subscription owner or have administrative permissions in SQL Server. You can configure and fail over by using the Azure portal, PowerShell, or the REST API with Azure subscription permissions, or by using Transact-SQL with SQL Server permissions.

Use auto-failover groups if your application meets any of these criteria:

  • Is mission critical.
  • Has a service level agreement (SLA) that does not allow for 12 hours or more of downtime.
  • Downtime may result in financial liability.
  • Has a high rate of data change, and 1 hour of data loss is not acceptable.
  • The additional cost of active geo-replication is lower than the potential financial liability and associated loss of business.

When you take action, how long recovery takes, and how much data you lose all depend on how you decide to use these business continuity features in your application. Indeed, you may choose to use a combination of database backups and active geo-replication, depending on your application requirements. For a discussion of application design considerations for stand-alone databases and for elastic pools using these business continuity features, see Design an application for cloud disaster recovery and Elastic pool disaster recovery strategies.

The following sections provide an overview of the steps to recover using either database backups or active geo-replication. For detailed steps, including planning requirements, post-recovery steps, and information about how to simulate an outage to perform a disaster recovery drill, see Recover a SQL Database from an outage.

Prepare for an outage

Regardless of the business continuity feature you use, you must:

  • Identify and prepare the target server, including server-level IP firewall rules, logins, and master database level permissions.
  • Determine how to redirect clients and client applications to the new server.
  • Document other dependencies, such as auditing settings and alerts.

If you do not prepare properly, bringing your applications online after a failover or a database recovery takes additional time and is also likely to require troubleshooting at a time of stress, which is a bad combination.

Fail over to a geo-replicated secondary database

If you are using active geo-replication or auto-failover groups as your recovery mechanism, you can configure an automatic failover policy or use manual unplanned failover. Once initiated, the failover causes the secondary to become the new primary, ready to record new transactions and respond to queries, losing only the data that had not yet been replicated. For information on designing the failover process, see Design an application for cloud disaster recovery.

Note

When the datacenter comes back online, the old primaries automatically reconnect to the new primary and become secondary databases. If you need to relocate the primary back to the original region, you can initiate a planned failover manually (failback).

Perform a geo-restore

If you are using the automated backups with geo-redundant storage (enabled by default), you can recover the database using geo-restore. Recovery usually takes place within 12 hours, with data loss of up to one hour, determined by when the last log backup was taken and replicated. Until the recovery completes, the database is unable to record any transactions or respond to any queries. Note that geo-restore only restores the database to the last available point in time.
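The data loss from a geo-restore is the gap between the moment the primary became unavailable and the last log backup that had been taken and replicated before that moment. The Python sketch below computes that gap. The function name `geo_restore_data_loss` and the timestamps are hypothetical and serve only to illustrate the arithmetic.

```python
from datetime import datetime, timedelta


def geo_restore_data_loss(outage_start, last_replicated_log_backup):
    """Return the window of updates that a geo-restore cannot recover."""
    if last_replicated_log_backup > outage_start:
        raise ValueError("the last replicated backup must precede the outage")
    return outage_start - last_replicated_log_backup


# Hypothetical timestamps: outage at 12:00, last replicated log backup at 11:20.
loss = geo_restore_data_loss(
    outage_start=datetime(2024, 1, 1, 12, 0),
    last_replicated_log_backup=datetime(2024, 1, 1, 11, 20),
)
print(loss)                          # 0:40:00
print(loss <= timedelta(hours=1))    # True, within the stated one-hour bound
```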

Note

If the datacenter comes back online before you switch your application over to the recovered database, you can cancel the recovery.

Perform post failover / recovery tasks

After recovery from either recovery mechanism, you must perform the following additional tasks before your users and applications are back up and running:

  • Redirect clients and client applications to the new server and restored database.
  • Ensure that appropriate server-level IP firewall rules are in place for users to connect, or use database-level firewalls to enable appropriate rules.
  • Ensure that appropriate logins and master database level permissions are in place (or use contained users).
  • Configure auditing, as appropriate.
  • Configure alerts, as appropriate.

Note

If you are using a failover group and connect to the databases using the read-write listener, the redirection after failover happens automatically and transparently to the application.
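Even with listener-based redirection, connections that were open during the failover are likely to drop, so an application typically reconnects with a retry loop. The Python sketch below shows one way to do that. It assumes the listener host names follow the pattern `<failover-group-name>.database.windows.net` (read-write) and `<failover-group-name>.secondary.database.windows.net` (read-only), which you should verify for your environment. The names `listener_host`, `connect_with_retry`, and `my-fog` are hypothetical, and `connect` stands in for whatever database driver call your application uses.

```python
import time


def listener_host(group_name, read_only=False):
    """Build the assumed listener host name for a failover group."""
    suffix = ".secondary.database.windows.net" if read_only else ".database.windows.net"
    return group_name + suffix


def connect_with_retry(connect, host, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call connect(host), retrying with exponential backoff on ConnectionError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(host)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x, ... the base delay between attempts.
                sleep(delay_seconds * (2 ** attempt))
    raise last_error
```

Passing `connect` and `sleep` as parameters keeps the sketch independent of any particular driver and makes the retry behavior easy to exercise without a real database.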

Upgrade an application with minimal downtime

Sometimes an application must be taken offline because of planned maintenance, such as an application upgrade. Manage application upgrades describes how to use active geo-replication to enable rolling upgrades of your cloud application, which minimizes downtime during upgrades and provides a recovery path if something goes wrong.

Next steps

For a discussion of application design considerations for stand-alone databases and for elastic pools, see Design an application for cloud disaster recovery and Elastic pool disaster recovery strategies.