Overview of business continuity with Azure SQL Database

Business continuity in Azure SQL Database refers to the mechanisms, policies, and procedures that enable your business to continue operating in the face of disruption, particularly to its computing infrastructure. In most cases, Azure SQL Database handles the disruptive events that might happen in the cloud environment and keeps your applications and business processes running. However, some disruptive events cannot be handled by SQL Database automatically, such as:

  • A user accidentally deleted or updated a row in a table.
  • A malicious attacker succeeded in deleting data or dropping a database.
  • An earthquake caused a power outage and temporarily disabled a datacenter.

This overview describes the capabilities that Azure SQL Database provides for business continuity and disaster recovery. Learn about options, recommendations, and tutorials for recovering from disruptive events that could cause data loss or cause your database and application to become unavailable. Learn what to do when a user or application error affects data integrity, an Azure region has an outage, or your application requires maintenance.

SQL Database features that you can use to provide business continuity

From a database perspective, there are four major potential disruption scenarios:

  • Local hardware or software failures affecting the database node, such as a disk-drive failure.
  • Data corruption or deletion, typically caused by an application bug or human error. Such failures are application-specific and typically cannot be detected by the database service.
  • Datacenter outage, possibly caused by a natural disaster. This scenario requires some level of geo-redundancy with application failover to an alternate datacenter.
  • Upgrade or maintenance errors. Unanticipated issues that occur during planned infrastructure maintenance or upgrades may require rapid rollback to a prior database state.

To mitigate local hardware and software failures, SQL Database includes a high availability architecture, which guarantees automatic recovery from these failures with an availability SLA of up to 99.995%.

To protect your business from data loss, SQL Database automatically creates full database backups weekly, differential database backups every 12 hours, and transaction log backups every 5 to 10 minutes. The backups are stored in read-access geo-redundant storage (RA-GRS) for at least 7 days for all service tiers. All service tiers except Basic support a configurable backup retention period for point-in-time restore, up to 35 days.

SQL Database also provides several business continuity features that you can use to mitigate various unplanned scenarios.

Recover a database within the same Azure region

You can use automatic database backups to restore a database to a point in time in the past. This way you can recover from data corruption caused by human error. Point-in-time restore lets you create a new database on the same server that represents the state of the data prior to the corrupting event. For most databases, the restore operation takes less than 12 hours. It may take longer to recover a very large or very active database. For more information about recovery time, see database recovery time.

If the maximum supported backup retention period for point-in-time restore (PITR) is not sufficient for your application, you can extend it by configuring a long-term retention (LTR) policy for the database or databases. For more information, see Long-term backup retention.
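
As a rough illustration of the retention limits described above, the following sketch checks whether a requested restore point still falls inside the PITR window. The function name `within_pitr_window` and its parameters are hypothetical and are not part of any Azure SDK; only the 7-day and 35-day bounds come from the text.

```python
from datetime import datetime, timedelta

# Retention bounds quoted in the text above.
MIN_RETENTION_DAYS = 7
MAX_RETENTION_DAYS = 35


def within_pitr_window(restore_point: datetime, now: datetime,
                       retention_days: int = MIN_RETENTION_DAYS) -> bool:
    """Return True if restore_point is still covered by the PITR window.

    Hypothetical helper for planning purposes; it does not call Azure.
    """
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 7 and 35")
    if restore_point > now:
        return False  # a future point in time cannot be restored
    return now - restore_point <= timedelta(days=retention_days)


current_time = datetime(2020, 1, 31, 12, 0)
# 3.5 days back, default 7-day window
print(within_pitr_window(datetime(2020, 1, 28), current_time))
# 30.5 days back, default 7-day window
print(within_pitr_window(datetime(2020, 1, 1), current_time))
# same point, 35-day window
print(within_pitr_window(datetime(2020, 1, 1), current_time, retention_days=35))
```

A restore point that falls outside the window is the case where a long-term retention policy would be needed.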

Compare geo-replication with failover groups

Auto-failover groups simplify the deployment and usage of geo-replication and add the additional capabilities described in the following table:

Capability                                    Geo-replication   Failover groups
Automatic failover                            No                Yes
Fail over multiple databases simultaneously   No                Yes
Update connection string after failover       Yes               No
Managed instance supported                    No                Yes
Can be in same region as primary              Yes               No
Multiple replicas                             Yes               No
Supports read-scale                           Yes               Yes

Recover a database to the existing server

Although rare, an Azure datacenter can have an outage. When an outage occurs, it causes a business disruption that might last only a few minutes or might last for hours.

  • One option is to wait for your database to come back online when the datacenter outage is over. This works for applications that can afford to have the database offline, for example, a development project or free trial that you don't need to work on constantly. When a datacenter has an outage, you do not know how long the outage might last, so this option only works if you don't need your database for a while.
  • Another option is to restore a database on any server in any Azure region using geo-redundant database backups (geo-restore). Geo-restore uses a geo-redundant backup as its source and can be used to recover a database even if the database or datacenter is inaccessible due to an outage.
  • Finally, you can quickly recover from an outage if you have configured either a geo-secondary using active geo-replication or an auto-failover group for your database or databases. Depending on your choice of these technologies, you can use either manual or automatic failover. While failover itself takes only a few seconds, the service will take at least 1 hour to activate it. This is necessary to ensure that the failover is justified by the scale of the outage. Also, the failover may result in a small amount of data loss due to the nature of asynchronous replication.
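
The activation delay in the last option can be sketched as simple date arithmetic. This is a minimal model for planning, assuming the delay is a fixed grace period of at least 1 hour as stated above; the name `earliest_auto_failover` and the `grace_hours` parameter are hypothetical, and the real service may apply additional conditions before failing over.

```python
from datetime import datetime, timedelta

MIN_GRACE_HOURS = 1  # "at least 1 hour", per the text above


def earliest_auto_failover(outage_start: datetime,
                           grace_hours: float = MIN_GRACE_HOURS) -> datetime:
    """Return the earliest time automatic failover could be activated.

    Hypothetical planning helper; it does not query Azure.
    """
    if grace_hours < MIN_GRACE_HOURS:
        raise ValueError("grace_hours must be at least 1")
    return outage_start + timedelta(hours=grace_hours)


outage = datetime(2020, 1, 1, 8, 0)
print(earliest_auto_failover(outage))                  # default 1-hour grace period
print(earliest_auto_failover(outage, grace_hours=2))   # longer grace period
```

If an application cannot tolerate waiting for this period, the manual failover option is the one to plan around.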

As you develop your business continuity plan, you need to understand the maximum acceptable time before the application fully recovers after the disruptive event. The time required for an application to fully recover is known as the recovery time objective (RTO). You also need to understand the maximum period of recent data updates (time interval) the application can tolerate losing when recovering from an unplanned disruptive event. The potential data loss is known as the recovery point objective (RPO).

Different recovery methods offer different levels of RPO and RTO. You can choose a specific recovery method, or use a combination of methods to achieve full application recovery. The following table compares the RPO and RTO of each recovery option.

Recovery method                           RTO     RPO
Geo-restore from geo-replicated backups   12 h    1 h
Auto-failover groups                      1 h     5 s
Manual database failover                  30 s    5 s
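
One way to use the table is to filter the recovery methods against your application's own targets. The sketch below encodes the three rows as data and returns the methods whose RTO and RPO are both within the targets; the `RecoveryMethod` class and `methods_meeting` function are hypothetical names introduced for illustration, and the figures are taken directly from the table.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryMethod:
    name: str
    rto: timedelta  # recovery time objective from the table
    rpo: timedelta  # recovery point objective from the table


METHODS = [
    RecoveryMethod("Geo-restore from geo-replicated backups",
                   timedelta(hours=12), timedelta(hours=1)),
    RecoveryMethod("Auto-failover groups",
                   timedelta(hours=1), timedelta(seconds=5)),
    RecoveryMethod("Manual database failover",
                   timedelta(seconds=30), timedelta(seconds=5)),
]


def methods_meeting(target_rto: timedelta, target_rpo: timedelta,
                    methods=METHODS) -> list:
    """Return the names of methods whose RTO and RPO both fit the targets."""
    return [m.name for m in methods
            if m.rto <= target_rto and m.rpo <= target_rpo]


# An application that tolerates 2 hours of downtime and 1 minute of data loss.
print(methods_meeting(timedelta(hours=2), timedelta(minutes=1)))
```

An empty result means none of the listed methods meets the targets on its own.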

Note

Manual database failover refers to failover of a single database to its geo-replicated secondary using the unplanned mode. See the table earlier in this article for details of the auto-failover RTO and RPO.

Use auto-failover groups if your application meets any of these criteria:

  • It is mission critical.
  • It has a service level agreement (SLA) that does not allow for 12 hours or more of downtime.
  • Downtime may result in financial liability.
  • It has a high rate of data change, and 1 hour of data loss is not acceptable.
  • The additional cost of active geo-replication is lower than the potential financial liability and associated loss of business.

You may choose to use a combination of database backups and active geo-replication, depending on your application requirements. For a discussion of design considerations for stand-alone databases and for elastic pools using these business continuity features, see Design an application for cloud disaster recovery and Elastic pool disaster recovery strategies.

The following sections provide an overview of the steps to recover using either database backups or active geo-replication. For detailed steps, including planning requirements, post-recovery steps, and information about how to simulate an outage to perform a disaster recovery drill, see Recover a SQL Database from an outage.

Prepare for an outage

Regardless of the business continuity feature you use, you must:

  • Identify and prepare the target server, including server-level IP firewall rules, logins, and master database level permissions.
  • Determine how to redirect clients and client applications to the new server.
  • Document other dependencies, such as auditing settings and alerts.

If you do not prepare properly, bringing your applications online after a failover or a database recovery takes additional time and will likely also require troubleshooting at a time of stress, which is a bad combination.

Fail over to a geo-replicated secondary database

If you are using active geo-replication or auto-failover groups as your recovery mechanism, you can configure an automatic failover policy or use manual unplanned failover. Once initiated, the failover causes the secondary to become the new primary, ready to record new transactions and respond to queries, with minimal data loss for the data not yet replicated. For information on designing the failover process, see Design an application for cloud disaster recovery.

Note

When the datacenter comes back online, the old primaries automatically reconnect to the new primary and become secondary databases. If you need to relocate the primary back to the original region, you can initiate a planned failover manually (failback).

Perform a geo-restore

If you are using automated backups with geo-redundant storage (enabled by default), you can recover the database using geo-restore. Recovery usually takes place within 12 hours, with data loss of up to one hour, determined by when the last log backup was taken and replicated. Until the recovery completes, the database is unable to record any transactions or respond to any queries. Note that geo-restore only restores the database to the last available point in time.
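
The data-loss figure above follows from the gap between the last replicated log backup and the start of the outage. The sketch below computes that gap and compares it with the 1-hour figure from the text; `geo_restore_data_loss` is a hypothetical helper, and in practice the timestamp of the last replicated backup would have to come from the service.

```python
from datetime import datetime, timedelta

GEO_RESTORE_RPO = timedelta(hours=1)  # "up to one hour", per the text above


def geo_restore_data_loss(outage_start: datetime,
                          last_replicated_log_backup: datetime) -> timedelta:
    """Return the span of updates that a geo-restore would not recover.

    Hypothetical planning helper; both timestamps are supplied by the caller.
    """
    if last_replicated_log_backup > outage_start:
        raise ValueError("the last replicated backup must precede the outage")
    return outage_start - last_replicated_log_backup


loss = geo_restore_data_loss(datetime(2020, 1, 1, 9, 0),
                             datetime(2020, 1, 1, 8, 20))
print(loss, loss <= GEO_RESTORE_RPO)
```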

Note

If the datacenter comes back online before you switch your application over to the recovered database, you can cancel the recovery.

Perform post failover / recovery tasks

After recovery using either mechanism, you must perform the following additional tasks before your users and applications are back up and running:

  • Redirect clients and client applications to the new server and restored database.
  • Ensure appropriate server-level IP firewall rules are in place for users to connect, or use database-level firewalls to enable appropriate rules.
  • Ensure appropriate logins and master database level permissions are in place (or use contained users).
  • Configure auditing, as appropriate.
  • Configure alerts, as appropriate.

Note

If you are using a failover group and connect to the databases using the read-write listener, the redirection after failover will happen automatically and transparently to the application.
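
The note above works because the application connects to the failover group's listener name rather than to a specific server. The following sketch builds such a connection string, assuming the read-write listener for a SQL Database failover group takes the form `<failover-group-name>.database.windows.net`. That endpoint form, the ODBC driver name, and the names `my-fog` and `orders` are assumptions for illustration and are not stated in this article.

```python
def read_write_listener(failover_group_name: str,
                        dns_suffix: str = "database.windows.net") -> str:
    """Return the assumed read-write listener host name for a failover group."""
    return f"{failover_group_name}.{dns_suffix}"


def odbc_connection_string(failover_group_name: str, database: str,
                           driver: str = "ODBC Driver 17 for SQL Server") -> str:
    """Build an ODBC-style connection string that targets the listener.

    Credentials are deliberately omitted from this sketch.
    """
    server = read_write_listener(failover_group_name)
    return (f"Driver={{{driver}}};Server=tcp:{server},1433;"
            f"Database={database};Encrypt=yes;")


# "my-fog" and "orders" are placeholder names.
print(odbc_connection_string("my-fog", "orders"))
```

Because the string never names the primary or secondary server directly, it does not need to change after a failover.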

Upgrade an application with minimal downtime

Sometimes an application must be taken offline because of planned maintenance, such as an application upgrade. The article Manage application upgrades describes how to use active geo-replication to enable rolling upgrades of your cloud application, which minimizes downtime during upgrades and provides a recovery path if something goes wrong.

Next steps

For a discussion of application design considerations for stand-alone databases and for elastic pools, see Design an application for cloud disaster recovery and Elastic pool disaster recovery strategies.