Azure SQL Database 和 SQL 受控執行個體的高可用性High availability for Azure SQL Database and SQL Managed Instance

適用於: Azure SQL Database Azure SQL 受控執行個體

在 Azure SQL Database 和 SQL 受控執行個體中,高可用性架構的目標是保證您的資料庫已啟動並執行至少99.99% 的時間 (如需有關不同層的特定 SLA 的詳細資訊,請參閱 Azure SQL Database 和 SQL 受控執行個體) 的 sla,而不必擔心維護作業和中斷的影響。The goal of the high availability architecture in Azure SQL Database and SQL Managed Instance is to guarantee that your database is up and running minimum of 99.99% of time (For more information regarding specific SLA for different tiers, Please refer SLA for Azure SQL Database and SQL Managed Instance), without worrying about the impact of maintenance operations and outages. Azure 會自動處理重要的服務工作,例如修補、備份、Windows 和 Azure SQL 升級,以及基礎硬體、軟體或網路失敗等非計畫事件。Azure automatically handles critical servicing tasks, such as patching, backups, Windows and Azure SQL upgrades, as well as unplanned events such as underlying hardware, software, or network failures. 當 Azure SQL Database 中的基礎資料庫進行修補或容錯移轉時,如果您在應用程式中 使用重試邏輯 ,則停機時間並不明顯。When the underlying database in Azure SQL Database is patched or fails over, the downtime is not noticeable if you employ retry logic in your app. 即使在最重要的情況下,SQL Database 和 SQL 受控執行個體也可以快速復原,以確保您的資料一律可供使用。SQL Database and SQL Managed Instance can quickly recover even in the most critical circumstances ensuring that your data is always available.

高可用性解決方案的設計是為了確保認可的資料永遠不會因為失敗而遺失,而維護作業不會影響您的工作負載,而且資料庫不會是您的軟體架構中的單一失敗點。The high availability solution is designed to ensure that committed data is never lost due to failures, that maintenance operations do not affect your workload, and that the database will not be a single point of failure in your software architecture. 在升級或維護資料庫時,不會有維護視窗或停機時間要求您停止工作負載。There are no maintenance windows or downtimes that should require you to stop the workload while the database is upgraded or maintained.

有兩個高可用性架構模型:There are two high availability architectural models:

  • 以計算和儲存體分隔為基礎的 標準可用性模型Standard availability model that is based on a separation of compute and storage. 它依賴遠端儲存層的高可用性和可靠性。It relies on high availability and reliability of the remote storage tier. 此架構的目標是以預算導向的商務應用程式,可容忍維護活動期間的效能降低。This architecture targets budget-oriented business applications that can tolerate some performance degradation during maintenance activities.
  • 以資料庫引擎進程叢集為基礎的高階 可用性模型Premium availability model that is based on a cluster of database engine processes. 這是因為永遠會有一個可用資料庫引擎節點的仲裁。It relies on the fact that there is always a quorum of available database engine nodes. 此架構以高 IO 效能、高交易率為目標的任務關鍵性應用程式,並保證在維護活動期間對您的工作負載造成最少的效能影響。This architecture targets mission critical applications with high IO performance, high transaction rate and guarantees minimal performance impact to your workload during maintenance activities.

SQL Database 和 SQL 受控執行個體都是在最新穩定版本的 SQL Server Database engine 和 Windows 作業系統上執行,大部分的使用者都不會注意到升級會持續執行。SQL Database and SQL Managed Instance both run on the latest stable version of the SQL Server database engine and Windows operating system, and most users would not notice that upgrades are performed continuously.

基本、標準和一般用途服務層級本機冗余可用性Basic, Standard, and General Purpose service tier locally redundant availability

基本、標準和一般用途服務層級會利用適用于無伺服器和已布建計算的標準可用性架構。The Basic, Standard, and General Purpose service tiers leverage the standard availability architecture for both serverless and provisioned compute. 下圖顯示具有個別計算和儲存層的四個不同節點。The following figure shows four different nodes with the separated compute and storage layers.

分隔計算與儲存體

標準可用性模型包含兩個層級:The standard availability model includes two layers:

  • 一種無狀態計算層,可執行 sqlservr.exe 處理常式,且僅包含暫時性和快取的資料,例如 TempDB、附加 SSD 上的模型資料庫,以及記憶體中的計畫快取、緩衝集區和資料行存放區集區。A stateless compute layer that runs the sqlservr.exe process and contains only transient and cached data, such as TempDB, model databases on the attached SSD, and plan cache, buffer pool, and columnstore pool in memory. 此無狀態節點是由可初始化 sqlservr.exe 、控制節點健康情況的 Azure Service Fabric 操作,並在必要時執行容錯移轉至另一個節點。This stateless node is operated by Azure Service Fabric that initializes sqlservr.exe, controls health of the node, and performs failover to another node if necessary.
  • 具狀態資料層,其中包含儲存在 Azure Blob 儲存體中的資料庫檔案 ( .mdf/.ldf) 。A stateful data layer with the database files (.mdf/.ldf) that are stored in Azure Blob storage. Azure blob 儲存體具有內建的資料可用性和冗余功能。Azure blob storage has built-in data availability and redundancy feature. 它保證即使處理常式當機,也會保留資料檔案中記錄檔或頁面中的每筆記錄 sqlservr.exeIt guarantees that every record in the log file or page in the data file will be preserved even if sqlservr.exe process crashes.

每次升級資料庫引擎或作業系統,或偵測到失敗時,Azure Service Fabric 都會將無狀態 sqlservr.exe 進程移至具有足夠可用容量的另一個無狀態計算節點。Whenever the database engine or the operating system is upgraded, or a failure is detected, Azure Service Fabric will move the stateless sqlservr.exe process to another stateless compute node with sufficient free capacity. Azure Blob 儲存體中的資料不會受到移動影響,且資料/記錄檔會附加到新初始化的 sqlservr.exe 進程。Data in Azure Blob storage is not affected by the move, and the data/log files are attached to the newly initialized sqlservr.exe process. 此程式可保證99.99% 的可用性,但繁重的工作負載可能會在轉換期間發生效能降低的情況,因為新的進程是以 sqlservr.exe 冷快取啟動。This process guarantees 99.99% availability, but a heavy workload may experience some performance degradation during the transition since the new sqlservr.exe process starts with cold cache.

一般用途服務層區域冗余可用性 (預覽) General Purpose service tier zone redundant availability (Preview)

一般用途服務層的區域冗余設定會利用 Azure 可用性區域   來跨 Azure 區域內的多個實體位置複寫資料庫。Zone redundant configuration for the general purpose service tier utilizes Azure Availability Zones  to replicate databases across multiple physical locations within an Azure region.藉由選取 [區域冗余],您可以讓新的和現有的一般用途單一資料庫和彈性集區復原到較大的一組失敗,包括重大的資料中心中斷,而不需要變更應用程式邏輯。 By selecting zone redundancy, you can make your new and existing general purpose single databases and elastic pools resilient to a much larger set of failures, including catastrophic datacenter outages, without any changes of the application logic.

一般用途層的區域冗余設定有兩個層級:Zone redundant configuration for the general purpose tier has two layers:

  • 具有資料庫檔案的具狀態資料層 ( .mdf/.ldf) 儲存在 ZRS PFS (區域冗余 儲存體 premium 檔案共用A stateful data layer with the database files (.mdf/.ldf) that are stored in ZRS PFS (zone-redundant storage premium file share. 使用 區域多餘的儲存體 時,資料和記錄檔會在三個實際隔離的 Azure 可用性區域間同步複製。Using zone-redundant storage the data and log files are synchronously copied across three physically-isolated Azure availability zones.
  • 執行 sqlservr.exe 進程的無狀態計算層,其中只包含暫時性和快取的資料,例如 TempDB、附加 SSD 上的模型資料庫,以及記憶體中的計畫快取、緩衝集區和資料行存放區集區。A stateless compute layer that runs the sqlservr.exe process and contains only transient and cached data, such as TempDB, model databases on the attached SSD, and plan cache, buffer pool, and columnstore pool in memory. 此無狀態節點是由 Azure Service Fabric 所操作,可初始化 sqlservr.exe、控制節點的健康情況,並在必要時執行容錯移轉至另一個節點。This stateless node is operated by Azure Service Fabric that initializes sqlservr.exe, controls health of the node, and performs failover to another node if necessary. 針對區域冗余的一般用途資料庫,具有備用容量的節點可立即在其他可用性區域進行容錯移轉。For zone redundant general purpose databases, nodes with spare capacity are readily available in other Availability Zones for failover.

下圖說明一般用途服務層級的高可用性架構的區域冗余版本:The zone redundant version of the high availability architecture for the general purpose service tier is illustrated by the following diagram:

適用于一般用途的區域冗余設定

重要

如需支援區域重複資料庫之區域的最新資訊,請參閱 依區域的服務支援For up to date information about the regions that support zone redundant databases, see Services support by region. 只有在選取第5代計算硬體時,才可使用區域冗余設定。Zone redundant configuration is only available when the Gen5 compute hardware is selected. SQL 受控執行個體中無法使用此功能。This feature is not available in SQL Managed Instance.

注意

大小為 80 vcore 的一般用途資料庫,可能會在區域冗余設定時遇到效能降低的情況。General Purpose databases with a size of 80 vcore may experience performance degradation with zone redundant configuration. 此外,備份、還原、資料庫複製和設定異地 DR 關聯性等作業,可能會因為任何大於 1 TB 的單一資料庫,而遇到較慢的效能。Additionally, operations such as backup, restore, database copy, and setting up Geo-DR relationships may experience slower performance for any single databases larger than 1 TB.

Premium 和業務關鍵服務層級本機冗余可用性Premium and Business Critical service tier locally redundant availability

Premium 和業務關鍵服務層級會利用高階可用性模型,將計算資源整合 (sqlservr.exe 進程) 和儲存體 (本機連接的 SSD) 在單一節點上。Premium and Business Critical service tiers leverage the Premium availability model, which integrates compute resources (sqlservr.exe process) and storage (locally attached SSD) on a single node. 藉由將計算和儲存體複寫至建立三到四個節點叢集的其他節點,即可達到高可用性。High availability is achieved by replicating both compute and storage to additional nodes creating a three to four-node cluster.

資料庫引擎節點的叢集

基礎資料庫檔案 ( .mdf/.ldf) 放置在附加的 SSD 儲存體上,以提供非常低的延遲 IO 給您的工作負載。The underlying database files (.mdf/.ldf) are placed on the attached SSD storage to provide very low latency IO to your workload. 高可用性是使用類似 SQL Server Always On 可用性群組的技術來實行。High availability is implemented using a technology similar to SQL Server Always On availability groups. 叢集包含可供讀寫客戶工作負載存取的單一主要複本,以及最多三個次要複本 (計算和儲存體) 包含資料複本。The cluster includes a single primary replica that is accessible for read-write customer workloads, and up to three secondary replicas (compute and storage) containing copies of data. 主要節點會依序不斷地將變更推送至次要節點,並確保在認可每個交易之前,資料會同步處理至至少一個次要複本。The primary node constantly pushes changes to the secondary nodes in order and ensures that the data is synchronized to at least one secondary replica before committing each transaction. 此程式可保證如果主要節點因為任何原因而損毀,一律會有完全同步處理的節點以進行容錯移轉。This process guarantees that if the primary node crashes for any reason, there is always a fully synchronized node to fail over to. Azure Service Fabric 會起始容錯移轉。The failover is initiated by the Azure Service Fabric. 次要複本變成新的主要節點之後,就會建立另一個次要複本,以確保叢集有足夠的節點 (仲裁集) 。Once the secondary replica becomes the new primary node, another secondary replica is created to ensure the cluster has enough nodes (quorum set). 完成容錯移轉之後,Azure SQL 連接會自動重新導向至新的主要節點。Once failover is complete, Azure SQL connections are automatically redirected to the new primary node.

額外的好處是,premium 可用性模型包含將唯讀 Azure SQL 連接重新導向至其中一個次要複本的能力。As an extra benefit, the premium availability model includes the ability to redirect read-only Azure SQL connections to one of the secondary replicas. 這項功能稱為「 讀取相應放大」。它提供100% 的額外計算容量,不需額外費用,就能從主要複本關閉並載入唯讀作業,例如分析工作負載。This feature is called Read Scale-Out. It provides 100% additional compute capacity at no extra charge to off-load read-only operations, such as analytical workloads, from the primary replica.

Premium 和業務關鍵服務層級區域冗余可用性Premium and Business Critical service tier zone redundant availability

依預設,會在相同的資料中心內建立高階可用性模型的節點叢集。By default, the cluster of nodes for the premium availability model is created in the same datacenter. 隨著 Azure 可用性區域的推出,SQL Database 可以將業務關鍵資料庫的不同複本放置到相同區域中的不同可用性區域。With the introduction of Azure Availability Zones, SQL Database can place different replicas of the Business Critical database to different availability zones in the same region. 為了避免發生單點失敗,系統也會跨多個區域將控制環複寫成三個閘道環 (GW)。To eliminate a single point of failure, the control ring is also duplicated across multiple zones as three gateway rings (GW). Azure 流量管理員 (ATM) 會控制特定閘道的路由。The routing to a specific gateway ring is controlled by Azure Traffic Manager (ATM). 因為 Premium 或業務關鍵服務層中的區域多餘設定不會建立額外的資料庫冗余,所以您可以不需要額外的成本來啟用它。Because the zone redundant configuration in the Premium or Business Critical service tiers does not create additional database redundancy, you can enable it at no extra cost. 藉由選取區域冗余設定,您可以讓高階或業務關鍵資料庫復原到較大的一組失敗,包括重大的資料中心中斷,而不會變更應用程式邏輯。By selecting a zone redundant configuration, you can make your Premium or Business Critical databases resilient to a much larger set of failures, including catastrophic datacenter outages, without any changes to the application logic. 您也可以將任何現有的進階或業務關鍵資料庫或彈性集區轉換成區域備援組態。You can also convert any existing Premium or Business Critical databases or pools to the zone redundant configuration.

因為區域冗余資料庫在不同的資料中心內有複本之間有一些距離,所以增加的網路延遲可能會增加認可時間,因而影響某些 OLTP 工作負載的效能。Because the zone redundant databases have replicas in different datacenters with some distance between them, the increased network latency may increase the commit time and thus impact the performance of some OLTP workloads. 您一律可以停用區域備援設定來回到單一區域設定。You can always return to the single-zone configuration by disabling the zone redundancy setting. 此程式是與一般服務層級升級類似的線上操作。This process is an online operation similar to the regular service tier upgrade. 在此程序結束時,資料庫或集區會從區域備援環移轉成單一區域環,或反之亦然。At the end of the process, the database or pool is migrated from a zone redundant ring to a single zone ring or vice versa.

重要

使用業務關鍵層時,只有在選取第5代計算硬體時,才可使用區域冗余設定。When using the Business Critical tier, zone redundant configuration is only available when the Gen5 compute hardware is selected. 如需支援區域重複資料庫之區域的最新資訊,請參閱 依區域的服務支援For up to date information about the regions that support zone redundant databases, see Services support by region.

注意

SQL 受控執行個體中無法使用此功能。This feature is not available in SQL Managed Instance.

下圖說明區域備援版的高可用性架構:The zone redundant version of the high availability architecture is illustrated by the following diagram:

高可用性架構區域備援

超大規模服務層級可用性Hyperscale service tier availability

超大規模服務層架構在分散式函式 架構 中有描述,而且目前僅適用于 SQL Database,而非 SQL 受控執行個體。The Hyperscale service tier architecture is described in Distributed functions architecture and is only currently available for SQL Database, not SQL Managed Instance.

超大規模功能架構

超大規模中的可用性模型包含四個層級:The availability model in Hyperscale includes four layers:

  • 執行進程的無狀態計算層, sqlservr.exe 只包含暫時性和快取的資料,例如在連接的 SSD 上的非涵蓋 RBPEX 快取、TempDB、模型資料庫等等,以及記憶體中的計畫快取、緩衝集區和資料行存放區集區。A stateless compute layer that runs the sqlservr.exe processes and contains only transient and cached data, such as non-covering RBPEX cache, TempDB, model database, etc. on the attached SSD, and plan cache, buffer pool, and columnstore pool in memory. 此無狀態層包含主要計算複本,並選擇性地包含一些可作為容錯移轉目標的次要計算複本。This stateless layer includes the primary compute replica and optionally a number of secondary compute replicas that can serve as failover targets.
  • 頁面伺服器所形成的無狀態儲存層。A stateless storage layer formed by page servers. 這一層是在計算複本上執行之進程的分散式儲存引擎 sqlservr.exeThis layer is the distributed storage engine for the sqlservr.exe processes running on the compute replicas. 每一頁伺服器僅包含暫時性和快取的資料,例如涵蓋連接 SSD 上的 RBPEX 快取,以及在記憶體中快取的資料頁。Each page server contains only transient and cached data, such as covering RBPEX cache on the attached SSD, and data pages cached in memory. 每部頁面伺服器在主動-主動設定中都有成對的頁面伺服器,以提供負載平衡、冗余和高可用性。Each page server has a paired page server in an active-active configuration to provide load balancing, redundancy, and high availability.
  • 由執行記錄服務處理常式、交易記錄登陸區域,以及交易記錄長期儲存的計算節點所形成的具狀態交易記錄儲存層。A stateful transaction log storage layer formed by the compute node running the Log service process, the transaction log landing zone, and transaction log long term storage. 登陸區域和長期儲存使用 Azure 儲存體,可為交易記錄提供可用性和 冗余 ,以確保認可交易的資料耐久性。Landing zone and long term storage use Azure Storage, which provides availability and redundancy for transaction log, ensuring data durability for committed transactions.
  • 具有資料庫檔案的具狀態資料儲存層 ( .mdf/ndf) 儲存在 Azure 儲存體中,並由頁面伺服器更新。A stateful data storage layer with the database files (.mdf/.ndf) that are stored in Azure Storage and are updated by page servers. 這一層會使用 Azure 儲存體的資料可用性和 冗余 功能。This layer uses data availability and redundancy features of Azure Storage. 它保證資料檔中的每個頁面都會保留,即使超大規模架構的其他層級中的進程損毀,或計算節點失敗也一樣。It guarantees that every page in a data file will be preserved even if processes in other layers of Hyperscale architecture crash, or if compute nodes fail.

所有超大規模層中的計算節點會在 Azure Service Fabric 上執行,以控制每個節點的健康情況,並視需要執行容錯移轉至可用狀況良好的節點。Compute nodes in all Hyperscale layers run on Azure Service Fabric, which controls health of each node and performs failovers to available healthy nodes as necessary.

如需超大規模中高可用性的詳細資訊,請參閱 超大規模中的資料庫高可用性For more information on high availability in Hyperscale, see Database High Availability in Hyperscale.

加速資料庫復原 (ADR)Accelerated Database Recovery (ADR)

加速資料庫復原 (ADR) 是一項新的資料庫引擎功能,可大幅提升資料庫可用性,尤其是在有長時間執行的交易時。Accelerated Database Recovery (ADR) is a new database engine feature that greatly improves database availability, especially in the presence of long running transactions. ADR 目前適用于 Azure SQL Database、Azure SQL 受控執行個體和 Azure Synapse Analytics (先前的 SQL 資料倉儲) 。ADR is currently available for Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (formerly SQL Data Warehouse).

測試應用程式錯誤復原Testing application fault resiliency

高可用性是 SQL Database 和 SQL 受控執行個體平台的基礎部分,可為您的資料庫應用程式提供透明的運作。High availability is a fundamental part of the SQL Database and SQL Managed Instance platform that works transparently for your database application. 不過,我們認為您可能想要測試在計劃性或非計劃性事件期間起始的自動容錯移轉作業對應用程式有何影響,然後再將其部署到生產環境。However, we recognize that you may want to test how the automatic failover operations initiated during planned or unplanned events would impact an application before you deploy it to production. 您可以手動觸發容錯移轉,方法是呼叫特殊 API 來重新開機資料庫、彈性集區或受控實例。You can manually trigger a failover by calling a special API to restart a database, an elastic pool, or a managed instance. 在區域冗余資料庫或彈性集區的情況下,API 呼叫會導致將用戶端連線重新導向至可用性區域中的新主資料庫,而不同于舊的主要複本的可用性區域。In the case of a zone redundant database or elastic pool, the API call would result in redirecting client connections to the new primary in an Availability Zone different from the Availability Zone of the old primary. 因此,除了測試容錯移轉如何影響現有的資料庫會話,您也可以確認它是否因為網路延遲的變更而變更了端對端效能。So in addition to testing how failover impacts existing database sessions, you can also verify if it changes the end-to-end performance due to changes in network latency. 由於重新開機作業是侵入式的,且有大量的作業可能會讓平臺壓力,因此每個資料庫、彈性集區或受控實例的每15分鐘只允許一個容錯移轉呼叫。Because the restart operation is intrusive and a large number of them could stress the platform, only one failover call is allowed every 15 minutes for each database, elastic pool, or managed instance.

您可以使用 PowerShell、REST API 或 Azure CLI 來起始容錯移轉:A failover can be initiated using PowerShell, REST API, or Azure CLI:

部署類型Deployment type PowerShellPowerShell REST APIREST API Azure CLIAzure CLI
資料庫Database Invoke-AzSqlDatabaseFailoverInvoke-AzSqlDatabaseFailover 資料庫容錯移轉Database failover az rest 可用來從 Azure CLI 叫用 REST API 呼叫az rest may be used to invoke a REST API call from Azure CLI
彈性集區Elastic pool Invoke-AzSqlElasticPoolFailoverInvoke-AzSqlElasticPoolFailover 彈性集區容錯移轉Elastic pool failover az rest 可用來從 Azure CLI 叫用 REST API 呼叫az rest may be used to invoke a REST API call from Azure CLI
受控執行個體Managed Instance Invoke-AzSqlInstanceFailoverInvoke-AzSqlInstanceFailover 受控實例-容錯移轉Managed Instances - Failover az sql mi 容錯移轉az sql mi failover

重要

超大規模資料庫的可讀取次要複本無法使用容錯移轉命令。The Failover command is not available for readable secondary replicas of Hyperscale databases.

結論Conclusion

Azure SQL Database 和 Azure SQL 受控執行個體提供與 Azure 平臺緊密整合的內建高可用性解決方案。Azure SQL Database and Azure SQL Managed Instance feature a built-in high availability solution, that is deeply integrated with the Azure platform. 這取決於故障偵測和復原的 Service Fabric、用於資料保護的 Azure Blob 儲存體,以及可用性區域以取得更高的容錯 ((如稍早在不適用於 Azure SQL) 受控執行個體的檔中所述)。It is dependent on Service Fabric for failure detection and recovery, on Azure Blob storage for data protection, and on Availability Zones for higher fault tolerance (as mentioned earlier in document not applicable to Azure SQL Managed Instance yet). 此外,SQL Database 和 SQL 受控執行個體利用 SQL Server 實例的 Always On 可用性群組技術進行複寫和容錯移轉。In addition, SQL Database and SQL Managed Instance leverage the Always On availability group technology from the SQL Server instance for replication and failover. 這些技術的組合可讓應用程式完全實現混合儲存模型的優點,並支援最嚴苛的 Sla。The combination of these technologies enables applications to fully realize the benefits of a mixed storage model and support the most demanding SLAs.

後續步驟Next steps