Manage the availability of Windows virtual machines in Azure

Learn ways to set up and manage multiple virtual machines to ensure high availability for your Windows application in Azure. You can also manage the availability of Linux virtual machines.

For instructions on creating and using availability sets with the classic deployment model, see How to Configure an Availability Set.

Understand VM reboots - maintenance vs. downtime

Three scenarios can lead to virtual machines in Azure being impacted: unplanned hardware maintenance, unexpected downtime, and planned maintenance.

  • An unplanned hardware maintenance event occurs when the Azure platform predicts that the hardware or any platform component associated with a physical machine is about to fail. When the platform predicts a failure, it issues an unplanned hardware maintenance event to reduce the impact on the virtual machines hosted on that hardware. Azure uses Live Migration technology to migrate the virtual machines from the failing hardware to a healthy physical machine. Live Migration is a VM-preserving operation that pauses the virtual machine only for a short time. Memory, open files, and network connections are maintained, but performance might be reduced before and/or after the event. If Live Migration cannot be used, the VM experiences unexpected downtime, as described below.

  • Unexpected downtime occurs when the hardware or the physical infrastructure for the virtual machine fails unexpectedly. This can include local network failures, local disk failures, or other rack-level failures. When a failure is detected, the Azure platform automatically migrates (heals) your virtual machine to a healthy physical machine in the same datacenter. During the healing procedure, virtual machines experience downtime (a reboot) and, in some cases, loss of the temporary drive. The attached OS and data disks are always preserved.

    Virtual machines can also experience downtime in the unlikely event of an outage or disaster that affects an entire datacenter, or even an entire region. For these scenarios, Azure provides protection options including availability zones and paired regions.

  • Planned maintenance events are periodic updates made by Microsoft to the underlying Azure platform to improve the overall reliability, performance, and security of the platform infrastructure that your virtual machines run on. Most of these updates are performed without any impact on your virtual machines or cloud services (see VM Preserving Maintenance). Although the Azure platform attempts to use VM Preserving Maintenance whenever possible, in rare instances these updates require a reboot of your virtual machine to apply the required updates to the underlying infrastructure. In this case, you can perform Azure planned maintenance with the Maintenance-Redeploy operation by initiating the maintenance for your VMs in a suitable time window. For more information, see Planned Maintenance for Virtual Machines.

To reduce the impact of downtime due to one or more of these events, we recommend the following high availability best practices for your virtual machines:

Use availability zones to protect from datacenter-level failures

Availability zones expand the level of control you have to maintain the availability of the applications and data on your VMs. Availability Zones are unique physical locations within an Azure region. Each zone is made up of one or more datacenters equipped with independent power, cooling, and networking. To ensure resiliency, there are a minimum of three separate zones in all enabled regions. The physical separation of Availability Zones within a region protects applications and data from datacenter failures. Zone-redundant services replicate your applications and data across Availability Zones to protect from single points of failure.

An Availability Zone in an Azure region is a combination of a fault domain and an update domain. For example, if you create three or more VMs across three zones in an Azure region, your VMs are effectively distributed across three fault domains and three update domains. The Azure platform recognizes this distribution across update domains to make sure that VMs in different zones are not updated at the same time.

With Availability Zones, Azure offers an industry-best 99.99% VM uptime SLA. By architecting your solutions to use replicated VMs in zones, you can protect your applications and data from the loss of a datacenter. If one zone is compromised, replicated apps and data are instantly available in another zone.

Figure: Availability zones

Learn more about deploying a Windows or Linux VM in an Availability Zone.

Configure multiple virtual machines in an availability set for redundancy

Availability sets are another datacenter configuration that provides VM redundancy and availability. This configuration within a datacenter ensures that during either a planned or unplanned maintenance event, at least one virtual machine is available and meets the 99.95% Azure SLA. For more information, see the SLA for Virtual Machines.
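
To make the SLA percentages concrete, the following sketch converts an uptime percentage into a downtime budget. It assumes a 30-day window purely for illustration; the function name is hypothetical, and the actual SLA terms define how uptime is measured. Under that assumption, 99.95% allows about 21.6 minutes of downtime and the 99.99% figure quoted for Availability Zones allows about 4.3 minutes.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an uptime SLA over `days` days.

    The 30-day default is an illustrative assumption, not the SLA's own
    measurement window.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for label, sla in [("Availability set", 99.95), ("Availability Zones", 99.99)]:
        budget = allowed_downtime_minutes(sla)
        print(f"{label}: {sla}% uptime leaves {budget:.1f} minutes per 30 days")
```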

Important

Avoid leaving a single-instance virtual machine in an availability set by itself. VMs in this configuration do not qualify for an SLA guarantee and face downtime during Azure planned maintenance events, except when the single VM uses Azure premium SSDs. For single VMs using premium SSDs, the Azure SLA applies.

Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure platform. For a given availability set, five non-user-configurable update domains are assigned by default (Resource Manager deployments can then be increased to provide up to 20 update domains). Update domains indicate groups of virtual machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual machines are configured within a single availability set, the sixth virtual machine is placed into the same update domain as the first virtual machine, the seventh into the same update domain as the second virtual machine, and so on. Update domains might not be rebooted in sequential order during planned maintenance, but only one update domain is rebooted at a time. A rebooted update domain is given 30 minutes to recover before maintenance is initiated on a different update domain.
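
The wrap-around placement described above can be sketched as simple modular arithmetic. This is only a model of the stated behavior (sixth VM with the first, seventh with the second), not the platform's actual placement logic; the function name and zero-based numbering are assumptions made for the example.

```python
def update_domain_for(vm_position: int, update_domain_count: int = 5) -> int:
    """Return the zero-based update domain for the VM at zero-based `vm_position`.

    Models the wrap-around described in the text: with the default of five
    update domains, position 5 (the sixth VM) lands with position 0.
    """
    if not 1 <= update_domain_count <= 20:
        raise ValueError("update_domain_count must be between 1 and 20")
    if vm_position < 0:
        raise ValueError("vm_position must be non-negative")
    return vm_position % update_domain_count


if __name__ == "__main__":
    for position in range(7):
        print(f"VM {position + 1} -> update domain {update_domain_for(position)}")
```

Because only one update domain is rebooted at a time, VMs that map to different values in this model are not rebooted together during planned maintenance.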

Fault domains define the group of virtual machines that share a common power source and network switch. By default, the virtual machines configured within your availability set are separated across up to three fault domains for Resource Manager deployments (two fault domains for Classic). Placing your virtual machines into an availability set does not protect your application from operating system or application-specific failures, but it does limit the impact of potential physical hardware failures, network outages, or power interruptions.

Figure: Conceptual drawing of the update domain and fault domain configuration

Use managed disks for VMs in an availability set

If you are currently using VMs with unmanaged disks, we highly recommend that you convert the VMs in the availability set to use managed disks.

Managed disks provide better reliability for availability sets by ensuring that the disks of VMs in an availability set are sufficiently isolated from each other to avoid single points of failure. They do this by automatically placing the disks in different storage fault domains (storage clusters) and aligning them with the VM fault domain. If a storage fault domain fails due to a hardware or software failure, only the VM instances with disks on that storage fault domain fail.

Figure: Managed disks FDs

Important

The number of fault domains for managed availability sets varies by region: either two or three per region. The following table shows the number per region.

Number of Fault Domains per region

Region                 Max # of Fault Domains
East US                3
East US 2              3
West US                3
West US 2              2
Central US             3
North Central US       3
South Central US       3
West Central US        2
Canada Central         3
Canada East            2
North Europe           3
West Europe            3
UK South               2
UK West                2
East Asia              2
South East Asia        2
Japan East             2
Japan West             2
South India            2
Central India          2
West India             2
Korea Central          2
Korea South            2
UAE North              2
Australia East         2
Australia Southeast    2
Australia Central      2
Australia Central 2    2
Brazil South           2
US Gov Virginia        2
US Gov Texas           2
US Gov Arizona         2
US DoD Central         2
US DoD East            2

If you plan to use VMs with unmanaged disks, follow the best practices below for the storage accounts where the virtual hard disks (VHDs) of the VMs are stored as page blobs.

  1. Keep all disks (OS and data) associated with a VM in the same storage account.
  2. Review the limits on the number of unmanaged disks in a storage account before adding more VHDs to a storage account.
  3. Use a separate storage account for each VM in an availability set. Do not share storage accounts among multiple VMs in the same availability set. It is acceptable for VMs in different availability sets to share storage accounts if the above best practices are followed.

Figure: Unmanaged disks FDs
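
The third practice lends itself to an automated check over an inventory of VMs. The sketch below is a hypothetical helper, not part of any Azure SDK: it assumes each VM is described by a dictionary with `name`, `availability_set`, and `storage_account` keys, and it reports storage accounts shared by more than one VM within the same availability set.

```python
from collections import defaultdict


def shared_storage_accounts(vms):
    """Return {(availability_set, storage_account): [vm names]} for every
    storage account used by more than one VM in the same availability set.

    `vms` is an iterable of dicts with the (assumed) keys
    'name', 'availability_set', and 'storage_account'.
    """
    users = defaultdict(list)
    for vm in vms:
        users[(vm["availability_set"], vm["storage_account"])].append(vm["name"])
    # Sharing across different availability sets is acceptable, so only
    # report keys that collect more than one VM.
    return {key: names for key, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    inventory = [
        {"name": "web-0", "availability_set": "web-as", "storage_account": "stor1"},
        {"name": "web-1", "availability_set": "web-as", "storage_account": "stor1"},
        {"name": "db-0", "availability_set": "db-as", "storage_account": "stor1"},
    ]
    for (av_set, account), names in shared_storage_accounts(inventory).items():
        print(f"{account} is shared within {av_set} by: {', '.join(names)}")
```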

Use scheduled events to proactively respond to VM impacting events

When you subscribe to scheduled events, your VM is notified about upcoming maintenance events that can impact it. When scheduled events are enabled, your virtual machine is given a minimum amount of time before the maintenance activity is performed. For example, host OS updates that might impact your VM are queued up as events that specify the impact, as well as the time at which the maintenance will be performed if no action is taken. Scheduled events are also queued up when Azure detects imminent hardware failure that might impact your VM, which allows you to decide when the healing should be performed. You can use the event to perform tasks before the maintenance, such as saving state or failing over to the secondary. After you complete your logic for gracefully handling the maintenance event, you can approve the outstanding scheduled event to allow the platform to proceed with maintenance.
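
The handle-then-approve flow can be sketched without any network calls. The example below assumes the JSON shape used by the Azure Instance Metadata Service Scheduled Events endpoint (an `Events` list whose entries carry `EventId`, `EventType`, `Resources`, and `NotBefore`, and an approval body of the form `{"StartRequests": [{"EventId": ...}]}`). None of these field names appear in this article, so verify them against the Scheduled Events documentation; the sample document, VM names, and function names are hypothetical.

```python
import json

# Hypothetical sample of a scheduled-events document (assumed shape).
SAMPLE_DOCUMENT = {
    "DocumentIncarnation": 1,
    "Events": [
        {
            "EventId": "event-1",
            "EventType": "Reboot",
            "ResourceType": "VirtualMachine",
            "Resources": ["web-vm-0"],
            "EventStatus": "Scheduled",
            "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT",
        }
    ],
}


def events_for_vm(document, vm_name):
    """Return the events in `document` that list `vm_name` as an affected resource."""
    return [
        event
        for event in document.get("Events", [])
        if vm_name in event.get("Resources", [])
    ]


def approval_body(events):
    """Build the JSON body that approves (starts) the given events."""
    return json.dumps({"StartRequests": [{"EventId": e["EventId"]} for e in events]})


if __name__ == "__main__":
    pending = events_for_vm(SAMPLE_DOCUMENT, "web-vm-0")
    for event in pending:
        # Save state or fail over to the secondary here, before approving.
        print(f"{event['EventType']} not before {event['NotBefore']}")
    print(approval_body(pending))
```

In a real deployment the document would be fetched from, and the approval posted to, the metadata endpoint from inside the VM; that transport is deliberately left out of this sketch.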

Configure each application tier into separate availability zones or availability sets

If your virtual machines are all nearly identical and serve the same purpose for your application, we recommend that you configure an availability zone or availability set for each tier of your application. If you place two different tiers in the same availability zone or set, all virtual machines in the same application tier can be rebooted at once. By configuring at least two virtual machines in an availability zone or set for each tier, you ensure that at least one virtual machine in each tier is available.

For example, you could put all the virtual machines in the front end of your application, running IIS, Apache, or Nginx, in a single availability zone or set. Make sure that only front-end virtual machines are placed in the same availability zone or set. Similarly, make sure that only data-tier virtual machines, such as your replicated SQL Server virtual machines or your MySQL virtual machines, are placed in their own availability zone or set.
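
The two rules above (one tier per availability zone or set, and at least two VMs per tier) can be checked mechanically. The following sketch is a hypothetical helper that assumes each VM is described by a dictionary with `name`, `tier`, and `group` keys, where `group` names the availability zone or set the VM is placed in.

```python
from collections import defaultdict


def placement_problems(vms):
    """Return a list of human-readable problems with the tier placement.

    Flags any group (availability zone or set) that contains more than one
    tier, and any tier that has fewer than two VMs.
    """
    tiers_by_group = defaultdict(set)
    count_by_tier = defaultdict(int)
    for vm in vms:
        tiers_by_group[vm["group"]].add(vm["tier"])
        count_by_tier[vm["tier"]] += 1

    problems = []
    for group, tiers in sorted(tiers_by_group.items()):
        if len(tiers) > 1:
            problems.append(f"{group} mixes tiers: {', '.join(sorted(tiers))}")
    for tier, count in sorted(count_by_tier.items()):
        if count < 2:
            problems.append(f"tier {tier} has only {count} VM")
    return problems


if __name__ == "__main__":
    layout = [
        {"name": "iis-0", "tier": "front-end", "group": "web-as"},
        {"name": "iis-1", "tier": "front-end", "group": "web-as"},
        {"name": "sql-0", "tier": "data", "group": "web-as"},
    ]
    for problem in placement_problems(layout):
        print(problem)
```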

Figure: Application tiers

Combine a load balancer with availability zones or sets

Combine the Azure Load Balancer with an availability zone or set to get the most application resiliency. The Azure Load Balancer distributes traffic between multiple virtual machines. The Azure Load Balancer is included with Standard tier virtual machines, but not all virtual machine tiers include it. For more information about load balancing your virtual machines, see Load Balancing virtual machines.

If the load balancer is not configured to balance traffic across multiple virtual machines, any planned maintenance event affects the only traffic-serving virtual machine, causing an outage in your application tier. Placing multiple virtual machines of the same tier under the same load balancer and availability set enables traffic to be continuously served by at least one instance.

For a tutorial on how to load balance across availability zones, see Load balance VMs across all availability zones by using the Azure CLI.

Next steps

To learn more about load balancing your virtual machines, see Load Balancing virtual machines.