
Manage the availability of Windows virtual machines in Azure

Learn ways to set up and manage multiple virtual machines to ensure high availability for your Windows application in Azure. You can also manage the availability of Linux virtual machines.

For instructions on creating and using availability sets with the classic deployment model, see How to Configure an Availability Set.

Understand VM reboots - maintenance vs. downtime

Three scenarios can impact a virtual machine in Azure: unplanned hardware maintenance, unexpected downtime, and planned maintenance.

  • An Unplanned Hardware Maintenance Event occurs when the Azure platform predicts that the hardware, or any platform component associated with a physical machine, is about to fail. When the platform predicts a failure, it issues an unplanned hardware maintenance event to reduce the impact on the virtual machines hosted on that hardware. Azure uses Live Migration technology to migrate the virtual machines from the failing hardware to a healthy physical machine. Live Migration is a VM preserving operation that only pauses the virtual machine for a short time. Memory, open files, and network connections are maintained, but performance might be reduced before or after the event. In cases where Live Migration cannot be used, the VM experiences Unexpected Downtime, as described below.

  • Unexpected Downtime occurs when the hardware or the physical infrastructure for the virtual machine fails unexpectedly. This can include local network failures, local disk failures, or other rack-level failures. When such a failure is detected, the Azure platform automatically migrates (heals) your virtual machine to a healthy physical machine in the same datacenter. During the healing procedure, virtual machines experience downtime (a reboot) and in some cases loss of the temporary drive. The attached OS and data disks are always preserved.

    Virtual machines can also experience downtime in the unlikely event of an outage or disaster that affects an entire datacenter, or even an entire region. For these scenarios, Azure provides protection options including availability zones and paired regions.

  • Planned Maintenance events are periodic updates made by Microsoft to the underlying Azure platform to improve the overall reliability, performance, and security of the platform infrastructure that your virtual machines run on. Most of these updates are performed without any impact on your virtual machines or cloud services (see VM Preserving Maintenance). While the Azure platform attempts to use VM Preserving Maintenance wherever possible, there are rare instances when these updates require a reboot of your virtual machine to apply the required updates to the underlying infrastructure. In this case, you can perform Azure planned maintenance with the Maintenance-Redeploy operation by initiating the maintenance for your VMs in a suitable time window. For more information, see Planned Maintenance for Virtual Machines.

To reduce the impact of downtime due to one or more of these events, we recommend the following high availability best practices for your virtual machines:

Use availability zones to protect from datacenter level failures

Availability zones expand the level of control you have to maintain the availability of the applications and data on your VMs. Availability zones are unique physical locations within an Azure region. Each zone is made up of one or more datacenters equipped with independent power, cooling, and networking. To ensure resiliency, there are a minimum of three separate zones in all enabled regions. The physical separation of availability zones within a region protects applications and data from datacenter failures. Zone-redundant services replicate your applications and data across availability zones to protect from single points of failure.

An availability zone in an Azure region is a combination of a fault domain and an update domain. For example, if you create three or more VMs across three zones in an Azure region, your VMs are effectively distributed across three fault domains and three update domains. The Azure platform recognizes this distribution across update domains to make sure that VMs in different zones are not updated at the same time.

With availability zones, Azure offers an industry-best 99.99% VM uptime SLA. By architecting your solutions to use replicated VMs in zones, you can protect your applications and data from the loss of a datacenter. If one zone is compromised, replicated apps and data are instantly available in another zone.
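To put the SLA percentages in this article into perspective, the following is a minimal Python sketch that converts an uptime percentage into the downtime it still permits. The function name and the 30-day period are illustrative assumptions; the actual SLA defines its own measurement period and exclusions.

```python
def max_downtime_minutes(sla_percent: float, period_days: float = 30.0) -> float:
    """Return the downtime, in minutes, that an uptime percentage still permits."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


# Figures quoted in this article: 99.99% with availability zones,
# 99.95% for multiple VMs in an availability set.
for label, sla in [("availability zones", 99.99), ("availability set", 99.95)]:
    print(f"{label}: {max_downtime_minutes(sla):.2f} minutes per 30 days")
```

Under these assumptions, 99.99% leaves a little over four minutes of permitted downtime in 30 days, while 99.95% leaves roughly five times as much.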


Learn more about deploying a Windows or Linux VM in an availability zone.

Configure multiple virtual machines in an availability set for redundancy

Availability sets are another datacenter configuration to provide VM redundancy and availability. This configuration within a datacenter ensures that during either a planned or unplanned maintenance event, at least one virtual machine is available and meets the 99.95% Azure SLA. For more information, see the SLA for Virtual Machines.


Avoid leaving a single-instance virtual machine in an availability set by itself. VMs in this configuration do not qualify for an SLA guarantee and face downtime during Azure planned maintenance events, except when the single VM is using Azure premium SSDs. For single VMs using premium SSDs, the Azure SLA applies.

Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure platform. For a given availability set, five non-user-configurable update domains are assigned by default (Resource Manager deployments can then be increased to provide up to 20 update domains) to indicate groups of virtual machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual machines are configured within a single availability set, the sixth virtual machine is placed into the same update domain as the first virtual machine, the seventh into the same update domain as the second virtual machine, and so on. During planned maintenance, update domains may not be rebooted in sequential order, but only one update domain is rebooted at a time. A rebooted update domain is given 30 minutes to recover before maintenance is initiated on a different update domain.
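The wrap-around placement described above can be modeled in a few lines of Python. This is only an illustrative sketch: the Azure platform performs the real assignment, and the function names and VM names here are hypothetical.

```python
from collections import Counter


def assign_update_domains(vm_names, update_domain_count=5):
    """Map VMs to update domain indexes in order, wrapping after the last domain."""
    if update_domain_count < 1:
        raise ValueError("update_domain_count must be at least 1")
    return {name: index % update_domain_count for index, name in enumerate(vm_names)}


def largest_simultaneous_reboot(assignment):
    """Return how many VMs share the fullest update domain."""
    return max(Counter(assignment.values()).values(), default=0)


vms = [f"vm{n}" for n in range(1, 8)]  # seven hypothetical VMs
assignment = assign_update_domains(vms)
for name, domain in assignment.items():
    print(f"{name} -> update domain {domain}")
print("VMs rebooted together at most:", largest_simultaneous_reboot(assignment))
```

With seven VMs and the default five update domains, the sixth VM shares a domain with the first and the seventh with the second. Because only one update domain is rebooted at a time, the fullest domain bounds how many VMs in the set are down together during planned maintenance.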

Fault domains define the group of virtual machines that share a common power source and network switch. By default, the virtual machines configured within your availability set are separated across up to three fault domains for Resource Manager deployments (two fault domains for Classic). While placing your virtual machines into an availability set does not protect your application from operating system or application-specific failures, it does limit the impact of potential physical hardware failures, network outages, or power interruptions.

Conceptual drawing of the update domain and fault domain configuration

Use managed disks for VMs in an availability set

If you are currently using VMs with unmanaged disks, we highly recommend that you convert the VMs in the availability set to use managed disks.

Managed disks provide better reliability for availability sets by ensuring that the disks of VMs in an availability set are sufficiently isolated from each other to avoid single points of failure. Azure does this by automatically placing the disks in different storage fault domains (storage clusters) and aligning them with the VM fault domain. If a storage fault domain fails due to a hardware or software failure, only the VM instance with disks on that storage fault domain fails.

Managed disks FDs


The number of fault domains for managed availability sets varies by region: either two or three per region. The following table shows the number per region.

Number of Fault Domains per region

Region                 Max # of Fault Domains
East US                3
East US 2              3
West US                3
West US 2              2
Central US             3
North Central US       3
South Central US       3
West Central US        2
Canada Central         3
Canada East            2
North Europe           3
West Europe            3
UK South               2
UK West                2
East Asia              2
South East Asia        2
Japan East             2
Japan West             2
South India            2
Central India          2
West India             2
Korea Central          2
Korea South            2
UAE North              2
Australia East         2
Australia Southeast    2
Australia Central      2
Australia Central 2    2
Brazil South           2
US Gov Virginia        2
US Gov Texas           2
US Gov Arizona         2
US DoD Central         2
US DoD East            2

If you plan to use VMs with unmanaged disks, follow the best practices below for storage accounts where the virtual hard disks (VHDs) of VMs are stored as page blobs.

  1. Keep all disks (OS and data) associated with a VM in the same storage account.
  2. Review the limits on the number of unmanaged disks in a storage account before adding more VHDs to it.
  3. Use a separate storage account for each VM in an availability set. Do not share storage accounts among multiple VMs in the same availability set. It is acceptable for VMs in different availability sets to share storage accounts if the best practices above are followed.

Unmanaged disks FDs
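Practices 1 and 3 above can be checked mechanically against an inventory of VMs. The following Python sketch assumes a hand-built inventory; the `VmDisks` type, the `check_layout` function, and all VM and account names are hypothetical and do not correspond to any Azure SDK type.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VmDisks:
    name: str
    availability_set: str
    # Storage account name for each disk (OS and data) attached to the VM.
    disk_accounts: list = field(default_factory=list)


def check_layout(vms):
    """Return a list of findings that violate the unmanaged-disk practices."""
    problems = []
    owners = defaultdict(set)  # (availability set, account) -> VM names
    for vm in vms:
        accounts = set(vm.disk_accounts)
        if len(accounts) > 1:
            problems.append(f"{vm.name}: disks span storage accounts {sorted(accounts)}")
        for account in accounts:
            owners[(vm.availability_set, account)].add(vm.name)
    for (av_set, account), names in sorted(owners.items()):
        if len(names) > 1:
            problems.append(f"{av_set}: account {account} is shared by {sorted(names)}")
    return problems


inventory = [
    VmDisks("web1", "web-set", ["acctweb1", "acctweb1"]),
    VmDisks("web2", "web-set", ["acctweb1"]),             # shares with web1
    VmDisks("sql1", "sql-set", ["acctsql1", "acctsql2"]),  # disks split
    VmDisks("sql2", "sql-set", ["acctweb1"]),              # other set: allowed
]
for finding in check_layout(inventory):
    print(finding)
```

Sharing an account across different availability sets (`sql2` and the web VMs) is not reported, which matches the exception in practice 3. The per-account disk limit in practice 2 is not modeled here.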

Use scheduled events to proactively respond to VM impacting events

When you subscribe to scheduled events, your VM is notified about upcoming maintenance events that can impact it. When scheduled events are enabled, your virtual machine is given a minimum amount of time before the maintenance activity is performed. For example, Host OS updates that might impact your VM are queued up as events that specify the impact, as well as a time at which the maintenance will be performed if no action is taken. Scheduled events are also queued up when Azure detects imminent hardware failure that might impact your VM, which allows you to decide when the healing should be performed. You can use the event to perform tasks prior to the maintenance, such as saving state or failing over to a secondary. After you complete your logic for gracefully handling the maintenance event, you can approve the outstanding scheduled event to allow the platform to proceed with maintenance.
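The poll, handle, and approve flow described above might look like the following Python sketch. The endpoint URL, API version, `Metadata` header, and the `Events` and `StartRequests` field names are not given in this article; they are assumptions based on the Azure Instance Metadata Service documentation for scheduled events and should be verified there. The helper names and sample document are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint and API version (see the Instance Metadata Service docs).
ENDPOINT = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
HEADERS = {"Metadata": "true"}


def fetch_events(timeout=5):
    """Query the endpoint from inside the VM and return the parsed document."""
    request = urllib.request.Request(ENDPOINT, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def events_for_vm(document, vm_name):
    """Keep only the events whose resource list names this VM."""
    return [e for e in document.get("Events", []) if vm_name in e.get("Resources", [])]


def build_approval(events):
    """Build the request body that approves the given events."""
    return {"StartRequests": [{"EventId": e["EventId"]} for e in events]}


def approve(events, timeout=5):
    """POST the approval so the platform can proceed with maintenance."""
    body = json.dumps(build_approval(events)).encode("utf-8")
    request = urllib.request.Request(ENDPOINT, data=body, headers=HEADERS, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Offline illustration with a hand-written sample document (not real output).
sample = {
    "DocumentIncarnation": 1,
    "Events": [
        {"EventId": "example-id-1", "EventType": "Reboot", "Resources": ["web1"]},
        {"EventId": "example-id-2", "EventType": "Freeze", "Resources": ["web2"]},
    ],
}
mine = events_for_vm(sample, "web1")
print(json.dumps(build_approval(mine)))
```

On a real VM, a handler would call `fetch_events()` periodically, save state or fail over for each event returned by `events_for_vm`, and then call `approve` once that logic has completed.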

Configure each application tier into separate availability zones or availability sets

If your virtual machines are all nearly identical and serve the same purpose for your application, we recommend that you configure an availability zone or availability set for each tier of your application. If you place two different tiers in the same availability zone or set, all virtual machines in the same application tier can be rebooted at once. By configuring at least two virtual machines in an availability zone or set for each tier, you guarantee that at least one virtual machine in each tier is available.

For example, you could put all the virtual machines in the front end of your application running IIS, Apache, and Nginx in a single availability zone or set. Make sure that only front-end virtual machines are placed in the same availability zone or set. Similarly, make sure that only data-tier virtual machines are placed in their own availability zone or set, like your replicated SQL Server virtual machines or your MySQL virtual machines.

Application tiers
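The tiering guidance can be expressed as two simple checks: no availability zone or set mixes tiers, and every tier has at least two VMs. The Python sketch below works on a hand-written list of tuples; the function name and all VM, tier, and group names are hypothetical.

```python
from collections import defaultdict


def check_tier_placement(vms):
    """vms: iterable of (vm_name, tier, group) tuples; returns a list of findings."""
    tiers_in_group = defaultdict(set)
    vms_in_tier = defaultdict(list)
    for vm_name, tier, group in vms:
        tiers_in_group[group].add(tier)
        vms_in_tier[tier].append(vm_name)

    findings = []
    # A zone or set should hold VMs from one tier only.
    for group, tiers in sorted(tiers_in_group.items()):
        if len(tiers) > 1:
            findings.append(f"{group} mixes tiers: {sorted(tiers)}")
    # Each tier needs at least two VMs so one can stay available.
    for tier, names in sorted(vms_in_tier.items()):
        if len(names) < 2:
            findings.append(f"tier {tier} has only {len(names)} VM")
    return findings


layout = [
    ("iis1", "front-end", "web-set"),
    ("iis2", "front-end", "web-set"),
    ("sql1", "data", "web-set"),  # data-tier VM placed with the front end
]
for finding in check_tier_placement(layout):
    print(finding)
```

For this sample layout, both checks report a finding: `web-set` mixes the front-end and data tiers, and the data tier has a single VM.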

Combine a load balancer with availability zones or sets

Azure 负载均衡器与可用性区域相结合, 或将其设置为最大程度地提高应用程序复原能力。Combine the Azure Load Balancer with an availability zone or set to get the most application resiliency. Azure 负载均衡器将流量分布到多个虚拟机中。The Azure Load Balancer distributes traffic between multiple virtual machines. 对于标准层虚拟机来说,Azure 负载均衡器已包括在内。For our Standard tier virtual machines, the Azure Load Balancer is included. 并非所有虚拟机层都包括 Azure 负载均衡器。Not all virtual machine tiers include the Azure Load Balancer. 有关对虚拟机进行负载均衡的更多信息,请阅读对虚拟机进行负载均衡For more information about load balancing your virtual machines, see Load Balancing virtual machines.

If the load balancer is not configured to balance traffic across multiple virtual machines, then any planned maintenance event affects the only traffic-serving virtual machine, causing an outage to your application tier. Placing multiple virtual machines of the same tier under the same load balancer and availability set enables traffic to be continuously served by at least one instance.

For a tutorial on how to load balance across availability zones, see Load balance VMs across all availability zones by using the Azure CLI.

Next steps

To learn more about load balancing your virtual machines, see Load Balancing virtual machines.