
Manage the availability of Windows virtual machines in Azure

Learn ways to set up and manage multiple virtual machines to ensure high availability for your Windows application in Azure. You can also manage the availability of Linux virtual machines.

For instructions on creating and using availability sets with the classic deployment model, see How to Configure an Availability Set.

Understand VM reboots - maintenance vs. downtime

Three scenarios can affect virtual machines in Azure: unplanned hardware maintenance, unexpected downtime, and planned maintenance.

  • An Unplanned Hardware Maintenance event occurs when the Azure platform predicts that the hardware, or any platform component associated with a physical machine, is about to fail. When the platform predicts a failure, it issues an unplanned hardware maintenance event to reduce the impact on the virtual machines hosted on that hardware. Azure uses Live Migration technology to migrate the virtual machines from the failing hardware to a healthy physical machine. Live Migration is a VM-preserving operation that pauses the virtual machine only for a short time. Memory, open files, and network connections are maintained, but performance might be reduced before or after the event. In cases where Live Migration cannot be used, the VM experiences Unexpected Downtime, as described below.

  • An Unexpected Downtime occurs when the hardware or the physical infrastructure for the virtual machine fails unexpectedly. This can include local network failures, local disk failures, or other rack-level failures. When such a failure is detected, the Azure platform automatically migrates (heals) your virtual machine to a healthy physical machine in the same datacenter. During the healing procedure, virtual machines experience downtime (a reboot) and, in some cases, loss of the temporary drive. The attached OS and data disks are always preserved.

    Virtual machines can also experience downtime in the unlikely event of an outage or disaster that affects an entire datacenter, or even an entire region. For these scenarios, Azure provides protection options including availability zones and paired regions.

  • Planned Maintenance events are periodic updates made by Microsoft to the underlying Azure platform to improve the overall reliability, performance, and security of the platform infrastructure that your virtual machines run on. Most of these updates are performed without any impact on your virtual machines or cloud services (see VM Preserving Maintenance). While the Azure platform attempts to use VM Preserving Maintenance whenever possible, in rare instances these updates require a reboot of your virtual machine so that the required updates can be applied to the underlying infrastructure. In this case, you can perform Azure Planned Maintenance with the Maintenance-Redeploy operation by initiating the maintenance for your VMs in a suitable time window. For more information, see Planned Maintenance for Virtual Machines.

To reduce the impact of downtime due to one or more of these events, we recommend the following high availability best practices for your virtual machines:

Configure multiple virtual machines in an availability set for redundancy

To provide redundancy to your application, we recommend that you group two or more virtual machines in an availability set. This configuration within a datacenter ensures that during either a planned or unplanned maintenance event, at least one virtual machine is available and meets the 99.95% Azure SLA. For more information, see the SLA for Virtual Machines.
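To put the 99.95% figure in perspective, the short calculation below converts an availability percentage into a downtime budget. It is only an illustrative sketch: the 30-day month and the function name `allowed_downtime_minutes` are assumptions made here, and the SLA document itself defines how availability is actually measured.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime budget, in minutes, for a given availability target.

    Assumes a billing period of `days` full days (an assumption of this sketch).
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


budget = allowed_downtime_minutes(99.95)
print(f"Downtime budget over 30 days: {budget:.1f} minutes")
```

Under these assumptions, a 99.95% target leaves roughly 21.6 minutes of downtime per 30-day period.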

Important

Avoid leaving a single-instance virtual machine in an availability set by itself. VMs in this configuration do not qualify for an SLA guarantee and face downtime during Azure planned maintenance events, except when the single VM uses Azure Premium Storage. For single VMs using premium storage, the Azure SLA applies.

Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure platform. For a given availability set, five non-user-configurable update domains are assigned by default (Resource Manager deployments can be increased to provide up to 20 update domains). Update domains indicate groups of virtual machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual machines are configured within a single availability set, the sixth virtual machine is placed into the same update domain as the first virtual machine, the seventh in the same update domain as the second virtual machine, and so on. During planned maintenance, update domains may not be rebooted in sequential order, but only one update domain is rebooted at a time. A rebooted update domain is given 30 minutes to recover before maintenance is initiated on a different update domain.
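The wrap-around placement described above (the sixth VM joins the first VM's update domain, and so on) can be modelled as a simple modulo assignment. The sketch below is a simplified model of that description, not the platform's actual placement logic; the function name `update_domain` and the zero-based numbering are assumptions made for illustration.

```python
def update_domain(vm_index: int, update_domain_count: int = 5) -> int:
    """Map a zero-based VM index to a zero-based update domain, round-robin.

    The default of 5 and the upper bound of 20 come from the text above.
    """
    if not 1 <= update_domain_count <= 20:
        raise ValueError("update_domain_count must be between 1 and 20")
    return vm_index % update_domain_count


# Seven hypothetical VMs in one availability set with the default five domains.
vms = [f"vm{i + 1}" for i in range(7)]
placement = {name: update_domain(i) for i, name in enumerate(vms)}

for name, domain in placement.items():
    print(f"{name} -> update domain {domain}")
```

In this model, vm1 and vm6 share an update domain, as do vm2 and vm7, so they may be rebooted together during planned maintenance.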

Fault domains define the group of virtual machines that share a common power source and network switch. By default, the virtual machines configured within your availability set are separated across up to three fault domains for Resource Manager deployments (two fault domains for Classic). While placing your virtual machines into an availability set does not protect your application from operating system or application-specific failures, it does limit the impact of potential physical hardware failures, network outages, or power interruptions.

Figure: Conceptual drawing of the update domain and fault domain configuration

Use managed disks for VMs in an availability set

If you are currently using VMs with unmanaged disks, we highly recommend that you convert the VMs in the availability set to use managed disks.

Managed disks provide better reliability for availability sets by ensuring that the disks of VMs in an availability set are sufficiently isolated from each other to avoid single points of failure. Azure does this by automatically placing the disks in different storage fault domains (storage clusters) and aligning them with the VM fault domain. If a storage fault domain fails due to a hardware or software failure, only the VM instances with disks on that storage fault domain fail.

Figure: Managed disk fault domains

Important

The number of fault domains for managed availability sets varies by region: either two or three per region. The following table shows the number per region.

Number of fault domains per region

Region                 Max # of Fault Domains
East US                3
East US 2              3
West US                3
West US 2              2
Central US             3
North Central US       3
South Central US       3
West Central US        2
Canada Central         3
Canada East            2
North Europe           3
West Europe            3
UK South               2
UK West                2
East Asia              2
Southeast Asia         2
Japan East             2
Japan West             2
South India            2
Central India          2
West India             2
Korea Central          2
Korea South            2
Australia East         2
Australia Southeast    2
Brazil South           2
US Gov Virginia        2
US Gov Texas           2
US Gov Arizona         2
US DoD Central         2
US DoD East            2
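When scripting deployments, the per-region limits in the table above can be kept in a lookup so that a requested fault domain count never exceeds what the region supports. The following is a minimal sketch: the dictionary holds only an excerpt of the table, and the names `MAX_FAULT_DOMAINS` and `clamp_fault_domains` are hypothetical helpers, not part of any Azure SDK.

```python
# Excerpt of the table above; extend with the remaining regions as needed.
MAX_FAULT_DOMAINS = {
    "East US": 3,
    "West US": 3,
    "West US 2": 2,
    "North Europe": 3,
    "West Europe": 3,
    "UK South": 2,
    "Japan East": 2,
}


def clamp_fault_domains(region: str, requested: int) -> int:
    """Return the requested fault domain count, capped at the regional maximum."""
    if requested < 1:
        raise ValueError("requested fault domain count must be at least 1")
    try:
        regional_max = MAX_FAULT_DOMAINS[region]
    except KeyError:
        raise ValueError(f"no fault domain limit recorded for region {region!r}")
    return min(requested, regional_max)


print(clamp_fault_domains("East US", 3))
print(clamp_fault_domains("West US 2", 3))
```

Capping rather than failing is a design choice of this sketch; a stricter script might raise an error whenever the request exceeds the regional maximum.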

If you plan to use VMs with unmanaged disks, follow the best practices below for the storage accounts where the virtual hard disks (VHDs) of the VMs are stored as page blobs.

  1. Keep all disks (OS and data) associated with a VM in the same storage account.
  2. Review the limits on the number of unmanaged disks in a storage account before adding more VHDs to that storage account.
  3. Use a separate storage account for each VM in an availability set. Do not share storage accounts between multiple VMs in the same availability set. It is acceptable for VMs in different availability sets to share storage accounts if the above best practices are followed.

Figure: Unmanaged disk fault domains
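The first and third practices above lend themselves to an automated check over an inventory of VMs. The sketch below assumes a hypothetical inventory format (a list of dictionaries with `name`, `availability_set`, and `disk_accounts` keys); it does not query Azure, and the function name `storage_layout_problems` is invented for illustration.

```python
from collections import defaultdict


def storage_layout_problems(vms):
    """Return human-readable violations of the unmanaged-disk practices.

    Each VM is a dict with 'name', 'availability_set', and 'disk_accounts'
    (the storage account name of every OS and data disk on that VM).
    """
    problems = []
    owners = defaultdict(set)  # (availability set, storage account) -> VM names
    for vm in vms:
        accounts = set(vm["disk_accounts"])
        # Practice 1: all disks of a VM belong in one storage account.
        if len(accounts) > 1:
            problems.append(f"{vm['name']}: disks span multiple storage accounts")
        for account in accounts:
            owners[(vm["availability_set"], account)].add(vm["name"])
    # Practice 3: VMs in the same availability set must not share an account.
    for (avset, account), names in sorted(owners.items()):
        if len(names) > 1:
            shared = ", ".join(sorted(names))
            problems.append(f"{avset}: {shared} share storage account {account}")
    return problems


inventory = [
    {"name": "vm1", "availability_set": "as1", "disk_accounts": ["sa1", "sa1"]},
    {"name": "vm2", "availability_set": "as1", "disk_accounts": ["sa1"]},
    {"name": "vm3", "availability_set": "as2", "disk_accounts": ["sa1", "sa2"]},
]
for problem in storage_layout_problems(inventory):
    print(problem)
```

Note that vm3 sharing sa1 with VMs in a different availability set is not flagged, which matches the allowance in the third practice.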

Configure each application tier into separate availability sets

If your virtual machines are all nearly identical and serve the same purpose for your application, we recommend that you configure an availability set for each tier of your application. If you place two different tiers in the same availability set, all virtual machines in the same application tier can be rebooted at once. By configuring at least two virtual machines in an availability set for each tier, you guarantee that at least one virtual machine in each tier is available.

For example, you could put all the virtual machines in the front end of your application running IIS, Apache, or Nginx in a single availability set. Make sure that only front-end virtual machines are placed in the same availability set. Similarly, make sure that only data-tier virtual machines are placed in their own availability set, such as your replicated SQL Server virtual machines or your MySQL virtual machines.
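The tier guidance above reduces to two rules: an availability set should hold VMs from one tier only, and each tier should have at least two VMs in its availability set. A minimal sketch of a check for both rules follows; the tuple-based inventory format and the function name `tier_layout_problems` are assumptions made for illustration.

```python
from collections import defaultdict


def tier_layout_problems(vms):
    """Check (vm_name, tier, availability_set) tuples against the tier guidance."""
    tiers_in_set = defaultdict(set)
    vms_per_tier_set = defaultdict(int)
    for _name, tier, avset in vms:
        tiers_in_set[avset].add(tier)
        vms_per_tier_set[(tier, avset)] += 1

    problems = []
    # Rule 1: an availability set should contain a single application tier.
    for avset, tiers in sorted(tiers_in_set.items()):
        if len(tiers) > 1:
            problems.append(f"{avset} mixes tiers: {', '.join(sorted(tiers))}")
    # Rule 2: each tier needs at least two VMs in its availability set.
    for (tier, avset), count in sorted(vms_per_tier_set.items()):
        if count < 2:
            problems.append(f"tier {tier} has only {count} VM in {avset}")
    return problems


good_layout = [
    ("web1", "web", "as-web"), ("web2", "web", "as-web"),
    ("sql1", "data", "as-data"), ("sql2", "data", "as-data"),
]
bad_layout = [
    ("web1", "web", "as-app"), ("web2", "web", "as-app"),
    ("sql1", "data", "as-app"),
]
print(tier_layout_problems(good_layout))
print(tier_layout_problems(bad_layout))
```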

Figure: Application tiers

Combine a load balancer with availability sets

Combine the Azure Load Balancer with an availability set to get the most application resiliency. The Azure Load Balancer distributes traffic between multiple virtual machines. For our Standard tier virtual machines, the Azure Load Balancer is included. Not all virtual machine tiers include the Azure Load Balancer. For more information about load balancing your virtual machines, see Load Balancing virtual machines.

If the load balancer is not configured to balance traffic across multiple virtual machines, then any planned maintenance event affects the only traffic-serving virtual machine, causing an outage to your application tier. Placing multiple virtual machines of the same tier under the same load balancer and availability set enables traffic to be continuously served by at least one instance.

Use availability zones to protect from datacenter-level failures

Availability zones, an alternative to availability sets, expand the level of control you have to maintain the availability of the applications and data on your VMs. An Availability Zone is a physically separate zone within an Azure region. There are three Availability Zones per supported Azure region. Each Availability Zone has a distinct power source, network, and cooling, and is logically separate from the other Availability Zones within the Azure region. By architecting your solutions to use replicated VMs in zones, you can protect your apps and data from the loss of a datacenter. If one zone is compromised, then replicated apps and data are instantly available in another zone.
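To illustrate why replicating VMs across zones protects against the loss of one zone, the sketch below spreads hypothetical VM replicas round-robin over three zones and lists the replicas that remain if a zone fails. The zone labels "1", "2", "3" and both function names are assumptions for illustration; round-robin is one possible placement strategy, not a rule stated by the platform.

```python
from itertools import cycle

ZONES = ("1", "2", "3")  # three zones per supported region, per the text above


def spread_across_zones(vm_names):
    """Assign each VM replica to a zone in round-robin order."""
    return {name: zone for name, zone in zip(vm_names, cycle(ZONES))}


def survivors(placement, failed_zone):
    """Return the VM names that are not in the failed zone."""
    return sorted(name for name, zone in placement.items() if zone != failed_zone)


zone_placement = spread_across_zones(["web1", "web2", "web3", "web4"])
for zone in ZONES:
    print(f"zone {zone} fails -> still serving: {survivors(zone_placement, zone)}")
```

With at least two replicas placed this way, losing any single zone always leaves at least one replica running.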

Figure: Availability zones

Learn more about deploying a Windows or Linux VM in an Availability Zone.

Next steps

To learn more about load balancing your virtual machines, see Load Balancing virtual machines.