
High performance computing VM sizes

Azure H-series virtual machines (VMs) are designed to deliver leadership-class performance, MPI scalability, and cost efficiency for a variety of real-world HPC workloads.

HBv2-series VMs feature 200 Gb/sec Mellanox HDR InfiniBand, while both HB-series and HC-series VMs feature 100 Gb/sec Mellanox EDR InfiniBand. Each of these VM types is connected in a non-blocking fat tree for optimized and consistent RDMA performance. HBv2 VMs support Adaptive Routing and the Dynamic Connected Transport (DCT, in addition to the standard RC and UD transports). These features enhance application performance, scalability, and consistency, and their use is strongly recommended.

HB-series VMs are optimized for applications driven by memory bandwidth, such as fluid dynamics, explicit finite element analysis, and weather modeling. HB VMs feature 60 AMD EPYC 7551 processor cores, 4 GB of RAM per CPU core, and no hyperthreading. The AMD EPYC platform provides more than 260 GB/sec of memory bandwidth.

HC-series VMs are optimized for applications driven by dense computation, such as implicit finite element analysis, molecular dynamics, and computational chemistry. HC VMs feature 44 Intel Xeon Platinum 8168 processor cores, 8 GB of RAM per CPU core, and no hyperthreading. The Intel Xeon Platinum platform supports Intel's rich ecosystem of software tools such as the Intel Math Kernel Library.

H-series VMs are optimized for applications driven by high CPU frequencies or large-memory-per-core requirements. H-series VMs feature 8 or 16 Intel Xeon E5 2667 v3 processor cores, 7 or 14 GB of RAM per CPU core, and no hyperthreading. H-series features 56 Gb/sec Mellanox FDR InfiniBand in a non-blocking fat tree configuration for consistent RDMA performance. H-series VMs support Intel MPI 5.x and MS-MPI.

Note

The A8 – A11 VMs are planned for retirement in March 2021. For more information, see the HPC Migration Guide.

RDMA-capable instances

Most of the HPC VM sizes (HBv2, HB, HC, H16r, H16mr, A8, and A9) feature a network interface for remote direct memory access (RDMA) connectivity. Selected [N-series](https://docs.microsoft.com/azure/virtual-machines/nc-series) sizes designated with 'r', such as the NC24rs configurations (NC24rs_v3, NC24rs_v2, and NC24r), are also RDMA-capable. This interface is in addition to the standard Azure network interface available in the other VM sizes.

This interface allows the RDMA-capable instances to communicate over an InfiniBand (IB) network, operating at HDR rates for HBv2, EDR rates for HB and HC, FDR rates for H16r, H16mr, and RDMA-capable N-series virtual machines, and QDR rates for A8 and A9 VMs. These RDMA capabilities can boost the scalability and performance of certain Message Passing Interface (MPI) applications. For more information on speed, see the details in the tables on this page.
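The size-to-interconnect mapping described above lends itself to a simple lookup table. The sketch below restates the rates from this section in Python; the names and structure are illustrative, not part of any Azure SDK (and the 40 Gb/sec figure for QDR is the standard 4x QDR link rate, which this section does not state explicitly):

```python
# Illustrative mapping of RDMA-capable VM families to their InfiniBand
# generation and signaling rate in Gb/sec, as described in this section.
IB_RATES = {
    "HBv2":  ("HDR", 200),
    "HB":    ("EDR", 100),
    "HC":    ("EDR", 100),
    "H16r":  ("FDR", 56),
    "H16mr": ("FDR", 56),
    "A8":    ("QDR", 40),
    "A9":    ("QDR", 40),
}

def ib_rate_gbps(vm_family: str) -> int:
    """Return the InfiniBand rate in Gb/sec, or 0 for non-RDMA sizes."""
    return IB_RATES.get(vm_family, ("none", 0))[1]
```

A table like this can be handy in deployment scripts that validate whether a requested size meets an application's interconnect requirement before provisioning.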

Note

In Azure HPC, there are two classes of VMs, depending on whether they are SR-IOV enabled for InfiniBand. Currently, the VMs with SR-IOV enabled for InfiniBand are HBv2, HB, HC, and NCv3. The rest of the InfiniBand-enabled VMs are not SR-IOV enabled. RDMA over IB is supported for all RDMA-capable VMs. IP over IB is supported only on the SR-IOV enabled VMs.

  • Operating system - Linux is very well supported for HPC VMs; distros such as CentOS, RHEL, Ubuntu, and SUSE are common. Regarding Windows support, Windows Server 2016 is supported on all the HPC-series VMs. Windows Server 2012 R2 and Windows Server 2012 are also supported on the non-SR-IOV enabled VMs.

  • MPI - The SR-IOV enabled VM sizes on Azure (HBv2, HB, HC, NCv3) allow almost any flavor of MPI to be used with Mellanox OFED. On non-SR-IOV enabled VMs, supported MPI implementations use the Microsoft Network Direct (ND) interface to communicate between VMs. Hence, only Microsoft MPI (MS-MPI) 2012 R2 or later and Intel MPI 5.x versions are supported. Later versions (2017, 2018) of the Intel MPI runtime library may or may not be compatible with the Azure RDMA drivers.
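The MPI-support rule in the bullet above can be encoded in a few lines. This is only a sketch of the rule as stated in the text; the set and function names are made up for illustration and are not an Azure API:

```python
# VM families with SR-IOV enabled for InfiniBand, per the text above.
SRIOV_SIZES = {"HBv2", "HB", "HC", "NCv3"}

def supported_mpi(vm_family: str) -> str:
    """Summarize which MPI implementations the rule above permits."""
    if vm_family in SRIOV_SIZES:
        return "almost any MPI flavor (used with Mellanox OFED)"
    # Non-SR-IOV sizes rely on the Network Direct (ND) interface.
    return "MS-MPI 2012 R2 or later, or Intel MPI 5.x (via Network Direct)"
```

A check like this can prevent a cluster setup script from pairing an unsupported MPI runtime with a non-SR-IOV VM size.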

  • InfiniBandDriver<Linux|Windows> VM extension - On RDMA-capable VMs, add the InfiniBandDriver<Linux|Windows> extension to enable InfiniBand. On Linux, the InfiniBandDriverLinux VM extension installs the Mellanox OFED drivers (on SR-IOV VMs) for RDMA connectivity. On Windows, the InfiniBandDriverWindows VM extension installs Windows Network Direct drivers (on non-SR-IOV VMs) or Mellanox OFED drivers (on SR-IOV VMs) for RDMA connectivity. In certain deployments of A8 and A9 instances, the HpcVmDrivers extension is added automatically. Note that the HpcVmDrivers VM extension is being deprecated; it will not be updated. To add the VM extension to a VM, you can use Azure PowerShell cmdlets.

    The following command installs the latest version 1.0 InfiniBandDriverWindows extension on an existing RDMA-capable VM named myVM deployed in the resource group named myResourceGroup in the West US region:

    Set-AzVMExtension -ResourceGroupName "myResourceGroup" -Location "westus" -VMName "myVM" -ExtensionName "InfiniBandDriverWindows" -Publisher "Microsoft.HpcCompute" -Type "InfiniBandDriverWindows" -TypeHandlerVersion "1.0"
    

    Alternatively, VM extensions can be included in Azure Resource Manager templates for easy deployment, with the following JSON element:

    "properties": {
      "publisher": "Microsoft.HpcCompute",
      "type": "InfiniBandDriverWindows",
      "typeHandlerVersion": "1.0"
    }
    

    The following command installs the latest version 1.0 InfiniBandDriverWindows extension on all RDMA-capable VMs in an existing virtual machine scale set named myVMSS deployed in the resource group named myResourceGroup:

    $VMSS = Get-AzVmss -ResourceGroupName "myResourceGroup" -VMScaleSetName "myVMSS"
    Add-AzVmssExtension -VirtualMachineScaleSet $VMSS -Name "InfiniBandDriverWindows" -Publisher "Microsoft.HpcCompute" -Type "InfiniBandDriverWindows" -TypeHandlerVersion "1.0"
    Update-AzVmss -ResourceGroupName "myResourceGroup" -VMScaleSetName "MyVMSS" -VirtualMachineScaleSet $VMSS
    Update-AzVmssInstance -ResourceGroupName "myResourceGroup" -VMScaleSetName "myVMSS" -InstanceId "*"
    

    For more information, see Virtual machine extensions and features. You can also work with extensions for VMs deployed in the classic deployment model.

  • RDMA network address space - The RDMA network in Azure reserves the address space 172.16.0.0/16. To run MPI applications on instances deployed in an Azure virtual network, make sure that the virtual network address space does not overlap the RDMA network.
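The overlap check described above can be automated with Python's standard ipaddress module. A minimal sketch (the function name is illustrative):

```python
import ipaddress

# Address space reserved by the Azure RDMA network, per this section.
RDMA_NET = ipaddress.ip_network("172.16.0.0/16")

def overlaps_rdma_network(vnet_cidr: str) -> bool:
    """Return True if a planned virtual network range overlaps the RDMA space."""
    return ipaddress.ip_network(vnet_cidr).overlaps(RDMA_NET)
```

For example, a virtual network using 10.0.0.0/16 is safe, while a range such as 172.16.8.0/24 falls inside the reserved space and should be avoided.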

Cluster configuration options

Azure provides several options to create clusters of Windows HPC VMs that can communicate using the RDMA network, including:

  • Virtual machines - Deploy the RDMA-capable HPC VMs in the same scale set or availability set (when you use the Azure Resource Manager deployment model). If you use the classic deployment model, deploy the VMs in the same cloud service.

  • Virtual machine scale sets - In a virtual machine scale set (VMSS), ensure that you limit the deployment to a single placement group. For example, in a Resource Manager template, set the singlePlacementGroup property to true. Note that the maximum VMSS size that can be spun up with the singlePlacementGroup property set to true is capped at 100 VMs by default. If your HPC job scale needs are higher than 100 VMs in a single VMSS tenant, you may request an increase; open an online customer support request at no charge.
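The singlePlacementGroup setting mentioned above sits directly under "properties" in a scale-set resource. The fragment below is a sketch, not a complete template: the sku, location, and other required elements are omitted, and the apiVersion shown is one plausible value, not a requirement:

```json
{
  "type": "Microsoft.Compute/virtualMachineScaleSets",
  "apiVersion": "2019-07-01",
  "properties": {
    "singlePlacementGroup": true
  }
}
```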

  • MPI among virtual machines - If RDMA (for example, using MPI communication) is required between virtual machines (VMs), ensure that the VMs are in the same virtual machine scale set or availability set.

  • Azure CycleCloud - Create an HPC cluster in Azure CycleCloud to run MPI jobs.

  • Azure Batch - Create an Azure Batch pool to run MPI workloads. To use compute-intensive instances when running MPI applications with Azure Batch, see Use multi-instance tasks to run Message Passing Interface (MPI) applications in Azure Batch.

  • Microsoft HPC Pack - HPC Pack includes a runtime environment for MS-MPI that uses the Azure RDMA network when deployed on RDMA-capable Linux VMs. For example deployments, see Set up a Linux RDMA cluster with HPC Pack to run MPI applications.

Deployment considerations

  • Azure subscription - To deploy more than a few compute-intensive instances, consider a pay-as-you-go subscription or other purchase options. If you're using an Azure free account, you can use only a limited number of Azure compute cores.

  • Pricing and availability - These VM sizes are offered only in the Standard pricing tier. Check Products available by region for availability in Azure regions.

  • Cores quota - You might need to increase the cores quota in your Azure subscription from the default value. Your subscription might also limit the number of cores you can deploy in certain VM size families, including the H-series. To request a quota increase, open an online customer support request at no charge. (Default limits may vary depending on your subscription category.)

    Note

    Contact Azure Support if you have large-scale capacity needs. Azure quotas are credit limits, not capacity guarantees. Regardless of your quota, you are only charged for the cores that you use.

  • Virtual network - An Azure virtual network is not required to use the compute-intensive instances. However, for many deployments you need at least a cloud-based Azure virtual network, or a site-to-site connection if you need to access on-premises resources. When needed, create a new virtual network to deploy the instances. Adding compute-intensive VMs to a virtual network in an affinity group is not supported.

  • Resizing - Because of their specialized hardware, you can only resize compute-intensive instances within the same size family (H-series or compute-intensive A-series). For example, you can only resize an H-series VM from one H-series size to another. In addition, resizing from a non-compute-intensive size to a compute-intensive size is not supported.

Other sizes

Next steps