High-performance computing on InfiniBand-enabled H-series and N-series VMs

Azure's InfiniBand-enabled H-series and N-series VMs are designed to deliver leadership-class performance, Message Passing Interface (MPI) scalability, and cost efficiency for a variety of real-world HPC and AI workloads. These high-performance computing (HPC) optimized VMs are used to solve some of the most computationally intensive problems in science and engineering, such as fluid dynamics, earth modeling, and weather simulations.

These articles describe how to get started on the InfiniBand-enabled H-series and N-series VMs on Azure as well as optimal configuration of the HPC and AI workloads on the VMs for scalability.

Features and capabilities

The InfiniBand-enabled H-series and N-series VMs are designed to provide the best performance, MPI scalability, and cost efficiency for HPC workloads. See H-series and N-series VMs to learn more about the features and capabilities of these VMs.

RDMA and InfiniBand

RDMA-capable H-series and N-series VMs communicate over the low-latency, high-bandwidth InfiniBand network. The RDMA capability over such an interconnect is critical for boosting the scalability and performance of distributed-node HPC and AI workloads. The InfiniBand-enabled H-series and N-series VMs are connected in a non-blocking fat-tree topology with a low-diameter design for optimized and consistent RDMA performance. See Enable InfiniBand to learn more about setting up InfiniBand on these VMs.
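Once the drivers are in place, a quick way to confirm that the InfiniBand device is visible to the guest OS is to inspect the kernel's InfiniBand class directory and, if present, query the link with ibstat (from the infiniband-diags package). A minimal sketch, assuming a Linux VM; the exact device name and reported rate depend on the VM size and driver stack:

```shell
# Sketch: detect whether an InfiniBand device is visible to the OS.
# On SR-IOV enabled Azure VM sizes the device appears under
# /sys/class/infiniband once the InfiniBand drivers are loaded.
if [ -d /sys/class/infiniband ] && [ -n "$(ls -A /sys/class/infiniband 2>/dev/null)" ]; then
    ib_present=yes
    echo "InfiniBand device(s): $(ls /sys/class/infiniband)"
    # ibstat reports the port state and rate (e.g. Active, 200 Gb/s)
    ibstat | grep -E "State|Rate" || true
else
    ib_present=no
    echo "No InfiniBand device visible; check the VM size and drivers."
fi
```

On a correctly configured VM the port state should read Active; a Down or Init state usually indicates a driver or fabric issue.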

Message passing interface

The SR-IOV-enabled H-series and N-series VMs support almost all MPI libraries and versions. Some of the most commonly used, supported MPI libraries are Intel MPI, Open MPI, MPICH, MVAPICH2, and Platform MPI; applications that use remote direct memory access (RDMA) verbs directly are also supported. See Set up MPI to learn more about installing the various supported MPI libraries and their optimal configuration.
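As an illustration of what a distributed launch looks like, the sketch below composes an Open MPI command line for a two-node run, binding one rank per core and selecting the UCX transport layer that is typically used over InfiniBand fabrics. The hostnames, rank counts, and application binary are placeholders, not values from this article:

```shell
# Hypothetical Open MPI launch for a two-node, 8-rank job.
# node0/node1 and ./my_hpc_app are placeholders.
NP=8
HOSTS="node0:4,node1:4"
APP=./my_hpc_app
# --map-by/--bind-to pin ranks to cores; --mca pml ucx selects the
# UCX point-to-point layer commonly used on InfiniBand interconnects.
MPIRUN_CMD="mpirun -np $NP --host $HOSTS --map-by core --bind-to core --mca pml ucx $APP"
echo "$MPIRUN_CMD"
```

Process pinning matters on these VM sizes: leaving ranks unpinned lets the OS migrate them across NUMA domains, which hurts both reproducibility and performance.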

Get started

Getting started involves the following steps:

1. Select the H-series or N-series VM type that is optimal for the workload, based on the VM specifications and RDMA capability.
2. Configure the VM by enabling InfiniBand. There are various ways to do this, including using optimized VM images with the drivers baked in; see Optimization for Linux and Enable InfiniBand for details.
3. For distributed-node workloads, choose and configure an MPI library; see Set up MPI for details.
4. For performance and scalability, configure the workloads optimally by following guidance specific to the VM family, such as the HB-series overview and HC-series overview.

Next steps