HPC, Batch, and Big Compute solutions using Azure VMs
Many organizations have large-scale computing needs that exceed what a single machine can handle. These Big Compute workloads include engineering design and analysis, financial risk calculations, image rendering, complex modeling, Monte Carlo simulations, and more.
Use the Azure cloud to efficiently run compute-intensive Linux and Windows workloads, from parallel batch jobs to traditional HPC simulations. Run your HPC and batch workloads on Azure infrastructure, with your choice of compute services, grid managers, Marketplace solutions, and vendor-hosted (SaaS) applications. Azure provides flexible solutions to distribute work and scale to thousands of VMs or cores and then scale down when you need fewer resources.
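To make the idea of distributing work concrete, here is a minimal sketch of a Monte Carlo job split into independent tasks whose partial results are combined at the end. It is an illustration only: the names `run_task` and `estimate_pi` are hypothetical, and a local thread pool stands in for a pool of VMs or cores. No Azure API is used.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id, samples):
    """One independent unit of work: count random points inside the quarter circle."""
    # A per-task seed keeps tasks independent of each other and reproducible.
    rng = random.Random(task_id)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks, samples_per_task):
    """Fan the tasks out, then combine the partial counts into one estimate."""
    # Assumption: the thread pool is a stand-in for remote compute nodes.
    # In a real deployment a scheduler would place each task on a core or VM.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partial_hits = pool.map(run_task, range(num_tasks),
                                [samples_per_task] * num_tasks)
        total_hits = sum(partial_hits)
    return 4.0 * total_hits / (num_tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi(num_tasks=8, samples_per_task=50_000))
```

Because the tasks share no state, adding more workers shortens the run without changing the result, which is the property that lets this kind of workload scale out and back in.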
- Do-it-yourself solutions
  - Set up your own cluster environment in Azure virtual machines or virtual machine scale sets.
  - Lift and shift an on-premises cluster, or deploy a new cluster in Azure for additional capacity.
  - Use Azure Resource Manager templates to deploy leading workload managers, infrastructure, and applications.
  - Choose HPC and GPU VM sizes that include specialized hardware and network connections for MPI or GPU workloads.
  - Add high performance storage for I/O-intensive workloads.
- Hybrid solutions
- Big Compute solutions as a service
- Marketplace solutions
The following sections provide more information about the supporting technologies and links to guidance.
Visit the Azure Marketplace for Linux and Windows VM images and solutions designed for HPC. Examples include:
- RogueWave CentOS-based HPC
- SUSE Linux Enterprise Server for HPC
- TIBCO Grid Server Engine
- Azure Data Science VM for Windows and Linux
- Intel Cloud Edition for Lustre
Run custom or commercial HPC applications in Azure. Several examples in this section are benchmarked to scale efficiently with additional VMs or compute cores. Visit the Azure Marketplace for ready-to-deploy solutions.
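One common way to judge whether an application scales efficiently is to compare run times at different core counts and compute speedup and parallel efficiency. The sketch below does this relative to the smallest run. The function name `scaling_report` and the timings are hypothetical and are not taken from any benchmark mentioned here.

```python
def scaling_report(timings):
    """Return (cores, speedup, efficiency) rows relative to the smallest run.

    timings maps a core count to the measured wall-clock time in seconds.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # An efficiency of 1.0 means perfect linear scaling from the baseline.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by core count.
    measured = {16: 1000.0, 32: 520.0, 64: 280.0, 128: 160.0}
    for cores, speedup, efficiency in scaling_report(measured):
        print(f"{cores:>4} cores  speedup {speedup:5.2f}x  efficiency {efficiency:6.1%}")
```

Efficiency that stays close to 1.0 as cores are added suggests the workload benefits from more VMs. A steep drop usually points to communication or I/O overhead.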
Check with the vendor of any commercial application for licensing or other restrictions on running in the cloud. Not all vendors offer pay-as-you-go licensing. You might need a license server in the cloud for your solution, or you might need to connect to an on-premises license server.
Graphics and rendering
- Autodesk Maya, 3ds Max, and Arnold on Azure Batch
AI and deep learning
- Batch AI training for deep learning models
- Microsoft Cognitive Toolkit
- Deep Learning VM
- Batch Shipyard recipes for deep learning
HPC and GPU VM sizes
Azure offers a range of sizes for Linux and Windows VMs, including sizes designed for compute-intensive workloads. For example, H16r and H16mr VMs can connect to a high throughput back-end RDMA network. This cloud network can improve the performance of tightly coupled parallel applications running under Microsoft MPI or Intel MPI.
N-series VMs feature NVIDIA GPUs designed for compute-intensive or graphics-intensive applications including artificial intelligence (AI) learning and visualization.
- High performance compute sizes for Linux and Windows VMs
- GPU-enabled sizes for Linux and Windows VMs
Learn how to:
- Set up a Linux RDMA cluster to run MPI applications
- Set up a Windows RDMA cluster with Microsoft HPC Pack to run MPI applications
- Use compute-intensive VMs in Batch pools
Batch is a platform service for running large-scale parallel and high-performance computing (HPC) applications efficiently in the cloud. Azure Batch schedules compute-intensive work to run on a managed pool of virtual machines, and can automatically scale compute resources to meet the needs of your jobs.
SaaS providers or developers can use the Batch SDKs and tools to integrate HPC applications or container workloads with Azure, stage data to Azure, and build job execution pipelines.
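The scaling behavior described above can be pictured as a rule that sizes the pool to the pending work. The following sketch shows that reasoning only. It is not Batch's own autoscale formula syntax, and `target_node_count` and its parameters are hypothetical names chosen for illustration.

```python
import math


def target_node_count(pending_tasks, tasks_per_node, min_nodes=0, max_nodes=100):
    """Pick a pool size that covers the pending work, clamped to a min and max."""
    if tasks_per_node < 1:
        raise ValueError("tasks_per_node must be at least 1")
    # Round up so that a partially filled node is still provisioned.
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for pending in (0, 10, 1000):
        print(pending, "pending tasks ->", target_node_count(pending, tasks_per_node=4))
```

The lower bound keeps a few nodes warm for quick starts, and the upper bound caps cost. Both limits are assumptions you would tune for your own workload.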
Learn how to:
- Get started developing with Batch
- Use Azure Batch code samples
- Use low-priority VMs with Batch
- Run containerized HPC workloads with Batch Shipyard
- Run parallel R workloads on Batch
- Run on-demand Spark jobs on Batch
The following are examples of cluster and workload managers that can run on Azure infrastructure. Create stand-alone clusters in Azure VMs, or burst to Azure VMs from an on-premises cluster.
- Alces Flight Compute
- TIBCO DataSynapse GridServer
- Bright Cluster Manager
- IBM Spectrum Symphony and Symphony LSF
- PBS Pro
- Microsoft HPC Pack - see options to run in Windows and Linux VMs
Large-scale Batch and HPC workloads have demands for data storage and access that exceed the capabilities of traditional cloud file systems. Implement parallel file system solutions in Azure such as Lustre and BeeGFS.
- Parallel virtual file systems on Azure
- High performance cloud storage solutions from Avere (now joined with Microsoft)
Related Azure services
Azure virtual machines, virtual machine scale sets, Batch, and related compute services are the foundation of most Azure HPC solutions. However, your solution can take advantage of many related Azure services. Here is a partial list:
Data and analytics
AI and machine learning
Examples of customers that have solved business problems with Azure HPC solutions:
- AXA Global P&C
- Hymans Robertson
- Microsoft Research
- Mitsubishi UFJ Securities International
- Towers Watson
Learn more about Big Compute solutions for engineering simulation, rendering, banking and capital markets, and genomics.