Cloud computing guide for researchers – Real HPC with linear scaling on thousands of cores

Article
08/01/2017

If you need to run lots of computations, either as a large supercomputer simulation, or hundreds of smaller runs for a parameter sweep calculation, then cloud computing can really help speed up your research. Microsoft Azure Big Compute can scale to thousands of compute cores using Message Passing Interface (MPI) due to our HPC-optimised, InfiniBand-enabled virtual machines (VMs). The availability of essentially limitless compute capabilities, on-demand, means that you can achieve more, in less time. No need to queue for your job to run, have your job priority lowered because you've run too many jobs, or compete with other users on a shared system. You also have full flexibility to install any software you like, without having to wait for someone else to do it for you. You can do real HPC in the public cloud today.

Real HPC in the cloud

High performance computing (HPC) requires low-latency, high-bandwidth dedicated networking between compute nodes, and Microsoft Azure has this. Our Azure A-Series (A8 & A9) and H-Series (H16) VMs, with InfiniBand, achieve near bare-metal HPC performance and can run LINPACK at over 90% efficiency. You can even run parallel GPU (Graphics Processing Unit) nodes with our N-Series, which also have low-latency, high-speed networking, e.g. NC24r VMs have 24 CPU cores, 4 x K80 GPUs, 224GB RAM and InfiniBand. It is easy to run tightly-coupled distributed software for applications such as fluid dynamics, molecular modelling, and climate research that need specialised hardware to run efficiently.

Unlike on other public clouds, scaling does not stop at a couple of hundred cores: real-world applications can scale well to thousands of cores.

We have tested many HPC applications, such as OpenFOAM, Fluent, STAR-CCM+, NAMD, and PAM-CRASH, and found that these and other highly parallel, MPI-based codes can scale well to over a thousand cores. For example, you can see here how Fluent computational fluid dynamics software performs on up to 1024 compute cores. This test case is what Formula One teams were routinely running just a few years ago, in their dedicated CFD facilities, and is now possible for anyone via Microsoft Azure from their laptop. It shows how scientists and engineers can have the same HPC capability from anywhere in the world. It opens a world of possibilities for those who do not have easy access to HPC facilities. It is equally empowering for researchers, universities and HPC Centres, as it provides extra flexibility, capability, and agility to complement existing HPC investments. A hybrid cloud HPC model enables service providers to give their users the best possible experience, and focus energy on accelerating time to insight and scientific impact.
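To judge whether a code "scales well" in this sense, a common approach is to compare wall-clock times at increasing core counts and compute the speed-up and parallel efficiency relative to the smallest run. The following is a minimal sketch of that calculation. The function name and the timings are hypothetical placeholders for illustration only; they are not measurements of Fluent or any other application on Azure.

```python
def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows for a strong-scaling study.

    timings maps a core count to the wall-clock time in seconds for the
    same problem size. Values are relative to the smallest core count.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        ideal_speedup = cores / base_cores
        rows.append((cores, speedup, speedup / ideal_speedup))
    return rows


# Hypothetical timings for illustration only, NOT measured results.
example_timings = {64: 1600.0, 128: 800.0, 256: 410.0, 512: 215.0, 1024: 118.0}

for cores, speedup, efficiency in scaling_table(example_timings):
    print(f"{cores:5d} cores  speed-up {speedup:6.2f}x  efficiency {efficiency:6.1%}")
```

An efficiency close to 100% at the largest core count corresponds to the near-linear scaling described above. Replace the placeholder timings with your own measurements.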

Microsoft Azure Big Compute is flexible in how you want to use it for your HPC applications. How you do this will depend on your individual and organisational situation, experience, and what you are trying to achieve.

HPC cluster in the cloud

Set up your own HPC cluster in Azure as your own personal supercomputer, with whatever software you like on it. Having full administrator access to your cluster gives you full flexibility and control over your own research software stack. There is no need to wait for someone else to install your software and libraries before you can start computing. It is possible to set up different clusters for different users, with custom configurations of CPU, memory, network, and software. You can think differently about how to get research done, scaling the cluster, creating new ones, and tearing them down when you don't need them. You can see how to set up a SLURM cluster here.

Research applications in the cloud

Burst your research application into the cloud. You can easily enable a desktop application, command-line tool, or web-application/service to run at scale without leaving your familiar environment. Azure Batch manages the compute resources in the cloud, so you can transparently run your application from whichever environment is most comfortable for you. This transforms how you can do your research, extending your desktop to literally thousands of machines in the cloud. You can even use Docker containers with Azure Batch, with our open-source Batch Shipyard toolkit here, and a detailed walkthrough here.
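A parameter sweep of the kind Azure Batch is suited to boils down to generating one independent task per parameter combination. The sketch below shows only that generation step in plain Python. It does not call Azure Batch: the `build_sweep_tasks` function, the `./run_model` executable, the parameter names, and the dictionary layout are all hypothetical, and a real submission would go through the Azure Batch SDK or Batch Shipyard as described in their documentation.

```python
import itertools


def build_sweep_tasks(executable, parameters):
    """Build one task description per combination of parameter values.

    parameters maps a parameter name to the list of values to sweep over.
    Each task is a plain dict; this layout is illustrative, not an Azure type.
    """
    names = sorted(parameters)
    value_lists = [parameters[name] for name in names]
    tasks = []
    for index, values in enumerate(itertools.product(*value_lists)):
        args = " ".join(f"--{name} {value}" for name, value in zip(names, values))
        tasks.append({
            "id": f"task-{index:04d}",
            "command_line": f"{executable} {args}",
        })
    return tasks


# Hypothetical sweep: 2 resolutions x 3 rainfall values = 6 independent tasks.
sweep = {"resolution": [1, 2], "rainfall": [10, 20, 40]}
tasks = build_sweep_tasks("./run_model", sweep)

for task in tasks:
    print(task["id"], task["command_line"])
```

Because the tasks do not depend on each other, they can be spread across as many machines as are available, which is what makes this pattern a good fit for a managed batch service.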

Researchers at the University of Newcastle, UK, used Azure Batch to run a parameter study for high-resolution flood modelling across 571 cities in Europe. By running across 40 x A11 Azure VMs they were able to run all of their simulations in two days, instead of two and a half months. Read the details here.
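As a rough back-of-the-envelope check on those figures, the snippet below works out the implied speed-up. It assumes a 30-day month, and the per-VM figure additionally assumes the original estimate referred to a single comparable machine. Neither assumption is stated in the case study, so treat the result as indicative only.

```python
DAYS_PER_MONTH = 30  # assumption: the source only says "two and a half months"


def speedup(baseline_days, cloud_days):
    """Ratio of the original turnaround time to the cloud turnaround time."""
    return baseline_days / cloud_days


baseline_days = 2.5 * DAYS_PER_MONTH   # two and a half months
cloud_days = 2                         # two days on Azure Batch
vm_count = 40                          # 40 x A11 VMs

overall = speedup(baseline_days, cloud_days)
print(f"Overall speed-up: {overall:.1f}x")
# Only meaningful if the baseline was one comparable machine (assumption).
print(f"Speed-up per VM:  {overall / vm_count:.2f}")
```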

Nicola Bonzanni at ENPICOM uses Azure to build Conbind, phylogenetic fingerprinting Research-as-a-Service for genomics researchers worldwide. Read more about it here.

HPC burst to the cloud

Augment your on-premises HPC cluster with the cloud, to provide additional capacity when you need it. It's a great way to accelerate research even more by providing on-demand extra resources. It is also a way to move users who may not need specialised on-premises HPC hardware, for example those running thousands of single-core jobs, to the cloud. This can help put the right users on the right hardware, maximising investment, job throughput, and user satisfaction. You can use Microsoft HPC Pack, or many of the most commonly used HPC cluster tools, such as SLURM, Cycle Computing, Bright Computing, and Altair PBS Pro.

Learn more

You can find out more on our dedicated Azure Big Compute web pages:

Azure Batch and HPC documentation explains different scenarios in detail.
Batch Shipyard is a tool to help provision and execute batch processing and HPC Docker workloads on Azure Batch compute pools. No experience with the Azure Batch SDK is needed; run your Dockerized tasks with easy-to-understand configuration files! There is a detailed hands-on walkthrough for this on GitHub.
doAzureParallel – a lightweight R package built on top of Azure Batch, that allows you to easily use Azure's flexible compute resources right from your R session.
Azure Container Service makes it easy to orchestrate your work using DC/OS, Docker Swarm, or Kubernetes. There is a detailed hands-on walkthrough for this on GitHub.

You can now use low-priority VMs with Azure Batch for non-time-critical jobs, reducing your costs by up to 80%. Learn more about it here.
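To see what a discount of that size could mean for a job budget, here is a small cost sketch. The `batch_cost` helper and the hourly price are hypothetical placeholders, not Azure list prices, and 80% is the upper bound quoted above rather than a guaranteed figure. Low-priority VMs can also be pre-empted, so the real run time may be longer than for dedicated VMs.

```python
def batch_cost(vm_count, hours, price_per_vm_hour, discount=0.0):
    """Estimate the compute cost of a job.

    discount is the fractional saving (0.8 means 80% off the regular price).
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return vm_count * hours * price_per_vm_hour * (1.0 - discount)


# Hypothetical job: 40 VMs for 48 hours at a placeholder price of 1.00 per VM-hour.
regular = batch_cost(40, 48, 1.00)
low_priority = batch_cost(40, 48, 1.00, discount=0.80)

print(f"Dedicated VMs:    {regular:.2f}")
print(f"Low-priority VMs: {low_priority:.2f}")
```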

There are more general Azure getting started videos at https://azure.microsoft.com/en-us/get-started/
and a full set of Azure for Research self-paced walkthroughs at https://aka.ms/a4rgithub

We are publishing more posts in this blog series on advanced topics that help researchers take advantage of Azure, so stay tuned to our cloud computing guide for researchers.

Need access to Microsoft Azure?

There are several ways you can get access to Microsoft Azure for your research. Your university may already make Azure available to you, so your first port of call is to speak to your research computing department. There are also other ways for you to start experimenting with the cloud:

Sign up for a one-month free trial here
Apply for an Azure for Research award. Microsoft Azure for Research awards offer large allocations of cloud computing for your research project, and already support hundreds of researchers worldwide across all domains.

There are several free Azure services for you to also explore: