Is it possible (sensible?) to run Docker containers on Azure CycleCloud using Slurm?

Gary Mansell 0 Reputation points
2024-04-15T15:53:14.18+00:00

I have been successfully running Azure CycleCloud with the Slurm scheduler to run our HPC (CFD and CAE) analysis solver jobs from a /shared/apps loadpoint on a regular basis.

I demo'd our HPC Solving capabilities to our Climate modelling team and they are super interested in moving to an Azure CycleCloud and Slurm environment too, if it were possible, as they can scale in/out with workload and run huge jobs in parallel, rather than take weeks on their current physical kit.

However, they run their simulations inside Docker containers. So, is it possible (or even sensible) to run Docker containers in an Azure CycleCloud / Slurm environment? Can anyone advise me on the pros and cons of this?

If it is indeed possible and not nonsensical, perhaps someone could share some information on the tools and techniques I might need and how to do it?

Thanks

Gary

Azure CycleCloud

1 answer

  1. vipullag-MSFT 24,711 Reputation points
    2024-04-15T17:46:57.36+00:00

    Hello Gary Mansell

    Welcome to Microsoft Q&A Platform, thanks for posting your query here.

    I just checked with the internal team on this. Yes, it is possible to run distributed workloads in Docker containers on NDv5 VMSS using Slurm with containers.

    This is mostly done with the pyxis and enroot tools from NVIDIA. CycleCloud has support for those plugins, and you can test with some simple MPI collectives from within a container across nodes.

    Ref: https://github.com/NVIDIA/pyxis
    Also, refer to this CycleCloud pyxis+enroot project: https://github.com/Azure/azurehpc/tree/master/experimental/cc_slurm_pyxis_enroot
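
As a rough illustration of what a pyxis-based launch looks like, the sketch below composes an `srun` command line using the `--container-image` and `--container-mounts` options that pyxis adds to Slurm. It is not taken from the linked projects: the helper name `build_pyxis_srun`, the `ubuntu:22.04` image, the node counts, and the `/shared` mount are all assumptions chosen for the example, and it only prints the command rather than submitting anything.

```python
import shlex


def build_pyxis_srun(image, command, nodes=2, tasks_per_node=1, mounts=None):
    """Return an srun argv list that runs `command` in a container via pyxis.

    `image` is a container image reference (hypothetical value in the demo).
    `mounts` is an optional list of (host_path, container_path) pairs.
    """
    argv = [
        "srun",
        f"--nodes={nodes}",
        f"--ntasks-per-node={tasks_per_node}",
        # Option provided by the pyxis plugin: image to import and run with enroot.
        f"--container-image={image}",
    ]
    if mounts:
        # Option provided by the pyxis plugin: comma-separated SRC:DST bind mounts.
        pairs = ",".join(f"{src}:{dst}" for src, dst in mounts)
        argv.append(f"--container-mounts={pairs}")
    argv.extend(command)
    return argv


if __name__ == "__main__":
    # Assumed example: run `hostname` on two nodes with /shared visible
    # inside the container, as a quick check that the plugin is working.
    argv = build_pyxis_srun(
        "ubuntu:22.04",
        ["hostname"],
        mounts=[("/shared", "/shared")],
    )
    print(shlex.join(argv))
```

Running the printed command on a cluster where pyxis and enroot are installed would be a simple first test before moving on to MPI collectives; the image and mounts would need to be replaced with the climate team's own.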

    Hope this helps.