ACI Networking Questions

RunshengGuo-9494 asked:

Hi I had a couple questions about networking in ACI:

  1. What kind of bandwidth can I expect in ACI? I ran a speed test and got roughly 1300/1100 Mbps download/upload speed for a CPU instance and 900/500 Mbps for a GPU (K80) instance.

  2. Is this in line with what I should be expecting, and if so, why is the bandwidth lower for the GPU instance? Is there any way to increase the bandwidth?

  3. For GPU instances, is using Mellanox InfiniBand and NVLink supported for inter- and intra-container-group communication?
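For reference, the shape of such a throughput measurement can be sketched in a few lines of Python (assuming Python is available in the container image; this loopback version shows the CPU-bound ceiling of the TCP path rather than actual network bandwidth, since the traffic never leaves the host):

```python
import socket
import threading
import time

def measure_loopback_throughput(total_mb=64, chunk=1 << 16):
    """Push total_mb MiB through a loopback TCP socket and return Mbps.

    Loopback never leaves the host, so this shows the CPU-bound
    ceiling of the TCP path, not real network bandwidth; pointing
    the sender at a remote receiver would measure the actual link.
    """
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def sink():  # drain everything the sender writes
        conn, _ = srv.accept()
        while conn.recv(chunk):
            pass
        conn.close()

    t = threading.Thread(target=sink)
    t.start()
    cli = socket.create_connection(("127.0.0.1", port))
    payload = b"\x00" * chunk
    start = time.perf_counter()
    for _ in range(total_mb * (1 << 20) // chunk):
        cli.sendall(payload)
    cli.close()
    t.join()
    elapsed = time.perf_counter() - start
    return total_mb * (1 << 20) * 8 / 1e6 / elapsed  # megabits per second

print(f"loopback TCP throughput: {measure_loopback_throughput():.0f} Mbps")
```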


Thanks!





azure-container-instances


vipullag-MSFT answered:

@RunshengGuo-9494

The network throughput of an ACI container group depends on the network throughput of the node VM on which it is scheduled.
However, since the underlying infrastructure is abstracted (for non-GPU-enabled container groups), the exact VM SKU should be considered non-deterministic. ACI runs on sets of Azure VMs of various SKUs, primarily from the F and D series. We expect this to change in the future as we continue to develop and optimize the service. Refer to this FAQ document for more details.
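Since the exact SKU is non-deterministic, only coarse hints about the node are visible from inside a container group. A minimal sketch of what can be read (assuming a Linux container with Python available):

```python
import os

def describe_host():
    """Best-effort peek at the node a container landed on.

    ACI abstracts the underlying VM, so only coarse hints such as
    CPU count and processor model are visible from inside.
    """
    info = {"cpus": os.cpu_count() or 1, "model": "unknown"}
    try:
        with open("/proc/cpuinfo") as f:  # Linux only
            for line in f:
                if line.startswith("model name"):
                    info["model"] = line.split(":", 1)[1].strip()
                    break
    except OSError:
        pass  # /proc unavailable (e.g. not Linux)
    return info

print(describe_host())
```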

For GPU-enabled container groups, this document will help narrow down the expected network performance based on the VM SKU.

N-series VMs communicate over a low-latency, high-bandwidth InfiniBand network; please refer to this.
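Whether an InfiniBand fabric is actually exposed to a given environment can be checked from sysfs. A small sketch (assuming Linux with Python; an empty result is the expected outcome outside IB-capable SKUs):

```python
import os

def infiniband_devices():
    """List RDMA/InfiniBand devices the kernel exposes via sysfs.

    Returns an empty list when none are visible, which is the
    expected result inside a plain (non-HPC) environment.
    """
    try:
        return sorted(os.listdir("/sys/class/infiniband"))
    except OSError:
        return []  # sysfs path absent: no IB driver/hardware exposed

devs = infiniband_devices()
print("InfiniBand devices:", devs if devs else "none visible")
```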

NVLink interconnect is currently not supported; please refer to the links below:
https://docs.microsoft.com/en-us/azure/virtual-machines/nc-series
https://docs.microsoft.com/en-us/azure/virtual-machines/ncv2-series
https://docs.microsoft.com/en-us/azure/virtual-machines/ncv3-series

Hope this helps.

Please 'Accept as answer' if the provided information is helpful, so that it can help others in the community looking for help on similar topics.


Thanks for the detailed response, @vipullag-MSFT!

I had a follow-up question. So I decided I would like to make use of the InfiniBand network on the N-series (specifically the K80-based NC-series).

In particular, I'd like to enable IP over InfiniBand (IPoIB) in a container running Ubuntu 16.04. I was able to install the Mellanox OFED driver, but I could not find how to enable IPoIB. (The instructions at https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/hpc/enable-infiniband#vm-images-with-infiniband-drivers are for RHEL, and I'm not sure the container would have the permissions to execute 'systemctl restart waagent'.)

Are there any pages/examples you could point me to for enabling IPoIB (or setting up InfiniBand on ACI)?
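One way to see what the container is actually allowed to do is to inspect its effective capability mask. A sketch, assuming a Linux container with /proc mounted; the CAP_SYS_ADMIN bit index follows capabilities(7):

```python
def effective_capabilities():
    """Parse this process's effective capability mask from /proc/self/status.

    A plain (non-privileged) container typically lacks CAP_SYS_ADMIN,
    which is one reason commands like 'systemctl restart waagent'
    cannot work from inside a container group.
    """
    CAP_SYS_ADMIN = 21  # bit index per capabilities(7)
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("CapEff:"):
                    mask = int(line.split()[1], 16)
                    return mask, bool((mask >> CAP_SYS_ADMIN) & 1)
    except OSError:
        pass
    return None, False  # /proc unavailable (e.g. not Linux)

mask, has_sys_admin = effective_capabilities()
print("CapEff =", hex(mask) if mask is not None else "unavailable",
      "| CAP_SYS_ADMIN:", has_sys_admin)
```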

Thanks

vipullag-MSFT answered:

On InfiniBand (IB) enabled VMs, the appropriate drivers are required to enable RDMA.

  • The CentOS-HPC VM images in the Marketplace come pre-configured with the appropriate IB drivers.

  • The Ubuntu-HPC VM images in the Marketplace come pre-configured with the appropriate IB drivers and GPU drivers.

These VM images are based on the base CentOS and Ubuntu Marketplace VM images. Scripts used in the creation of these VM images from their base Marketplace images are in the azhpc-images repo.

On GPU-enabled N-series VMs, the appropriate GPU drivers are additionally required. These can be installed by the following methods:
- Use the Ubuntu-HPC VM images which come pre-configured with the Nvidia GPU drivers and GPU compute software stack (CUDA, NCCL).
- Add the GPU drivers through the VM extensions
- Install the GPU drivers manually.
- Some other VM images on the Marketplace also come pre-installed with the Nvidia GPU drivers, including some VM images from Nvidia.

However, since a container group is an isolated group of processes, it does not have permissions over other guest-OS processes such as waagent. As Azure Container Instances is a Container-as-a-Service offering, the guest OS of the underlying virtual machine is abstracted, so restarting the Azure VM agent process cannot be done at the guest-OS level either. One suggestion would be to run Docker containers on GPU-enabled Azure VMs: enabling IPoIB is as mentioned here, and the container can be created as detailed here, with the necessary use-case changes to the base image's Dockerfile.
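A dry-run sketch of such a Docker invocation on a GPU-enabled Azure VM. The image name, device paths, and flags here are typical assumptions rather than values verified for a specific VM SKU or driver version, and the command is only printed, never executed:

```python
def gpu_ib_docker_cmd(image="nvidia/cuda:12.2.0-base-ubuntu22.04"):
    """Assemble (but do not run) a docker invocation that passes the
    GPU and InfiniBand devices through to a container on an Azure
    N-series VM. Image name and device paths are typical defaults,
    not guaranteed for every SKU or driver version.
    """
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                 # needs nvidia-container-toolkit on the host
        "--device=/dev/infiniband",      # RDMA device nodes from the host
        "--cap-add=IPC_LOCK",            # RDMA verbs pin (lock) memory
        "--net=host",                    # simplest way to reach the IPoIB interface
        image,
        "ibv_devinfo",                   # sanity-check IB from inside the container
    ]

print(" ".join(gpu_ib_docker_cmd()))
```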

Hope this helps.


Thanks again for the informative answer, @vipullag-MSFT.

So from what I understand from your answer, if I want to use InfiniBand, I would have to use Azure VMs directly; there is no way to do so via ACI. Or does this apply only if I want to use IP over InfiniBand specifically?

Also, do you anticipate there being support for InfiniBand on ACI in the future?


@RunshengGuo-9494

Yes, as of today there is no way to implement IP over InfiniBand in ACI.

Please raise a feature request here; the product team will review it, add comments, and prioritize accordingly.



Thanks @vipullag-MSFT ,

I will look into raising a feature request.

One last follow-up on this question:
Although IP over InfiniBand is not currently possible on ACI, it should still be possible to use InfiniBand directly after manually installing the OFED drivers, right? It does not look like waagent or a restart is needed just for driver installation.

Thanks in advance.
