Enabling Deep Learning with Azure Container for PyTorch in Azure Machine Learning
Published Oct 12 2022 09:00 AM
Microsoft

Overview 

With AzureML being the platform of choice for many PyTorch developers, we have developed the new Azure Container for PyTorch (ACPT), a curated environment that includes the best of Microsoft technologies for training with PyTorch on Azure. We are excited to announce the Public Preview of ACPT within Azure Machine Learning (AzureML). This new curated environment is a lightweight, standalone environment that includes the components needed to run optimized training for large models on AzureML. AzureML curated environments are available in the user’s workspace by default and are backed by cached Docker images that use the latest version of the AzureML SDK, which reduces preparation costs and shortens deployment time.

 

The ACPT curated environment expands on our existing PyTorch curated environment by including the latest PyTorch version, tested and validated against dozens of production models to ensure high quality while also providing various Microsoft technologies for training and optimization, such as ONNX Runtime and DeepSpeed. All components are already installed and validated to reduce setup costs and accelerate training time for customers.  

 

[Figure: ACPT curated environment properties and description.]

Benefits 

 Benefits of using the ACPT curated environment include: 

  • Optimized training framework to set up, develop, and accelerate PyTorch models on large workloads.
  • Up-to-date stack with the latest compatible versions of Ubuntu, Python, PyTorch, CUDA/ROCm, etc.
  • Ease of use: all components are installed and validated against dozens of Microsoft workloads to reduce setup costs and accelerate time to value.
  • Latest training optimization technologies: ONNX, ONNX Runtime, ONNX Runtime Training, ORT MoE, DeepSpeed, MSCCL, and others.
  • Integration with Azure ML: track your PyTorch experiments in Azure Machine Learning Studio or with the AzureML SDK.
  • Use as-is with pre-installed packages, or build on top of the curated environment.
  • The image is also available as a Data Science Virtual Machine (DSVM).
  • Azure Customer Support.
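
Since the components above come pre-installed, a quick check from inside a job or notebook can confirm which of them are visible to the Python interpreter. The following is a minimal sketch using only the standard library. The import names in `COMPONENTS` (for example `torch_ort` for ONNX Runtime Training) are assumptions about how these packages are exposed and may differ in a given image.

```python
from importlib.util import find_spec

# Display name -> Python import name. These import names are assumptions,
# not taken from the ACPT documentation; adjust them to match your image.
COMPONENTS = {
    "PyTorch": "torch",
    "ONNX": "onnx",
    "ONNX Runtime": "onnxruntime",
    "ONNX Runtime Training (torch-ort)": "torch_ort",
    "DeepSpeed": "deepspeed",
}


def check_components(components=COMPONENTS):
    """Return {display name: bool} telling whether each module can be located."""
    found = {}
    for name, module in components.items():
        try:
            # find_spec locates the module without importing it.
            found[name] = find_spec(module) is not None
        except (ImportError, ValueError):
            found[name] = False
    return found


if __name__ == "__main__":
    for name, ok in check_components().items():
        print(f"{name:35s} {'found' if ok else 'missing'}")
```

The check only locates each module without importing it, so it stays fast and does not initialize CUDA or any other runtime.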

Metrics and Data 

The ACPT curated environment allows our customers to train PyTorch models efficiently. The optimization libraries included in the container, such as ONNX Runtime and DeepSpeed, can speed up training by 54% to 163% over regular PyTorch workloads, as seen on various HuggingFace models.

 

[Figure: Metrics for HuggingFace Models]

 

ACPT reduces compute costs by performing the same training jobs in significantly less time. Training runs are also easily trackable as the curated environment is integrated with AzureML tools such as the Azure Machine Learning Studio and the AzureML SDK. 

 

Here is an example of an NLP product-review fine-tuning run that takes approximately two hours to complete.

 

[Screenshot of the training run]

 

Using the ACPT curated environment, we observe a decrease in the overall training time to just over 45 minutes when fine-tuning a HuggingFace classifier with DeepSpeed. This is a 62% reduction in training time for this model.

 

[Screenshot of the training run]
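
The speedup percentages and the time-reduction percentage quoted in this post are two views of the same measurement. The sketch below shows the conversion, assuming that "X% faster" means the baseline time divided by the optimized time, minus one. The post does not state how its percentages were computed, so this definition is an assumption. The 120 and 45.6 minute durations are approximations of "approximately two hours" and "just over 45 minutes".

```python
def time_reduction_from_speedup(speedup_pct: float) -> float:
    """Percent less wall-clock time for a job that runs speedup_pct% faster."""
    return (1 - 1 / (1 + speedup_pct / 100)) * 100


def speedup_from_times(baseline_min: float, optimized_min: float) -> float:
    """Percent speedup implied by a baseline and an optimized duration."""
    return (baseline_min / optimized_min - 1) * 100


if __name__ == "__main__":
    # The speedup range quoted for the HuggingFace models.
    for speedup in (54, 163):
        saved = time_reduction_from_speedup(speedup)
        print(f"{speedup}% faster -> {saved:.0f}% less training time")

    # Assumed durations: ~2 hours baseline, just over 45 minutes with ACPT.
    print(f"120 min -> 45.6 min is {speedup_from_times(120, 45.6):.0f}% faster")
```

Under these assumptions, a 163% speedup and a 62% reduction in training time describe the same result, while a 54% speedup corresponds to roughly 35% less training time.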

 

 

Quotes from Customers  

“ACPT images help us successfully run large scale jobs on over 100 GPUs. It provides integration of different acceleration packages, such as OFED and APEX. More importantly, it alleviates our engineering efforts on building the images. It enables us to focus more on the research work.”

- Project Florence, a Microsoft AI Cognitive Services initiative 

 

“ACPT environment helps us in running our model with Distributed Data-Parallel (DDP) implementation. We were able to successfully run the model with multi nodes and get the results effectively and efficiently. When using an environment other than ACPT, we were not able to achieve that.”

- Orlando Ribas Fernandes, Co-Founder and CEO, Fashable

Learn more about the use case of Fashable here. You can also try Fashable's solution and get limited-edition AI T-shirts on their website.

 

Get Started Today 

Set up a curated environment with Azure Container for PyTorch 

Start Azure Machine Learning for free 

Learn more about PyTorch on Azure 

 

Also, watch Microsoft Ignite 2022 to become more familiar with Azure Container for PyTorch and other Azure Machine Learning announcements.

For additional details see:  Curated environments - Azure Machine Learning | Microsoft Learn 

 

 

Last update: Nov 10 2022 11:19 AM