question

AkahoshiMasaya-7804 asked GiftA-MSFT edited

Azure Machine Learning - Uses the wrong PyTorch version when training

Hi, I am training my models with Azure Machine Learning.

Until recently my training ran with GPU support, but today I found that it is running on the CPU.
I have not modified the training environment; only the training script was changed.
My compute cluster is NC6v3, which has a GPU.

I investigated and found that the training script is running on PyTorch 1.6.0.
Previously it ran on PyTorch 1.8.1.
I think the GPU is not being used because the CUDA toolkit version does not match the PyTorch version.

I then printed the installed packages to the log.
The log shows that PyTorch 1.8.1 is installed, but the script is using 1.6.0.
I am confused by this.
Can someone tell me the solution?

<My code snippet>
<<conda_dependencies.yaml>>

channels:
  - conda-forge
  - pytorch
  - nvidia
dependencies:
  - python=3.8.10
  - mesa-libgl-cos6-x86_64
  - cudatoolkit=11.1
  - pytorch==1.8.1
  - torchvision==0.9.1
  - tqdm
  - scikit-learn
  - matplotlib
  - pandas
  - pip < 20.3
  - pip:
    - azureml-defaults
    - opencv-python-headless
    - pillow==8.2.0

<<Environment definition>>
from azureml.core import Environment, ScriptRunConfig
from azureml.core.runconfig import DockerConfiguration

# experiment_dir, SCRIPT_FILE_NAME, arguments and compute_target are defined elsewhere
environment_definition_file = experiment_dir / 'conda_dependencies.yaml'
environment_name = 'pytorch-1.8.1-gpu'
base_image_name = 'mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.0.3-cudnn8-ubuntu18.04'
environment = Environment.from_docker_image(
    environment_name,
    base_image_name,
    conda_specification=environment_definition_file)
docker_run_config = DockerConfiguration(use_docker=True)

script_run_config = ScriptRunConfig(
    source_directory=experiment_dir,
    script=SCRIPT_FILE_NAME,
    arguments=arguments,
    compute_target=compute_target,
    docker_runtime_config=docker_run_config,
    environment=environment)

<<Output a log in the training script>>
import torch
import pip

pip.main(['list'])
print(f'PyTorch version: {torch.__version__}')

<My logs>
Package             Version
------------------- -------
adal                1.2.7
applicationinsights 0.11.10
(omission)
torch               1.8.1
torchvision         0.9.0a0
(omission)

PyTorch version: 1.6.0
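
A log like the one above, where the package list reports one version and the import reports another, usually means more than one copy of the package is visible to the interpreter. The following is a minimal diagnostic sketch, not part of the original script: `describe_package` is a hypothetical helper name, and it assumes Python 3.8+ so that `importlib.metadata` is available. It prints the version recorded in the installed package metadata next to the version and file path of the module that `import` actually loads, plus the CUDA build information when `torch` is importable.

```python
import importlib
import sys
from importlib import metadata


def describe_package(module_name, dist_name=None):
    """Compare the installed distribution metadata with the module that `import` loads."""
    dist_name = dist_name or module_name
    try:
        # Version recorded in the installed distribution's metadata.
        dist_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        dist_version = None
    try:
        # The module object the interpreter actually resolves on sys.path.
        module = importlib.import_module(module_name)
    except ImportError:
        module = None
    return {
        "interpreter": sys.executable,
        "dist_version": dist_version,
        "module_version": getattr(module, "__version__", None),
        "module_file": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    info = describe_package("torch")
    for key, value in info.items():
        print(f"{key}: {value}")
    if info["module_file"] is not None:
        import torch
        # CUDA version the loaded build was compiled against (None for CPU-only builds).
        print(f"CUDA build: {torch.version.cuda}")
        print(f"CUDA available: {torch.cuda.is_available()}")
```

If `module_file` points somewhere other than the conda environment built from conda_dependencies.yaml, or `dist_version` and `module_version` disagree, the script is importing a different PyTorch installation than the one listed.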

azure-machine-learning

GiftA-MSFT answered GiftA-MSFT edited

Hi, thanks for reaching out. These are the supported versions for PyTorch. Please refer to this document for creating a custom environment. As shown, you'll need to use versions <= 1.6.0. Hope this helps.


AkahoshiMasaya-7804 answered

Hi, GiftA-MSFT

Thank you for your reply.
I understand that AML supports PyTorch <= 1.6.0.

I hope AML will support PyTorch 1.8.x soon.


Sincerely,
