LimZiLian-7585 asked PierreLouis-3997 commented

Cannot use GPU on Azure Notebooks in Azure Machine Learning Studio

Hey All,

I am new to Azure Machine Learning Studio and am currently trying to train some models on a GPU compute instance in Azure Machine Learning Studio. The compute instance that I am using is Standard_NC6.

The problem I am facing is that, even though I can train my models successfully, I realized that TensorFlow is using the CPU instead of the GPU when I run

 import tensorflow

 # Name of the first GPU visible to TensorFlow; an empty string if none is found.
 device_name = tensorflow.test.gpu_device_name()
 if device_name != '/device:GPU:0':
   raise SystemError('GPU device not found')
 print('Found GPU at: {}'.format(device_name))
 print("Num GPUs Available: ", len(tensorflow.config.list_physical_devices('GPU')))

which raises the system error. Am I doing something wrong in the setup? My code is exactly the same as when I was training on Google Colab, where it trains successfully on a Tesla K80, but it is somehow not working in the Azure notebook.

Appreciate any help given!

Tags: azure-machine-learning, azure-machine-learning-studio-classic

1 Answer

romungi-MSFT answered PierreLouis-3997 commented

@LimZiLian-7585 I noticed a similar issue earlier, but on a Jupyter installation on a DSVM rather than in a studio notebook. Since the compute behind these machines might be similar, I suspect the tensorflow-gpu package might need an upgrade. Could you please check the installed version of tensorflow, upgrade it to 2.5.0 from a notebook cell, and then check again after a kernel restart?

 !pip install --upgrade tensorflow-gpu 
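
To compare the state before and after the upgrade, it can help to record what the notebook kernel actually sees. The following is a minimal sketch and not part of the original answer: the helper name `describe_tf_gpu` is hypothetical, and it relies only on `tf.__version__`, `tf.test.is_built_with_cuda()` and `tf.config.list_physical_devices('GPU')`.

```python
def describe_tf_gpu():
    """Return a small report on the TensorFlow install visible to this kernel.

    `describe_tf_gpu` is a hypothetical helper name used for illustration.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not importable from this kernel's environment.
        return {"installed": False}

    gpus = tf.config.list_physical_devices("GPU")
    return {
        "installed": True,
        "version": tf.__version__,
        # False means the installed build has no CUDA support at all.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpu_names": [gpu.name for gpu in gpus],
    }


report = describe_tf_gpu()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this cell once before the upgrade and once after the kernel restart gives two reports that can be compared side by side.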


Hi @romungi-MSFT ,

Thanks a lot for this tip! I was struggling to find the correct way to have TF use the GPU on an instance created with Azure ML Studio, and it is working now!

The only concern I have is that, before the "tensorflow-gpu" upgrade, TF was listing the GPU as an available device (even if it was not using it ^^):

 tf.config.list_physical_devices('GPU')

Output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


Now that the upgrade is done, the GPU is not listed anymore, even though the epoch time has been divided by 10 (which makes me believe that the script is actually using the GPU).

Likewise, nvidia-smi does not show any GPU activity while the script is running...


If you have any thoughts about it, I would be curious to know what the corresponding root causes might be :-)
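
One way to narrow this down is to ask TensorFlow where an operation was actually placed and to query nvidia-smi for the same moment. This is only a sketch under assumptions, not a diagnosis of the root cause: the helper names `op_device` and `gpu_utilization` are hypothetical, and the nvidia-smi reading is a single snapshot, so it can miss short bursts of activity.

```python
import shutil
import subprocess


def op_device():
    """Run a small matmul and return the device string TensorFlow placed it on."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not available in this kernel.
    a = tf.random.uniform((256, 256))
    b = tf.matmul(a, a)
    # In eager mode, a tensor reports the device that produced it.
    return b.device


def gpu_utilization():
    """Return one line per GPU with utilization and memory used, or None."""
    if shutil.which("nvidia-smi") is None:
        return None  # nvidia-smi is not on the PATH of this machine.
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip().splitlines()


print("matmul ran on:", op_device())
print("nvidia-smi snapshot:", gpu_utilization())
```

If the device string ends in `GPU:0`, the operation ran on the GPU even when the device list looks empty; if it ends in `CPU:0`, the speed-up has another cause.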


Before "tensorflow-gpu" upgrade:

114166-before-tensorflow-gpu-upgrade.png


After "tensorflow-gpu" upgrade (from 2.3 to 2.5):

114160-after-tensorflow-gpu-upgrade.png