Suchi-2434 asked:

NC6 DSVM Ubuntu 18.04 Gen 1 - NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

I created an NC6 GPU VM from the Ubuntu 18.04 DSVM Gen 1 image. When I run nvidia-smi from an SSH terminal, I get this error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

This worked fine until a few weeks ago; I have only noticed this behavior recently. I have created new VMs several times, but the issue occurs every time. Can you please help? Has anything changed?

azure-data-science-vm

Suchi-2434 answered:

I found the solution, but I wasted more than a day on it. I really wish the Azure team would bundle the correct driver versions with the DSVMs.

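The NVIDIA driver bundled with the Ubuntu 18.04 Gen 1 DSVM is version 495 (nvidia-495), and it does not work on this VM. I tried various reinstalls, refreshes, network settings, and so on before getting to the bottom of it. Finally, I found messages in syslog showing that driver 495 was not being used with the GPU, so the GPU was never loaded. (The 495 driver branch dropped support for Kepler GPUs such as the Tesla K80 used by NC6 VMs; 470 is the last branch that supports them.)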
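
A minimal sketch of the syslog check described above, assuming the default rsyslog file /var/log/syslog used by Ubuntu 18.04 and 20.04. The function name nvidia_log_lines is hypothetical. It just prints every line mentioning nvidia or NVRM (the prefix the NVIDIA kernel module uses for its messages), which is where a driver that refuses the GPU should show up.

```shell
#!/usr/bin/env bash
# nvidia_log_lines (hypothetical helper): print log lines that mention the
# NVIDIA driver. "NVRM" is the prefix used by the NVIDIA kernel module.
# Usage: nvidia_log_lines [LOG_FILE]   (defaults to /var/log/syslog)
nvidia_log_lines() {
  local log_file="${1:-/var/log/syslog}"
  # -i: case-insensitive match; -E: extended regex, so "|" means "or"
  grep -iE 'nvidia|nvrm' "$log_file"
}
```

For example, `nvidia_log_lines | tail -n 20` shows the most recent matches. If /var/log/syslog is not readable by your user, define and run the function from a root shell, or search the kernel messages of the current boot with `sudo journalctl -k | grep -iE 'nvidia|nvrm'`.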

Then it took a lot of effort to remove 495 cleanly and install driver 470 instead, which works on this VM. After that, everything worked.

Meanwhile, Microsoft's DSVM documentation states that K80 (NC6) machines come with the 470 driver. In practice, however, I found that they come with 495, which was the root cause of this issue.

Could someone from the Microsoft Azure team please update the DSVM image so that it comes preloaded with NVIDIA driver 470 instead of 495?


ramr-msft commented: @Suchi-2434 Thanks for the details. We have forwarded this to the product team for an update.

VijayP-2887 answered:

I can confirm the same problem on an NC6 DSVM. I am running Ubuntu 20.04 with the NVIDIA 495 driver, which is supported on 20.04 and installed by default. nvidia-smi fails on my machine too. Any solution or workaround would be helpful.


Suchi-2434 commented: Hi, I did a clean install of the new driver from the shell after logging in to the VM:

Purge the old version:
$ sudo apt purge 'nvidia.*'

Add the repository that provides the drivers:
$ sudo add-apt-repository ppa:graphics-drivers/ppa

Update the package lists:
$ sudo apt update

Install the 470 driver:
$ sudo apt install nvidia-driver-470

Reboot:
$ sudo reboot
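
After the reboot, one way to confirm the fix is to check which driver package is installed, whether the nvidia kernel module is loaded, and whether nvidia-smi can now reach the driver. This is a minimal sketch, assuming a dpkg-based system such as the Ubuntu DSVM. The name check_nvidia_driver is hypothetical, and its optional argument exists only so the module list can be swapped out for testing.

```shell
#!/usr/bin/env bash
# check_nvidia_driver (hypothetical helper): post-reboot sanity check.
# Usage: check_nvidia_driver [MODULES_FILE]   (defaults to /proc/modules)
check_nvidia_driver() {
  local modules_file="${1:-/proc/modules}"

  # List installed nvidia-driver-* packages (name and version) via dpkg;
  # only lines in the "ii" (installed) state are printed.
  echo "Installed NVIDIA driver packages:"
  dpkg -l 'nvidia-driver-*' 2>/dev/null | awk '/^ii/ {print "  " $2, $3}'

  # /proc/modules lists one loaded kernel module per line, name first.
  if grep -q '^nvidia ' "$modules_file" 2>/dev/null; then
    echo "nvidia kernel module: loaded"
  else
    echo "nvidia kernel module: NOT loaded"
    return 1
  fi

  # nvidia-smi succeeds only if it can communicate with the driver.
  nvidia-smi
}
```

On a working NC6 you would expect nvidia-driver-470 in the package list, the module reported as loaded, and the usual nvidia-smi table. A "NOT loaded" result points back to the syslog messages.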


VijayP-2887 commented: Thank you, Suchi. Yes, 470 works with Ubuntu 20.04 and the K80.
