Set up GPU-specific packages on Linux

This section outlines the packages you need to set up for CNTK to leverage NVIDIA GPUs.

Checking your GPU compatibility and getting the latest driver

You need a CUDA-compatible graphics card to use CNTK GPU capabilities. You can check whether your card is CUDA-compatible here and here (for older cards). The Compute Capability (CC) of your GPU must be 3.0 or higher.
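Before any driver is installed, the PCI bus listing is a quick way to see which NVIDIA card the machine has; the card name can then be looked up in NVIDIA's compatibility lists. The sketch below assumes the lspci utility (from pciutils) is available. The helper names are hypothetical; cc_supported checks a Compute Capability value, as read from those lists, against the 3.0 minimum stated above.

```shell
#!/usr/bin/env bash
# Sketch only: list_nvidia_devices and cc_supported are hypothetical helpers.

# List NVIDIA devices seen on the PCI bus (works before any driver is installed).
list_nvidia_devices() {
    if command -v lspci >/dev/null 2>&1; then
        lspci | grep -i nvidia || echo "No NVIDIA device found on the PCI bus"
    else
        echo "lspci not available; install pciutils to list PCI devices"
    fi
}

# Return success if the given Compute Capability (e.g. 5.2) is 3.0 or higher.
# Because the minimum is exactly 3.0, comparing the major number is enough.
cc_supported() {
    local major="${1%%.*}"
    [ "$major" -ge 3 ]
}

list_nvidia_devices
```

For example, cc_supported 5.2 succeeds, while cc_supported 2.1 fails.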

If you have the required card, install the latest driver:

  • Select your card and download the driver pack from here (usually available as a .run file).

  • If your X Windows manager is running, the driver installation will likely fail. Open a remote terminal session to your machine and stop the X Windows manager. Refer to your platform documentation for the exact commands.

Example: for Ubuntu, use the following command (assuming lightdm is your X Windows manager):

sudo stop lightdm

  • Install the driver as in the example below (note that the file name may be different for your system):

sudo chmod +x ./NVIDIA-Linux-x86_64-384.111.run
sudo ./NVIDIA-Linux-x86_64-384.111.run

We recommend accepting the default installation options.

Note that the driver installation program may complain about the nouveau kernel driver. Refer to your platform documentation for instructions on disabling it. For Ubuntu you may use this set of instructions.

  • If you stopped the X Windows manager during the steps above, start it again. Refer to your platform documentation for the exact commands.

Example: for Ubuntu, use the following command (in case of lightdm as your X Windows manager):

sudo start lightdm
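The stop and start commands shown above are Upstart commands. On Ubuntu releases that use systemd as the init system, systemctl is the usual interface instead. The sketch below prints a suitable command for either case, checks whether the nouveau module is loaded, and reports the driver version once installation is finished. All function names are hypothetical, and the presence of systemctl is only a rough proxy for a systemd-based system.

```shell
#!/usr/bin/env bash
# Sketch of helper checks around the driver installation.
# All function names here are hypothetical, chosen for illustration.

# Print a command that stops or starts the display manager.
# Usage: dm_command <start|stop> [service]   (service defaults to lightdm)
dm_command() {
    local action="$1" service="${2:-lightdm}"
    if command -v systemctl >/dev/null 2>&1; then
        echo "sudo systemctl $action $service"
    else
        echo "sudo service $service $action"
    fi
}

# Return success if the nouveau kernel module is currently loaded.
nouveau_loaded() {
    [ -r /proc/modules ] && grep -q '^nouveau ' /proc/modules
}

# After installation: print GPU name and driver version, if nvidia-smi exists.
report_driver() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
    else
        echo "nvidia-smi not found; the driver may not be installed"
    fi
}

dm_command stop
if nouveau_loaded; then
    echo "nouveau is loaded; the NVIDIA installer may complain about it"
fi
```

Run report_driver after the installer finishes and the X Windows manager is back up to confirm that the new driver is the one in use.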

CUDA 9

The procedure below explains how to install CUDA using the .run file distribution. You can also install it from DEB or RPM packages. You will find the package for your system on the NVIDIA CUDA 9.0 Download page and the installation instructions in the CUDA Online Documentation.

Download and install the NVIDIA CUDA 9.0 Toolkit:

  • Find the .run file for your platform here and download it.

  • If your X Windows manager is running, the installation will likely fail. Open a remote terminal session to your machine and stop the X Windows manager. Refer to your platform documentation for the exact commands.

Example: for Ubuntu use the following command (in case of lightdm as your X Windows manager):

sudo stop lightdm

  • Install the CUDA 9.0 Toolkit (note that the .run file name may be different for your system):

chmod +x ./cuda_9.0.176_384.81_linux.run
sudo ./cuda_9.0.176_384.81_linux.run

When prompted by the installer:

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit:

Select no if you have already installed the latest driver by performing the steps in the previous section. If you have not done so, select yes, but we strongly recommend updating to the latest driver after installing the CUDA toolkit.

If you declined the driver installation from the CUDA 9.0 package, you will get the following warning at the end of the installation:

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Ignore this warning.
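The warning is safe to ignore only if the separately installed driver meets the stated minimum of 384.00. A minimal sketch of such a check follows. version_ge is a hypothetical helper; it assumes GNU sort (for the -V version ordering), and the live query assumes nvidia-smi is on the PATH.

```shell
#!/usr/bin/env bash
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort); hypothetical helper name.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="384.00"   # minimum quoted in the installer warning

if command -v nvidia-smi >/dev/null 2>&1; then
    # Take the first GPU's driver version.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
    if [ -n "$installed" ] && version_ge "$installed" "$required"; then
        echo "Driver $installed satisfies the CUDA 9.0 minimum ($required)"
    else
        echo "Driver version '${installed}' is missing or older than $required"
    fi
else
    echo "nvidia-smi not found; cannot determine the driver version"
fi
```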

If you stopped the X Windows manager during the steps above, start it again. Refer to your platform documentation for the exact commands.

Example: for Ubuntu use the following command (in case of lightdm as your X Windows manager):

sudo start lightdm

Add the following environment variables to your current session and your .bashrc profile (if you modified the default paths during the CUDA installation, change the values below accordingly):

export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
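To avoid duplicate lines in .bashrc when this step is repeated, the exports can be appended only if they are not already present. The following is a sketch: append_once and add_cuda_paths are hypothetical helpers, and the paths assume the default CUDA 9.0 installation location used above.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already there.
# Hypothetical helper; $1 = line, $2 = file.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add both CUDA 9.0 exports to the given profile file.
add_cuda_paths() {
    local file="$1"
    # Single quotes keep $PATH unexpanded so it is evaluated when the profile is read.
    append_once 'export PATH=/usr/local/cuda-9.0/bin:$PATH' "$file"
    append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH' "$file"
}

# Example: add_cuda_paths "$HOME/.bashrc" && source "$HOME/.bashrc"
```

Once the variables are set in the current session, nvcc --version should report release 9.0.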

The next step is optional. You may skip ahead to the section after it.

OPTIONAL. Verifying CUDA 9.0 installation

You may verify your CUDA installation by compiling the CUDA samples (we assume the default paths were used during the CUDA installation). Note that building all samples is a lengthy operation:

cd ~/NVIDIA_CUDA-9.0_Samples/
make

After a successful build, invoke the deviceQuery utility:

~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery/deviceQuery

If everything works well, you should get an output similar to the one below:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 960"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 2025 MBytes (2123235328 bytes)
  ( 8) Multiprocessors, (128) CUDA Cores/MP:     1024 CUDA Cores
  GPU Max Clock rate:                            1253 MHz (1.25 GHz)
  Memory Clock rate:                             3505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1, Device0 = GeForce GTX 960
Result = PASS
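Since building every sample takes a long time, one option is to build only the deviceQuery sample, which has its own Makefile in its directory, and then check the final result line. The sketch below assumes the default samples path used above; device_query_passed is a hypothetical helper that looks for the Result = PASS line in the utility's output.

```shell
#!/usr/bin/env bash
# Return success if the text on standard input contains the PASS line
# that deviceQuery prints at the end of a successful run.
device_query_passed() {
    grep -q '^Result = PASS$'
}

sample_dir="$HOME/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery"

if [ -d "$sample_dir" ]; then
    # Build just this one sample, then run it and inspect the output.
    if make -C "$sample_dir" && "$sample_dir/deviceQuery" | device_query_passed; then
        echo "deviceQuery: PASS"
    else
        echo "deviceQuery did not build or did not report PASS"
    fi
else
    echo "Samples directory not found: $sample_dir"
fi
```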

GPU Deployment Kit

Starting from CUDA version 8, the GPU Deployment Kit is part of the CUDA package and no longer needs to be installed separately.

cuDNN

Install NVIDIA CUDA Deep Neural Network library (cuDNN).

Important

If you previously installed cuDNN for an older CUDA version, make sure that you upgrade to the CUDA 9.0 compatible version.

Important

Install cuDNN using the exact version and target path as specified below. This is necessary because it is expected by the CNTK build configuration program.

  • Use the following commands:

wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb
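As a quick check that the expected cuDNN build is the one installed, the package name and version can be read from the .deb file name (Debian packages follow a name_version_architecture.deb pattern) and compared with what dpkg reports. This is a sketch; deb_name and deb_version are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Extract the package name and version from a .deb file name of the form
# <name>_<version>_<arch>.deb. Hypothetical helper names.
deb_name()    { local f="${1##*/}"; echo "${f%%_*}"; }
deb_version() { local f="${1##*/}"; f="${f#*_}"; echo "${f%%_*}"; }

deb="libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb"
pkg="$(deb_name "$deb")"
want="$(deb_version "$deb")"

if command -v dpkg-query >/dev/null 2>&1; then
    # Yields an empty string if the package is not installed.
    have="$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null || true)"
    if [ "$have" = "$want" ]; then
        echo "$pkg $have is installed"
    else
        echo "$pkg: expected $want, found '${have:-nothing}'"
    fi
else
    echo "dpkg-query not available on this system"
fi
```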

OPTIONAL. NCCL

NVIDIA's NCCL library provides optimized primitives for collective multi-GPU communication on Linux. CNTK can take advantage of these accelerated primitives for parallel jobs running on a single host (see here for an introduction to parallel training with CNTK).

Follow the instructions here to download the NVIDIA NCCL library.

wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl-dev_2.1.2-1+cuda9.0_amd64.deb
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libnccl2_2.1.2-1+cuda9.0_amd64.deb
sudo dpkg -i libnccl2_2.1.2-1+cuda9.0_amd64.deb libnccl-dev_2.1.2-1+cuda9.0_amd64.deb

Then, use the CNTK configure option --with-nccl=<path> to enable building with NVIDIA NCCL. For example, if NCCL is installed in a folder other than the default folder /usr, use configure --with-nccl=<nccl install folder> (plus additional options) to build with NVIDIA NCCL support.
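To find which folder to pass to --with-nccl, one approach is to look for the nccl.h header under a few candidate prefixes. The sketch below rests on assumptions: find_nccl_prefix is a hypothetical helper, the include/nccl.h layout is assumed rather than taken from the CNTK configure script, and the candidate list is only an example.

```shell
#!/usr/bin/env bash
# Print the first candidate prefix that contains include/nccl.h.
# Returns non-zero if none of the candidates match.
# Hypothetical helper; the include/nccl.h layout is an assumption.
find_nccl_prefix() {
    local prefix
    for prefix in "$@"; do
        if [ -f "$prefix/include/nccl.h" ]; then
            echo "$prefix"
            return 0
        fi
    done
    return 1
}

# Example candidates; adjust to wherever NCCL was unpacked or installed.
if nccl_dir="$(find_nccl_prefix /usr /usr/local /usr/local/nccl)"; then
    echo "Suggested option: --with-nccl=$nccl_dir"
else
    echo "nccl.h not found under the candidate prefixes"
fi
```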

Note

Currently, CNTK's support for NVIDIA NCCL is limited to data-parallel SGD with 32/64 gradient bits, using the CNTK binary. Support for additional parallelization methods and CNTK v2 will be added in the future. The official release of CNTK is built with NCCL enabled. All Linux Python wheels already include the NCCL binary. BrainScript users on Linux need to install NCCL. If you prefer not to use NCCL, build CNTK from source. Note that configure automatically detects NCCL installed under /usr, so uninstall NCCL before building.

CUB

If you are installing CNTK for Python, you may skip to the next section. Otherwise proceed further.

Get and install NVIDIA CUB using the commands below.

Important

Install NVIDIA CUB using the exact version and target path as specified below. This is necessary because it is expected by the CNTK build configuration program.

Use the following commands:

wget https://github.com/NVlabs/cub/archive/1.7.4.zip
unzip ./1.7.4.zip
sudo cp -r cub-1.7.4 /usr/local
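After copying, a quick sanity check is to confirm that the umbrella header cub/cub.cuh is present under the target path. A minimal sketch follows; cub_present is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return success if the CUB umbrella header exists under the given root.
# Hypothetical helper name.
cub_present() {
    [ -f "$1/cub/cub.cuh" ]
}

cub_root="/usr/local/cub-1.7.4"   # target path used in the commands above

if cub_present "$cub_root"; then
    echo "CUB found in $cub_root"
else
    echo "CUB header not found in $cub_root"
fi
```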