Install TensorFlow 2.1 on Databricks Runtime 6.5 ML GPU clusters

Databricks Runtime ML ships with TensorFlow preinstalled, so you can use it without installing any packages. The bundled version depends on the runtime version:

| Databricks Runtime ML version | TensorFlow version |
| --- | --- |
| 7.0 | 2.2.0 |
| 6.3 - 6.6 | 1.15.0 |
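The version table can also be expressed as a small lookup, which may help when deciding whether a cluster needs an init script at all. This is only an illustrative sketch: the function name `bundled_tf_version` and the variables `runtime` and `wanted` are hypothetical, and the mapping covers nothing beyond the runtime versions listed in the table.

```shell
#!/bin/bash
# Hypothetical helper: prints the TensorFlow version bundled with a given
# Databricks Runtime ML version, per the table above. Prints nothing and
# returns 1 for runtime versions the table does not cover.
bundled_tf_version() {
  case "$1" in
    7.0)             echo "2.2.0" ;;
    6.3|6.4|6.5|6.6) echo "1.15.0" ;;
    *)               return 1 ;;
  esac
}

runtime="6.5"   # runtime version of the cluster (example value)
wanted="2.1"    # TensorFlow release series you want to use (example value)
bundled="$(bundled_tf_version "$runtime")"

# Compare the bundled version against the wanted release series.
case "$bundled" in
  "$wanted".*) echo "Runtime $runtime ML already bundles TensorFlow $bundled" ;;
  *)           echo "Runtime $runtime ML bundles TensorFlow ${bundled:-unknown}; $wanted needs an init script" ;;
esac
```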

You can install other versions of TensorFlow by using a cluster-scoped init script.
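A cluster-scoped init script is a shell script that runs on each cluster node at startup. As a rough sketch of how such a script might be prepared locally before it is attached to a cluster, the example below writes a placeholder script to a temporary directory and checks its syntax without executing it. The helper name `write_init_script`, the file name `install-tf21.sh`, and the DBFS path in the final comment are all assumptions made for illustration, and the script body is a stand-in rather than a complete installation script.

```shell
#!/bin/bash
# Hypothetical helper: writes a minimal init script to the path given as $1.
write_init_script() {
  local target="$1"
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<'EOF'
#!/bin/bash
set -e
# Placeholder body; replace with the real installation commands.
pip install 'tensorflow==2.1.*'
EOF
  chmod +x "$target"
}

# Stage the script in a temporary directory and check its syntax
# without running it (bash -n only parses the file).
staging_dir="$(mktemp -d)"
init_script="$staging_dir/install-tf21.sh"
write_init_script "$init_script"
bash -n "$init_script" && echo "Syntax OK: $init_script"

# Assumed upload step (legacy Databricks CLI; the DBFS path is a placeholder):
# databricks fs cp "$init_script" dbfs:/databricks/scripts/install-tf21.sh
```

Checking the syntax before uploading is a cheap safeguard, because a failing init script can keep a cluster from starting.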

This article describes how to install TensorFlow 2.1 on Databricks Runtime 6.5 ML GPU clusters.

Important

Removing default libraries and installing new versions may cause instability or completely break your Azure Databricks cluster. Thoroughly test any new library version in your environment before running production jobs.

Install the init script

  1. Install the following cluster-scoped init script on your Databricks Runtime 6.5 ML GPU cluster.

    #!/bin/bash
    set -e
    
    apt-get update
    apt-get install -y --no-install-recommends --allow-downgrades \
      libnccl2=2.4.8-1+cuda10.1 \
      libnccl-dev=2.4.8-1+cuda10.1 \
      cuda-libraries-10-1 \
      libcudnn7=7.6.4.38-1+cuda10.1 \
      libcudnn7-dev=7.6.4.38-1+cuda10.1 \
      libnvinfer6=6.0.1-1+cuda10.1 \
      libnvinfer-dev=6.0.1-1+cuda10.1 \
      libnvinfer-plugin6=6.0.1-1+cuda10.1
    apt-get clean
    ln -sfn cuda-10.1 /usr/local/cuda
    
    # Quote the specifiers so the shell cannot glob-expand the `*` wildcards.
    pip install 'tensorflow==2.1.*' 'setuptools==41.*' 'grpcio==1.24.*'
    
    # This `conda list` is necessary to recognize the pip-installed packages.
    conda list
    # -y skips the confirmation prompt, since init scripts run non-interactively.
    conda install -y cudatoolkit=10.1
    
  2. Restart the cluster.
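After the restart, it is worth confirming that the cluster actually picked up TensorFlow 2.1. The following is a minimal sketch of such a check, not an official verification procedure. It assumes that `python` on the `PATH` is the same interpreter the cluster's notebooks use, and the function names `is_tf_21`, `installed_tf_version`, and `visible_gpu_count` are hypothetical.

```shell
#!/bin/bash
# Returns 0 if the version string in $1 belongs to the 2.1.x series.
is_tf_21() {
  case "$1" in
    2.1.*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Prints the TensorFlow version seen by the default Python,
# or nothing if TensorFlow cannot be imported.
installed_tf_version() {
  python -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null || true
}

# Prints the number of GPUs that TensorFlow can see (0 if the query fails).
visible_gpu_count() {
  python -c 'import tensorflow as tf; print(len(tf.config.list_physical_devices("GPU")))' 2>/dev/null || echo 0
}

version="$(installed_tf_version)"
if is_tf_21 "$version"; then
  echo "TensorFlow $version is installed; visible GPUs: $(visible_gpu_count)"
else
  echo "TensorFlow 2.1.x not detected (found: ${version:-none})"
fi
```

A GPU count of 0 on a GPU cluster would suggest that the CUDA libraries installed by the init script do not match what TensorFlow expects.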