Databricks Runtime 5.0 ML (Beta)

Databricks released this image in November 2018.

Databricks Runtime 5.0 ML provides a ready-to-go environment for machine learning and data science. It contains many popular libraries, including TensorFlow, Keras, and XGBoost. It also supports distributed TensorFlow training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features

Databricks Runtime 5.0 ML is built on top of Databricks Runtime 5.0. For information on what’s new in Databricks Runtime 5.0, see the Databricks Runtime 5.0 (Unsupported) release notes. In addition to the new features in Databricks Runtime 5.0, Databricks Runtime 5.0 ML includes the following new features:

Note

Databricks Runtime ML releases pick up all maintenance updates to the base Databricks Runtime release. For a list of all maintenance updates, see Databricks Runtime Maintenance Updates.

System environment

The system environment in Databricks Runtime 5.0 ML differs from that in Databricks Runtime 5.0 as follows:

  • Python: 2.7.15 for Python 2 clusters and 3.6.5 for Python 3 clusters.
  • For GPU clusters, the following NVIDIA GPU libraries:
    • Tesla driver 396.44
    • CUDA 9.2
    • cuDNN 7.2.1
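
To confirm that a cluster's interpreter matches the versions listed above, a check along the following lines could be run in a notebook cell. This is a minimal sketch, not an official Databricks utility: the names EXPECTED_PYTHON and python_matches_release are hypothetical, and only the version numbers come from this page.

```python
import sys

# Interpreter versions listed above for Databricks Runtime 5.0 ML,
# keyed by major version. The dictionary name is illustrative only.
EXPECTED_PYTHON = {2: (2, 7, 15), 3: (3, 6, 5)}


def python_matches_release(version_info=None):
    """Return True if the interpreter version equals the listed one
    for its major version (hypothetical helper)."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:3])
    return EXPECTED_PYTHON.get(current[0]) == current


if __name__ == "__main__":
    print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("Matches release notes: %s" % python_matches_release())
```

The function accepts an explicit version tuple so it can be exercised without changing interpreters, for example python_matches_release((3, 6, 5)).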

Libraries

Databricks Runtime 5.0 ML differs from Databricks Runtime 5.0 in the following libraries.

Python libraries

Databricks Runtime 5.0 ML uses Conda for Python package management. The following is the full list of provided Python packages, with the versions installed by the Conda package manager.

Library Version Library Version Library Version
absl-py 0.6.1 argparse 1.4.0 asn1crypto 0.24.0
astor 0.7.1 backports-abc 0.5 backports.functools-lru-cache 1.5
backports.weakref 1.0.post1 bcrypt 3.1.4 bleach 2.1.3
boto 2.48.0 boto3 1.7.62 botocore 1.10.62
certifi 2018.04.16 cffi 1.11.5 chardet 3.0.4
cloudpickle 0.5.3 colorama 0.3.9 configparser 3.5.0
cryptography 2.2.2 cycler 0.10.0 Cython 0.28.2
decorator 4.3.0 docutils 0.14 entrypoints 0.2.3
enum34 1.1.6 et-xmlfile 1.0.1 funcsigs 1.0.2
functools32 3.2.3-2 fusepy 2.0.4 futures 3.2.0
gast 0.2.0 grpcio 1.12.1 h5py 2.8.0
horovod 0.15.0 html5lib 1.0.1 idna 2.6
ipaddress 1.0.22 ipython 5.7.0 ipython_genutils 0.2.0
jdcal 1.4 Jinja2 2.10 jmespath 0.9.3
jsonschema 2.6.0 jupyter-client 5.2.3 jupyter-core 4.4.0
Keras 2.2.4 Keras-Applications 1.0.6 Keras-Preprocessing 1.0.5
kiwisolver 1.0.1 linecache2 1.0.0 llvmlite 0.23.1
lxml 4.2.1 Markdown 3.0.1 MarkupSafe 1.0
matplotlib 2.2.2 mistune 0.8.3 mleap 0.8.1
mock 2.0.0 msgpack 0.5.6 nbconvert 5.3.1
nbformat 4.4.0 nose 1.3.7 nose-exclude 0.5.0
numba 0.38.0+0.g2a2b772fc.dirty numpy 1.14.3 olefile 0.45.1
openpyxl 2.5.3 pandas 0.23.0 pandocfilters 1.4.2
paramiko 2.4.1 pathlib2 2.3.2 patsy 0.5.0
pbr 5.1.0 pexpect 4.5.0 pickleshare 0.7.4
Pillow 5.1.0 pip 10.0.1 ply 3.11
prompt-toolkit 1.0.15 protobuf 3.6.1 psycopg2 2.7.5
ptyprocess 0.5.2 pyarrow 0.8.0 pyasn1 0.4.4
pycparser 2.18 Pygments 2.2.0 PyNaCl 1.3.0
pyOpenSSL 18.0.0 pyparsing 2.2.0 PySocks 1.6.8
Python 2.7.15 python-dateutil 2.7.3 pytz 2018.4
PyYAML 3.12 pyzmq 17.0.0 requests 2.18.4
s3transfer 0.1.13 scandir 1.7 scikit-learn 0.19.1
scipy 1.1.0 seaborn 0.8.1 setuptools 39.1.0
simplegeneric 0.8.1 singledispatch 3.4.0.3 six 1.11.0
statsmodels 0.9.0 subprocess32 3.5.3 tensorboard 1.10.0
tensorflow 1.10.0 termcolor 1.1.0 testpath 0.3.1
tornado 5.0.2 traceback2 1.4.0 traitlets 4.3.2
unittest2 1.1.0 urllib3 1.22 virtualenv 16.0.0
wcwidth 0.1.7 webencodings 0.5.1 Werkzeug 0.14.1
wheel 0.31.1 wrapt 1.10.11 wsgiref 0.1.2
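
The package list above can be compared with what is actually installed on a cluster. The following is a minimal sketch under stated assumptions: the helper names (parse_version_rows, find_mismatches, installed_version) are hypothetical, rows are assumed to hold whitespace-separated name/version pairs as in the list above (with the header row skipped), and the name Conda reports may differ from the distribution name that Python's packaging metadata uses. The "Python" row names the interpreter rather than a distribution, so it would be reported as missing.

```python
def parse_version_rows(rows):
    """Parse rows of whitespace-separated 'name version' pairs into a dict."""
    expected = {}
    for row in rows:
        tokens = row.split()
        if len(tokens) % 2 != 0:
            raise ValueError("row does not hold name/version pairs: %r" % row)
        for name, version in zip(tokens[0::2], tokens[1::2]):
            expected[name] = version
    return expected


def find_mismatches(expected, installed):
    """Return {name: (expected, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, version in sorted(expected.items()):
        found = installed.get(name)
        if found != version:
            mismatches[name] = (version, found)
    return mismatches


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    # Fallback for older interpreters such as the 2.7 and 3.6 listed here;
    # pkg_resources is distributed with setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    # Two sample rows copied from the list above.
    rows = [
        "horovod 0.15.0 html5lib 1.0.1 idna 2.6",
        "tensorflow 1.10.0 termcolor 1.1.0 testpath 0.3.1",
    ]
    expected = parse_version_rows(rows)
    installed = {name: installed_version(name) for name in expected}
    for name, (want, have) in sorted(find_mismatches(expected, installed).items()):
        print("%s: listed %s, installed %s" % (name, want, have))
```

Keeping the comparison separate from the lookup means the same find_mismatches function can also be fed versions collected another way, such as from the output of a Conda listing.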

In addition, the following Spark packages include Python modules:

Spark Package        Python Module  Version
tensorframes         tensorframes   0.5.0-s_2.11
graphframes          graphframes    0.6.0-db3-spark2.4
spark-deep-learning  sparkdl        1.3.0-db2-spark2.4
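
One way to check that the Python modules from these Spark packages are importable on a cluster is sketched below. The helper names (module_available, missing_modules) are hypothetical; only the module names come from the table above. The sketch treats ImportError as "not available", so a module that fails during import for another reason would still raise.

```python
import importlib

# Python module names taken from the Spark package table above.
SPARK_PACKAGE_MODULES = ["tensorframes", "graphframes", "sparkdl"]


def module_available(name):
    """Return True if the named module can be imported in this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def missing_modules(names):
    """Return the subset of names that cannot be imported, in input order."""
    return [name for name in names if not module_available(name)]


if __name__ == "__main__":
    print("Missing modules: %s" % missing_modules(SPARK_PACKAGE_MODULES))
```

On a cluster running this runtime the list would be expected to come back empty; elsewhere it reports whichever of the three modules are absent.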

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 5.0.

Java and Scala libraries (Scala 2.11 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 5.0, Databricks Runtime 5.0 ML contains the following JARs:

Group ID          Artifact ID                      Version
com.databricks    spark-deep-learning              1.3.0-db2-spark2.4
org.tensorframes  tensorframes                     0.5.0-s_2.11
org.graphframes   graphframes_2.11                 0.6.0-db3-spark2.4
org.tensorflow    libtensorflow                    1.10.0
org.tensorflow    libtensorflow_jni                1.10.0
org.tensorflow    spark-tensorflow-connector_2.11  1.10.0-spark2.4-001
org.tensorflow    tensorflow                       1.10.0
ml.dmlc           xgboost4j                        0.80
ml.dmlc           xgboost4j-spark                  0.80
ml.combust.mleap  mleap-databricks-runtime_2.11    0.13.0-SNAPSHOT