Databricks Runtime 10.5 for Machine Learning

Databricks Runtime 10.5 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 10.5. Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. Databricks Runtime ML includes AutoML, a tool to automatically train machine learning pipelines. Databricks Runtime ML also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features and improvements

Databricks Runtime 10.5 ML is built on top of Databricks Runtime 10.5. For information on what’s new in Databricks Runtime 10.5, including Apache Spark MLlib and SparkR, see the Databricks Runtime 10.5 release notes.

Enhancements to Databricks AutoML

The following enhancements have been made to Databricks AutoML.

  • Improved memory usage allows AutoML to train on larger datasets.
  • With AutoML forecasting, you can now export the best model’s predictions to a table using the API. If output_database is provided, AutoML saves predictions of the best model to a new table in the specified database. The predictions are not saved if output_database is not specified.

Enhancements to Databricks Feature Store

The following enhancements have been made to Databricks Feature Store.

System environment

The system environment in Databricks Runtime 10.5 ML differs from Databricks Runtime 10.5 as follows:

Libraries

The following sections list the libraries included in Databricks Runtime 10.5 ML that differ from those included in Databricks Runtime 10.5.

In this section:

Top-tier libraries

Databricks Runtime 10.5 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 10.5 ML uses Virtualenv for Python package management and includes many popular ML packages.

In addition to the packages specified in the in the following sections, Databricks Runtime 10.5 ML also includes the following packages:

  • hyperopt 0.2.7.db1
  • sparkdl 2.2.0-db6
  • feature_store 0.4.1
  • automl 1.8.0

Python libraries on CPU clusters

Library Version Library Version Library Version
absl-py 0.11.0 Antergos Linux 2015.10 (ISO-Rolling) appdirs 1.4.4
argon2-cffi 20.1.0 astor 0.8.1 astunparse 1.6.3
async-generator 1.10 attrs 20.3.0 backcall 0.2.0
bcrypt 3.2.0 bidict 0.21.4 bleach 3.3.0
blis 0.7.7 boto3 1.16.7 botocore 1.19.7
cachetools 4.2.4 catalogue 2.0.7 certifi 2020.12.5
cffi 1.14.5 chardet 4.0.0 click 7.1.2
cloudpickle 1.6.0 cmdstanpy 0.9.68 configparser 5.0.1
convertdate 2.4.0 cryptography 3.4.7 cycler 0.10.0
cymem 2.0.6 Cython 0.29.23 databricks-automl-runtime 0.2.7
databricks-cli 0.16.4 dbl-tempo 0.1.2 dbus-python 1.2.16
decorator 5.0.6 defusedxml 0.7.1 dill 0.3.2
diskcache 5.4.0 distlib 0.3.4 distro-info 0.23ubuntu1
entrypoints 0.3 ephem 4.1.3 facets-overview 1.0.0
fasttext 0.9.2 filelock 3.0.12 Flask 1.1.2
flatbuffers 2.0 fsspec 0.9.0 future 0.18.2
gast 0.4.0 gitdb 4.0.9 GitPython 3.1.12
google-auth 1.22.1 google-auth-oauthlib 0.4.2 google-pasta 0.2.0
grpcio 1.39.0 gunicorn 20.0.4 gviz-api 1.10.0
h5py 3.1.0 hijri-converter 2.2.3 holidays 0.13
horovod 0.23.0 htmlmin 0.1.12 huggingface-hub 0.5.1
idna 2.10 ImageHash 4.2.1 imbalanced-learn 0.8.1
importlib-metadata 3.10.0 ipykernel 5.3.4 ipython 7.22.0
ipython-genutils 0.2.0 ipywidgets 7.6.3 isodate 0.6.0
itsdangerous 1.1.0 jedi 0.17.2 Jinja2 2.11.3
jmespath 0.10.0 joblib 1.0.1 joblibspark 0.3.0
jsonschema 3.2.0 jupyter-client 6.1.12 jupyter-core 4.7.1
jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.0 keras 2.8.0
Keras-Preprocessing 1.1.2 kiwisolver 1.3.1 koalas 1.8.2
korean-lunar-calendar 0.2.1 langcodes 3.3.0 libclang 13.0.0
lightgbm 3.3.2 llvmlite 0.38.0 LunarCalendar 0.0.9
Mako 1.1.3 Markdown 3.3.3 MarkupSafe 2.0.1
matplotlib 3.4.2 missingno 0.5.1 mistune 0.8.4
mleap 0.18.1 mlflow-skinny 1.24.0 multimethod 1.8
murmurhash 1.0.6 nbclient 0.5.3 nbconvert 6.0.7
nbformat 5.1.3 nest-asyncio 1.5.1 networkx 2.5
nltk 3.6.1 notebook 6.3.0 numba 0.55.1
numpy 1.20.1 oauthlib 3.1.0 opt-einsum 3.3.0
packaging 21.3 pandas 1.2.4 pandas-profiling 3.1.0
pandocfilters 1.4.3 paramiko 2.7.2 parso 0.7.0
pathy 0.6.1 patsy 0.5.1 petastorm 0.11.4
pexpect 4.8.0 phik 0.12.2 pickleshare 0.7.5
Pillow 8.2.0 pip 21.0.1 plotly 5.6.0
pmdarima 1.8.5 preshed 3.0.6 prometheus-client 0.10.1
prompt-toolkit 3.0.17 prophet 1.0.1 protobuf 3.17.2
psutil 5.8.0 psycopg2 2.8.5 ptyprocess 0.7.0
pyarrow 4.0.0 pyasn1 0.4.8 pyasn1-modules 0.2.8
pybind11 2.9.2 pycparser 2.20 pydantic 1.8.2
Pygments 2.8.1 PyGObject 3.36.0 PyMeeus 0.5.11
PyNaCl 1.5.0 pyodbc 4.0.30 pyparsing 2.4.7
pyrsistent 0.17.3 pystan 2.19.1.1 python-apt 2.0.0+ubuntu0.20.4.7
python-dateutil 2.8.1 python-editor 1.0.4 python-engineio 4.3.0
python-socketio 5.4.1 pytz 2020.5 PyWavelets 1.1.1
PyYAML 5.4.1 pyzmq 20.0.0 regex 2021.4.4
requests 2.25.1 requests-oauthlib 1.3.0 requests-unixsocket 0.2.0
rsa 4.8 s3transfer 0.3.7 sacremoses 0.0.49
scikit-learn 0.24.1 scipy 1.6.2 seaborn 0.11.1
Send2Trash 1.5.0 setuptools 52.0.0 setuptools-git 1.2
shap 0.40.0 simplejson 3.17.2 six 1.15.0
slicer 0.0.7 smart-open 5.2.1 smmap 3.0.5
spacy 3.2.3 spacy-legacy 3.0.9 spacy-loggers 1.0.2
spark-tensorflow-distributor 1.0.0 sqlparse 0.4.1 srsly 2.4.3
ssh-import-id 5.10 statsmodels 0.12.2 tabulate 0.8.7
tangled-up-in-unicode 0.1.0 tenacity 6.2.0 tensorboard 2.8.0
tensorboard-data-server 0.6.1 tensorboard-plugin-profile 2.5.0 tensorboard-plugin-wit 1.8.1
tensorflow-cpu 2.8.0 tensorflow-estimator 2.8.0 tensorflow-io-gcs-filesystem 0.24.0
termcolor 1.1.0 terminado 0.9.4 testpath 0.4.4
tf-estimator-nightly 2.8.0.dev2021122109 thinc 8.0.15 threadpoolctl 2.1.0
tokenizers 0.12.1 torch 1.10.2+cpu torchvision 0.11.3+cpu
tornado 6.1 tqdm 4.59.0 traitlets 5.0.5
transformers 4.17.0 typer 0.4.1 typing-extensions 3.7.4.3
ujson 4.0.2 unattended-upgrades 0.1 urllib3 1.25.11
virtualenv 20.4.1 visions 0.7.4 wasabi 0.9.1
wcwidth 0.2.5 webencodings 0.5.1 websocket-client 0.57.0
Werkzeug 1.0.1 wheel 0.36.2 widgetsnbextension 3.5.1
wrapt 1.12.1 xgboost 1.5.2 zipp 3.4.1

Python libraries on GPU clusters

Library Version Library Version Library Version
absl-py 0.11.0 Antergos Linux 2015.10 (ISO-Rolling) appdirs 1.4.4
argon2-cffi 20.1.0 astor 0.8.1 astunparse 1.6.3
async-generator 1.10 attrs 20.3.0 backcall 0.2.0
bcrypt 3.2.0 bidict 0.21.4 bleach 3.3.0
blis 0.7.7 boto3 1.16.7 botocore 1.19.7
cachetools 4.2.4 catalogue 2.0.7 certifi 2020.12.5
cffi 1.14.5 chardet 4.0.0 click 7.1.2
cloudpickle 1.6.0 cmdstanpy 0.9.68 configparser 5.0.1
convertdate 2.4.0 cryptography 3.4.7 cycler 0.10.0
cymem 2.0.6 Cython 0.29.23 databricks-automl-runtime 0.2.7
databricks-cli 0.16.4 dbl-tempo 0.1.2 dbus-python 1.2.16
decorator 5.0.6 defusedxml 0.7.1 dill 0.3.2
diskcache 5.4.0 distlib 0.3.4 distro-info 0.23ubuntu1
entrypoints 0.3 ephem 4.1.3 facets-overview 1.0.0
fasttext 0.9.2 filelock 3.0.12 Flask 1.1.2
flatbuffers 2.0 fsspec 0.9.0 future 0.18.2
gast 0.4.0 gitdb 4.0.9 GitPython 3.1.12
google-auth 1.22.1 google-auth-oauthlib 0.4.2 google-pasta 0.2.0
grpcio 1.39.0 gunicorn 20.0.4 gviz-api 1.10.0
h5py 3.1.0 hijri-converter 2.2.3 holidays 0.13
horovod 0.23.0 htmlmin 0.1.12 huggingface-hub 0.5.1
idna 2.10 ImageHash 4.2.1 imbalanced-learn 0.8.1
importlib-metadata 3.10.0 ipykernel 5.3.4 ipython 7.22.0
ipython-genutils 0.2.0 ipywidgets 7.6.3 isodate 0.6.0
itsdangerous 1.1.0 jedi 0.17.2 Jinja2 2.11.3
jmespath 0.10.0 joblib 1.0.1 joblibspark 0.3.0
jsonschema 3.2.0 jupyter-client 6.1.12 jupyter-core 4.7.1
jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.0 keras 2.8.0
Keras-Preprocessing 1.1.2 kiwisolver 1.3.1 koalas 1.8.2
korean-lunar-calendar 0.2.1 langcodes 3.3.0 libclang 13.0.0
lightgbm 3.3.2 llvmlite 0.38.0 LunarCalendar 0.0.9
Mako 1.1.3 Markdown 3.3.3 MarkupSafe 2.0.1
matplotlib 3.4.2 missingno 0.5.1 mistune 0.8.4
mleap 0.18.1 mlflow-skinny 1.24.0 multimethod 1.8
murmurhash 1.0.6 nbclient 0.5.3 nbconvert 6.0.7
nbformat 5.1.3 nest-asyncio 1.5.1 networkx 2.5
nltk 3.6.1 notebook 6.3.0 numba 0.55.1
numpy 1.20.1 oauthlib 3.1.0 opt-einsum 3.3.0
packaging 21.3 pandas 1.2.4 pandas-profiling 3.1.0
pandocfilters 1.4.3 paramiko 2.7.2 parso 0.7.0
pathy 0.6.1 patsy 0.5.1 petastorm 0.11.4
pexpect 4.8.0 phik 0.12.2 pickleshare 0.7.5
Pillow 8.2.0 pip 21.0.1 plotly 5.6.0
pmdarima 1.8.5 preshed 3.0.6 prompt-toolkit 3.0.17
prophet 1.0.1 protobuf 3.17.2 psutil 5.8.0
psycopg2 2.8.5 ptyprocess 0.7.0 pyarrow 4.0.0
pyasn1 0.4.8 pyasn1-modules 0.2.8 pybind11 2.9.2
pycparser 2.20 pydantic 1.8.2 Pygments 2.8.1
PyGObject 3.36.0 PyMeeus 0.5.11 PyNaCl 1.5.0
pyodbc 4.0.30 pyparsing 2.4.7 pyrsistent 0.17.3
pystan 2.19.1.1 python-apt 2.0.0+ubuntu0.20.4.7 python-dateutil 2.8.1
python-editor 1.0.4 python-engineio 4.3.0 python-socketio 5.4.1
pytz 2020.5 PyWavelets 1.1.1 PyYAML 5.4.1
pyzmq 20.0.0 regex 2021.4.4 requests 2.25.1
requests-oauthlib 1.3.0 requests-unixsocket 0.2.0 rsa 4.8
s3transfer 0.3.7 sacremoses 0.0.49 scikit-learn 0.24.1
scipy 1.6.2 seaborn 0.11.1 Send2Trash 1.5.0
setuptools 52.0.0 setuptools-git 1.2 shap 0.40.0
simplejson 3.17.2 six 1.15.0 slicer 0.0.7
smart-open 5.2.1 smmap 3.0.5 spacy 3.2.3
spacy-legacy 3.0.9 spacy-loggers 1.0.2 spark-tensorflow-distributor 1.0.0
sqlparse 0.4.1 srsly 2.4.3 ssh-import-id 5.10
statsmodels 0.12.2 tabulate 0.8.7 tangled-up-in-unicode 0.1.0
tenacity 6.2.0 tensorboard 2.8.0 tensorboard-data-server 0.6.1
tensorboard-plugin-profile 2.5.0 tensorboard-plugin-wit 1.8.1 tensorflow 2.8.0
tensorflow-estimator 2.8.0 tensorflow-io-gcs-filesystem 0.24.0 termcolor 1.1.0
terminado 0.9.4 testpath 0.4.4 tf-estimator-nightly 2.8.0.dev2021122109
thinc 8.0.15 threadpoolctl 2.1.0 tokenizers 0.12.1
torch 1.10.2+cu113 torchvision 0.11.3+cu113 tornado 6.1
tqdm 4.59.0 traitlets 5.0.5 transformers 4.17.0
typer 0.4.1 typing-extensions 3.7.4.3 ujson 4.0.2
unattended-upgrades 0.1 urllib3 1.25.11 virtualenv 20.4.1
visions 0.7.4 wasabi 0.9.1 wcwidth 0.2.5
webencodings 0.5.1 websocket-client 0.57.0 Werkzeug 1.0.1
wheel 0.36.2 widgetsnbextension 3.5.1 wrapt 1.12.1
xgboost 1.5.2 zipp 3.4.1

Spark packages containing Python modules

Spark Package Python Module Version
graphframes graphframes 0.8.2-db1-spark3.2

R libraries

The R libraries are identical to the R Libraries in Databricks Runtime 10.5.

Java and Scala libraries (Scala 2.12 cluster)

In addition to Java and Scala libraries in Databricks Runtime 10.5, Databricks Runtime 10.5 ML contains the following JARs:

CPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.combust.mleap mleap-databricks-runtime_2.12 0.18.1-23eb1ef
ml.dmlc xgboost4j-spark_2.12 1.5.2
ml.dmlc xgboost4j_2.12 1.5.2
org.graphframes graphframes_2.12 0.8.2-db1-spark3.2
org.mlflow mlflow-client 1.24.0
org.mlflow mlflow-spark 1.24.0
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0

GPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.combust.mleap mleap-databricks-runtime_2.12 0.18.1-23eb1ef
ml.dmlc xgboost4j-spark_2.12 1.5.2
ml.dmlc xgboost4j_2.12 1.5.2
org.graphframes graphframes_2.12 0.8.2-db1-spark3.2
org.mlflow mlflow-client 1.24.0
org.mlflow mlflow-spark 1.24.0
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0