Databricks Runtime 5.1 ML (Beta)

Databricks released this image in December 2018.

Databricks Runtime 5.1 ML provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 5.1 (Unsupported). Databricks Runtimes for ML contain many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed TensorFlow training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features

Databricks Runtime 5.1 ML is built on top of Databricks Runtime 5.1. For information on what's new in Databricks Runtime 5.1, see the Databricks Runtime 5.1 (Unsupported) release notes. In addition to the updates to existing libraries listed in the Libraries section, Databricks Runtime 5.1 ML includes the following new features:

  • PyTorch for building deep learning networks.

Note

Databricks Runtime ML releases pick up all maintenance updates to the base Databricks Runtime release. For a list of all maintenance updates, see Databricks runtime maintenance updates.

System environment

The system environment in Databricks Runtime 5.1 ML differs from that in Databricks Runtime 5.1 as follows:

  • Python: 2.7.15 for Python 2 clusters and 3.6.5 for Python 3 clusters.
  • DBUtils: Databricks Runtime 5.1 ML does not contain Library utilities.
  • For GPU clusters, the following NVIDIA GPU libraries:
    • Tesla driver 396.44
    • CUDA 9.2
    • CUDNN 7.2.1
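
The Python versions listed above can be checked from a notebook cell. The following is a minimal sketch, not an official Databricks utility: the names `EXPECTED_PYTHON` and `python_matches_release_notes` are hypothetical, and the code assumes only the standard library `sys` module.

```python
import sys

# Interpreter versions quoted in the system environment list above,
# keyed by major version (Python 2 clusters vs. Python 3 clusters).
EXPECTED_PYTHON = {2: (2, 7, 15), 3: (3, 6, 5)}


def python_matches_release_notes(version_info=None):
    """Return True if the interpreter version equals the documented one.

    version_info defaults to the running interpreter; pass a tuple such as
    (3, 6, 5) to check a version recorded elsewhere.
    """
    if version_info is None:
        version_info = sys.version_info
    info = tuple(version_info[:3])
    return EXPECTED_PYTHON.get(info[0]) == info


if __name__ == "__main__":
    # Print the running version and whether it matches the release notes.
    print(sys.version.split()[0], python_matches_release_notes())
```

A result of `False` only means the interpreter differs from the versions stated in these release notes, for example after a maintenance update.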

Libraries

This section lists the differences between the libraries included in Databricks Runtime 5.1 and those included in Databricks Runtime 5.1 ML.

Python libraries

Databricks Runtime 5.1 ML uses Conda for Python package management. As a result, there are major changes in pre-installed Python libraries compared to Databricks Runtime. The following is the full list of provided Python packages and versions installed using the Conda package manager.

Library Version   Library Version   Library Version
absl-py 0.6.1   argparse 1.4.0   asn1crypto 0.24.0
astor 0.7.1   backports-abc 0.5   backports.functools-lru-cache 1.5
backports.weakref 1.0.post1   bcrypt 3.1.4   bleach 2.1.3
boto 2.48.0   boto3 1.7.62   botocore 1.10.62
certifi 2018.04.16   cffi 1.11.5   chardet 3.0.4
cloudpickle 0.5.3   colorama 0.3.9   configparser 3.5.0
cryptography 2.2.2   cycler 0.10.0   Cython 0.28.2
decorator 4.3.0   docutils 0.14   entrypoints 0.2.3
enum34 1.1.6   et-xmlfile 1.0.1   funcsigs 1.0.2
functools32 3.2.3-2   fusepy 2.0.4   futures 3.2.0
gast 0.2.0   grpcio 1.12.1   h5py 2.8.0
horovod 0.15.0   html5lib 1.0.1   idna 2.6
ipaddress 1.0.22   ipython 5.7.0   ipython_genutils 0.2.0
jdcal 1.4   Jinja2 2.10   jmespath 0.9.3
jsonschema 2.6.0   jupyter-client 5.2.3   jupyter-core 4.4.0
Keras 2.2.4   Keras-Applications 1.0.6   Keras-Preprocessing 1.0.5
kiwisolver 1.0.1   linecache2 1.0.0   llvmlite 0.23.1
lxml 4.2.1   Markdown 3.0.1   MarkupSafe 1.0
matplotlib 2.2.2   mistune 0.8.3   mleap 0.8.1
mock 2.0.0   msgpack 0.5.6   nbconvert 5.3.1
nbformat 4.4.0   nose 1.3.7   nose-exclude 0.5.0
numba 0.38.0+0.g2a2b772fc.dirty   numpy 1.14.3   olefile 0.45.1
openpyxl 2.5.3   pandas 0.23.0   pandocfilters 1.4.2
paramiko 2.4.1   pathlib2 2.3.2   patsy 0.5.0
pbr 5.1.1   pexpect 4.5.0   pickleshare 0.7.4
Pillow 5.1.0   pip 10.0.1   ply 3.11
prompt-toolkit 1.0.15   protobuf 3.6.1   psycopg2 2.7.5
ptyprocess 0.5.2   pyarrow 0.8.0   pyasn1 0.4.4
pycparser 2.18   Pygments 2.2.0   PyNaCl 1.3.0
pyOpenSSL 18.0.0   pyparsing 2.2.0   PySocks 1.6.8
Python 2.7.15   python-dateutil 2.7.3   pytz 2018.4
PyYAML 3.12   pyzmq 17.0.0   requests 2.18.4
s3transfer 0.1.13   scandir 1.7   scikit-learn 0.19.1
scipy 1.1.0   seaborn 0.8.1   setuptools 39.1.0
simplegeneric 0.8.1   singledispatch 3.4.0.3   six 1.11.0
statsmodels 0.9.0   subprocess32 3.5.3   tensorboard 1.12.0
tensorboardX 1.4   tensorflow 1.12.0   termcolor 1.1.0
testpath 0.3.1   torch 0.4.1   torchvision 0.2.1
tornado 5.0.2   traceback2 1.4.0   traitlets 4.3.2
unittest2 1.1.0   urllib3 1.22   virtualenv 16.0.0
wcwidth 0.1.7   webencodings 0.5.1   Werkzeug 0.14.1
wheel 0.31.1   wrapt 1.10.11   wsgiref 0.1.2
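
To confirm that a cluster actually carries the versions listed above, the installed distributions can be compared against the table. The sketch below is illustrative only: `EXPECTED`, `installed_version`, and `find_mismatches` are hypothetical names, and `EXPECTED` holds just a few rows copied from the table. It assumes either `importlib.metadata` (Python 3.8 and later) or the `pkg_resources` module from setuptools is available for the lookup.

```python
# A handful of pins copied from the table above; extend as needed.
EXPECTED = {
    "numpy": "1.14.3",
    "pandas": "0.23.0",
    "scikit-learn": "0.19.1",
    "tensorflow": "1.12.0",
    "torch": "0.4.1",
    "horovod": "0.15.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    # Fallback for older interpreters, where setuptools provides pkg_resources.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, want in expected.items():
        got = lookup(name)
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    for name, (want, got) in sorted(find_mismatches(EXPECTED).items()):
        print("{}: expected {}, found {}".format(name, want, got))
```

The `lookup` parameter is injectable so the comparison can be run against a recorded package list (for example, a dictionary's `get` method) instead of the live environment.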

In addition, the following Spark packages include Python modules:

Spark Package         Python Module   Version
tensorframes          tensorframes    0.6.0-s_2.11
graphframes           graphframes     0.6.0-db3-spark2.4
spark-deep-learning   sparkdl         1.4.0-db2-spark2.4

R libraries

The R libraries are identical to the R libraries on Databricks Runtime 5.1.

Java and Scala libraries (Scala 2.11 cluster)

In addition to Java and Scala libraries in Databricks Runtime 5.1, Databricks Runtime 5.1 ML contains the following JARs:

Group ID           Artifact ID                       Version
com.databricks     spark-deep-learning               1.4.0-db2-spark2.4
org.tensorframes   tensorframes                      0.6.0-s_2.11
org.graphframes    graphframes_2.11                  0.6.0-db3-spark2.4
org.tensorflow     libtensorflow                     1.12.0
org.tensorflow     libtensorflow_jni                 1.12.0
org.tensorflow     spark-tensorflow-connector_2.11   1.12.0
org.tensorflow     tensorflow                        1.12.0
ml.dmlc            xgboost4j                         0.81
ml.dmlc            xgboost4j-spark                   0.81
ml.combust.mleap   mleap-databricks-runtime_2.11     0.13.0