Databricks Runtime 5.4 ML (Unsupported)

Databricks released this image in June 2019.

Databricks Runtime 5.4 ML provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 5.4 (Unsupported). Databricks Runtime for ML contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features

Databricks Runtime 5.4 ML is built on top of Databricks Runtime 5.4. For information on what's new in Databricks Runtime 5.4, see the Databricks Runtime 5.4 (Unsupported) release notes.

In addition to library updates, Databricks Runtime 5.4 ML introduces the following new features:

Distributed Hyperopt + automated MLflow tracking

Databricks Runtime 5.4 ML introduces a new implementation of Hyperopt, powered by Apache Spark, to scale and simplify hyperparameter tuning. A new Trials class, SparkTrials, distributes Hyperopt trial runs among multiple machines and nodes using Apache Spark. In addition, all tuning experiments, along with the tuned hyperparameters and targeted metrics, are automatically logged to MLflow runs. See Distributed Hyperopt and automated MLflow tracking.

Important

This feature is in Public Preview.
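
As a minimal sketch of how SparkTrials might be plugged into a Hyperopt run, the following assumes a notebook attached to a Databricks Runtime 5.4 ML cluster. The objective function, the search space, and the `max_evals` and `parallelism` values are illustrative assumptions and do not come from these release notes.

```python
def objective(x):
    """Toy loss to minimize (hypothetical): a parabola with its minimum at x = 3."""
    return (x - 3.0) ** 2


def tune(max_evals=20, parallelism=4):
    """Search for the x that minimizes objective(), running trials on Spark workers."""
    # Imported here so the module can be loaded where Hyperopt or Spark is unavailable.
    from hyperopt import SparkTrials, fmin, hp, tpe

    # Illustrative search space: a single uniform hyperparameter named "x".
    space = hp.uniform("x", -10.0, 10.0)

    # SparkTrials takes the place of the default Trials object and sends
    # each trial to a Spark worker; parallelism caps the concurrent trials.
    spark_trials = SparkTrials(parallelism=parallelism)

    # fmin returns a dict holding the best hyperparameter values it found.
    return fmin(fn=objective, space=space, algo=tpe.suggest,
                max_evals=max_evals, trials=spark_trials)
```

Calling `tune()` in a notebook cell would start the search. Per the description above, the trials should then be recorded as MLflow runs without any explicit logging calls in the code.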

Apache Spark MLlib + automated MLflow tracking

Databricks Runtime 5.4 ML supports automatic logging of MLflow runs for models fit using the PySpark tuning algorithms CrossValidator and TrainValidationSplit. See Apache Spark MLlib and automated MLflow tracking. This feature is on by default in Databricks Runtime 5.4 ML but was off by default in Databricks Runtime 5.3 ML.

Important

This feature is in Public Preview.
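
The kind of tuning job this tracking applies to can be sketched as follows. This is an illustration under assumptions: the estimator (logistic regression), the grid values, the column names `features` and `label`, and the fold count are all hypothetical choices. The code contains no MLflow calls, because the logging described above is automatic on this runtime.

```python
# Illustrative hyperparameter values; these are assumptions, not recommendations.
REG_PARAMS = [0.01, 0.1, 1.0]
ELASTIC_NET_PARAMS = [0.0, 0.5]


def build_cross_validator(num_folds=3):
    """Return an unfitted CrossValidator over a small logistic regression grid."""
    # Imported here so the module can be loaded where PySpark is unavailable.
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    lr = LogisticRegression(featuresCol="features", labelCol="label")

    # 3 x 2 = 6 candidate parameter combinations.
    grid = (ParamGridBuilder()
            .addGrid(lr.regParam, REG_PARAMS)
            .addGrid(lr.elasticNetParam, ELASTIC_NET_PARAMS)
            .build())

    return CrossValidator(estimator=lr, estimatorParamMaps=grid,
                          evaluator=BinaryClassificationEvaluator(labelCol="label"),
                          numFolds=num_folds)
```

In a notebook, `build_cross_validator().fit(train_df)` would run the search, where `train_df` is a hypothetical DataFrame with `features` and `label` columns. TrainValidationSplit accepts the same estimator, grid, and evaluator arguments and could be substituted.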

HorovodRunner improvement

Output sent from Horovod to the Spark driver node is now visible in notebook cells.
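
For context, a HorovodRunner job that produces output might look like the sketch below. It assumes the `sparkdl` and `horovod` packages listed later in these notes; the message format, the function names, and the process count are hypothetical. Whether a particular message reaches the notebook cell depends on how HorovodRunner forwards output, so the sketch only shows where such output would originate.

```python
def format_rank_message(rank, size):
    """Build the line each Horovod process prints (format is illustrative)."""
    return "Horovod process %d of %d is running" % (rank, size)


def train():
    """Training entry point executed by every Horovod process."""
    # Imported inside the function because it runs on the workers.
    import horovod.tensorflow as hvd

    hvd.init()
    # Real model training would go here; this sketch only reports its rank.
    print(format_rank_message(hvd.rank(), hvd.size()))


def launch(num_processes=2):
    """Start train() under HorovodRunner with the given number of processes."""
    from sparkdl import HorovodRunner

    runner = HorovodRunner(np=num_processes)
    runner.run(train)
```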

XGBoost Python package update

XGBoost Python package 0.80 is installed.

System environment

The system environment in Databricks Runtime 5.4 ML differs from Databricks Runtime 5.4 as follows:

  • Python: 2.7.15 for Python 2 clusters and 3.6.5 for Python 3 clusters.
  • DBUtils: Databricks Runtime 5.4 ML does not contain Library utilities.
  • For GPU clusters, the following NVIDIA GPU libraries:
    • Tesla driver 396.44
    • CUDA 9.2
    • CUDNN 7.2.1
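
One way to confirm that a cluster runs the documented interpreter is to compare `sys.version_info` against the versions listed above. The helper below is a hypothetical sketch that uses only the standard library; its name and structure are illustrative.

```python
import sys

# Python versions stated in these release notes, keyed by major version.
EXPECTED_PYTHON = {2: (2, 7, 15), 3: (3, 6, 5)}


def matches_runtime_python(version_info=None):
    """Return True if the interpreter version equals the documented one."""
    if version_info is None:
        version_info = sys.version_info
    major = version_info[0]
    # Compare only (major, minor, micro); ignore release level and serial.
    return tuple(version_info[:3]) == EXPECTED_PYTHON.get(major)
```

Running `matches_runtime_python()` with no arguments checks the current interpreter.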

Libraries

The following sections list the libraries included in Databricks Runtime 5.4 ML that differ from those included in Databricks Runtime 5.4.

Top-tier libraries

Databricks Runtime 5.4 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 5.4 ML uses Conda for Python package management. As a result, there are major differences in installed Python libraries compared to Databricks Runtime. The following is a full list of the provided Python packages and versions installed using the Conda package manager.

Library Version    Library Version    Library Version
absl-py 0.7.1    argparse 1.4.0    asn1crypto 0.24.0
astor 0.7.1    backports-abc 0.5    backports.functools-lru-cache 1.5
backports.weakref 1.0.post1    bcrypt 3.1.6    bleach 2.1.3
boto 2.48.0    boto3 1.7.62    botocore 1.10.62
certifi 2018.04.16    cffi 1.11.5    chardet 3.0.4
cloudpickle 0.5.3    colorama 0.3.9    configparser 3.5.0
cryptography 2.2.2    cycler 0.10.0    Cython 0.28.2
decorator 4.3.0    docutils 0.14    entrypoints 0.2.3
enum34 1.1.6    et-xmlfile 1.0.1    funcsigs 1.0.2
functools32 3.2.3-2    fusepy 2.0.4    future 0.17.1
futures 3.2.0    gast 0.2.2    grpcio 1.12.1
h5py 2.8.0    horovod 0.16.0    html5lib 1.0.1
hyperopt 0.1.2.db4    idna 2.6    ipaddress 1.0.22
ipython 5.7.0    ipython_genutils 0.2.0    jdcal 1.4
Jinja2 2.10    jmespath 0.9.4    jsonschema 2.6.0
jupyter-client 5.2.3    jupyter-core 4.4.0    Keras 2.2.4
Keras-Applications 1.0.7    Keras-Preprocessing 1.0.9    kiwisolver 1.1.0
linecache2 1.0.0    llvmlite 0.23.1    lxml 4.2.1
Markdown 3.1.1    MarkupSafe 1.0    matplotlib 2.2.2
mistune 0.8.3    mkl-fft 1.0.0    mkl-random 1.0.1
mleap 0.8.1    mock 2.0.0    msgpack 0.5.6
nbconvert 5.3.1    nbformat 4.4.0    networkx 2.2
nose 1.3.7    nose-exclude 0.5.0    numba 0.38.0+0.g2a2b772fc.dirty
numpy 1.14.3    olefile 0.45.1    openpyxl 2.5.3
pandas 0.23.0    pandocfilters 1.4.2    paramiko 2.4.1
pathlib2 2.3.2    patsy 0.5.0    pbr 5.1.3
pexpect 4.5.0    pickleshare 0.7.4    Pillow 5.1.0
pip 10.0.1    ply 3.11    prompt-toolkit 1.0.15
protobuf 3.7.1    psutil 5.6.2    psycopg2 2.7.5
ptyprocess 0.5.2    pyarrow 0.12.1    pyasn1 0.4.5
pycparser 2.18    Pygments 2.2.0    pymongo 3.8.0
PyNaCl 1.3.0    pyOpenSSL 18.0.0    pyparsing 2.2.0
PySocks 1.6.8    Python 2.7.15    python-dateutil 2.7.3
pytz 2018.4    PyYAML 5.1    pyzmq 17.0.0
requests 2.18.4    s3transfer 0.1.13    scandir 1.7
scikit-learn 0.19.1    scipy 1.1.0    seaborn 0.8.1
setuptools 39.1.0    simplegeneric 0.8.1    singledispatch 3.4.0.3
six 1.11.0    statsmodels 0.9.0    subprocess32 3.5.4
tensorboard 1.12.2    tensorboardX 1.6    tensorflow 1.12.0
termcolor 1.1.0    testpath 0.3.1    torch 0.4.1
torchvision 0.2.1    tornado 5.0.2    tqdm 4.32.1
traceback2 1.4.0    traitlets 4.3.2    unittest2 1.1.0
urllib3 1.22    virtualenv 16.0.0    wcwidth 0.1.7
webencodings 0.5.1    Werkzeug 0.14.1    wheel 0.31.1
wrapt 1.10.11    wsgiref 0.1.2
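
To compare an environment against the versions in this table, a small helper can look up each installed distribution and report any differences. The sketch below is hypothetical: the function names are illustrative, and `EXPECTED` holds only three pins copied from the table. It tries `importlib.metadata` (standard library from Python 3.8) and falls back to `pkg_resources` from setuptools, which is the route available on the interpreters in this runtime.

```python
def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        # Standard library on Python 3.8 and later.
        from importlib import metadata
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for older interpreters; pkg_resources ships with setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, version in expected.items():
        found = lookup(name)
        if found != version:
            mismatches[name] = (version, found)
    return mismatches


# A few pins copied from the table above.
EXPECTED = {"numpy": "1.14.3", "pandas": "0.23.0", "tensorflow": "1.12.0"}
```

`find_mismatches(EXPECTED)` returns an empty dict when every listed package matches its pinned version. The `lookup` parameter exists so the comparison can be exercised without touching the real environment.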

In addition, the following Spark packages include Python modules:

Spark Package    Python Module    Version
graphframes    graphframes    0.7.0-db1-spark2.4
spark-deep-learning    sparkdl    1.5.0-db3-spark2.4
tensorframes    tensorframes    0.6.0-s_2.11

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 5.4.

Java and Scala libraries (Scala 2.11 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 5.4, Databricks Runtime 5.4 ML contains the following JARs:

Group ID    Artifact ID    Version
com.databricks    spark-deep-learning    1.5.0-db3-spark2.4
com.typesafe.akka    akka-actor_2.11    2.3.11
ml.combust.mleap    mleap-databricks-runtime_2.11    0.13.0
ml.dmlc    xgboost4j    0.81
ml.dmlc    xgboost4j-spark    0.81
org.graphframes    graphframes_2.11    0.7.0-db1-spark2.4
org.tensorflow    libtensorflow    1.12.0
org.tensorflow    libtensorflow_jni    1.12.0
org.tensorflow    spark-tensorflow-connector_2.11    1.12.0
org.tensorflow    tensorflow    1.12.0
org.tensorframes    tensorframes    0.6.0-s_2.11