Databricks Runtime for Machine Learning

Databricks Runtime for Machine Learning (Databricks Runtime ML) automates the creation of a cluster optimized for machine learning. Databricks Runtime ML clusters include the most popular machine learning libraries, such as TensorFlow, PyTorch, Keras, and XGBoost, as well as libraries required for distributed training, such as Horovod. Using Databricks Runtime ML speeds up cluster creation and ensures that the installed library versions are compatible.

For complete information about using Azure Databricks for machine learning and deep learning, see Machine learning and deep learning.

For information about the contents of each Databricks Runtime ML version, see the release notes.

Databricks Runtime ML is built on Databricks Runtime. For example, Databricks Runtime 7.3 LTS ML is built on Databricks Runtime 7.3 LTS. The libraries included in the base Databricks Runtime are listed in the Databricks Runtime release notes.

Introduction to Databricks Runtime for Machine Learning

This tutorial is designed for new users of Databricks Runtime ML. It takes about 10 minutes to work through and shows a complete end-to-end example of loading tabular data, training a model, distributed hyperparameter tuning, and model inference. It also illustrates how to use the MLflow API and the MLflow Model Registry.

Databricks tutorial notebook

Libraries included in Databricks Runtime ML

Note

Library utilities are not available in Databricks Runtime ML.

Databricks Runtime ML includes a variety of popular ML libraries. The libraries are updated with each release to include new features and fixes.

Azure Databricks has designated a subset of the supported libraries as top-tier libraries. For these libraries, Azure Databricks provides a faster update cadence, updating to the latest package releases with each runtime release (barring dependency conflicts). Azure Databricks also provides advanced support, testing, and embedded optimizations for top-tier libraries.

For a full list of top-tier and other provided libraries, see the following articles for each available runtime:

How to use Databricks Runtime ML

In addition to the pre-installed libraries, Databricks Runtime ML differs from Databricks Runtime in the cluster configuration and in how you manage Python packages.

Create a cluster using Databricks Runtime ML

When you create a cluster, select a Databricks Runtime ML version from the Databricks Runtime Version drop-down. Both CPU and GPU-enabled ML runtimes are available.

If you select a GPU-enabled ML runtime, you are prompted to select a compatible Driver Type and Worker Type. Incompatible instance types are grayed out in the drop-downs. GPU-enabled instance types are listed under the GPU-Accelerated label.
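The same choice between a CPU and a GPU-enabled ML runtime can also be expressed as a cluster specification, for example when clusters are created through the Clusters REST API rather than the UI. The following is a minimal sketch that only builds the request body and sends nothing. The runtime version keys and the instance types are assumptions chosen for illustration, and the helper name `build_cluster_spec` is hypothetical; check the values offered in your own workspace before using them.

```python
import json

# Assumed runtime version keys for Databricks Runtime 7.3 LTS ML.
# Confirm them against the versions listed in your workspace.
CPU_ML_RUNTIME = "7.3.x-cpu-ml-scala2.12"
GPU_ML_RUNTIME = "7.3.x-gpu-ml-scala2.12"


def build_cluster_spec(name, use_gpu=False, num_workers=2):
    """Return a request body describing a Databricks Runtime ML cluster.

    Hypothetical helper: it only assembles a dictionary and makes no API call.
    """
    # Assumed instance types. A GPU-enabled runtime needs GPU-enabled
    # driver and worker types, mirroring what the UI enforces.
    node_type = "Standard_NC6s_v3" if use_gpu else "Standard_DS3_v2"
    return {
        "cluster_name": name,
        "spark_version": GPU_ML_RUNTIME if use_gpu else CPU_ML_RUNTIME,
        "node_type_id": node_type,
        "driver_node_type_id": node_type,
        "num_workers": num_workers,
    }


if __name__ == "__main__":
    # Print the body that would be posted to the cluster-creation endpoint.
    print(json.dumps(build_cluster_spec("ml-gpu-demo", use_gpu=True), indent=2))
```

Keeping the driver and worker types in one place makes it harder to pair a GPU-enabled runtime with an incompatible instance type.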

Warning

Libraries in your workspace that automatically install into all clusters can conflict with the libraries included in Databricks Runtime ML. Before you create a cluster with Databricks Runtime ML, clear the Install automatically on all clusters checkbox for conflicting libraries.

Manage Python packages

Databricks Runtime ML uses the Conda package manager to install Python packages. All Python packages are installed inside a single environment: /databricks/python2 on clusters using Python 2 and /databricks/python3 on clusters using Python 3. Switching (or activating) Conda environments is not supported.

For information on managing Python libraries, see Libraries.
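To confirm which environment a notebook or job is running in, you can compare the interpreter's prefix with the environment path given above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes that `sys.prefix` resolves to the same location as /databricks/python2 or /databricks/python3; both paths are resolved through symbolic links before comparison because the directory may be a link on some runtime versions.

```python
import os
import sys


def expected_env_path(major_version):
    """Return the single environment path for the given Python major version."""
    return "/databricks/python{}".format(major_version)


def in_expected_env():
    """Report whether the running interpreter lives in the runtime's environment.

    Hypothetical helper: both paths are resolved through symlinks first,
    so a linked environment directory still compares equal.
    """
    expected = os.path.realpath(expected_env_path(sys.version_info.major))
    actual = os.path.realpath(sys.prefix)
    return actual == expected


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Prefix:", sys.prefix)
    print("In runtime environment:", in_expected_env())
```

Outside a Databricks cluster the check simply reports that the interpreter is not in the runtime's environment.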

AutoML support

Databricks Runtime ML includes tools to automate the model development process and help you efficiently find the best performing model.

  • Managed MLflow manages the end-to-end model lifecycle, including tracking experiment runs, deploying and sharing models, and maintaining a centralized model registry.
  • Hyperopt, augmented with the SparkTrials class, automates and distributes ML model parameter tuning.
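
As an illustration of the Hyperopt item above, the following is a minimal sketch of distributed tuning with `SparkTrials`. The objective is a toy function standing in for a model's validation loss, and the function names `objective` and `tune`, the search range, and the parallelism value are assumptions made for this example. The Hyperopt imports sit inside `tune` so that the objective can be checked on its own outside a cluster; `tune` itself needs Hyperopt and an active Spark session, as on a Databricks Runtime ML cluster.

```python
def objective(params):
    """Toy loss with its minimum at x = 3; stands in for a validation loss."""
    x = params["x"]
    # "ok" is the value of hyperopt.STATUS_OK, written out so that this
    # function has no dependency on Hyperopt.
    return {"loss": (x - 3.0) ** 2, "status": "ok"}


def tune(max_evals=32, parallelism=4):
    """Search for the best x, distributing the trials across Spark workers."""
    # Imported here so that objective() can be exercised without a cluster.
    from hyperopt import SparkTrials, fmin, hp, tpe

    # Assumed search range for the single hyperparameter.
    search_space = {"x": hp.uniform("x", -10.0, 10.0)}

    # SparkTrials sends each trial to a Spark worker; parallelism caps how
    # many trials are evaluated at the same time.
    trials = SparkTrials(parallelism=parallelism)

    return fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )
```

On a cluster, `best = tune()` returns a dictionary holding the best value found for `x`. Higher parallelism shortens wall-clock time, but it gives the search algorithm fewer completed results to learn from before it proposes new trials.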