May 2020

These features and Azure Databricks platform improvements were released in May 2020.

Note

Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.

Easv4-series VMs (Beta)

May 29, 2020

Azure Databricks now provides Beta support for Easv4-series VMs, which use a premium SSD and can achieve a boosted maximum frequency of 3.35 GHz. These instance types can optimize your workload performance for memory-intensive enterprise applications.

Databricks Runtime 6.6 for Genomics GA

May 26, 2020

Databricks Runtime 6.6 for Genomics is built on top of Databricks Runtime 6.6 and includes the following new features:

  • GFF3 reader
  • Custom reference genome support
  • Per-sample pipeline timeouts
  • BAM export option
  • Manifest blobs

For more information, see the complete Databricks Runtime 6.6 for Genomics release notes.

Databricks Runtime 6.6 ML GA

May 26, 2020

Databricks Runtime 6.6 ML is built on top of Databricks Runtime 6.6 and includes the following new features:

  • Upgraded mlflow from 1.7.0 to 1.8.0

For more information, see the complete Databricks Runtime 6.6 ML release notes.

Databricks Runtime 6.6 GA

May 26, 2020

Databricks Runtime 6.6 brings many library upgrades and new features, including the following Delta Lake features:

  • You can now evolve the schema of a table automatically with the merge operation. This is useful in scenarios where you want to upsert change data into a table and the schema of the data changes over time. Instead of detecting and applying schema changes before upserting, merge can simultaneously evolve the schema and upsert the changes. See Automatic schema evolution.
  • The performance of merge operations that have only matched clauses (that is, only update and delete actions and no insert action) has been improved.
  • Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA.
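Taken together, these features can be sketched in Spark SQL as follows. The table names events and updates, the join key, and the session-level schema evolution setting are illustrative assumptions, not part of the release notes:

```sql
-- Convert a Parquet table registered in the Hive metastore to Delta Lake
-- by its table identifier (table name is hypothetical).
CONVERT TO DELTA events;

-- Opt in to automatic schema evolution for merge in this session.
SET spark.databricks.delta.schema.autoMerge.enabled = true;

-- Upsert change data; new columns appearing in `updates` are added
-- to the schema of `events` as part of the merge.
MERGE INTO events AS t
USING updates AS s
ON t.eventId = s.eventId
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```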

For more information, see the complete Databricks Runtime 6.6 release notes.

DBFS REST API delete endpoint size limit

May 21-28, 2020: Version 3.20

When you delete a large number of files recursively using the DBFS API, the delete operation is done in increments. The call returns a response after approximately 45 seconds with an error message asking you to re-invoke the delete operation until the directory structure is fully deleted. For example:

{
  "error_code": "PARTIAL_DELETE",
  "message": "The requested operation has deleted 324 files. There are more files remaining. You must make another request to delete more."
}
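A caller is therefore expected to loop until the PARTIAL_DELETE error stops appearing. A minimal client-side sketch of that loop follows; the delete_call parameter is a hypothetical stand-in for the actual HTTP request to the DBFS delete endpoint:

```python
# Sketch of a client-side retry loop for incremental DBFS deletes.
# `delete_call` is any callable that performs one delete request and
# returns the parsed JSON response (an empty/None response means success).

def delete_until_done(delete_call, max_attempts=100):
    """Re-invoke the delete operation until no PARTIAL_DELETE error remains."""
    for attempt in range(1, max_attempts + 1):
        response = delete_call()
        if not response or response.get("error_code") != "PARTIAL_DELETE":
            return attempt  # directory structure fully deleted
    raise RuntimeError("directory not fully deleted after %d attempts" % max_attempts)

# Simulated API: the first two calls report a partial delete, the third succeeds.
responses = iter([
    {"error_code": "PARTIAL_DELETE", "message": "There are more files remaining."},
    {"error_code": "PARTIAL_DELETE", "message": "There are more files remaining."},
    {},
])
print(delete_until_done(lambda: next(responses)))  # 3
```

Bounding the number of attempts keeps a misbehaving endpoint from looping forever.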

Easily view large numbers of MLflow registered models

May 21-28, 2020: Version 3.20

The MLflow Model Registry now supports server-side search and pagination for registered models, which enables organizations with large numbers of models to perform listing and search efficiently. As before, you can search models by name and get results ordered by name or the last updated time. However, if you have a large number of models, the pages will load much faster, and search will fetch the most up-to-date view of models.
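Server-side pagination of this kind is typically consumed by threading each response's page token into the next request. A small sketch of that client-side pattern, where fetch_page is a hypothetical callable standing in for a registry listing request (it is not an MLflow API):

```python
def iter_all_models(fetch_page):
    """Yield every registered model across pages.

    `fetch_page(token)` is a hypothetical callable returning a
    (models, next_page_token) pair; a falsy token marks the last page.
    """
    token = None
    while True:
        models, token = fetch_page(token)
        for model in models:
            yield model
        if not token:
            return

# Simulated two-page listing keyed by page token.
pages = {None: (["model-a", "model-b"], "t1"), "t1": (["model-c"], None)}
print(list(iter_all_models(lambda tok: pages[tok])))  # ['model-a', 'model-b', 'model-c']
```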

Libraries configured to be installed on all clusters are not installed on clusters running Databricks Runtime 7.0 and above

May 21-28, 2020: Version 3.20

In Databricks Runtime 7.0 and above, the underlying version of Apache Spark uses Scala 2.12. Since libraries compiled against Scala 2.11 can disable Databricks Runtime 7.0 clusters in unexpected ways, clusters running Databricks Runtime 7.0 and above do not install libraries configured to be installed on all clusters. The cluster Libraries tab shows a status of Skipped and a deprecation message related to the changes in library handling.

If you have a cluster that was created on an earlier version of Databricks Runtime, before 3.20 was released to your workspace, and you now edit that cluster to use Databricks Runtime 7.0, any libraries that were configured to be installed on all clusters will be installed on that cluster. In this case, any incompatible JARs in the installed libraries can cause the cluster to be disabled. The workaround is either to clone the cluster or to create a new cluster.

Databricks Runtime 7.0 for Genomics (Beta)

May 21, 2020

Databricks Runtime 7.0 for Genomics is built on top of Databricks Runtime 7.0 and includes the following library changes:

  • The ADAM library has been updated from version 0.30.0 to 0.32.0.
  • The Hail library is not included in Databricks Runtime 7.0 for Genomics, as there is no release based on Apache Spark 3.0.

For more information, see the complete Databricks Runtime 7.0 for Genomics release notes.

Databricks Runtime 7.0 ML (Beta)

May 21, 2020

Databricks Runtime 7.0 ML is built on top of Databricks Runtime 7.0 and includes the following new features:

  • Notebook-scoped Python libraries and custom environments managed by conda and pip commands.
  • Updates for major Python packages including tensorflow, tensorboard, pytorch, xgboost, sparkdl, and hyperopt.
  • Newly added Python packages lightgbm, nltk, petastorm, and plotly.
  • RStudio Server Open Source v1.2.
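As an illustration of the first item, notebook-scoped libraries are managed with magic commands inside a notebook cell; the package names here are illustrative:

```
%pip install matplotlib==3.2.1
%conda install astropy
```

Libraries installed this way are scoped to the notebook session rather than the whole cluster.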

For more information, see the complete Databricks Runtime 7.0 ML release notes.

Databricks Runtime 6.6 for Genomics (Beta)

May 7, 2020

Databricks Runtime 6.6 for Genomics is built on top of Databricks Runtime 6.6 and includes the following new features:

  • GFF3 reader
  • Custom reference genome support
  • Per-sample pipeline timeouts
  • BAM export option
  • Manifest blobs

For more information, see the complete Databricks Runtime 6.6 for Genomics release notes.

Databricks Runtime 6.6 ML (Beta)

May 7, 2020

Databricks Runtime 6.6 ML is built on top of Databricks Runtime 6.6 and includes the following new features:

  • Upgraded mlflow from 1.7.0 to 1.8.0

For more information, see the complete Databricks Runtime 6.6 ML release notes.

Databricks Runtime 6.6 (Beta)

May 7, 2020

Databricks Runtime 6.6 (Beta) brings many library upgrades and new features, including the following Delta Lake features:

  • You can now evolve the schema of a table automatically with the merge operation. This is useful in scenarios where you want to upsert change data into a table and the schema of the data changes over time. Instead of detecting and applying schema changes before upserting, merge can simultaneously evolve the schema and upsert the changes. See Automatic schema evolution.
  • The performance of merge operations that have only matched clauses (that is, only update and delete actions and no insert action) has been improved.
  • Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA.

For more information, see the complete Databricks Runtime 6.6 release notes.

Job clusters now tagged with job name and ID

May 5-12, 2020: Version 3.19

Job clusters are automatically tagged with the job name and ID. The tags appear in the billable usage reports so that you can easily attribute your DBU usage by job and identify anomalies. The tags are sanitized to conform to cluster tag specifications, such as allowed characters, maximum size, and maximum number of tags. The job name is contained in the RunName tag and the job ID is contained in the JobId tag.

Restore deleted notebook cells

May 5-12, 2020: Version 3.19

You can now restore deleted cells either by using the (Z) keyboard shortcut or by selecting Edit > Undo Delete Cells.

Jobs pending queue limit

May 5-12, 2020: Version 3.19

A workspace is now limited to 1000 active (running and pending) job runs. Since a workspace is limited to 150 concurrent (running) job runs, a workspace can have up to 850 runs in the pending queue.