February 2019

These features and Azure Databricks platform improvements were released in February 2019.

Note

Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.

Databricks Light generally available

February 26 - March 5, 2019: Version 2.92

Databricks Light (also known as Data Engineering Light) is now available. Databricks Light is the Databricks packaging of the open source Apache Spark runtime. It provides a runtime option for jobs that don’t need the advanced performance, reliability, or autoscaling benefits provided by Databricks Runtime. You can select Databricks Light only when you create a cluster to run a JAR, Python, or spark-submit job; you cannot select this runtime for clusters on which you run interactive or notebook job workloads. See Databricks Light.

Managed MLflow on Azure Databricks Public Preview

February 26 - March 5, 2019: Version 2.92

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles three primary functions:

  • Tracking experiments to record and compare parameters and results.
  • Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms.
  • Packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production.
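
As a rough illustration of the first function (tracking), the sketch below records one parameter and one metric for a toy linear model. It assumes the open source `mlflow` Python package is installed and uses its `start_run`, `log_param`, and `log_metric` calls; the model, the `slope` parameter name, and the helper names are hypothetical and not part of these release notes.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def log_training_run(slope, x_values, y_true):
    """Log one parameter and one metric for a toy model y = slope * x.

    Assumption: the `mlflow` package is available. It is imported lazily
    so that `rmse` can be used on its own without it.
    """
    import mlflow

    predictions = [slope * x for x in x_values]
    with mlflow.start_run():
        # "slope" and "rmse" are illustrative names, not required by MLflow.
        mlflow.log_param("slope", slope)
        mlflow.log_metric("rmse", rmse(y_true, predictions))
```

Calling `log_training_run` several times with different `slope` values would produce several runs whose parameters and metrics can then be compared side by side.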

Azure Databricks now provides a fully managed and hosted version of MLflow integrated with enterprise security features, high availability, and other Azure Databricks workspace features such as experiment management, run management, and notebook revision capture. MLflow on Azure Databricks offers an integrated experience for tracking and securing machine learning model training runs and running machine learning projects. By using managed MLflow on Azure Databricks, you get the advantages of both platforms, including:

  • Workspaces: Collaboratively track and organize experiments and results within Azure Databricks Workspaces with a hosted MLflow Tracking Server and integrated experiment UI. When you use MLflow in notebooks, Azure Databricks automatically captures notebook revisions so you can reproduce the same code and runs later.
  • Security: Take advantage of one common security model for the entire ML lifecycle via ACLs.
  • Jobs: Run MLflow projects as Azure Databricks jobs remotely and directly from Azure Databricks notebooks.

Here’s a demo of a tracking workflow in an Azure Databricks Workspace:

Track runs and organize experiment workflow

For details, see Experiments and Run MLflow Projects on Azure Databricks.

Azure Data Lake Storage Gen2 connector is generally available

February 15, 2019

Azure Data Lake Storage Gen2 (ADLS Gen2), the next-generation data lake solution for big data analytics, is now GA, as is the ADLS Gen2 connector for Azure Databricks. We are also pleased to announce that ADLS Gen2 supports Databricks Delta when you are running clusters on Databricks Runtime 5.2 and above.
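
For orientation, ADLS Gen2 paths are addressed with the `abfss://` scheme of the Azure Blob File System driver. The sketch below only builds the URI and the Spark configuration key used for account-key authentication; the helper names and the sample container, account, and path are hypothetical, and account-key access is just one of several possible authentication setups.

```python
def abfss_uri(container, account, path=""):
    """Build an abfss:// URI for a path in an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account):
    """Name of the Spark conf entry that holds the storage account key."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


# Hypothetical names, for illustration only.
uri = abfss_uri("data", "mystore", "/events/2019")
conf_key = account_key_conf("mystore")

# On a cluster, one would typically set the key and then read, for example:
#   spark.conf.set(conf_key, "<storage-account-access-key>")
#   df = spark.read.format("delta").load(uri)
print(uri)
print(conf_key)
```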

Python 3 now the default when you create clusters

February 12-19, 2019: Version 2.91

The default Python version for clusters created using the UI has switched from Python 2 to Python 3. The default for clusters created using the REST API is still Python 2.

Existing clusters will not change their Python versions. But if you’ve been in the habit of taking the Python 2 default when you create new clusters, you’ll need to start paying attention to your Python version selection.

Default Python version

See Python version.
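
Because clusters created through the REST API still default to Python 2, a request has to ask for Python 3 explicitly. The sketch below builds a request body for the Clusters API `clusters/create` endpoint and sets the `PYSPARK_PYTHON` environment variable. The interpreter path, runtime version string, node type, and cluster name are assumptions for illustration; check the Python version documentation for the values that apply to your workspace.

```python
import json

# Assumed location of the Python 3 interpreter on cluster nodes.
PYTHON3_PATH = "/databricks/python3/bin/python3"


def cluster_spec(name, spark_version, node_type_id, num_workers, python3=True):
    """Return a clusters/create request body as a dict."""
    spec = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }
    if python3:
        # Without this entry, an API-created cluster keeps the Python 2 default.
        spec["spark_env_vars"] = {"PYSPARK_PYTHON": PYTHON3_PATH}
    return spec


# Hypothetical values, for illustration only.
payload = cluster_spec("py3-example", "5.2.x-scala2.11", "Standard_DS3_v2", 2)
print(json.dumps(payload, indent=2))
```

The resulting JSON would be sent as the body of a POST to `/api/2.0/clusters/create` with a suitable authorization header.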

Delta Lake generally available

February 1, 2019

Now everyone can get the benefits of Databricks Delta’s powerful transactional storage layer and super-fast reads: as of February 1, Delta Lake is GA and available on all supported versions of Databricks Runtime. For information about Delta, see Delta Lake.