February 2019

These features and Azure Databricks platform improvements were released in February 2019.

Note

Releases are staged. Your Azure Databricks account may not be updated until as much as a week after the initial release date.

Databricks Light generally available

February 26 - March 5, 2019: Version 2.92

Databricks Light (also known as Data Engineering Light) is now available. Databricks Light is the Databricks packaging of the open source Apache Spark runtime. It provides a runtime option for jobs that don't need the advanced performance, reliability, or autoscaling benefits provided by Databricks Runtime. You can select Databricks Light only when you create a cluster to run a JAR, Python, or spark-submit job; you cannot select this runtime for clusters on which you run interactive or notebook job workloads. See Databricks Light.

Managed MLflow on Azure Databricks Public Preview

February 26 - March 5, 2019: Version 2.92

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles three primary functions:

  • Tracking experiments to record and compare parameters and results.
  • Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms.
  • Packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production.

Azure Databricks now provides a fully managed and hosted version of MLflow integrated with enterprise security features, high availability, and other Azure Databricks workspace features such as experiment management, run management, and notebook revision capture. MLflow on Azure Databricks offers an integrated experience for tracking and securing machine learning model training runs and for running machine learning projects. By using managed MLflow on Azure Databricks, you get the advantages of both platforms, including:

  • Workspaces: Collaboratively track and organize experiments and results within Azure Databricks Workspaces with a hosted MLflow Tracking Server and integrated experiment UI. When you use MLflow in notebooks, Azure Databricks automatically captures notebook revisions so you can reproduce the same code and runs later.
  • Security: Take advantage of one common security model for the entire ML lifecycle via ACLs.
  • Jobs: Run MLflow projects as Azure Databricks jobs, either remotely or directly from Azure Databricks notebooks.

Here's a demo of a tracking workflow in an Azure Databricks Workspace:

Track runs and organize experiment workflow

For details, see Experiments and Run MLflow Projects on Azure Databricks.

Azure Data Lake Storage Gen2 connector is generally available

February 15, 2019

Azure Data Lake Storage Gen2 (ADLS Gen2), the next-generation data lake solution for big data analytics, is now generally available (GA), as is the ADLS Gen2 connector for Azure Databricks. We are also pleased to announce that ADLS Gen2 supports Databricks Delta when you are running clusters on Databricks Runtime 5.2 and above.
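From Azure Databricks, data in ADLS Gen2 is addressed through the ABFS driver with URIs of the form `abfss://<container>@<storage-account>.dfs.core.windows.net/<path>`. The minimal sketch below only builds such a URI; the helper name `abfss_uri` and the container, account, and path values are hypothetical placeholders, and it assumes credentials for the storage account are configured separately on the cluster.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen2 container.

    Hypothetical helper for illustration; it does not contact Azure.
    """
    if not container or not account:
        raise ValueError("container and account are required")
    # Gen2 accounts are reached through the dfs.core.windows.net endpoint.
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    # Drop leading slashes so the path is joined with exactly one "/".
    path = path.lstrip("/")
    return f"{base}/{path}" if path else f"{base}/"


if __name__ == "__main__":
    print(abfss_uri("data", "mystorageacct", "/delta/events"))
```

On a cluster running Databricks Runtime 5.2 or above, the resulting URI could then be passed to Spark, for example `df.write.format("delta").save(uri)`, to store a Delta table in ADLS Gen2.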

Python 3 now the default when you create clusters

February 12-19, 2019: Version 2.91

The default Python version for clusters created using the UI has switched from Python 2 to Python 3. The default for clusters created using the REST API is still Python 2.

Existing clusters will not change their Python versions. But if you've been in the habit of accepting the Python 2 default when you create new clusters, you'll need to start paying attention to your Python version selection.

Default Python version

See Python version.
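Because clusters created through the REST API still default to Python 2, a request has to ask for Python 3 explicitly. The sketch below assumes the approach documented for Databricks Runtime 5.x at the time, setting the `PYSPARK_PYTHON` environment variable to `/databricks/python3/bin/python3` via `spark_env_vars` in the body of a `POST /api/2.0/clusters/create` request; verify this against the current Python version documentation. The function name and the cluster name, runtime version, node type, and worker count are hypothetical placeholders, and the code only builds the JSON body without sending it.

```python
import json

# Interpreter path used to select Python 3 on Databricks Runtime 5.x
# (assumption based on the documentation of that era).
PYTHON3_PATH = "/databricks/python3/bin/python3"


def build_cluster_spec(name, spark_version="5.2.x-scala2.11",
                       node_type_id="Standard_DS3_v2", num_workers=2,
                       python3=True):
    """Return a JSON-serializable body for POST /api/2.0/clusters/create.

    All default values are placeholders; substitute your own settings.
    """
    spec = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }
    if python3:
        # API-created clusters default to Python 2, so request Python 3.
        spec["spark_env_vars"] = {"PYSPARK_PYTHON": PYTHON3_PATH}
    return spec


if __name__ == "__main__":
    print(json.dumps(build_cluster_spec("py3-cluster"), indent=2))
```

Leaving out `spark_env_vars` (here, `python3=False`) reproduces the Python 2 default described above for API-created clusters.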

Delta Lake generally available

February 1, 2019

Now everyone can get the benefits of Databricks Delta's powerful transactional storage layer and super-fast reads: as of February 1, Delta Lake is GA and available on all supported versions of Databricks Runtime. For information about Delta, see the Delta Lake and Delta Engine guide.