May 2018

Releases are staged. Your Azure Databricks account may not be updated until a week after the initial release date.

General Data Protection Regulation (GDPR)

May 24, 2018: Version 2.72

To meet the requirements of the European Union General Data Protection Regulation (GDPR), which goes into effect on May 25, 2018, we have made a number of modifications to the Azure Databricks platform to provide you with more control of data retention at both the account and user level. Updates include:

  • Cluster delete: permanently delete a cluster configuration using the UI or the Clusters API; a sketch of the API call follows this list. See Delete a compute.
  • Workspace purge (released in version 2.71): permanently delete workspace objects, such as entire notebooks, individual notebook cells, individual notebook comments, and notebook revision history. See Purge workspace storage.
  • Notebook revision history purge:
    • Permanently delete the revision history of all notebooks in a workspace for a defined time frame. See Purge workspace storage.
    • Permanently delete a single notebook revision or the entire revision history of a notebook. See Version history.
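
For the cluster delete item above, a minimal sketch of calling the Clusters API from Python follows; the workspace URL, token, and cluster ID are placeholders you would substitute with your own values:

    # Minimal sketch: permanently delete a cluster configuration via the Clusters API.
    # The host, token, and cluster ID below are placeholders.
    import requests

    DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"                                  # placeholder

    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/clusters/permanent-delete",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"cluster_id": "<cluster-id>"},  # the cluster configuration to remove permanently
    )
    resp.raise_for_status()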

For information about deleting your Azure Databricks service or canceling your Azure account, see Manage your subscription.

Azure Databricks users must belong to Microsoft Entra ID tenant

May 24, 2018: Version 2.72

Users can now sign in to Azure Databricks only if they belong to the Microsoft Entra ID (formerly Azure Active Directory) tenant of the Azure Databricks workspace. If you have users who do not belong to the Microsoft Entra ID tenant, you can add them as standard or guest users.

HorovodEstimator

May 29, 2018: Version 2.72

Added documentation and a notebook for HorovodEstimator, an MLlib-style estimator API that leverages Uber’s Horovod framework. HorovodEstimator facilitates distributed, multi-GPU training of deep neural networks on Spark DataFrames, simplifying the integration of ETL in Spark with model training in TensorFlow.
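
The linked notebook walks through the full workflow. As a rough, non-authoritative sketch of the MLlib-style estimator pattern (the import path and every parameter name below are assumptions rather than the confirmed API), training and inference look roughly like this:

    # Rough sketch only: the import path and parameter names are assumptions, not the
    # confirmed HorovodEstimator API; see the linked notebook for the real workflow.
    from sparkdl.estimators.horovod_estimator.estimator import HorovodEstimator  # assumed path

    def model_fn(features, labels, mode, params):
        # Build and return a TensorFlow EstimatorSpec here (omitted for brevity).
        ...

    est = HorovodEstimator(modelFn=model_fn,                         # assumed parameter name
                           featureMapping={"features": "features"},  # assumed: column -> input tensor
                           modelDir="/tmp/horovod_model")            # assumed checkpoint location
    model = est.fit(train_df)               # train_df: a Spark DataFrame prepared earlier
    predictions = model.transform(test_df)  # test_df: a held-out Spark DataFrame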

MLeap ML Model Export

May 24, 2018: Version 2.72

Added documentation and notebooks on using MLeap on Azure Databricks. MLeap allows you to deploy machine learning pipelines from Apache Spark and scikit-learn to a portable format and execution engine. See MLeap ML model export.
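
As a minimal sketch, assuming the MLeap Spark integration (mleap-pyspark) is attached to the cluster and using a placeholder output path, exporting a fitted pipeline looks roughly like this:

    # Minimal sketch: export a fitted Spark ML pipeline as a portable MLeap bundle.
    # Assumes the mleap-pyspark package is attached to the cluster; the path is a placeholder.
    import mleap.pyspark
    from mleap.pyspark.spark_support import SimpleSparkSerializer  # adds serializeToBundle
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer

    df = spark.createDataFrame([("a",), ("b",), ("a",)], ["category"])
    pipeline_model = Pipeline(stages=[StringIndexer(inputCol="category", outputCol="label")]).fit(df)

    # Serialize the fitted pipeline to an MLeap bundle for use outside Spark
    pipeline_model.serializeToBundle("jar:file:/tmp/pipeline.zip", pipeline_model.transform(df))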

Even more GPU cluster types

May 24, 2018: Version 2.72

In addition to the Azure NC instance types (NC12 and NC24) that we added in Release 2.71, we now support the NCv3 instance type series (NC6s_v3, NC12s_v3, and NC24s_v3) on Azure Databricks clusters. NC and NCv3 instances provide GPUs to power image processing, text analysis, and other machine learning and deep learning tasks that are computationally challenging and demand superior performance.

See GPU-enabled compute.
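
As a hedged sketch, a GPU cluster on one of the new NCv3 instance types can be created through the Clusters API roughly as follows; the host, token, and the exact Databricks Runtime version string are placeholders or assumptions:

    # Hedged sketch: create a GPU cluster on an NCv3 instance type via the Clusters API.
    # Host, token, and the runtime version string below are placeholders/assumptions.
    import requests

    DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"                                  # placeholder

    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "cluster_name": "gpu-cluster",
            "spark_version": "4.1.x-gpu-scala2.11",  # assumed GPU runtime version key
            "node_type_id": "Standard_NC6s_v3",      # NCv3-series GPU instance
            "num_workers": 2,
            "autotermination_minutes": 60,
        },
    )
    resp.raise_for_status()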

Notebook cells: hide and show

May 24, 2018: Version 2.72

New indicators and messaging make it easier to show notebook cell contents after they’ve been hidden. See Hide and show cell content.

May 22, 2018

We have replaced our doc site search with a better search tool. You’ll see even more search improvements over the coming weeks.

Note

Search may look broken if you try it shortly after the new search is deployed. Just clear your browser cache to see the new search experience.

Databricks Runtime 4.1 ML (Beta)

May 17, 2018

Databricks Runtime ML (Beta) provides a ready-to-go environment for machine learning and data science. It contains multiple popular libraries, including TensorFlow, Keras, and XGBoost.

Databricks Runtime ML lets you start a Databricks cluster with all of the libraries required for distributed TensorFlow training. It ensures the compatibility of the libraries included on the cluster (between TensorFlow and CUDA / cuDNN, for example) and substantially decreases the cluster start-up time compared to using init scripts.
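
For example, on a Databricks Runtime ML cluster a notebook can use the bundled libraries immediately, with no init scripts or manual installs; a quick sanity check might look like this:

    # Quick notebook check (sketch): the deep learning libraries named above are
    # preinstalled on a Databricks Runtime ML cluster, so the imports succeed directly.
    import tensorflow as tf
    import keras

    print("TensorFlow:", tf.__version__)
    print("Keras:", keras.__version__)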

Note

Databricks Runtime 4.1 ML is available only in the Premium SKU.

See the complete release notes for Databricks Runtime 4.1 ML (unsupported).

Databricks Delta

May 17, 2018

Databricks Delta is now available in Private Preview to Azure Databricks users. Contact your account manager or sign up at https://databricks.com/product/databricks-delta. This is a release candidate in anticipation of the upcoming GA release.

For more information, see Databricks Runtime 4.1 (unsupported) and What is Delta Lake?.
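
As a minimal sketch, assuming a cluster on Databricks Runtime 4.1 with the Delta preview enabled and using a placeholder DBFS path, writing and reading a Delta table looks roughly like this:

    # Minimal sketch: write and read back a Databricks Delta table.
    # Assumes Databricks Runtime 4.1 with the Delta preview; the path is a placeholder.
    events = spark.range(0, 100).withColumnRenamed("id", "event_id")

    events.write.format("delta").mode("overwrite").save("/delta/events")  # write a Delta table
    df = spark.read.format("delta").load("/delta/events")                 # read it back
    display(df)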

Display() support for image data types

May 17, 2018

In Databricks Runtime 4.1, display() now renders columns containing image data types as rich HTML.

See Visualizations in Databricks notebooks.
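
As a sketch, assuming a directory of images readable from DBFS (the path below is a placeholder), loading them with Spark's image schema and rendering them looks roughly like this:

    # Sketch: load images into a DataFrame and render them with display().
    # The image directory path is a placeholder.
    from pyspark.ml.image import ImageSchema

    image_df = ImageSchema.readImages("/path/to/images")  # DataFrame with an image struct column
    display(image_df)                                      # image column renders as rich HTML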

GPU cluster types

May 15, 2018: Version 2.71

We’re pleased to announce support for Azure NC instance types (NC12 and NC24) on Azure Databricks clusters. NC instances provide GPUs to power image processing, text analysis, and other machine learning and deep learning tasks that are computationally challenging and demand superior performance.

Azure Databricks also provides pre-installed NVIDIA drivers and libraries configured for GPUs, along with material for getting started with several popular deep learning libraries.

See GPU-enabled compute.

Secret management GA

May 15, 2018: Version 2.71

Secret management, which had been in private preview, is now GA. It provides powerful tools for managing the credentials you need for authenticating to external data sources. Instead of typing your credentials directly into a notebook, use Databricks secret management to store and reference your credentials in notebooks and jobs. To manage secrets, you can use the Secrets CLI (legacy) to access the Secrets API.

Note

Secret management requires Databricks Runtime 4.0 or above and Databricks CLI 0.7.1 or above.

See Secret management.
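
As a minimal sketch, assuming a secret scope and key have already been created with the Secrets CLI or API (the scope, key, and JDBC URL below are placeholders), a notebook references the stored credential instead of embedding it:

    # Minimal sketch: reference a stored secret in a notebook instead of hard-coding it.
    # The scope, key, and JDBC URL below are placeholders.
    password = dbutils.secrets.get(scope="jdbc", key="password")

    df = (spark.read
          .format("jdbc")
          .option("url", "jdbc:sqlserver://<host>:1433;database=<db>")  # placeholder URL
          .option("dbtable", "my_table")
          .option("user", "my_user")
          .option("password", password)  # the secret value is not shown in notebook output
          .load())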

Secrets API endpoint and CLI command changes

May 15, 2018: Version 2.71

The following changes were made to the Secrets API endpoints:

  • For all endpoints, the root path was changed from /secret to /secrets.
  • For the secrets endpoint, /secret/secrets was collapsed to /secrets/.
  • The write method was changed to put.

Databricks CLI 0.7.1 includes updates to Secrets commands to align with these updated API endpoints.

See Secrets API and Secret management.
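
As a hedged sketch of the renamed endpoints (host and token are placeholders), creating a scope and writing a secret against the new /secrets root path looks roughly like this:

    # Hedged sketch: call the renamed Secrets API endpoints (/secrets root, "put" method).
    # Host, token, scope, and key are placeholders.
    import requests

    DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}      # placeholder

    # Create a scope, then write a secret using the new /secrets/put endpoint
    requests.post(f"{DATABRICKS_HOST}/api/2.0/secrets/scopes/create",
                  headers=HEADERS, json={"scope": "jdbc"}).raise_for_status()
    requests.post(f"{DATABRICKS_HOST}/api/2.0/secrets/put",
                  headers=HEADERS,
                  json={"scope": "jdbc", "key": "password",
                        "string_value": "<secret>"}).raise_for_status()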

Cluster pinning

May 15, 2018: Version 2.71

You can now pin a cluster to the Clusters list. This lets you retain a cluster’s configuration even after the cluster has been terminated for more than 30 days.

Pin cluster

In addition, the Clusters page now displays all clusters that were terminated within 30 days (increased from 7 days).

See Pin a compute.

Cluster autostart

May 15, 2018: Version 2.71

Before this release, jobs scheduled to run on terminated clusters failed. For clusters created in Azure Databricks version 2.71 and above, a command from a JDBC/ODBC interface or a job run assigned to an existing terminated cluster automatically restarts that cluster. See JDBC connect and Create a job.

Autostart lets you configure clusters to autoterminate without requiring manual intervention to restart them for scheduled jobs. Furthermore, you can schedule cluster initialization by scheduling a job to run on a terminated cluster at a specified time.
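
As a hedged sketch, a scheduled job targeting an existing cluster can be created through the Jobs API roughly as follows (host, token, cluster ID, and notebook path are placeholders); when the schedule fires, a terminated cluster is autostarted for the run:

    # Hedged sketch: schedule a notebook job on an existing cluster via the Jobs API.
    # With autostart, the terminated cluster restarts when the scheduled run begins.
    # Host, token, cluster ID, and notebook path below are placeholders.
    import requests

    DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"                                  # placeholder

    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "name": "nightly-etl",
            "existing_cluster_id": "<cluster-id>",        # terminated cluster is autostarted
            "notebook_task": {"notebook_path": "/Users/<user>/etl"},
            "schedule": {
                "quartz_cron_expression": "0 0 2 * * ?",  # run daily at 02:00
                "timezone_id": "UTC",
            },
        },
    )
    resp.raise_for_status()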

Cluster access control is enforced and job owner permissions are checked as usual.

Workspace purging

May 15, 2018: Version 2.71

As part of our ongoing effort to comply with the European Union General Data Protection Regulation (GDPR), we have added the ability to purge workspace objects, such as entire notebooks, individual notebook cells, individual notebook comments, and notebook revision history. We will release more functionality and documentation to support GDPR compliance in the coming weeks.

See Purge workspace storage.

Databricks CLI 0.7.1

May 10, 2018

Databricks CLI 0.7.1 includes updates to Secrets commands to align with updated API endpoints.

See Databricks CLI (legacy) and Secret management.