July 2019

These features and Azure Databricks platform improvements were released in July 2019.

Note

Releases are staged. Your Azure Databricks account may not be updated until up to a week after the initial release date.

Coming soon: Databricks Runtime 6.0 will not support Python 2

In anticipation of the upcoming end of life of Python 2, announced for 2020, Python 2 will not be supported in Databricks Runtime 6.0. Earlier versions of Databricks Runtime will continue to support Python 2. We expect to release Databricks Runtime 6.0 later in 2019.

Preload the Databricks Runtime version on pool idle instances

July 30 - Aug 6, 2019: Version 2.103

You can now speed up pool-backed cluster launches by selecting a Databricks Runtime version to be loaded on idle instances in the pool. The field on the Pool UI is called Preloaded Spark Version.

Preloaded Spark version
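The same setting can likely be supplied when a pool is created programmatically rather than through the UI. The sketch below only builds a request body. It assumes the `preloaded_spark_versions` field and the `instance-pools/create` endpoint of the Instance Pools API, and the pool name, node type, and runtime version key are placeholder values.

```python
import json


def build_pool_spec(name, node_type_id, runtime_version, min_idle=2):
    """Build a request body for creating a pool with a preloaded runtime.

    Assumption: the field names follow the Instance Pools API
    (POST /api/2.0/instance-pools/create). Verify them against the
    API reference for your workspace before relying on this.
    """
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        "min_idle_instances": min_idle,
        # The runtime version to load onto idle instances in the pool.
        "preloaded_spark_versions": [runtime_version],
    }


# Placeholder values for illustration only.
spec = build_pool_spec("etl-pool", "Standard_DS3_v2", "5.5.x-scala2.11")
print(json.dumps(spec, indent=2))
```

Keeping the payload construction in a small function makes it easy to review the preloaded version before anything is sent to the workspace.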

Custom cluster tags and pool tags play better together

July 30 - Aug 6, 2019: Version 2.103

Earlier this month, Azure Databricks introduced pools, sets of idle instances that help you spin up clusters fast. In the original release, pool-backed clusters inherited default and custom tags from the pool configuration, and you could not modify these tags at the cluster level. Now you can configure custom tags specific to a pool-backed cluster, and the cluster applies all custom tags, whether inherited from the pool or assigned to the cluster specifically. You cannot add a cluster-specific custom tag with the same key name as a custom tag inherited from a pool (that is, you cannot override a custom tag that is inherited from the pool). For details, see Pool tags.
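The tag rule can be pictured as a merge that rejects key collisions. The following is an illustrative model of that rule only, not Azure Databricks source code. The function name and the sample tags are hypothetical.

```python
def effective_custom_tags(pool_tags, cluster_tags):
    """Return the custom tags a pool-backed cluster would apply.

    Models the rule described above: pool tags are inherited, cluster
    tags are added, and a cluster tag may not reuse an inherited key.
    """
    conflicts = sorted(set(pool_tags) & set(cluster_tags))
    if conflicts:
        raise ValueError(
            "cannot override tags inherited from the pool: " + ", ".join(conflicts)
        )
    merged = dict(pool_tags)      # start from the inherited pool tags
    merged.update(cluster_tags)   # add the cluster-specific tags
    return merged


pool_tags = {"team": "data-eng", "env": "prod"}
tags = effective_custom_tags(pool_tags, {"job": "nightly-etl"})
print(tags)
```

Passing a cluster tag such as `{"env": "dev"}` raises an error in this model, mirroring the restriction on overriding inherited keys.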

MLflow 1.1 brings several UI and API improvements

July 30 - Aug 6, 2019: Version 2.103

MLflow 1.1 introduces several new features to improve UI and API usability:

  • The runs overview UI now lets you browse through multiple pages of runs if the number of runs exceeds 100. After the 100th run, click the Load more button to load the next 100 runs.

    Paged runs

  • The compare runs UI now provides a parallel coordinates plot. The plot allows you to observe relationships between an n-dimensional set of parameters and metrics. It visualizes all runs as lines that are color-coded based on the value of a metric (for example, accuracy), and shows the parameter values that each run took on.

    Parallel coordinates plot

  • Now you can add and edit tags from the run overview UI and view tags in the experiment search view.

  • The new Java MlflowContext API lets you create and log runs in a way that is similar to the Python API. It contrasts with the existing low-level MlflowClient API, which simply wraps the REST APIs.

  • You can now delete tags from MLflow runs using the DeleteTag API.

For details, see the MLflow 1.1 blog post. For the complete list of features and fixes, see the MLflow Changelog.
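As a rough illustration of the tag deletion mentioned in the list above, the sketch below builds, but does not send, a REST request. It assumes the MLflow endpoint `2.0/mlflow/runs/delete-tag` is served under the workspace's `/api` path and that a bearer token is used for authentication. The workspace URL, token, run ID, and tag key are hypothetical.

```python
import json
import urllib.request


def build_delete_tag_request(host, token, run_id, key):
    """Build (but do not send) a request that deletes one tag from a run.

    Assumptions: the MLflow REST endpoint 2.0/mlflow/runs/delete-tag is
    served under <host>/api, and a bearer token is used for auth.
    """
    body = json.dumps({"run_id": run_id, "key": key}).encode("utf-8")
    return urllib.request.Request(
        url=host.rstrip("/") + "/api/2.0/mlflow/runs/delete-tag",
        data=body,
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace URL, token, and run ID.
req = build_delete_tag_request(
    "https://example.azuredatabricks.net", "<token>", "abc123", "stage"
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```

If the MLflow Python package is installed, `MlflowClient().delete_tag(run_id, key)` should achieve the same result without hand-built requests.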

pandas DataFrame display renders like it does in Jupyter

July 30 - Aug 6, 2019: Version 2.103

Now when you display a pandas DataFrame in a notebook, it renders the same way as it does in Jupyter.

Display pandas DataFrame

New regions

July 30, 2019

Azure Databricks is now available in the following additional regions:

  • Korea Central
  • South Africa North

Databricks Runtime 5.5 with Conda (Beta)

July 23, 2019

Important

Databricks Runtime with Conda is in Beta. The contents of the supported environments may change in upcoming Beta releases. Changes can include the list of packages or versions of installed packages. Databricks Runtime 5.5 with Conda is built on top of Databricks Runtime 5.5 LTS (unsupported).

The Databricks Runtime 5.5 with Conda release adds a new notebook-scoped library API to support updating the notebook’s Conda environment with a YAML specification (see Conda documentation).

See the complete release notes at Databricks Runtime 5.5 with Conda (unsupported).
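To give a sense of what a YAML specification for the notebook's Conda environment might look like, the sketch below renders a minimal one as text. The helper name and the package pins are hypothetical, and the commented-out `dbutils` call is an assumption about the API's name that should be confirmed in the runtime's release notes.

```python
def conda_env_yaml(dependencies, channels=("defaults",)):
    """Render a minimal Conda environment specification as YAML text."""
    lines = ["channels:"]
    lines += ["  - " + channel for channel in channels]
    lines.append("dependencies:")
    lines += ["  - " + dep for dep in dependencies]
    return "\n".join(lines) + "\n"


# Example package pins; choose versions that suit your workload.
env_yaml = conda_env_yaml(["numpy=1.16.4", "pip"])
print(env_yaml)

# Inside a Databricks notebook on this runtime, the specification would be
# passed to the notebook-scoped library API. The call below is an assumption
# about that API's name; confirm it in the runtime's release notes.
# dbutils.library.updateCondaEnv(env_yaml)
```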

Updated metastore connection limit

July 16 - 23, 2019: Version 2.102

New Azure Databricks workspaces in eastus, eastus2, centralus, westus, westus2, westeurope, and northeurope have a higher metastore connection limit of 250. Existing workspaces continue to use the current metastore with no disruption and keep their connection limit of 100.

Set permissions on pools (Public Preview)

July 16 - 23, 2019: Version 2.102

The pool UI now lets you set permissions that control who can manage pools and who can attach clusters to them.

For details, see Pool permissions.

Databricks Runtime 5.5 for Machine Learning

July 15, 2019

Databricks Runtime 5.5 ML is built on top of Databricks Runtime 5.5 LTS (unsupported). It contains many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost, and provides distributed TensorFlow training using Horovod.

This release includes the following new features and improvements:

  • Added the MLflow 1.0 Python package
  • Upgraded machine learning libraries
    • TensorFlow upgraded from 1.12.0 to 1.13.1
    • PyTorch upgraded from 0.4.1 to 1.1.0
    • scikit-learn upgraded from 0.19.1 to 0.20.3
  • Single-node operation for HorovodRunner

For details, see Databricks Runtime 5.5 LTS for ML (unsupported).

Databricks Runtime 5.5

July 15, 2019

Databricks Runtime 5.5 is now available. Databricks Runtime 5.5 includes Apache Spark 2.4.3, upgraded Python, R, Java, and Scala libraries, and the following new features:

  • Delta Lake on Azure Databricks Auto Optimize GA
  • Delta Lake on Azure Databricks improved min, max, and count aggregation query performance
  • Faster model inference pipelines with improved binary file data source and scalar iterator pandas UDF (Public Preview)
  • Secrets API in R notebooks

For details, see Databricks Runtime 5.5 LTS (unsupported).

Keep a pool of instances on standby for quick cluster launch (Public Preview)

July 9 - 11, 2019: Version 2.101

To reduce cluster start time, Azure Databricks now supports attaching a cluster to a pre-defined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, the pool expands by allocating new instances from the cloud provider. When an attached cluster is terminated, the instances it used are returned to the pool and can be reused by a different cluster.

Azure Databricks does not charge DBUs while instances are idle in the pool. Instance provider billing does apply. See pricing.

For details, see Pool configuration reference.
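The allocation behavior described above can be pictured with a toy model. This is a simplified sketch, not Azure Databricks code: the class and method names are hypothetical, and it ignores limits such as a pool's maximum capacity.

```python
class PoolModel:
    """Toy model of the pool behavior described above (not Databricks code)."""

    def __init__(self, idle_instances):
        self.idle = idle_instances   # instances waiting in the pool
        self.in_use = 0              # instances held by attached clusters
        self.provisioned = 0         # instances requested from the cloud provider

    def attach_cluster(self, nodes_needed):
        """Allocate driver and worker nodes, expanding the pool if needed."""
        shortfall = max(0, nodes_needed - self.idle)
        self.provisioned += shortfall
        self.idle = max(0, self.idle - nodes_needed)
        self.in_use += nodes_needed
        return shortfall

    def terminate_cluster(self, nodes_held):
        """Return a terminated cluster's instances to the pool for reuse."""
        self.in_use -= nodes_held
        self.idle += nodes_held


pool = PoolModel(idle_instances=3)
new_instances = pool.attach_cluster(nodes_needed=5)  # 1 driver + 4 workers
pool.terminate_cluster(nodes_held=5)
print(new_instances, pool.idle, pool.in_use)
```

In this model, a request for five nodes against three idle instances forces the pool to obtain two more, and all five become idle again once the cluster terminates, ready for the next cluster to reuse.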

Ganglia metrics

July 9 - 11, 2019: Version 2.101

Ganglia is a scalable distributed monitoring system that is now available on Azure Databricks clusters. Ganglia metrics help you to monitor cluster performance and health. You can access Ganglia metrics from the cluster details page:

Ganglia Metrics tab

For details on using and configuring metrics, see Ganglia metrics.

Global series color

July 9 - 11, 2019: Version 2.101

You can now specify that the colors of a series should be consistent across all charts in your notebook. See Color consistency across charts.

Global series color