Hail 0.2


Hail is supported in all releases of Databricks Runtime 6.x for Genomics and in Databricks Runtime 7.4 for Genomics and above.
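The support rule above (any 6.x release, or 7.4 and above) can also be checked programmatically. The following is a minimal sketch. The helper names parse_runtime_version and hail_supported are hypothetical and are not part of Hail or Databricks. The sketch also assumes you already have the Genomics runtime version as a string that begins with "major.minor" (for example "6.6" or "7.4.x"). How you obtain that string on your cluster is left open.

```python
import re
from typing import Tuple


def parse_runtime_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as "6.6" or "7.4.x"."""
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("Unrecognized runtime version: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def hail_supported(version: str) -> bool:
    """True for any 6.x Genomics runtime, or for 7.4 and above (hypothetical helper)."""
    major, minor = parse_runtime_version(version)
    # 7.0 through 7.3 for Genomics do not include Hail, so tuple comparison starts at (7, 4).
    return major == 6 or (major, minor) >= (7, 4)
```

For example, hail_supported("7.3") is False and hail_supported("7.4") is True.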

Hail is a library built on Apache Spark for analyzing large genomic datasets. Hail 0.2 is integrated into Databricks Runtime for Genomics.

Create a Hail cluster

To create a cluster with Hail installed:

  1. Set the following environment variable:


    This environment variable causes the cluster to launch with Hail 0.2, its dependencies, and Python 3.6 installed.

Use Hail in a notebook

For the most part, Hail 0.2 code in Azure Databricks works as described in the Hail documentation. However, a few modifications are necessary for the Azure Databricks environment.


When initializing Hail, pass in the pre-created SparkContext (sc) and mark the initialization as idempotent (idempotent=True). This setting lets multiple Azure Databricks notebooks attached to the same cluster share one Hail context.


Set skip_logging_configuration=True to write Hail's logs to the driver's rolling log4j output. This setting is only supported in Databricks Runtime 6.6 for Genomics and above.

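import hail as hl
# sc is the SparkContext that Azure Databricks pre-creates and exposes in notebooks.
# idempotent=True lets notebooks on the same cluster reuse one Hail context.
hl.init(sc, idempotent=True, quiet=True, skip_logging_configuration=True)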
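If the same notebook must run on Genomics runtimes both below and above 6.6, you can choose the hl.init arguments at run time instead of hard-coding them. This is a minimal sketch. The hail_init_kwargs helper is hypothetical and is not a Hail API. It assumes you already know the runtime version as a "major.minor" string.

```python
import re
from typing import Dict


def hail_init_kwargs(runtime_version: str) -> Dict[str, bool]:
    """Build keyword arguments for hl.init(sc, **kwargs) on Databricks (hypothetical helper)."""
    match = re.match(r"^(\d+)\.(\d+)", runtime_version.strip())
    if match is None:
        raise ValueError("Unrecognized runtime version: {!r}".format(runtime_version))
    major, minor = int(match.group(1)), int(match.group(2))
    # idempotent=True lets several notebooks on one cluster share the Hail context.
    kwargs = {"idempotent": True, "quiet": True}
    # skip_logging_configuration is only supported on Runtime 6.6 for Genomics and above.
    if (major, minor) >= (6, 6):
        kwargs["skip_logging_configuration"] = True
    return kwargs
```

In a notebook you would then call hl.init(sc, **hail_init_kwargs(version)). The keyword names idempotent, quiet, and skip_logging_configuration are the hl.init parameters used above. The helper only decides whether to include the last one.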


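Hail uses the Bokeh library to create plots. Bokeh's built-in show function does not work in Azure Databricks. To display a Bokeh plot generated by Hail, render it to HTML and pass the result to the notebook's displayHTML function:

from bokeh.embed import file_html
from bokeh.resources import CDN
# mt is assumed to be a Hail MatrixTable with a DP (read depth) entry field.
plot = hl.plot.histogram(mt.DP, range=(0, 30), bins=30, title='DP Histogram', legend='DP')
# Render the plot as a standalone HTML page that loads BokehJS from the CDN.
html = file_html(plot, CDN, "Chart")
displayHTML(html)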

See the Bokeh documentation for more information.

Limitations

  • When Hail support is enabled, your cluster uses Python 3.6, so notebooks written for other Python versions may not work.
  • When Hail support is enabled, fewer Python libraries are installed by default. You can still install additional libraries with the Libraries feature.
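A notebook written for another Python version may fail in confusing ways on a Hail cluster. A guard cell at the top of the notebook can make it fail fast instead. This is a minimal sketch. The helpers python_version_matches and require_python are hypothetical, and the expected version (3, 6) comes from the limitation above. The code avoids newer syntax so that it runs on Python 3.6 itself.

```python
import sys
from typing import Optional, Sequence


def python_version_matches(expected: Sequence[int] = (3, 6),
                           version_info: Optional[Sequence[int]] = None) -> bool:
    """Return True if major.minor of version_info (default: this interpreter) equals expected."""
    actual = tuple(version_info if version_info is not None else sys.version_info)[:2]
    return actual == tuple(expected)[:2]


def require_python(expected: Sequence[int] = (3, 6),
                   version_info: Optional[Sequence[int]] = None) -> None:
    """Raise RuntimeError when the interpreter's major.minor differs from expected."""
    if not python_version_matches(expected, version_info):
        actual = tuple(version_info if version_info is not None else sys.version_info)[:2]
        raise RuntimeError(
            "Notebook expects Python {}.{}, but the cluster runs Python {}.{}".format(
                expected[0], expected[1], actual[0], actual[1]))
```

The first cell of a notebook would then call require_python(), so a version mismatch surfaces immediately rather than partway through an analysis.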

After you’ve set up a Hail cluster, try out the Hail overview notebook.

Hail overview notebook
