Create a Hail cluster
To create a cluster with Hail installed:
Set the following environment variable:
This environment variable causes the cluster to launch with Hail 0.2, its dependencies, and Python 3.6 installed.
Use Hail in a notebook
For the most part, Hail 0.2 code runs in Azure Databricks exactly as described in the Hail documentation. However, a few modifications are necessary for the Azure Databricks environment.
When initializing Hail, pass in the pre-created SparkContext and mark the initialization as idempotent. This setting enables multiple Azure Databricks notebooks to use the same Hail context.
    import hail as hl
    hl.init(sc, idempotent=True)
Hail uses the Bokeh library to create plots. The show function built into Bokeh does not work in Azure Databricks. To display a Bokeh plot generated by Hail, you can run a command like:
    from bokeh.embed import components, file_html
    from bokeh.resources import CDN
    plot = hl.plot.histogram(mt.DP, range=(0,30), bins=30, title='DP Histogram', legend='DP')
    html = file_html(plot, CDN, "Chart")
    displayHTML(html)
See Bokeh in Python Notebooks for more information.
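If you display several Hail plots in one notebook, the render-and-display steps above can be wrapped in a small helper. The following is a minimal sketch, not part of Hail or Azure Databricks: the name `show_bokeh` and its `display` parameter are hypothetical, and the fallback assumes that `displayHTML` is available as a global, as it is in Azure Databricks notebooks.

```python
def show_bokeh(plot, title="Chart", display=None):
    """Render a Bokeh plot to standalone HTML and pass it to a display function.

    `show_bokeh` is a hypothetical helper name used for illustration.
    `display` defaults to the notebook's displayHTML function (assumed to
    exist as a global, as in Azure Databricks notebooks).
    """
    # Imported lazily so the helper can be defined before Bokeh is needed.
    from bokeh.embed import file_html
    from bokeh.resources import CDN

    # Build a self-contained HTML page that loads BokehJS from the CDN.
    html = file_html(plot, CDN, title)

    if display is None:
        # Raises NameError outside a notebook that defines displayHTML.
        display = displayHTML  # noqa: F821
    display(html)
    return html
```

In a notebook, a call might then look like `show_bokeh(hl.plot.histogram(mt.DP, range=(0,30), bins=30))`. Passing a different callable as `display` lets you capture the HTML elsewhere, for example to save it to a file.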
- When Hail support is enabled, your cluster uses Python 3.6, so notebooks written against different versions of Python may not work.
- When Hail support is enabled, fewer Python libraries are installed by default. You can still use the Libraries feature to install new libraries.
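Because a Hail-enabled cluster runs Python 3.6, it can help to check the interpreter version at the top of a notebook that was written elsewhere. This is a minimal sketch using only the standard library; the function name `python_matches` and the constant `EXPECTED_PYTHON` are hypothetical and chosen for illustration.

```python
import sys

# Assumption: the Hail-enabled cluster runs Python 3.6, as noted above.
EXPECTED_PYTHON = (3, 6)


def python_matches(expected=EXPECTED_PYTHON, actual=None):
    """Return True if the major.minor version of `actual` equals `expected`.

    `actual` defaults to the running interpreter's sys.version_info.
    """
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(expected)


if not python_matches():
    # Warn rather than fail, since many notebooks run unchanged across versions.
    print(
        "Warning: this notebook is running Python {}.{}, not {}.{}".format(
            sys.version_info[0], sys.version_info[1], *EXPECTED_PYTHON
        )
    )
```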
After you’ve set up a Hail cluster, try out the Hail overview notebook.
Hail overview notebook