Databricks Runtime for Genomics (Databricks Runtime Genomics) is a version of Databricks Runtime optimized for working with genomic and biomedical data. It is a component of the Azure Databricks Unified Analytics Platform for Genomics. For more information on developing genomics applications, see Genomics.
Databricks Runtime for Genomics is generally available (GA) beginning with version 6.0.
What’s in Databricks Runtime for Genomics?
- An optimized version of Glow, the open-source library from Databricks and Regeneron, with all of its functionality, including:
  - Spark SQL support for reading and writing variant data
  - Functions for common workflow elements
  - Optimizations for common query patterns
- Turn-key pipelines parallelized with Apache Spark
- Hail 0.2 integration
- Popular open-source libraries, optimized for performance and reliability:
  - ADAM v0.25.0
  - GATK
  - Hadoop-BAM v7.9.2
- Popular command-line tools:
  - samtools v1.9
- Reference data (GRCh37 or GRCh38, known SNP sites)
Requirements
Your Azure Databricks workspace must have Databricks Runtime for Genomics enabled.
Create a cluster using Databricks Runtime for Genomics
When you create a cluster, select a Databricks Runtime for Genomics version from the Databricks Runtime Version drop-down.
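If you create clusters through automation rather than the UI, the same choice amounts to picking a Genomics runtime key. The sketch below is a minimal illustration, assuming the response shape of the Clusters API `GET /api/2.0/clusters/spark-versions` endpoint (`{"versions": [{"key": ..., "name": ...}]}`) and that Genomics runtimes include "Genomics" in their display name. The helper names, cluster name, node type, and worker count are hypothetical, and the code only builds a request body for `POST /api/2.0/clusters/create` without sending it.

```python
import json
from typing import Any


def genomics_runtime_keys(spark_versions_response: dict[str, Any]) -> list[str]:
    """Return runtime keys whose display name mentions Genomics.

    Assumes the JSON shape of GET /api/2.0/clusters/spark-versions:
    {"versions": [{"key": "...", "name": "..."}, ...]}.
    """
    return [
        version["key"]
        for version in spark_versions_response.get("versions", [])
        # Assumption: Genomics runtimes carry "Genomics" in their display name.
        if "genomics" in version.get("name", "").lower()
    ]


def create_cluster_payload(runtime_key: str, node_type_id: str, num_workers: int) -> str:
    """Build a JSON body for POST /api/2.0/clusters/create.

    All values here are illustrative; "genomics-cluster" is a hypothetical name.
    """
    payload = {
        "cluster_name": "genomics-cluster",  # hypothetical cluster name
        "spark_version": runtime_key,        # the selected Genomics runtime key
        "node_type_id": node_type_id,        # e.g. an Azure VM size such as Standard_DS3_v2
        "num_workers": num_workers,
    }
    return json.dumps(payload)
```

For example, given a response listing a standard runtime and a Genomics runtime (with hypothetical keys), `genomics_runtime_keys` keeps only the Genomics key, which you can then pass to `create_cluster_payload`. Filtering on the display name keeps the sketch independent of the exact key format, which this section does not specify.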