Databricks Runtime 6.2 for Genomics
Databricks released this image in December 2019.
Databricks Runtime for Genomics (Databricks Runtime Genomics) is a version of Databricks Runtime 6.2 optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.
For more information, including instructions for creating a Databricks Runtime for Genomics cluster, see Databricks Runtime for Genomics. For more information on developing genomics applications, see Genomics.
Databricks Runtime 6.2 for Genomics is built on top of Databricks Runtime 6.2. For information on what’s new in Databricks Runtime 6.2, see the Databricks Runtime 6.2 release notes.
Firth logistic regression
User-defined sample quality control metrics
You can aggregate over genotypes for each sample in a DataFrame using aggregate_by_index. This function allows you to compute per-sample quality control (QC) metrics that are not included in the built-in QC functions.
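To make the idea of per-sample aggregation concrete, here is a minimal plain-Python sketch of folding a genotypes array by position. It is not the runtime's aggregate_by_index function, whose exact signature is not given in these notes. The helper name `aggregate_by_index_sketch`, its parameters, and the toy `depth` field are all assumptions made for illustration.

```python
def aggregate_by_index_sketch(rows, initial_state, update, evaluate=lambda state: state):
    """Fold element i of every row's array into one state per index (one per sample).

    Hypothetical helper for illustration only; it mimics the concept of a
    per-index aggregation and is not the Databricks implementation.
    """
    states = None
    for genotypes in rows:
        if states is None:
            # One running state per sample position in the array.
            states = [initial_state] * len(genotypes)
        if len(genotypes) != len(states):
            raise ValueError("every row must hold one genotype per sample")
        states = [update(state, genotype) for state, genotype in zip(states, genotypes)]
    return [evaluate(state) for state in (states or [])]


# Toy data (made up): three variants (rows), two samples (array positions).
variants = [
    [{"depth": 10}, {"depth": 30}],
    [{"depth": 20}, {"depth": 30}],
    [{"depth": 30}, {"depth": 30}],
]

# Custom metric: mean read depth per sample, tracked as a (sum, count) pair.
mean_depth = aggregate_by_index_sketch(
    variants,
    (0, 0),
    lambda state, genotype: (state[0] + genotype["depth"], state[1] + 1),
    lambda state: state[0] / state[1],
)
print(mean_depth)
```

The same pattern covers other per-sample metrics: change the initial state, the update step, and the final evaluation, and leave the traversal untouched.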
Pipe transformer performance
The overhead of the pipe transformer has been reduced by roughly half. This speedup means that you can use Databricks Runtime for Genomics to parallelize command-line tools without sacrificing per-core efficiency.
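As a rough illustration of what parallelizing a command-line tool involves, the sketch below sends each partition of text records to a subprocess over stdin and collects its stdout. It uses only the Python standard library and does not use the pipe transformer itself. The helper names `pipe_partition` and `pipe_all`, the toy records, and the stand-in uppercase command are assumptions for the example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def pipe_partition(lines, cmd):
    """Send one partition's lines to a command's stdin and return its stdout lines."""
    result = subprocess.run(
        cmd,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the tool exits with a non-zero status
    )
    return result.stdout.splitlines()


def pipe_all(partitions, cmd, max_workers=4):
    """Run the command once per partition, several partitions at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda part: pipe_partition(part, cmd), partitions))


# Stand-in for a real command-line tool: a Python one-liner that upper-cases stdin.
UPPERCASE_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# Toy tab-separated records (made up), split into two partitions.
partitions = [["chr1\t100\ta", "chr1\t200\tc"], ["chr2\t50\tg"]]
piped = pipe_all(partitions, UPPERCASE_CMD)
print(piped)
```

Each partition pays the cost of starting a process and of serializing its records to and from text. That cost is the kind of per-partition overhead a pipe-style integration has to keep small.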
Joint genotyping robustness
The joint genotyping pipeline provided in Databricks Runtime 6.2 for Genomics handles sample manifests with thousands of entries more efficiently. In addition, the pipeline now handles missing gVCF blocks gracefully by inserting explicit no-calls.
Simplified integration with LOFTEE
The VEP annotation pipeline included in Databricks Runtime for Genomics provides streamlined integration with LOFTEE.
Databricks Runtime 6.2 for Genomics includes Hail 0.26.0.
Samtools 1.9 is now installed.