Databricks Runtime 6.3 for Genomics (Unsupported)

Databricks released this image in January 2020.

Databricks Runtime for Genomics (Databricks Runtime Genomics) is a variant of Databricks Runtime 6.3 (Unsupported) optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.

For more information, including instructions for creating a Databricks Runtime for Genomics cluster, see Databricks Runtime for Genomics (Deprecated). For more information on developing genomics applications, see the Genomics guide.

New features

Databricks Runtime 6.3 for Genomics is built on top of Databricks Runtime 6.3. For information on what's new in Databricks Runtime 6.3, see the Databricks Runtime 6.3 (Unsupported) release notes.

Joint genotyping pipeline from Delta

The joint genotyping pipeline in Databricks Runtime 6.3 for Genomics can now take Delta tables written by the DNASeq pipeline as input. This lets you use the two pipelines together without exporting results to gVCFs.

Automatic annotation parsing when reading VCFs

The version of Glow included in Databricks Runtime 6.3 for Genomics automatically parses the CSQ and ANN INFO fields when reading VCFs. The INFO_CSQ and INFO_ANN fields in the resulting DataFrames now have structured schemas for simplified querying.
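
To illustrate what a structured annotation schema provides, the following is a minimal pure-Python sketch of splitting a VEP-style CSQ value into named fields. It is not Glow's implementation and does not use the Glow API. The function name parse_csq, the sample header description, and the sample value are all hypothetical. The sketch assumes the usual VEP convention: field order is listed after "Format:" in the header description, annotation entries are separated by commas, fields by "|", and multiple consequences by "&".

```python
def parse_csq(header_description, csq_value):
    """Split a VEP-style CSQ INFO value into one dict per annotation entry.

    header_description: the Description text of the CSQ header line, which
    lists field names after "Format: " separated by "|".
    csq_value: the raw CSQ value from one VCF record.
    """
    # Field names come from the header, e.g. "... Format: Allele|Consequence|..."
    field_names = header_description.split("Format: ", 1)[1].strip().split("|")

    records = []
    # Each comma-separated entry annotates one allele/transcript pair.
    for entry in csq_value.split(","):
        values = entry.split("|")
        record = dict(zip(field_names, values))
        # A single entry may carry several consequences joined by "&".
        if "Consequence" in record:
            record["Consequence"] = record["Consequence"].split("&")
        records.append(record)
    return records


# Hypothetical header description and CSQ value, for illustration only.
description = (
    "Consequence annotations from Ensembl VEP. "
    "Format: Allele|Consequence|IMPACT|SYMBOL"
)
value = (
    "T|missense_variant&splice_region_variant|MODERATE|BRCA1,"
    "T|intron_variant|MODIFIER|NBR2"
)

annotations = parse_csq(description, value)
for annotation in annotations:
    print(annotation["SYMBOL"], annotation["IMPACT"], annotation["Consequence"])
```

With fields already split out this way, a query can filter on a named field such as IMPACT or SYMBOL directly instead of applying string operations to the raw annotation.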

Improvements

Improved multiallelic variant splitter

The multiallelic variant splitter in Glow and Databricks Runtime for Genomics now handles more complex types of multiallelic sites. The new behavior mirrors the vt decompose command line tool. In addition, you can now use the splitter as a standalone transformer by calling glow.transform('split_multiallelics'....
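
As a rough illustration of what splitting a multiallelic site means, the pure-Python sketch below turns one record with several ALT alleles into one biallelic record per ALT allele. It is not Glow's or vt's implementation. The function name split_multiallelic, the dict layout, and the sample record are hypothetical. The sketch covers only the ALT alleles and a per-ALT INFO field (an allele count, Number=A), and leaves out genotype handling and allele trimming.

```python
def split_multiallelic(record):
    """Split one multiallelic record into biallelic records, one per ALT allele.

    record: dict with "contig", "pos", "ref", "alts" (list of ALT alleles) and
    "info_ac" (one allele count per ALT allele, i.e. a Number=A INFO field).
    """
    # Describe the original site so each output row can be traced back to it,
    # in the spirit of the OLD_MULTIALLELIC tag written by vt decompose.
    origin = "{}:{}:{}/{}".format(
        record["contig"], record["pos"], record["ref"], "/".join(record["alts"])
    )

    biallelic = []
    for index, alt in enumerate(record["alts"]):
        biallelic.append({
            "contig": record["contig"],
            "pos": record["pos"],
            "ref": record["ref"],
            "alts": [alt],
            # Per-ALT INFO values are subset to the allele being kept.
            "info_ac": [record["info_ac"][index]],
            "old_multiallelic": origin,
        })
    return biallelic


# Hypothetical multiallelic site with two ALT alleles.
site = {"contig": "1", "pos": 1000, "ref": "TA", "alts": ["T", "TAA"], "info_ac": [3, 7]}

parts = split_multiallelic(site)
for part in parts:
    print(part["ref"], part["alts"], part["info_ac"], part["old_multiallelic"])
```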

Faster linear and logistic regression functions

The logistic_regression_gwas function in Databricks Runtime 6.3 for Genomics is about 60% faster than the version in 6.2. The linear_regression_gwas function is about 50% faster.