Databricks Runtime 7.0 for Genomics (Unsupported)

Databricks released this image in June 2020.

Databricks Runtime 7.0 for Genomics is a version of Databricks Runtime 7.0 (Unsupported) optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.

For more information, including instructions for creating a Databricks Runtime for Genomics cluster, see Databricks Runtime for Genomics (Deprecated). For more information on developing genomics applications, see the Genomics guide.

New features

Databricks Runtime 7.0 for Genomics is built on top of Databricks Runtime 7.0. For information on what's new in Databricks Runtime 7.0, see the Databricks Runtime 7.0 (Unsupported) release notes.

GloWGR: Whole genome regression

Glow now includes a scalable whole genome regression method, GloWGR. GloWGR is a distributed version of the single-node tool regenie. It is an enterprise-ready tool that matches the accuracy of other whole genome regression methods while running about an order of magnitude faster. For details, see whole genome regression in the open source Glow documentation.

Transformers accept non-string typed arguments

All Glow transformers, including the pipe transformer and variant normalizer, now accept arguments whose values are not strings. The Glow documentation for the pipe transformer reflects the new usage. For backwards compatibility, string values are still accepted for all arguments.

NumPy ndarray literals

You can now pass literal NumPy 1D and 2D float-typed ndarrays to functions that expect DataFrame columns of type array<double> and DenseMatrix, respectively. The Glow genome-wide association study documentation demonstrates the new usage.

Mean substitution function

Glow now provides a mean_substitute function that replaces missing values in an array with the mean of the non-missing values.
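The idea behind mean substitution can be illustrated with a minimal pure-Python sketch. This is not Glow's implementation and does not call Glow: the helper name `mean_substitute_sketch`, the `-1.0` sentinel for missing values, and the choice to raise an error when every value is missing are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def mean_substitute_sketch(
    values: Sequence[Optional[float]],
    missing_value: float = -1.0,  # assumed sentinel, for illustration only
) -> List[float]:
    """Replace missing entries with the mean of the non-missing entries."""

    def is_missing(v: Optional[float]) -> bool:
        # Treat None, NaN, and the sentinel as missing (an assumption).
        return v is None or math.isnan(v) or v == missing_value

    observed = [float(v) for v in values if not is_missing(v)]
    if not observed:
        # Sketch-only choice: there is no mean to substitute.
        raise ValueError("all values are missing")

    mean = sum(observed) / len(observed)
    return [mean if is_missing(v) else float(v) for v in values]


# Hypothetical dosage values with one missing call encoded as -1.0.
dosages = [0.0, 2.0, -1.0, 1.0]
filled = mean_substitute_sketch(dosages)
```

Here the missing entry is replaced by the mean of 0.0, 2.0, and 1.0. Consult the Glow documentation for the actual function signature and its definition of a missing value.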

Improvements

Joint genotyping performance

The performance of the joint genotyping pipeline has improved by 5-20%. The improvement is particularly pronounced when using cluster node types with many cores per node.

VCF reader ignores tabix index files

In previous releases, the VCF reader could fail when reading a directory of VCF files if the directory contained tabix index files. The reader would attempt to interpret the tabix files as VCF files and report an error. Now, the reader only uses index files to determine which data files to read.
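As a rough illustration of the distinction the reader now makes, the sketch below separates data files from index files in a directory listing so that index files are never parsed as VCF. It is not the reader's actual logic: the function name `partition_vcf_directory` and the suffix lists are assumptions (tabix indexes conventionally end in `.tbi`).

```python
from typing import Iterable, List, Tuple

# Assumed suffixes, for illustration only.
INDEX_SUFFIXES = (".tbi",)
DATA_SUFFIXES = (".vcf", ".vcf.gz", ".vcf.bgz")


def partition_vcf_directory(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split a directory listing into (data_files, index_files).

    Files matching neither suffix list are skipped.
    """
    data_files: List[str] = []
    index_files: List[str] = []
    for name in sorted(names):
        lower = name.lower()
        if lower.endswith(INDEX_SUFFIXES):
            # Index files are set aside rather than parsed as VCF.
            index_files.append(name)
        elif lower.endswith(DATA_SUFFIXES):
            data_files.append(name)
    return data_files, index_files


# Hypothetical directory listing.
listing = ["chr1.vcf.gz", "chr1.vcf.gz.tbi", "chr2.vcf", "README.txt"]
data_files, index_files = partition_vcf_directory(listing)
```

Only the entries in `data_files` would be handed to a VCF parser in this sketch. The index suffix is checked first because a name such as `chr1.vcf.gz.tbi` also contains a data suffix.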

Removed splitToBiallelic option from VCF reader

This option has been removed in favor of the split_multiallelics transformer. The transformer is faster and more accurate than the VCF reader option.

Libraries

The following sections list the libraries included in Databricks Runtime 7.0 for Genomics that differ from those included in Databricks Runtime 7.0.

Upgraded libraries

  • ADAM: 0.30.0 to 0.32.0

Removed libraries

Hail is not included in Databricks Runtime 7.0 for Genomics because there is no Hail release based on Apache Spark 3.0.

Packaged libraries

Library      Version
ADAM         0.32.0
GATK         4.1.4.1
Hadoop-bam   7.9.2
samtools     1.9
VEP          96