Variant Annotation using Pipe Transformer

Any annotation method can be applied to variant data using Glow’s Pipe Transformer.

For example, VEP annotation is performed by downloading the annotation data sources (the cache) to each node in the cluster and then calling the VEP command line script through the Pipe Transformer, using a script similar to the following cell.

import glow
import json

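# Load the input variants from a VCF file into a DataFrame.
input_vcf = "/databricks-datasets/hail/data-001/1kg_sample.vcf.bgz"
input_df = spark.read.format("vcf").load(input_vcf)
# The command is passed to the Pipe Transformer as a JSON-encoded list of arguments.
cmd = json.dumps([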
  "/opt/vep/src/ensembl-vep/vep",
  "--dir_cache", "/mnt/dbnucleus/dbgenomics/grch37_merged_vep_96",
  "--fasta", "/mnt/dbnucleus/dbgenomics/grch37_merged_vep_96/data/human_g1k_v37.fa",
  "--assembly", "GRCh37",
  "--format", "vcf",
  "--output_file", "STDOUT",
  "--no_stats",
  "--cache",
  "--offline",
  "--vcf",
  "--merged"])
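# Pipe the variants through VEP as VCF text and parse the annotated VCF that VEP writes to stdout.
output_df = glow.transform(
  "pipe",
  input_df,
  cmd=cmd,
  input_formatter='vcf',
  in_vcf_header=input_vcf,  # take the VCF header from the input file
  output_formatter='vcf')
# Save the annotated variants as a Delta table.
output_df.write.format("delta").save("dbfs:/mnt/vep-pipe")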
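
The cmd value in the cell above is a JSON-encoded list of command line arguments. If the same VEP invocation is reused with different cache locations, it can help to build that list in one place. The following is a minimal sketch, not part of Glow or VEP: the helper name build_vep_cmd and its parameters are hypothetical, and it only reuses the flags and paths already shown above.

```python
import json

def build_vep_cmd(vep_script, cache_dir, fasta, assembly="GRCh37"):
    """Return a JSON-encoded argument list suitable for the Pipe Transformer's cmd option.

    Hypothetical helper for illustration; it only assembles the flags used in the cell above.
    """
    args = [
        vep_script,
        "--dir_cache", cache_dir,
        "--fasta", fasta,
        "--assembly", assembly,
        "--format", "vcf",
        "--output_file", "STDOUT",  # VEP must write to stdout so the output can be piped back
        "--no_stats",
        "--cache",
        "--offline",
        "--vcf",
        "--merged",
    ]
    return json.dumps(args)

# Same paths as in the cell above.
cache_dir = "/mnt/dbnucleus/dbgenomics/grch37_merged_vep_96"
cmd = build_vep_cmd(
    "/opt/vep/src/ensembl-vep/vep",
    cache_dir,
    cache_dir + "/data/human_g1k_v37.fa",
)
print(cmd)
```

The resulting string can be passed as cmd=cmd to glow.transform exactly as in the cell above.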