NCBI BLAST

Version: 1.0

Description

This sample demonstrates how to run a parametric sweep application that matches input nucleotide sequences against the human genome. The sample also demonstrates how to submit a job using the HPC Job Scheduler’s REST API, how to upload files to and download files from Windows Azure blob storage, and how to write to and read from Windows Azure table storage.

Overview

The National Center for Biotechnology Information’s Basic Local Alignment Search Tool (NCBI BLAST) HPC Sample demonstrates how to run a nucleotide match algorithm on the human genome using an HPC parametric sweep application.

The parametric sweep application uses a set of input files that contain sequences of nucleotides, comparing them to the human genome database. The application creates output files containing sequence similarities and uploads these files to a BLAST output visualizer (BOV) website.

To run the nucleotide match, the sample uses the blastn utility, which is a part of the BLAST+ application.
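
As a rough illustration of how a single sweep index could map to a blastn invocation, the following Python sketch builds the argument list for one index. The -query, -db, and -out options are standard blastn options; the directory layout, the file naming scheme, and the database name human_genomic are assumptions made for this example and are not taken from the sample itself.

```python
def build_blastn_command(sweep_index, input_dir="input",
                         db_path="db/human_genomic", output_dir="output"):
    """Return the blastn argument list for one sweep index.

    The file naming scheme and default paths are hypothetical.
    """
    query_file = f"{input_dir}/input_{sweep_index}.fasta"
    result_file = f"{output_dir}/output_{sweep_index}.txt"
    return [
        "blastn",
        "-query", query_file,   # file holding the nucleotide sequence to match
        "-db", db_path,         # BLAST database name (path without extension)
        "-out", result_file,    # file that receives the reported matches
    ]


if __name__ == "__main__":
    print(" ".join(build_blastn_command(3)))
```

On a node where BLAST+ and the database are installed, the returned list could be passed to subprocess.run(cmd, check=True).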

The architecture of the sample and the steps of its execution are described in Figure 1:

Figure 1. Architecture of the BLAST sample

  1. The client application submits a parametric sweep job to the HPC Job Scheduler using the representational state transfer (REST) interface of the HPC Pack web features.
  2. The Windows HPC Server 2008 R2 SP2 cluster submits the job to the Windows Azure nodes.
  3. Each instance of the parametric sweep application downloads an input file from Windows Azure blob storage. The input file contains a nucleotide sequence that is compared to the human genome database previously downloaded to each Windows Azure compute node.
  4. After completing a sweep index, the BLAST application uploads the resulting matches file to the BLAST output visualizer (BOV) website and receives the URL of the file’s visualization page.
  5. The output file and the URL are written to Windows Azure storage: the file is uploaded to a blob, and the URL is written to a table.
  6. While the parametric sweep job is running, the client application retrieves the list of URLs from Windows Azure table storage and shows it to the user.
  7. The user can select any of the URLs to see the rendered image for the nucleotide match.

Note:
To use this sample application, you need to download the compressed human genome database from the NCBI FTP server, extract the database, and copy it to Windows Azure blob storage, as described later in this document.
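
The data flow in steps 3 through 6 above can be sketched in a few lines of Python. This is a simplified model and not the sample's implementation: FakeStorage is an in-memory stand-in for Windows Azure blob and table storage, run_blast stands in for the blastn invocation, and upload_to_visualizer returns a made-up URL in place of a real BOV upload. All of these names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStorage:
    """In-memory stand-in for Windows Azure blob and table storage."""
    blobs: dict = field(default_factory=dict)   # blob name -> bytes
    table: list = field(default_factory=list)   # rows of {"index", "url"}


def run_blast(query: bytes) -> bytes:
    """Placeholder for running blastn against the local genome database."""
    return b"matches for " + query


def upload_to_visualizer(index: int, result: bytes) -> str:
    """Placeholder for the BOV upload; returns an invented URL."""
    return f"https://bov.example.invalid/results/{index}"


def process_sweep_index(index: int, storage: FakeStorage) -> str:
    query = storage.blobs[f"input/{index}"]              # step 3: download input file
    result = run_blast(query)                            # step 3: compare to the genome
    url = upload_to_visualizer(index, result)            # step 4: get visualization URL
    storage.blobs[f"output/{index}"] = result            # step 5: output file to a blob
    storage.table.append({"index": index, "url": url})   # step 5: URL to a table row
    return url


def list_result_urls(storage: FakeStorage) -> list:
    """Step 6: what the client reads back from table storage."""
    return [row["url"] for row in storage.table]


storage = FakeStorage(blobs={"input/1": b"ACGT", "input/2": b"TTGA"})
for sweep_index in (1, 2):
    process_sweep_index(sweep_index, storage)
print(list_result_urls(storage))
```

Because each sweep index reads and writes only its own blob names and appends its own table row, the indexes can run independently on different nodes, which is what makes the workload suitable for a parametric sweep.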

This sample demonstrates some of the new features offered by Windows HPC Server 2008 R2 Service Pack 2 (SP2). Refer to the What's New in Windows HPC Server 2008 R2 Service Pack 2 article on TechNet for the complete list of new features offered in this version.

Key Features

This sample demonstrates the following features:

  • Uploading a parametric sweep application package to Windows Azure nodes.
  • Uploading files to and downloading files from Windows Azure blob storage.
  • Writing to and reading from Windows Azure table storage.
  • Creating and running a parametric sweep job in a Windows HPC Server 2008 R2 SP2 cluster.
  • Controlling jobs with the REST interface of the HPC Pack web features.
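
To make the idea of a parametric sweep concrete, the sketch below expands one command template into a command line per sweep index, replacing the asterisk placeholder with the index. It is a simplified local model of the expansion, assuming a single asterisk placeholder and an inclusive end value; the function name and the file names in the template are hypothetical, and the code does not call the HPC Job Scheduler.

```python
def expand_sweep(command_template, start, end, increment=1):
    """Return (index, command line) pairs for an inclusive index range."""
    if increment < 1:
        raise ValueError("increment must be a positive integer")
    return [
        (index, command_template.replace("*", str(index)))
        for index in range(start, end + 1, increment)
    ]


if __name__ == "__main__":
    template = "blastn -query input_*.fasta -db human_genomic -out output_*.txt"
    for index, command in expand_sweep(template, 1, 3):
        print(index, command)
```

Each generated command line is independent of the others, so the scheduler is free to run them in any order and on any available node.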