Science Grid This Week
July 13, 2005
GADU/GNARE

[Image courtesy of GADU/GNARE]
A powerful approach to interpreting newly sequenced genomes is comparative analysis against all annotated sequences in publicly available resources. The largest sequence database at the National Center for Biotechnology Information currently contains 2.4 million protein sequences. Using multiple bioinformatics algorithms markedly increases the precision of genetic sequence analysis and of the assignment of function to genes. GADU (Genome Analysis and Databases Update), the analysis module of GNARE from the Mathematics and Computer Science Division at Argonne National Laboratory, pre-computes analysis results for every sequence, finding protein similarities (BLAST), protein family domains (BLOCKS), and structural characteristics. Grid resources are used to run the resulting millions of processes, a task that must be repeated frequently because the amount of data is growing exponentially.

GADU periodically searches DNA and protein databases for new and updated genomes, then computes and publishes derived values. Analyzing a single bacterial genome of 4,000 sequences with three bioinformatics tools (BLAST, PFAM, and BLOCKS) requires 12,000 steps, each taking on the order of 30 seconds of run time, or roughly 100 hours of compute time if run one after another. GADU can perform these tasks in a timely fashion only because it has access to distributed resources provided by two U.S. national-scale infrastructures, TeraGrid and Open Science Grid.
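The figures quoted above can be turned into a rough back-of-the-envelope estimate. The sketch below is only an illustration, not GADU's actual scheduling code: the function name `estimate_workload` and the worker count of 200 are hypothetical, and it assumes every step takes exactly 30 seconds and that steps are spread perfectly evenly across workers with no queueing or data-transfer overhead.

```python
import math


def estimate_workload(num_sequences, tools, seconds_per_step, num_workers):
    """Return (total_steps, serial_hours, ideal_parallel_hours).

    Assumes one independent step per (sequence, tool) pair and an
    idealized grid with no scheduling or transfer overhead.
    """
    total_steps = num_sequences * len(tools)
    serial_seconds = total_steps * seconds_per_step
    # Each worker runs its share of the steps back to back.
    steps_per_worker = math.ceil(total_steps / num_workers)
    parallel_seconds = steps_per_worker * seconds_per_step
    return total_steps, serial_seconds / 3600, parallel_seconds / 3600


# Figures from the article: 4,000 sequences, three tools, ~30 s per step.
TOOLS = ("BLAST", "PFAM", "BLOCKS")
# The worker count is an assumption chosen purely for illustration.
steps, serial_h, parallel_h = estimate_workload(4000, TOOLS, 30, 200)

print(f"steps: {steps}")
print(f"serial run time: {serial_h:.1f} h")
print(f"ideal run time on 200 workers: {parallel_h:.1f} h")
```

Under these assumptions a single bacterial genome drops from about 100 hours of serial compute to about half an hour of wall-clock time. Real grid runs would be slower because of queue waits and data staging.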

—Dinanath Sulakhe