
A PUMA2 protein sequence analysis.
Biologist Natalia Maltsev and a team at Argonne National Laboratory and the University of Chicago use grid computing to help researchers solve the mysteries of life. Their Genome Analysis and Database Update system (GADU) provides the core for several bioinformatics applications that search for similarities and differences among thousands of genome and protein sequences and metabolic pathways.
"Bioinformatics is the science of big numbers," says Maltsev. "Most of the scientific insight comes from the comparison of what is unknown to what is known. To understand similarities between organisms, you need to integrate huge amounts of data using algorithms, which means you need a high-throughput computational backend."
GADU provides such a backend, drawing on computational resources from the TeraGrid and the Open Science Grid. Applications such as GNARE and PUMA2, used by more than 2,400 researchers worldwide, present the results of analyses of thousands of genome and protein sequences and let scientists compare their own genomes against databases of millions of sequences. The applications support research on topics such as bioremediation, the use of microorganisms to clean up pollution.

Section of a metabolic pathway.
"We are analyzing the organisms that live under the Hanford site in Washington, where there are leaking tanks of radioactive waste," explains Maltsev. "Certain microorganisms live in boiling nitric acid laced with very high levels of radiation and chromium pollution, and we are analyzing these organisms' genomes to figure out what makes them resistant to all these pollutants."
Researchers can't obtain complete genomes of individual pollution-resistant microorganisms. Instead, they collect all the DNA they can from the environment and use GADU to compare chunks of the collected DNA against millions of sequences available in public databases.
"We are trying to predict what kind of organisms live in such a polluted environment by doing evolutionary analysis of sequences provided to us by the researchers and determining the functions of genes encoded by the chunks of DNA," adds Maltsev. "We are hoping to predict the physiology of the organisms and to see where they fit on the evolutionary tree. We can't do that without high-throughput genetic sequence analysis, can't do analysis without GADU, and can't do GADU without the grid."
Learn more at the ANL Computational Biology Group's Web site.
—Katie Yurkewicz