Science Grid This Week
October 26, 2005
SAMGrid Lifts Job and Data Handling Burden

The SAM team. (Bottom row, from left: Sudhamsh Reddy, Valeria Bartsch, Parag Mhashilkar, Krzysztof Genser, Andrew Baranovski, Steve Sherwood. Top row: Lee Lueking, Stephen White, Adam Lyon, Gabriele Garzoglio. Not pictured: Randolph Herber, Dehong Zhang, Wyatt Merritt, Rick St. Denis, Sinisa Veseli, Lauri Carpenter and Vicky White.)
For scientists whose research includes working with large amounts of data, writing a grid computing application can be a daunting prospect. SAMGrid makes this task a little easier by integrating SAM, a robust data handling system in use by the particle physics experiments DZero, CDF and MINOS, with grid technology.

"SAMGrid integrates job handling and data handling with standard grid tools and services," said Adam Lyon from the Fermi National Accelerator Laboratory, where SAM and SAMGrid are developed and used. "In many grids it's the responsibility of the application to find, transport and annotate data. With SAMGrid, the application delegates SAM to find the best location to get the file from, transport it, record that you've asked for it and what you are doing with it."

SAM, which stands for Sequential Access via Metadata, began development in 1997 as a data handling system for Fermilab's DZero particle physics experiment. It was designed to store and retrieve data files and their associated metadata (information about the data), including a complete processing record for every file.

"You can find out who has used a certain file, and how, since the beginning of time," said Fermilab's Gabriele Garzoglio. "This is important for particle physicists who may need to know exactly how the data was processed so that they can reproduce results."
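To make the idea of a per-file usage record concrete, here is a minimal sketch in Python. It is not SAM's actual design or API: the class `ToyCatalog`, its methods, and the metadata keys are all hypothetical names chosen for illustration. It only shows how a catalog might keep descriptive metadata alongside a history of who used each file and how.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileRecord:
    """One catalogued file. All field names here are hypothetical."""
    name: str
    metadata: Dict[str, str]
    history: List[Dict[str, str]] = field(default_factory=list)


class ToyCatalog:
    """In-memory stand-in for a metadata catalog (an assumption, not SAM)."""

    def __init__(self) -> None:
        self._files: Dict[str, FileRecord] = {}

    def declare(self, name: str, metadata: Dict[str, str]) -> None:
        # Register a file together with its descriptive metadata.
        self._files[name] = FileRecord(name, dict(metadata))

    def record_use(self, name: str, user: str, application: str) -> None:
        # Append one entry to the file's usage history.
        self._files[name].history.append(
            {"user": user, "application": application}
        )

    def who_used(self, name: str) -> List[str]:
        # Users who touched the file, in the order the uses were recorded.
        return [entry["user"] for entry in self._files[name].history]

    def find(self, **criteria: str) -> List[str]:
        # Names of files whose metadata matches every given key/value pair.
        return sorted(
            name
            for name, rec in self._files.items()
            if all(rec.metadata.get(k) == v for k, v in criteria.items())
        )
```

A caller would declare files as they are stored, call `record_use` each time a job consumes one, and later query `who_used` or `find` to reconstruct how a result was produced.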

SAMGrid Architecture
In 2001, the SAM team started exploring ways of linking SAM's data management with job management on the grid. The vision of SAMGrid was for a user to submit a job to the grid where it would be sent to the best available computer cluster, with SAM automatically finding and transferring the necessary data. SAMGrid now interfaces with almost any storage or computing resource configuration, and its flexible job dispatch system allows users to run on local or remote clusters.
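The dispatch idea can be sketched in a few lines of Python. The article does not say how SAMGrid decides which cluster is "best", so the ranking used here (most needed files already on site, then most free job slots) is purely an assumption, and `Cluster`, `choose_cluster`, and `plan_job` are hypothetical names rather than SAMGrid interfaces.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class Cluster:
    """A computing site as seen by the toy dispatcher (hypothetical)."""
    name: str
    free_slots: int
    cached_files: Set[str]


def choose_cluster(clusters: Iterable[Cluster], needed: Set[str]) -> Cluster:
    # Only clusters with at least one free slot can accept the job.
    candidates = [c for c in clusters if c.free_slots > 0]
    if not candidates:
        raise RuntimeError("no cluster has free slots")
    # Assumed ranking: prefer the site already holding the most needed
    # files, and break ties by the number of free slots.
    return max(
        candidates,
        key=lambda c: (len(needed & c.cached_files), c.free_slots),
    )


def plan_job(
    clusters: Iterable[Cluster], needed_files: Iterable[str]
) -> Tuple[str, List[str]]:
    # Pick a cluster, then list the files that must still be transferred.
    needed = set(needed_files)
    target = choose_cluster(clusters, needed)
    to_transfer = sorted(needed - target.cached_files)
    return target.name, to_transfer
```

The point of the sketch is the division of labor: the user names the job and its input files, and the system decides both where the job runs and which data has to move.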

When the DZero experiment started using SAMGrid in 2004 to simulate millions of particle physics events in the DZero detector, computing sites around the world installed the software for job submission and data handling. For the past six months, SAMGrid has been used to reprocess 250 terabytes of data collected by the DZero experiment, a task that would have taken many years using only the Fermilab computing facilities.

Lyon, Garzoglio and other SAMGrid developers are now integrating the system with other major grids, and introducing more standard grid tools into the SAM data handling system. In the future they also plan to adapt SAMGrid to use grid storage resources opportunistically and to make the whole system modular.

Learn more at the SAMGrid Web site.

—Katie Yurkewicz