
Members of the L-Store team at Vanderbilt University.
The flood of data generated by scientific simulation and experimentation poses many challenges for researchers, including deciding how and where to store the data, and how to securely transfer it among worldwide collaborations. A team of computer scientists from Vanderbilt University is helping scientists meet these challenges with L-Store, a new system for distributed storage.
"When you visit a Web site, you don't think about the fact that the images come from one place and the ads from another," says Alan Tackett, technical director at Vanderbilt's Advanced Computing Center for Research and Education. "That's the way we feel data storage should work."
When complete, L-Store will provide a scalable, secure way for data to be stored on distributed systems. Users will install L-Store on their resources—anything from a single laptop to a multi-site computing grid—and save their data to the same "cloud" structure no matter how large the amount of data or storage becomes. The problem of accessing such distributed data files is solved using a new technology for sending data over networks called the Internet Backplane Protocol. IBP, developed at the University of Tennessee, Knoxville, enables the efficient movement of large data sets over the Internet by breaking up each file into chunks and transferring the fragments simultaneously rather than sequentially.
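The idea of breaking a file into chunks and moving the fragments simultaneously can be illustrated with a short sketch. This is not IBP or L-Store code: the function names, the tiny chunk size, and the `send_fragment` stand-in (which simply returns its input instead of touching a network) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (offset, fragment) pairs that together cover the whole input."""
    return [(i, data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def send_fragment(item):
    """Hypothetical stand-in for transferring one fragment over the network.

    A real transfer would send the bytes along some path; here the fragment
    is handed straight back so the sketch stays self-contained.
    """
    offset, fragment = item
    return offset, fragment


def parallel_transfer(data: bytes, workers: int = 5) -> bytes:
    """Move all fragments concurrently, then reassemble them in order."""
    chunks = split_into_chunks(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        received = list(pool.map(send_fragment, chunks))
    # Sort by offset so reassembly does not rely on arrival order.
    received.sort(key=lambda pair: pair[0])
    return b"".join(fragment for _, fragment in received)
```

Each fragment carries its offset, so the receiving side can put the pieces back together regardless of which path finished first.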
"People think of the Internet as a point-to-point system like airline traffic," explains Tackett. "But it's really more like the highway system. In some places there are four-lane highways, some places go at 35 miles per hour, and there can be traffic jams. L-Store and IBP treat the Internet like the highway system—it might be faster to fan a data transfer out to five different paths and bring it together on the back end."
In addition to distributing the storage of data files, L-Store will also distribute the metadata describing those data files. Metadata is data about data, and is the type of information listed when viewing the contents of a directory. While many scientists generate small numbers of large files, making their metadata needs rather simple, more and more scientists are generating very large numbers of small files. These scientists have metadata storage needs that are even larger than their data storage needs, making distributed metadata storage an attractive option.
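For readers unfamiliar with the term, the following sketch shows the kind of metadata a directory listing exposes, using only the Python standard library. It has nothing to do with how L-Store stores metadata; the function name and the record fields are assumptions chosen for illustration.

```python
import os
from datetime import datetime, timezone


def list_metadata(directory: str):
    """Collect basic metadata (name, size, modification time) for each entry."""
    records = []
    with os.scandir(directory) as entries:
        for entry in entries:
            info = entry.stat()
            records.append({
                "name": entry.name,
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                "is_directory": entry.is_dir(),
            })
    # Sort by name so the output is stable across platforms.
    return sorted(records, key=lambda record: record["name"])
```

Every file, however small, contributes one such record, which is why a very large collection of small files can make the metadata a significant storage burden in its own right.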
The L-Store project is still in its early stages, but the team already demonstrated its first capabilities at Supercomputing 2005. The developers have implemented distributed file storage and are now working on distributing the metadata, with an initial release scheduled for the end of this month. The team has also submitted a proposal for a Research and Education Data Depot Network (REDDnet), which would set up a prototype distributed storage facility using L-Store and the advanced network technology of the UltraLight project. REDDnet will have more than 320 terabytes of storage capacity and will be used to benchmark the L-Store system.
Learn more at the ACCRE Web site.
—Katie Yurkewicz