Science Grid This Week
July 26, 2006
Corralling Grid Power
Ed Walker

Ed Walker has dedicated his professional career to harnessing the power of distributed computers. From Singapore's National Supercomputing Research Center, to Platform Computing in Toronto, and now at the Texas Advanced Computing Center in Austin, he's helped people get the computing power they need from computers across the room or around the world.

Today at TACC, Walker works closely with users of the TeraGrid who need to run large numbers of grid jobs on multiple machines. The architect of GridShell and MyCluster, he focuses on creating virtual environments that submit jobs to the grid without involving users in the details of job management.

"It enhances the user's work experience without changing their work environment," adds Walker.

With GridShell, Walker developed a shell environment that spawns agents to execute grid jobs on a user's behalf. To users, it looks as if they are interacting with their local desktop. MyCluster uses GridShell to create a virtual cluster environment within the TeraGrid. Users specify which machines they have allocations on, and log in to something that looks like a Condor pool or local cluster. MyCluster creates a dynamic Condor pool that changes size depending on which machines are available for computation.

As part of his work with MyCluster, Walker works closely with TeraGrid users who have large allocations on multiple machines. So far this year, he's helped high-energy physicists from Caltech, National Virtual Observatory scientists and a team from Rice University to get the most out of the TeraGrid.

"Michael Deem's team at Rice is using the TeraGrid to look for simulated crystal structures that might correspond to things you would find in nature," says Walker. "In the last two months, he's run over 50,000 jobs across eight systems on four TeraGrid sites."

Development on MyCluster continues, with a global distributed file system currently in the works. Today, users like Deem have to copy their source code to every machine on which they have an allocation and compile it separately on each one. Any change to the code has to be made everywhere. With a distributed file system, all machines in the virtual cluster could see Deem's home directory, so he could make changes in only one place, and output files could be sent directly to his directory.

"We've got most of the distributed file system going now, it just needs a little more testing," says Walker. "We're engaging more TeraGrid users to try it out, and hope to have it ready for production and on all the TeraGrid sites soon."

For more information on GridShell, read Walker's IBM developerWorks article.

—Katie Yurkewicz