By Mark D. Uehling
May 7, 2002 | If you need to model a few eons of climate data, the new Science Grid from the Department of Energy is just the ticket.
With help from IBM Corp., the forthcoming Science Grid will put impressive resources at the disposal of 2,100 DOE scientists. When it comes online at the end of this year, the project, led by Lawrence Berkeley National Laboratory, will deliver two supercomputers (one purported to be the third-most powerful in the world), 1.3 petabytes of storage (roughly 200 times the amount of information in the Library of Congress), and 5 teraflops of computing power (5 trillion floating-point operations per second).
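Those headline numbers are easier to appreciate with some back-of-envelope arithmetic. A rough sketch, using only the figures in the article (the implied size of the Library of Congress is derived from the 200x comparison, not an official estimate):

```python
# Back-of-envelope check on the Science Grid's headline figures.

PETABYTE = 10**15          # bytes (decimal)
TERABYTE = 10**12          # bytes (decimal)

storage = 1.3 * PETABYTE   # total Science Grid storage
loc_multiples = 200        # "200 times the Library of Congress"

# The comparison implies a Library of Congress of about 6.5 TB.
implied_loc_size = storage / loc_multiples
print(implied_loc_size / TERABYTE, "TB per 'Library of Congress'")  # 6.5

# 5 teraflops sustained for one day:
flops = 5 * 10**12         # floating-point operations per second
seconds_per_day = 86_400
print(flops * seconds_per_day, "operations per day")
```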
Even without the Science Grid, of course, DOE researchers can already get their hands on some of the world's most powerful computing machines. "The grid is not about computing horsepower," explains IBM's Jeffrey Augen, director of business strategy for life science solutions. "It's about how broadly available that horsepower is. It's about getting the horsepower to everyone on the grid."
Because Linux and the open source Globus grid middleware are at the heart of the Science Grid, there isn't a big opportunity for IBM to sell applications. But Augen says the company is bringing to bear its expertise in security. "It's critical to think about security up front," he says, pointing out that with future grids, physicians may be perfectly willing to share basic demographic and clinical facts about their patients, but not their private office notes. By working with the security-conscious DOE, Augen says, IBM should be able to work out grid-related security kinks and apply the lessons elsewhere.
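Augen's physician example amounts to field-level access control: a data owner publishes some fields of a record and withholds others. A minimal sketch of the idea (the field names and policy here are hypothetical illustrations, not anything from the Globus software or the Science Grid):

```python
# Hypothetical field-level sharing policy: demographics and clinical
# facts may leave the practice; private office notes may not.
SHAREABLE = {"age", "sex", "diagnosis"}

def redact(record: dict) -> dict:
    """Return only the fields the sharing policy permits."""
    return {k: v for k, v in record.items() if k in SHAREABLE}

patient = {
    "age": 54,
    "sex": "F",
    "diagnosis": "type 2 diabetes",
    "office_notes": "private free-text notes",  # never shared
}

print(redact(patient))  # office_notes is stripped before publication
```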
Once the security issues are resolved, the Science Grid will make access to high-performance computing routine. Just as nobody wonders which power plant feeds an electrical outlet, scientists will not need to know where either the computer or the data resides; they will simply log on and work. Researchers at the DOE's Oak Ridge, Tenn., lab should be able to manipulate datasets in California without knowing or caring whether their own desktop computer or a supercomputer is crunching the numbers.
"You won't know where the computing resources are coming from," says IBM spokesman John Buscemi. "You're not saying put this work on that computer. You just do it." By government standards, the cost of the Science Grid is minimal: $1.5 million annually for three years, because the supercomputers were already in place.
At the outset, most of what gets done on the Science Grid will be unclassified nuclear fusion research and other basic research; just a few percent of the available hours will be devoted to the life sciences. One early glimpse of the grid's life science potential comes from Berkeley Lab Senior Scientist Ken Downing's team, which uses an electron microscope to determine the structures of molecules or groups of molecules.
A biophysicist, Downing takes 50 to 100 images per hour, each of them 30 MB. "It's a lot of data," he says. "We need tens or hundreds of thousands of images and do a lot of processing. The idea of having data archived and available anywhere is attractive. It will be extremely handy to have someone remotely running the microscope." Typically, grids run over the Internet, but the demands of the research on the Science Grid will require the DOE's own ESnet, a 622-Mbps network.
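The figures Downing quotes translate into concrete data rates. A rough sketch, using only the numbers from the article (decimal units assumed; network overhead ignored):

```python
# Back-of-envelope data rates for Downing's microscopy workflow.

MB = 10**6                       # bytes (decimal megabyte)
image_size = 30 * MB             # 30 MB per micrograph
images_per_hour = 100            # upper end of "50 to 100"

hourly_bytes = image_size * images_per_hour
print(hourly_bytes / 10**9, "GB per hour")            # 3.0

# Sustained bandwidth needed to ship that stream in real time:
bits_per_second = hourly_bytes * 8 / 3600
print(bits_per_second / 10**6, "Mbps sustained")      # ~6.7

# Time to move one image over ESnet's 622 Mbps link:
esnet_bps = 622 * 10**6
print(image_size * 8 / esnet_bps, "seconds per image")  # ~0.39

# An archive of 100,000 such images:
print(100_000 * image_size / 10**12, "TB")            # 3.0
```

The striking point is the archive, not the stream: a single hour's imaging fits easily in ESnet's capacity, but the "tens or hundreds of thousands of images" Downing mentions quickly reach terabyte scale, which is why centralized grid storage matters.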
Downing can't cite a specific scientific mystery the Science Grid will allow him to solve, but he does think it offers a peek at how all scientists, inside and outside government, will eventually collaborate.
"This is a new paradigm for connectivity," says Downing. "This is going to be like another generation of the Internet. In 1957, nobody would have predicted what would become of the Arpanet. The grid is going to be the model of how computing evolves."
One of the nation's leading molecular modeling researchers agrees. Physicist Klaus Schulten of the University of Illinois has access to no fewer than 10 supercomputers at research centers around the country, including the nearby National Center for Supercomputing Applications in Champaign, Ill.
Schulten points out that the datasets of many life science researchers are so huge that they pose particular challenges, requiring special disk arrays, staff, and expertise. "You have a burning issue of data storage," says Schulten.
For all their imposing scale, research grids like the Science Grid or the academic TeraGrid may actually have been inspired by a more pedestrian phenomenon, Schulten suggests: "We learned from America Online." Powerful scientific grids, he says, can be thought of as easy-to-use virtual communities where the datasets of like-minded researchers are handled with a minimal amount of hassle.
"It becomes easier for users who are not so sophisticated to imagine their data are on a local disk," says Schulten. "Whole groups of researchers can submit jobs, start jobs, look at each other's jobs. Many of our fields have been waiting to use computers to work together, and that is starting to happen."