By Karen Hopkin | June 12, 2002 | Computers have come a long way in the past 20 years. "My thesis took three years of computational effort," says chemist David Osguthorpe of the University of Bath in England. Osguthorpe worked on an algorithm for predicting the formation of alpha helices and beta sheets. "I could do it in less than a week now," he says.
Not only are more powerful machines now available, they're also relatively inexpensive. By springing for a handful of basic, bottom-of-the-line PCs, anyone can piece together a system with enough power to tackle simple structure prediction without breaking the bank. "We type 'cheap cheap cheap' into a search engine and see what comes up," says Stanford University's Michael Levitt, who harvests his structure predictions from a string of 150 PCs running on Linux.
Although most CASPers have access to large clusters of dual processors (Pentium IIIs are pretty common), this year the Pittsburgh Supercomputing Center (PSC) is offering CASP5 participants time on its terascale computer system to help level the computational playing field. As of April, the PSC had allocated about 100,000 CPU hours on its machine, a cluster of 3,000 Compaq Alpha processors capable of performing six teraflops (trillion floating-point operations per second). Another million hours are available, says the PSC's David Deerfield, who was expecting a rush of applications once the targets were in hand.
With some tweaking, CASPers should be able to compile their Fortran or C programs on the PSC platform. The trick, notes the PSC's Troy Wymore, will be in writing programs that take full advantage of the parallel performance of the machines. By executing operations in parallel, a job that would take nearly six weeks on a single CPU can be completed in six hours when distributed over, say, 164 processors.
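The six-weeks-to-six-hours figure is what falls out if the work divides perfectly across processors. The short Python sketch below checks that arithmetic and, as an added illustration that is not from the PSC, applies Amdahl's law to show how even a small stretch of code that cannot run in parallel eats into the gain. The function names and the 99 percent parallel fraction are assumptions chosen for the example, not measurements of any CASP program.

```python
HOURS_PER_WEEK = 7 * 24


def ideal_parallel_hours(serial_hours: float, processors: int) -> float:
    """Wall-clock hours if the work splits perfectly across processors."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return serial_hours / processors


def amdahl_hours(serial_hours: float, processors: int,
                 parallel_fraction: float) -> float:
    """Wall-clock hours under Amdahl's law.

    parallel_fraction is the share of the job that can be spread across
    processors; the rest still runs on a single CPU. (Hypothetical input:
    the article gives no such figure.)
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_part = serial_hours * (1.0 - parallel_fraction)
    parallel_part = serial_hours * parallel_fraction / processors
    return serial_part + parallel_part


if __name__ == "__main__":
    serial_hours = 6 * HOURS_PER_WEEK  # six weeks on one CPU = 1,008 hours

    # Perfect scaling over 164 processors: a little over six hours.
    print(f"ideal: {ideal_parallel_hours(serial_hours, 164):.1f} hours")

    # Assumed case: 1 percent of the job cannot be parallelized.
    print(f"99% parallel: {amdahl_hours(serial_hours, 164, 0.99):.1f} hours")
```

With perfect scaling, the job takes a little over six hours. If just 1 percent of it has to run serially, the same 164 processors need roughly 16 hours, which is why writing code that parallelizes well is the hard part.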
At the same time, computer scientists at IBM's Thomas J. Watson Research Center outside New York City continue to build Blue Gene, a petaflop machine that could add muscle to folding programs by allowing researchers to track every atom in a protein. Made from a million CPUs connected in ever larger bunches, Blue Gene will be able to perform a quadrillion (10^15) calculations per second (a petaflop). That's about a thousand times faster than Deep Blue, the computer that beat world chess champion Garry Kasparov in 1997.
Although the added speed will help investigators monitor the folding process, it might not be fast enough for CASP. Using an all-atom simulation, Blue Gene would take a year to fully fold a protein of 300 amino acids. At that rate, says Ram Samudrala of the University of Washington, "you might as well solve the structure in lab using X-ray crystallography or magnetic resonance techniques."
And in the end, the problems researchers face in predicting protein structure might be better attacked on the blackboard than the keyboard. "It's a hard problem," notes the University of Iowa's Alberto Segre, "because even if we had all the computational power in the world, we still don't fully understand the forces that make proteins adopt their shape."
So for now, Levitt says, an old-fashioned approach remains viable. "If I thought that with 10 times more computer power I could solve the folding problem, I'd go out and buy it," he says. "But maybe we need a million times more. So instead I sit down and think."