By Salvatore Salamone
July 14, 2004 | Blue Gene, IBM’s next-generation petaFLOP (quadrillion floating-point operations per second) supercomputer slated for completion in 2006, moved into an elite neighborhood last month when two Blue Gene prototypes placed among the 10 most powerful supercomputers in the world. That’s according to Top500.org, which publishes a list of the world’s 500 most powerful supercomputers twice a year.
Blue Gene/L DD1 ranked fourth and Blue Gene/L DD2 ranked eighth. DD1 achieved 11.68 teraFLOPS (trillion floating-point operations per second) sustained and 16 teraFLOPS peak; DD2 achieved 8.66 teraFLOPS sustained and 11.47 teraFLOPS peak.
The significance is that these results suggest the Blue Gene architecture scales extremely well, which is critical to the project’s future. The DD1 system is a larger version of the same system that ranked as the world’s 73rd most powerful computer last November. Its performance scaled linearly as nodes were added, something modular systems do not always achieve.
“In most systems, you can throw processors in a room, but they don’t scale because of interconnection [issues],” says William Pulleyblank, director of exploratory server systems at IBM Research. “We built in tight integration [of components] and bandwidth between processors.”
The general rule in high-performance computing -- whether as a cluster or a multiprocessor system -- is that as more nodes are added, more system resources are diverted from computational work to handle the mundane (but necessary) internodal communications and data-handling tasks. The practical consequence is that system performance typically does not scale linearly with added nodes.
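To make the overhead argument concrete, here is a deliberately simplified toy model in Python. Its assumptions are hypothetical and do not describe Blue Gene or any real machine: a fixed amount of work divides evenly across nodes, while a communication cost grows with log2 of the node count (roughly how a tree-style exchange might grow). The names `step_time` and `efficiency` and the parameter values are illustrative only.

```python
import math


def step_time(nodes: int, work: float = 1000.0, comm_cost: float = 0.5) -> float:
    """Hypothetical time for one step on `nodes` nodes, in arbitrary units."""
    compute = work / nodes  # useful work splits evenly across nodes
    # Assumed overhead: grows with log2(nodes); a single node needs no communication.
    communication = comm_cost * math.log2(nodes) if nodes > 1 else 0.0
    return compute + communication


def efficiency(nodes: int) -> float:
    """Speedup over one node divided by node count; 1.0 means perfectly linear scaling."""
    speedup = step_time(1) / step_time(nodes)
    return speedup / nodes


for n in (1, 8, 64, 512):
    print(f"{n:4d} nodes: efficiency {efficiency(n):.2f}")
```

With these made-up parameters, efficiency stays near 1.0 at 8 nodes but falls to about 0.30 at 512 nodes, because communication rather than computation comes to dominate each step. In the model, lowering `comm_cost` (a stand-in for a faster interconnect) pushes the curve back toward 1.0.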
IBM seems to have tamed this problem. The latest DD1 is a four-rack unit, versus a half-rack for the previous system, so it has about eight times the hardware and delivers more than eight times the performance (11.68 teraFLOPS for DD1 versus 1.4 teraFLOPS for the first prototype). Part of that better-than-linear gain stems from tweaks to the system software.
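The “more than eight times” claim can be checked with a few lines of Python using the sustained figures quoted in this article (11.68 teraFLOPS for DD1, a rounded 1.4 teraFLOPS for the half-rack prototype). The sketch assumes rack count is a fair proxy for node count, i.e. that every rack holds the same number of nodes; the helper name `scaling_efficiency` is hypothetical.

```python
# Sustained teraFLOPS and rack counts quoted in the article.
FIRST_PROTOTYPE = {"racks": 0.5, "tflops": 1.4}  # half-rack prototype, November list
DD1 = {"racks": 4.0, "tflops": 11.68}            # four-rack DD1, June list


def scaling_efficiency(small: dict, large: dict) -> float:
    """Performance growth divided by hardware growth.

    1.0 is linear scaling; above 1.0 is better than linear.
    Assumes every rack holds the same number of nodes.
    """
    size_ratio = large["racks"] / small["racks"]
    perf_ratio = large["tflops"] / small["tflops"]
    return perf_ratio / size_ratio


print(f"hardware grew {DD1['racks'] / FIRST_PROTOTYPE['racks']:.0f}x")
print(f"performance grew {DD1['tflops'] / FIRST_PROTOTYPE['tflops']:.2f}x")
print(f"scaling efficiency {scaling_efficiency(FIRST_PROTOTYPE, DD1):.2f}")
```

The hardware grows 8x while performance grows about 8.34x, giving a ratio of about 1.04: slightly better than linear, consistent with the extra gain attributed to software tuning.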
The DD2 system offers a different view of how future Blue Gene systems will perform, because it uses a new version of the computer’s processors. The first batch of Blue Gene chips, produced last year and used in DD1, runs at 500 MHz; the new chips used in DD2 run at 700 MHz. “They give us about a 40-percent performance improvement,” Pulleyblank says. To put this into perspective, the two-rack DD2 system has half as many nodes as DD1, yet its sustained benchmark performance of 8.66 teraFLOPS is about three-quarters that of DD1.
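The “three-quarters” comparison can be reproduced with a back-of-the-envelope Python calculation. The naive model below, in which sustained performance is simply proportional to node count times clock speed, is an assumption made for illustration, not IBM’s method; racks again stand in for node count.

```python
# Figures quoted in the article.
DD1 = {"racks": 4, "clock_mhz": 500, "sustained_tflops": 11.68}
DD2 = {"racks": 2, "clock_mhz": 700, "sustained_tflops": 8.66}

clock_ratio = DD2["clock_mhz"] / DD1["clock_mhz"]             # 1.4, i.e. a 40% faster clock
node_ratio = DD2["racks"] / DD1["racks"]                      # 0.5, assuming equal nodes per rack
observed = DD2["sustained_tflops"] / DD1["sustained_tflops"]  # DD2 sustained speed as a fraction of DD1

# Naive (assumed) estimate: performance proportional to node count times clock speed.
naive_estimate = node_ratio * clock_ratio

print(f"naive estimate: {naive_estimate:.2f} of DD1")
print(f"observed:       {observed:.2f} of DD1")
```

The naive estimate gives 0.70 of DD1’s sustained speed, while the reported figures give about 0.74. The article does not say what accounts for the small gap.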
Protein-Folding Gains
One proof-of-concept application driving Blue Gene is protein folding. Scientists at the IBM T.J. Watson Research Center have been using the Blue Gene/L prototype system to study how G-protein-coupled receptors (GPCRs) behave in a membrane environment. One GPCR that IBM is studying is rhodopsin, a membrane protein found in the retina.
“GPCRs represent a large class of proteins that are important drug targets,” says Robert Germain, manager of biomolecular dynamics and scalable modeling at IBM Research (part of the Computational Biology Center). Diseases currently treated with drugs that target GPCRs include allergies, anxiety, asthma, cancer, congestive heart failure, hypertension, migraines, Parkinson’s disease, psychosis, stroke, and ulcers. In fact, about half of all drug targets today are GPCR-based, and many blockbuster drugs, including Allegra, Claritin, Cozaar, Imitrex, Plavix, Risperdal, and Zyprexa, work through a GPCR target.
Simulating one nanosecond of GPCR protein folding takes four days on today’s high-end commercial supercomputers (which are based on IBM’s SP technology). On a 512-node partition of a Blue Gene/L prototype, the same one-nanosecond simulation takes about 8 to 10 hours.
“We’re getting about an order of magnitude improvement in performance over the [existing] systems,” Germain says. Promising as that is, the researchers hope to run much longer simulations as larger Blue Gene systems become available. “We’d like to do simulations up to the microsecond scale,” he says.
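The “order of magnitude” figure follows directly from the run times quoted above, as this short Python sketch shows. The second half is a hypothetical extrapolation, not a claim from the article: it assumes run time grows linearly with simulated time on the same 512-node partition, simply to show the scale of a microsecond-long run.

```python
HOURS_PER_DAY = 24
NS_PER_MICROSECOND = 1_000

sp_hours_per_ns = 4 * HOURS_PER_DAY  # four days per simulated nanosecond on SP-based systems
bg_hours_per_ns = (8, 10)            # 512-node Blue Gene/L prototype partition

speedups = [sp_hours_per_ns / h for h in bg_hours_per_ns]
print(f"speedup: {min(speedups):.1f}x to {max(speedups):.1f}x")

# Hypothetical extrapolation: assumes cost scales linearly with simulated time
# on the same partition, which the article does not claim.
for h in bg_hours_per_ns:
    hours = h * NS_PER_MICROSECOND
    print(f"1 microsecond at {h} h/ns: {hours:,} hours (about {hours / HOURS_PER_DAY:.0f} days)")
```

A 9.6x to 12x speedup matches the order-of-magnitude gain Germain describes. Under the linear assumption, a single microsecond on the 512-node partition would take 8,000 to 10,000 hours, roughly a year, which illustrates why the team is looking to larger Blue Gene systems.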
Over time, IBM plans to build a 20-rack Blue Gene/L system at the T.J. Watson Research Center.