IBM Research's Joseph M. Jasinski sees exciting life science applications for next-generation supercomputing.
Well, IBM wants to push performance even further. The company is about two and a half years into a $100-million effort dubbed Blue Gene. IBM aims to produce a new supercomputer architecture capable of performing in the petaflop (quadrillion floating-point operations per second) range — about 500 times more powerful than today's leading supercomputers.
When completed in 2006, Blue Gene will be 1,000 times more powerful than Deep Blue. Instead of playing chess, Blue Gene will focus on the life sciences in general, and the simulation of protein folding in particular.
"Of all the possible applications for a next- generation supercomputer, the most exciting application is in the life sciences," says Joseph M. Jasinski, senior manager of the computational biology center for IBM Research.
Obviously, IBM's stake in the life sciences is self-serving. It wants to sell lots of expensive machines, and IBM sees great potential for its high-end systems in the life sciences. "Life sciences is a significant driver for computational capacity, just as the physical sciences were last century," says Jasinski.
Blue Gene got its start in 1999 when IBM announced plans to build a massively parallel high-performance computer. As it had with Deep Blue, IBM wanted a sexy application to show off Blue Gene's processing power. IBM decided to tackle protein folding, which holds great promise for helping understand how proteins behave and thus helping speed the discovery of drugs for many diseases (see June Bio·IT World, page 56).
The trouble with protein folding simulations is the daunting amount of computation they require. "We're looking at all atom/molecular dynamics simulation," says Jasinski. Essentially, the algorithms that will run on Blue Gene will simulate how all the atoms of a protein move over time when subjected to the various chemical and molecular forces encountered in nature. Typically, there are 30,000 or more atoms in a given protein.
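To make "simulate how all the atoms move over time" concrete, here is a minimal sketch of a molecular dynamics time-stepping loop. It is an illustration only, not IBM's code: the three-atom chain, the spring force field (`spring_forces`), and the choice of the velocity Verlet integrator are all assumptions made for the example. A real all-atom simulation would evaluate far more complicated forces for tens of thousands of atoms at every step, which is where the computational cost comes from.

```python
# Toy molecular-dynamics loop (illustrative assumption, not Blue Gene's algorithm).
# Atoms sit on a 1-D line and neighboring atoms are joined by hypothetical springs.

def spring_forces(positions, k=1.0, rest_length=1.0):
    """Return the force on each atom from springs between neighbors."""
    forces = [0.0] * len(positions)
    for i in range(len(positions) - 1):
        stretch = (positions[i + 1] - positions[i]) - rest_length
        forces[i] += k * stretch       # stretched spring pulls atom i to the right
        forces[i + 1] -= k * stretch   # equal and opposite force on its neighbor
    return forces

def velocity_verlet(positions, velocities, mass=1.0, dt=0.01, steps=1000):
    """Advance every atom through `steps` small time steps of size `dt`."""
    forces = spring_forces(positions)
    for _ in range(steps):
        # Half-step velocity update, then full-step position update.
        velocities = [v + 0.5 * dt * f / mass for v, f in zip(velocities, forces)]
        positions = [x + dt * v for x, v in zip(positions, velocities)]
        # Recompute forces at the new positions and finish the velocity update.
        forces = spring_forces(positions)
        velocities = [v + 0.5 * dt * f / mass for v, f in zip(velocities, forces)]
    return positions, velocities

def total_energy(positions, velocities, mass=1.0, k=1.0, rest_length=1.0):
    """Kinetic plus spring potential energy, used as a sanity check."""
    kinetic = sum(0.5 * mass * v * v for v in velocities)
    potential = sum(
        0.5 * k * ((positions[i + 1] - positions[i]) - rest_length) ** 2
        for i in range(len(positions) - 1)
    )
    return kinetic + potential

start_pos = [0.0, 1.2, 2.0]   # one spring stretched, one compressed
start_vel = [0.0, 0.0, 0.0]
end_pos, end_vel = velocity_verlet(start_pos, start_vel)
print(total_energy(start_pos, start_vel), total_energy(end_pos, end_vel))
```

The two printed energies should agree closely, which is a common check that the time step is small enough. The small time step is also why such simulations need so many iterations to cover a meaningful stretch of simulated time.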
Delivering the Power
To create a computer capable of performing this immense calculation, IBM is taking a different approach with Blue Gene; it will employ a novel cellular architecture.
"Blue Gene represents a paradigm for a new way to make supercomputers," says Jasinski.
Supercomputers are great number crunchers, but the time it takes to move data from the memory chips to the processors limits the performance of many data-intensive applications. To relieve this bottleneck, Blue Gene's architecture will use what IBM calls data-chip cells optimized for simultaneous data access during calculations. Each cell includes two processors (one for computing, the other for communications) and its own on-board memory.
|Blue Gene's Protein Origami
|To get a sense of the computational power required to perform protein-folding calculations, one needs to look in more detail at what exactly is being simulated. A typical protein can have 30,000 atoms, surrounded by perhaps 500,000 water molecules.
IBM is also developing a new way for these cells to interoperate. "We think a tremendous gain in performance will be made possible by the first major revolution in how computers are built since the mid-1980s," says Dr. Ambuj Goyal, IBM Research's vice president of computer science. "We call this new approach to computer architecture SMASH, which stands for simple, many, and self-healing."
The SMASH architecture differs from existing approaches in three ways. First, it dramatically reduces the number of instructions carried out by each processor, allowing the processors to work faster and with significantly lower power. (Goyal notes the traditional approach has been to add complex features to gain performance.)
Second, SMASH enables a massively parallel system capable of more than 8 million simultaneous threads of computation, compared to the maximum of 5,000 threads today, says Goyal.
And third, SMASH makes a computer self-stabilizing and self-healing — automatically able to overcome failures of individual processors and computing threads.
Blue Gene will consist of more than 1 million processors, each capable of 1 gigaflop. Fitted with 32 processors, each individual chip will yield 32 gigaflops of processing power. Each 2-foot by 2-foot board will carry 64 of these chips and deliver up to 2 teraflops of power. Each six-foot-high rack will hold eight of these boards, generating 16 teraflops per rack. All told, Blue Gene will comprise 64 racks linked together to deliver petaflop performance. IBM expects the total space required to be less than 2,000 square feet.
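The roll-up from processor to full system can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted above; the constant names are made up for the example, and it assumes the decimal convention of 1,000 gigaflops per teraflop. It also assumes perfect scaling, which is a best case rather than a measured result.

```python
# Back-of-the-envelope roll-up of the Blue Gene hierarchy described in the article.
# Constant names are hypothetical; the values come from the text.
GFLOPS_PER_PROCESSOR = 1
PROCESSORS_PER_CHIP = 32
CHIPS_PER_BOARD = 64
BOARDS_PER_RACK = 8
RACKS = 64

chip_gflops = GFLOPS_PER_PROCESSOR * PROCESSORS_PER_CHIP
board_gflops = chip_gflops * CHIPS_PER_BOARD
rack_gflops = board_gflops * BOARDS_PER_RACK
system_gflops = rack_gflops * RACKS

total_processors = PROCESSORS_PER_CHIP * CHIPS_PER_BOARD * BOARDS_PER_RACK * RACKS

print(f"chip:   {chip_gflops} gigaflops")
print(f"board:  {board_gflops / 1000:.3f} teraflops")
print(f"rack:   {rack_gflops / 1000:.3f} teraflops")
print(f"system: {system_gflops / 1_000_000:.3f} petaflops")
print(f"processors: {total_processors:,}")
```

The exact products are slightly above the rounded figures in the text (2 teraflops per board, 16 per rack), and the processor count comes out just over 1 million, consistent with "more than 1 million processors."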
To put this petaflop performance into perspective, the most powerful computer for several years has been the ASCI White. Used by the U.S. Department of Energy's Lawrence Livermore National Laboratory to develop 3-D simulation tools to support nuclear stockpile stewardship efforts, ASCI White is capable of performing 12 teraflops. In theory, a single Blue Gene rack can deliver more processing power.
ASCI White was ousted from the top of the list of most powerful supercomputers this May with the release of independent benchmark tests of an NEC supercomputer in Japan. The Earth Simulator supercomputer in Japan's Earth Simulation Center, which is part of the Japan Marine Science & Technology Center, was benchmarked at 35.86 teraflops — slightly more processing power than two Blue Gene racks.
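The rack comparisons above can be reproduced from the quoted figures. This is a rough sketch: the variable names are invented for the example, a petaflop is taken as a nominal 1,000 teraflops, and the article's numbers may not all be measured the same way (the Earth Simulator figure is a benchmark result, while the Blue Gene figures are design targets), so the ratios are indicative only.

```python
# Ratios between the performance figures quoted in the article (all in teraflops).
# Names are hypothetical; values are taken from the text.
PETAFLOP_IN_TERAFLOPS = 1000.0      # nominal full Blue Gene target
BLUE_GENE_RACK_TFLOPS = 16.0        # one rack of eight 2-teraflop boards
ASCI_WHITE_TFLOPS = 12.0
EARTH_SIMULATOR_TFLOPS = 35.86

rack_vs_asci_white = BLUE_GENE_RACK_TFLOPS / ASCI_WHITE_TFLOPS
racks_to_match_earth_simulator = EARTH_SIMULATOR_TFLOPS / BLUE_GENE_RACK_TFLOPS
system_vs_asci_white = PETAFLOP_IN_TERAFLOPS / ASCI_WHITE_TFLOPS
system_vs_earth_simulator = PETAFLOP_IN_TERAFLOPS / EARTH_SIMULATOR_TFLOPS

print(f"one rack vs ASCI White:            {rack_vs_asci_white:.2f}x")
print(f"racks to match Earth Simulator:    {racks_to_match_earth_simulator:.2f}")
print(f"full system vs ASCI White:         {system_vs_asci_white:.1f}x")
print(f"full system vs Earth Simulator:    {system_vs_earth_simulator:.1f}x")
```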
Not Exclusive to Life Sciences
Just as Deep Blue is used for things other than playing chess, Blue Gene will be used for non-life science tasks, too. "We see Blue Gene as a low cost, low power, simulation machine for any physical system," says Jasinski.
|Top of the Heap
|Blue Gene is designed to significantly raise the bar when it comes to supercomputer performance. In theory, Blue Gene will deliver 100 times more raw processing power than today's top supercomputers.
Last November, IBM announced Blue Gene/L, a new project within Blue Gene. Blue Gene/L, a joint effort between IBM and the U.S. Department of Energy's National Nuclear Security Administration, is expected to produce a computer that operates at between about 180 and 200 teraflops. According to IBM, such a computer would deliver more processing power than the aggregate of the top 500 supercomputers in the world today.
IBM and Livermore Lab will jointly design the new computer using the same technology intended for Blue Gene. The goal is for Blue Gene/L to be 15 times faster and more energy efficient than today's fastest supercomputers, and to occupy about 50 times less space per computation.
"We realized we could meet our original Blue Gene goal of protein science simulations, while using the same technology to expand the project to deliver more commercially viable architectures for a broader set of customers," says Mark Dean, vice president of systems at IBM Research. "Partnering with Lawrence Livermore is a key part of this strategy since they bring application and design expertise to the project."
Researchers at the Livermore Lab plan to use Blue Gene/L to simulate physical phenomena of national interest — such as the aging of materials, and the progress of fires and explosions — that require computational capability much greater than presently available.
This customization represents a new thrust, very different from the approach taken by the main line of ASCI machines used in the labs. "Up until now, ASCI supercomputers have been designed to address the entire spectrum of numerical simulations required of the stockpile stewardship effort," says David Nowak, ASCI Program Leader at Livermore Lab, in a statement released by the lab. "Blue Gene/L can address an important subset of those computational problems, those that can be easily divided to run on many tens of thousands of processors."
But note, it will be a while before Livermore Lab or anyone sees Blue Gene. The next real milestone in the Blue Gene project will be reached in the middle of next year when the first chips are delivered. Blue Gene/L is expected to be completed by 2005, one year before Blue Gene.