Rewriting the Rulebook for Supercomputing and Research



GUEST COMMENTARY · IBM's Blue Gene supercomputer project leader highlights progress and future applications

BY WILLIAM PULLEYBLANK

October 14, 2004 | When IBM Research launched the Blue Gene project in 1999, our goal was to create the new type of architecture required to build the world's most powerful supercomputer: a massively scalable machine capable of surmounting great challenges, such as simulating protein folding.

Today, Blue Gene has evolved into a family of supercomputers with a broad set of application possibilities, exceeding all prior expectations in terms of performance, scalability, accessibility, the scope of the problems it can address, and potential for science and research.

Early next year, we will deliver our first commercial Blue Gene supercomputer — Blue Gene/L — to the U.S. Department of Energy's Lawrence Livermore National Laboratory (LLNL). Two early prototypes of that machine are already among the 10 most powerful supercomputers in the world, with sustained speeds of 11.68 and 8.66 teraFLOPS (a teraFLOPS is a trillion floating-point operations per second). And just last month, a 16,000-processor Blue Gene/L hit the top of the list, with sustained performance of 36.01 teraFLOPS.


Seeing the Submicroscopic
We are already performing scientific research on the early Blue Gene/L prototypes, and we will expand our agenda as we scale up to bigger machines. Currently we are studying G-protein-coupled receptors (GPCRs), a large class of membrane-bound proteins that are important drug targets. We are simulating rhodopsin, a GPCR found in the retina, in hopes of understanding how lipids reorganize in the presence of this protein. These sorts of phenomena are difficult to observe experimentally, but we expect to gain new insights using the unique capabilities provided by Blue Gene/L.


(Photo caption: IBM says Blue Gene will enable scientists to routinely simulate microsecond time scales.)

One obvious benefit of Blue Gene/L is how much faster it can perform these sorts of simulations than conventional supercomputers can. It already delivers a 10x performance improvement over existing commercial supercomputers, and this should increase as larger Blue Gene systems become available. Nanosecond-scale simulations that took four days to run on standard systems currently require 14 hours on our small Blue Gene/L prototype. The full-size Blue Gene systems, currently under construction, will enable us to routinely simulate microsecond time scales, 1,000 times longer than currently possible, making visible submicroscopic biological events (including protein folding) in a way previously impossible.
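
To make the arithmetic behind these figures concrete, here is a minimal Python sketch. It uses only the run times quoted above (four days versus 14 hours) and the nanosecond-to-microsecond ratio. The function and constant names are illustrative, and the idea that run time grows linearly with simulated time is a simplifying assumption, not a measured result.

```python
# Run times quoted in the article.
STANDARD_HOURS = 4 * 24    # "four days" on standard systems
PROTOTYPE_HOURS = 14       # small Blue Gene/L prototype

# A microsecond is 1,000 nanoseconds.
NANOSECONDS_PER_MICROSECOND = 1_000


def speedup(baseline_hours: float, new_hours: float) -> float:
    """Return how many times faster the new system finishes the same job."""
    return baseline_hours / new_hours


def scaled_runtime_hours(hours: float, time_scale_factor: float) -> float:
    """Estimate the run time of a longer simulation.

    Assumption (illustrative only): run time grows linearly with the
    amount of simulated time.
    """
    return hours * time_scale_factor


if __name__ == "__main__":
    ratio = speedup(STANDARD_HOURS, PROTOTYPE_HOURS)
    print(f"Prototype speedup on this workload: {ratio:.1f}x")

    us_hours = scaled_runtime_hours(PROTOTYPE_HOURS, NANOSECONDS_PER_MICROSECOND)
    print(f"Microsecond run at prototype speed: {us_hours / 24:.0f} days")
```

The quoted run times work out to a speedup of a little under 7x for this particular workload; the text does not say how the 10x figure was measured. Under the linear assumption, a microsecond-scale run at prototype speed would take well over a year, which illustrates why routine microsecond simulations depend on the full-size systems.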

This leap forward in speed will dramatically change the way scientists work and will have a huge impact on scientific discovery. Scientists will be able to run simulations over much longer periods to understand long-term effects. High-performance computing will become a truly interactive tool; instead of waiting weeks for results, researchers will be able to run a calculation in an hour, see the results, make adjustments, and run it again.

Blue Gene's computational power will open the door to an extraordinary period of progress and understanding, in not only the life sciences but also other fields, including hydrodynamics, material sciences, quantum chemistry, molecular dynamics, fluid dynamics, climate modeling, and financial modeling.

Blue Gene was designed to give researchers access to unprecedented levels of computing power. But Blue Gene/L was also designed to cost less, consume less power, and take up far less floor space than any comparable supercomputer.

Whereas many of today's supercomputers require their own buildings and power supply, the 11.68-teraFLOPS Blue Gene/L prototype occupies only four refrigerator-sized racks. Our completed 64-rack Blue Gene/L machine will be eight times faster than today's fastest supercomputers while consuming one-fifteenth the power and occupying one-tenth the space. In fact, this Blue Gene/L system should deliver computing power equivalent to the aggregate processing power of the top 40 supercomputers in the world today, while taking up less floor space than half a tennis court.


Scaling Up
A university that needs massive computing power will be able to install a few racks of Blue Gene/L in an office-sized room and then scale up as desired. Unlike traditional Linux cluster machines, Blue Gene/L scales up quickly, easily, and without performance degradation. The system uses thousands of nodes and an innovative system-on-chip technology that enables it to scale to the highest tiers of performance while maintaining high reliability. An institution that deploys a Blue Gene/L machine will be able to deal with increasing demand by scaling upwards to hundreds of thousands of processors. With conventional clusters, we often encounter scalability problems after a few hundred processors.

(Blue Gene's high degree of scalability is illustrated by the fact that in November 2003, a 512-node Blue Gene/L prototype was ranked the 73rd most powerful supercomputer in the world. Six months later, a 4,096-node machine was ranked fourth.)

Blue Gene's architecture may be unique, but the machine was designed to be familiar and usable. Architecturally, Blue Gene/L contains thousands of power-efficient processing chips, each with dual PowerPC processor cores, on-chip memory, and two dual floating-point units to speed calculation. The system is integrated via multiple interconnection networks and compressed into a dense package.

This represents a new way of building a supercomputer. Yet while Blue Gene/L is not a Linux cluster, in many ways it looks and feels like one, quite deliberately. We run the machine under the control of the Linux operating system and have designed it to ensure that programs written for clusters will run on Blue Gene/L. While Blue Gene's speed and computing power open enormous possibilities, the transition from traditional supercomputers to the Blue Gene/L environment should be easy for most users.


The Road to a PetaFLOPS
As the Blue Gene project progresses toward the goal of becoming the world's most powerful supercomputer, we are exploring ways to create a community of Blue Gene users at the research level. The fastest way to truly test Blue Gene's capabilities is to allow the scientific community to tackle its challenges on this new architecture.

In addition to the LLNL machine, we are building Blue Gene/L supercomputers for the Argonne National Laboratory; ASTRON, a leading astronomy research group in the Netherlands; and the Computational Biology Research Center at Japan's AIST (National Institute of Advanced Industrial Science and Technology). IBM's computing-on-demand centers will provide a way for our customers to work on Blue Gene systems as well.

Plans are already under way to develop a successor to Blue Gene/L that will reach the 1-petaFLOPS (quadrillion floating-point operations per second) performance level. Beyond that, our sights are set on building a machine capable of performing at several petaFLOPS. For researchers who require massive computing power, the next few years promise to be exciting.
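
As a rough yardstick for how far the 1-petaFLOPS goal lies from the results reported earlier in this article, the short Python sketch below compares it with the 36.01-teraFLOPS sustained figure. It is plain unit arithmetic; the names are illustrative, and it says nothing about how or when such a machine would be built.

```python
# Unit definitions used in the article.
TERA = 10**12   # a teraFLOPS is a trillion operations per second
PETA = 10**15   # a petaFLOPS is a quadrillion operations per second

# Sustained performance quoted earlier in the article, in teraFLOPS.
SUSTAINED_TFLOPS = 36.01


def required_speedup(target_flops: float, current_flops: float) -> float:
    """Return the factor by which current performance must grow to hit the target."""
    return target_flops / current_flops


if __name__ == "__main__":
    current = SUSTAINED_TFLOPS * TERA
    factor = required_speedup(1 * PETA, current)
    print(f"1 petaFLOPS is about {factor:.1f} times the 36.01-teraFLOPS result")
```

In other words, a 1-petaFLOPS machine would need roughly 28 times the sustained performance of the 16,000-processor system described above.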



William Pulleyblank is director of exploratory server systems and the director of the Deep Computing Institute at IBM Research. E-mail: wp@ibm.us.com. 







