SUPERCOMPUTING · Despite surge in computing power, bioscience problems remain; Microsoft publicly dips toe in HPC waters
By Salvatore Salamone
December 15, 2004 | At last month's SC2004 conference, it was clear that substantially more processing power is becoming available for life science applications. The challenge now is how to make more efficient use of this power and how to make it easier to use, particularly as multidisciplinary approaches are taken to solve complex biological problems.
IBM and SGI picked up the top bragging rights at the show and significantly advanced the high end of the supercomputing spectrum by placing systems first (BlueGene/L) and second (Altix 1.5GHz, Infiniband), respectively, on the Top 500 list of the world's most powerful supercomputers. Many systems vendors introduced high-density clusters with dual-, quad-, and eight-way servers packing incredible processing power.
Also notable at this year's conference was the presence of Microsoft. While generally not considered an HPC vendor, the company is starting to rev up its efforts in the area with an eye to next year, when a number of enterprise HPC products such as the Microsoft Data Protection Server and an HPC version of Windows Server are due out. Microsoft used the conference to show off a few partnerships, including one with the cheminformatics company Optive Research.
Tip of the Iceberg
Despite all the new computing firepower on display, speakers at the conference still questioned whether HPC performance is growing fast enough to keep pace with life science applications. For several years, researchers have noted that the volume of genomic data is growing at a rate faster than Moore's Law.
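The arithmetic behind that observation is straightforward to sketch. The snippet below is illustrative only: it assumes an 18-month doubling period for processing power (Moore's Law) and a hypothetical 12-month doubling period for genomic data, figures chosen for the sketch rather than taken from the article.

```python
# Illustrative sketch: two exponential curves with different doubling
# periods. The 12-month figure for data growth is an assumption.

def growth_factor(years, doubling_months):
    """Fold increase after `years`, given a doubling period in months."""
    return 2 ** (years * 12 / doubling_months)

decade = 10
compute = growth_factor(decade, 18)  # processing power, Moore's Law pace
data = growth_factor(decade, 12)     # genomic data, assumed pace

# Over a decade, data outgrows compute by the ratio of the two curves.
print(f"compute: {compute:.0f}x, data: {data:.0f}x, gap: {data / compute:.1f}x")
```

Even a modest difference in doubling period compounds into an order-of-magnitude shortfall over ten years, which is why faster hardware alone was not seen as the answer.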
[Photo: NASA's 10,240-processor Columbia supercomputer is built from 20 SGI Altix systems, each powered by 512 Intel Itanium 2 processors.]
Moreover, Gane Ka-Shu Wong, associate director of the Beijing Institute of Genomics, noted that the growth of data is just the tip of the iceberg. In his presentation, titled "Computing Opportunities in the Era of Abundant Biological Data," Wong said, "[The] genome sequence [data] is only the beginning." When one goes from DNA sequences to RNA to proteins, modeling biological activity becomes very complex.
Wong cited protein-folding calculations as an example. Modeling a protein of any meaningful size could keep a petaFLOPS (quadrillion floating-point operations per second) computer, once available, busy for a long time (see "Blue Gene's Protein Origami," Aug. 2002 Bio·IT World, page 28). Wong noted that many proteins require an interaction with another molecule for the protein-folding process to start. So modeling how an isolated protein folds would offer some insight into the protein's behavior, but it might not be a true representation of what happens in nature.
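To give a feel for the scale Wong was describing, here is a back-of-envelope sketch. All numbers are illustrative assumptions, not figures from the article: it simply converts a hypothetical floating-point budget for one folding calculation into wall-clock time on a machine sustaining one petaFLOPS.

```python
# Back-of-envelope sketch with assumed numbers: wall-clock time for a
# machine sustaining 1 petaFLOPS to retire a given floating-point budget.

PETAFLOPS = 1e15       # floating-point operations per second
SECONDS_PER_DAY = 86_400

def runtime_days(total_flops, sustained_flops=PETAFLOPS):
    """Days needed to perform `total_flops` at a sustained rate."""
    return total_flops / sustained_flops / SECONDS_PER_DAY

# Hypothetical budget: 1e21 operations for a single folding trajectory.
print(f"{runtime_days(1e21):.1f} days")
```

Under these assumed figures, a single trajectory occupies the machine for well over a week, and realistic studies need many trajectories, which is why even a petaFLOPS system could stay "busy for a long time."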
Wong said the idea of matching computing power to the increasingly complex biological systems being modeled will change the way HPC systems are bought. "You have very diverse users who will be doing so many different things," he said. "[They] will buy many different machines, rather than one [large] machine."
Other attendees discussed the growing complexity in biological research. "Many projects in life sciences require a multidisciplinary approach," said Daniel Reed, director of the Renaissance Computing Institute, a joint effort of Duke University, the University of North Carolina, and North Carolina State University. In his presentation, "Computing — An Intellectual Lever for Multidisciplinary Discovery," Reed cited systems biology, whole-cell modeling, and longitudinal biomedical data fusion (the attempt to combine environmental, public health, clinical, and experimental data to better understand a disease and its treatment).
"These are large-scale data challenges and large-scale computing challenges," Reed said. "The challenge is how do we create the virtual organizations to bring the right people, with the right ideas, together at the right time."
Efficiency and Ease of Use
Wong and Reed set the tone for much of the conference and laid out the long-term challenges that face the life sciences. On the show floor, the vendors pitched ideas for shorter-term solutions.
On the easier-to-use front, many systems vendors, including Appro, HP, IBM, RLX Technologies, and others, announced that they are offering switches that are tightly integrated with their rack or blade HPC systems. The advantage here is that a lab could buy a pre-assembled, pre-configured system that includes servers, storage, and a high-performance switch that ties everything together.
Along the same lines of simplification by integration, systems vendor Linux Networx announced Xilo, a high-performance clustered storage system that integrates into its HPC system. (For more on product announcements at SC2004, see www.bio-itworld.com/news, DocFinder 6570.)