By Salvatore Salamone
December 15, 2003 | PHOENIX – One of the biggest challenges life scientists face is mining useful information out of mountains of raw data.
It takes more than raw computing power. That was the message delivered by one supercomputer user at the festival of raw computing power, SC2003.*
While many organizations are using teraFLOPS to process terabytes of data, true insight requires creativity, said Donna Cox of the National Center for Supercomputing Applications (NCSA) in her keynote speech here. Cox, a senior research scientist and associate director of experimental technologies, said the real challenge is representing data in such a way as to make it useful for discovery.
Things were simpler when computing “was a small localized group activity,” Cox said. Now, with multidisciplinary groups, dispersed internationally, collaborating virtually, new approaches are needed to visualize large amounts of incongruous data. One of the keys, she said, is creating visual metaphors that convey characteristics or concepts.
She used as an example an advertisement for a German beer. In the ad, a bottle of beer is shown in a champagne ice bucket. “[With] the beer engulfed in a champagne bucket, qualities of champagne are associated with the beer,” Cox said. “We don’t see the champagne, but the attributes are mapped onto the beer.”
Still, researchers love their floating-point operations. One of the highlights of SC was the latest list of the world's top 500 supercomputers. Several innovative systems tailored for life sciences made the grade this time. Roaring in at number 3: Virginia Tech's Terascale Computing Facility, a homebuilt supercomputing cluster composed of 1,100 Apple G5 computers, each node with dual 2-GHz 64-bit PowerPC processors, 4 GB of memory, and 160 GB of storage. The Virginia Tech system, announced just last month, has a measured performance of 10.28 teraFLOPS (10.28 trillion floating-point operations per second).
Virginia Tech will use the cluster to support work in computational chemistry, molecular statistics, and molecular modeling of proteins. It is only the third system ever benchmarked at more than 10 teraFLOPS.
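The article's cluster specs allow a back-of-the-envelope check of those numbers. The sketch below assumes (this figure is not from the article) that each PowerPC 970 processor can retire 4 floating-point operations per cycle, the convention used for theoretical-peak calculations:

```python
# Rough sanity check of the Virginia Tech cluster's performance figures.
# Assumption (not stated in the article): the PowerPC 970 ("G5") retires
# 4 floating-point operations per cycle, two FPUs each doing a fused
# multiply-add -- the figure conventionally used for theoretical peak.

nodes = 1100            # Apple G5 machines in the cluster
cpus_per_node = 2       # dual processors per node
clock_hz = 2.0e9        # 2 GHz
flops_per_cycle = 4     # assumed: 2 FPUs x fused multiply-add

theoretical_peak = nodes * cpus_per_node * clock_hz * flops_per_cycle
measured = 10.28e12     # benchmarked performance cited in the article

print(f"theoretical peak: {theoretical_peak / 1e12:.1f} teraFLOPS")
print(f"measured:         {measured / 1e12:.2f} teraFLOPS")
print(f"efficiency:       {measured / theoretical_peak:.0%}")
```

Under those assumptions the theoretical peak works out to 17.6 teraFLOPS, so the benchmarked 10.28 teraFLOPS would correspond to roughly 58 percent efficiency, a plausible ratio for a measured benchmark run against theoretical peak.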
The other major notable life science computer to make the list for the first time is IBM’s Blue Gene/L Prototype. The system is ranked 73rd on the list with an official measured peak performance of 1.435 teraFLOPS. IBM’s Blue Gene program is devoted to developing new hardware and new protein folding algorithms.
A number of academic supercomputer systems, which will be dedicated to scientific research, also made the list for the first time. Among the new entrants in the top 100 are systems at the Chinese Academy of Sciences, the Korea Institute of Science and Technology, and the University of Liverpool.
Adoption of clusters continues to increase in high-performance computing environments. Seven of the top ten computers on the new Top 500 list are clusters; on the previous list, there were only two. All told, 208 cluster systems made the most recent list. For sheer performance, the combined processing power of all 500 supercomputers on the list is 528 teraFLOPS, up from 375 teraFLOPS when the previous list was released six months ago.
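The aggregate figures above imply a striking growth rate, which a line of arithmetic makes explicit (both totals are taken from the article):

```python
# Six-month growth in total Top500 performance, using the article's totals.
previous_total = 375.0   # teraFLOPS, previous list
current_total = 528.0    # teraFLOPS, most recent list

growth = (current_total - previous_total) / previous_total
print(f"six-month growth: {growth:.0%}")
```

That is about 41 percent more aggregate performance in half a year.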
SC2003 had the slick booths associated with most trade shows, but attendees could be seen in intense discussions with vendors about high-performance computing issues. The conference network, SCinet, required the installation of 55 miles of fiber supporting a 40-Gbps backbone. Next year, organizers hope to undertake a new initiative called StorCloud that would provide storage bandwidth of 1 terabyte per second and demonstrate innovative management and allocation technologies.
* SC2003, Nov. 15-21, Phoenix