Benchmarks: Your Performance May Vary

SUPERCOMPUTING · The new list of Top 500 supercomputers is coming. But do the results help with purchase plans?

BY SALVATORE SALAMONE

October 14, 2004 | At next month's SC2004 supercomputer conference, in Pittsburgh, a new list of the world's top 500 supercomputers will be announced. If recent trends are any indication, several systems dedicated to life science research will be near the top.

Vendors on the list get bragging rights — at least for six months, until the next list appears. But the real question is whether the information about the computers on the list can help life science organizations better select a system to address their high-performance computing (HPC) needs.

The answer, unfortunately, is no.

The Top 500 list (see top500.org) uses a test called Linpack to benchmark the peak performance of a computer, measured in billions of floating-point operations per second (gigaFLOPS). One of Linpack's strengths is that it is widely used, so performance results are available for a wide variety of systems.

However, a benchmark is only as good as what it measures. Linpack reflects how quickly a system solves a dense system of linear equations. How applicable is that to a life science HPC system?
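To make concrete what a Linpack-style figure measures, here is a minimal sketch in Python. It times the solution of a random dense system and divides a nominal operation count by the elapsed time. The function names, the matrix size, and the pure-Python solver are illustrative assumptions, not the benchmark's own code; the real test runs tuned, compiled libraries, so the number printed here says little about the hardware itself. The operation count, 2/3 n^3 + 2 n^2, is the figure conventionally used for this kind of solve.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the caller's data is untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry in column k to row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_rate(n, seed=0):
    """Return (gigaFLOPS, max residual) for one random n-by-n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    # Nominal operation count, not a count of the operations actually executed.
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    # Check the answer: largest absolute error in a x - b.
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return flops / elapsed / 1e9, residual


if __name__ == "__main__":
    rate, residual = linpack_style_rate(120)
    print(f"{rate:.4f} gigaFLOPS, residual {residual:.2e}")
```

Note what the sketch leaves out: nothing in it touches disk, a network, or a large database, which is why a single number of this kind may say little about a BLAST run.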


Complex Numbers 
For simulations of molecular interactions in protein-folding algorithms, the Linpack test is helpful in comparing one system to another. But for many applications, such as BLAST runs and database searches, the lone test provides little information about how fast a specific application will run on a particular machine. Moreover, a benchmark that produces one number ("peak performance," in Linpack's case) may be too simple to accurately characterize the capability of a complex system.

Thus, life scientists typically use other approaches to compare system performance.

"Linpack gives you a good idea of processing power," says Kumaran Kalyanasundaram, chairman of the Standard Performance Evaluation Corp.'s (SPEC) High Performance Group, an association of vendors and end-users that develops benchmarks. However, he says, the result provides little clue to the performance of memory, bandwidth, or input/output subsystems, all of which can affect an application's performance.

SPEC HPG is addressing this issue. Its tests also deliver an aggregate benchmark number for a system, but that number is based on running several applications that stress different aspects of an HPC machine. For example, SPEC CINT2000 includes a suite of individual tests that perform a variety of compute-intensive tasks, including data compression, ray tracing, compiling, and querying an object-oriented database. Life science organizations can go to spec.org and see the results of benchmark tests of various HPC systems.
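One common way to fold several application results into a single figure is the geometric mean of run-time ratios against a reference machine, which is the approach SPEC's CPU suites take. The sketch below illustrates the arithmetic only; the function name, the test names, and the timings are hypothetical and are not published SPEC results.

```python
import math


def aggregate_score(reference_seconds, measured_seconds):
    """Geometric mean of reference-time / measured-time ratios.

    A ratio above 1 means the tested system beat the reference machine
    on that application; the geometric mean keeps one unusually fast or
    slow test from dominating the aggregate.
    """
    ratios = [
        reference_seconds[name] / measured_seconds[name]
        for name in reference_seconds
    ]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    reference = {"compress": 1400.0, "compile": 1100.0, "raytrace": 1300.0}
    measured = {"compress": 350.0, "compile": 550.0, "raytrace": 260.0}
    print(f"aggregate score: {aggregate_score(reference, measured):.2f}")
```

Because the aggregate hides the individual ratios, two systems with the same score can still differ sharply on the one application a lab cares about.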

Another benchmarking tool with a clear life science bent is the Informatics Benchmarking Toolkit (IBT) from The BioTeam. The IBT, the BioTeam says, answers the question "How fast does this machine execute my data analysis algorithms the way I use them?"

The toolkit provides benchmark results for the informatics applications BLAT, BLAST, HMMer, and GROMACS. Users are encouraged to post their results on the IBT site.

A new release of the Bioinformatics Benchmark System (BBS) might give a sense of the impact of specific system components on running informatics algorithms. BBS 3 includes new and updated benchmarks for a number of informatics applications, including mpiBLAST, HMMer, and NCBI BLAST.

"The idea is to give [life scientists] a test bed or framework that gives them a standard way to baseline application performance," says BBS developer Joe Landman, CEO of Scalable Informatics. The software tests subsystem-related aspects such as bus speed, memory configurations, and disk drive performance.


Speed Really Matters 
Benchmark tools are used in many ways today. "If we get a new system in, I will run a benchmark," says Jason Stajich of Duke University's Department of Molecular Genetics and Microbiology. "I like to see how my systems compare to others."

Benchmarks are useful, Stajich says, but like many of his colleagues, he's most concerned with how fast applications run. He supplements benchmarks with another approach. "Rather than running a specific benchmark, run a set of test data on a system and get a sense of how long the tests take to run," he says. "Then try the same set of tests on a different system."

"So instead of getting a single performance number [from a benchmark test], we know how long a job is going to take to run," Stajich says. With those run times in hand, it is possible to make a case for a system with a given amount of processing capacity.
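The approach Stajich describes can be sketched in a few lines of Python: run the same job several times, record the wall-clock time, then repeat on the other system and compare. The helper name, the repeat count, and the stand-in job are assumptions for illustration; in practice the command would be a lab's own analysis run against its own test data.

```python
import statistics
import subprocess
import sys
import time


def time_job(command, repeats=3):
    """Run a command several times and summarize its wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises an error if the job fails, so a crash is never
        # mistaken for a fast run.
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Stand-in job: replace with the real analysis command and test data.
    job = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    print(time_job(job))
```

Reporting the spread as well as the median shows whether a system's run times are stable or swing from run to run, for instance because of disk caching.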






