Lies, Liars, and Benchmarking
There is very little happening "inside the (IT) box" that members of bio-IT consulting firm The BioTeam haven't tackled for clients. They blend expertise in information technology and bioscience to solve complex informatics and IT infrastructure challenges. This month, Bio·IT World introduces Inside the Box, a regular column written by The BioTeam. It's intended to keep readers current on the fast-changing world of IT in life sciences. We've asked them to emphasize solutions gleaned from real-world engagements without compromising their clients.
This first column, on the merit (or lack thereof) of benchmarks, is by Bill Van Etten, a geneticist who veered into informatics. His colleagues Michael Athanas, a high-energy physicist turned large-scale-computing expert, and Chris Dagdigian, a Bioperl founder and cluster computing expert, will contribute to future columns. (For more information, see "The BioTeam: Riders of the Storm," March 2003 Bio·IT World, page 24.)
January 12, 2004 | THE RELEASE OF ANY new computer hardware is generally accompanied by performance benchmarks trumpeting the offering's advantage over competing solutions. It's all part of the product introduction game. These new benchmarks inevitably trigger a flame war on Slashdot.org (or the like), where riled-up contributors exclaim that the benchmarks are false, misleading, or not performed fairly.
Who's lying? The flame-throwers? Or the manufacturers?
The somewhat unsatisfying answer is neither and both. Hardware manufacturers don't so much lie as selectively reveal facets of the truth: usually their highest marks, sometimes their average grades, and rarely their poorer scores. The Slashdot crowd gets upset when the facet revealed isn't pertinent to them or to the way they use the machine. In the manufacturers' defense, so many variables affect how well a particular machine handles a particular use that no single benchmark can please everyone.
As a scientist, my beef with benchmark analyses isn't so much with their results or methodologies as with the experimental question being asked or, more often, not being asked. How fast a machine executes an arbitrary benchmark test is not very useful to me. I don't care much about how a particular benchmarking application was compiled or which optimized libraries it was linked to. That sort of information is useful to hardware and software engineers, but not to me.
As an informatics researcher, I want to know, "How fast does this machine execute my data analysis algorithms the way I use them?" I want my research to be the benchmark test, and if it can be tweaked in some way on a particular machine to make it a little or even a whole lot faster, even better.
When our clients ask which hardware has the best price/performance (and they frequently do), we recommend they ask themselves the above question. The answer isn't the same for everyone.
Hardware manufacturers often ask us a different question. For example, AMD requested a cross-platform benchmark analysis of the scientifically meaningful use of the most important informatics algorithms. We responded by reminding them that "scientifically meaningful" and "most important informatics algorithms" are relative to the observer, that benchmark tests reveal only a facet of the truth, and that no static benchmark result can answer a broadly meaningful question.
We suggested, instead, that we could build a cross-platform and extensible benchmarking tool that lets researchers answer the more important question themselves ("How fast does this machine execute my data analysis algorithms the way I use them?"). AMD must be reasonably confident that machines based on its microprocessors are competitive because it agreed to sponsor the development of such a tool and to make it open source.
By the time you read this, you should be able to download the open-source distribution of the Informatics Benchmarking Tool (IBT) from bioteam.net (development sponsored by AMD). IBT requires a Unix operating system, a C compiler, GNU Make, and Perl 5.6.1 or later.
By default, IBT compiles NCBI Blast, Blat, Gromacs, and Hmmer from source code in the local environment, benchmarks several uses of each algorithm, and generates a Scalable Vector Graphics (SVG) document for viewing the benchmark results and the hardware utilization during each run (CPU, RAM, disk, and network).
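The core measurement behind any such run is simply timing a real command line, repeatedly, on the machine in question. The Python sketch below is a minimal illustration under stated assumptions, not IBT's implementation: it assumes a Unix system (the resource module is Unix-only), records only wall-clock and CPU time (no RAM, disk, or network), and the default workload is a stand-in for your own command.

```python
import resource
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return per-run wall, user-CPU and system-CPU seconds.

    CPU times come from RUSAGE_CHILDREN, which accumulates over every child
    process this script has waited for, so each run's share is the difference
    between snapshots taken before and after it.
    """
    runs = []
    for _ in range(repeats):
        before = resource.getrusage(resource.RUSAGE_CHILDREN)
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        wall = time.perf_counter() - start
        after = resource.getrusage(resource.RUSAGE_CHILDREN)
        runs.append({
            "wall": wall,
            "user": after.ru_utime - before.ru_utime,
            "sys": after.ru_stime - before.ru_stime,
        })
    return runs


def summarize(runs):
    """Median of each metric across runs; less sensitive to one noisy run than the mean."""
    return {key: statistics.median(run[key] for run in runs) for key in ("wall", "user", "sys")}


if __name__ == "__main__":
    # Pass your own command line as arguments; the default is only a stand-in workload.
    command = sys.argv[1:] or [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    print(summarize(time_command(command)))
```

Passing the exact search you run day to day, with your own query and database, is what turns this from a generic benchmark into a measurement of your own use case.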
"This is great," you say, "but IBT's use of tBlastx (or whatever) is significantly different from the way I use it." IBT uses the Test::Harness Perl module to organize the tests for each application into a suite, which makes it easy to write benchmark tests that model your local use cases.
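IBT builds its suites with the Perl Test::Harness module, which we don't reproduce here. As a rough illustration of the idea only, the Python sketch below runs a list of hypothetical local use cases and reports them in the line-oriented TAP format that Test::Harness parses (a `1..N` plan line, then one `ok` or `not ok` line per test), attaching each run's wall-clock time as a `#` diagnostic line. The use-case names and commands are placeholders.

```python
import subprocess
import sys
import time

# Hypothetical local use cases: (description, command line).
# Replace these with the searches you actually run.
USE_CASES = [
    ("short job", [sys.executable, "-c", "pass"]),
    ("longer job", [sys.executable, "-c", "sum(range(10**6))"]),
]


def run_suite(cases):
    """Run each case and return TAP lines: a plan, then ok/not ok per test.

    Elapsed wall-clock time goes on a '#' diagnostic line, which TAP
    consumers display but do not count as a pass/fail result.
    """
    lines = [f"1..{len(cases)}"]
    for number, (description, argv) in enumerate(cases, start=1):
        start = time.perf_counter()
        result = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        status = "ok" if result.returncode == 0 else "not ok"
        lines.append(f"{status} {number} - {description}")
        lines.append(f"# {description}: {elapsed:.3f} s wall clock")
    return lines


if __name__ == "__main__":
    print("\n".join(run_suite(USE_CASES)))
```

Keeping each use case as a separate numbered test means a harness can report which of your workloads failed to run on a given machine, not just how fast the others were.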
"Yeah," you respond, "but I use the application 'Foo,' which depends on the library 'Bar,' and IBT doesn't contain either of these." IBT uses GNU "make," "autoconf," and "libtool" to orchestrate the compilation and execution of informatics algorithms, which makes it straightforward to add other applications and the libraries they depend on.
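Orchestrating such a build mostly means building each dependency before the application that needs it. The Python sketch below (Python 3.9 or later, for graphlib) orders the placeholder packages 'Foo' and 'Bar' from the text and lists the conventional configure, make, make install steps for each. It only prints a plan and runs nothing; the dependency table and the install prefix are hypothetical, and this is not how IBT's own makefiles are written.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: each package maps to the packages it needs.
# 'Foo' and 'Bar' are the placeholder names used in the text.
DEPENDENCIES = {
    "Foo": {"Bar"},
    "Bar": set(),
}


def build_plan(dependencies, prefix):
    """Return (package, command) pairs ordered so dependencies are built first.

    Each package gets the conventional autoconf sequence, installed under
    one shared prefix.
    """
    plan = []
    for package in TopologicalSorter(dependencies).static_order():
        for command in (
            ["./configure", f"--prefix={prefix}"],
            ["make"],
            ["make", "install"],
        ):
            plan.append((package, command))
    return plan


if __name__ == "__main__":
    # Hypothetical install location.
    for package, command in build_plan(DEPENDENCIES, "/opt/ibt-local"):
        print(f"[{package}] {' '.join(command)}")
```

Installing everything under one --prefix keeps Bar's headers and libraries in a single place that Foo's configure step can then be pointed at.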
"OK, but what good is it if I can't compare my benchmark results to those of others on other hardware, operating systems, etc.?"
In addition to the browsable SVG document, IBT produces a BoulderIO document containing the benchmark results and runtime information. Because these documents can be merged across many independent benchmark runs, you can directly compare how hardware, operating system, compiler, compiler settings, and optimized libraries affect the performance of your data analysis algorithms the way you use them.
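One way to picture the merge step: the minimal Python sketch below reads flat records in the Boulder IO style (one TAG=VALUE pair per line, with a line holding a lone = closing each record) from several independent runs and tabulates one metric by test and host. The tag names, host names, and numbers are hypothetical placeholders, not IBT's actual schema or real results, and nested Boulder subrecords are ignored.

```python
from collections import defaultdict

# Placeholder documents in flat Boulder IO form. The tags HOST, TEST and
# WALL_SECONDS and the numbers are made up for illustration; they are not
# IBT's actual field names and not measured results.
RUN_A = "HOST=node-a\nTEST=tblastx_small\nWALL_SECONDS=10.0\n=\n"
RUN_B = "HOST=node-b\nTEST=tblastx_small\nWALL_SECONDS=12.5\n=\n"


def parse_boulder(text):
    """Parse flat records of TAG=VALUE lines, each record closed by a lone '='.

    Nested subrecords (TAG={ ... }) are not handled in this sketch. A tag that
    repeats within a record keeps every value, in order.
    """
    records, current = [], defaultdict(list)
    for line in text.splitlines():
        if line == "=":
            records.append(dict(current))
            current = defaultdict(list)
        elif "=" in line:
            tag, value = line.split("=", 1)
            current[tag].append(value)
    if current:  # tolerate a final record with no closing '='
        records.append(dict(current))
    return records


def compare(documents, metric="WALL_SECONDS"):
    """Merge records from independent runs into {test: {host: metric value}}."""
    table = defaultdict(dict)
    for text in documents:
        for record in parse_boulder(text):
            test, host = record["TEST"][0], record["HOST"][0]
            table[test][host] = float(record[metric][0])
    return dict(table)


if __name__ == "__main__":
    for test, by_host in compare([RUN_A, RUN_B]).items():
        fastest = min(by_host, key=by_host.get)
        print(f"{test}: {by_host} (fastest: {fastest})")
```

Because each record carries its own host and test labels, documents produced on different machines can simply be concatenated before comparison.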
You may also publish your benchmark results to bioteam.net. There, users are free to compare their results to the results of others — come and take a look.
So, what exactly is the fastest machine for executing your data analysis algorithms the way you use them? You can find out for yourself.
Bill Van Etten is a consultant for The BioTeam and can be reached at email@example.com.