Lies, Liars, and Benchmarking

By BIO-IT World

Inside the Box: Bill Van Etten


There is very little happening "inside the (IT) box" that members of the bio-IT consulting firm The BioTeam haven't tackled for clients. They blend expertise in information technology and bioscience to solve complex informatics and IT infrastructure challenges. This month, Bio·IT World introduces Inside the Box, a regular column written by The BioTeam to keep readers current on the fast-changing world of IT in the life sciences. We've asked the team to emphasize solutions gleaned from real-world engagements without compromising client confidentiality.

This first column, on the merit (or lack thereof) of benchmarks, is by Bill Van Etten, a geneticist who veered into informatics. His colleagues Michael Athanas, a high-energy physicist turned large-scale-computing expert, and Chris Dagdigian, a Bioperl founder and cluster computing expert, will contribute to future columns. (For more information, see "The BioTeam: Riders of the Storm," March 2003 Bio·IT World, page 24.)


January 12, 2004 | THE RELEASE OF ANY new computer hardware is generally accompanied by performance benchmarks trumpeting the offering's advantage over competing solutions. It's all part of the product introduction game. These benchmarks inevitably trigger a flame war on Slashdot.org (or a similar forum), where riled-up contributors insist that the benchmarks are false, misleading, or unfairly performed.

Who's lying? The flame-throwers? Or the manufacturers?

The somewhat unsatisfying answer is neither and both. Hardware manufacturers don't so much lie as selectively reveal facets of the truth: usually their highest marks, sometimes their average grades, and rarely their poorer scores. The Slashdot crowd becomes upset when the facet revealed isn't pertinent to its own use. In the manufacturers' defense, so many variables affect how a particular machine performs a particular task that no single set of benchmarks can please everyone.

As a scientist, my beef with benchmark analyses isn't so much with their results or methodologies as with the experimental question being asked or, more often, not being asked. How fast a machine executes an arbitrary benchmark test is not very useful to me. I don't care much about how a particular benchmarking application was compiled or which optimized libraries it was linked against. That sort of information is useful to hardware and software engineers, but not to me.

As an informatics researcher, I want to know, "How fast does this machine execute my data analysis algorithms the way I use them?" I want my research to be the benchmark test, and if my analyses can be tuned on a particular machine to run a little, or even a whole lot, faster, so much the better.
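To make the idea concrete, here is a minimal Python sketch of treating your own analysis as the benchmark: run it several times on the machine in question and summarize the wall-clock times. The function `my_analysis` is a hypothetical stand-in (a GC count over a synthetic sequence); in practice you would substitute a step from your own pipeline. Reporting the median and range of several runs, rather than a single timing, is our own assumption, meant to dampen noise from caching and other system activity.

```python
import statistics
import time
from typing import Callable, Dict


def time_workload(workload: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run `workload` several times and summarize its wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return {
        "repeats": repeats,
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def my_analysis() -> int:
    """Hypothetical stand-in for 'my data analysis algorithm':
    count G and C bases in a synthetic 1,000,000-base sequence."""
    seq = "ACGT" * 250_000
    return sum(1 for base in seq if base in "GC")


if __name__ == "__main__":
    print(time_workload(my_analysis, repeats=3))
```

Running the same script unchanged on each candidate machine gives numbers that answer the column's question for this one workload, and only for it.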

When our clients ask which hardware offers the best price/performance (and they frequently do), we recommend they ask themselves that question. The answer isn't the same for everyone.
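As a rough illustration, assuming you have already timed your own workload on each candidate machine, price/performance can be expressed as price divided by throughput (runs per second). Because throughput is 1 / seconds per run, that equals price multiplied by seconds per run, and lower is better. The machine names and figures in this Python sketch are hypothetical placeholders, not measurements.

```python
from typing import Dict, List, Tuple


def price_performance(price: float, seconds_per_run: float) -> float:
    """Price divided by throughput (runs per second); lower is better.

    Since throughput = 1 / seconds_per_run, this equals price * seconds_per_run.
    """
    if price <= 0 or seconds_per_run <= 0:
        raise ValueError("price and seconds_per_run must be positive")
    return price * seconds_per_run


def rank_machines(machines: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return machine names ordered from best to worst price/performance.

    `machines` maps a name to (purchase_price, seconds_per_run), where
    seconds_per_run should come from timing *your* analysis on that machine.
    """
    return sorted(machines, key=lambda name: price_performance(*machines[name]))


if __name__ == "__main__":
    # Hypothetical placeholder figures, not real measurements.
    candidates = {
        "machine_a": (4000.0, 100.0),
        "machine_b": (6000.0, 50.0),
    }
    for name in rank_machines(candidates):
        print(name, price_performance(*candidates[name]))
```

With these made-up inputs the pricier machine ranks first because it finishes the run in half the time; with a different workload, and therefore different timings, the order could easily flip.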

Hardware manufacturers often ask us the opposite question. AMD, for example, requested a cross-platform benchmark analysis of the scientifically meaningful use of the most important informatics algorithms. We responded by reminding them that both "scientifically meaningful" and "most important informatics algorithms" are relative to the observer, that benchmark tests reveal only a facet of the truth, and that no static benchmark result can answer so broad a question.

We suggested instead that we build a cross-platform, extensible benchmarking tool that lets researchers answer the more important question themselves ("How fast does this machine execute my data analysis algorithms the way I use them?"). AMD must be reasonably confident that machines based on its microprocessors are competitive, because it agreed to sponsor development of such a tool and to make it open source.

By the time you read this, you should be able to download the open-source distribution of the Informatics Benchmarking Tool (IBT), whose development AMD sponsored, from bioteam.net. IBT requires a Unix operating system, a C compiler, GNU Make, and Perl 5.6.1 or later.

By default, IBT compiles NCBI Blast, Blat, Gromacs, and Hmmer from source in the local environment, benchmarks several uses of each algorithm, and generates a Scalable Vector Graphics (SVG) document for viewing the benchmark results and hardware utilization during each run (CPU, RAM, disk, and network).
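The column doesn't describe how IBT collects its runtime data, so the following is only a generic Python sketch, assuming a Unix system, of capturing part of that picture for a single command: wall time, user and system CPU time, and peak memory, via the standard `resource` module. The command in the usage line is a stand-in for whatever invocation you actually run; disk and network utilization are not covered here.

```python
import resource
import subprocess
import sys
import time
from typing import Dict, Sequence


def run_and_measure(cmd: Sequence[str]) -> Dict[str, float]:
    """Run one command to completion and report its resource usage.

    CPU times are the change in RUSAGE_CHILDREN across the run; they count
    only child processes that have exited and been waited for, which
    subprocess.run() guarantees for the command it starts. A nonzero exit
    status raises subprocess.CalledProcessError (check=True).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(list(cmd), check=True, capture_output=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_s": wall,
        "user_cpu_s": after.ru_utime - before.ru_utime,
        "sys_cpu_s": after.ru_stime - before.ru_stime,
        # Peak resident set size of the largest child reaped so far
        # (kilobytes on Linux, bytes on macOS); this is not a per-run delta.
        "max_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Stand-in workload: a short CPU-bound Python child process.
    print(run_and_measure([sys.executable, "-c", "sum(range(10**6))"]))
```

Comparing `user_cpu_s` with `wall_s` gives a quick hint of whether a run was CPU-bound or spent much of its time waiting on I/O.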

"This is great," you say, "but IBT's use of tBlastx (or whatever) is significantly different from the way I use it." IBT uses the Perl Test::Harness module to construct each application's test suite, which makes it easy to write benchmark tests that model local use cases.

"Yeah," you respond, "but I use the application 'Foo,' which depends on the library 'Bar,' and IBT doesn't contain either of these." IBT uses GNU make, autoconf, and libtool to orchestrate the compilation and execution of informatics algorithms, so other applications can readily be added.

"OK, but what good is it if I can't compare my benchmark results to those of others on other hardware, operating systems, etc.?"

In addition to the browsable SVG document, IBT produces a BoulderIO document containing the benchmark results and runtime information. BoulderIO documents from many independent benchmark runs can be merged, permitting direct comparison of how hardware, operating systems, compilers, compiler settings, and optimized libraries affect the performance of your data analysis algorithms the way you use them.
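The column doesn't specify which tags IBT writes, so the tag names below (`HOST`, `TEST`, `SECONDS`) are hypothetical. Assuming flat BoulderIO records, where each record is a run of `TAG=VALUE` lines terminated by a line containing only `=`, this Python sketch pools several result files and reports the median time for each machine and test. Nested subrecords are not handled.

```python
import io
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple


def read_boulder(stream: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield flat BoulderIO records as {tag: [value, ...]}.

    Each record is a run of TAG=VALUE lines ended by a line holding only '='.
    A tag may repeat, so values are collected in lists.
    """
    record: Dict[str, List[str]] = defaultdict(list)
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == "=":
            if record:
                yield dict(record)
            record = defaultdict(list)
        elif "=" in line:
            tag, value = line.split("=", 1)  # values may themselves contain '='
            record[tag].append(value)
    if record:  # tolerate a final record without its closing '='
        yield dict(record)


def merge_runs(
    streams: Iterable[Iterable[str]],
    key_tags: Sequence[str] = ("HOST", "TEST"),  # hypothetical tag names
    value_tag: str = "SECONDS",                  # hypothetical tag name
) -> Dict[Tuple[str, ...], float]:
    """Pool records from many result files; return the median per key."""
    pooled: Dict[Tuple[str, ...], List[float]] = defaultdict(list)
    for stream in streams:
        for rec in read_boulder(stream):
            if value_tag not in rec or any(t not in rec for t in key_tags):
                continue  # skip records missing the fields we compare on
            key = tuple(rec[t][0] for t in key_tags)
            pooled[key].extend(float(v) for v in rec[value_tag])
    return {key: statistics.median(vals) for key, vals in pooled.items()}


if __name__ == "__main__":
    # Two toy result files; the values are illustrative, not measurements.
    run_a = io.StringIO(
        "HOST=machine_a\nTEST=blastn\nSECONDS=12.0\n=\n"
        "HOST=machine_a\nTEST=hmmsearch\nSECONDS=30.0\n=\n"
    )
    run_b = io.StringIO(
        "HOST=machine_a\nTEST=blastn\nSECONDS=14.0\n=\n"
        "HOST=machine_b\nTEST=blastn\nSECONDS=15.0\n=\n"
    )
    for key, median_s in sorted(merge_runs([run_a, run_b]).items()):
        print(key, median_s)
```

Adding tags such as compiler or compiler flags to `key_tags` would extend the same merge to the other comparisons mentioned above.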

You may also publish your benchmark results to bioteam.net, where anyone can compare their results with those of others. Come and take a look.

So, what exactly is the fastest machine for executing your data analysis algorithms the way you use them? You can find out for yourself.



Bill Van Etten is a consultant for The BioTeam and can be reached at bill@bioteam.net. 
