Lies, Liars, and Benchmarking


Inside the Box - Bill Van Etten


There is very little happening "inside the (IT) box" that members of bio-IT consulting firm The BioTeam haven't tackled for clients. They blend expertise in information technology and bioscience to solve complex informatics and IT infrastructure challenges. This month, Bio·IT World introduces Inside the Box, a regular column written by The BioTeam. It's intended to keep readers current on the fast-changing world of IT in life sciences. We've asked them to emphasize solutions gleaned from real-world engagements without compromising their clients.

This first column, on the merit (or lack thereof) of benchmarks, is by Bill Van Etten, a geneticist who veered into informatics. His colleagues Michael Athanas, a high-energy physicist turned large-scale-computing expert, and Chris Dagdigian, a Bioperl founder and cluster computing expert, will contribute to future columns. (For more information, see "The BioTeam: Riders of the Storm," March 2003 Bio·IT World, page 24.)


January 12, 2004 | THE RELEASE OF ANY new computer hardware is generally accompanied by performance benchmarks trumpeting the offering's advantage over competing solutions. It's all part of the product introduction game. These new benchmarks inevitably trigger a flame war on Slashdot.org (or the like), where riled-up contributors exclaim that the benchmarks are false, misleading, or not performed fairly.

Who's lying? The flame-throwers? Or the manufacturers?

The somewhat unsatisfying answer is neither and both. Hardware manufacturers don't so much lie as selectively reveal facets of the truth — usually their highest marks, sometimes their average grades, and rarely their poorer scores. The Slashdot crowd becomes upset when the particular facet of the truth revealed is not pertinent to them or their particular use. In the hardware manufacturers' defense, the large number of variables that contribute to the performance of a particular use of a particular machine makes it impossible to please everyone.

As a scientist, my beef with benchmark analyses isn't so much with their results or methodologies as with the experimental question being asked or, more often, not being asked. How fast a machine executes a random benchmark test is not very useful to me. I don't care much about how a particular benchmarking application was compiled or the optimized libraries it was linked to. This sort of information is useful to hardware and software engineers — but not to me.

As an informatics researcher, I want to know, "How fast does this machine execute my data analysis algorithms the way I use them?" I want my research to be the benchmark test, and if it can be tweaked in some way on a particular machine to make it a little or even a whole lot faster, even better.

When our clients ask which hardware has the best price/performance (and they frequently do), we recommend they ask themselves the above question. The answer isn't the same for everyone.

Hardware manufacturers often ask us the opposing question. For example, AMD requested a cross-platform benchmark analysis of the scientifically meaningful use of the most important informatics algorithms. We responded by reminding them that "scientifically meaningful" and the "most important informatics algorithms" are relative to the observer, that benchmark tests reveal only a facet of the truth, and that a static benchmark result isn't the answer to a broadly meaningful question.

We suggested, instead, that we could build a cross-platform and extensible benchmarking tool that lets researchers answer the more important question themselves ("How fast does this machine execute my data analysis algorithms the way I use them?"). AMD must be reasonably confident that machines based on its microprocessors are competitive because it agreed to sponsor the development of such a tool and to make it open source.

By the time you read this, you should be able to download the open-source distribution of the Informatics Benchmarking Tool (IBT) from bioteam.net (development sponsored by AMD). The use of IBT requires a Unix operating system, a C compiler, GNU Make, and Perl 5.6.1 or later.

By default, IBT compiles NCBI Blast, Blat, Gromacs, and Hmmer from source code within the local environment, benchmarks the execution of several uses of each algorithm, and generates a Scalable Vector Graphics (SVG) document for viewing benchmark results and hardware runtime utilization (CPU, RAM, disk, and network).
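The core idea, timing your own command the way you actually run it, can be sketched in a few lines of Python. This is not IBT's code: the function names `time_command` and `summarize` are invented for illustration, and the workload shown is a placeholder that you would replace with your own analysis command. It measures wall-clock time only and does not attempt the CPU, RAM, disk, and network monitoring described above.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises on a non-zero exit, so a failed run is never
        # recorded as if it were a valid timing.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to a few figures that are easy to compare."""
    return {
        "runs": len(timings),
        "best": min(timings),
        "median": statistics.median(timings),
        "worst": max(timings),
    }


if __name__ == "__main__":
    # Placeholder workload (an assumption for this sketch): swap in your own
    # analysis command with the arguments and data you really use.
    workload = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
    print(summarize(time_command(workload)))
```

Repeating the run and reporting the median alongside the best and worst times is one simple way to keep a single noisy measurement from deciding the comparison.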

"This is great," you say, "but IBT's use of tBlastx (or whatever) is significantly different from the way I use it." IBT uses the Test::Harness Perl module for the construction of a suite of tests of a particular application, making it easy to construct benchmark tests that model local use cases.
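Test::Harness works by running test scripts and reading their results in the Test Anything Protocol (TAP), a plain-text format of "ok" and "not ok" lines. The Python sketch below mimics that shape for a list of local use cases. It is an analogy rather than IBT's implementation: `run_suite` and the use-case names are hypothetical, and the commands are placeholders for your own invocations.

```python
import subprocess
import sys
import time


def run_suite(use_cases):
    """Run each (name, argv) use case and return TAP-style result lines."""
    lines = ["1.." + str(len(use_cases))]  # the TAP plan: how many tests follow
    for number, (name, argv) in enumerate(use_cases, start=1):
        start = time.perf_counter()
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        elapsed = time.perf_counter() - start
        status = "ok" if completed.returncode == 0 else "not ok"
        lines.append(f"{status} {number} - {name}")
        # Lines starting with '#' are TAP diagnostics; the timing goes here.
        lines.append(f"# {name}: {elapsed:.3f} s")
    return lines


if __name__ == "__main__":
    # Hypothetical use cases: each entry stands in for one way you really
    # run an application, with your own arguments and data.
    suite = [
        ("small query", [sys.executable, "-c", "sum(range(10000))"]),
        ("large query", [sys.executable, "-c", "sum(range(1000000))"]),
    ]
    print("\n".join(run_suite(suite)))
```

Adding a new use case is then a matter of appending one more named command to the list, which is the property that makes a harness-based suite easy to adapt to local practice.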

"Yeah," you respond, "but I use the application 'Foo,' which depends on the library 'Bar,' and IBT doesn't contain either of these." IBT uses GNU "make," "autoconf," and "libtool" to orchestrate the compilation and execution of informatics algorithms, making it readily extensible to the addition of other applications.

"OK, but what good is it if I can't compare my benchmark results to those of others on other hardware, operating systems, etc.?"

In addition to an SVG document that you can browse, IBT produces a BoulderIO document containing the benchmark results and runtime information. These documents can be merged across many independent benchmark runs, permitting direct comparison of the effects of hardware, operating system, compiler, compiler settings, and optimized libraries on the performance of your data analysis algorithms the way you use them.
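To give a feel for why a simple record format makes merging easy, here is a minimal Python sketch that reads flat Boulder-style records (tag=value lines, with a line holding only "=" closing each record) and picks the fastest machine per test. The tag names `host`, `test`, and `seconds` and the sample data are invented for illustration; the tags IBT actually writes are not described here, and nested records are not handled.

```python
def parse_records(text):
    """Parse flat tag=value records; a line holding only '=' ends a record."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "=":
            if current:
                records.append(current)
            current = {}
            continue
        tag, _, value = line.partition("=")
        current[tag] = value
    if current:  # tolerate a final record with no closing '='
        records.append(current)
    return records


def fastest_per_test(records):
    """Map each test name to the (host, seconds) pair with the lowest time."""
    best = {}
    for rec in records:
        seconds = float(rec["seconds"])
        if rec["test"] not in best or seconds < best[rec["test"]][1]:
            best[rec["test"]] = (rec["host"], seconds)
    return best


if __name__ == "__main__":
    # Invented sample data; real records would come from separate benchmark runs.
    machine_a = "host=machine-a\ntest=search-small\nseconds=12.5\n=\n"
    machine_b = "host=machine-b\ntest=search-small\nseconds=9.75\n=\n"
    merged = parse_records(machine_a) + parse_records(machine_b)
    print(fastest_per_test(merged))
```

Because each record is self-contained, merging results from different machines amounts to concatenating their record lists before comparing them.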

You may also publish your benchmark results to bioteam.net. There, users are free to compare their results to the results of others — come and take a look.

So, what exactly is the fastest machine for executing your data analysis algorithms the way you use them? You can find out for yourself.



Bill Van Etten is a consultant for The BioTeam and can be reached at bill@bioteam.net. 





 





For reprints and/or copyright permission, please contact  Tim McLucas, (781) 972-1342, tmclucas@healthtech.com .