Feb 15, 2006 | The first column we wrote in this space two years ago (see Lies, Liars, and Benchmarking, Jan. 2004 Bio•IT World, page 20) dealt with benchmarking. The caveats spelled out in that article are still firmly in effect:
• Benchmarks must be representative of the actual intended use of the compute system
• Manufacturer benchmarks tend to highlight the positive and omit the negative
• Generic measures of “computing power” have only a passing relationship to the real-world needs of life scientists
The BioTeam recently had the opportunity to perform a set of benchmarks on one of the new quad-chip, dual-core Xeon systems from Intel. Our complete analysis is available online at bioteam.net/intel_benchmarks.
The most interesting thing to me about multicore systems is that they represent a significant push back in the ever-present single system image versus compute farm debate. For the past several years, the most cost-effective way to build a personal compute server at the 8 or 16 CPU level was to build a small cluster out of dual CPU systems. The Intel system that we evaluated contained four physical chips, each of which contained two cores. This made it an 8 CPU system before even counting any virtual CPUs. In terms of performance, administrative overhead, and cost per CPU, a single system is clearly a win over a cluster. The only major downsides are the fact that a single machine is a single point of failure, and there is a high cost to add that ninth CPU. All of the major chip manufacturers have gone multicore, and all the roadmaps I’ve seen clearly call for quad core and beyond in coming years. This means that in pretty short order we will be seeing 16 cores and 32 cores in a single chassis. This will simplify the hardware aspects of parallel computing and swing the pendulum back in favor of developers who exploit both message passing and thread-based parallelism in their code.
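As a rough illustration of the thread-based style mentioned above, the sketch below spreads independent per-sequence calculations over a pool of workers sized to the machine's core count. It is a minimal sketch rather than anything taken from our benchmarks: the function names `gc_fraction` and `gc_for_all` and the GC-content workload are hypothetical, chosen only because they are familiar to life scientists.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def gc_for_all(sequences, workers=None):
    """Compute the GC fraction of every sequence using a pool of threads.

    By default the pool has one worker per core reported by the OS.
    Note: in CPython, CPU-bound pure-Python threads are serialized by the
    global interpreter lock, so this shows the programming pattern only.
    ProcessPoolExecutor offers the same map() interface and can be swapped
    in when real multicore speedup is needed.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(gc_fraction, sequences))
```

Calling `gc_for_all(["GGCC", "ATAT", "ATGC"])` returns one GC fraction per input sequence, in input order. The same decomposition into independent tasks is what a cluster scheduler exploits, which is part of why code written this way moves easily between a multicore box and a compute farm.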
Naturally, one application of these highly multicore systems will be to build clusters with ever-increasing CPU count. On the other hand, the market for a “personal” supercomputer exists at approximately the 4 to 16 CPU level, regardless of the exact configuration that gets the user there. Underlying this is a comforting reality: The same interface can be used to manage workflows on an SMP machine as is used on a larger cluster.
We’ve been talking about Web services interfaces to cluster tools for over a year now, and the second tier of services is finally starting to emerge. I recently learned that the University of Minnesota’s Center for Computational Genomics and Bioinformatics is using the Web services interface on its cluster to build “semantic services” that add information from the BioMOBY and Gene Ontology projects to raw cluster computations. In addition, the center is publishing services integrated with the legume genome annotation databases that it maintains, and it recently demonstrated this technology at the Plant and Animal Genome conference. Tying systems together at well-defined, standard interfaces makes it possible (though still far from simple) to build a truly integrated computational universe for genomic information.
HAIL HANDHELDS: The Sony PSP makes a "great mobile monitoring platform for IT staff," says BioTeam's Chris Dagdigian.
Getting the Most from Grid Engine
Last year, Chris Dagdigian used this space to talk about his open-source “xml-qstat” tool that is being used to transform raw Grid Engine XML status data into a variety of publishable forms including Web pages and syndicated XML feeds (RSS). (See Adventures in XML Transformation, July 2005 Bio•IT World, page 40.) Since then, xml-qstat has been entirely rewritten from the ground up and now plugs directly into the Apache Cocoon XML publishing framework. New features include sensible XML data caching to avoid stressing the Grid Engine subsystem, Atom 1.0-compliant XML syndication feeds, an XSL-template-driven documentation framework, and even automatic detection and special mobile device output for Sony PlayStation Portable (PSP) systems. Choosing to support the Sony handheld gaming device was not a joke or a design afterthought. According to Chris Dagdigian, “The Sony PSP has a large high-quality color display, built-in wireless networking, and a Web browser that supports almost all of the XHTML and CSS1/CSS2 Web publishing standards. It makes a great mobile monitoring platform for IT staff who need to keep a constant eye on grid and cluster status information.” In addition to continuing development of xml-qstat, Chris also recently launched a new Web site and community wiki for Grid Engine users that can be found at http://gridengine.info. The new site aggregates and consolidates links, documentation, resources, and HOWTOs previously buried deep within mailing list archives and other hard-to-find locations.
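To make the idea of transforming and caching status data concrete, here is a minimal sketch in Python. It is not xml-qstat code (that tool works through XSL templates inside Cocoon); it only illustrates the two ideas described above: summarizing job states from status XML, and caching the raw XML for a short time so the scheduler is not queried on every page view. The sample document is a simplified stand-in loosely modeled on `qstat -xml` output, and its element names, along with `summarize_jobs` and `CachedStatus`, should be treated as assumptions.

```python
import time
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, hypothetical status document (not real scheduler output).
SAMPLE_XML = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>101</JB_job_number>
      <JB_owner>alice</JB_owner>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>102</JB_job_number>
      <JB_owner>bob</JB_owner>
    </job_list>
    <job_list state="pending">
      <JB_job_number>103</JB_job_number>
      <JB_owner>alice</JB_owner>
    </job_list>
  </job_info>
</job_info>"""


def summarize_jobs(xml_text):
    """Count jobs by the 'state' attribute of each <job_list> element."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for job in root.iter("job_list"):
        counts[job.get("state", "unknown")] += 1
    return dict(counts)


class CachedStatus:
    """Hold the last fetched XML and refetch only after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch          # callable returning status XML text
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can fake time
        self._cached = None
        self._fetched_at = 0.0

    def get(self):
        now = self._clock()
        if self._cached is None or now - self._fetched_at >= self._ttl:
            # Cache is empty or stale: ask the (expensive) source again.
            self._cached = self._fetch()
            self._fetched_at = now
        return self._cached
```

In a real deployment the `fetch` callable would wrap whatever produces the status XML; here `CachedStatus(lambda: SAMPLE_XML)` is enough to try it out, and `summarize_jobs(cache.get())` yields a per-state job count suitable for a Web page or feed entry.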
Coming Soon: Server Virtualization Bakeoff
One of the newest arrivals in our hardware lab is a fully loaded Rackable Systems 3118 Storage Server, which we plan to use as a platform for evaluating server and OS virtualization products including Xen, VMware, and Microsoft Virtual Server. Server virtualization is becoming more and more popular for a number of use cases including server consolidation, software development, QA testing, and training applications. The Rackable 3118 is well suited for testing virtualization products: its two multicore Opteron CPUs hit the pricing/licensing sweet spot for the commercial products, and its sixteen 250GB disk drives and pair of 3Ware SATA controllers will allow each virtual server to access a dedicated block-level storage volume. Expect a full column covering the results of our virtualization trials in the future.