Notes from the Lab: Multicore and More


By Chris Dwan, The BioTeam

Feb 15, 2006 | The first column we wrote in this space two years ago (see Lies, Liars, and Benchmarking, Jan. 2004 Bio•IT World, page 20) dealt with benchmarking. The caveats spelled out in that article are still firmly in effect:

• Benchmarks must be representative of the actual intended use of the compute system

• Manufacturer benchmarks tend to highlight the positive and omit the negative

• Generic measures of “computing power” have only a passing relationship to the real-world needs of life scientists

The BioTeam recently had the opportunity to perform a set of benchmarks on one of the new quad-chip, dual-core Xeon systems from Intel. Our complete analysis is available online at bioteam.net/intel_benchmarks.

The most interesting thing to me about multicore systems is that they represent a significant swing in the ever-present debate between a single system image and a compute farm. For the past several years, the most cost-effective way to build a personal compute server at the 8 or 16 CPU level was to build a small cluster out of dual-CPU systems. The Intel system that we evaluated contained four physical chips, each of which contained two cores. This made it an 8 CPU system before even counting any virtual CPUs.

In terms of performance, administrative overhead, and cost per CPU, a single system is clearly a win over a cluster. The only major downsides are that a single machine is a single point of failure and that adding a ninth CPU is expensive.

All of the major chip manufacturers have gone multicore, and all the roadmaps I’ve seen clearly call for quad core and beyond in coming years. This means that in fairly short order we will be seeing 16 and 32 cores in a single chassis. This will simplify the hardware aspects of parallel computing and swing the pendulum back in favor of developers who exploit both message passing and thread-based parallelism in their code.

Naturally, one application of these highly multicore systems will be to build clusters with ever-increasing CPU count. On the other hand, the market for a “personal” supercomputer exists at approximately the 4 to 16 CPU level, regardless of the exact configuration that gets the user there. Underlying this is a comforting reality: The same interface can be used to manage workflows on an SMP machine as is used on a larger cluster.
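The point about a shared interface can be illustrated with a minimal Python sketch. It is an analogy rather than a description of any particular scheduler: the workflow code talks only to a generic executor, so a local pool (standing in for an SMP machine) could in principle be swapped for a cluster-backed one without changing the workflow. The names gc_fraction and run_workflow, the toy sequences, and the pool size of 8 are all assumptions made up for this example.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (toy workload)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def run_workflow(executor: Executor, sequences: list[str]) -> list[float]:
    """Submit one task per sequence; the caller decides where tasks run."""
    futures = [executor.submit(gc_fraction, s) for s in sequences]
    # Collect results in submission order.
    return [f.result() for f in futures]


sequences = ["GGCC", "ATAT", "ATGC"]

# A local pool stands in for the single multicore machine. An executor
# backed by a cluster scheduler (hypothetical here) would be passed to
# run_workflow in exactly the same way.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = run_workflow(pool, sequences)

print(results)
```

The design choice is that run_workflow depends only on the abstract Executor type, so the decision between one large machine and many small ones stays outside the workflow itself.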

Web Services
We’ve been talking about Web services interfaces to cluster tools for over a year now, and the second tier of services is finally starting to emerge. I recently learned that the University of Minnesota’s Center for Computational Genomics and Bioinformatics is using the Web services interface on its cluster to build “semantic services” that add information from the BioMOBY and Gene Ontology projects to raw cluster computations. In addition, the center is publishing services integrated with the legume genome annotation databases that it maintains, and it recently demonstrated this technology at the Plant and Animal Genome conference. Tying systems together at well-defined, standard interfaces makes it possible (though still far from simple) to build a truly integrated computational universe for genomic information.

HAIL HANDHELDS: The Sony PSP makes a "great mobile monitoring platform for IT staff," says BioTeam's Chris Dagdigian.

Getting the Most from Grid Engine
Last year, Chris Dagdigian used this space to talk about his open-source “xml-qstat” tool, which transforms raw Grid Engine XML status data into a variety of publishable forms including Web pages and syndicated XML feeds (RSS). (See Adventures in XML Transformation, July 2005 Bio•IT World, page 40.) Since then, xml-qstat has been rewritten from the ground up and now plugs directly into the Apache Cocoon XML publishing framework. New features include sensible XML data caching to avoid stressing the Grid Engine subsystem, Atom 1.0-compliant XML syndication feeds, an XSL-template-driven documentation framework, and even automatic detection of, and special mobile device output for, Sony PlayStation Portable (PSP) systems.

Choosing to support the Sony handheld gaming device was not a joke or a design afterthought. According to Chris Dagdigian, “The Sony PSP has a large high-quality color display, built-in wireless networking, and a Web browser that supports almost all of the XHTML and CSS1/CSS2 Web publishing standards. It makes a great mobile monitoring platform for IT staff who need to keep a constant eye on grid and cluster status information.”

In addition to continuing development of xml-qstat, Chris recently launched a new Web site and community wiki for Grid Engine users at http://gridengine.info. The new site aggregates and consolidates links, documentation, resources, and HOWTOs previously buried deep within mailing list archives and other hard-to-find locations.
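The caching idea mentioned for xml-qstat can be sketched in a few lines of Python. This is not xml-qstat's implementation (which is built on Apache Cocoon); it is a minimal illustration, under stated assumptions, of serving status XML from a short-lived cache so that repeated page views do not each trigger a fresh query. The CachedStatus class, the 30-second lifetime, and the simplified sample document with its element names are all hypothetical.

```python
import time
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Callable, Optional

# Hypothetical, heavily simplified status document used only for this
# sketch; real scheduler output carries many more fields.
SAMPLE_XML = """<job_info>
  <job_list state="running"><JB_name>blast_01</JB_name></job_list>
  <job_list state="running"><JB_name>blast_02</JB_name></job_list>
  <job_list state="pending"><JB_name>hmmer_01</JB_name></job_list>
</job_info>"""


class CachedStatus:
    """Re-fetch status XML only when the cached copy is older than ttl."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 30.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._xml: Optional[str] = None
        self._fetched_at = 0.0
        self.fetch_count = 0  # how often the backend was actually queried

    def xml(self) -> str:
        now = time.monotonic()
        if self._xml is None or now - self._fetched_at >= self._ttl:
            self._xml = self._fetch()
            self._fetched_at = now
            self.fetch_count += 1
        return self._xml


def jobs_by_state(xml_text: str) -> dict[str, int]:
    """Count job entries per 'state' attribute."""
    root = ET.fromstring(xml_text)
    states = (job.get("state", "unknown") for job in root.iter("job_list"))
    return dict(Counter(states))


# The lambda stands in for whatever call really queries the scheduler.
status = CachedStatus(lambda: SAMPLE_XML, ttl_seconds=30.0)
first = jobs_by_state(status.xml())
second = jobs_by_state(status.xml())  # within the ttl, served from cache

print(first, "backend queries:", status.fetch_count)
```

Two consecutive requests inside the cache lifetime reach the backend once; only a request arriving after the lifetime expires would query it again.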

Coming Soon: Server Virtualization Bakeoff
One of the newest arrivals in our hardware lab is a fully loaded Rackable Systems 3118 Storage Server, which we plan to use as a platform for evaluating server and OS virtualization products including Xen, VMware, and Microsoft Virtual Server. Server virtualization is becoming more and more popular for a number of use cases including server consolidation, software development, QA testing, and training applications. The Rackable 3118 is well suited for testing virtualization products: its two multicore Opteron CPUs hit the pricing/licensing sweet spot for the commercial products, and its sixteen 250GB disk drives and pair of 3Ware SATA controllers will allow each virtual server to access a dedicated block-level storage volume. Expect a full column covering the results of our virtualization trials in the future.
