IT's Alive! Notes from the Lab



 

October 14, 2004 | As an independent consulting practice, The BioTeam makes a concerted effort to stay on top of the tools and technologies that enable research computing. Admittedly, putting current and emerging technologies to work in our lab is an enjoyable task.

First up is storage, traditionally a key area for bio-IT practitioners, and an IT category where product acquisition costs range from a few thousand to hundreds of thousands of dollars. On the low to mid-range end of the scale, we've been working with Apple's XRAID product on both Linux- and OS X-based client systems. The direct-attached XRAID achieves high capacity at a good price point by combining inexpensive hot-swap IDE disks with dual RAID controllers and Fibre Channel host connectivity. XRAID can capably serve small workgroups and is also making an appearance as inexpensive scratch or temporary space within large computing infrastructures.

Rackable Systems' S3116 Storage Server is another low- to mid-range system that has seen extensive use in our lab. The S3116 combines server, controllers, and hot-swap disks into a single dense 3U chassis suitable for use as a Network File System (NFS) server. Available in Opteron or Xeon flavors with SATA or SCSI disks, the system we tested used dual Xeon CPUs and came with 16 Western Digital 250GB serial ATA drives split across a pair of Adaptec eight-channel SATA RAID controllers. Multiple Gigabit Ethernet cards connect the S3116 to the network. Performance was excellent using Suse Linux 9.1 as the server operating system.
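For readers sizing a similar box, the capacity arithmetic is easy to sketch. The raw figure follows directly from the drive count above; the RAID 5 layout (one array per controller) is purely an assumption for illustration, since the configuration we tested is not described here, and the function names are hypothetical.

```python
def raw_capacity_gb(drives: int, drive_gb: int) -> int:
    """Raw capacity in marketed (decimal) gigabytes, before any RAID overhead."""
    return drives * drive_gb


def raid5_usable_gb(arrays: int, drives_per_array: int, drive_gb: int) -> int:
    """Usable capacity if each array runs RAID 5 (one drive's worth of parity).

    Assumption for illustration only: the column does not state the RAID level.
    """
    return arrays * (drives_per_array - 1) * drive_gb


if __name__ == "__main__":
    # 16 x 250GB drives split across two eight-channel controllers
    print("raw GB:", raw_capacity_gb(16, 250))
    print("usable GB (assumed RAID 5 per controller):", raid5_usable_gb(2, 8, 250))
```

Other layouts (RAID 10, hot spares, a single striped volume across both controllers) would give different usable numbers; file system overhead is ignored as well.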

At the higher end of storage, we were lucky enough to obtain a Panasas ActiveScale Storage Cluster shelf for several months of evaluation. Panasas can work as a traditional network-based mixed NFS or Common Internet File System (CIFS) server, but it really shines in settings where client hosts are able to load the special DirectFLOW software stack. A Panasas shelf contains individual storage blades with fast network connectivity and a large internal cache. The storage blades are controlled by one or more "director" blades, resulting in a separation of control paths from data access paths. Client hosts running DirectFLOW software can directly access multiple storage blades concurrently for significant I/O throughput. We were unable to stress-test the Panasas shelf, as it easily saturated our Gigabit Ethernet switch without appearing to break a sweat. One potential downside is the limited range of officially supported platforms for the DirectFLOW client software. As a general rule, The BioTeam believes that scientific and research needs should be a primary driver for bio-IT-related decisions. Vendor requirements that inhibit or limit research computing flexibility should be carefully considered.


A Desktop Cluster and a New Favorite 
The most exciting piece of hardware to enter our lab recently is a pre-production Orion DT-12 "desktop cluster" system from Orion Multisystems. Orion, which emerged from stealth mode in September with huge coverage from the international business and technology press, is waging an uphill battle to redefine the high-end scientific workstation market. With a 12-node/12-CPU Linux cluster desktop product and a 96-CPU deskside model, Orion is able to ship functional, fully integrated clusters that require no special cooling and, incredibly, can be plugged into standard electrical outlets. We have experimented extensively with the pre-production unit and are quite impressed; benchmarks and real analysis will have to wait for the production unit.

Several BioTeam staffers have found a new favorite 32-bit computing platform. The Sun Microsystems V60 server is 1U in size and sports dual Xeon CPUs, a pair of PCI-X buses, and support for 6 GB of memory. The systems ran fast and flawlessly in tests, and we are excited to see Sun finally offering its x86-based systems with highly competitive list pricing.

In this column in June, I addressed the search for alternatives to Red Hat Enterprise Linux. The BioTeam has consistently been impressed with Suse Linux, and has all but completed an internal Suse 9.1 migration on research and production server systems. Suse 9.1 was also used as the base OS for a major computational biology cluster deployment and integration project, with very positive results.

Another piece of software seeing heavy use in the lab is the open-source Taverna distributed computing workflow suite. Internal experimentation in building cluster workflows and pipelines with Taverna tools has been successful; so successful, in fact, that Taverna will most likely be the focus of a future column.


What's with the Poky USB? 
Testing has not been all fun, though, and we do have a few persistent complaints aimed particularly at server manufacturers. One of the techniques we have the most fun with involves booting and deploying an Apple Xserve cluster using nothing but a bootable iPod attached to a FireWire port. We would like to do something similar with the wide range of non-Apple servers that ship with easily accessible USB ports. But we have yet to find a server system with external USB that can reliably handle anything higher than the USB 1.1 protocol, which is simply too slow for use with mass storage devices. Hour upon boring hour passed while we developed a USB disk-based cluster-bootstrapping system. Note to server vendors: Get a clue. When you put a USB port on the front bezel of your system, your customers might want to attach something more interesting than a keyboard or a mouse.
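To put "too slow" in perspective, here is a back-of-the-envelope sketch comparing best-case transfer times at the nominal signaling rates of each bus (12 Mbit/s for USB 1.1 full speed, 400 Mbit/s for FireWire 400, 480 Mbit/s for USB 2.0 high speed). The 2 GB boot image size is a hypothetical figure chosen for illustration, not a measurement from our lab, and real-world throughput will be lower than these bus maximums.

```python
# Nominal signaling rates in megabits per second (bus maximums, not measured throughput).
NOMINAL_MBIT_PER_S = {
    "USB 1.1 (full speed)": 12,
    "FireWire 400": 400,
    "USB 2.0 (high speed)": 480,
}


def best_case_seconds(image_megabytes: float, mbit_per_s: float) -> float:
    """Lower bound on transfer time; ignores protocol overhead and disk speed."""
    return image_megabytes * 8 / mbit_per_s


if __name__ == "__main__":
    IMAGE_MB = 2000  # hypothetical 2 GB boot image (assumption for illustration)
    for bus, rate in NOMINAL_MBIT_PER_S.items():
        minutes = best_case_seconds(IMAGE_MB, rate) / 60
        print(f"{bus:22s} {minutes:6.1f} min per node")
```

Even in this idealized case, USB 1.1 needs more than 20 minutes per node for an image that the faster buses move in well under a minute, and the gap only widens when one disk is carried from node to node across a cluster.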

What will be coming through the BioTeam lab next? Some of the tools and technologies that have caught our eye include the open-source Lustre cluster file system (recently branded and made available from HP), performance-optimized compilers from Intel, SGI's Altix product line, Apple's XSAN software, IBM's PowerPC-based servers, Western Digital's RAID Edition, Yellow Dog Linux for PowerPC, and systems management tools from OpenCountry. Stay tuned.

Chris Dagdigian is a self-described infrastructure geek currently employed by The BioTeam. E-mail: chris@bioteam.net. 





For reprints and/or copyright permission, please contact  Tim McLucas, (781) 972-1342, tmclucas@healthtech.com .