October 14, 2004 | As an independent consulting practice, The BioTeam makes a concerted effort to stay on top of the tools and technologies that enable research computing. Admittedly, putting current and emerging technologies to work in our lab is an enjoyable task.
First up is storage, traditionally a key area for bio-IT practitioners, and an IT category where product acquisition costs range from a few thousand to hundreds of thousands of dollars. On the low to mid-range end of the scale, we've been working with Apple's Xserve RAID product on both Linux- and OS X-based client systems. The direct-attached Xserve RAID achieves high capacity at a good price point by combining inexpensive hot-swap IDE disks with dual RAID controllers and Fibre Channel host connectivity. It can capably serve small workgroups and is also making an appearance as inexpensive scratch or temporary space within large computing infrastructures.
Rackable Systems' S3116 Storage Server is another low to mid-range system that has seen extensive use in our lab. The S3116 combines server, controllers, and hot-swap disks into a single dense 3U chassis well suited for use as an NFS (Network File System) server. Available in Opteron or Xeon flavors with SATA or SCSI disks, the system we tested used dual Xeon CPUs and came with 16 Western Digital 250-GB Serial ATA drives split across a pair of Adaptec eight-channel SATA RAID controllers. Multiple Gigabit Ethernet cards connect the S3116 to the network. Performance was excellent using SUSE Linux 9.1 as the server operating system.
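For readers who want to run a similar quick check against their own file servers, a minimal Python sketch of a streaming-write throughput test is shown below. The NFS mount point, block size, and total transfer size are illustrative assumptions, not the parameters from our lab testing.

    #!/usr/bin/env python
    # Minimal streaming-write throughput check against an NFS-mounted path.
    # The mount point and transfer size below are illustrative assumptions.
    import os
    import time

    MOUNT_POINT = "/mnt/s3116"            # hypothetical NFS mount of the file server
    TEST_FILE = os.path.join(MOUNT_POINT, "throughput.tmp")
    BLOCK = b"\0" * (1024 * 1024)         # write in 1-MB blocks
    TOTAL_MB = 2048                       # 2 GB total, enough to get past client caching

    start = time.time()
    with open(TEST_FILE, "wb") as f:
        for _ in range(TOTAL_MB):
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())              # make sure the data actually reached the server
    elapsed = time.time() - start

    print("wrote %d MB in %.1f seconds (%.1f MB/s)" % (TOTAL_MB, elapsed, TOTAL_MB / elapsed))
    os.remove(TEST_FILE)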
At the higher end of storage, we were lucky enough to obtain a Panasas ActiveScale Storage Cluster shelf for several months of evaluation. The Panasas system can act as a traditional network-attached NFS or CIFS (Common Internet File System) server, but it really shines in settings where client hosts can load the special DirectFLOW software stack. A Panasas shelf contains individual storage blades with fast network connectivity and a large internal cache. The storage blades are coordinated by one or more "director" blades, which separates the control path from the data path. Client hosts running the DirectFLOW software can read from and write to multiple storage blades concurrently, yielding significant I/O throughput. We were never able to truly stress-test the Panasas shelf; it saturated our Gigabit Ethernet switch without appearing to break a sweat. One potential downside is the limited range of officially supported platforms for the DirectFLOW client software. As a general rule, The BioTeam believes that scientific and research needs should be the primary driver for bio-IT decisions, and vendor requirements that inhibit or limit research computing flexibility should be weighed carefully.
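DirectFLOW itself is proprietary, so the short Python sketch below is only a toy illustration of the underlying idea: once the director has handed out the data layout, the client pulls its stripes from every storage blade at the same time instead of funneling everything through a single server. The stripe paths and chunk size are invented for illustration and are not part of any Panasas interface.

    # Toy illustration of parallel, direct-to-storage data access.
    # The "blades" here are just local files; a real DirectFLOW client talks
    # to object storage blades over the network. Nothing below is Panasas code.
    from concurrent.futures import ThreadPoolExecutor

    STRIPE_PATHS = ["/data/blade0.img", "/data/blade1.img", "/data/blade2.img"]  # invented layout
    CHUNK = 1024 * 1024                   # read in 1-MB chunks

    def read_stripe(path):
        """Read one blade's share of the data; runs concurrently with the others."""
        pieces = []
        with open(path, "rb") as f:
            while True:
                block = f.read(CHUNK)
                if not block:
                    break
                pieces.append(block)
        return b"".join(pieces)

    # The director's coordination happens once, out of band; the bulk data
    # then flows from every blade to the client in parallel.
    with ThreadPoolExecutor(max_workers=len(STRIPE_PATHS)) as pool:
        stripes = list(pool.map(read_stripe, STRIPE_PATHS))
    print("read %d MB across %d stripes" % (sum(len(s) for s in stripes) // (1024 * 1024), len(stripes)))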
A Desktop Cluster and a New Favorite
The most exciting piece of hardware to enter our lab recently is a pre-production Orion DT-12 "desktop cluster" from Orion Multisystems. Orion, which emerged from stealth mode in September to broad coverage in the international business and technology press, is waging an uphill battle to redefine the high-end scientific workstation market. With a 12-node/12-CPU Linux cluster desktop product and a 96-CPU deskside model, Orion ships functional, fully integrated clusters that require no special cooling and, incredibly, plug into standard electrical outlets. We have experimented extensively with the pre-production unit and are quite impressed; benchmarks and real analysis will have to wait for the production unit.
Several BioTeam staffers have found a new favorite 32-bit computing platform. The Sun Fire V60x server is 1U in size, sports dual Xeon CPUs, a pair of PCI-X buses, and support for 6 GB of memory. The systems ran fast and flawlessly in our tests, and we are excited to see Sun finally offering its x86-based systems at highly competitive list prices.
In this column in June, I addressed the search for alternatives to Red Hat Enterprise Linux. The BioTeam continues to be impressed with SUSE Linux and has all but completed an internal migration to SUSE 9.1 on research and production server systems. SUSE 9.1 also served as the base OS for a major computational biology cluster deployment and integration project, with very positive results.
Also seeing heavy use in the lab is the open-source Taverna distributed-computing workflow suite. Internal experimentation with building cluster workflows and pipelines using the Taverna tools has been successful; so successful, in fact, that Taverna will most likely be the focus of a future column.
What's with the Poky USB?
Testing has not been all fun, though, and we have a few persistent complaints aimed particularly at server manufacturers. One of the techniques we have the most fun with involves booting and deploying an Apple Xserve cluster using nothing but a bootable iPod attached to a FireWire port. We would like to do something similar with the wide range of non-Apple servers that ship with easily accessible USB ports, but we have yet to find a server whose external USB ports reliably handle anything faster than USB 1.1, which is simply too slow for mass storage devices. Hour upon boring hour passed while we developed a USB disk-based cluster-bootstrapping system. Note to server vendors: get a clue. When you put a USB port on the front bezel of your system, your customers might want to attach something more interesting than a keyboard or a mouse.
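For what it's worth, it takes only a minute to find out what speed a server's USB ports actually negotiate before sinking an afternoon into a slow one. The Python sketch below reads the per-device speed attribute that Linux 2.6 kernels expose through sysfs; the exact layout can vary by kernel, so treat it as a starting point rather than gospel.

    #!/usr/bin/env python
    # Report the negotiated speed of every USB device visible through sysfs
    # on a Linux 2.6 (or later) kernel. Paths follow the standard sysfs
    # layout and may differ on other kernels.
    import glob
    import os

    LABELS = {"1.5": "USB low speed", "12": "USB 1.1 full speed", "480": "USB 2.0 high speed"}

    for dev in sorted(glob.glob("/sys/bus/usb/devices/*")):
        speed_file = os.path.join(dev, "speed")
        if not os.path.isfile(speed_file):
            continue                      # skip entries without a speed attribute
        speed = open(speed_file).read().strip()
        print("%s: %s Mb/s (%s)" % (os.path.basename(dev), speed, LABELS.get(speed, "unknown")))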
What will be coming through the BioTeam lab next? Some of the tools and technologies that have caught our eye include the open-source Lustre cluster file system (recently branded and made available by HP), performance-optimized compilers from Intel, SGI's Altix product line, Apple's Xsan software, IBM's PowerPC-based servers, Western Digital's RAID Edition drives, Yellow Dog Linux for PowerPC, and systems management tools from OpenCountry. Stay tuned.
Chris Dagdigian is a self-described infrastructure geek currently employed by The BioTeam. E-mail: firstname.lastname@example.org.