September 8, 2008
XXX-Rated


By BIO-IT World


No, it's not what you might be thinking. The Xs stand for Xserve G5, Xserve RAID, and Xsan, which are server, storage, and software products from Apple Computer.

The BioTeam has had many opportunities to experiment with and deploy Apple's Xserve G5 (a 1U server with dual 2GHz G5 processors) and Xserve RAID (a 3U, 5.6TB Fibre Channel RAID storage device). However, while recently deploying a 125-node Xserve G5 cluster at the University of Pittsburgh Department of Human Genetics, we had our first opportunity to experiment with a pre-release version of Xsan, Apple's new cluster storage area network (SAN) file system. What we found may surprise and delight you.

Informatics clusters generally have a shared file system that gives all cluster nodes a common place to read input and write output. However, when a cluster scales beyond even a modest eight to 16 nodes, and particularly when it runs I/O-bound applications, this shared file system quickly becomes the bottleneck, throttling overall cluster performance.

The challenge becomes providing all cluster nodes with fast, reliable, affordable, and concurrent read/write access to common data.

Although compute and network hardware for cluster building has been commoditized and become relatively inexpensive, large-scale shared storage systems have not. Physical disks and RAID devices have become dramatically less expensive, but sharing this storage among tens or hundreds of machines simultaneously requires a completely different class of storage device. The standard solution to this shared storage problem has been to purchase a six-figure file server, whose price rivals if not surpasses the total cost of the compute and network hardware that makes up the cluster.

Xsan is Apple's solution to this problem. Xsan combines one or more Xserve RAIDs ($2 per GB) with one or more Xserve G5s (up to a practical limit of 64) into a shared file system that starts at as little as $9,000 and scales to a practical maximum storage capacity in the petabyte range.


Remedy for Sluggish I/O
The cluster at UPitt consists of 121 Xserve G5 cluster nodes and four specialized Xserve G5 "head nodes." Each head node is connected by Fibre Channel to a common Xserve RAID that appears to the node as a local disk. However, as with other SAN devices, each head node can have access to only its own dedicated portion of the RAID. The head nodes provide the many integrated, shared network services needed to transform this collection of discrete computers into a single virtual compute resource. One such network service is the network file system (NFS). NFS permits each head node to make its local file system (including its portion of the RAID) available for read/write access to the cluster. However, given a cluster of this size, the resulting NFS client/server ratio for any one share is 120:1, which would result in unacceptably poor I/O performance. Alternative solutions might be:

  • Locate different NFS shares on separate NFS servers (however, the NFS client/server ratio would still be 120:1 for each share).

  • Replicate the data within NFS shares on each of the NFS servers (this works for read-only data but results in a concurrency problem for data that are modified over time).

  • Break out the big bucks and buy a beefy file server.


STEELY SAN: University of Pittsburgh's cluster consists of 121 Xserve G5 nodes and four G5 "head nodes" linked to RAID storage.

With Xsan, the options improve significantly. Xsan manages concurrent disk access of many machines connected to a common set of Fibre Channel storage devices. This means that all four head nodes "see" all of the disks on the Xserve RAID as local disks, rather than only their dedicated portion. Since Xsan manages concurrency at the file level, rather than the volume level (as many SANs do), all four head nodes can share the RAID by NFS to the cluster simultaneously. This permits fractions of the cluster compute nodes to gain access to the common shared file system through independent NFS servers, in this case reducing the NFS client/server ratio by a factor of four.
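The arithmetic behind that factor of four can be sketched in a few lines of Python. This is a minimal illustration, not part of the UPitt deployment: the function name `clients_per_server` and the assumption that clients are spread as evenly as possible across the NFS servers are ours, and the count of 120 clients comes from the 120:1 ratio quoted earlier.

```python
import math


def clients_per_server(num_clients: int, num_servers: int) -> int:
    """Return the largest number of NFS clients any one server must handle,
    assuming (hypothetically) clients are spread as evenly as possible."""
    if num_clients < 0 or num_servers < 1:
        raise ValueError("need num_clients >= 0 and num_servers >= 1")
    return math.ceil(num_clients / num_servers)


# 120 NFS clients, served by one to four head nodes.
for servers in (1, 2, 3, 4):
    print(f"{servers} server(s): {clients_per_server(120, servers)}:1")
```

With all four head nodes exporting the same Xsan volume, the ratio falls from 120:1 to 30:1.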


Putting Xsan to the Test
In our experimental tests of this pre-release version of Xsan, we performed I/O benchmarking tests using "bonnie" (a common I/O benchmarking tool) executed on the cluster compute elements against a common NFS share point. Executions of bonnie were launched simultaneously on all 120 cluster nodes using the multi-threaded version of dsh (distributed secure shell). This resulted in a measure of total I/O throughput of one or more NFS servers in the presence of 120 simultaneous clients competing for read/write access to the same physical disk. (That is, we tested 120 machines reading/writing from one disk through one NFS server, 120 machines reading/writing from one disk through two NFS servers, and so on.)
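For readers who want to reproduce the "launch everywhere at once" pattern, here is a rough Python sketch. It is not dsh and not the harness we used: the function `run_on_all`, the host names, and the stand-in command are all hypothetical. The stand-in simply runs a local Python one-liner per host; in a real test, `build_command` would return whatever remote-shell invocation starts the benchmark on that host.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_on_all(hosts, build_command):
    """Run one command per host concurrently; return (host, exit code, stdout)
    tuples in the same order as `hosts`."""
    def run_one(host):
        done = subprocess.run(build_command(host), capture_output=True, text=True)
        return host, done.returncode, done.stdout.strip()

    # One worker thread per host so that all commands start at about the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return list(pool.map(run_one, hosts))


# Hypothetical host names. The stand-in command only prints the host name
# locally; a real run would build a remote benchmark invocation instead.
hosts = [f"node{i:03d}" for i in range(1, 5)]
results = run_on_all(hosts, lambda h: [sys.executable, "-c", f"print('ran on {h}')"])
for host, code, out in results:
    print(host, code, out)
```

Using one thread per host keeps the start times close together, which matters when the point of the test is simultaneous contention for the same disk.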

As a baseline, we measured the I/O performance of all 120 cluster nodes accessing data from a single NFS server without Xsan and defined the result as 1X. Performing the same benchmark tests in the presence of Xsan with NFS server load distributed over two Xserve G5s, we observed an NFS I/O performance a little better than 2X, over three servers a little better than 3X, and over four servers a little better than 4X. This is big news. This means that for the price of one or more additional Xserve G5s (not much when compared to a six-figure file server), you can distribute NFS server load over as many servers as you want (assuming performance continues to scale linearly).
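One way to check the "linear over scale" assumption on your own cluster is to turn raw throughput into speedup and per-server efficiency. The sketch below is only an illustration: `scaling_report` is a hypothetical helper, and the throughput numbers are invented placeholders in arbitrary units, not the measurements from our tests.

```python
def scaling_report(baseline, measured):
    """Given the single-server baseline throughput and a mapping of
    {server_count: aggregate_throughput}, return
    {server_count: (speedup, efficiency)}, where efficiency = speedup / servers."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    report = {}
    for servers, throughput in sorted(measured.items()):
        speedup = throughput / baseline
        report[servers] = (speedup, speedup / servers)
    return report


# Placeholder throughputs in arbitrary units: NOT measurements from the article.
placeholder = {2: 100.0, 4: 200.0}
report = scaling_report(50.0, placeholder)
for servers, (speedup, efficiency) in report.items():
    print(f"{servers} servers: {speedup:.1f}X, efficiency {efficiency:.2f}")
```

An efficiency at or above 1.0 for every server count corresponds to scaling that is linear or better; a value that drifts below 1.0 as servers are added would signal that the linear assumption no longer holds.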

This is great, but you might be asking yourself, "Am I locked into using Apple's triple-X product offering to deploy this sort of technology on my cluster?" Although Apple might prefer it if you did, the answer is no. Apple has taken an aggressive open-technology stance within this architecture. The underlying Fibre Channel technology is standards-based. You can mix and match Fibre Channel cards and switches from Brocade, QLogic, Emulex, and so on. You can use Fibre Channel storage devices other than the Xserve RAID. Xsan's underlying cluster file system is compatible with ADIC's StorNext file system, so you can even mix in Linux, Solaris, Windows, and other clients using the corresponding client software from ADIC.

We're pleased with our first experience with this pre-release version of Apple's Xsan software and look forward to the finished product.

Bill Van Etten is a consultant for The BioTeam. E-mail: bill@bioteam.net.


PHOTO BY PATRICIA NAGLE /UNIVERSITY OF PITTSBURGH
