December 15, 2004
No, it's not what you might be thinking. The Xs stand for Xserve G5, Xserve RAID, and Xsan, which are server, storage, and software products from Apple Computer.
The BioTeam has had many opportunities to experiment with and deploy Apple's Xserve G5 (a 1U server with dual 2GHz G5 processors) and Xserve RAID (a 3U, 5.6TB Fibre Channel RAID storage device). During our recent deployment of a 125-node Xserve G5 cluster at the University of Pittsburgh Department of Human Genetics, however, we had our first opportunity to experiment with a pre-release version of Xsan, Apple's new cluster storage area network (SAN) file system. What we found may surprise and delight you.
Informatics clusters generally have a shared file system, providing all cluster nodes with a common place to read input from and write output to. However, when scaling a cluster beyond even a modest size of eight to 16 nodes, and particularly when executing I/O-bound applications, this shared file system quickly becomes the bottleneck, throttling overall cluster performance.
The challenge becomes providing all cluster nodes with fast, reliable, affordable, and concurrent read/write access to common data.
Although compute and network hardware for cluster building has been commoditized and is now relatively inexpensive, large-scale shared storage systems have not followed suit. Physical disks and RAID devices have become dramatically less expensive, but sharing that storage among tens or hundreds of machines simultaneously requires a completely different class of storage device. The standard solution to this shared-storage problem has been to purchase a six-figure file server, rivaling if not surpassing the total cost of the compute and network hardware that makes up the cluster.
Xsan is Apple's solution to this problem. It enables the combination of one or more Xserve RAIDs ($2 per GB) with one or more Xserve G5s (64 is the practical limit) to assemble a shared file system for as little as $9,000, scaling to a practical maximum capacity in the petabyte range.
Remedy for Sluggish I/O
The cluster at UPitt consists of 121 Xserve G5 compute nodes and four specialized Xserve G5 "head nodes." Each head node is connected by Fibre Channel to a common Xserve RAID, which appears to the node as local disk. However, as with other SAN devices, each head node has access only to its own dedicated portion of the RAID. The head nodes provide the many integrated, shared network services needed to transform this collection of discrete computers into a single virtual compute resource. One such service is the Network File System (NFS), which permits each head node to make its local file system (including its portion of the RAID) available for read/write access by the cluster. With a cluster of this size, though, the resulting NFS client/server ratio would be 120:1, yielding unacceptably poor I/O performance. Alternative solutions might be:
- Locate different NFS shares on separate NFS servers (however, the NFS client/server ratio would still be 120:1 for each share).
- Replicate the data within NFS shares on each of the NFS servers (this works for read-only data but results in a concurrency problem for data that are modified over time).
- Break out the big bucks and buy a beefy file server.
STEELY SAN: University of Pittsburgh's cluster consists of 121 Xserve G5 nodes and four G5 "head nodes" linked to RAID storage.
With Xsan, the options improve significantly. Xsan manages concurrent disk access for many machines connected to a common set of Fibre Channel storage devices. This means that all four head nodes "see" all of the disks on the Xserve RAID as local disks, rather than only their dedicated portions. Because Xsan manages concurrency at the file level, rather than the volume level (as many SANs do), all four head nodes can share the RAID to the cluster over NFS simultaneously. This permits subsets of the compute nodes to reach the common shared file system through independent NFS servers, in this case reducing the NFS client/server ratio by a factor of four.
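To make that arithmetic concrete, here is a minimal sketch (in Python) of how compute nodes might be parceled out among the four head-node NFS exports. The hostnames and the round-robin scheme are illustrative assumptions, not the actual Pitt configuration; any assignment that balances clients evenly across the heads yields the same ratio.

    # A minimal sketch of spreading NFS clients across the head nodes.
    # Hostnames and the round-robin scheme are illustrative assumptions,
    # not the actual Pitt configuration.
    HEAD_NODES = ["head1", "head2", "head3", "head4"]        # hypothetical names
    COMPUTE_NODES = [f"node{i:03d}" for i in range(1, 121)]  # 120 compute nodes

    # Round-robin assignment: each head node ends up serving 120 / 4 = 30
    # clients, so the NFS client/server ratio falls from 120:1 to 30:1,
    # even though all four exports are backed by the same Xsan-managed RAID.
    assignment = {node: HEAD_NODES[i % len(HEAD_NODES)]
                  for i, node in enumerate(COMPUTE_NODES)}

    for head in HEAD_NODES:
        clients = sum(1 for h in assignment.values() if h == head)
        print(f"{head}: {clients} NFS clients")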
Putting Xsan to the Test

In our experimental tests of this pre-release version of Xsan, we ran "bonnie" (a common I/O benchmarking tool) on the cluster compute nodes against a common NFS share point. Bonnie was launched simultaneously on all 120 cluster nodes using the multi-threaded version of dsh (distributed secure shell). This yielded a measure of the total I/O throughput of one or more NFS servers in the presence of 120 simultaneous clients competing for read/write access to the same physical disk. (That is, we tested 120 machines reading/writing from one disk through one NFS server, 120 machines reading/writing from one disk through two NFS servers, and so on.)
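For readers without dsh at hand, a rough Python equivalent of that launch pattern follows: one thread per node, each invoking bonnie over ssh against the shared mount. The hostnames, mount point, and bonnie flags are assumptions for illustration, not our exact invocation.

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    NODES = [f"node{i:03d}" for i in range(1, 121)]  # hypothetical hostnames
    CMD = "bonnie -d /nfs/scratch -s 1024"           # assumed flags: 1GB test file on the shared mount

    def run_bonnie(node):
        # ssh to the node and run bonnie against the common NFS share point,
        # capturing its report so per-node throughput can be totaled later
        result = subprocess.run(["ssh", node, CMD], capture_output=True, text=True)
        return node, result.stdout

    # One thread per node so all 120 benchmarks start together and compete
    # for read/write access to the same physical disk
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        reports = dict(pool.map(run_bonnie, NODES))

    print(f"collected {len(reports)} bonnie reports")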
As a baseline, we measured the I/O performance of all 120 cluster nodes accessing data from a single NFS server without Xsan and took that measurement as 1X. Performing the same benchmark tests with Xsan and the NFS server load distributed over two Xserve G5s, we observed NFS I/O performance a little better than 2X; over three servers, a little better than 3X; and over four servers, a little better than 4X. This is big news. It means that for the price of one or more additional Xserve G5s (not much compared to a six-figure file server), you can distribute NFS server load over as many servers as you want (assuming performance remains linear at larger scales).
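The proportional reasoning can be restated in a few lines of Python. The 1X baseline here is a normalization rather than a measured figure; the point is simply how the client/server ratio and the expected aggregate throughput move as servers are added.

    # The 1X figure below is a normalization, not a measured number; this
    # just restates the proportional scaling argument in runnable form.
    BASELINE = 1.0                     # aggregate throughput with one NFS server, defined as 1X
    for servers in (1, 2, 3, 4):
        ratio = 120 // servers         # NFS clients per server
        expected = BASELINE * servers  # ideal linear scaling, close to what we observed
        print(f"{servers} NFS server(s): {ratio}:1 client/server ratio, ~{expected:.0f}X throughput")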
This is great, but you might be asking yourself, "Am I locked into using Apple's triple-X product offering to deploy this sort of technology on my cluster?" Although Apple might prefer it if you did, the answer is no. Apple has taken an aggressive open-technology stance within this architecture. The underlying Fibre Channel technology is standards-based: you can mix and match Fibre Channel cards and switches from Brocade, QLogic, Emulex, and others, and you can use Fibre Channel storage devices other than the Xserve RAID. Xsan's underlying cluster file system is compatible with ADIC's StorNext file system, so you can even mix in Linux, Solaris, Windows, and other clients using the corresponding client software from ADIC.
We're pleased with our first experience with this pre-release version of Apple's Xsan software and look forward to the finished product.
Bill Van Etten is a consultant for The BioTeam. E-mail: email@example.com.
PHOTO BY PATRICIA NAGLE /UNIVERSITY OF PITTSBURGH