
Going Green for 1000 Genomes


Baylor data center deploys Rackable systems for power savings and performance.

By Kevin Davies

Sept. 5, 2008 | As the NIH 1000 Genomes project kicks into top gear, data center managers are racing to come to terms with the glut of data they will have to manage from next-generation sequencing instruments. “This is a problem for all the [genome] centers to deal with, as well as the repositories such as NCBI,” says David Parker, a systems analyst at the Baylor College of Medicine Human Genome Sequencing Center.

Parker says the 1000 Genomes project not only represents a dramatic difference in the scale of the data being generated and collected, but “it also changes the compute characteristic.” He says that the old Sanger sequencing instruments (“We loved those!”) had a relatively low requirement for processing power. “We could typically have small, slow processors. We never bought the fastest processor because we didn’t need that much CPU time. We could run on single-core machines, without much RAM.”

The image analysis requirements of the new sequencing platforms change all that. The Baylor genome center chiefly uses 454 machines (as it did in completing the genome sequence of James Watson last year; see “Project Jim,” Bio•IT World, June 2007), as well as Illumina and, most recently, Applied Biosystems SOLiD machines.

Now, says Parker, “We’re buying the fastest multi-core processors we can get with all the RAM we can jam in them.” Network performance is a much bigger issue now, too, as Parker tries to squeeze every ounce of speed from the system.

In preparing for the 1000 Genomes project, Parker says, “We’ve really rethought our entire architecture and basically started from scratch.” In a way, the timing was perfect: having run out of physical space, Parker’s team was expanding the center in a remote location. For once, space is not a problem.

It’s not much more than a back-of-the-envelope calculation, but Parker estimates that the center’s storage requirements are set to expand by 150 terabytes (TB) a quarter. “That’s just for the original data, not workspace. Workspace is typically 50-100% of that. So roughly 250 TB/quarter,” says Parker.
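For a sense of scale, that back-of-the-envelope arithmetic can be sketched in a few lines of Python (not from the article; the 150 TB primary-data figure and the 50-100% workspace overhead are Parker's, the annual figure is simply multiplied out):

    # Back-of-the-envelope sketch of the storage growth Parker describes:
    # 150 TB/quarter of primary data plus 50-100% workspace overhead.
    PRIMARY_TB_PER_QUARTER = 150
    WORKSPACE_LOW, WORKSPACE_HIGH = 0.5, 1.0

    low = PRIMARY_TB_PER_QUARTER * (1 + WORKSPACE_LOW)    # 225 TB
    high = PRIMARY_TB_PER_QUARTER * (1 + WORKSPACE_HIGH)  # 300 TB

    print(f"Quarterly growth: {low:.0f}-{high:.0f} TB (Parker's 'roughly 250 TB')")
    print(f"Yearly growth: {4*low:.0f}-{4*high:.0f} TB, i.e. about a petabyte a year")

At that pace, the multi-petabyte range Parker mentions below is only a few years away, which is why the architecture rethink starts with storage.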

But there is some good news on the primary data front. Researchers are actually disposing of it. “We’re making progress on it; it’s always a fun subject!” says Parker. “I was buying beer for everyone in the center last week because we actually deleted 15 TB of old data!”

Rack ‘Em
On the storage side, Parker says the platforms have to be more robust, scalable, and able to meet “a constant demand by the researchers for more storage.” When it comes to storage, Parker isn’t terribly picky and says he’s considered just about everything. “We’ve looked at Sun… We’ve done pilots with Isilon, IBRIX, Panasas, BlueArc, NetApp, RapidScale (the old Terrascale stuff that Rackable bought)… There’s no such thing as storage you can’t use.”

He still uses Hitachi SAN-based storage with its virtualization capabilities. “But it’s expensive. I don’t know that that’s going to be financially wise as we get into the multi-petabyte range. So we’re looking at alternatives, like Lustre and ClusterFS, that can do the same for a lot less money.”

One vendor that Parker is sold on is Rackable Systems, a Fremont, Calif.-based server and storage manufacturer that prides itself on its “ecological” equipment. Parker first started using Rackable servers about five years ago, after contacting “every cluster vendor on the planet,” including IBM, HP, and Western Scientific, to assess the costs, benefits, and alternatives. “They have the kind of density for floor space I can get from blade servers, yet there is no cost premium for that density. Blade servers still cost more money than 1-U servers. And you don’t have the flexibility. Blade servers are certainly wonderful things, but I can buy the same number of processors in the same footprint with Rackable, pay a lot less money, use a lot less electricity, and generate a lot less heat.”

Rackable’s “green” reputation comes from its use of DC power: rectifiers at the top of each cabinet convert AC to DC once, rather than in every server’s individual power supply, which improves efficiency. “We estimate it saves 30% on the power and on the cooling,” says Parker.
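To put that 30% in rough dollar terms, a small hypothetical calculation helps (the 12 kW rack draw, the cooling overhead, and the $0.10/kWh price below are assumptions for illustration, not figures from Baylor or Rackable):

    # Hypothetical illustration of what a 30% power-and-cooling saving could mean.
    # Only the 30% figure comes from Parker; everything else is assumed.
    RACK_POWER_KW = 12.0      # assumed average draw of one loaded rack
    COOLING_OVERHEAD = 0.8    # assume ~0.8 W of cooling per W of IT load
    PRICE_PER_KWH = 0.10      # assumed electricity price, USD
    SAVINGS = 0.30            # Parker's estimate for DC power distribution

    annual_kwh = RACK_POWER_KW * (1 + COOLING_OVERHEAD) * 24 * 365
    baseline_cost = annual_kwh * PRICE_PER_KWH
    print(f"Baseline power + cooling: ~${baseline_cost:,.0f} per rack per year")
    print(f"At 30% savings:           ~${baseline_cost * SAVINGS:,.0f} per rack per year")

Multiplied across a growing cluster, savings on that order add up quickly, quite apart from the reduced heat load on the data center.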

Parker says he’s not sacrificing anything by giving up a blade server. “If you buy a blade server, the motherboards are proprietary,” he says. For example, if he buys from HP, he can only buy HP replacements. Rackable, on the other hand, uses off-the-shelf parts. “If I don’t like the motherboards they ship me, I can buy them myself.”

Aside from flexibility, lower cost, and a good number of processors per rack, Parker says, “I also like them because they just do excellent work. It rolls in; it’s wired the way I want it; it has the network switches I want. It’s just beautiful!”

Before joining Baylor, Parker had a consulting company with his brothers. But in 2001, he laughs, “Our customers went out of business!” He calls the past five years “a great adventure.” It sounds like the fun is just beginning. 

___________________________________________________

This article appeared in Bio-IT World Magazine.

