TIGR Unplugs HP, Switches to Sun for Genome Assembly


By Kevin Davies

Sept. 18, 2006 | The Institute for Genomic Research (TIGR), one of the most prestigious genome research centers in the United States, has hauled out its long-serving HP Alpha servers and switched to a new architecture from Sun Microsystems.

Two months ago, TIGR staff finally removed the 15 historic HP Alpha servers from the nonprofit in Rockville, Md. It was a “happy event,” TIGR’s director of IT, Vadim Sapiro, told Bio-IT World. “There was something really finite about that.”

“It was a happy event,” says Vadim Sapiro, TIGR director of IT, of removing 15 HP Alpha servers and replacing them with three Sun Fire servers.

The replacements — three Sun Fire servers running the 64-bit Linux OS — had been running alongside the Alpha servers for some time before the HP servers were finally switched off six months ago. These servers are used to power TIGR’s complex genomic assembler for sequence research. In addition to significant economies of size, efficiency, energy utilization, and maintenance, Sapiro says the new system’s increased performance is boosting research speed and quality.

TIGR was founded by J. Craig Venter in 1992 and claimed many landmarks in its early years, including the compilation of hundreds of thousands of expressed sequence tags and the first microbial genome sequence in 1995. The shotgun sequencing approach pioneered in that assembly propelled Venter to launch Celera Genomics in 1998.

In the meantime, TIGR has become best known for the sequencing, assembly, and analysis of dozens of microbial genomes, which currently require 80 terabytes of data storage. Since 1999, that process had relied on HP’s Alpha clusters. But as those clusters grew less reliable and economical to operate, especially in terms of services support and cooling requirements, Sapiro and colleagues sought other solutions.

Sapiro is in his second stint at TIGR, having started in 1994 as a “classic nerdy” UNIX systems administrator. Since 1999, he’s been in charge of managing the IT department of about 20 staff, supporting more than 300 employees.

HP Dependency
The dependency on HP’s Alpha servers traces back to the work of Gene Myers and Granger Sutton at Celera Genomics on the Celera Assembler pipeline. Those efforts were performed on the proprietary Alpha platform. “TIGR was allowed to use the software,” says Sapiro. “Our computational needs were increasing, so we started using the software, but had to build the Alpha architecture.”

Sapiro says the HP architecture “served us well for the first three to four years, but after that, we realized that the HP infrastructure is very expensive to buy and operate. Plus, refreshing the technology and replacing the servers was a very expensive proposition.”

Aside from operating cost, sluggishness, and unreliability, another motivation to change was sharing. Sapiro says that TIGR’s mantra as a nonprofit is that “everything belongs back to the public — not just the data, but distributing the tools. That [HP] proprietary platform was not an exportable tool.”

Sapiro wanted to migrate to an open-source standard. “When we were ready to take the project, we needed to identify infrastructure — 64-bit chips — and have open-source. AMD Opteron was somewhat of a logical choice.” Sapiro says they looked at IBM servers but decided against migrating from one proprietary platform to another.

Given the specs — 32GB RAM servers with AMD architecture — Sapiro says that Sun was a logical choice, especially given its relationship with AMD: “We ran a proof of concept on a Sun-AMD server. We looked at other AMD-based servers, but we enjoy a very special relationship with Sun. Sun treats nonprofits like they would a university campus.”

Sapiro rates Sun’s support organization highly. “Engineering support is quite good,” he says. “We needed that as we were embarking on something a little scary!” The engineering team was able to deal with numerous technical issues, including overheating CPUs, during the proof-of-concept trials, which began in 2004. Once the first set of three Sun servers was installed, the code porting was completed. “But for the next 18 months, we had the two systems running side by side, working out kinks and bugs,” says Sapiro.

About six months ago, the HP cluster was unplugged. “Give all the kudos to Alpha — it did serve us for six years and is responsible for many scientific breakthroughs,” says Sapiro.

Compare and Contrast
Sapiro says the three Sun servers were “orders of magnitude” cheaper to acquire, “and a lot cheaper to operate.”

Contrast TIGR’s first Alpha servers from 1999 with the new Sun system. The HP servers (model 4100), which cost more than $100,000, had four CPUs at 500 MHz, 5 GB RAM, and took up half a normal datacenter rack. Moreover, being a proprietary HP platform, “you needed IT people trained in the intricacies of that [Unix-based] operating system,” says Sapiro.

By contrast, each of the three new Linux-running Sun Fire servers has four CPUs at 2.4 GHz, 32 GB RAM, and occupies three rack units in a normal datacenter rack. “Each cost less than $30,000, with three years’ maintenance included,” says Sapiro. Moreover, Sapiro conservatively estimates power savings of about 70 percent, and probably the same efficiency for cooling. “Datacenter floor space is very expensive. Freeing up this much space is a big deal as well,” Sapiro adds.
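
For readers who want the per-server comparison as numbers, the following is a minimal sketch in Python that works out the ratios from the figures quoted above. The `Server` class and `ratios` function are illustrative names, not anything TIGR used. The prices are the bounds given in the article ("more than $100,000" and "less than $30,000"), so the price ratio is only a lower bound, and raw clock speed is not a measure of real assembly performance across different chip architectures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Per-server specs as quoted in the article (hypothetical container)."""
    name: str
    cpus: int
    clock_ghz: float
    ram_gb: float
    price_bound_usd: float  # a bound from the article, not an exact price


# 1999 HP Alpha (model 4100): four CPUs at 500 MHz, 5 GB RAM, more than $100,000
alpha = Server("HP Alpha 4100", cpus=4, clock_ghz=0.5, ram_gb=5, price_bound_usd=100_000)

# Sun Fire: four CPUs at 2.4 GHz, 32 GB RAM, less than $30,000
sun = Server("Sun Fire", cpus=4, clock_ghz=2.4, ram_gb=32, price_bound_usd=30_000)


def ratios(old: Server, new: Server) -> dict[str, float]:
    """Return new-vs-old ratios; price is old/new, so higher means cheaper."""
    return {
        "clock": new.clock_ghz / old.clock_ghz,
        "ram": new.ram_gb / old.ram_gb,
        "price_at_least": old.price_bound_usd / new.price_bound_usd,
    }


result = ratios(alpha, sun)
for key, value in result.items():
    print(f"{key}: {value:.1f}x")
```

On these figures, each Sun server has 4.8 times the clock speed and 6.4 times the memory of a 1999 Alpha server, at less than a third of the purchase price.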

Since the initial Sun installation, TIGR has added two more Sun servers to the internal grid. “This architecture lends itself to being expanded,” says Sapiro. “As our computational demands grow, there’s no mystery or huge cost problem to resolve it. I just buy another AMD server from Sun, put it in the grid, and it’s up and running in less than a day.”

Sapiro says that TIGR scientists have noticed the improved performance. “Aside from vastly improved reliability, the ability to turn out much greater number of genomes has been increased. By combining high-performance and high-throughput computing into one infrastructure, you can run more assemblies, and we’re able to dig deeper and provide better-quality data.”

Email Kevin Davies at: kevin_davies@bio-itworld.com.



For reprints and/or copyright permission, please contact  Terry Manning, 781.972.1349 , tmanning@healthtech.com.