TIGR Unplugs HP, Switches to Sun for Genome Assembly



The Institute for Genomic Research (TIGR), one of the most prestigious genome research centers in the United States, has hauled out its long-serving HP Alpha servers and switched to a new architecture from Sun Microsystems.

Two months ago, TIGR staff finally removed the 15 historic HP Alpha servers from the non-profit in Rockville, Maryland. It was a “happy event,” TIGR’s director of IT, Vadim Sapiro, told Bio-IT World. “There was something really finite about that.”

The replacements—three Sun Fire servers running the 64-bit Linux OS—had been running alongside the Alpha servers for some time before the HP servers were finally switched off six months ago. These servers are used to power TIGR’s complex genomic assembler for sequence research. In addition to significant economies of size, efficiency, energy utilization and maintenance, Sapiro says the new system’s increased performance is boosting research speed and quality.

TIGR was founded by J. Craig Venter in 1992, and claimed many landmarks in its early years, including the compilation of hundreds of thousands of expressed sequence tags (ESTs) and the first microbial genome sequence in 1995. The shotgun sequencing approach pioneered in that assembly propelled Venter to launch Celera Genomics in 1998.

In the meantime, TIGR has become best known for the sequencing, assembly, and analysis of countless microbial genomes, which currently require 80 terabytes of data storage. Since 1999, that process had relied upon HP’s Alpha clusters. But as those clusters grew less reliable and economical to operate, especially in terms of services support and cooling requirements, Sapiro and colleagues sought other solutions.

Sapiro is in his second stint at TIGR, having started in 1994 as a “classic nerdy” UNIX systems administrator. Since 1999, he’s been in charge of managing the IT department of about 20 staff, supporting over 300 employees.

HP Dependency
The dependency on HP’s Alpha servers traces back to the work of Gene Myers and Granger Sutton at Celera Genomics on the Celera Assembler pipeline. Those efforts were performed on the proprietary Alpha platform. “TIGR was allowed to use the software,” says Sapiro. “Our computational needs were increasing, so we started using the software, but had to build the Alpha architecture.”

Sapiro says the HP architecture “served us well for the first three to four years, but after that, we realized that the HP infrastructure is very expensive to buy and operate. Plus, refreshing the technology and replacing the servers was a very expensive proposition.”

Aside from operating cost, sluggishness and reliability, another motivation to change was sharing. As a non-profit, Sapiro says TIGR’s mantra is that “everything belongs back to the public—not just the data, but distributing the tools. That [HP] proprietary platform was not an exportable tool.”

Sapiro wanted to migrate to an open-source standard. “When we were ready to take the project, we needed to identify infrastructure (64-bit chips) and have open-source. AMD Opteron was somewhat of a logical choice.” Sapiro says they looked at IBM servers, but decided against migrating from one proprietary platform to another.

Given the specs—32 GB RAM servers with AMD architecture—Sapiro says that Sun was a logical choice, especially given its relationship with AMD: “We ran a proof of concept on a Sun-AMD server. We looked at other AMD-based servers, but we enjoy a very special relationship with Sun. Sun treats non-profits like they would a university campus.”

Sapiro also rates Sun’s support organization highly. “Engineering support is quite good,” he says. “We needed that as we were embarking on something a little scary!” The engineering team was able to deal with numerous technical issues, including overheating CPUs, during the proof-of-concept trials, which began in 2004. Once the first set of three Sun servers was installed, the code porting was completed. “But for the next 18 months, we had the two systems running side by side, working out kinks and bugs,” says Sapiro.

About six months ago, the HP cluster was unplugged. “Give all the kudos to Alpha—it did serve us for six years and is responsible for many scientific breakthroughs,” says Sapiro.

Compare and Contrast
Sapiro says the three Sun servers were “orders of magnitude” cheaper to acquire, “and a lot cheaper to operate.”

Contrast TIGR’s first Alpha servers from 1999 with the new Sun system. The HP servers (model 4100), which cost more than $100,000, had 4 CPUs at 500 MHz, 5 GB RAM, and took up half a normal datacenter rack. Moreover, being a proprietary HP platform, “you needed IT people trained in the intricacies of that [Unix-based] operating system,” says Sapiro.

By contrast, each of the three new Linux-running Sun Fire servers has 4 CPUs at 2.4 GHz, 32 GB RAM, and occupies three rack units in a normal datacenter rack. “Each cost less than $30,000, with three years’ maintenance included,” says Sapiro. Moreover, Sapiro conservatively estimates power savings of about 70 percent, and probably the same efficiencies for cooling. “Datacenter floor space is very expensive. Freeing up this much space is a big deal as well,” Sapiro adds.
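For readers who want the per-server numbers side by side, the short Python sketch below works out the ratios implied by the figures quoted above. It is an illustration only, built on assumptions. The article does not say whether the “more than $100,000” Alpha price is per server, so treating it that way is a guess. The prices are bounds (“more than” and “less than”), so the cost ratio is a floor rather than an exact value. Clock-speed ratios across different chip architectures are not a measure of real assembly performance. The names ALPHA_4100, SUN_FIRE and spec_ratios are invented for this example.

```python
# Per-server figures as quoted in the article.
# ASSUMPTION: the Alpha price is treated as a per-server cost; the article is ambiguous.
ALPHA_4100 = {"cost_usd": 100_000, "cpus": 4, "clock_mhz": 500, "ram_gb": 5}
SUN_FIRE = {"cost_usd": 30_000, "cpus": 4, "clock_mhz": 2_400, "ram_gb": 32}


def spec_ratios(old, new):
    """Return simple old-vs-new ratios computed from the quoted specs."""
    return {
        # Quoted prices are "more than" (old) and "less than" (new),
        # so this value is a lower bound on the real ratio.
        "min_cost_ratio": old["cost_usd"] / new["cost_usd"],
        # Raw clock ratio per CPU; NOT a benchmark of real throughput.
        "clock_ratio": new["clock_mhz"] / old["clock_mhz"],
        # Memory per server.
        "ram_ratio": new["ram_gb"] / old["ram_gb"],
    }


if __name__ == "__main__":
    for name, value in spec_ratios(ALPHA_4100, SUN_FIRE).items():
        print(f"{name}: {value:.1f}x")
```

On these assumptions, each Sun server costs at most roughly a third of an Alpha while offering several times the memory and clock rate. Operating savings such as power, cooling and floor space would come on top of that and are not modeled here.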

Since the initial Sun installation, TIGR has added two more Sun servers to the internal grid. “This architecture lends itself to being expanded,” says Sapiro. “As our computational demands grow, there’s no mystery or huge cost problem to resolve it. I just buy another AMD server from Sun, put it in the grid, and it’s up and running in less than a day.”

Sapiro says that TIGR scientists have noticed the improved performance. “Aside from vastly improved reliability, the ability to turn out a much greater number of genomes has been increased. By combining high-performance and high-throughput computing into one infrastructure, you can run more assemblies, and we’re able to dig deeper and provide better quality data,” he says.
