By Salvatore Salamone
July 14, 2004 | Memorial Sloan-Kettering Cancer Center (MSKCC), for example, settled on an SGI Altix 3000 system to accelerate research on modeling and simulating genes in cancer pathways. “Our focus is to provide bridges between basic and clinical research,” says Chris Sander, head of the Computational Biology Center at MSKCC. “Computational technology of this type is a key enabling factor.” (For some background on the goals of MSKCC’s Computational Biology Center, see Made in Manhattan, Feb. 2003 Bio-IT World, page 28.)
MSKCC’s new Linux-based Altix definitely packs plenty of computational punch. It comprises 12 Intel Itanium 2 64-bit processors and has 48 gigabytes of system memory. And that’s the point.
Moving Up
Today, there is a growing need to install systems that can support a mix of applications that have widely varying processing, data handling (input/output), and memory needs. Whereas two or three years ago the dominant algorithms dealt with sequence matching and alignment, now researchers run floating-point-intensive applications such as structural modeling and molecular interaction simulations.
“Years ago, if you were given a choice between an abacus or a slide rule, you’d pick the abacus for addition and subtraction (integer math) and the slide rule when doing multiplication and division (floating-point math),” says Arthur Conte, an independent IT consultant who specializes in scientific computing. Increasingly, researchers need both, he says.
In the past, companies frequently picked systems to support specific applications. It is still quite common to see a mix of Linux clusters running things like BLAST, HMMer, and FASTA, and proprietary 64-bit computing systems running simulations and modeling algorithms. But few companies do only one kind of computation any longer, and that’s spawning the growth of mixed high-performance systems.
Celera is the poster child for computational change. When its massive sequencing business slowed and the company’s business model morphed to drug discovery and diagnostics development, Celera was forced to change its computational infrastructure. (For more details about Celera’s choice of systems, see: Buying Power, July 2003 Bio-IT World, page 32.)
“In the genomic era, one could specify a small number of algorithms needed to characterize the majority of computational load; general drug discovery platforms must [now] embrace a far more diverse universe of applications,” says John Reynders, vice president of informatics at Celera.
“[Today], it is difficult to predict the job mix that we will have from quarter to quarter. The computational load is driven by projects that evolve over time and have varying CPU/memory/IO mix and job scheduling requirements,” Reynders says. “So, one is driven to general scientific computing platforms that are able to sustain a broad mix of applications.”
Reynders and others are quick to note that cost remains a factor when selecting a system. And “although one might consider a menagerie of architectures each suited to particular applications domains, the [total cost of ownership] would be prohibitive -- especially considering the diverse skill sets required to care and feed each beast,” he says.
For that reason, many life science companies are standardizing on a single architecture (e.g., Linux-based systems) with the combination of processing power, memory, storage capacity, and data handling capabilities needed to take on the mix of life science applications being run today.