By Salvatore Salamone
July 11, 2002 | Sun Microsystems Inc., TimeLogic Corp., and the San Diego Supercomputer Center (SDSC) announced in May the completion of a proof-of-concept benchmark test that shows off the power of bioinformatics hardware accelerators.
In creating the benchmark, the group demonstrated the processing power that a Sun-TimeLogic hardware combination can deliver to bioinformatics analysis. The test results also highlighted the often hard-to-define cost-efficiency issues of bioinformatics computing, whether the system being evaluated is a specialized one like the Sun-TimeLogic hardware or the more familiar Linux cluster.
The use of bioinformatics acceleration hardware is just beginning to take hold. TimeLogic touts recent customers that include Affymetrix Inc., Bristol-Myers Squibb Co., and Incyte Genomics Inc. Most recently, Syngenta AG used TimeLogic DeCypher accelerators to run a search algorithm called FrameSearch; the output of those calculations helped identify genes within the rice genome.
“Bioinformatics accelerators offer an interesting alternative [to Linux clusters],” says Norm Davidson, an independent pharmaceutical industry consultant who previously worked at a Phoenix pharmaceutical research company. “For some analysis routines, the accelerators could reduce the time to perform the calculations, sometimes doing a run in about one-tenth the time compared to doing a similar analysis without the dedicated hardware.”
Others have found similar improvements when using the DeCypher accelerators. For instance, the Ohio Supercomputer Center, a state effort to provide a shared computing facility for Ohio’s state universities and private companies, reports that its Sun 6800 computer runs bioinformatics algorithms like BLAST, FrameShift, and ClustalW between one and three orders of magnitude faster when using DeCypher.
The benchmark test created by Sun, TimeLogic, and the University of California at San Diego’s SDSC used a Sun Fire 6800 server with eight UltraSPARC III Cu central processing units (CPUs). The server contained a TimeLogic DeCypher XD-4G FPGA accelerator board that runs common sequence analysis programs like BLAST and HMMER on the board’s dedicated hardware.
The test drew on the National Center for Biotechnology Information’s NR75 protein database of 372,119 sequences as the query set and the SDSC’s Hidden Markov Model database of 19,192 model sequences as the search target. The analysis sought to determine which proteins and targets matched up, a common process carried out by pharmaceutical and genomic research companies.
The DeCypher-accelerated Sun Fire machine completed the analysis in 41 hours and 46 minutes, or about one-tenth the time it would have taken without the accelerator.
To put these results into perspective, the group ran the same tests on a Linux cluster of 32 PCs, each with a 1 GHz Pentium III CPU. A cluster of this size delivers about 27 gigaflops of processing power, according to a calculator on the Web site of distributed computing software vendor Entropia Inc.
Using the full capacity of the Linux cluster, the analysis required 144 straight days of uninterrupted processing time, according to SDSC researchers, who use Linux clusters at their facility.
That 144-day run time was about 82 times as long as the Sun-TimeLogic hardware’s run. Put another way, and assuming performance scales linearly with the number of machines, researchers would need a Linux cluster of about 2,600 PCs to do the same analysis in the same time.
The researchers noted that the specific analysis done by both the Sun-TimeLogic hardware and the Linux cluster was looking for what they called deep statistical relevance, so the analysis is much more calculation-intensive than, say, a quick BLAST search. A less stringent analysis would still have taken about 20 times longer on the 32-PC Linux cluster, meaning a cluster of about 640 PCs would be needed to match the Sun-TimeLogic time, according to the benchmark group.
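The cluster-size figures above follow from simple arithmetic, sketched here in Python. The run times and PC count are the ones quoted in the article; the function names are illustrative, and the calculation assumes run time shrinks in direct proportion to the number of PCs, which real clusters rarely achieve.

```python
def speedup(cluster_days, accel_hours, accel_minutes):
    """Ratio of the cluster run time to the accelerated run time."""
    cluster_hours = cluster_days * 24
    accel_total_hours = accel_hours + accel_minutes / 60
    return cluster_hours / accel_total_hours


def equivalent_pcs(ratio, cluster_pcs):
    """PCs needed to match the accelerated time.

    Assumption: perfect linear scaling with cluster size.
    """
    return ratio * cluster_pcs


# Figures quoted in the article
ratio = speedup(cluster_days=144, accel_hours=41, accel_minutes=46)
stringent_pcs = equivalent_pcs(ratio, cluster_pcs=32)
less_stringent_pcs = equivalent_pcs(20, cluster_pcs=32)

print(f"Cluster run time is about {ratio:.1f} times the accelerated run")
print(f"PCs needed for the stringent analysis: about {stringent_pcs:.0f}")
print(f"PCs needed for the less stringent analysis: {less_stringent_pcs}")
```

The unrounded ratio comes out slightly above the article's rounded figure of 82, which is why the PC count lands a little above 2,600. The less stringent case is exactly 20 times 32, or 640.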
Are Low Initial Costs Offset?
Whether the analysis requires 640 or 2,600 PCs, a major point of the benchmark test is cost.
One reason Linux clusters are popular among bioinformatics researchers is the low cost of setting one up. The individual PCs within a Linux cluster are typically inexpensive, Linux itself is free, and there are several open-source distributed computing programs that let managers aggregate the processing power of the individual machines into a clustered system.
However, the researchers behind the Sun-TimeLogic test say that these initial Linux cluster costs can ramp up quickly, especially as more computers are added and long-term operating costs expand. “Server farms play an important role in life science,” said Sia Zadeh, Sun’s group manager for life sciences, in a statement. “Yet server farm operators face daunting administrative workload, scaling and load balancing challenges, and per-CPU licensing costs as farms scale to huge sizes.”
The group contends there are other long-term operational costs that companies should consider when evaluating Linux clusters, such as the office space to house the large number of computers, battery backup, cooling, equipment racks, networking equipment, and operating system license upgrade costs.
But a true cost comparison for this benchmark test is difficult. Because Sun systems are usually highly customized, pricing can be hard to peg and covers a wide range: experts say costs can start at $125,000 and run as high as $1 million. Neither Sun nor TimeLogic would divulge the cost of the hardware used in their benchmark test.
Arguing that one computing approach is better than another on total cost of ownership is tricky, since there are not many quantitative models a manager can use. But some work does seem to support the benchmark group’s assertions.
“Over the last five or 10 years there have been a number of studies done by the large consultancies like the Gartner Group and IDC that have shown that for networked products, the initial equipment price is only 15 to 20 percent of the total cost of ownership of a product over its three- to five-year lifetime,” says Raymond Lopez, an independent networking consultant. “The other 80 to 85 percent of the cost of owning a product goes toward things like the staff time required to manage and maintain the equipment.”
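To see what that rule of thumb implies, the short Python sketch below backs out a lifetime cost from a purchase price. The 15 and 20 percent shares come from the quote above; the $100,000 purchase price and the function name are hypothetical, chosen only for illustration, and do not describe any system in the benchmark.

```python
def total_cost_of_ownership(purchase_price, purchase_share):
    """Estimate lifetime cost when the purchase price is a known share of it."""
    if not 0 < purchase_share <= 1:
        raise ValueError("purchase_share must be in the range (0, 1]")
    return purchase_price / purchase_share


# Hypothetical purchase price, not a figure from the article
purchase_price = 100_000

# Shares quoted above: purchase price is 15 to 20 percent of lifetime cost
tco_low = total_cost_of_ownership(purchase_price, 0.20)
tco_high = total_cost_of_ownership(purchase_price, 0.15)

print(f"Estimated lifetime cost: ${tco_low:,.0f} to ${tco_high:,.0f}")
print(f"Of which operating costs: ${tco_low - purchase_price:,.0f} "
      f"to ${tco_high - purchase_price:,.0f}")
```

Under these assumptions, every dollar spent on equipment implies roughly four to six more dollars in management and maintenance over the product's lifetime.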