Hardware options speed up processing, save resources.
By Martin Gollery
January 20, 2010 | Expert Commentary | While computational biologists scramble to develop tools to support the analysis of protein sequence, structure, and interaction data, the hardware side of the equation has been sadly neglected. Sequencers frequently ship with small clusters, which are useful only for the most basic processing tasks. The standard processing system in use today is a cluster of Linux servers, but this has led to a bottleneck of a different type. Research groups are finding it difficult to feed all those CPUs, which are hungry for floor space, air conditioning, power, and maintenance. According to some estimates, computer clusters have a greater carbon impact than all the SUVs on the road today. Clearly, the need to save money and the need to save energy go hand in hand in driving the search for better solutions.
Fortunately, the bio-IT executive now has many alternatives to the old CPU cluster technology. These alternatives provide greater scalability, lower costs, and higher performance per watt of power used than commodity Linux clusters. No single solution will be the answer for all purposes, but here I discuss some of the possibilities.
Field Programmable Gate Arrays (FPGAs)
FPGAs have been used in bioinformatics for over a decade. These reconfigurable processors can provide large speedups over CPUs, but the difficulty of programming them has limited the number of applications available. Several vendors, including CLC bio, Progeniq and TimeLogic, sell preprogrammed bioinformatics FPGA solutions with a variety of applications, throughput levels and price points.
Other vendors have developed tools that promise to make it simpler to port applications to FPGAs. The term ‘Hybrid Computing’ refers to the fact that processing is completed partly on the FPGAs and partly on the server CPUs. The FPGAs can be connected over USB 2.0, over PCI-e, or directly through a plug-in processor socket. This last method yields remarkably high throughput, which can produce large speed improvements over the other connection methods. Companies such as Mitrionics, Impulse, and Convey Computer offer development environments aimed at various FPGA platforms. The openfpga.org project and the Mitrion-C Open Bio Project provide platforms to accelerate key bioinformatics applications. Because the programs are open source, individual researchers can tune the applications or add modules as they please. These automated methods will not produce the greatest speedups, since FPGA design experts still have an edge, but they can free people from electronic circuit design to spend more time on biology.
How much performance is possible with FPGA-based acceleration? Pico Computing recently connected 112 of its commodity devices and demonstrated a 5000x speedup on a dot-plot algorithm. Perhaps most impressive is that this monster system fits in a 4U server chassis and consumes only about 300 watts of power!
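A dot plot suits this kind of hardware because every cell of the comparison matrix can be computed independently of the others. The following is a minimal Python sketch, assuming NumPy and a simple exact-match definition of a dot plot; the function name dot_plot is made up for illustration, and the Pico Computing implementation is not described here and may well differ.

```python
import numpy as np

def dot_plot(seq_a: str, seq_b: str) -> np.ndarray:
    """Return a boolean matrix whose cell (i, j) is True when seq_a[i] == seq_b[j].

    Illustrative only: real dot-plot tools usually compare windows of
    residues and apply a score threshold rather than single characters.
    """
    a = np.frombuffer(seq_a.encode("ascii"), dtype=np.uint8)
    b = np.frombuffer(seq_b.encode("ascii"), dtype=np.uint8)
    # Broadcasting compares every residue of seq_a with every residue of
    # seq_b at once; no cell depends on any other cell.
    return a[:, None] == b[None, :]

plot = dot_plot("GATTACA", "ATTAC")
```

Shared subsequences show up as diagonal runs of True values in the matrix, and the independence of the cells is what lets the work be spread across many parallel units.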
General Purpose Graphics Processing Units (GPGPUs)
Graphics Processing Units are powerful processors that have been tuned for high-throughput graphics. Programming languages such as CUDA and OpenCL allow the use of these processors for general purpose computing. In the bioinformatics field, a port of the MUMmer sequence alignment tool showed more than a 3X speedup over a CPU. Several other bioinformatics projects are ongoing.
One benefit of GPGPU computing is the low barrier to entry. Development can be done on an inexpensive graphics card in any computer or server. If greater power is needed, the code can be moved to a high-end card or an Nvidia Tesla server. The Tesla is a system designed with high-throughput computing in mind. Some of the Tesla servers have a reported capacity of 4 teraflops in a single system.
GPUs have several well-known limitations in comparison to CPUs. The lack of ECC memory and relatively poor double-precision performance are chief among these. The Fermi chip, due to arrive from Nvidia in the spring of 2010, is intended to address both of these issues.
Multiple groups help promote the use of GPGPUs in the sciences. The Prometheus Alliance (http://www.prometheusalliance.org/) is bringing together members from industry and academia to develop a purpose-built platform for bioinformatics. The gpucomputing.net website (http://gpucomputing.net/) is a collaboration between eight CUDA Centers of Excellence, and has a number of communities for different branches of research.
Cell Processor
IBM, Sony and Toshiba developed the Cell processor, which pairs a main processor with eight vector coprocessors (six of which are available to applications on the Sony PlayStation 3). The resulting speed has been measured as several times faster than Opteron or Xeon CPUs. Mercury Computer Systems has built a wide range of Cell-based hardware for high-throughput computing, including systems with dual Cell processors, blade computers and other servers. Yellow Dog Linux is developed and sold for the Sony PlayStation 3 by Fixstars.
The first supercomputer to reach a speed rating over 1 petaflop, the IBM Roadrunner, is made with 12,960 Cell processors and 6,480 Opteron processors, proving the value of this architecture for high-throughput computing. In the bioinformatics world, the hmmsearch algorithm has shown speedups of 100X over a standard CPU. Not bad for a $300 game machine.
ClearSpeed Coprocessors
The ClearSpeed CSX700 coprocessor contains 192 Processing Elements yielding 96 GFLOPS while drawing only 12 watts of power. The ClearSpeed coprocessors maintain full speed even at 64-bit precision and tend to benefit algorithms that are less suitable for FPGA acceleration. While some claim that it is difficult to port software code to the ClearSpeed boards, MATLAB and Mathematica code can run with no modifications.
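The performance-per-watt figure implied by those numbers is simple arithmetic. As a small sketch in Python (the function name gflops_per_watt is invented for illustration; the 96 GFLOPS and 12 watt inputs are the figures quoted above):

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Throughput per watt: peak GFLOPS divided by power draw in watts."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return gflops / watts

# Figures quoted in the text for the ClearSpeed CSX700.
csx700_efficiency = gflops_per_watt(96, 12)
print(f"CSX700: {csx700_efficiency:.1f} GFLOPS per watt")
```

The same function can be applied to any accelerator for which both a throughput figure and a power figure are known, which makes competing options directly comparable on this one axis.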
Single Instruction Multiple Data (SIMD)
Even conventional clusters have power that is not utilized. The SIMD capabilities of modern processors offer a built-in accelerator for each CPU. With SIMD, the Smith-Waterman algorithm has been accelerated hundreds of times over, until it nearly reaches the same speed as BLAST. Others have sped up hmmsearch by 30X with the same techniques. Clearly, accelerating additional algorithms in this way could have a great impact on cluster loads around the world. If your server farm seems oversubscribed, perhaps you are simply not running it to its full potential.
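To give a feel for the data parallelism that SIMD exploits in Smith-Waterman, here is a minimal Python sketch that uses NumPy array operations as a stand-in for vector instructions. It assumes a linear gap penalty and made-up scoring values, and it fills the matrix one anti-diagonal at a time, which is only one of several possible vectorization schemes. Production SIMD implementations work with processor vector instructions directly and typically lay out the data differently, so this should be read as an illustration rather than as any published method.

```python
import numpy as np

def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty.

    The scoring values are illustrative defaults, not taken from any tool.
    """
    x = np.frombuffer(a.encode("ascii"), dtype=np.uint8)
    y = np.frombuffer(b.encode("ascii"), dtype=np.uint8)
    n, m = len(x), len(y)
    H = np.zeros((n + 1, m + 1), dtype=np.int32)

    # Cells with i + j == d depend only on the two previous anti-diagonals,
    # so all cells on one anti-diagonal can be updated in a single
    # vector operation.
    for d in range(2, n + m + 1):
        i = np.arange(max(1, d - m), min(n, d - 1) + 1)
        j = d - i
        sub = np.where(x[i - 1] == y[j - 1], match, mismatch)
        diag = H[i - 1, j - 1] + sub   # align the two residues
        up = H[i - 1, j] + gap         # gap in b
        left = H[i, j - 1] + gap       # gap in a
        H[i, j] = np.maximum(0, np.maximum(diag, np.maximum(up, left)))

    return int(H.max())

score = smith_waterman_score("GATTACA", "ATTAC")
```

The loop runs once per anti-diagonal instead of once per cell, which is the same restructuring that lets a vector unit process several cells per instruction.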
Cloud Computing
The hype surrounding Cloud Computing (see “The C Word,” Bio•IT World, Nov 2009) easily equals that of the grid technologies a few years ago. So far, no other provider has as many users as Amazon's EC2 cloud. Many common biological databases are already uploaded onto Amazon, making it simpler and less expensive to use this service. The entire BioLinux suite of tools is available as an image, which means that users can get up and running quickly.
The average compute cluster is woefully under-subscribed most of the time, and yet always seems to be booked solid when you need it most. The ability to purchase computing time on an ‘as-needed’ basis is an extremely valuable benefit, especially if the job can be divided cleanly into independent pieces. One hundred servers used for one hour will cost the same as one server for one hundred hours. When a deadline comes around, that flexibility matters.
How much does it cost to use a cloud service for real analysis? Salzberg’s team used Crossbow to analyze data comprising 38-fold coverage of the human genome in three hours, on a 320-CPU cluster rented from a cloud computing service for about $85. While this is obviously less expensive than owning and maintaining a cluster of this size for occasional use, fans of GPGPU computing will point out that a graphics card with 320 cores costs about as much, and once it is bought the system is paid for.
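The rental arithmetic can be made concrete with a short Python sketch. It assumes flat per-server-hour pricing with no minimum charges, and it assumes all 320 CPUs in the Crossbow run were billed for the full three hours; the function name and the $0.10 rate are hypothetical and are not taken from any provider's price list.

```python
def rental_cost(servers: int, hours: float, rate_per_server_hour: float) -> float:
    """Cost of renting `servers` machines for `hours` at a flat hourly rate."""
    return servers * hours * rate_per_server_hour

# Hypothetical rate, chosen only to show that width and duration trade off evenly.
rate = 0.10
wide = rental_cost(100, 1, rate)     # one hundred servers for one hour
narrow = rental_cost(1, 100, rate)   # one server for one hundred hours

# Figures quoted in the text for the Crossbow run.
cpu_hours = 320 * 3
cost_per_cpu_hour = 85 / cpu_hours
print(f"{cpu_hours} CPU-hours at about ${cost_per_cpu_hour:.3f} each")
```

Under these assumptions the Crossbow run works out to roughly nine cents per CPU-hour, a useful baseline when weighing rented capacity against purchased hardware.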
“Supercomputers can no longer focus only on raw performance. The era of simply adding more processors is coming to a close,” says David Turek, vice president, deep computing, IBM. “Clients need to be able to run supercomputers anywhere, not only places that have cheap power.”
The choice of technology must be based on the throughput per watt consumed, the throughput per dollar spent and the availability of the algorithms necessary for the research objectives of the lab. A multi-hybrid computer might someday use GPGPUs for floating-point image processing, FPGAs for integer-based analysis such as HMMs, and CPUs for the final output formatting.
Through the appropriate selection of acceleration technology, the demanding job of keeping up with the analysis of data from next-generation sequencers can be accomplished at a reasonable cost and without requiring an enormous server room or a dedicated power plant.
Martin Gollery is senior bioinformatics scientist at Tahoe Informatics. He can be reached at firstname.lastname@example.org.
This article also appeared in the January-February 2010 issue of Bio-IT World Magazine.