SimBioSys with IBM, Sony to speed drug discovery.
By Mike May
July 14, 2008 | Last year, SimBioSys in Toronto, Canada, hit a software snag. Although its eHITS software produced great 3-D ligand structures, such as a potential drug docking with a receptor, the program needed to run faster. “A pharmaceutical company conducted a test comparing results from eHITS and other available packages,” recalls Zsolt Zsoldos, chief scientific and technology officer at SimBioSys. “eHITS provided the best docking accuracy and enrichment, but it was slow—too slow to purchase.” So Zsoldos and his colleagues set out in search of speed.
They tried field-programmable gate arrays (FPGAs) and graphics processing units (GPUs), but rejected both. “For the FPGAs, it would have required a lot of work to bring the complex algorithm down to the hardware-switch level. Also, practical throughput is much lower than the theoretical value on GPUs, because programming does not allow direct control of memory transfers and vector operations, and the performance gains are less than optimal,” explains Zsoldos.
Undeterred, they tried the Cell Broadband Engine, which was developed jointly by IBM, Sony, and Toshiba. Although this processor powers Sony’s PlayStation 3, the Cell goes beyond games.
The Cell relies on two layers of hardware working in parallel. The Power Processor Element, similar to an IBM PowerPC 970, makes up one layer. The other consists of eight Synergistic Processing Elements, each of which contains a 128-bit vector processor with a single-instruction, multiple-data architecture, a direct memory access controller, and 256 kilobytes of on-chip memory. In essence, this creates nine CPUs, all connected through an element interconnect bus with a bandwidth of 300 gigabytes per second. In all, the Cell delivers sustained performance of 210 gigaflops (billion floating-point operations per second), 70 times faster than an Intel dual-core 2.4 GHz processor.
Reorganizing the Algorithm
Beyond speed, other features enticed SimBioSys. For one thing, notes SimBioSys consultant Antony Williams, “There are multiple GPUs and FPGAs on the market, but the future of the Cell processor is stabilized by IBM’s multiyear development plan.” Zsoldos adds that pharmaceutical companies want to connect an accelerator to a server, and the IBM server blade, which includes two Cells, is ideal for that. Moreover, that piece of hardware pumps out 400 gigaflops.
Zsoldos and his colleagues can port their algorithms to the Cell by simply recompiling the C or C++ code. “That’s the easy part,” says Zsoldos, “but that doesn’t really get very high acceleration.” The speed comes from reorganizing an algorithm, essentially modifying data structures to run in parallel on the multiple processors, and doing so with the limited on-chip memory. “Traditional techniques for speed up include branching, and that is not good for the Cell,” says Zsoldos, “because there is no branching-prediction hardware on these processors.”
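Zsoldos's point about branching can be illustrated with a minimal sketch in plain C. This is not eHITS code: the scoring term, the function names, and the cutoff parameter are all hypothetical, chosen only to show the general idea of replacing a data-dependent branch with arithmetic that every loop iteration executes the same way. On the Cell itself this kind of rewrite would normally use the vector compare-and-select operations of the processing elements; the scalar version below only assumes a standard C compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scoring term (an assumption for illustration, not eHITS):
 * add up the weights of all atom pairs that lie closer than a cutoff. */

/* Conventional version: skips work with a data-dependent branch. */
float score_branchy(const float *dist, const float *weight,
                    size_t n, float cutoff)
{
    assert(n == 0 || (dist && weight));
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (dist[i] < cutoff)
            sum += weight[i];
    }
    return sum;
}

/* Branch-free version: the comparison evaluates to 0 or 1 in C, and
 * that value is used as a multiplier, so every iteration runs the
 * same instructions regardless of the data. */
float score_branchless(const float *dist, const float *weight,
                       size_t n, float cutoff)
{
    assert(n == 0 || (dist && weight));
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float keep = (float)(dist[i] < cutoff);
        sum += keep * weight[i];
    }
    return sum;
}
```

For finite inputs both functions return the same total. The second form does a little extra arithmetic on pairs outside the cutoff, a trade that tends to pay off on hardware without branch prediction, although a given compiler may already make a similar transformation on its own.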
To speed up eHITS, Zsoldos and colleagues optimized about 10 percent of the total code, which, he says, “was responsible for over 98 percent of the run-time.” On the IBM QS20 blade, which runs at 3.2 GHz, eHITS Lightning (the new version) ran up to 117 times faster than the original eHITS.
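The two figures in that paragraph can be related with Amdahl's law, a standard rule of thumb that the article itself does not invoke. The sketch below assumes that the 117-fold figure is a whole-program speedup and that the unoptimized code runs no faster than before; neither assumption is confirmed by the source, and the function names are illustrative.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the original
 * run-time is accelerated by a factor s and the rest is unchanged. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound on the overall speedup as s grows without limit. */
double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}

/* Smallest accelerated fraction for which a given overall speedup
 * is possible at all. */
double min_fraction_for(double overall)
{
    assert(overall >= 1.0);
    return 1.0 - 1.0 / overall;
}
```

Under these assumptions, `amdahl_limit(0.98)` comes to about 50, so a hot spot of exactly 98 percent could never yield a 117-fold gain. `min_fraction_for(117.0)` is about 0.9915, which suggests that in the best-case runs the optimized code accounted for at least roughly 99.1 percent of the original run-time, consistent with Zsoldos's “over 98 percent.”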
In developing eHITS Lightning, the SimBioSys programmers also built an extensive toolkit. These pieces can be used to handle steps in future programs that could be developed to run on a Cell system. Zsoldos adds, “There’s a learning curve, but once you’ve learned the tricks, you know what to watch out for the second time.”
The next Cell-optimized package from SimBioSys could be ARChem, a route designer that automates retrosynthetic chemistry. “This provides retrosynthetic analysis for the best route of synthesis on a molecule,” says Williams. “That can be very time consuming, and we could speed that up by a few factors.”
Despite the Cell’s blazing features, it might not always be the best choice of accelerating hardware. According to Stanford structural biologist Vijay Pande, the best tool depends on the job. “If one does not need floating-point calculations, and especially if one only needs bit operations, then FPGAs are very appealing,” he says. For work that needs lots of flops, he says, a GPU or the Cell works well, depending on the exact problem at hand.
“It’s like any case with different tools: You wouldn’t use a jigsaw to cut down a big tree or a chainsaw to make a fine piece of furniture.”
When it came to docking simulations, though, the Cell turned out to be just the right tool for SimBioSys.
This article appeared in Bio-IT World Magazine.