By John Russell
January 13, 2003
Listening to CEO Fred Hausheer talk about supercomputers, it's not always clear if BioNumerik Pharmaceuticals Inc. is a drug discovery company or a high-performance computing play. Even its name suggests a mixed identity.
But that's the point, says Hausheer, a biophysicist, Johns Hopkins University-trained physician, and National Cancer Institute clinical investigator. "We're tackling biological questions with numerically intensive tools," he says, eloquently justifying why BioNumerik needs not one but two Cray supercomputers to develop cancer therapeutics.
This is what Hausheer calls "mechanism-based" drug discovery — reduced dependence on traditional modeling techniques (such as Monte Carlo) in favor of calculating individual atom-atom interactions using the basic laws of physics.
"If you use a simulation approach that is distinctive from molecular modeling, that is not just looking at the image, but is computing physical properties like dipole moments or thermodynamic outcomes, you can assign the probability that this [event] could actually occur," he says. Simulations are then quickly tested in the lab and the observations reincorporated into the simulation. This iterative process, Hausheer says, has the potential to speed drug discovery by a third, reduce costs (despite needed computing resources), and ultimately make BioNumerik successful.
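To make the idea of computing atom-atom interactions concrete, here is a minimal Python sketch of a textbook pairwise nonbonded energy: a Coulomb term plus a Lennard-Jones term, summed over every pair of atoms. It is not BioNumerik's proprietary method. The function names, the single shared `epsilon` and `sigma`, and the rounded Coulomb constant are all assumptions chosen for illustration; real force fields assign parameters per atom type.

```python
import math
from itertools import combinations

# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2); rounded, illustrative value.
COULOMB_K = 332.06


def pair_energy(r, q1, q2, epsilon, sigma):
    """Energy of one atom pair at distance r: Coulomb plus Lennard-Jones."""
    coulomb = COULOMB_K * q1 * q2 / r
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    return coulomb + lennard_jones


def total_energy(atoms, epsilon=0.1, sigma=3.4):
    """Sum pair energies over all unique pairs.

    atoms is a list of (position, charge) tuples. Using one epsilon and
    sigma for every pair is a simplification made for this sketch.
    """
    total = 0.0
    for (pos_a, q_a), (pos_b, q_b) in combinations(atoms, 2):
        r = math.dist(pos_a, pos_b)
        total += pair_energy(r, q_a, q_b, epsilon, sigma)
    return total


# Three hypothetical neutral atoms on a line, 3.4 Angstroms apart.
atoms = [((0.0, 0.0, 0.0), 0.0), ((3.4, 0.0, 0.0), 0.0), ((6.8, 0.0, 0.0), 0.0)]
print(total_energy(atoms))
```

The number of pairs grows with the square of the atom count, which is one reason simulations of this kind are so computationally demanding.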
Now, after a decade of work, Hausheer's brand of in silico research shows signs of paying off.
Formed in 1992, BioNumerik has 10 small-molecule drugs under development. One, BNP7787, has been fast-tracked by the FDA and is in Phase III trials (see "Building BNP7787"). Two others, Karenitecin BNP1350 and MDAM, are in earlier stages. Karenitecin was named after Karen Wood, a BioNumerik employee who died of breast cancer.
Bench chemists synthesized the compounds and tested them. Combining additional simulations with spectroscopic measurements, researchers proved their hypothesis and predicted that a group of sulfur compounds could inactivate the platinum.
By relying on true supercomputing, the company has chosen the road less taken among drug discovery firms. Supercomputers aren't cheap, and BioNumerik currently has two Cray SV1s, worth about $1 million apiece and capable of 57 gigaflops (billion floating-point operations per second) of processing power. Add the roughly 150 networked PCs (including many linked to lab instruments), numerous Sun and SGI servers, about 2.5 terabytes of onsite storage, and you have a mega-IT infrastructure for a 50-person company.
BioNumerik could be one of the biggest computer companies you've never heard of.
"I really don't like to do too much talk about what we're doing," Hausheer says, "but I've been encouraged to do this. Cray said this is important work, and you should talk more." No doubt difficult times in the supercomputing market and Cray's desire to raise its profile in life sciences are one reason. The lure of discounted access to Cray's next-generation machines is another. Hausheer won't detail the costs for the IT equipment and support staff, but when asked why more drug discovery firms don't jump on the supercomputer bandwagon, he says simply, "Most people have never had the experience of running on a true supercomputer, and it's also easier to buy a $100,000 or $200,000 system than a $2-million to $20-million system."
Michael Maloney, vice president for systems and operations technology, oversees this computing empire. The IT staff is split into two groups. One group has 10 people, including Maloney, and is focused on basic infrastructure and knowledge management application development. Besides keeping BioNumerik's systems humming, this group has developed a clinical trials and ERP (enterprise resource planning) program that Maloney insists is superior to offerings from Oracle Corp. or Phase Forward Inc.
"We came up with a concept of storing all our clinical data in one database, whether it's Study A or Study Y, so it could help us make better decisions going forward," Maloney says. "There weren't commercial packages that do that, [and] a contract research organization would charge millions of dollars to do that. We felt if we did this ourselves, we could use this as a competitive tool."
Maloney hopes to link the clinical trials application with a homegrown ERP package to track material and manufacturing requirements. "Our clinical trials program allows online real-time randomization. I don't know of anyone else who's doing that," he says.
Creating high-value proprietary software seems to be a core competency. The company has developed more than 100 proprietary algorithms and has roughly 350 patents (some pending), according to Hausheer. "There has been discussion at the board level about licensing these assets to others," he says, but he doesn't favor it.
The other group is the computational biology group, which pairs scientists (chemists, physicists, biologists, and chemical engineers) with programmers, including a seasoned FORTRAN programmer Hausheer lured from Cray. Interestingly, modern FORTRAN, far from the relic programming language of days past, is the language of choice for supercomputers.
BioNumerik also uses commercial software products, such as a modeling package from Accelrys Inc. "The mix is probably 60 percent developed inside the company and 40 percent commercial," says Shijie Yao, a senior scientist and software engineer at BioNumerik. He says FORTRAN, C++, and Java lead the list of languages used by BioNumerik.
Yao has a Ph.D. in biophysics, reflecting the quantitative bent of most BioNumerik researchers. One challenge, he says, is getting up to speed on bioscience. "We have a lot of cross-learning processes here to deal with that. So studying things like cell biology, that's kind of fun for me and also a challenge."
The Crays generally run around the clock. Hausheer is the chief algorithm developer, having cut his supercomputer teeth at the National Cancer Institute in 1986. "When the NCI installed the first supercomputer ever dedicated to biomedical research, I was thinking, 'Wow, maybe we could do something with this,'" he recalls.
Hausheer has since worked on successive generations of Cray machines (XMP, YMP, C90, Trident, and SV1). Not surprisingly, the computational biology group has developed many proprietary applications, including Superfold, used to simulate the enzyme DNA methyltransferase.
"DNA methyltransferase is over 1,600 amino acids long, and it took us two and a half years of real-time vector processing," Hausheer says. "There are actually several of these [enzymes], and we have done them all. We used a true particle physics approach, mimicking the peptide chain growth, amino acid by amino acid."
BioNumerik has been able to synthesize compounds from computer-determined structures and achieve activity six months after the first simulations. "It's not an experimental validation of the structure, but it is an implicit validation," Hausheer says.
What makes BioNumerik distinctive is its multidisciplinary approach and clever exploitation of supercomputers. After all, Linux clusters could provide sheer gigaflop capacity more affordably. Rather, this is about complex vector processing, in which the underlying application code is parallelized and carefully mapped to the computer's architecture.
Consider this example: "When some algorithms for time simulation — for example, molecular dynamics, energy perturbation, energy minimization — are run on a distributed memory system such as a cluster, we have found them to be numerically unstable," Hausheer says.
He says people don't "check for numerical precision and accuracy when they get code and run these on these systems. They're making many approximations ... There's asynchrony among the processors, and when the summations are made of pair-wise nonbonded atoms, there's a problem with the order of operations. [The solution is to] put instructions in the code that would synchronize all the processors for all steps and that would also be updating all of the atoms' pair lists and all of the local memory on each processor.
"Talk about overhead — this would bring any cluster system to its knees. These kinds of problems are best addressed on a shared memory, symmetric multiprocessor system, and they are highly amenable to vector processing" — hence the Cray architectures. The bottom line, Hausheer says, is that supercomputers produce a 20-fold increase in performance and vastly improved reliability when exploiting the Cray vector processing architecture.
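The order-of-operations problem Hausheer describes follows from a basic property of floating-point arithmetic: addition is not associative, so the same terms summed in a different order can give a different total. The Python sketch below shows the effect with three made-up values standing in for partial sums that arrive from different processors. It is an illustration only, not BioNumerik's code, and the helper name `sum_in_order` is hypothetical.

```python
import math


def sum_in_order(terms):
    """Add the terms left to right, as a simple accumulation loop would."""
    total = 0.0
    for term in terms:
        total += term
    return total


# Made-up values: two large contributions that cancel, plus one small one.
terms = [1e16, 1.0, -1e16]

# Order A: the small term is absorbed by the large one and lost.
order_a = sum_in_order([1e16, 1.0, -1e16])

# Order B: the large terms cancel first, so the small term survives.
order_b = sum_in_order([1e16, -1e16, 1.0])

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
reference = math.fsum(terms)

print(order_a, order_b, reference)
```

When processors finish at different times, the order in which their partial sums are combined can change from run to run unless the code forces a fixed order, which is the synchronization cost Hausheer refers to.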
Hausheer is convinced that supercomputing is poised for another major leap forward, with petaflop (quadrillion floating-point operations per second) computing becoming available within a decade. He'll be ready.
BioNumerik's Brutes: (1) Fred Hausheer, CEO and M.D.; (2) Shijie Yao, Ph.D., biophysics; (3) Pavankumar Petluru, Ph.D., chemistry; (4) Michael Maloney, vice president of systems and operations technology.