By Arielle Emmett
Locus Discovery Inc.'s drug discovery predictions are based on a proprietary software algorithm developed at Sarnoff Corp. in the late 1990s by Frank Guarnieri, although many other scientists, including John Kulp, contributed to its implementation. The algorithm employs quantum and statistical mechanics and knowledge of molecular interactions to predict small-molecule, ligand-binding behavior at active protein sites.
Locus says it is applying secondary data on drug "likeness" (biomolecular properties of related drug compounds, including pharmacokinetic, pharmacodynamic, and potency data), which the original Sarnoff program did not include, to ascertain the drug properties of novel compounds. This information will help winnow drug candidates from the myriad possibilities generated by the company's supercomputer.
The Locus algorithm makes trillions of calculations to predict the optimal binding affinity and geometry of various ligands that can exhibit drug action as either agonists or antagonists. The building-block ligands can be relatively simple (alcohols, benzene rings, and water) or highly complex (aromatic compounds that combine to produce novel drug candidates).
Guarnieri says the Sarnoff algorithm is based on four underlying mathematical postulates that describe the interactions of complex proteins and drug molecules, predicting which chemical fragments will bind at higher affinity to a particular site on a protein. "Most of the other [cheminformatics] algorithms in use today are based on a lot of experimental data, [which] are used as a 'training set' to develop an empirical mathematical model that requires a lot of a priori knowledge of drug chemistry," Guarnieri says.
"My goal," he continues, "was to come up with a small number of mathematical postulates [describing] the nature of protein-protein interactions. Although the postulates can't be proven by themselves, you can go through a series of mathematical steps to take you out to prediction [of drug interactions]. So when you input the organic fragments, as well as the 3-D structure of the known protein, the algorithm will automatically compute where the binding sites are and virtually all the chemistry that can interact at that binding site for that given protein structure."
To accomplish this, Locus has harnessed the power of 2,048 individual 1GHz Pentium III boxes linked in parallel, which at peak performance collectively deliver 2.06 teraflops (trillion floating-point operations per second). Locus says its C++ platform, based on the four mathematical postulates, is unique.
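As a rough sanity check on that figure, a cluster's peak throughput can be estimated as the number of processors times the clock rate times the floating-point operations each processor completes per cycle. The sketch below assumes one operation per cycle, which is an illustrative assumption and not a figure from Locus; the function and parameter names are likewise hypothetical.

```python
def peak_teraflops(nodes: int, clock_hz: float, flops_per_cycle: float = 1.0) -> float:
    """Theoretical peak in teraflops: nodes x clock rate x operations per cycle."""
    return nodes * clock_hz * flops_per_cycle / 1e12


# 2,048 processors at 1 GHz, assuming (hypothetically) one flop per cycle each.
estimate = peak_teraflops(nodes=2048, clock_hz=1e9)
print(f"{estimate:.3f} teraflops")
```

Under that assumption the estimate comes to roughly 2 teraflops, the same ballpark as the peak the company quotes.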
Two years ago, as a test case, Guarnieri's team first used the algorithm to predict the structure and binding affinities of two small-molecule erythropoietin (EPO) mimetics targeted to the EPO receptor. The drugs, one an agonist (mimicking EPO's ability to generate red blood cells), the other an antagonist, were successfully synthesized in the laboratory. "[They] worked exactly as predicted," Guarnieri says.
As it turned out, the first batch of compounds was unstable, but the experiment was sufficient to prove the concept. "In the early days I had a milestone and a time limit," Guarnieri says. "So I had to make the simplest molecules that would have the basic properties we were looking for."
In November 2000, Locus and Sarnoff researchers tested whether the algorithm could accurately predict the structures and binding sites of known compounds to known protein disease targets. "We found we could recapitulate the known drug structures within their active binding sites using the Sarnoff algorithm," says Locus Chief Scientific Officer William Moore Jr.
Since then, the algorithm has been expanded to accommodate 150 chemical fragments, creating a massive data crunch, says Matthew Clark, Locus' director of scientific computation. "We now investigate ligands in a molecular weight range from 18 to a couple of hundred," he says, including fragments with scaffolds and linkers, alcohols, aromatic compounds, heterocyclic compounds, and aliphatics. The numbers generated by going to 150 fragments are so huge that the calculations couldn't be done without the supercomputer, Clark says.
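To get a feel for why the size of the fragment library matters, consider a deliberately simplified counting model. It is an assumption for illustration, not a description of how Locus combines fragments: if a candidate were an ordered chain of k fragments drawn with repetition from a library of n, there would be n to the power k possible chains.

```python
def chain_count(library_size: int, chain_length: int) -> int:
    """Ordered chains of `chain_length` fragments, repetition allowed (toy model)."""
    return library_size ** chain_length


# Growth in the number of chains for a 150-fragment library.
for k in range(1, 5):
    print(k, chain_count(150, k))
```

Even this toy model grows from 150 single fragments to more than 500 million four-fragment chains, before any binding geometry is evaluated.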
Seventeen IT specialists run Locus' software modules and its supercomputer, the largest supercluster listed in the Institute of Electrical and Electronics Engineers Task Force on Cluster Computing database. The Linux cluster is reliable, and the software is written to work in parallel so that failure at one node won't cripple an entire calculation. Even so, "our biggest problem is scalability," Clark says. "There are few people who have clusters as large as we have, and there's no book on how to make clusters this large. We get a lot of node failures, especially when you have 2,000 computers running at any one time."
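Clark's point about scale can be illustrated with basic probability. Assume, purely for illustration, that each node fails independently with the same small probability p during a run; the chance that at least one of N nodes fails is then 1 - (1 - p)^N. The 0.1 percent per-node figure below is hypothetical, not a number reported by Locus.

```python
def p_any_failure(nodes: int, p_node: float) -> float:
    """Probability that at least one of `nodes` independent nodes fails."""
    return 1.0 - (1.0 - p_node) ** nodes


# Hypothetical per-node failure probability of 0.1% per run.
for n in (1, 64, 2048):
    print(n, round(p_any_failure(n, 0.001), 3))
```

With those assumed numbers, a single machine almost never fails during a run, while a 2,048-node cluster sees at least one failure about 87 percent of the time, which is why software that tolerates the loss of a node matters at this size.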
Locus' latest challenge is to narrow down rich cheminformatics data to a manageable number of testable compounds. With active programs to develop drugs that target HIV (the gp41 protein), caspases 3 and 8 (enzymes triggering apoptosis, or cell death), and p38 MAP kinase (an enzyme implicated in arthritis and other inflammatory reactions), the company is working on new algorithms to determine drug likeness and to identify molecules that show promise of efficacy.
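The article does not describe how Locus' drug-likeness algorithms work. As a stand-in, the sketch below applies Lipinski's rule of five, a widely used public heuristic for oral drug likeness, to show what winnowing a list of binders by physical properties can look like. The `Compound` class, the candidate names, and all property values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float        # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def looks_drug_like(c: Compound, max_violations: int = 1) -> bool:
    """Common convention: tolerate at most one violation."""
    return rule_of_five_violations(c) <= max_violations


# Hypothetical candidates with made-up property values.
candidates = [
    Compound("hypothetical-A", 320.0, 2.1, 2, 5),
    Compound("hypothetical-B", 740.0, 6.3, 7, 12),
]
shortlist = [c.name for c in candidates if looks_drug_like(c)]
print(shortlist)
```

A filter of this kind is cheap to run over a large list of predicted binders, which is the sort of narrowing step the company says it still needs.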
"In general, we have too much chemistry that will bind," Guarnieri observes. "The major outstanding issue is that we still have to show that these active molecules can be developed into real drugs."