Popular debut for predictive scientific challenge for industry, academic groups.
By Vicki Glaser
May 12, 2008 | SANTA FE, NM - At this year's OpenEye Scientific Software annual customer and user group meeting (CUP IX), the organizers presented the results of the SAMPL (Statistical Assessment of the Modeling of Proteins and Ligands) challenge, a new tournament to allow computational drug developers from industry and academia to compare tools and strategies for drug docking.
"Ongoing, independent, blind tournaments" such as SAMPL "are of great value," said keynote speaker Paul Labute of Chemical Computing Group (CCG) (see sidebar, "Chess Lessons").
Anthony Nicholls, president and CEO of OpenEye, cringes if one calls SAMPL a "competition," or compares it to other tournaments such as CASP (Critical Assessment of Techniques for Protein Structure Prediction). SAMPL is a prospective evaluation, but it differs in that "failure really is an option," he says. "The industry does not need a competition," Nicholls contends. "We already have that - it's called pharma."
SAMPL provides a forum for prospective science, in which participants can test and compare a variety of algorithms and modeling strategies, hone their intuitive skills, and succeed or fail without economic or professional consequences. Key attributes of SAMPL include exposure to never-before-seen structures and datasets and the opportunity to submit multiple solutions to a given problem, enabling comparisons to be made not only between participating teams and their strategies, but also allowing for intra-group comparisons in which the same person or group may apply more than one approach or software product to a single dataset.
In 2007, Peter Guthrie (University of Western Ontario) first challenged Nicholls, offering to provide a set of compounds not seen before for a blind test. With these 17 structures, OpenEye and a group from Vijay Pande's lab at Stanford compared computational methods for predicting small molecule solvation free energies. This exercise, designated SAMPL0, also included the application of simulation tools to predict protein-ligand affinities for eight complexes provided by GlaxoSmithKline.
The goal of SAMPL is not simply to solve a problem; rather, it is to understand the strengths and weaknesses of current tools and identify gaps and inconsistencies in the ability to predict how a ligand binds to a target with properties desirable for drug development.
Let the Games Begin
In November 2007, OpenEye sent invitations to its six main competitors, 16 academic and government labs, and 12 major pharma companies. News of the SAMPL1 challenge also spread by word of mouth, and the challenge was open to anyone interested in participating.
The challenge had three main predictive components: virtual screening/pose prediction, binding affinity, and solvation free energy. Overall, the 54 groups, evenly split between academia and industry, contributed 205 predictions. OpenEye did not participate in SAMPL except to run controls, so it was blinded when evaluating the results. Participants received only their own results and how they compared to the median.
Two sets of protein-ligand binding data for the virtual screening and binding affinity evaluations came from corporate contributors, with Vertex supplying 52 molecular structures for JNK3 kinase and Abbott Labs providing 27 structures for the urokinase target. Participants had access to 20-60 active compounds for each set of protein-ligand data, and all had crystal structures.
Throughout the challenge, OpenEye disclosed additional information about each target, beginning with a set of active and inactive compounds, then a list of the actual actives, followed by a list of poses. Each level of information corresponded to a particular test - virtual screening, followed by pose prediction, and then affinity estimation. Inactive compound collections included 12,000 decoys for JNK and 8,000 for urokinase.
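As an illustration of how a virtual screen against actives and decoys can be scored, the sketch below computes two commonly used measures: the area under the ROC curve and an enrichment factor. The article does not say which statistics OpenEye applied to SAMPL1, so the choice of metrics, the function names, and the toy scores are all assumptions made for this example.

```python
def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count half)."""
    actives = [s for s, is_active in zip(scores, labels) if is_active]
    decoys = [s for s, is_active in zip(scores, labels) if not is_active]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction):
    """Hit rate in the top-scoring fraction divided by the hit rate in the whole set."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(1 for _, is_active in ranked[:n_top] if is_active)
    hits_all = sum(1 for _, is_active in ranked if is_active)
    if hits_all == 0:
        raise ValueError("need at least one active")
    return (hits_top / n_top) / (hits_all / len(ranked))


# Hypothetical docking scores (higher = predicted more likely to bind).
# True marks a known active, False marks a decoy.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [True, False, True, False, False, False]

auc = roc_auc(toy_scores, toy_labels)
ef_half = enrichment_factor(toy_scores, toy_labels, 0.5)
print(f"ROC AUC: {auc:.3f}, enrichment in top 50%: {ef_half:.1f}x")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every active ranked above every decoy. With decoy sets as large as those described here, early enrichment at a small fraction of the ranked list is often reported alongside AUC.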
The third component involved predicting hydration free energies, based on data sets contributed by Peter Guthrie and CCG. Participants received about 60 SMILES strings and attempted to derive three-dimensional coordinates, conformations, tautomers, charge states, and charge distributions. OpenEye provided 3D coordinates and charges on request.
OpenEye has not yet published the results of SAMPL1 (the results of SAMPL0 in 2007 appeared in J Med Chem 2008;51:769-779), but the company provided a sneak preview of the outcomes at CUP IX.
The participants "did very well overall," noted Geoff Skillman, VP research at OpenEye, who was still crunching the data when he presented the preliminary findings. For virtual screening and binding affinity predictions with JNK3, for example, participants generated "a good set of actives, and all looked like kinase inhibitors." For pose prediction he concluded that "the hand docking approach was clearly the best" - better than any of the algorithms - and that leaving the protein target as is was the best strategy.
"There was not a consistent 'winner' among the methods," he said. GLIDE, ROCS, FRED, and GOLD all fared well. "The best in one area could be the worst in another." Skillman did note differences between the submissions of users with varying experience - amateurs versus professionals. Furthermore, whereas it might seem logical that the vendors of modeling and simulation software packages would be more adept at applying their software to solve the SAMPL problems, the results demonstrated otherwise, according to Skillman.
He remarked on the substantial amount of intra-method variance when there were multiple submissions for a particular method; in such cases, success hinged on what parameter set was used. Additionally, Skillman noted the large range in computing time devoted to the evaluations, from a few seconds or minutes of CPU time to upwards of 400,000 CPU hours for affinity prediction. Interestingly, more computing time did not necessarily correlate with better results.
Matt Geballe, a graduate student from Emory University, noted the large range in CPU times for the four methods they used. In general, "the quicker methods did better... some methods got more compounds right, they also got more very wrong." Working with JNK3, the docking programs "performed well with active inhibitors but did not do very well overall with decoy sets."
Martha Head, director of computational chemistry at GlaxoSmithKline, who did particularly well using a hand-docking approach, said: "Knowing one good answer is a huge help. Knowledge of things I have seen in the past largely drives my decisions about docking models, especially for a kinase. The challenge is to make tools that exploit what an experienced modeler already knows."
Stanford's Vijay Pande noted that crystal structures are often not as helpful as they might be, because they are just a snapshot in time and cannot capture conformational changes or allosteric effects. Computational methods that enable reliable predictions "are at least ten years away," Pande said. "The targets [in SAMPL1] were much more challenging than in SAMPL0," he added.
After reviewing the results of the transfer energy component of SAMPL, Skillman concluded that despite the use of "very varied prediction methods, everyone did about the same and had similar problems in similar areas."
Enrico Purisima, from the National Research Council of Canada, concluded that "the model captures the trend, but not the magnitude of the solvation free energies."
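The distinction between capturing a trend and capturing a magnitude can be made concrete with two statistics: a correlation coefficient, which rewards getting the ordering and relative spacing right, and a root-mean-square error, which penalizes any offset or scaling error. The sketch below is only an illustration; the energy values are invented for the example and are not SAMPL data, and the article does not state which statistics the participants reported.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical hydration free energies in kcal/mol (invented for illustration).
experimental = [-1.0, -2.0, -3.0, -4.0]
predicted = [-2.0, -4.0, -6.0, -8.0]  # right ordering, twice the magnitude

r = pearson_r(experimental, predicted)
err = rmse(experimental, predicted)
print(f"Pearson r: {r:.2f}, RMSE: {err:.2f} kcal/mol")
```

In this toy case the predictions are perfectly correlated with the reference values, yet the error is several kcal/mol, which is the kind of gap between trend and magnitude that the quote describes.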
SAMPL Take 2
Planning for SAMPL 2009 sparked a lively debate. Pat Walters of Vertex proposed that OpenEye should strategically select a subset of the data or compounds to disclose to the participants at some interval after the start of the challenge. "This might let people tune their methods." Ajay Jain (UCSF) countered that the goal of SAMPL is "to reward good methods, not good gaming," to test computational methods and not the art of modeling.
"I couldn't disagree more," responded Andy Good of Bristol-Myers Squibb. "That is not how we work. Participants should simply disclose the different strategies they use."
Chess Lessons for Computer Modeling
When a computer first defeated a human in chess in 1958, the excitement suggested that a new era had dawned and computers might actually be smarter than humans (at least when it came to chess). Yet nearly 40 years passed before IBM's Deep Blue was able to defeat Garry Kasparov, in 1997. By 2005, computers were gaining ground on mere mortals, as Hydra, Deep Fritz, and other programs crushed individuals and teams of humans alike.
The more mundane worlds of computational drug design and molecular modeling have seen substantial strides over the years, yet their ability to accurately predict protein-ligand interactions has failed to live up to the hype that heralded their introduction, said Paul Labute of Chemical Computing Group (CCG).
Labute's message echoed many presentations at CUP IX, in which modelers, software developers, and computational chemists from industry and academia tried to balance touting the progress made in predictive modeling and virtual screening against the glaring deficiencies that call into question the utility of these tools - and even past successes. "How do we know [computer models] work?" asked Labute. "The literature contains mostly anecdotal validation, reproducibility is almost non-existent, and crude diagnostics are easily manipulated."
Labute emphasized that, like computer chess, computational life science will take many years to realize its potential. Along the way, industry would benefit from opportunities to make a critical assessment of what is working and what is not, to compare competing methods, and periodically to rethink the way forward. -- V.G.
This article appeared in Bio-IT World Magazine.