OpenEye Views SAMPL as Molecular Modeling Stimulus



Popular debut for predictive scientific challenge for industry, academic groups.

By Vicki Glaser

May 12, 2008 | SANTA FE, NM - At this year's OpenEye Scientific Software annual customer and user group meeting (CUP IX), the organizers presented the results of the SAMPL (Statistical Assessment of the Modeling of Proteins and Ligands) challenge, a new tournament that allows computational drug developers from industry and academia to compare tools and strategies for drug docking.

"Ongoing, independent, blind tournaments," such as SAMPL "are of great value," said keynote speaker Paul Labute of Chemical Computing Group (CCG) (See sidebar, "Chess Lessons").

Anthony Nicholls, president and CEO of OpenEye, cringes if one calls SAMPL a "competition," or compares it to other tournaments such as CASP (Critical Assessment of Techniques for Protein Structure Prediction). Like CASP, SAMPL is a prospective evaluation, but it differs in that "failure really is an option," he says. "The industry does not need a competition," Nicholls contends. "We already have that: it's called pharma."

SAMPL provides a forum for prospective science, in which participants can test and compare a variety of algorithms and modeling strategies, hone their intuitive skills, and succeed or fail without economic or professional consequences. Key attributes of SAMPL include exposure to never-before-seen structures and datasets and the opportunity to submit multiple solutions to a given problem. Multiple submissions allow comparisons not only between participating teams and their strategies but also within a group, when the same person or group applies more than one approach or software product to a single dataset.

In 2007, Peter Guthrie (University of Western Ontario) first challenged Nicholls, offering to provide a set of compounds not seen before for a blind test. With these 17 structures, OpenEye and a group from Vijay Pande's lab at Stanford compared computational methods for predicting small molecule solvation free energies. This exercise, designated SAMPL0, also included the application of simulation tools to predict protein-ligand affinities for eight complexes provided by GlaxoSmithKline.

The goal of SAMPL is not simply to solve a problem; rather, it is to understand the strengths and weaknesses of current tools and to identify gaps and inconsistencies in their ability to predict how a ligand with properties desirable for drug development binds to a target.

Let the Games Begin
In November 2007, OpenEye sent invitations to its six main competitors, 16 academic and government labs, and 12 major pharma companies. News of the SAMPL1 challenge spread by word of mouth, and the challenge was open to anyone interested in participating.

The challenge had three main predictive components: virtual screening/pose prediction, binding affinity, and solvation free energy. Overall, the 54 groups, evenly split between academia and industry, contributed 205 predictions. OpenEye did not participate in SAMPL except to run controls, so it was blinded when evaluating the results. Participants received only their own results and how they compared to the median.

Two sets of protein-ligand binding data for the virtual screening and binding affinity evaluations came from corporate contributors: Vertex supplied 52 molecular structures for JNK3 kinase, and Abbott Labs provided 27 structures for the urokinase target. Participants had access to 20 to 60 active compounds for each set of protein-ligand data, and all had crystal structures.

Throughout the challenge, OpenEye disclosed additional information about each target in stages, beginning with a set of active and inactive compounds, then a list of the actual actives, followed by a list of poses. Each stage corresponded to a particular test: virtual screening, followed by pose prediction, and then affinity estimation. The inactive compound collections included 12,000 decoys for JNK3 and 8,000 for urokinase.
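For readers less familiar with how a virtual screen against decoys is typically scored, the sketch below computes an enrichment factor: the rate of known actives in the top-ranked slice of a screening list, divided by the rate expected if compounds were picked at random. This is a common metric in the field, not necessarily the one used to evaluate SAMPL1; the function name, the scores, and the "higher score is better" convention are all hypothetical.

```python
def enrichment_factor(scores, is_active, fraction):
    """Enrichment factor for the top `fraction` of a ranked screening list.

    Hypothetical helper. Assumes a higher score means "more likely active";
    for docking scores where lower (more negative) is better, negate them first.
    `is_active` is a parallel list of booleans (True for known actives).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("need at least one known active")

    # Rank compounds from best to worst predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)

    # Size of the top slice (at least one compound).
    n_top = max(1, int(round(fraction * n_total)))
    hits_in_top = sum(1 for _, active in ranked[:n_top] if active)

    # Active rate in the top slice relative to the active rate in the whole list.
    return (hits_in_top / n_top) / (n_actives / n_total)


# Invented example: 2 actives hidden among 8 decoys, both ranked at the top.
scores = [9.1, 8.7, 5.0, 4.2, 3.9, 3.3, 2.8, 2.0, 1.5, 0.9]
is_active = [True, True] + [False] * 8
print(enrichment_factor(scores, is_active, 0.2))  # 5.0
```

An enrichment factor of 1.0 means the method does no better than random selection; the maximum possible value depends on how rare the actives are, which is why large decoy sets such as those described above make the test more demanding.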

The third component involved predicting hydration free energies, based on data sets contributed by Peter Guthrie and CCG. Participants received about 60 SMILES strings and attempted to derive three-dimensional coordinates, conformations, tautomers, charge states, and charge distributions. OpenEye provided 3D coordinates and charges on request.

Early Results
OpenEye has not yet published the results of SAMPL1 (the results of SAMPL0 in 2007 appeared in J Med Chem 2008;51:769-779), but the company provided a sneak preview of the outcomes at CUP IX.

The participants "did very well overall," noted Geoff Skillman, VP of research at OpenEye, who was still crunching the data when he presented the preliminary findings. For virtual screening and binding affinity predictions with JNK3, for example, participants generated "a good set of actives, and all looked like kinase inhibitors." For pose prediction, he concluded that "the hand docking approach was clearly the best," outperforming any of the algorithms, and that leaving the protein target as is was the best strategy.

"There was not a consistent 'winner' among the methods," he said. GLIDE, ROCS, FRED, and GOLD all fared well. "The best in one area could be the worst in another." Skillman did note differences between the submissions of users with varying levels of experience: amateurs versus professionals. Furthermore, although it might seem logical that vendors of modeling and simulation software would be more adept at applying their own packages to the SAMPL problems, the results demonstrated otherwise, according to Skillman.

He remarked on the substantial intra-method variance when there were multiple submissions for a particular method; in such cases, success hinged on which parameter set was used. Additionally, Skillman noted the large range in computing time devoted to the evaluations, from a few seconds or minutes of CPU time to upwards of 400,000 CPU hours for affinity prediction. Interestingly, more computing time did not necessarily correlate with better results.

Matt Geballe, a graduate student at Emory University, noted the large range in CPU times for the four methods his group used. In general, "the quicker methods did better... some methods got more compounds right, they also got more very wrong." Working with JNK3, the docking programs "performed well with active inhibitors but did not do very well overall with decoy sets."

Martha Head, director of computational chemistry at GlaxoSmithKline, who did particularly well using a hand-docking approach, said: "Knowing one good answer is a huge help. Knowledge of things I have seen in the past largely drives my decisions about docking models, especially for a kinase. The challenge is to make tools that exploit what an experienced modeler already knows."

Stanford's Vijay Pande noted that crystal structures are often not as helpful as they might be, because they are just a snapshot in time and cannot capture conformational changes or allosteric effects. Computational methods that enable reliable predictions "are at least ten years away," Pande said. "The targets [in SAMPL1] were much more challenging than in SAMPL0," he added.

After reviewing the results of the transfer energy component of SAMPL, Skillman concluded that despite the use of "very varied prediction methods, everyone did about the same and had similar problems in similar areas."

Enrico Purisima, from the National Research Council of Canada, concluded that, "the model captures the trend, but not the magnitude of the solvation free energies."
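Purisima's distinction between trend and magnitude is often quantified with two separate statistics: a correlation coefficient, which rewards getting the relative ordering and linear trend right, and an error measure such as root-mean-square error (RMSE), which penalizes wrong absolute values. The sketch below uses invented values (not SAMPL data) in which every prediction is exactly half the experimental value. The correlation is perfect, yet the error is large. It illustrates the general idea only and is not the analysis Purisima or OpenEye performed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: measures how well y tracks the linear trend of x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root-mean-square error: measures how far predictions are in absolute terms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Invented solvation free energies in kcal/mol (hypothetical, for illustration only).
experimental = [-2.0, -4.0, -6.0, -8.0]
predicted = [-1.0, -2.0, -3.0, -4.0]  # right trend, magnitudes systematically too small

print(f"r = {pearson_r(experimental, predicted):.2f}, "
      f"RMSE = {rmse(experimental, predicted):.2f} kcal/mol")
# r = 1.00, RMSE = 2.74 kcal/mol
```

Reporting both numbers side by side is what lets an evaluator say a model "captures the trend, but not the magnitude": a high correlation alone would hide the systematic underestimate.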

SAMPL Take 2
Planning for SAMPL 2009 sparked a lively debate. Pat Walters of Vertex proposed that OpenEye should strategically select a subset of the data or compounds to disclose to the participants at some interval after the start of the challenge. "This might let people tune their methods." Ajay Jain (UCSF) countered that the goal of SAMPL is "to reward good methods, not good gaming": to test computational methods, not the art of modeling.

"I couldn't disagree more," responded Andy Good of Bristol-Myers Squibb. "That is not how we work. Participants should simply disclose the different strategies they use."

Chess Lessons for Computer Modeling

When a computer first defeated a human in chess in 1958, the excitement suggested that a new era had dawned and computers might actually be smarter than humans (at least when it came to chess). Yet nearly 40 years passed before IBM's Deep Blue was able to defeat Garry Kasparov, in 1997. By 2005, computers were gaining ground on mere mortals, as Hydra, Deep Fritz, and others crushed individual players and teams of humans alike.

The more mundane worlds of computational drug design and molecular modeling have made substantial strides over the years, yet their ability to accurately predict protein-ligand interactions has failed to live up to the hype that heralded their introduction, said Paul Labute of Chemical Computing Group (CCG).

Labute's message echoed many presentations at CUP IX, in which modelers, software developers, and computational chemists from industry and academia tried to balance touting the progress made in predictive modeling and virtual screening against the glaring deficiencies that call into question the utility of these tools and even their past successes. "How do we know [computer models] work?" asked Labute. "The literature contains mostly anecdotal validation, reproducibility is almost non-existent, and crude diagnostics are easily manipulated."

Labute emphasized that, like computer chess, computational life science will take many years to realize its potential. Along the way, industry would benefit from opportunities to make a critical assessment of what is working and what is not, to compare competing methods, and periodically to rethink the way forward. -- V.G.

 

___________________________________________________

 This article appeared in Bio-IT World Magazine.

For reprints and/or copyright permission, please contact  Tim McLucas, (781) 972-1342, tmclucas@healthtech.com .