July | August 2006
QSAR/QSPR (quantitative structure-activity/property relationship) modeling approaches to drug design are hardly new, and increasing experience with them, along with readily available computing power, has expanded their use. Yet QSAR-based models remain problematic and are often poorly predictive.
One unavoidable problem is that no single approach will always yield an accurate model, argues Alex Tropsha of the Department of Pharmacology, University of North Carolina at Chapel Hill. Another is the absolute necessity to use external data when validating models to determine their accuracy. These two problems, he says, suggest it’s time to change the way QSAR is used. Tropsha was one of several distinguished speakers in the cheminformatics track of Cambridge Healthtech Institute’s World Pharmaceutical Congress, held in late May in Philadelphia.
Another speaker, Robert Sheridan, senior investigator at Merck Research Laboratories, reviewed a variety of practical “tweaks” for improving QSAR predictions, pointedly calling QSAR an “art” while focusing on extrapolation and transformation issues. Even with a large, diverse training set, said Sheridan, extrapolation is never perfect, and sometimes what the “extrapolation curves show is how poor your models really are.” One of his slides emphasized, “When someone wants to sell you a model, you should ask for the training set.”
To some extent, Tropsha picked up the thread of model evaluation in his second-day talk, “Robust Computational Framework for Predictive ADME-Tox Modeling.” He asked “Why can’t we get it right?” He argued there are “plenty of descriptors” and many reasonable methods but said that training set statistics are deeply flawed. “Change the success criteria,” he said. Recognize that QSAR is an empirical data modeling exercise: Choose any method you like but validate on an independent data set.
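To make the point about training set statistics concrete, here is a minimal sketch in plain Python. It is not taken from Tropsha's talk: the data are synthetic, the labels are deliberately random, and the names `knn_predict` and `accuracy` are illustrative. A 1-nearest-neighbor model scores perfectly on its own training set, because every compound is its own nearest neighbor, yet it has nothing to offer on compounds held out as an external set.

```python
import random

def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k training points closest to x."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = [train_y[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)

def accuracy(train_X, train_y, X, y, k=1):
    """Fraction of the points in X whose predicted label matches y."""
    hits = sum(knn_predict(train_X, train_y, x, k) == label
               for x, label in zip(X, y))
    return hits / len(y)

# Synthetic stand-in for a QSAR data set (an assumption for illustration):
# 200 "compounds", 5 random descriptors each, and labels assigned at random,
# so there is no real structure-activity relationship to learn.
rng = random.Random(0)
X = [[rng.random() for _ in range(5)] for _ in range(200)]
y = [rng.randint(0, 1) for _ in range(200)]

# Hold out the last 50 compounds as an external set never seen in training.
train_X, train_y = X[:150], y[:150]
ext_X, ext_y = X[150:], y[150:]

train_acc = accuracy(train_X, train_y, train_X, train_y)  # scored on itself
ext_acc = accuracy(train_X, train_y, ext_X, ext_y)        # scored externally

print(f"training-set accuracy: {train_acc:.2f}")
print(f"external-set accuracy: {ext_acc:.2f}")
```

The training-set figure comes out perfect by construction, which is exactly why it says nothing about predictive power. Only the external figure reflects how the model behaves on compounds it has not seen.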
Tropsha’s remarks are developed further in a recent paper* in which he and colleagues write: “Despite many years of research and a large variety of approaches, there exists no ‘gold standard’ QSAR approach that guarantees the best model for every data set. Recently, we began to advance the combinatorial QSAR approach that explores various combinations of optimization methods and descriptor types and includes rigorous and consistent validation.”
They examined 195 diverse substrates and nonsubstrates of P-glycoprotein. The methods studied included k-nearest neighbors (kNN) classification, decision trees, binary QSAR, and support vector machines. In total, 16 combinations of methods and descriptor types were used to develop QSAR models.
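The combinatorial idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' workflow: the data are synthetic, the two descriptor "types" (`block_A`, `block_B`) are hypothetical column subsets, and a simple nearest-centroid classifier stands in for the other methods named in the paper. Every method is paired with every descriptor set, and each pairing is judged only by its accuracy on an external set.

```python
import random
from itertools import product

def knn(train_X, train_y, x, k=3):
    """k-nearest-neighbors majority vote (squared Euclidean distance)."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = [train_y[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)

def nearest_centroid(train_X, train_y, x):
    """Assign x to the class whose mean descriptor vector is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical grid: 2 methods x 2 descriptor sets = 4 combinations.
METHODS = {"kNN": knn, "nearest-centroid": nearest_centroid}
DESCRIPTOR_SETS = {"block_A": [0, 1], "block_B": [2, 3]}

def select(X, cols):
    """Keep only the descriptor columns listed in cols."""
    return [[row[c] for c in cols] for row in X]

def external_accuracy(method, cols, train_X, train_y, ext_X, ext_y):
    """Train on the training set, score on the external set only."""
    tr, ex = select(train_X, cols), select(ext_X, cols)
    hits = sum(method(tr, train_y, x) == label for x, label in zip(ex, ext_y))
    return hits / len(ext_y)

# Synthetic data (an assumption): only block_A descriptors carry signal.
rng = random.Random(1)
X = [[rng.random() for _ in range(4)] for _ in range(200)]
y = [int(row[0] + row[1] > 1.0) for row in X]
train_X, train_y, ext_X, ext_y = X[:150], y[:150], X[150:], y[150:]

results = {
    (m, d): external_accuracy(METHODS[m], DESCRIPTOR_SETS[d],
                              train_X, train_y, ext_X, ext_y)
    for m, d in product(METHODS, DESCRIPTOR_SETS)
}
for (m, d), acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{m:17s} {d:8s} external accuracy = {acc:.2f}")
```

In this toy setup, committing in advance to the uninformative descriptor block would sink either method, which is the risk the combinatorial approach is meant to reduce. A real study would substitute actual descriptor packages and the full set of methods.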
They conclude: “Our studies emphasize that the exploratory nature of the combinatorial QSAR approach helps in identifying highly predictive models for a particular data set, whereas a conventional approach to QSAR studies using only one method and one type of descriptors has a higher chance to fail.”
*de Cerqueira Lima, P. et al. “Combinatorial QSAR modeling of P-glycoprotein substrates.” J Chem Inf Model 46, 1245-54; 2006.