Why Are We Focused on Simulation-based Methods for Predicting Binding Affinity?

Contributed Commentary by Matthew Segall, Ph.D., and Himani Tandon, Ph.D., Optibrium

November 14, 2025 | Early computational drug discovery relies on accurate predictions of binding affinity. If we can accurately predict the affinity of molecule designs, we can dramatically reduce the number of compounds that we need to synthesize and test when optimizing new drug candidates.

However, this remains a challenge, particularly in the absence of high-resolution protein structures. Even when protein structures are available, the industry continues to rely on computationally expensive simulation methods, despite emerging alternatives.

Physical Simulation-based Methods: The Dominant Structure-based Approach

Simulation-based methods have gained prominence because a significant fraction of protein targets with known structures is amenable to methods such as free energy perturbation (FEP), which predict relative binding free energies. These methods are widely trusted as they directly model the physical interactions between proteins and ligands at the atomic level. While such methods have been in development for decades, they have seen a surge in utilization due to recent advances in accurate force-field energetics combined with huge increases in computing power.

Despite their power, such methods have limitations, including high computational cost, the requirement for a high-quality protein structure, and applicability limited to a narrow window of structural changes around a reference ligand. Perhaps most importantly, target-to-target variation in prediction accuracy is high.

That said, physical simulation methods are receiving great investment in methodological research, and expectations of significant improvements are high. A promising avenue is the development of absolute binding free energy calculations, which would enable affinity predictions without a closely related reference ligand. However, we are likely at least several years away from the accuracy needed for practical use, and even further from general accessibility, due to the significant computational expense involved.

So Why This Reliance on Simulation?

Despite their current limitations, simulation methods dominate the field of affinity prediction, in part due to a widespread lack of confidence in alternative methods, such as various Quantitative Structure-Activity Relationship (QSAR) models.

The fundamental problem with 2D QSAR methods for affinity prediction is the mismatch between the physical reality of protein-ligand binding and the correlative nature of machine learning (ML) models whose input representations and parameters often bear little or no connection to the underlying physics. These statistical ML methods assume that the molecules to be predicted are drawn from the same population as the training set, which is rarely the case in drug discovery, where novel chemical matter is sought.

Some pioneering 3D QSAR methods like CoMFA and CoMSIA moved towards physically meaningful molecular representations, capturing 3D shape, charge, and stereochemistry. However, they remain brittle: they depend on assumptions about how molecules are aligned in 3D space and on black-box parameters that constrain their reliability and broader applicability.

Therefore, when given a choice between a physics-driven method that clearly makes predictions in a physically sensible manner, and an ML method that ignores most or all explicit physical domain knowledge, it is no wonder that most researchers would choose the former.

But It’s a False Dichotomy

Over the same decades that saw advances in methods such as FEP, ML approaches capable of respecting and embedding physics domain knowledge to predict binding affinity were also being developed. A key breakthrough was multiple-instance machine learning, which overcomes the need to make assumptions regarding ligand conformations and alignments (poses). Instead, the induced model dynamically identifies and refines optimal ligand poses as its parameters evolve, effectively learning both structure and physical interactions simultaneously. Ultimately, the model functions analogously to a protein pocket, allowing new molecules to be fitted into it using a process directly akin to molecular docking and scoring.

In the latest incarnations, these physically grounded ML models approach being causal predictors: they explicitly model the physical factors that govern molecular recognition by accounting for ligand shape, electrostatics, directional hydrogen-bonding preferences, and conformational strain, while automatically solving the problem of molecular pose. They capture the physical interactions driving affinity, rather than relying solely on statistical correlations. Importantly, the parameters underpinning such models have clear physical interpretations rather than being purely statistical or black-box parameters.

These methods achieve accuracy comparable to FEP and are roughly 1000x less computationally expensive to run on candidate ligands, while having a broader domain of applicability to new chemical scaffolds and enabling their use much earlier in discovery projects.

The Path Forward: Synergy, Not Replacement

The choice doesn’t have to be either/or. Instead, the focus should be on how we can take advantage of all available tools most effectively. For researchers already using FEP to predict binding affinity, adding a complementary physics-informed ML method with equivalent accuracy at roughly 0.1% of the computational cost presents a compelling opportunity.

Because direct physical simulation and physically motivated ML methods make largely orthogonal assumptions, their prediction errors tend to be uncorrelated. Using the two in parallel and averaging their predictions has been shown to improve accuracy (Cleves et al., 2021).
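To see why uncorrelated errors matter, consider a minimal sketch of the underlying statistics. It assumes two unbiased predictors whose errors have standard deviations `sigma_a` and `sigma_b` and correlation `rho`; the function name and the 1.0 kcal/mol figures are hypothetical illustrations, not results from the cited study.

```python
import math


def rmse_of_average(sigma_a: float, sigma_b: float, rho: float) -> float:
    """Expected RMSE of the mean of two unbiased predictions.

    sigma_a, sigma_b: error standard deviations of the two methods.
    rho: correlation between their errors (0 = uncorrelated).
    Uses Var((a + b) / 2) = (sa^2 + sb^2 + 2 * rho * sa * sb) / 4.
    """
    variance = (sigma_a**2 + sigma_b**2 + 2.0 * rho * sigma_a * sigma_b) / 4.0
    return math.sqrt(variance)


# Hypothetical example: both methods have a 1.0 kcal/mol error.
uncorrelated = rmse_of_average(1.0, 1.0, rho=0.0)  # sqrt(0.5), about 0.71
correlated = rmse_of_average(1.0, 1.0, rho=1.0)    # no gain: stays at 1.0
print(f"uncorrelated: {uncorrelated:.2f}, fully correlated: {correlated:.2f}")
```

Under these assumptions, averaging two equally accurate methods with uncorrelated errors reduces the expected error by a factor of about 1.4, whereas averaging two methods that make the same mistakes gains nothing.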

We can also see dramatic efficiency improvements using the two methods sequentially; physics-informed ML methods can first screen larger or more chemically diverse compound libraries at high throughput, then more computationally intensive FEP methods can be applied to the top candidates. This approach allows us to evaluate significantly more compounds and explore wider chemical space using the same computational resources.
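The resource argument for this sequential funnel can be sketched with simple arithmetic. The example below assumes that cost is measured in units of one FEP calculation and that the ML method is roughly 1000x cheaper per compound, as stated earlier; the function names, the budget, and the split between stages are hypothetical.

```python
import math


def funnel_cost(n_library: int, n_fep: int,
                fep_cost: float = 1.0, ml_cost_ratio: float = 1000.0) -> float:
    """Total cost of ML-screening n_library compounds, then running FEP on n_fep."""
    ml_stage = n_library * fep_cost / ml_cost_ratio
    fep_stage = n_fep * fep_cost
    return ml_stage + fep_stage


def max_library_size(budget: float, n_fep: int,
                     fep_cost: float = 1.0, ml_cost_ratio: float = 1000.0) -> int:
    """Largest library the ML stage can screen after reserving budget for FEP."""
    remaining = budget - n_fep * fep_cost
    if remaining < 0:
        raise ValueError("budget does not cover the requested FEP calculations")
    return math.floor(remaining * ml_cost_ratio / fep_cost)


# Hypothetical budget: the compute needed for 100 FEP calculations.
# Reserving half of it for FEP on the top 50 candidates leaves enough
# for the ML stage to screen tens of thousands of compounds.
library = max_library_size(budget=100.0, n_fep=50)
print(library, funnel_cost(library, n_fep=50))
```

With these illustrative numbers, a budget that would cover FEP on only 100 compounds instead supports ML screening of 50,000 compounds followed by FEP on the 50 most promising ones.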

Finally, physics-informed ML extends the reach of predictive modelling beyond what FEP can achieve: it can be applied in the absence of protein structures and used in active learning to guide exploration of novel chemical space, dramatically improving the likelihood of discovering structurally diverse active compounds.

Matthew Segall, CEO, Optibrium, has a Master’s in Computation from the University of Oxford and a Ph.D. in Theoretical Physics from the University of Cambridge. He has led teams developing predictive models and intuitive decision-support and visualization tools for drug discovery and has published over 40 peer-reviewed papers and book chapters. In 2009, he founded Optibrium, which develops ground-breaking AI software and services that improve the efficiency and productivity of drug discovery. He can be reached at matt@optibrium.com.

Himani Tandon, Principal Scientist, Optibrium, works in the research division at the company, developing cutting-edge software solutions that support small-molecule and macrocycle design in drug discovery. Her work focuses on applying 3D structure-based and ligand-based design strategies for lead discovery and optimisation. Himani holds a Ph.D. in Computational Structural Biology and Bioinformatics from the Indian Institute of Science, and completed her postdoctoral research at the MRC Laboratory of Molecular Biology. She can be reached at himani@optibrium.com.