Oxford's ultra-fast shape recognition program speeds lead discovery.
By Vicki Glaser
July 14, 2008 | Virtual screening of compound libraries for lead discovery can exact a high computational cost. The push to screen larger databases more rapidly to uncover hits with well-defined biological activity has led scientists at Oxford University to develop a shape-based database search method that is thousands of times faster than existing tools for drug discovery.
Pedro Ballester, working in the laboratory of Professor Graham Richards at the National Foundation for Cancer Research (NFCR) Centre for Computational Drug Discovery, created the Ultrafast Shape Recognition (USR) method. The key attributes of USR are its speed and accuracy in shape-similarity searching of molecular databases.
Ballester and collaborators have completed a retrospective virtual screening validation study (currently under review) on the DrugBank database (see “DrugBank Database in Commercial Partnerships,” Bio-IT World, February 2008), with some 700,000 3-D molecular conformations expanded from nearly 4,000 chemical structures. Importantly for this evaluation, each compound in the database was linked to activity data.
For a known query compound, USR retrieved molecules with similar shapes from the database. The researchers then collated these molecules to evaluate USR’s virtual screening performance, and repeated the process with ten diverse biological targets.
Ballester reported that USR’s performance equaled that of several current shape-similarity techniques while being three orders of magnitude faster. In fact, he has shown that USR is more than 1,500 times faster than ESshape3D (the shape-similarity technique within Chemical Computing Group’s MOE software suite) and more than 2,000 times faster than Shape Signatures, from Randy Zauhar’s group at the University of the Sciences in Philadelphia.
The first real-life application of USR is being conducted in collaboration with Oxford’s Department of Pharmacology. It incorporates a database containing approximately 690 million molecular conformations built from more than 5 million drug-like molecules from the ZINC database. The USR tool searches for compounds similar in shape to a potent inhibitor of a particular biological target, and the top hits are tested in the laboratory.
“We have obtained an excellent hit rate in this first prospective study,” reports Ballester. “It is in databases of this size where USR shows its full potential.” USR was capable of completing 100 queries on this huge database in less than 90 minutes using a single processor. In contrast, OpenEye Scientific Software’s ROCS superposition method (based on reported efficiency data) would take more than two years under the same conditions.
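The figures quoted above imply a throughput that is easy to work out. The short calculation below is only a back-of-the-envelope sketch using the numbers in this article; it assumes a 365-day year and treats "less than 90 minutes" and "more than two years" as exact, so both results are lower bounds. The variable names are illustrative.

```python
# Numbers quoted in the article.
conformations = 690_000_000      # conformations in the ZINC-derived database
queries = 100                    # query molecules searched
usr_seconds = 90 * 60            # "less than 90 minutes" on a single processor

# Each query is compared against every conformation in the database.
comparisons = queries * conformations
usr_rate = comparisons / usr_seconds

# "More than two years" for the superposition-based method (365-day years assumed).
rocs_seconds = 2 * 365 * 24 * 3600
speed_ratio = rocs_seconds / usr_seconds

print(f"{comparisons:,} shape comparisons in total")
print(f"{usr_rate:,.0f} comparisons per second")
print(f"{speed_ratio:,.0f} times longer for the slower method")
```

On these assumptions, the run works out to roughly 12.8 million shape comparisons per second, and the two-year estimate is about 11,700 times longer than 90 minutes.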
As molecules with similar shapes are likely to have similar biological activity, a USR run begins with a molecular template and searches a database for similarly shaped molecules. USR defines shape—regardless of a molecule’s size, position, or orientation—by a discrete set of values that represent the full complement of inter-atomic distances. This eliminates the need for computation-intensive superposing and alignment algorithms.
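Because each conformation is reduced to a short list of numbers, a database search becomes a comparison of number lists rather than a 3-D superposition. The sketch below illustrates that idea under stated assumptions: the score (the inverse of one plus the mean absolute difference between two lists) follows the form given in the published USR papers and is not spelled out in this article, and the function names `shape_similarity` and `top_hits` are hypothetical.

```python
import numpy as np

def shape_similarity(query, library):
    """Score every library row against the query vector.

    Assumed score: 1 / (1 + mean absolute difference), so identical
    descriptor vectors score 1.0 and dissimilar ones approach 0.
    """
    query = np.asarray(query, dtype=float)
    library = np.asarray(library, dtype=float)
    mean_abs_diff = np.abs(library - query).mean(axis=1)
    return 1.0 / (1.0 + mean_abs_diff)

def top_hits(query, library, k=3):
    """Return indices and scores of the k most similar library entries."""
    scores = shape_similarity(query, library)
    order = np.argsort(-scores)[:k]   # highest score first
    return order, scores[order]

# Toy example with 3-value vectors; real shape vectors would be longer.
library = np.array([[1.0, 2.0, 3.0],
                    [1.0, 2.0, 4.0],
                    [5.0, 5.0, 5.0]])
indices, scores = top_hits([1.0, 2.0, 3.0], library, k=2)
print(indices, scores)
```

No alignment step appears anywhere in the search: the only work per database entry is a handful of subtractions, which is consistent with the speed advantage described above.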
USR’s speed derives mainly from its concise description of a molecule’s shape. It selects four widely distributed reference sites and, for each site, determines the distances from that site to every atom in the molecule. These four distance distributions are representative of all the inter-atomic distances in the molecule. For each distribution, the software calculates the first three moments, which are statistical quantities that characterize a distribution. Three moments for each of the four distributions yield 12 one-dimensional descriptors that define the 3-D shape of the molecular conformation.
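The descriptor calculation can be sketched in a few lines. This is an illustrative reading rather than the authors' code, and it rests on assumptions not stated in this article: the four reference sites are taken, following the published USR papers, as the molecular centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that atom; and the three moments are expressed as the mean, the standard deviation, and the cube root of the third central moment so that all share units of length. The function name `usr_descriptors` is hypothetical.

```python
import numpy as np

def usr_descriptors(coords):
    """Return 12 shape descriptors for one conformation.

    coords: (N, 3) array of atom positions.
    """
    coords = np.asarray(coords, dtype=float)

    # Four reference sites (assumed choice, see lead-in).
    centroid = coords.mean(axis=0)
    d_centroid = np.linalg.norm(coords - centroid, axis=1)
    closest = coords[d_centroid.argmin()]
    farthest = coords[d_centroid.argmax()]
    d_farthest = np.linalg.norm(coords - farthest, axis=1)
    far_from_farthest = coords[d_farthest.argmax()]

    descriptors = []
    for site in (centroid, closest, farthest, far_from_farthest):
        # Distribution of distances from this site to every atom.
        d = np.linalg.norm(coords - site, axis=1)
        mean = d.mean()
        spread = np.sqrt(((d - mean) ** 2).mean())   # standard deviation
        skew = np.cbrt(((d - mean) ** 3).mean())     # cube root of 3rd central moment
        descriptors.extend([mean, spread, skew])
    return np.array(descriptors)

# Toy five-atom "molecule", then the same atoms rotated and shifted.
atoms = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.1, 1.3, 0.0],
                  [3.4, 1.2, 0.8], [0.2, -1.1, 0.5]])
c, s = np.cos(0.7), np.sin(0.7)
rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = atoms @ rotation.T + np.array([5.0, -3.0, 2.0])

print(np.allclose(usr_descriptors(atoms), usr_descriptors(moved)))
```

Because only distances enter the calculation, the descriptors do not change when the molecule is rotated or translated, which is why no superposition is needed before two shapes are compared.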
To date, Ballester and colleagues have applied USR to molecules ranging from 10 atoms up to 100 atoms; the size of the drug-like molecule does not appear to affect the predictive capabilities of the software. Although conceptually different from fragment-based search methods (see “The Search for Unusual Suspects,” Bio-IT World, February 2008), USR could be used to enrich the list of fragments submitted for testing with fragments similar in shape to a known active fragment, thereby increasing the likelihood of finding additional active fragments.
A U.S. patent for USR is pending, and Isis Innovation, Oxford University’s technology transfer arm, is seeking licensing opportunities. Ballester has ongoing collaborations with Pfizer and Bristol-Myers Squibb. His research plans include incorporating additional chemical information into USR, devising new algorithms capable of clustering molecular databases in terms of shape, and studying the implementation of USR-based virtual screening methods in realistic scenarios.
Ballester PJ, Richards WG. Ultrafast shape recognition to search compound databases for similar molecular shapes. J Comput Chem. 2007; 28:1711-1723.
Ballester PJ, Richards WG. Ultrafast shape recognition for similarity search in molecular databases. Proc Royal Soc A. 2007; 463:1307-1321.
This article appeared in Bio-IT World Magazine.