April 12, 2007 | Medicinal chemistry is fast evolving from an "art" to a "science." Driven by advances in chemical synthesis, instrumentation, and high-throughput and high-content screening technology, this transition is also benefiting from a wealth of new software products, spanning both bio- and cheminformatics. Sophisticated and increasingly user-friendly computational tools are empowering medicinal chemists (MCs) to employ database filtering and predictive modeling solutions, incorporate visualization and advanced data analysis tools into their workflows, and integrate a broader variety of experimental data and in silico resources into earlier stages of drug discovery.
"Medicinal chemists are under pressure to do more planning upfront - synthesis planning, patent planning, and understanding why a molecule will be active, selective, and druggable," says Frank Brown, CSO at Accelrys.
Among the innovative software products at the MC's disposal are structure-based analysis and design tools and modeling software. Data mining and visualization solutions can help define the physicochemical and biological implications of alterations in chemical or 3D structure and clarify the factors that determine when and how a drug binds to its target. Filtering strategies are helping drug discovery groups integrate ADME-tox and pharmacodynamic parameters earlier in lead discovery and optimization, culling compound libraries to exclude structures that would be likely to fail if selected on target-binding properties alone. Other data-mining solutions help determine whether a promising lead compound or technique is already subject to patent restrictions.
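As a concrete illustration of property-based filtering, the sketch below applies Lipinski's rule of five, a widely used drug-likeness heuristic that the article itself does not name, to a small library. The compound IDs and property values are hypothetical and assumed to be precomputed by a cheminformatics toolkit; real filtering pipelines typically combine many more ADME-tox criteria.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    # Hypothetical record; properties are assumed to be precomputed elsewhere
    cid: str
    mol_weight: float   # molecular weight, Da
    clogp: float        # calculated octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def rule_of_five_violations(c: Compound) -> int:
    """Count Lipinski rule-of-five violations for one compound."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])

def filter_library(library, max_violations=1):
    """Keep compounds with at most max_violations violations (one is commonly tolerated)."""
    return [c for c in library if rule_of_five_violations(c) <= max_violations]

library = [
    Compound("CPD-1", 342.4, 2.1, 2, 5),
    Compound("CPD-2", 612.7, 6.3, 4, 9),
    Compound("CPD-3", 498.0, 5.4, 1, 7),
]
kept = [c.cid for c in filter_library(library)]
print(kept)  # ['CPD-1', 'CPD-3']
```

CPD-2 is dropped because it breaks two rules (weight and clogP), while CPD-3 survives with a single violation.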
There is a diminishing distinction between expert computational tools and applications targeted directly for use by MCs. The development of increasingly intuitive, "intelligent" applications is changing the historical model, in which computational groups within pharma would build a toolbox of algorithms and in silico applications for use by the med chem groups based on in-house or purchased software.
Historically, "in the area of software, other than chemistry drawing packages and research databases such as Beilstein and SciFinder, medicinal chemists had been completely left out," observes Sanji Bhal, technical marketing specialist at ACD/Labs. Software vendors recognized that they were missing a large market and developed tools that required less computational expertise and could support the workflow of medicinal chemists. The ideal software package for MCs must be able to identify and manipulate compounds and fragments that meet drug-likeness requirements, open up new chemical space, and take synthetic feasibility into account.
"With the implementation of pipelining tools, some of these more expert tools are now available to medicinal chemists as protocols they can apply as a push-button function," says Dirck Lassen, senior director, scientific marketing at Tripos.
Accelrys' Brown sees a trend toward "one-click science" that allows MCs to design, synthesize, analyze, modify, screen, optimize, and register compounds in an in silico environment. The concept of "data pipelining," at the core of Accelrys' SciTegic Pipeline Pilot platform, streamlines the integration and analysis of large amounts of data and allows an organization to validate and deploy standardized computing processes (see Working Out the Flow, Bio-IT World, Sept. 2006).
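The general pipelining idea, chaining reusable components so that records stream from a reader through calculators and filters to an output, can be sketched with Python generators. This is a generic illustration under that assumption, not the Pipeline Pilot API; the component names, record fields, and data are hypothetical.

```python
import math

def read_records(rows):
    """Source component: turn raw (id, mw, IC50 in nM) rows into record dicts."""
    for cid, mw, ic50_nm in rows:
        yield {"id": cid, "mw": mw, "ic50_nm": ic50_nm}

def add_pic50(records):
    """Calculator component: pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    for r in records:
        r["pic50"] = round(9 - math.log10(r["ic50_nm"]), 2)
        yield r

def keep_potent(threshold):
    """Filter component factory: pass only records with pIC50 >= threshold."""
    def component(records):
        for r in records:
            if r["pic50"] >= threshold:
                yield r
    return component

def run_pipeline(source, *components):
    """Wire components in order; each consumes the previous component's stream."""
    stream = source
    for comp in components:
        stream = comp(stream)
    return list(stream)

rows = [("A1", 320.0, 15.0), ("A2", 410.5, 2500.0), ("A3", 289.3, 90.0)]
hits = run_pipeline(read_records(rows), add_pic50, keep_potent(7.0))
print([(r["id"], r["pic50"]) for r in hits])  # [('A1', 7.82), ('A3', 7.05)]
```

Because each component only sees an iterable of records, a validated protocol can be reused by swapping in a different reader or threshold without touching the other steps.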
For example, software that links a registration system with an inventory system, as well as reagent, synthesis, and patent content, allows for better design of a chemistry plan. Electronic registration systems are valuable for capturing the intellectual property (IP) of an invention, facilitating the patenting process, and avoiding IP conflicts. Ease of access to patent-related information and considerations would help MCs integrate IP issues in their decision making earlier in discovery.
CambridgeSoft recently introduced a Patent Database portal based on codevelopment partner Reel Two's SureChem patent chemistry database. The database gives users access to all the chemical compounds named in a patent and allows them to search by chemical structure, keyword, or name. Although initial access is limited to U.S. patents and applications, the database will expand to include European and Patent Cooperation Treaty (PCT/WO) patent and application data.
Accelrys' DS Catalyst - a collection of pharmacophore and 3D database management tools in the company's Discovery Studio suite - is an example of a solution that combines pharmacophore modeling tools with data pipelining technology to enable automation of large-scale virtual activity profiling. Discovery Studio includes tools for generating protein models, simulating binding interactions, performing docking analyses, and evaluating pharmacokinetic, ADME-tox, and other drug-like properties of compounds.
In med chem, the two main types of in silico methods are tools for data analysis and predictive modeling, and tools that facilitate access to, visualization of, and use of available data, including programs for data management, sharing, manipulation, and integration. The greatest benefit of the first category - computational tools - is their ability, says Lassen, "to expand the manageable chemistry space from a few million compounds to about 10^20 synthetically accessible compounds" that can be generated and screened in silico.
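To see why virtual chemistry space grows so quickly, consider a hypothetical combinatorial library: a scaffold with several attachment points, each of which can take any of a set of building blocks. The number of enumerable products is the product of the building-block counts per position. The counts below are illustrative assumptions, not figures from Tripos.

```python
from math import log10, prod

def library_size(building_blocks_per_position):
    """Distinct products from one scaffold whose R-group positions vary independently."""
    return prod(building_blocks_per_position)

# Hypothetical scaffold with four attachment points
sizes = [5_000, 5_000, 2_000, 2_000]
n = library_size(sizes)
print(f"{n:.2e} products (about 10^{log10(n):.1f})")  # 1.00e+14 products (about 10^14.0)
```

A single scaffold already yields far more compounds than any physical collection; summing such products over many scaffolds and reaction types is one way estimates of accessible space become astronomically large.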
The ability of computers to analyze abstract representations of molecular structures allows MCs to "break out of established thought processes," Lassen adds. Tripos's Topomer technology can "calculate the shape of a molecule in combination with its potentially interacting features," enabling lead or scaffold "hopping": searching for molecules with a similar shape - and, ostensibly, a similar biological activity - but a different chemical scaffold that may offer advantages in ADME-tox properties, off-target binding, or IP. New products from Tripos will include Topomer CoMFA (comparative molecular field analysis), a 3D-QSAR program that overcomes the problem of molecular alignment by fragmenting compounds into fairly rigid components, and Topomer Search, which allows users to create and screen databases of 10^8 compounds based on morphological characteristics.
The past five years have seen "a shift away from the computational expert as the only person using modeling software," notes Chris Williams, principal scientist at Chemical Computing Group (CCG). CCG is the provider of the Molecular Operating Environment (MOE), a suite of small-molecule and protein modeling, structure-based design, cheminformatics, and high-throughput discovery applications. Of particular interest is the presentation of 3D chemical structures for use by chemists not proficient with modeling tools. These tools apply computational methods to depict and align structures and link them to activity data to facilitate recognition of patterns, all in a format understood by MCs.
"Modeling is becoming more of a service than a be-all, end-all," says Williams. Such applications can help bridge the gap between complex modeling tools and lead discovery and enable MCs to incorporate SAR and ADME-tox information early in the design of novel compounds. In this rapidly evolving field, new expert methods are continuously becoming available that chemists must be able to integrate into discovery research. Examples include methods built on hERG potassium channel assay data for assessing cardiotoxicity and on measures of blood-brain barrier permeability.
From a med chem perspective, software advances have come mainly in the form of improved 2D and 3D rendering tools (making it easier to draw, store, and visualize structures), user-friendly analytical applications for extracting information from computationally complex methods such as in silico docking studies, and integration of functions within a single toolbox so chemists only have to become proficient with one interface. CCG recently collaborated with BioSolveIT to introduce a graphical interface to BioSolveIT's FlexX docking program within the MOE modeling software to aid in predicting binding affinity from docked structures.
At MedChem Europe earlier this year, BioFocus DPI (part of Galapagos) launched two upgrades to its Admensa suite of desktop optimization tools. Glowing Molecule is a visualization tool that highlights regions within a compound likely to confer undesirable ADME properties and identifies chemical functionalities that could improve the ADME profile. Auto-Modeler generates predictive models based on experimental data. For novices, training, validation, and testing of models are automated, while expert users can fine-tune and customize descriptors and model-building parameters. This is a prime example of the dual capabilities of today's in silico tools, which meet the needs of the computational community while recognizing the emerging desire among medicinal chemists for plug-and-play capabilities. (See sidebar: "Doing It By the E-Book.")
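The automated train/validate/test cycle described for Auto-Modeler can be illustrated generically. The sketch below is not BioFocus DPI's method; it assumes a single hypothetical descriptor (clogP) and a hypothetical measured endpoint (logS), shuffles and splits the data, picks between a mean-only baseline and a linear model on the validation set, and reports error on the held-out test set.

```python
import math
import random

def fit_mean(xs, ys):
    """Baseline model: always predict the training-set mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def rmse(model, xs, ys):
    """Root-mean-square error of a model on paired data."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def split_xy(rows):
    return [r[0] for r in rows], [r[1] for r in rows]

def auto_model(data, seed=0):
    """Shuffle, split 60/20/20, choose the model with the lowest validation RMSE,
    and report that model's RMSE on the untouched test set."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    i, j = int(0.6 * len(rows)), int(0.8 * len(rows))
    (tx, ty), (vx, vy), (sx, sy) = split_xy(rows[:i]), split_xy(rows[i:j]), split_xy(rows[j:])
    candidates = {"mean": fit_mean(tx, ty), "linear": fit_linear(tx, ty)}
    best = min(candidates, key=lambda name: rmse(candidates[name], vx, vy))
    return best, rmse(candidates[best], sx, sy)

# Hypothetical data: (clogP, logS) with a linear trend plus small alternating noise
data = [(0.5 * k, 0.5 - 0.8 * (0.5 * k) + (0.1 if k % 2 else -0.1)) for k in range(20)]
best, test_rmse = auto_model(data)
print(best, round(test_rmse, 3))
```

Keeping the test set out of model selection is what lets the reported error stand as an honest estimate for new compounds.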
Enabling Predictive Science
"It is a challenge to design [software] for medicinal chemists," says Frederic Eyber, head of business development at Ariana Pharma. They need a tool for their workflow that will take raw datasets, run an analysis, and help them decide the next steps and what compounds to synthesize.
Ariana applies its multiparametric decision support technology to its in-house drug discovery program and recently commercialized KEM (Knowledge Management and Extraction). This aids in the identification and optimization of small molecule and biological therapeutics by applying a rule-based machine-learning system that automatically extracts an ontology and predicts profile improvements. KEM can take an incomplete dataset and extract rules corresponding to the minimum motifs that fully represent the whole dataset, analyzing relationships between the data. Its ability to identify "conflicting contexts" - feature combinations that would prevent a compound from complying with multiple objectives - can help guide lead optimization, according to the company. It can also identify "innocent bystanders," or chemical groups that have been tested in unfavorable chemical contexts and could prove of value if evaluated under different conditions.
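KEM's rule extraction is proprietary and the article does not describe its algorithm. As a loose illustration of what a "conflicting context" means, the sketch below scans hypothetical compounds, each described by a set of structural features and pass/fail results on several objectives, for feature pairs that co-occur often enough yet never appear in a compound meeting every objective. The feature names, objectives, and support threshold are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

def conflicting_contexts(compounds, objectives, min_support=2):
    """Return feature pairs seen in >= min_support compounds, none of which meets all objectives."""
    seen = defaultdict(int)
    succeeded = set()
    for features, results in compounds:
        all_met = all(results[o] for o in objectives)
        for pair in combinations(sorted(features), 2):
            seen[pair] += 1
            if all_met:
                succeeded.add(pair)
    return sorted(p for p, n in seen.items() if n >= min_support and p not in succeeded)

# Hypothetical compounds: (feature set, objective results)
compounds = [
    ({"amide", "basic_amine", "biaryl"}, {"potency": True, "solubility": False}),
    ({"basic_amine", "biaryl"}, {"potency": True, "solubility": False}),
    ({"amide", "biaryl"}, {"potency": True, "solubility": True}),
    ({"amide", "basic_amine"}, {"potency": False, "solubility": True}),
]
print(conflicting_contexts(compounds, ["potency", "solubility"]))
# [('amide', 'basic_amine'), ('basic_amine', 'biaryl')]
```

The amide/biaryl pair is not flagged because one compound carrying it met both objectives.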
In about 70 percent of cases, the pharma industry is working on targets with no known protein structure, notes Lassen. He describes "a flow back to more ligand-based (vs receptor protein-based) screening methods." Tripos recently introduced Surflex-Sim, a molecular alignment algorithm integrated with the SYBYL modeling environment that can virtually screen a compound collection based on information derived from a ligand or competitive drug molecule, without knowledge of the target protein. The tool aligns molecules based on their morphological similarity and interacting features and, by correlating these with biological activity, can predict off-target effects without the need to identify an actual binding site.
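Surflex-Sim's alignment is far more sophisticated than anything that fits here, but the underlying ligand-based idea, ranking a collection by similarity to a known active without any target structure, can be sketched with simple feature-count vectors. The feature types, counts, and compound IDs are hypothetical, and the continuous Tanimoto coefficient used here is a generic choice, not Tripos's scoring function.

```python
def tanimoto(a, b):
    """Continuous Tanimoto coefficient between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0

# Hypothetical pharmacophore-feature counts:
# [H-bond donors, H-bond acceptors, aromatic rings, positive ionizable, hydrophobes]
reference_ligand = [2, 4, 2, 1, 3]
collection = {
    "X-101": [2, 4, 2, 1, 2],
    "X-102": [0, 1, 1, 0, 5],
    "X-103": [1, 3, 2, 1, 3],
}

# Rank the collection by similarity to the reference ligand, most similar first
ranked = sorted(collection, key=lambda cid: tanimoto(reference_ligand, collection[cid]), reverse=True)
print(ranked)  # ['X-101', 'X-103', 'X-102']
```

The scores here are 31/32, 28/30, and 21/40 respectively; no receptor information enters the calculation at any point.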
IDBS released a new version (6.2) of PredictionBase, its in silico modeling application, which gives MCs the ability to build their own predictive models based on chemistry and experimental data. Glyn Williams, VP marketing and product management, sees increasing integration of predictive science in early stage drug discovery, "especially around chemistry." These predictive capabilities are guiding decisions about what compounds to make and test and warning of potential safety and pharmacokinetic problems. For example, if a predictive tool were to identify a potential problem for a specific compound crossing the blood-brain barrier, MCs might test for this property earlier in the lead identification process or select analogues that avoid the problem.
Williams says it is important to understand "how applicable a training set is for predictive modeling within a defined chemical space and set of structures," emphasizing the advantages of being able to customize models within an organization. "As the model is only as good as the quality of data you're putting in, PredictionBase's advanced methods emphasize model quality to ensure that scientists not only get the best value out of the data but also identify its limits."
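One common way to ask how applicable a training set is to a new compound is an applicability-domain check: a prediction is trusted only if the query is sufficiently similar to at least one training compound. The sketch below uses Tanimoto similarity on fingerprint bit sets; the bit sets and the 0.5 cutoff are hypothetical, and PredictionBase's own methods are not described in the article.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def in_domain(query: set, training_set: list, cutoff: float = 0.5):
    """Return (inside_domain, nearest-neighbour similarity to the training set)."""
    nearest = max(tanimoto(query, fp) for fp in training_set)
    return nearest >= cutoff, nearest

# Hypothetical training-set fingerprints (sets of 'on' bit positions)
training = [{1, 4, 9, 16, 25}, {2, 4, 8, 16, 32}, {3, 9, 27}]
print(in_domain({1, 4, 9, 16}, training))    # (True, 0.8)
print(in_domain({5, 10, 20, 40}, training))  # (False, 0.0)
```

A model may still return a number for the second query, but the check signals that the training data say little about it.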
Although predictive tools have been available for some time, and are mandated in some companies, "actually getting them onto the desktops has been slower than anticipated," observes Williams. This is partly due to hesitancy in the industry to devote time and money to a tool that provides predictions rather than guarantees of improved efficiency and productivity. However, Williams predicts that over the next 12 months, "we will see greater acceptance and wider deployment of predictive tools for biological endpoints."
Innovating Med Chem Tools
Innovation across the spectrum of med chem applications has spawned a wealth of new software tools. These deliver interactive visualization capabilities, bring computational might to chemical design and structural modification steps, and provide user-friendly informatics. "Chemists are visual people and you have to let them draw and see," says Alex Allardyce, the Scottish director of marketing at Budapest-based ChemAxon. ChemAxon recently integrated its MarvinView chemical visualization technology with Genedata's Hit Profiler software tool for prioritizing hits to leads. Marvin is a collection of Java tools for drawing and characterizing chemical structures, substructures, and reactions. Because Marvin is a Java-based technology, "it does not need to be installed like a plug-in technology and is much easier to deploy reliably over most operating systems and web environments."
With ChemAxon's Fragmenter tool, scientists can create chemical building blocks by fragmenting larger molecules while retaining related assay data and linkage information to generate analog libraries. The tool can also be used to decompose combinatorial libraries to identify substituents of a common scaffold.
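Fragmenting a molecule while keeping linkage information amounts to cutting selected bonds in the molecular graph and recording which fragments those bonds joined. The sketch below does this on a hypothetical atom-index graph with the cut bonds supplied by hand; real tools choose cleavable bonds by chemical rules (for example, RECAP-style retrosynthetic rules), and this is not ChemAxon's implementation.

```python
from collections import defaultdict

def fragment(n_atoms, bonds, cut_bonds):
    """Split a molecular graph at the given bonds.

    Returns (fragments, links): fragments is a list of atom-index sets, and links
    lists pairs of fragment indices that were joined by a cut bond.
    """
    cut = {frozenset(b) for b in cut_bonds}
    neighbours = defaultdict(list)
    for a, b in bonds:
        if frozenset((a, b)) not in cut:
            neighbours[a].append(b)
            neighbours[b].append(a)

    owner, fragments = {}, []
    for start in range(n_atoms):
        if start in owner:
            continue
        # Depth-first traversal collects one connected fragment
        stack, component = [start], set()
        while stack:
            atom = stack.pop()
            if atom in owner:
                continue
            owner[atom] = len(fragments)
            component.add(atom)
            stack.extend(neighbours[atom])
        fragments.append(component)

    # Linkage information: which fragments each cut bond used to connect
    links = sorted({tuple(sorted((owner[a], owner[b]))) for a, b in cut_bonds})
    return fragments, links

# Hypothetical skeleton: four-atom ring (0-3), one-atom linker (4), three-atom side chain (5-7)
bonds = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (5, 6), (6, 7)]
frags, links = fragment(8, bonds, cut_bonds=[(3, 4), (4, 5)])
print([sorted(f) for f in frags])  # [[0, 1, 2, 3], [4], [5, 6, 7]]
print(links)                       # [(0, 1), (1, 2)]
```

The recorded links are what allow fragments to be recombined into analogs, and per-molecule assay data can be carried along by keying fragments to their parent compound.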
Designed to identify lead compound structural modifications that produce analogs with targeted physicochemical properties, ACD/Labs' Structure Design Suite allows users to specify a particular region of a compound they want to modify and the specific properties, such as solubility, they want to improve. Version 10 expands integration of spectroscopic and other analytical techniques, including improvements to NMR spectroscopy, mass spectrometry, and physicochemical property prediction tools.
Bringing informatics to the desktop, ChemAxon's Instant JChem manages chemical and non-chemical information in local and remote databases. (ChemAxon provides JChem's core functionality free to users.) It combines the capabilities of Marvin and JChem Base, a search engine that uses 2D hashed fingerprints to search datasets based on substructure, similarity, and superstructure, with the ability to populate columns with structure-based properties.
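The 2D hashed-fingerprint idea can be sketched generically: each structural pattern (here, a hypothetical precomputed list of linear path strings) is hashed to a bit position in a fixed-length fingerprint. The fingerprint then supports a substructure screen (every query bit must also be set in the target, a necessary but not sufficient condition that a real engine follows with an exact atom-by-atom match) and a Tanimoto similarity score. Fingerprint length, hash function, and path strings are assumptions, not JChem's settings.

```python
import hashlib

N_BITS = 64  # hypothetical length; production fingerprints are usually much longer

def fingerprint(patterns, n_bits=N_BITS):
    """Hash each structural pattern string to one bit; return the set of 'on' bit positions."""
    bits = set()
    for pattern in patterns:
        digest = hashlib.sha256(pattern.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits

def passes_substructure_screen(query_fp, target_fp):
    """Necessary (not sufficient) test: every query bit must also be set in the target."""
    return query_fp <= target_fp

def tanimoto(a, b):
    """Tanimoto similarity: shared bits divided by bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Hypothetical linear-path patterns, assumed to be enumerated from each 2D structure
query = fingerprint(["c:c", "c:c:c", "c-O"])
target = fingerprint(["c:c", "c:c:c", "c-O", "C=O", "c-C"])
print(passes_substructure_screen(query, target))  # True: all query patterns occur in the target
print(round(tanimoto(query, target), 2))
```

A superstructure screen simply reverses the subset test (target bits contained in query bits). Hash collisions can set the same bit for different patterns, which is why a screen hit still needs exact confirmation.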
In March, Eidogen-Sertanty announced partnerships with software vendors Elsevier MDL (joining the MDL Isentris Alliance), Accelrys-SciTegic, and Daylight CIS. The agreements range from product access to codevelopment-focused collaborations and include access to content reader and calculation components within SciTegic's Pipeline Pilot, as well as further development of, and access to, Eidogen-Sertanty's ligand-based Kinase Knowledgebase and target-based Target Informatics Platform (TIP). Later this year the company will release a behind-the-firewall version of TIP supporting Oracle and MySQL.
This sampler of recent additions to the medicinal chemist's computational toolbox illustrates that MCs have never had as broad a selection of software resources at their disposal as they do today. Ultimately, MCs must be able to apply these tools efficiently and coherently to advance high-quality drug candidates along the development pipeline.
Sidebar: Doing It By the E-Book
When Millennium Pharmaceuticals first became interested in changing to an Electronic Laboratory Notebook (ELN) system in 2001, it was not satisfied with the options available at that time and did not initiate a pilot process with Symyx Technologies products until 2004. By November 2005, after several months running in hybrid mode, Millennium's chemistry operation was fully electronic - no more paper.
Two factors figured prominently during this transition, according to David Sedlock, director of research systems at Millennium. The first was simply the recognition that this represented a major change in workflow and a break from relying on what was familiar. All information associated with chemical reaction design, execution, and analysis of product results would have to be captured electronically. Sedlock credits much of the success to the approach taken - engaging the chemists in the adoption process from day one and empowering a "pilot team" to evaluate the product and participate in decision making.
The second factor involved taking a holistic approach and considering how the technology would be used in the laboratory environment and what types of procedural changes would be needed. The pilot team emphasized the potential benefits, including enhanced search capabilities, a minimal paper trail to manage, and elimination of the need to witness and countersign every entry. When, during the interval between the pilot evaluation stage and a full-scale switch to ELNs, the chemists asked to be able to use the e-notebook, Sedlock knew they were hooked.
For chemists, "the number one advantage of an e-notebook system is synthetic schema replication," says Sedlock. "The ability to clone a reaction is the main time-saving element, because you cannot do that on paper." Other key timesaving features include expedited reagent shopping and the ability to manage data without going back and forth between applications.
Furthermore, ELNs dramatically reduce the time needed for IP/patent preparation. In a paper system, Sedlock estimates that filing a single patent consumes about 1.5 person-months of a chemist's time, or 1 full-time equivalent (FTE) chemist a year to file eight patents. With e-notebooks, "the time is inconsequential," he says.
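Sedlock's two paper-based estimates are mutually consistent:

```latex
8\ \text{patents} \times 1.5\ \frac{\text{person-months}}{\text{patent}} = 12\ \text{person-months} = 1\ \text{FTE-year}
```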
Sidebar: Leveling the Field
Earlier this year, Symyx released version 4.0 of Symyx Software Discovery Notebook, which includes ELN applications tailored to chemistry protocols from discovery through process, analytical, and formulations functions. New features include a multi-variable search window and multi-panel layout to facilitate searches of existing datasets.
Michael Tomasic, chairman and CEO of CambridgeSoft, views the evolution from paper notebooks to electronic notebooks as the "biggest change in med chem and drug discovery." The adoption of e-notebook technology has important implications across drug discovery and development, with particular advantages for communication and sharing of research protocols and data both within an organization and between a biopharma and its outsourcing partners.
E-notebooks, in a sense, level the playing field, standardizing the recording of protocols and data, making information recorded at the lab bench or desktop accessible across an organization and to collaborators with access to a centralized server, and even eliminating problems with deciphering poor handwriting or reading what lies underneath a coffee stain. But the real value may lie in the historical and legal implications of electronic data entry. For example, information is not lost or misplaced when a scientist leaves the company. From a legal and patent perspective, the ability to record data and discoveries on a global system that definitively documents the date, time, location, and circumstances surrounding a discovery is invaluable for establishing intellectual property and patent protection.
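Electronic notebooks typically rely on audit trails and digital signatures to make entries tamper-evident; the article does not say how any particular vendor does this. One generic technique, sketched here as an assumption rather than a description of any product, is a hash chain: each timestamped entry stores a hash of the previous entry, so altering any earlier record breaks verification. The author names, entry text, and timestamps are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def add_entry(chain, author, text, timestamp=None):
    """Append an entry whose hash covers its content, timestamp, and the previous hash."""
    entry = {
        "author": author,
        "text": text,
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,
    }
    entry["hash"] = _digest(entry)  # computed before the 'hash' key exists
    chain.append(entry)
    return entry

def verify(chain):
    """Recompute every hash and link; return True only if nothing was altered."""
    prev_hash = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

notebook = []
add_entry(notebook, "chemist_a", "Reaction 17: amide coupling, 82% yield", "2007-03-01T10:00:00+00:00")
add_entry(notebook, "chemist_a", "Reaction 17 product confirmed by LC-MS", "2007-03-01T15:30:00+00:00")
print(verify(notebook))  # True
notebook[0]["text"] = "Reaction 17: amide coupling, 95% yield"
print(verify(notebook))  # False
```

A production system would additionally sign entries and anchor timestamps to a trusted source, since a hash chain alone shows that records changed but not who changed them.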
IDBS's E-WorkBook suite of ELN applications includes BioBook, with application areas such as oncology, neuroscience, and metabolic disorders, and ChemBook, which includes functions such as reaction and structure drawing and searching, stoichiometry, and parallel synthesis support. - V.G.