Informatics Black Boxes ... Not!


CONVERSATION | Vertex's chief technical officer, Mark Murcko, discusses informatics' bad reputation, buying vs. building, open-source tools, and ROI on IT

Interview by John Russell


May 19, 2004 | Wringing value from bio-IT tools is a challenge. Sometimes the tools simply stink. Other times they supply the critical answer for determining whether to develop a compound. Vertex Pharmaceuticals' Mark Murcko, VP, chief technical officer, and chair of the scientific advisory board, is charged with making sure Vertex's toolbox helps produce profits, not pungency. Executive editor John Russell recently talked with Murcko about what he calls Vertex's "competitive advantage in bio-IT."


Q: Let's start with how the Vertex IT & Informatics effort is organized.

A: The informatics and IT effort at Vertex is highly integrated with R&D. We have a team of around 30 dedicated programmers, curators, and modelers developing and supporting new methods and tools. Almost all of these folks have very strong science backgrounds, which is a huge benefit. Our developers sit on drug discovery project teams, side by side with bench scientists, and understand how the software will be used on a daily basis to solve real problems. Our data curators have the expertise to make sense of the information. Together, these 30 people form the core of our competitive advantage in bio-IT. Given our substantial internal effort, we've been fairly aggressive in keeping external licensing costs to a minimum and making sure we operate on a highly efficient IT platform.


Given such a substantial effort, what role do you see these tools playing?

The role of discovery informatics is to provide what we call decision support. We believe strongly that there are no "black boxes" — software tools that can automatically take reams of data and turn the data into drugs. We focus on two tasks: making sure that all relevant information is accessible in easy-to-digest form, and developing tools that help derive insights from that data.

Scientists, lawyers, and other pharma professionals are notoriously critical of informatics tools — for good reason. They have been fed so much garbage over the years that they have developed a healthy degree of, shall we say, skepticism. We know when we've done something well and we know when we've failed miserably, and if a tool we deploy isn't working, we hear about it very quickly.

Staying ahead of the curve in discovery informatics is an important factor in Vertex's competitive advantage as a drug company. We are early adopters of many new technologies and approaches and we put those tools (both commercial and home-grown) through a rigorous gauntlet of scientific validation. If the new tool works, we adapt our processes and organization to capture its full potential. If it doesn't work, we throw it away.


Can you give us an example of a tool you dumped?

Back in the late 1990s we wanted to help the chemists do their own routine modeling. At that time, programs like Quanta or Insight (from Accelrys) ran only on Silicon Graphics hardware, so we put SGI workstations in all the chemistry labs and made sure the chemists knew how to use them. This approach failed miserably. The chemists found the software hard to use, not particularly good at helping them understand their problems, not robust — the tools crashed frequently — and of course the scientists hated Unix.

From this experience we "re-learned" a few important lessons. First, ease of use, reliability, and convenience remain critical factors. Second, if the software doesn't directly and clearly address the problems of greatest importance to the scientists, in the way that the scientists are thinking about those problems, it will not help them. Our development of VERDI (Vertex Research Database Interface) and Gene Family Central (GFC) was strongly influenced by these lessons.


To what extent does Vertex develop its own tools and purchase commercial software? And do you help commercial software providers improve their tools?

We always try to buy rather than build. Can you simply write a check and get everything you need? Sadly, the answer is no. What's worse, I don't see any signs of that changing. We have spent considerable time working very closely with many of the major vendors, providing them with voluminous feedback and suggestions, but this has never provided us with much benefit. For whatever reason, it is still not possible to get what we need from the commercial vendors and so we have written quite a lot of tools ourselves.

Our willingness to write a lot of our own software is, I think, somewhat unusual. My observations suggest that large-scale construction of high-quality tools is beyond the scope of what most pharmaceutical companies are willing to do. That makes the industry highly dependent on commercial providers of software. But this is sub-optimal, for several reasons. First, commercial software is difficult to use the way people would like to use it; the tools do not enable scientists to draw connections among otherwise disconnected pieces of information. Second, the algorithms, the robustness, and the speed of most commercial tools are not very strong. The methods simply do not work well, most of the time, on real-world problems. Finally, even with very good tools, the underlying science is, of course, still highly complex and it will always be very challenging to "extract" all the lessons from the data.

A number of small companies provide quite useful, specific components that complement our internal efforts. Examples include ChemDraw and the NuGenesis SDMS. We try to find the best components from those smaller outfits that allow us to stay focused on the bigger questions of scientific validation and integration. So I see a thriving industry for smaller companies that provide components or "widgets" that we can plug into our systems.


What is the Vertex view of open-source tools? Do you use them extensively?

We're big fans of the open-source movement and use a number of open-source components to help ensure we truly have a flexible, scalable, platform-independent and modular software architecture. Our entire discovery informatics platform runs across three global research sites (Cambridge, Mass., San Diego, and the United Kingdom) in a mixed PC, MacOS, and Linux environment.

A good example of a tool we've developed is the GFC proprietary knowledge system. [It] was developed to support our chemogenomics platform — our gene family approach that started with kinases and proteases, and has now expanded to ion channels and GPCRs. The system was first released to our scientists in 1999 and is used daily by several hundred people.

GFC is intended to provide "one-stop shopping" for scientists, lawyers, business folks, and other professionals, where information can be found about a project, therapeutic area, pathway, or gene family. For example, a scientist can read about a kinase in the literature, go to GFC, and get back a curated, annotated "snapshot" of that kinase — a list of key references, data about the target, drugs that hit the target, gene sequences, chemical structure-activity data, the patent landscape, etc.

That information connects back to our chemoinformatics tools, including VERDI, our central clearinghouse for chemical and biological data. VERDI contains both our own compound data and that found in the scientific literature and in patents. Between GFC and VERDI, anyone in the company can get access to quite a lot of information.

Automated data capture approaches simply do not work on their own. Vertex has Ph.D.-level scientists from multiple disciplines actively involved in data curation. You need smart people dedicated to making sense of the data and presenting the most relevant information in a manner that actually facilitates scientific inquiry. The human element of effective knowledge management simply cannot be ignored.


Can you describe the VERDI system in a little more detail?

VERDI is a modular, customizable enterprise informatics architecture and framework for scientific data and application integration. It is the platform on which Vertex has built and deployed a number of integrated, task-specific applications supporting the research enterprise. It is scalable and allows us to swap best-in-class proprietary and commercial applications in and out. VERDI is a Java/XML program and comprises several components.

The VERDI-QueryEngine is a complete system for querying, viewing, and performing analyses linking chemical structure and biological assay information. VERDI-CompoundFinder is an application for simultaneously searching both internal reagent storerooms and vendor catalogs and managing purchase orders. VERDI-CompoundRegister is an application for analyzing properties and registering compounds to research databases. VERDI-AssayDataRegister is an application for analyzing and registering assay data. VERDI-AssayProtocol is an application for registering and managing assay protocols.

In addition to these workflow components, VERDI also serves as the backbone for delivering advanced scientific analysis methods directly to bench scientists. VERDI contains a number of proprietary analysis modules for predictive ADME, clustering, and SAR analysis. In addition, we have recently deployed a series of "properties alerts" based on both overall chemical structures and the presence of specific substructures. These are designed to give scientists a "heads up" on potential issues with their compounds early in discovery.
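The "properties alerts" idea can be illustrated with a small sketch. Vertex's actual rules are proprietary and are not described in this interview, so every name, threshold, and substructure label below is a hypothetical stand-in. The sketch also assumes that the descriptors and substructure flags were computed upstream by a cheminformatics toolkit; it only shows how whole-molecule checks and substructure checks might be combined into one alert list.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    """Hypothetical record; descriptors are assumed to be precomputed upstream."""
    compound_id: str
    mol_weight: float
    clogp: float
    substructures: set = field(default_factory=set)


# Illustrative thresholds and substructure labels only, not Vertex's rules.
PROPERTY_LIMITS = {"mol_weight": 500.0, "clogp": 5.0}
FLAGGED_SUBSTRUCTURES = {"nitro", "michael_acceptor"}


def property_alerts(compound):
    """Return a list of human-readable alerts for one compound."""
    alerts = []
    # Whole-molecule checks: flag any descriptor above its limit.
    for name, limit in PROPERTY_LIMITS.items():
        value = getattr(compound, name)
        if value > limit:
            alerts.append(f"{name} {value:g} exceeds {limit:g}")
    # Substructure checks: flag any tagged substructure on the watch list.
    for tag in sorted(compound.substructures & FLAGGED_SUBSTRUCTURES):
        alerts.append(f"contains flagged substructure: {tag}")
    return alerts


example = Compound("CMPD-001", mol_weight=512.3, clogp=3.1,
                   substructures={"nitro", "amide"})
for alert in property_alerts(example):
    print(alert)
```

Keeping the limits and the watch list as plain data rather than code means curators could adjust them without touching the checking logic, which fits the "heads up" role described in the interview.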

On the structure-based design front, we have VERDI-StructureBrowser, which is a proprietary database of curated protein and protein-ligand crystal structures from the PDB as well as our internal labs. These searchable structures are delivered through VERDI to give chemists "live" access to 3-D crystal structure information. Also, there's VERDI-AnalogBuilder, which is a scientist-friendly application allowing researchers to manipulate and refine inhibitor structures within the binding domains of targets in real time.


Once you've bought or developed a tool, how do you determine if it is providing real ROI?

In developing quantitative metrics, it is important to think about the key questions that help evaluate the success of a discovery informatics project. Are the compounds our chemists are synthesizing, on average, more soluble than before we released the new solubility model? Has the average time it takes to register a batch of compounds decreased substantially since the deployment of the chemistry electronic lab notebook? Are we filing patents faster? Are we capturing a greater share of chemical IP space around our targets of interest? Is a greater percentage of "hits" advancing to leads, advancing into animal models, advancing to development candidates?

We invest a substantial amount of time and effort into tracking these quantitative variables and extrapolating that information into financial value to our organization.
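Two of the questions above (is average solubility higher after a model release, and is a greater share of hits advancing?) reduce to simple before/after arithmetic. The sketch below shows one way such metrics might be tracked. The function names, stage names, and all numbers are invented for illustration; the interview does not describe how Vertex actually computes these figures or converts them into financial value.

```python
from statistics import mean


def mean_change(before, after):
    """Difference in means (after minus before) for one tracked metric."""
    return mean(after) - mean(before)


def stage_conversion(counts):
    """Fraction of compounds advancing between consecutive pipeline stages."""
    stages = list(counts)
    return {
        f"{a} -> {b}": counts[b] / counts[a]
        for a, b in zip(stages, stages[1:])
        if counts[a] > 0
    }


# Made-up numbers purely for illustration.
solubility_before = [12.0, 18.0, 9.0, 21.0]  # measured before the model release
solubility_after = [25.0, 30.0, 19.0, 26.0]  # measured after the model release
funnel = {"hits": 200, "leads": 40, "animal models": 10, "candidates": 2}

change = mean_change(solubility_before, solubility_after)
print(f"mean solubility change: {change:+.1f}")
for step, rate in stage_conversion(funnel).items():
    print(f"{step}: {rate:.0%}")
```

A raw difference in means says nothing about significance on its own; a real evaluation would also need a statistical test and enough compounds on each side of the release date.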

One major key for us has been the ability to drill down into the physical attributes of what makes a successful drug. With the acquisition of Aurora Biosciences [in 2001], we acquired capabilities in ultra-high-throughput screening and cellular assay development. Vertex has been using the capabilities in San Diego to advance proprietary research in membrane-bound gene families (ion channels, GPCRs) and generate high-quality data to track the success of our global research efforts.

Every compound synthesized gets assigned an experimentally measured solubility and is run through a battery of proprietary, high-quality, standardized biochemical and cell-based assays. These assays provide experimental readouts on properties such as membrane permeability, cytochrome P450 binding, hERG channel activity (a marker of cardiotoxicity), and hepatotoxicity. We've also developed a set of more than 250 proprietary "drug-likeness" filters. Every compound gets scored for drug-likeness.
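Scoring a compound against a set of filters can be sketched in a few lines. Vertex's 250-plus filters are proprietary and not described here, so the sketch below substitutes the publicly known Lipinski rule-of-five criteria as a stand-in, and it assumes the scoring scheme is simply the fraction of filters passed. The property names and the `drug_likeness` function are hypothetical.

```python
# Each filter is a (name, predicate) pair over a dict of precomputed properties.
# Lipinski's rule of five is used only as a public stand-in for real filters.
FILTERS = [
    ("mol_weight <= 500", lambda p: p["mol_weight"] <= 500),
    ("logp <= 5", lambda p: p["logp"] <= 5),
    ("h_bond_donors <= 5", lambda p: p["h_bond_donors"] <= 5),
    ("h_bond_acceptors <= 10", lambda p: p["h_bond_acceptors"] <= 10),
]


def drug_likeness(props, filters=FILTERS):
    """Return (fraction of filters passed, names of failed filters)."""
    failed = [name for name, passes in filters if not passes(props)]
    score = 1 - len(failed) / len(filters)
    return score, failed


# Hypothetical compound with made-up property values.
compound = {"mol_weight": 540.0, "logp": 4.2,
            "h_bond_donors": 2, "h_bond_acceptors": 11}
score, failed = drug_likeness(compound)
print(f"score={score:.2f}, failed={failed}")
```

Returning the failed filter names alongside the score keeps the result explainable to the chemist, in line with the "no black boxes" view expressed earlier in the interview.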

This systematic effort generates a tremendous volume of high-quality data enabling a parallel property prediction modeling effort. It is only possible to build truly predictive models if the data used to build the models is developed in standardized conditions to the highest levels of quality control. Most commercial models currently available are based on experimental data pulled out of the literature. Such models simply do not work in a real-world setting.

Vertex has developed and validated several predictive ADME-toxicology models that, along with several proprietary SAR and clustering applications, are deployed companywide through the VERDI chemoinformatics system. These efforts have contributed to a statistically significant rise in the quality of compounds being synthesized in our labs over the past several years.


Given Vertex's propensity to try new technologies, what are your thoughts on systems biology's potential?

Systems biology provides a quantitative, engineering-style approach to how the body functions, how cells and tissues work. It's exciting and innovative, but this is a hugely complex problem that will not be solved overnight. Every technology has a lifecycle, and when technologies are brand new they always seem incredibly exciting — and people get too enthusiastic. Then realism sets in, and then cynicism, at which point everyone overreacts and decides the technology is worthless!

Ultimately, if the technology is truly useful, we come to appreciate its actual strengths and weaknesses. In the 1980s, SBDD [structure-based drug design] was going to solve all the world's problems. But by the 1990s many people thought it was a waste of time. Now people seem to have a very realistic idea of what it's all about. HTS, combinatorial chemistry, and genomics have also gone through that cycle. Proteomics is going through this right now and systems biology will, too; it's the same with any new technology. We're 20 years or more away from widespread, practical utility [for systems biology].

The key advances we need are on the clinical side. I am not saying that research is perfect by any means, but the greater needs in pharma are in development. We need to understand how to reduce attrition in the clinic, we need to do a better job of knowing which patients will benefit from a particular drug, and we need to shorten clinical development timelines.


What tools do you think are closer to delivering value?

Two innovative disciplines I think are closer to paying off than systems biology are predictive toxicology and what I'd call properties research. We are starting to see evidence that molecular "fingerprints" (patterns of changes in genes or other biochemical markers) can predict various kinds of toxicity. Similarly, we are beginning to see progress in understanding the physical properties of would-be drugs. When we look at any new chemical structure we ask, "Do we think this molecule is drug-like? Does this molecule have enough solubility? Will it be bioavailable? Will it be able to get across cell membranes?" We are beginning to understand the properties of drugs, something that should have a huge impact on clinical attrition profiles of new compounds going forward, if applied appropriately.


