
Special Report:
Resolving Bottlenecks II


By Mark D. Uehling 

April 15, 2003 | "Target" and "lead" may be the most absurd terms in the life sciences. In the dictionary, a target is something to aim at. At drug companies, a target or lead may be one of 500 genes, 5,000 proteins, or 500,000 chemical structures that pour out of a computer program, lab instrument, or database day after day, burying the doomed souls paid to analyze them. Aiming at such targets and leads is like shooting into a flock of birds. You will hit something — the question is what.

It's not news that technology has created a deluge of targets. Real libraries of molecules, created with combinatorial chemistry, may contain hundreds of thousands or millions of entries. Virtual libraries may have a billion molecules.

Being able to identify the most promising drug candidates — and eliminate the rest — presents billion-dollar opportunities. But rummaging through a mountain of potential biological targets or new drugs to neutralize them remains a daunting industry bottleneck, a challenge both scientific (with no peer-reviewed research to prove that a particular ranking technique works) and informatic (despite IT to store, search, and analyze both chemicals and their biological impact). Naturally, the first step is to find drugs that work. The next is to select drugs without side effects.

Chemists in preclinical labs can easily reject a wide variety of molecules known to cause direct toxicological damage, harming an organ or an individual. But other side effects of a compound can be harder to predict — and, undetected, result in billion-dollar lawsuits and withdrawals of drugs like Baycol or Phen-Fen. So scientists use cells and laboratory animals to study which compounds are not absorbed, distributed, metabolized, or excreted properly. The art of assessing such ADME-tox properties is stuck in the slow lane, with some animal studies taking years. Could computers speed up ADME-tox assessment? Can instruments, software, and databases be as good at prioritizing potential new drugs as they are at churning them out?

First, the bad news: Digital ADME profiling cannot yet predict a drug's safety or efficacy as well as studies in cell culture can. "Drug developers are generally rather realistic people and use these tools judiciously within their limits," says Olavi Pelkonen, professor of pharmacology at the University of Oulu in Finland. "There are no examples of approved medicines validated with 'e-ADME' techniques, nor any coming very soon."

The good news? An appraisal of the technological tricks of preclinical drug discovery suggests the word "lead" could soon become meaningful. New technologies, including microarrays, databases, and software programs, are providing drug companies with far better intelligence on which molecules to steer toward clinical trials — and which to abandon. There are also nascent efforts in the federal government to coordinate industrywide ways to ascertain drug safety. The field thus could benefit from a convergence of identical tools for the life sciences and its regulators.

Fewer Dead Rats 
Of course, any self-respecting scientist will recoil from futuristic predictions. But with the right caveats, the FDA does have lofty expectations for the impact of new technology to expedite rigorous assessments of drug safety.

"In 10 years," says Frank Sistare, director of the FDA's division of applied pharmacology research (see "FDA Talks Tox"), "we are likely to be able to reduce very much the longevity of the tests we require currently. Whether that means carcinogenicity studies, I don't know. But a lot of the long-term toxicology studies we do may not be necessary. One could argue that a 24-hour gene expression experiment may suffice for many compounds. Other studies could require as long as a month."

For now, the most glaring bottleneck is the glut of data to be analyzed with garden-variety IT. Digital libraries of chemical structures are so vast that they cannot be assessed even in silico, much less by a chemist working an 80-hour week. "These libraries are so large that you cannot virtually enumerate them," says Dimitris K. Agrafiotis, executive director of informatics at 3-Dimensional Pharmaceuticals. "You cannot afford to do virtual synthesis, let alone real synthesis."

It's premature to think that computers will quickly supplant wet-lab testing. "It's the usual hype that follows after the introduction of new computational techniques," Agrafiotis says. "Biology is a complex phenomenon, and our tools look at simplified models of that reality." But that hasn't stopped Johnson & Johnson from acquiring his company this spring. 3-Dimensional Pharmaceuticals makes the DirectedDiversity platform, which screens large libraries of chemicals for those that can actually be synthesized. "It's an industrialization of early-stage drug discovery," Agrafiotis says. "We couldn't afford to make one compound at a time."

Agrafiotis stresses the fact that mundane IT plumbing — getting the right information to the researchers — often takes a back seat. Not in his shop. "We have been able to provide the discovery scientist the informatics tools that allow them to retrieve all the information necessary for them to make a decision," he says. "Which compounds have been synthesized? Do we have crystal structures? Being able to deliver all that information readily to the desktop of every user in the company has been very helpful."

Chemical Intuition 
Another pioneer at mining the chemical realm with computers is MDL Information Systems. Seth Pinsky, the company's senior vice president for research, concedes there are limits to what preclinical drug-optimization technology can do, but he believes informatics to model drugs has become indispensable. "The last 100 drugs [approved by the FDA] would have been unlikely to appear without some informatics capabilities," Pinsky says. "You could not have made any of these drugs without databases."

Pinsky says the MDL tools are used in all but one of the top 10 pharmaceutical companies, and even the holdout recently requested a demo. "Chemists' intuition is a little more fabled than real," he says. "If the chemist's intuition is that good, why would they have to make thousands and thousands of compounds? They're not as good at predicting as some of our software."

The MDL software allows scientists to consolidate 50 different wet-lab assays and digital chemical data in a single computer view. "We see some people who do amazing and creative things with our tools and databases," Pinsky says. "Some just run it out of the box. The companies that do creative things are getting a competitive advantage."

An obstacle to doing more work on the computer is not the quality of the algorithms that might compare a safe, effective drug to various prospective compounds. Rather, it is the underlying data sets, compiled on a company-by-company basis, that may be problematic. Even the biggest of big pharmas might have exhaustive toxicology data on only 1,000 molecules. "A thousand is such a tiny number," Pinsky says. "It's laughable."

Pinsky floats the idea that a variety of chemical and safety data from many drug companies could be blinded or otherwise anonymized and pooled into a larger database. Strangely enough, that's something on the to-do list of Richard Paules, toxicogenomics facilitator and director of the microarray group at the National Center for Toxicogenomics (NCT). A division of the National Institute of Environmental Health Sciences at the NIH, the NCT is working with two companies and five universities to build a massive public database of toxicogenomic data, similar to federal repositories for genes and protein structures.

This database will include a diverse array of data on tissue specimens, gene expression, proteomics, chemical structures, and toxicological data about dosages and responses. "We are in the process of trying to generate signatures — combinations of genes that will be informative of an adverse health endpoint," Paules says. "The goal is to be able to pick up changes early on that are predictive of changes in response to the environment."

Microarrays, Paules says, are ideal ways to explore how thousands of genes respond to a chemical exposure. The NCT's goal is to craft a federal database that can accept equivalent, comparable test results from several microarray manufacturers.

Paules doesn't underestimate the obstacles. It could take a decade to get the NCT knowledge base up and running in its full glory. Once that happens, however, the reliance on in vivo testing of potential new drugs on whole animals could shift again, just as it did in the 1990s, when labs explored every possible way to minimize the use of living creatures in response to animal-rights activists. "One clear goal is to reduce the number of the animals and duration of the studies," Paules says. "We are guardedly optimistic. We think the technology has great potential if it's done right."

Let's Go to the Microarray 
In industry, Paules and the NCT have two early corporate allies: Science Applications International, to design the database, and Paradigm Genetics, to manage data analysis and work with Agilent to create both a standard and a custom microarray.

John Hamer, chief scientific officer at Paradigm, notes that drug-related adverse effects are the fifth leading cause of death in the United States, and that the most widely prescribed drugs may work on only 40 percent to 60 percent of patients. Because of cost, even the largest companies are limited in the numbers of compounds that can be tested. "Large rat studies can be very expensive," Hamer says. "People are using microarrays further down the drug discovery pipeline — on molecules and organ systems and whole animals. The microarray is going to be an important way to understand the impact of drugs on organs."

For its own proprietary program, Paradigm's scientists will use mass spectrometry to analyze metabolites, or biological byproducts of a drug, which can be measured in urine and blood. "One doesn't have to take a tissue biopsy," Hamer says of metabolomics. "We think that's attractive."

Rather than testing hundreds of compounds against a battery of cell cultures or a building full of rats, Paradigm hopes its technology will yield early hints that one molecule affects metabolism or breaks down into a metabolite that causes problems in the kidney or liver.

Another company working closely with the government is Iconix Pharmaceuticals, where CEO Jim Neal just trained 11 FDA scientists how to use his tools. Neal would like to market a broad test — one that reveals a chemical's effect on every gene in a rat — affording a biopharma company a comprehensive preview of a potential new compound. "We do need to understand biologically what is happening genomewide," he says.

Iconix is working with a rat gene chip from Amersham Biosciences to build a database of gene-expression "signatures" of various types of toxic effects. The FDA will access that database, and Iconix will license it. Says Neal: "You can look at the gene-expression profile and what's happening on the DNA level from day one to day five and make good predictions about what is going to happen in toxicology 30 days from now."

This is hardly a standalone tool, being connected to offerings from the likes of MDL, MDS Pharma Services, Incyte Genomics, and Spotfire. Neal predicts that users of this technology "will have a lower attrition rate and a higher success rate, because [they] will know a lot more."

Another tiny California company, Libraria, aims to improve the design of the vast chemical libraries from which new drugs are plucked. Such libraries, according to Libraria, can be prudently shrunk using IT. That's economical, since one molecule may cost at least $500 to synthesize. The company's forte is information on structure/activity relationships (SAR), combining biological and chemical data.

Libraria has mined public data, curating 65,000 SAR data points about one gene family. Its database now contains 14,000 generic chemical reactions from high-throughput chemistry. "That gives them a good starting point of which reactions to use," says Kristina Kurz, Libraria's associate director for business development.

Kurz notes other commercial software, even from MDL, isn't as adept at handling and archiving data about chemical transformations. "We create an e-screen that helps you to screen your library in silico," she says. "We combine the method you need to design and enumerate your library and screen it for the characteristics you're looking for."

Faster Kinase Inhibitors 
Development on the program will continue courtesy of a two-year, $2-million grant from the U.S. National Institute of Standards and Technology. The program runs on any desktop, via the Web, and accesses an Oracle database. Libraria has already helped the Novartis Research Foundation find several novel chemotypes that inhibit an important kinase target. "The efficiencies discovered in this collaboration with Libraria resulted in faster and more accurate prediction of kinase inhibitors," Novartis' Peter Schultz says. "The approach should be applicable and scalable for other kinase targets of interest to the biopharmaceutical industry."

As often as not, the heart of shortening the preclinical process is a database. At GeneTrove, President Richard Brown concedes that moving from genes to drugs is going slower than the genome project hype suggested: "Going from raw sequence to a validated gene to a drug discovery target turned out to be much more difficult than anyone imagined. That is the bottleneck that GeneTrove is trying to address."

At GeneTrove, it is difficult to tease apart the computers and what happens in the wet lab. "We're giving customers a faster way to knock out a gene and look at its therapeutic effect than they could do any other way," Brown says. The assays are run for companies including Lilly, Amgen, Merck, Pharmacia, and Celera. "We digitize that information, and it goes into a database in what we call a tube formation score. Each one of the assays has some sort of scoring. You can mine Science or Nature, or you can mine our database."

Some customers are analyzing their data through GeneTrove's MetaGraph platform, software that helps to analyze what the company has done in the lab. (Another application — Spotfire — can also mine the GeneTrove data.) As Brown explains: "You're starting with genes where you have information, as opposed to a cold start where you have nothing more than an article in the literature."

At Inpharmatica, in London, the approach is superficially similar — but computationally more intensive. CFO Patrick Banks notes the company's prize application and database, PharmaCarta, has 2,500 CPUs correlating 3.6 billion relationships between protein sequences and structures in the scientific literature as well as proprietary data (see "Profiting from the Proteome," Jan. 2003 Bio·IT World). The company is addressing the bottleneck of disparate types of preclinical research data by uniting data on genes, chemical structures, SARs, expressed sequence tags (ESTs), and proteins. "People won't pay anything for just a few novel targets," Banks says. "Why should they pay for five kinases rather than the other 900?"

Inpharmatica is looking for drugs itself, while also licensing its technology to Serono, Genentech, Pfizer, and others. "Pharma companies have filled their boots with novel targets," Banks says. "They've had enough. We can help them prioritize what they're looking on."

The company uses two basic approaches to pare targets down to six percent or seven percent of the original list. First it ascertains whether a target is from a "precedented" family of proteins: that is, one that is known to be affected by drugs already on the market. The second test looks at the binding site within the molecule, assessing whether the protein "lock" can be picked by a pharmaceutical "key."

Explains Banks: "The in silico stuff is a lot less expensive than sitting down in the lab and generating thousands and thousands of molecules. You're not going to waste 15 years for a thrombin drug because your binding site is going to be very difficult to 'drug.'" As an example, Inpharmatica recently looked at a Rosetta data set of 25,000 proteins. "We ran them through PharmaCarta. In about a week, we were able to narrow it down to the 11 most interesting," one of which is being pursued.

Resolving Bottlenecks 
This is the second of three reports in a Bio•IT World series on bottlenecks in life science research. Next month, Part III of this series will examine bottlenecks in clinical trials. The first installment, on the discovery phase, can be found on our Web site.

As Banks notes, "There is an enormous amount of data, just enormous, in pharma." For now, that mountain remains immense. But thanks to technology, it is about to be whittled down to a more practical size. The not-so-distant promise of PharmaCarta and other preclinical tools is that they could provide insights that could ripple through to the next (and most lengthy) phase of drug development, helping to shape or even shorten the pivotal studies of new drugs in human beings. The best tools could ultimately optimize not only drugs but also the entire research process.

