By Michael Goldman
May 13, 2003 | SANTA CLARA, Calif. -- Nearly 2,000 scientists convened here for Cambridge Healthtech Institute’s Molecular Medicine Marketplace conference*. The meeting, also attended by entrepreneurs and financiers, has evolved over the years from a relatively small gathering on the Human Genome Project in one of San Francisco’s venerable hotels to a slick four-track event in a stark Silicon Valley convention center.
Sydney Brenner, of the Salk Institute for Biological Studies in La Jolla, Calif., once quipped that when we know the complete cell lineage of the nematode Caenorhabditis elegans, we will (merely) know the complete cell lineage of C. elegans. So with the sequence of the human genome essentially complete, it wasn’t surprising to hear Brenner say, “Forget the genome!”
According to Brenner, we have the white pages of the telephone book but now must reconstitute the daily lives of everyone in the city. “The genome is an inventory of function, but not function,” he said. How do we translate data into knowledge?
Brenner, who shared last year’s Nobel Prize in physiology or medicine (see First Base, Dec. 2002 Bio-IT World, page 6), offered a way to cut the monstrous mound of data down to size. The 20,000 or so genes expressed in a single cell pose too complex a problem to solve directly, and examining the orders of magnitude more protein-protein interactions among their products does not help.
Instead, Brenner advocates studying functional assemblages of proteins, such as the spliceosome, the molecular machine that splices RNA messages. He says that only about 2,000 such machines exist in the cell, a more manageable number.
Complexity shrinks further if we think in terms of topographical regions of the cell, such as organelles, the plasma membrane, and the nucleus. Brenner counts about 10 such regions, a number that is easier still to grasp, although neurons may be more complex. He sees cells as “computing engines” that are “full of gadgets,” and he calls the task of discovering how the genome maps to cellular function an “instantiation problem.” The term instantiation is rooted in philosophy, but for molecular biology it offers a rather simple view of a complex situation. Brenner’s CellMap project seeks to describe the gadgets that make up a cell and to compute what happens to them.
Brenner views genes as occurring in different states, or instantiations. Examples range from whether a gene is turned on or off in different cell types to a host of alternative splice variants. While some differences in gene expression depend on environmental conditions, “noncontingent instantiations” define fundamental cell types more finely than traditional biology textbooks do.
Brenner estimates that the roughly 35,000 human genes have approximately 100,000 to 200,000 instantiations, and he predicts that “instantiomics” will generate much excitement in the years to come. With characteristic humor, however, he hesitates to call his work a systems biology approach, “mainly because I don’t know what it is.”
Rules of Engagement
Today’s drug armamentarium consists mostly of small molecules that can be taken orally. These compounds act on a surprisingly small number of key targets -- about 120 for all marketed drugs -- of which only 43 account for the 100 top-selling drugs of 2001. The genome sequence should reveal how many new potential targets are available and which of them are the most “druggable” and useful.
Pfizer’s Chris Lipinski developed the classic “rule of five” (Ro5) for drug molecules (Adv Drug Deliv Rev 23, 3; 1997). The rule flags compounds likely to have poor oral bioavailability: those that exceed two or more of its limits, namely more than five hydrogen-bond donors, more than 10 hydrogen-bond acceptors, a molecular weight over 500, or a lipophilicity (log P) over 5.
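To make the rule concrete, here is a minimal Python sketch that checks precomputed descriptors against the four Ro5 limits. It assumes the descriptor values are already known (computing them from a chemical structure needs a cheminformatics toolkit and is out of scope here). The `Descriptors` class, the function names, and the example compounds are hypothetical, and flagging a compound at two or more violations follows the usual reading of Lipinski’s rule rather than any Pfizer software.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    h_bond_donors: int       # Lipinski counts donors as the sum of O-H and N-H groups
    h_bond_acceptors: int    # Lipinski counts acceptors as the sum of N and O atoms
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient (lipophilicity)


def ro5_violations(d: Descriptors) -> list[str]:
    """Return the rule-of-five limits this compound exceeds."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    if d.molecular_weight > 500:
        violations.append("molecular weight > 500")
    if d.log_p > 5:
        violations.append("log P > 5")
    return violations


def flag_poor_oral_bioavailability(d: Descriptors) -> bool:
    """Flag the compound when two or more Ro5 limits are exceeded."""
    return len(ro5_violations(d)) >= 2


if __name__ == "__main__":
    # Hypothetical compounds; descriptor values are illustrative only.
    small = Descriptors(h_bond_donors=1, h_bond_acceptors=4, molecular_weight=180.2, log_p=1.2)
    bulky = Descriptors(h_bond_donors=7, h_bond_acceptors=12, molecular_weight=720.0, log_p=5.8)
    for name, compound in [("small", small), ("bulky", bulky)]:
        print(name, ro5_violations(compound), flag_poor_oral_bioavailability(compound))
```

Keeping the violation list separate from the yes/no flag lets a screen report which limits a compound breaks, and the two-violation threshold can be tightened to one for a stricter filter.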
If small-molecule drugs adhere to principles such as the Ro5, then one can predict which properties make a “beautiful” target, contended Andrew Hopkins of Pfizer’s Sandwich Laboratories in England. Hopkins estimates that there are about 3,000 small-molecule druggable targets and about 3,000 disease-relevant targets. However, large-scale mouse gene knockout experiments, performed by companies including Lexicon Genetics and Deltagen, point to only a 10 percent overlap between the two groups -- about 300 disease-relevant druggable targets. Nevertheless, Hopkins noted, this is still a significant increase over the roughly 120 targets of today’s marketed drugs. And as Lipinski pointed out, future opportunities may lie more in the depth and quality of drug targets than in sheer numbers.
The number of druggable targets should grow beyond these 300. Hopkins expects that the discovery of new regulatory sites on protein surfaces, and of new gene families, could in time double the number of disease-relevant druggable targets, to about 600. “However, it would be far more costly to discover these new targets outside of the identified druggable genome,” he said. A further doubling, to about 1,200 targets, might be achieved when wider phenotypic screening reveals larger numbers of disease-modifying genes.
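Written out with the round figures quoted above, the chain of estimates runs as follows. This is a back-of-the-envelope restatement, not Hopkins’s own model, and it assumes the 10 percent overlap applies to the roughly 3,000 targets in each group:

$$
\begin{aligned}
N_{\text{current}} &\approx 0.10 \times 3{,}000 = 300 \\
N_{\text{new sites and families}} &\approx 2 \times 300 = 600 \\
N_{\text{wider phenotypic screens}} &\approx 2 \times 600 = 1{,}200
\end{aligned}
$$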
“Based on the current data and our understanding of drug binding and genetic redundancy,” Hopkins said, “the number of clinically effective oral drug targets in the genome may be in the order of a few hundred rather than the tens of thousands predicted only a few years ago.”
Newly recognized genes, such as the growing catalog of noncoding RNAs, and novel drug approaches, such as quadruplex nucleic acids and RNAi, could change the picture radically.
Haplotype Mapping on a Wafer
Fully exploiting the genome sequence will require correlating DNA variations among individuals with disease traits. But cataloging that variation by sequencing the complete genomes of 30 to 50 more people is impractical. David Cox, chief scientific officer at Perlegen Sciences, said he intends to change that. Perlegen uses an Affymetrix-based microarray technology to study variation in the genomes of individuals from diverse ethnic backgrounds.
To simplify analysis, Perlegen relies on the fact that DNA is generally inherited in segments, or haplotype blocks, as long as 50,000 bases. As a result, dozens of genetic markers called single-nucleotide polymorphisms (SNPs) are inherited together in each block, greatly reducing the number of SNPs that must be typed for analysis. Because these blocks are relatively constant across entire populations, case and control samples can be “pooled,” allowing hundreds of individuals to be tested on whole wafers, rather than single chips, that carry about 60 million probes simultaneously.
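Block structure is what makes the savings possible: within a block, a few SNPs can stand in for the rest. The Python sketch below illustrates the idea with a simple greedy tag-SNP selection that keeps adding sites until every distinct haplotype in a block has a unique signature. It is an assumed, illustrative technique, not Perlegen’s method (which the talk did not detail); the function names and the example block are hypothetical, and greedy selection is not guaranteed to find the smallest possible set.

```python
def partition_size(haplotypes: list[str], positions: list[int]) -> int:
    """Number of groups the haplotypes fall into when read only at `positions`."""
    return len({tuple(h[i] for i in positions) for h in haplotypes})


def greedy_tag_snps(haplotypes: list[str]) -> list[int]:
    """Greedily pick SNP sites until every distinct haplotype has a unique signature.

    Each haplotype is a string of alleles, one character per SNP site in the block.
    """
    distinct = sorted(set(haplotypes))
    if not distinct:
        return []
    n_sites = len(distinct[0])
    chosen: list[int] = []
    while partition_size(distinct, chosen) < len(distinct):
        # Add the site that splits the current groups the most; ties go to the lowest index.
        best = max(
            (i for i in range(n_sites) if i not in chosen),
            key=lambda i: partition_size(distinct, chosen + [i]),
        )
        chosen.append(best)
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical block: 8 SNP sites, three common haplotypes seen on six chromosomes.
    block = ["ACGTACGT", "ACGTACGT", "GTCAGTCA", "GTCAGTCA", "ACGTGTCA", "GTCAGTCA"]
    print(greedy_tag_snps(block))  # → [0, 4]
```

In this toy block, typing two sites out of eight is enough to tell the three haplotypes apart, which is the kind of reduction that lets a genome-wide screen type far fewer SNPs.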
The potential of a haplotype map for studies of complex diseases and drug response has not escaped notice. The cost of such a screen has fallen from $80 million to $2.5 million -- well within the reach of a large NIH grant or the budgets of some of Perlegen’s clients, including Bristol-Myers Squibb, Pfizer, Eli Lilly, and Unilever.
*Molecular Medicine Marketplace: Cambridge Healthtech Institute, Santa Clara, Calif., March 17-21, 2003.