Broad chief summarizes the last ten years of the HGP.
By Kevin Davies
February 1, 2011 | WASHINGTON, DC—If anyone was capable of distilling the lessons learned in the ten years since the first draft of the Human Genome Project (HGP) in 2000, it was Broad Institute director Eric Lander. Opening the annual American Society of Human Genetics (ASHG) convention in Washington, D.C., in early November, Lander made a series of startling comparisons of geneticists’ knowledge around the time of the HGP in 2000 and today.
In 2000, for example, only four eukaryotic genomes (yeast, fly, worm, and Arabidopsis) had been sequenced, along with a few dozen bacteria. Today, those numbers stand at 250 eukaryotic genomes, 4,000 bacteria and viruses, metagenomic projects, and many hundreds of human genomes. In 2000, Lander and his HGP consortium colleagues estimated there were about 35,000 protein-coding genes, with a few classical non-coding RNAs. Repetitive DNA elements called transposons were regarded as just parasites and junk. “Today, we know all that was completely wrong,” Lander said.
Based on patterns of evolutionary conservation across some 40 sequenced vertebrates, the human gene count is “21,000, give or take 1,000,” said Lander. “There are many fewer genes than we thought. Much more information is non-coding... 75% of the information that evolution cares about is non-coding information.”
A study of 29 mammalian genomes shows some 3 million conserved non-coding elements, covering about 4.7% of the genome. Some of these have regulatory functions, he said. Another exciting area was the generation of genome-wide 3-D maps, which have revealed that the genome resides in ‘open’ and ‘closed’ compartments.
In 2000, the genes for about 1,300 Mendelian genetic disorders had been identified. Today, that number is about 2,900, leaving “another 1,800 Mendelian disorders to go,” said Lander. He noted the success of some whole-genome sequencing projects in identifying rare Mendelian disease genes, although the approach was not trivial. “We all have about 150 rare coding variants,” he said, in other words glitches in about 1% of a person’s genes.
Genome-Wide Disagreements
Lander also broached the progress in genome-wide association studies (GWAS) for common inherited disease, where he said “an entire village came together” to develop the array tools, haplotype maps, and a catalogue of more than 20 million single nucleotide polymorphisms (SNPs). “The vast majority of common variation is known,” said Lander. In all, 1,100 loci have been associated with 165 common diseases and traits. For diseases such as inflammatory bowel disease and Crohn’s disease, 70-100 loci have been mapped, a pattern that Lander showed also holds for lipid disorders, type 2 diabetes, height, and many other conditions.
Lander addressed the oft-publicized disappointment expressed by some prominent geneticists, including ASHG president-elect Mary-Claire King, over the “missing heritability” and the net value extracted from GWAS papers. One widely voiced concern is that the effect size of individual GWAS “hits” is small. “I think that’s nonsense,” said Lander. “Effect size has nothing to do with biological or medical utility.”
Some have suggested that the “missing heritability” could be explained by rare DNA variants. Not so fast, said Lander. For one thing, the proportion of heritability explained in disorders such as Crohn’s and diabetes is increasing. For another, population genetics theory suggests that for many common diseases, rare variants will explain less of the heritability than common variants.
Lander also said that geneticists must take into account epistasis, the interaction between genes, including the effects of modifier genes. Such effects cannot be found statistically in GWAS, he argued. Rather than moving from mapped loci to explaining heritability to understanding biology, Lander said we must understand biology first, and then explain the models of heritability.
Lander said that in 2000, some 80 cancer-related genes were known. The tally is now 240 genes, with genome sequencing studies revealing mutational hotspots in colon, lung, and skin cancers with therapeutic implications. As an example, Lander said his Broad Institute colleague Todd Golub, studying multiple myeloma tumors, had discovered mutations in four well-known cancer genes but, more excitingly, had implicated a handful of new biological pathways, including protein synthesis and an extrinsic coagulation pathway.
Lander concluded by presenting what he called “the path to the promise.” If the HGP provided the raw tools, scientists were still translating basic genome discoveries into more medically directed research. That’s how far we’ve progressed in ten years. But that still leaves the daunting tasks of clinical interventions, clinical testing, regulatory approval and widespread adoption. •
This article also appeared in the January-February 2011 issue of Bio-IT World Magazine.