By Kevin Davies
November 16, 2010 | First Base | There was big news in the world of genome research in October, and no, we’re not talking about the Ozzy Osbourne genome, first details of which were reported in a British Sunday newspaper.
It turns out that researchers are also pretty intoxicated about the first published sets of data from the international 1000 Genomes Project, which has far-reaching implications for cataloguing and understanding the range of genomic variations associated with human disease, behavior, and evolution. The results of the pilot phase of the 1000 Genomes Project were reported in Nature and Science.
Researchers now have the ability to undertake a survey of genome variation that could barely have been imagined when President Clinton declared victory in the Human Genome Project a decade ago. The 1000 Genomes pilot consisted of low-coverage sequencing of 179 individuals, deep sequencing of two families (trios) and exome sequencing in nearly 700 individuals. One of the major conclusions from the pilot is that half of the single nucleotide polymorphisms (SNPs) catalogued (the total inventory currently exceeds 15 million) have never been seen before. Moreover, 1 million short insertions and deletions (indels) and 20,000 structural or copy number variants (CNVs) were also described. Remarkably, for any individual person, more than 95% of the 3 million SNPs in their genome will already sit in this catalogue. Contrast that with just 5% ten years ago, or about 40% five years ago. Indeed, the Broad Institute’s David Altshuler suggests that when the project is complete, fully 98-99% of the SNPs uncovered in any newly sequenced personal genome will have been observed previously.
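To put those percentages in absolute terms, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (3 million SNPs per genome and the various catalogue coverage percentages); the function name `uncatalogued_snps` is purely illustrative and not part of any 1000 Genomes tooling.

```python
# Rough arithmetic on the figures quoted in the column.
# The function name is hypothetical, chosen for illustration only.
SNPS_PER_GENOME = 3_000_000  # "the 3 million SNPs in their genome"


def uncatalogued_snps(percent_catalogued: int, total: int = SNPS_PER_GENOME) -> int:
    """Number of an individual's SNPs that are absent from the catalogue."""
    return total * (100 - percent_catalogued) // 100


scenarios = [
    ("ten years ago", 5),
    ("five years ago", 40),
    ("pilot phase", 95),
    ("completed project (low)", 98),
    ("completed project (high)", 99),
]

for label, pct in scenarios:
    print(f"{label:>25}: {uncatalogued_snps(pct):>9,} SNPs not yet catalogued")
```

On these numbers, a newly sequenced genome today holds on the order of 150,000 SNPs the catalogue has not seen, and that would shrink to a few tens of thousands if Altshuler's projection holds.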
Another eye-catching result is that each individual carries deleterious or loss-of-function mutations in 250-300 genes. Thankfully, we usually have a spare functional copy as back-up. The family studies suggest a de novo germline mutation rate of 10⁻⁸ per base pair per generation, essentially confirming the conclusions of the Complete Genomics Miller Syndrome study earlier this year, and calculations made by J.B.S. Haldane in the 1930s.
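What does a mutation rate of one in a hundred million per base pair per generation mean for a single child? A rough sketch follows. The haploid genome size of about 3 billion base pairs is an assumption not stated in the column, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate of new mutations per generation.
MUTATION_RATE = 1e-8               # per base pair per generation (from the column)
HAPLOID_GENOME_BP = 3_000_000_000  # ASSUMPTION: ~3 billion bp, not stated in the column


def expected_de_novo_mutations(rate: float, genome_bp: int, copies: int = 2) -> float:
    """Expected number of new mutations, given the rate and how many genome copies are counted."""
    return rate * genome_bp * copies


per_haploid = expected_de_novo_mutations(MUTATION_RATE, HAPLOID_GENOME_BP, copies=1)
per_diploid = expected_de_novo_mutations(MUTATION_RATE, HAPLOID_GENOME_BP, copies=2)

print(f"Per haploid genome copy: about {per_haploid:.0f}")
print(f"Per child (two copies):  about {per_diploid:.0f}")
```

Under those assumptions, each inherited copy of the genome carries roughly 30 new mutations, or about 60 per child, which gives a sense of how many variants in any one person will be private to that individual or family.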
Writing in Science, Evan Eichler and colleagues have catalogued larger swathes of genomic variation, the CNVs. Eichler’s team remapped data using homegrown computational methods that look at NGS read depth and unique sequence tags to distinguish copies. “We think the veil has been lifted on a whole new level of genetic diversity,” says Eichler. Among the findings, the most variable genes map to duplicated regions that act like accordions of the genome. There is new insight into the extent of copy number variation between different human populations. And comparisons with the great apes “identify gene families that have expanded in the human lineage since we separated from chimpanzees,” says Eichler.
The success of the 1000 Genomes Project increasingly puts the spotlight on the “rare” variants: the 1-2% of SNPs that are unique to an individual and that might have arisen spontaneously or very recently in their family history. “Those will never be contained in a catalogue,” says Altshuler, but together with environment, behavior and pure chance, they hold the key to understanding human diseases. The 1000 Genomes Project provides a tremendous “foundational tool” for future study. We can all drink to that.
As announced elsewhere in this issue, we have issued our Call for Entries for the Bio•IT World 2011 Best Practices Awards. On the evening of April 13, 2011, during the Bio-IT World Conference & Expo, we will announce and hand out some handsome hardware to select organizations for what a blue-ribbon jury considers to be the most outstanding and innovative examples of technology deployment and sharing across research, development and clinical trials.
This year’s awards attracted some 75 entries, and we hope to continue the trend by exceeding that tally next year. With more and more signs of pre-competitive approaches permeating the industry (see, for example, pages 30 and 14), there is ever more reason to believe that the solutions we look forward to showcasing next year will not be private solutions but ideas and technologies that others can learn from, and possibly even use.
Best Practices is about recognizing partnership and real-world uses of technology, not just the virtues of a new piece of hardware or software. We encourage vendors to nominate groups and organizations that they have successfully collaborated with. Full details on the Awards criteria, categories, and very straightforward entry process are here: www.bio-itworld.com/bestpractices. Good luck!
This article also appeared in the November-December 2010 issue of Bio-IT World Magazine.