The One Percent Difference


By Kevin Davies

Dec. 2006/Jan. 2007 | It didn’t help that the story broke on Thanksgiving Day, or that the international team of researchers featured only a few Americans. By contrast, the news was splashed over the front page of at least one British broadsheet. And appropriately so, given that the research reveals a shocking new layer of human genome variation with profound implications for the future of genomic analysis and personalized medicine.

The report in question, published last month in Nature, would surely rank atop my list of scientific highlights for 2006. A team from the Wellcome Trust Sanger Institute in Cambridge, U.K., and the Hospital for Sick Children in Toronto, Canada, together with colleagues in Spain, Japan, and the U.S., uncovered a stunning genome-wide sea of variation in segments of DNA larger than 1,000 bases. These segments, known as copy number variations (CNVs), can be deleted, duplicated, or inverted from person to person. They were catalogued following a detailed analysis of the 270 HapMap DNA samples using two methods: comparative intensity analysis using Affymetrix 500K arrays and comparative genomic hybridization using GE Healthcare Codelink arrays.

The results are quite remarkable, especially since we had all assumed that the Human Genome Project was completed three years ago! All told, the researchers identified 1,447 CNVs, which, if laid end-to-end, would encompass 12 percent, or 360 million bases, of the human genome. These CNVs directly involve 2,900 genes, including 15 percent of currently known disease-related genes.

One of the study’s principal authors, Toronto’s Stephen Scherer, says he was so shocked by the sheer quantity of CNVs that his group spent six months double-checking the data before sharing it with colleagues. He is already searching for CNVs at higher resolution to create a second-generation map and a more complete database (see http://projects.tcag.ca/variation).

30 Million Changes
For the past few years, we’ve heard how unrelated humans differ at a mere 3 million bases, rendering them 99.9 percent identical at the genetic level. Many a genetics lecture includes the classic Annie Leibovitz photograph of Willie Shoemaker and Wilt Chamberlain, illustrating the remarkable range of human phenotypic variation. On some level, it seemed hard to attribute such major differences to just 0.1 percent of our DNA. Now we know that we don’t have to.

In a companion paper in Nature Genetics, Scherer’s team presents a detailed comparison of the only two previously published human genomes: the Celera sequence (largely that of former Celera president J. Craig Venter) and the international consortium reference sequence (a composite). “The idea,” Scherer says, “was to come up with a good understanding of what we’re going to get when we do [personalized sequencing].”

Using MegaBLAST to align the genomes and the new Genome Comparison Algorithm to score the variants, Scherer and coworkers found a total of some 30 million base differences between the two sequences. These include roughly 1.5 million single nucleotide polymorphisms (SNPs), 24 million bases of unmatched sequence, 3.5 million of multi-copy sequence, and 1 million bases in inverted sequence. By this calculation, one could argue that humans are actually only 99 percent identical at the DNA level.
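The arithmetic behind that 99 percent figure can be checked in a few lines. The sketch below simply tallies the four categories quoted above and compares the total with the older estimate of 3 million differing bases. The genome length of 3 billion bases is an assumed round number (it is the figure implied by 360 million bases making up 12 percent of the genome), and the function name is purely illustrative.

```python
# Figures quoted in the article. The genome length is an assumed round
# number (about 3 billion bases), not a value taken from the papers.
GENOME_BASES = 3_000_000_000

# Differences between the two genome assemblies, in bases, by category.
differences = {
    "SNPs": 1_500_000,
    "unmatched sequence": 24_000_000,
    "multi-copy sequence": 3_500_000,
    "inverted sequence": 1_000_000,
}


def percent_identity(differing_bases, genome_bases=GENOME_BASES):
    """Return the percentage of bases that do not differ."""
    return 100.0 * (1 - differing_bases / genome_bases)


total = sum(differences.values())

print(f"total differing bases: {total:,}")
print(f"older estimate (3 million bases): {percent_identity(3_000_000):.1f}% identical")
print(f"with structural variation: {percent_identity(total):.1f}% identical")
```

Under these assumptions the four categories sum to 30 million bases, or 1 percent of the genome, ten times the older estimate.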

The researchers have found hints that CNVs could be implicated in schizophrenia, atherosclerosis, cataracts, and other diseases. Moreover, CNVs could play a huge role in the field of pharmacogenomics, shedding light on drug response variation. The authors note that “CNV assessment should now become standard in the design of all studies of the genetic basis of phenotypic variation, including disease susceptibility.” A simple SNP test won’t capture all of the newly appreciated genome variability. With Illumina validating the field through its $600 million takeover of Solexa, and GE Healthcare jumping in, the prospects for next-generation sequencing technologies have just taken another major leap.

Meanwhile, Scherer is pondering the implications of his findings: “If you have 1 million fewer nucleotides than your buddy, shouldn’t you get a break on your golf handicap?”

