June 14, 2006 | The International HapMap Project has completed Phase II of its data acquisition and will have a draft paper for publication before the end of the year. The result will be a publicly available trove of information on human genetic variation across populations that will reveal insights into the genetic basis of human disease and individual response to pharmaceuticals.
The official release of the Phase II HapMap data, following full quality control, will occur shortly, with the goal of publishing a draft manuscript by the time of the 2006 American Society of Human Genetics meeting in October.
Kelly Frazer, senior vice president of Perlegen Sciences, kicked off the group’s third, and probably last, annual meeting at the Broad Institute with an update on Phase II of the HapMap Project. The project is typing single nucleotide polymorphisms (SNPs) from 270 anonymous donors drawn from four populations: Yoruba from Nigeria, people of European ancestry, Han Chinese, and Japanese.
The HapMap Project was launched in 2002 by an international alliance of researchers from the United Kingdom, United States, Canada, China, Japan, and Nigeria. Nine centers generated the Phase I data, including the Broad Institute, Baylor College of Medicine, UCSF, Illumina, and Perlegen in the United States. Those results were published last year in Nature, and showed that the majority of the 1.3 million SNPs assayed were spaced within 5 kilobases (kb) of each other.
“The Phase I data did indeed accomplish its goal of a common SNP every 5 kilobases of the genome in all three analysis panels,” Frazer said. Only 3 percent of inter-SNP distances were greater than 10 kb.
For Phase II of the project, the data were generated exclusively at Perlegen. Frazer explained that all human chromosomes were studied using a single technology, which involved preparing 330,000 long-range PCR primer pairs to amplify the 270 HapMap volunteer DNA samples. Those reagents were then hybridized to Perlegen high-density oligonucleotide arrays.
In all, Perlegen genotyped 4.6 million SNPs, including every variant in the latest version (Build 122) of the dbSNP database, except those SNPs assayed in Phase I and those in highly repetitive portions of the human genome.
Frazer said that out of 4.6 million attempts, 2.7 million assays were successful. She attributed the lower success rate chiefly to the fact that the Phase I assays employed the best SNPs. Preliminary analysis of the Phase II data reveals that for 60 percent of the SNPs, the inter-SNP distance is less than 1 kb. For the combined Phase I and II data, greater than 97 percent of SNPs are spaced less than 5 kb apart, with only 0.6 percent separated by more than 10 kb. All of the data are publicly available at www.hapmap.org.
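For readers who want to see how spacing figures of this kind are derived, the following is a minimal Python sketch, not the HapMap consortium's actual analysis. It assumes SNP positions are given as base-pair coordinates on a single chromosome and treats the reported percentages as the share of gaps between neighboring SNPs that fall under a threshold. The function names and the example coordinates are hypothetical; only the 2.7 million and 4.6 million assay counts come from the article.

```python
def inter_snp_gaps(positions):
    """Return distances in base pairs between consecutive SNPs on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def fraction_below(gaps, threshold_bp):
    """Return the share of gaps strictly shorter than threshold_bp."""
    if not gaps:
        return 0.0
    return sum(1 for g in gaps if g < threshold_bp) / len(gaps)


# Hypothetical SNP coordinates, made up for illustration only.
example_positions = [1_000, 1_400, 2_100, 6_500, 7_000, 19_000]
gaps = inter_snp_gaps(example_positions)

share_under_1kb = fraction_below(gaps, 1_000)
share_under_5kb = fraction_below(gaps, 5_000)

# Assay success rate from the figures quoted in the article:
# 2.7 million successful assays out of 4.6 million attempts.
success_rate = 2.7e6 / 4.6e6

print(f"gaps (bp): {gaps}")
print(f"share of gaps under 1 kb: {share_under_1kb:.0%}")
print(f"share of gaps under 5 kb: {share_under_5kb:.0%}")
print(f"Phase II assay success rate: {success_rate:.1%}")
```

On the quoted counts, the success rate works out to roughly 59 percent, which is the "lower success rate" Frazer refers to.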