By Malorye A. Branca
June 12, 2002 | More than 1,000 viewers tuned in for the April 30 Webcast to launch Celera Genomics’ new mouse SNP (single nucleotide polymorphism) database, a potentially important new tool for mammalian geneticists and a key addition to the Celera Discovery System (CDS).
The Celera Mouse Reference SNP database includes approximately 3.4 million SNPs covering five different mouse inbred strains. Along with the new mouse SNP database, Celera also released a new mouse genome sequence assembly. Both offerings will be incorporated into the CDS, which will be marketed henceforth by Celera’s sister company, Applied Biosystems (see Applera Charts New Course, page 27).
“Mouse strains can differ significantly in their susceptibility to various diseases,” says Lothar Krinke, vice president and chief business officer of Celera Online. “These strains are excellent models for figuring out what causes that difference.” The SNPs are likely to be most valuable for studying complex diseases such as cancer, arthritis, asthma, obesity, diabetes, and atherosclerosis, as well as infectious diseases such as malaria, hepatitis, and AIDS.
Mouse SNPs have many other uses, including serving as markers for gene mapping, drug target identification, and exploring the evolutionary relationships between various inbred lines. About 40 percent of Celera’s mouse SNPs are found within gene-coding regions, with an average density of 40 SNPs per gene. More than 90 percent of the SNPs have been validated as true polymorphisms that can serve as good markers of variation.
For potential users of the database, a key issue is which mouse strains are represented. “If somebody does this and it coincides with the strains I’m working on, then I’m going to be interested,” says William F. Dietrich, a Howard Hughes Medical Institute investigator at Harvard Medical School.
Celera is not alone in its efforts to map mouse SNPs. According to a spokesman for the National Center for Biotechnology Information (NCBI), its SNP database presently contains fewer than 6,000 unique mouse SNPs, but they come from 13 strains of mice. The Whitehead Institute Center for Genome Research was due to begin delivering another 50,000 or so SNPs from three mouse strains last month. All but one of the strains mapped by Celera will be covered either by NCBI’s current offerings or the Whitehead project, albeit in far less detail.
“The NIH [National Institutes of Health] is talking about doing tens of thousands of SNPs, and Celera already has millions,” says Joseph Nadeau, professor of genetics at the Case Western Reserve University School of Medicine. “If the price of the Celera data was right, maybe that would be the best way to do this, and NIH could go on and finish something else.”
Having dense SNP maps is necessary for human pharmacogenomic studies, where large numbers of markers are needed to correlate with disease states. But mouse geneticists can determine gene-to-disease links by crossbreeding the mice. “For the mouse, dense [SNP] maps are not a necessity,” Dietrich says. “But some interesting stuff will fall out from having such dense maps.”
Nadeau agrees. “When you get down to fine mapping of precise locations, dense SNPs are very useful,” he says. “We spend an inordinate amount of time generating these.”
There are enormous advantages to having the human and mouse genomes side by side. “You can bounce back and forth between them and make comparisons,” Dietrich says. “And the holes in annotation will often be in different segments for each organism. I think it would be a nice thing to have.”
But Dietrich says the cost has kept him from using the Celera database, a barrier cited by Nadeau as well. “As much as we could use this, it would put us over the top,” Nadeau says of his budget limitations.
Now, the challenge for Celera and Applied Biosystems is to see just how many top researchers they can lure to their online service.