
SNPedia: A Wiki for Personal Genomics

By Michael Cariaso

Dec. 17, 2007 | November was a historic month for personal genomics. Four companies announced details of new direct-to-consumer genotyping services. Google-backed 23andMe's kit sells for $999, while Iceland's deCODE Genetics launched its deCODEme service for $985. Knome began to seek clients for full genome sequencing, and Navigenics announced it would launch in early 2008, offering a screening for 20 common diseases for $2500.

A few years ago, having gained familiarity with various microarray platforms, I figured out how to run my own DNA and extract the details. By cataloguing my single nucleotide polymorphisms (SNPs), I knew 500,000 facts about myself, but had no idea about their implications. As my resources were more technical than financial, starting a wiki made more sense than starting a genetic testing company. And so SNPedia, a Wikipedia for SNPs, was born. The site currently has information on nearly 2,000 medically relevant SNPs.

"23etAl" (23andMe, Navigenics, deCODEme, Knome, and other private companies) appear to be building high-quality curated walled gardens, whereas SNPedia is more of a public park. They may even use SNPedia, since they can continue to take customers' money to do the testing, but could use SNPedia to simplify some of the annotation and report generation.

Most consumers will be satisfied with results obtained from 23etAl, trusting everything is reliable, and will not have much use for SNPedia. Researchers may use SNPedia to increase the visibility of their work, but scientific journals will remain the primary venue. It's the "recreational genomics" crowd that might be motivated to learn what an odds ratio or Bonferroni correction is. A wiki is a good format for that sort of information. They will (mostly) understand that SNPedia is a home for lower-confidence but interesting possibilities. And while there will be limitations to SNPedia's content - it is a wiki after all - pages will continually grow to add the missing information. As we like to say in the open-source software world, with enough eyeballs all bugs are shallow. The same holds true for the science.

The NCBI rs#s used to identify SNPs are the key to the whole system. Other nomenclatures are still widely used, though the situation is improving. I look forward to the day when copy-number variations and mitochondrial SNPs have been similarly cataloged. Full genome sequencing remains the Holy Grail, but without using BLAST or other tools to reduce a full genome into discrete SNP-like categories, I doubt anyone will be able to make any actionable statements based on a full genome.

SNPedia also reveals which SNPs are present on the commercially available chips from Affymetrix and Illumina used by 23etAl. This provides an opportunity to compare what information is common to the respective platforms, and what SNP probes are unique. Because many of the current SNPs in SNPedia pertain to rare disorders characterized in OMIM (Online Mendelian Inheritance in Man), the wiki may also help suggest which SNPs should be included on the next generation of microarrays.

In a sense, SNPedia has been waiting for the day when enough people actually know their genotypes. 23etAl will bring that day much closer. Given the legal and ethical issues involved with sharing genetic information, I'm happy to let the 800-pound gorillas fight those battles. Few people currently know their genotypes, so our authorship is small. However, the author of a recent New York Times article on 23etAl said she got her rs#s as part of the 23andMe report, then found additional information via SNPedia. Hopefully more consumers will do just that.

A Stroll Through SNPedia
Use the search box to find the "Rs1799990" page. Clicking the history tab shows that this page was annotated entirely by the SNPediaBot (the wiki's meticulous and very capable librarian). The edit tab reveals:
{{ rsnum
| rsid = 1799990
| Gene = PRNP
| Chromosome = 20
| position = 4628251
| geno1 = (A;A)
| geno2 = (A;G)
| geno3 = (G;G)
}}
{{ omim
| id = 176640
| variant = 0005
| rsnum   = 1799990
}}
{{ neighbor
| rsid = 16990018
| distance = 127
}}
{{on chip | Illumina Human 1}}
{{on chip | Illumina Human 1M}}

The SNPediaBot pulled down data from NCBI including the SNP's gene, chromosome, and position. It recognized the rs# in OMIM and recorded its existence and its link to OMIM. The bot identified that 127 nucleotides away is another SNP (for which additional information is provided), and that this SNP is found on two Illumina microarrays.
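Because the page is built from templates, a program can read it back out. The following is a minimal Python sketch of that idea, not the SNPediaBot's actual code: it assumes each template is closed with `}}` and contains no nested templates, and the function name `parse_templates` is made up for illustration.

```python
import re

# Abbreviated copy of the Rs1799990 wikitext, with each template closed by "}}".
WIKITEXT = """
{{ rsnum
| rsid = 1799990
| Gene = PRNP
| Chromosome = 20
| position = 4628251
}}
{{ neighbor
| rsid = 16990018
| distance = 127
}}
{{on chip | Illumina Human 1}}
{{on chip | Illumina Human 1M}}
"""

# Matches one non-nested template: everything between "{{" and "}}".
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def parse_templates(text):
    """Return a list of (template_name, named_params, positional_params)."""
    parsed = []
    for match in TEMPLATE_RE.finditer(text):
        parts = [part.strip() for part in match.group(1).split("|")]
        name, named, positional = parts[0], {}, []
        for part in parts[1:]:
            if "=" in part:
                # "key = value" parameters, e.g. "Gene = PRNP"
                key, value = part.split("=", 1)
                named[key.strip()] = value.strip()
            elif part:
                # bare parameters, e.g. the chip name in {{on chip | ...}}
                positional.append(part)
        parsed.append((name, named, positional))
    return parsed


if __name__ == "__main__":
    for name, named, positional in parse_templates(WIKITEXT):
        print(name, named, positional)
```

Each template comes back as a name plus its parameters, so a script could, for instance, collect every page that carries an `on chip` entry for a given microarray.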

Technically SNPedia can be called a Semantic Web, which means authors can write programs that read, write, and understand the wiki. One of the goals of SNPedia is to create an ecosystem where people are encouraged to contribute. For example, if a researcher who has identified a SNP that varies across patient populations creates a page such as:

Title: rs12345
Body: The G allele is more common in prostate cancer patients

The bot will reward his or her efforts by connecting this SNP to its neighbors and identifying its presence on any known microarrays. Perhaps a neighboring SNP is on a microarray and can be used as a surrogate for easier testing. This sort of information hasn't existed before in any accessible way.
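How the bot finds neighbors is not described here, but the idea can be sketched in a few lines of Python. This is an illustration under stated assumptions: the 1,000-base window, the function names, and every position other than that of rs1799990 are hypothetical (the page gives the neighbor's distance of 127 but not its direction, so it is placed downstream arbitrarily).

```python
from bisect import bisect_left, bisect_right

# (rsid, chromosome, position). Only the first row's position comes from the
# Rs1799990 page; the rest are hypothetical values for illustration.
SNPS = [
    ("rs1799990", "20", 4628251),
    ("rs16990018", "20", 4628378),  # hypothetical position: 127 bases downstream
    ("rs0000001", "20", 4700000),   # hypothetical SNP, too far away to count
    ("rs0000002", "7", 4628300),    # hypothetical SNP on another chromosome
]


def build_index(snps):
    """Group SNPs by chromosome and sort each group by position."""
    index = {}
    for rsid, chrom, pos in snps:
        index.setdefault(chrom, []).append((pos, rsid))
    for entries in index.values():
        entries.sort()
    return index


def neighbors(index, rsid, chrom, pos, window=1000):
    """Return (rsid, distance) for other SNPs within `window` bases of `pos`."""
    entries = index.get(chrom, [])
    # Binary search for the slice of SNPs inside [pos - window, pos + window].
    lo = bisect_left(entries, (pos - window, ""))
    hi = bisect_right(entries, (pos + window, "\uffff"))
    return [(other, abs(p - pos)) for p, other in entries[lo:hi] if other != rsid]


if __name__ == "__main__":
    index = build_index(SNPS)
    print(neighbors(index, "rs1799990", "20", 4628251))
```

Keeping each chromosome's SNPs sorted by position means a lookup is a binary search rather than a scan of every known SNP.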

The Categories page under Special pages (left-hand toolbox) automatically reveals the latest statistics on the site, such as the total number of SNPs and the number of SNPs located on various commercial microarrays.

In some cases, SNPs exist without an entry for the corresponding gene. For example, on the page for Rs28933101, notice that the gene MET is in red - the page about MET has not been created yet. Click on MET and you find a blank edit box. But even on a blank page, there is information. The What-Links-Here page (left-hand toolbox) produces a list of six SNPs, and the entry for Autism. Even non-existent pages can be useful.
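What-Links-Here works because links are recorded on the pages that contain them, not on the page they point to. A toy Python sketch of such a reverse index follows; the page bodies are invented for illustration, and the real wiki software keeps this information in its own database.

```python
import re
from collections import defaultdict

# Captures the target of a wiki link such as [[MET]] or [[MET|the MET gene]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)")

# Hypothetical page bodies. Note that no page titled "MET" exists.
PAGES = {
    "Rs28933101": "A variant in the [[MET]] gene.",
    "Autism": "Some reports mention [[MET]].",
    "Rs1815739": "No gene link in this toy example.",
}


def what_links_here(pages):
    """Map each link target to the set of page titles that link to it."""
    backlinks = defaultdict(set)
    for title, body in pages.items():
        for target in LINK_RE.findall(body):
            backlinks[target.strip()].add(title)
    return backlinks


if __name__ == "__main__":
    backlinks = what_links_here(PAGES)
    print(sorted(backlinks["MET"]))
```

Here `MET` has two incoming links even though it is not a key in `PAGES`, which is the sense in which a page that has not been written yet still carries information.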

At the other end of the SNP spectrum is Rs1815739, a manually prepared entry that illustrates what most people hope to find at the site.

In addition to the wiki, there is also a chat room accessible from a tab on any SNP page. This allows people interested in a particular SNP or topic to talk in real time. For SNPs with a more academic interest, researchers across the globe may have a way to conduct a continuous virtual conference (akin to what some folks seem to be trying to do with Second Life). For SNPs of greater interest to the general public, the chat room may offer something between a genetic counselor and a peer support group.

Michael Cariaso is the senior scientific consultant for the BioTeam.
