SNPedia: A Wiki for Personal Genomics



By Michael Cariaso

Dec. 17, 2007 | November was an historic month for personal genomics. Four companies announced details of new direct-to-consumer genotyping services. Google-backed 23andMe's kit sells for $999, while Iceland's deCODE Genetics launched its deCODEme service for $985. Knome began to seek clients for full genome sequencing, and Navigenics announced it would launch in early 2008, offering screening for 20 common diseases for $2,500.

A few years ago, having gained familiarity with various microarray platforms, I figured out how to run my own DNA and extract the details. By cataloguing my single nucleotide polymorphisms (SNPs), I knew 500,000 facts about myself, but had no idea what they implied. As my resources were more technical than financial, starting a wiki made more sense than starting a genetic testing company. And so SNPedia (www.snpedia.com), a Wikipedia for SNPs, was born. The site currently has information on nearly 2,000 medically relevant SNPs.

"23etAl" (23andMe, Navigenics, deCODEme, Knome, and other private companies) appear to be building high-quality curated walled gardens, whereas SNPedia is more of a public park. They may even use SNPedia, since they can continue to take customers' money to do the testing, but could use SNPedia to simplify some of the annotation and report generation.

Most consumers will be satisfied with results obtained from 23etAl, trusting that everything is reliable, and will not have much use for SNPedia. Researchers may use SNPedia to increase the visibility of their work, but scientific journals will still be primary. It's the "recreational genomics" crowd that might be motivated to learn what an odds ratio or Bonferroni correction is. A wiki is a good format for that sort of information. They will (mostly) understand that SNPedia is a home for interesting but lower-confidence possibilities. And while there will be limitations to SNPedia's content - it is a wiki after all - pages will continually grow to add the missing information. As we like to say in the open-source software world, with enough eyeballs all bugs are shallow. The same holds true for the science.

The NCBI rs#s used to identify SNPs are the key to the whole system. Other nomenclatures are still widely used, though the situation is improving. I look forward to the day when copy number variations and mitochondrial SNPs have been similarly cataloged. Full genome sequencing remains the Holy Grail, but without using BLAST or other tools to reduce a full genome into discrete SNP-like categories, I doubt anyone will be able to make any actionable statements based on a full genome.

SNPedia also reveals which SNPs are present on the commercially available chips from Affymetrix and Illumina used by 23etAl. This provides an opportunity to compare what information is common to the respective platforms, and what SNP probes are unique. Because many of the current SNPs in SNPedia pertain to rare disorders characterized in OMIM (Online Mendelian Inheritance in Man), the wiki may also help suggest which SNPs should be included on the next generation of microarrays.
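
A platform comparison of this kind comes down to set intersection and difference. The Python sketch below is only an illustration: apart from rs1799990 appearing on the Illumina Human 1M chip, every chip membership in it is invented, and "Hypothetical Affymetrix chip" is a placeholder name, not a real product.

```python
# Chip contents keyed by chip name. ASSUMPTION: only the rs1799990 /
# "Illumina Human 1M" pairing comes from SNPedia as described in this
# article; all other memberships are made up for illustration.
CHIP_CONTENTS = {
    "Illumina Human 1M": {"rs1799990", "rs1815739", "rs28933101"},
    "Hypothetical Affymetrix chip": {"rs1815739", "rs16990018"},
}


def compare_chips(chip_a, chip_b, chip_contents):
    """Split the SNPs on two chips into shared and platform-specific sets."""
    snps_a = chip_contents[chip_a]
    snps_b = chip_contents[chip_b]
    return {
        "shared": snps_a & snps_b,   # probes present on both platforms
        "only_a": snps_a - snps_b,   # probes unique to chip_a
        "only_b": snps_b - snps_a,   # probes unique to chip_b
    }


comparison = compare_chips(
    "Illumina Human 1M", "Hypothetical Affymetrix chip", CHIP_CONTENTS
)
for label, snps in comparison.items():
    print(label, sorted(snps))
```

With real chip manifests in place of the toy dictionary, the same three sets would show where the platforms overlap and where each one offers unique probes.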

In a sense, SNPedia has been waiting for the day when enough people actually know their genotypes. 23etAl will bring that day much closer. Given the legal and ethical issues involved with sharing genetic information, I'm happy to let the 800-pound gorillas fight those battles. Few people currently know their genotypes, so our authorship is small. However, the author of a recent New York Times article on 23etAl said she got her rs#s as part of the 23andMe report, then found additional information via SNPedia. Hopefully more consumers will do just that.

A Stroll Through SNPedia.com
Use the search box to find the "Rs1799990" page. Clicking the history tab shows that this page was annotated entirely by the SNPediaBot (the wiki's meticulous and very capable librarian). The edit tab reveals:
{{ rsnum
| rsid = 1799990
| Gene = PRNP
| Chromosome = 20
| position = 4628251
| geno1 = (A;A)
| geno2 = (A;G)
| geno3 = (G;G)
}}
{{ omim
| id = 176640
| variant = 0005
| desc    = PRION DISEASE, SUSCEPTIBILITY TO
| rsnum   = 1799990
}}
{{ neighbor
| rsid = 16990018
| distance = 127
}}
{{on chip | Illumina Human 1}}
{{on chip | Illumina Human 1M}}

The SNPediaBot pulled down data from NCBI, including the SNP's gene, chromosome, and position. It recognized the rs# in OMIM and recorded its existence and its link to OMIM. The bot identified that 127 nucleotides away is another SNP (for which additional information is provided), and that this SNP is found on two Illumina microarrays.
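
Because the page body is plain template markup, a short program can read it back into structured data. The following Python sketch is one way to do that, assuming simple, non-nested templates like the ones shown for Rs1799990; `parse_templates` is a hypothetical helper written for this illustration, not part of SNPedia or the SNPediaBot.

```python
import re

# An excerpt of the wiki text shown on the Rs1799990 edit tab.
SAMPLE = """{{ rsnum
| rsid = 1799990
| Gene = PRNP
| Chromosome = 20
| position = 4628251
| geno1 = (A;A)
| geno2 = (A;G)
| geno3 = (G;G)
}}
{{ neighbor
| rsid = 16990018
| distance = 127
}}
{{on chip | Illumina Human 1}}
{{on chip | Illumina Human 1M}}"""


def parse_templates(wikitext):
    """Parse simple, non-nested {{name | key = value | ...}} templates.

    Returns one dict per template with its name, its named parameters,
    and any positional (unnamed) parameters. Values are kept as strings.
    """
    templates = []
    for body in re.findall(r"\{\{(.*?)\}\}", wikitext, flags=re.DOTALL):
        parts = [part.strip() for part in body.split("|")]
        named, positional = {}, []
        for part in parts[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                named[key.strip()] = value.strip()
            else:
                positional.append(part)
        templates.append(
            {"name": parts[0], "named": named, "positional": positional}
        )
    return templates


templates = parse_templates(SAMPLE)
for template in templates:
    print(template["name"], template["named"], template["positional"])
```

The sketch does not handle nested templates or values that themselves contain a pipe character; real wiki text would need a proper parser.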

Technically, SNPedia can be called a Semantic Web, which means authors can write programs that read, write, and understand the wiki. One of the goals of SNPedia is to create an ecosystem where people are encouraged to contribute. For example, suppose a researcher who has identified a SNP that varies across patient populations creates a page such as:

Title: rs12345
Body: The G allele is more common in prostate cancer patients

The bot will reward his or her efforts by connecting this SNP to its neighbors and identifying its presence on any known microarrays. Perhaps a neighboring SNP is on a microarray and can be used as a surrogate for easier testing. This sort of information hasn't existed before in any accessible way.
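
The surrogate idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the bot's actual logic: `find_surrogates` and the 1,000-nucleotide cutoff are hypothetical, the neighbor relation from the Rs1799990 page is assumed to hold in both directions, and rs16990018 is treated as absent from the chips purely for the sake of the example.

```python
# Neighbor and chip data in the spirit of the Rs1799990 page.
# ASSUMPTIONS: the 127-nucleotide neighbor link is taken to be symmetric,
# and rs16990018 is treated as not being on any chip for illustration.
NEIGHBORS = {
    "rs16990018": [("rs1799990", 127)],
}
ON_CHIP = {
    "rs1799990": ["Illumina Human 1", "Illumina Human 1M"],
}


def find_surrogates(rsid, neighbors, on_chip, max_distance=1000):
    """List nearby SNPs that are on at least one chip, closest first.

    max_distance is an arbitrary cutoff in nucleotides (hypothetical).
    """
    candidates = []
    for neighbor_id, distance in neighbors.get(rsid, []):
        chips = on_chip.get(neighbor_id, [])
        if chips and distance <= max_distance:
            candidates.append((neighbor_id, distance, chips))
    return sorted(candidates, key=lambda candidate: candidate[1])


for neighbor_id, distance, chips in find_surrogates(
    "rs16990018", NEIGHBORS, ON_CHIP
):
    print(f"{neighbor_id} is {distance} nt away and on: {', '.join(chips)}")
```

Physical proximity alone does not guarantee that a neighbor tracks the SNP of interest; whether it is a usable surrogate depends on how strongly the two are inherited together, which this sketch does not check.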

The Categories page under Special pages (left-hand toolbox) automatically reveals the latest statistics on the site, such as the total number of SNPs and the number of SNPs located on various commercial microarrays.

In some cases, SNPs exist without an entry for the corresponding gene. For example, on the page for Rs28933101, notice that the gene MET is in red - the page about MET has not been created yet. Click on MET and you find a blank edit box. But even on a blank page, there is information. The What-Links-Here page (left-hand toolbox) produces a list of six SNPs and the entry for Autism. Even non-existent pages can be useful.

At the other end of the SNP spectrum is Rs1815739, a manually prepared entry that illustrates what most people hope to find at the site.

In addition to the wiki, there is also a chat room accessible from a tab on any SNP page. This allows people interested in a particular SNP or topic to talk in real time. For SNPs of more academic interest, researchers across the globe may have a way to conduct a continuous virtual conference (akin to what some folks seem to be trying to do with Second Life). For SNPs of greater interest to the general public, the chat room may offer something between a genetic counselor and a peer support group.

Michael Cariaso is the senior scientific consultant for the BioTeam. He can be reached at cariaso@bioteam.net.

