SNPedia: A Wiki for Personal Genomics



By Michael Cariaso

Dec. 17, 2007 | November was an historic month for personal genomics. Four companies announced details of new direct-to-consumer genotyping services. Google-backed 23andMe's kit sells for $999, while Iceland's deCODE Genetics launched its deCODEme service for $985. Knome began to seek clients for full genome sequencing, and Navigenics announced it would launch in early 2008, offering a screening for 20 common diseases for $2,500.

A few years ago, having gained familiarity with various microarray platforms, I figured out how to run my own DNA and extract the details. By cataloguing my single nucleotide polymorphisms (SNPs), I knew 500,000 facts about myself, but had no idea about their implications. As my resources were more technical than financial, starting a wiki made more sense than starting a genetic testing company. And so SNPedia (www.snpedia.com), a Wikipedia for SNPs, was born. The site currently has information on nearly 2,000 medically relevant SNPs.

"23etAl" (23andMe, Navigenics, deCODEme, Knome, and other private companies) appear to be building high-quality curated walled gardens, whereas SNPedia is more of a public park. They may even use SNPedia, since they can continue to take customers' money to do the testing, but could use SNPedia to simplify some of the annotation and report generation.

Most consumers will be satisfied with results obtained from 23etAl, trusting that everything is reliable, and will not have much use for SNPedia. Researchers may use SNPedia to increase the visibility of their work, but scientific journals will still be primary. It's the "recreational genomics" crowd that might be motivated to learn what an odds ratio or Bonferroni correction is. A wiki is a good format for that sort of information. They will (mostly) understand that SNPedia is a home for lower-confidence but interesting possibilities. And while there will be limitations to SNPedia's content - it is a wiki after all - pages will continually grow to add the missing information. As we like to say in the open-source software world, with enough eyeballs all bugs are shallow. The same holds true for the science.

The NCBI rs#s used to identify SNPs are the key to the whole system. Other nomenclatures are still in widespread use, but the situation is improving. I look forward to the day when copy-number variations and mitochondrial SNPs have been similarly cataloged. Full genome sequencing remains the Holy Grail, but without using BLAST or other tools to reduce a full genome into discrete SNP-like categories, I doubt anyone will be able to make any actionable statements based on a full genome.

SNPedia also reveals which SNPs are present on the commercially available chips from Affymetrix and Illumina used by 23etAl. This provides an opportunity to compare what information is common to the respective platforms, and what SNP probes are unique. Because many of the current SNPs in SNPedia pertain to rare disorders characterized in OMIM (Online Mendelian Inheritance in Man), the wiki may also help suggest which SNPs should be included on the next generation of microarrays.
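A platform comparison of this kind reduces to set operations once each chip's SNP list is in hand. The sketch below is only an illustration: the chip names, the rs#s, and the compare_chips helper are all made-up placeholders, not real chip contents or anything SNPedia provides.

```python
def compare_chips(chip_a, chip_b):
    """Split two sets of rs#s into shared probes and probes unique to each chip."""
    return {
        "shared": chip_a & chip_b,   # SNPs present on both chips
        "only_a": chip_a - chip_b,   # SNPs unique to the first chip
        "only_b": chip_b - chip_a,   # SNPs unique to the second chip
    }


# Hypothetical probe lists; real ones would come from the wiki's "on chip" tags.
chip_a = {"rs0001", "rs0002", "rs0003", "rs0004"}
chip_b = {"rs0003", "rs0004", "rs0005"}

result = compare_chips(chip_a, chip_b)
for label, snps in result.items():
    print(label, sorted(snps))
```

With real probe lists in place of the placeholders, the same three sets answer both questions raised above: what the platforms share and what each one covers alone.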

In a sense, SNPedia has been waiting for the day when enough people actually know their genotypes. 23etAl will bring that day much closer. Given the legal and ethical issues involved with sharing genetic information, I'm happy to let the 800-pound gorillas fight those battles. Few people currently know their genotypes, so our authorship is small. However, the author of a recent New York Times article on 23etAl said she got her rs#s as part of the 23andMe report, then found additional information via SNPedia. Hopefully more consumers will do just that.

A Stroll Through SNPedia.com
Use the search box to find the "Rs1799990" page. Clicking the history tab shows that this page was annotated entirely by the SNPediaBot (the wiki's meticulous and very capable librarian). The edit tab reveals:
{{ rsnum
| rsid = 1799990
| Gene = PRNP
| Chromosome = 20
| position = 4628251
| geno1 = (A;A)
| geno2 = (A;G)
| geno3 = (G;G)
}}
{{ omim
| id = 176640
| variant = 0005
| desc    = PRION DISEASE, SUSCEPTIBILITY TO
| rsnum   = 1799990
}}
{{ neighbor
| rsid = 16990018
| distance = 127
}}
{{on chip | Illumina Human 1}}
{{on chip | Illumina Human 1M}}

The SNPediaBot pulled down data from NCBI including the SNP's gene, chromosome, and position. It recognized the rs# in OMIM and recorded its existence and its link to OMIM. The bot identified that 127 nucleotides away is another SNP (for which additional information is provided), and that this SNP is found on two Illumina microarrays.
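Because the page is built from templates, a short script can read it back as structured data. What follows is a minimal sketch, not the SNPediaBot's actual code: parse_templates is a hypothetical helper, and it assumes flat (non-nested) templates like the ones shown on the edit tab above.

```python
import re

# The wikitext from the Rs1799990 edit tab, copied as shown above.
WIKITEXT = """\
{{ rsnum
| rsid = 1799990
| Gene = PRNP
| Chromosome = 20
| position = 4628251
| geno1 = (A;A)
| geno2 = (A;G)
| geno3 = (G;G)
}}
{{ omim
| id = 176640
| variant = 0005
| desc    = PRION DISEASE, SUSCEPTIBILITY TO
| rsnum   = 1799990
}}
{{ neighbor
| rsid = 16990018
| distance = 127
}}
{{on chip | Illumina Human 1}}
{{on chip | Illumina Human 1M}}
"""


def parse_templates(text):
    """Return a list of (template_name, named_params, positional_params)."""
    templates = []
    # Assumption: templates are not nested, so each body is the text
    # between '{{' and the next '}}'.
    for body in re.findall(r"\{\{(.*?)\}\}", text, flags=re.DOTALL):
        parts = [part.strip() for part in body.split("|")]
        name, named, positional = parts[0], {}, []
        for part in parts[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                named[key.strip()] = value.strip()
            else:
                positional.append(part)
        templates.append((name, named, positional))
    return templates


templates = parse_templates(WIKITEXT)
rsnum = next(named for name, named, _ in templates if name == "rsnum")
chips = [pos[0] for name, _, pos in templates if name == "on chip"]

print(rsnum["Gene"], rsnum["Chromosome"], rsnum["position"])
print(chips)
```

The values stay as strings on purpose; a caller that needs the position or distance as a number can convert it where it is used.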

Technically, SNPedia can be called a Semantic Web, which means authors can write programs that read, write, and understand the wiki. One of the goals of SNPedia is to create an ecosystem where people are encouraged to contribute. For example, suppose a researcher who has identified a SNP that varies across patient populations creates a page such as:

Title: rs12345
Body: The G allele is more common in prostate cancer patients

The bot will reward his or her efforts by connecting this SNP to its neighbors and identifying its presence on any known microarrays. Perhaps a neighboring SNP is on a microarray and can be used as a surrogate for easier testing. This sort of information hasn't existed before in any accessible way.
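The surrogate lookup described here can be sketched in a few lines. Everything in this example is an assumption made for illustration: the Snp record, the surrogates function, the 1,000-base window, the positions, and the chip names are hypothetical, and only the rs12345 title comes from the sample page above. Physical proximity alone does not guarantee that two SNPs are inherited together, so a real surrogate would still need to be checked against population data.

```python
from dataclasses import dataclass, field


@dataclass
class Snp:
    rsid: str
    chromosome: str
    position: int
    chips: set = field(default_factory=set)  # microarrays carrying this SNP


def surrogates(target, catalog, max_distance=1000):
    """List nearby SNPs that are on at least one chip, nearest first.

    Each hit is (distance_in_bases, rsid, sorted_chip_names).
    """
    hits = []
    for snp in catalog:
        if snp.rsid == target.rsid or snp.chromosome != target.chromosome:
            continue
        distance = abs(snp.position - target.position)
        if distance <= max_distance and snp.chips:
            hits.append((distance, snp.rsid, sorted(snp.chips)))
    return sorted(hits)


# Hypothetical catalog: positions and chip names are invented.
target = Snp("rs12345", "1", 100_000)
catalog = [
    target,
    Snp("rs_example_a", "1", 100_127, {"Chip X"}),            # close, on a chip
    Snp("rs_example_b", "1", 100_050),                        # close, on no chip
    Snp("rs_example_c", "1", 250_000, {"Chip X", "Chip Y"}),  # too far away
    Snp("rs_example_d", "2", 100_010, {"Chip Y"}),            # other chromosome
]

candidates = surrogates(target, catalog)
print(candidates)
```

Only rs_example_a survives the filter in this toy catalog, since it is the one neighbor that is both within the window and present on a chip.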

The Categories page under Special pages (left-hand toolbox) automatically reveals the latest statistics on the site, such as the total number of SNPs and the number of SNPs located on various commercial microarrays.

In some cases, SNPs exist without an entry for the corresponding gene. For example, on the page for Rs28933101, notice that the gene MET is in red - the page about MET has not been created yet. Click on MET and you find a blank edit box. But even on a blank page, there is information. The What-Links-Here page (left-hand toolbox) produces a list of six SNPs, and the entry for Autism. Even non-existent pages can be useful.

At the other end of the SNP spectrum is Rs1815739, a manually prepared entry that illustrates what most people hope to find at the site.

In addition to the wiki, there is also a chat room accessible from a tab on any SNP page. This allows people interested in a particular SNP or topic to talk in real time. For SNPs of more academic interest, researchers across the globe may have a way to conduct a continuous virtual conference (akin to what some folks seem to be trying to do with Second Life). For SNPs of greater interest to the general public, the chat room may offer something between a genetic counselor and a peer support group.

Michael Cariaso is the senior scientific consultant for the BioTeam. He can be reached at cariaso@bioteam.net.

For reprints and/or copyright permission, please contact  Tim McLucas, (781) 972-1342, tmclucas@healthtech.com .