[This is a sidebar to "The State of Mutation Curation."]
By Kevin Davies
March 29, 2011 | During his career as a medical geneticist in Belgium, first in academia and currently at the genetic diagnostics company GENDIA, Patrick Willems grew increasingly frustrated at the lack of comprehensive, publicly available, curated data on gene variants in genetic disease. Hence the creation of MutaDATABASE (www.mutadatabase.org), a free, publicly available, open-access online database that provides information on human disease genes and variants, as well as clinical information on patients.
“We’re facing a big problem that everyone is throwing away data,” says Willems. Appropriately, the logo for the database is a figure discarding a DNA molecule into a trash can. Willems admires the work of David Cooper and the Human Gene Mutation Database (HGMD), and credits Cooper with starting to catalogue gene mutations when no one was doing (or funding) it, but he says there are some deficiencies. “Only 100,000 variants in HGMD are classified. And if you extrapolate from the BRCA genes, we should have more than a million variants by now! 100,000 is nothing. None of the big [diagnostics] labs in the U.S. contribute variants [so far].”
There is nothing terribly original in MutaDATABASE, says Willems, no rocket science. But he can’t fathom why something like this wasn’t done before. “There is no way we can continue with manual variant assessment and variant submission,” he says. “This has to be done automatically. We have developed software so labs can do variant assessment, run in silico analysis, with a real-time link to the database.”
The groups that will soon start submitting variant data—mostly diagnostic labs and clinicians—will be able to see if the variant is already in the database, view frequencies, references, and submit the variant automatically. Variants will be grouped into one of five categories, from benign to pathogenic. “We think this sort of interaction between labs and a central database is basically the only way to facilitate the assessment and publication of variants,” says Willems.
“More importantly,” says Willems, “we want to add clinical information. The main objective is to get an in-depth phenotype-genotype correlation for each variant. The way we do that is quite nice: the MutaREPORTER software has integrated a nice piece of software called PhenExplorer, developed by Peter Robinson in Berlin. It’s an ontology of different symptoms... The software provides a tick box with the main features [for a given disorder] so a clinician can check off each symptom. It’s a real-time genotype-phenotype correlation.”
Willems is working with a team of genetics collaborators, including Heidi Rehm (see “The $1,000,000 Genome Interpretation,” Bio•IT World Sept 2010), Sherri Bale, Johan Den Dunnen, Bob Nussbaum, and a team of bioinformaticians headed by Frederik Decouttere and Martijn Devisscher. Willems says initial conversations with some leading diagnostics companies look promising, while GeneDx, Harvard Partners, and Emory are already uploading data. MutaDATABASE is also collaborating with Dick Cotton’s Variome Project. For now, funding is coming from GENDIA and contributing labs.
Willems isn’t bothered about which variant database outlives the others. “We just want to make things public—if they end up in other databases, that’s fine with me,” he says. “We don’t have a copyright, we have a copy obligation. Everything is free except the software. All the data are free. The database belongs to the people submitting the data. There is a non-profit MutaDATABASE foundation, and everyone submitting data is invited to join the foundation.” Of one thing, Willems is certain: “All the databases that have no software to automatically upload variants will disappear, because [in the future] nobody will manually put in variants.”
MutaDATABASE currently contains basic information on close to 20,000 genes, with links to other repositories. What is nice, Willems says, is a menu item for each gene with general information—genomic structure, physical location, references, tables and figures. The crucial test, of course, is the introduction of the variant data, which is just beginning as labs receive the first passwords and test the system.
“We want to find a curator for every gene over the next six months or so who confirms findings submitted by different labs and clinicians, which is a painstaking job.
“It’s fun for everybody!” says Willems. “It’s like Google Earth—a combination of a genome browser with database information.”