Steve Brenner is addressing the challenges, both sociological and technical, of shared genomic data.
November 16, 2010 | In 2007, University of California, Berkeley computational biologist Steve Brenner published a provocative commentary in Nature proposing the Genome Commons, an initiative to expedite the creation of tools and resources for personal genome interpretation. Brenner wanted to encourage the genome community to see the wisdom of taking this on. “To my surprise and disappointment, it didn’t really have much uptake,” Brenner told Bio•IT World recently. “It was only after that that I went about seeing how I could muster the resources to take this on myself.”
Brenner has had mixed success in that endeavor until now. Last year, he recruited Reece Hart from Genentech to be the Genome Commons chief scientist at Berkeley, but Hart left after less than 12 months to join an in silico drug design company called Numerate. “Reece’s transition [from industry to academia] didn’t go quite as anyone expected,” said Brenner. “Reece is really terrific and it’s a huge blow to have him leave.”
Capturing external funding sources has also not gone as smoothly as Brenner would have liked, though he said he is confident that an industrial source will be confirmed shortly.
Brenner initially identified five main goals for Genome Commons (see http://genomecommons.org), chief of which, says Brenner, is: “Let’s collect our information in one place.” He bemoans the fact that genotype and phenotype information is either siloed, of inconsistent quality, or both. “There are more databases for these [disease-related] genes than there are genes,” he says, referring to resources such as OMIM, GeneTests, PharmGKB, dbSNP, DECIPHER, the NHGRI’s GWAS database, and many more. HGMD has the largest array of annotated mutations, but as Brenner notes on the Genome Commons Website, database creator David Cooper, frustrated by his inability to secure grant funding, partnered with a commercial firm that seeks user license fees. “As we enter the era of personal genomes, there is a profound new impetus for suitable open resources,” says Brenner.
“We want to create a Genome Commons Database, open access and open source,” says Brenner. “It’s a severe sociological problem.” He acknowledges that creating and amalgamating life sciences databases is tough. “You can imagine, they wouldn’t be very happy if someone came along and swooped up all their data and put it somewhere else, and they got no credit for the effort they put in there.”
That is why his efforts for now are more sociological than scientific or technological: he is working with NCBI, EBI, and clinical genetics groups, among others. “We could build it ourselves, or help someone like NCBI or PharmGKB to build it,” said Brenner.
This December, Genome Commons will convene a promising community experiment called CAGI (the Critical Assessment of Genome Interpretation) to evaluate computational methods for predicting phenotypes based on genome variation data (http://genomeinterpretation.org). The program will be modeled on the successful CASP (Critical Assessment of Structure Prediction) meetings, which had a profound impact on methods for 3D protein structure prediction (see “On the CASP of a DREAM,” Bio•IT World, Nov 2006).
Brenner says the first CAGI—he dubs it “Pre-Pro CAGI” to emphasize this is just the first iteration—is not designed to pick winners per se “but find challenges and lay groundwork to improve methods in the future.”
The idea is for participants to take genetic variants and predict the resulting molecular, cellular, or organismal phenotypes. These predictions will be evaluated against experimental characterizations. “CASP had a profound impact,” says Brenner, by pushing researchers to make the best use of evolutionary information about protein structure. “The whole field turned based on using alignment information. We want to do the same thing for genome interpretation. Here’s a [gene] variant: predict!”
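To make the idea of scoring predictions against experimental characterizations concrete, here is a minimal sketch in Python. It is an illustration only: the article does not say how CAGI assessors will score submissions, so the choice of Pearson correlation, the function names, the variant identifiers, and every number below are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def score_submission(predicted, measured):
    """Score one submission against experimental values.

    Both arguments map a variant identifier to a numeric phenotype value
    (a hypothetical format, not CAGI's). Only variants present in both
    dictionaries are compared.
    """
    shared = sorted(set(predicted) & set(measured))
    xs = [predicted[v] for v in shared]
    ys = [measured[v] for v in shared]
    return {"n_compared": len(shared), "pearson_r": pearson(xs, ys)}


# Made-up values for illustration only; not real CAGI data.
measured = {"variant_A": 0.10, "variant_B": 0.55, "variant_C": 0.90}
predicted = {
    "variant_A": 0.20,
    "variant_B": 0.50,
    "variant_C": 0.80,
    "variant_D": 0.30,  # no experimental value, so it is ignored
}

result = score_submission(predicted, measured)
print(result)
```

A correlation close to 1 would mean the predicted values track the measured ones closely. A real assessment would likely need several complementary measures, since a single number can hide systematic errors.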
Brenner is encouraging people to submit predictions, which will then be assessed and evaluated. He hopes that CAGI will help identify bottlenecks in genome interpretation, inform critical areas of future research, and connect researchers from diverse disciplines whose expertise is essential to develop powerful methods for genome interpretation.
Datasets are being contributed by the likes of George Church and Jasper Rine, while the assessors will be Pauline Ng (Genome Institute of Singapore) and Gad Getz (Broad Institute).
This article also appeared in the November-December 2010 issue of Bio-IT World Magazine.