By John Russell
July 14, 2008 | How good is your algorithm for turning experimental data—mainly gene expression or protein interactions—into accurate pictures of biological networks? If the results from the first DREAM challenge are any indication, plenty of progress is needed to improve pathway predictions from data.
The DREAM initiative—Dialogue for Reverse Engineering Assessments and Methods—is intended to help the life science community improve computational techniques for inferring networks. Datasets from known networks are provided and researchers are invited to uncover the true underlying networks using computational techniques. In this first “competition,” there were five challenges, 36 participating teams, and 110 predictions.
“DREAM is trying to understand whether we predict something meaningful when we take high-throughput data like gene expression and we say, well this is the network of interactions,” says one of DREAM’s organizers, Gustavo Stolovitzky. “Usually you validate with five or six connections that you either do in the lab or validate through literature. But actually you have made 1,000 predictions and you only cherry pick the five that match your data, so in a way, we don’t know whether we are fooling ourselves or whether really we have something in those predictions.”
On balance, this year’s predictions were lousy. Talking about one specific challenge, Stolovitzky says, “These were 200 genes of which 53 were true positives and the rest were not. Some people did well in the first 10 or 15, but then really not so well. At the same time many, many teams, some of which are very well known people that have one of these algorithms as their favorite algorithm [did very poorly] and their favorite algorithm is very bad, really bad, as bad as random so if you put a random predictor you will have predicted better.”
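The comparison Stolovitzky describes — how well the top of a ranked gene list does versus a random predictor — comes down to precision at a cutoff. A minimal sketch, using hypothetical gene names and the numbers from his example (200 candidates, 53 true positives; a random ranking's expected precision at any cutoff is the base rate, 53/200 = 0.265):

```python
import random

def precision_at_k(ranked_genes, true_positives, k):
    """Fraction of the top-k predictions that are true positives."""
    hits = sum(1 for g in ranked_genes[:k] if g in true_positives)
    return hits / k

# Hypothetical setup mirroring the challenge described above:
# 200 candidate genes, 53 of which are true positives.
genes = [f"gene{i}" for i in range(200)]
true_pos = set(genes[:53])

# A random predictor just shuffles the candidate list; an algorithm
# that beats it should show precision well above 0.265 near the top.
random.seed(0)
random_ranking = genes[:]
random.shuffle(random_ranking)

for k in (10, 15, 50):
    p = precision_at_k(random_ranking, true_pos, k)
    print(f"precision@{k}: {p:.2f} (random baseline = {53/200:.3f})")
```

A team that "did well in the first 10 or 15" would show high precision at those small cutoffs, while an algorithm "as bad as random" would hover around the base rate throughout.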
Predictions of protein interactions were even worse. Stolovitzky says simply, “If you, today, go with a set of proteins and sequences and you give it to someone who says, ‘I can predict which of these proteins are interacting,’ probably he is going to give you garbage. And that is a truth, but it is important to know that this is the case. It’s not to just roast the people who didn’t do well. It’s just to understand where we should improve.”
Ouch. Not surprisingly, no commercial tool providers tackled the challenges. And let’s be clear—as Stolovitzky is—the idea isn’t to roast anyone. The goal is to learn which algorithms work best, to assist in developing better algorithms, and to help the entire community move forward. When the DREAM2 results were released, the names of the teams were not revealed.
Stolovitzky is an adjunct associate professor of Biomedical Informatics at Columbia University and manager of Functional Genomics & Systems Biology at IBM Research. He notes that despite the poor results of the first set of challenges, much good work is being done by researchers using similar techniques.
Asked for his thoughts on recent work led by Merck researcher Eric Schadt in identifying key networks involved in metabolic disease by interpreting gene expression and other data (see “Merck’s Informatics Mission,” Bio-IT World, May 2008), he says, “They are great. I think he’s doing something that we all should be doing, and we are not doing in DREAM, which is he puts together a lot of [different kinds of] information like quantitative analysis, gene expression, and clinical information.”
Plans are already afoot for DREAM3, which will be held in conjunction with the 5th Annual RECOMB Satellite on Regulatory Genomics and the 4th Annual RECOMB Satellite on Systems Biology, October 29-November 2 (http://compbio.mit.edu/recombsat/). The meeting is jointly organized by the Broad Institute of MIT and Harvard, and the MIT Computer Science and Artificial Intelligence Lab (CSAIL). “We would like to cordially invite anyone to submit network inferences and answers to our new biological prediction challenges. To access the data sets and descriptions of the challenges, please go to http://wiki.c2b2.columbia.edu/dream/index.php/The_DREAM3_Challenges,” say Stolovitzky and DREAM3 co-organizer Andrea Califano.
Interest in DREAM is growing, reports Stolovitzky. The International Society for Computational Biology is creating a team of students to participate in DREAM, he says. He also says organizers will probably ask researchers whether they want to be identified next year. Given that some well-regarded researchers fared less well, this might be an opportunity for others to gain notice.
In any case, for the foreseeable future, consistent prediction of underlying networks from just a few data types and using only computational techniques is still something of a dream.
This article appeared in Bio-IT World Magazine.