By John Russell
April 16, 2009 | Almost exactly three years ago, organizers of DREAM--Dialogue on Reverse Engineering Assessments and Methods--sketched an ambitious plan to help improve network predictions (inferences) in the life sciences. The plan was to conduct a yearly DREAM challenge with blinded data sets; invite predictions; hold a conference to discuss results; build a repository of better methods for network inference; and foster ongoing dialogue among computational biologists seeking to develop and use these tools.
So how’s it going?
“I wonder that same thing every day,” says Gustavo Stolovitzky, modestly. He is one of DREAM’s founding organizers.
No doubt more could be done, but much has been accomplished. Two sets of challenges, DREAM2 and DREAM3, have been conducted. Many of the lessons learned are contained in a text published in March, The Challenges of Systems Biology: Community Efforts to Harness Biological Complexity*, representing work by two community efforts: the European Network of Excellence (ENFIN) and the DREAM project. It contains a section on DREAM2.
The write-up of DREAM3 results is nearing completion. Preparations for DREAM4 are underway, with the next set of challenges to be posted sometime this summer. A new DREAM web site is set to go live in a matter of weeks. A collection of papers from DREAM3 will be published in the journal PLoS ONE. Interestingly, one of them is being prepared by a team that performed well in DREAM2 but badly in DREAM3 and has figured out, in part, why--which is the subject of the paper.
“They thought because they did poorly this year they wouldn’t be invited to contribute but I asked them to look retrospectively on both efforts and they feel they understand what happened. This paper will teach us something,” says Stolovitzky.
By virtually any measure, the DREAM project is a great success. The challenge, made stark by the competition’s results, is that network inference and prediction are difficult. The vast majority of predictions have been wrong--or, in DREAM parlance, no better (and sometimes worse) than chance. It is both sobering and hopeful.
“The bright side of the story is we did what we thought we were going to do and it wasn’t too painful,” says Stolovitzky, manager of functional genomics and systems biology at IBM Research and adjunct associate professor of biomedical informatics at Columbia University. He co-organized DREAM with distinguished researcher Andrea Califano of Columbia University.
In DREAM2, “We learned basically that the community is very varied. Sometimes our results are a little bit short of where we think they are. [On one challenge] basically eight of 11 groups predicted not very different from random.”
That particular challenge was as follows: “BCL6 is a transcription factor that plays a key role in both normal and pathological B cell physiology. Unpublished data on BCL6 genomic binding sites formed the basis of a challenge in which participants attempted to discriminate functionally validated BCL6 transcriptional targets from decoy targets, for which evidence indicates no physical and functional control.”
The first lesson, says Stolovitzky, is that “when people think they are doing something, sometimes they are prisoners of their own prejudices." Interestingly, groups that emphasized the subset of the data they judged most relevant to the particular biological question did poorly. “It turns out the groups that took a more eclectic multi-information [approach], in the sense that they took many kinds of information and integrated those, did better than those who had in mind what they should find.” DREAM2 is thoroughly discussed in a section of the new text (“Lessons from the DREAM2 Challenges,” pp. 159-195) prepared by Stolovitzky, Califano, and post-doc Robert J. Prill.
In retrospect, Stolovitzky believes the exclusive emphasis on network prediction in DREAM2 was a mistake. He says that to some extent networks can be thought of as constructs of the mind and perhaps less representative of how science works. DREAM3 includes more emphasis on predicting data values, something closer to how science works, he says. Forty teams produced more than 400 predictions in DREAM3, and again, most were relatively poor.
“Interestingly, the people who did the best in DREAM2 did very poorly in DREAM3 for the in silico challenge. And the people who did very well in DREAM3, I asked them to look retrospectively (DREAM2) and they didn’t do very well there either; so it seems that nuances of the data set might make a very good algorithm not so good. But this is only two data points,” says Stolovitzky.
Of course, the results are best examined in light of the specific question that was posed and space constraints here prevent presenting individual challenges in detail. It’s best to consult the recent text or the web site for a more complete explanation.
Another interesting DREAM3 finding, says Prill, is that “the winners, for the prediction of protein and cytokine levels, did a linear regression-like solution. It’s funny because I noticed we get somewhat upset, I don’t know why, when people do a kind of statistical regression, and like Gustavo says, it gives no rationale for why, but those tend to be winning strategies for certain types of problems.”
Says Stolovitzky, “I think that the hope, the maximum expectation we had was to learn how the circuits are wired, but like John Moult (CASP pioneer) once told us, it’s not unlike what happens in CASP--the methods that seem to be doing the best, at least when it comes to doing prediction, are methods that have more to do with statistical thinking or statistical learning rather than with physical things, you know mechanistic things. In my mind this is a little unsatisfying.”
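The “linear regression-like solution” Prill describes can be illustrated with a toy sketch (the data and names below are hypothetical, not any team’s actual method): fit an ordinary-least-squares line relating one measured quantity to another, then use it to predict an unseen value--statistical learning, as Stolovitzky notes, rather than a mechanistic model.

```python
# Minimal ordinary-least-squares sketch (hypothetical data):
# predict a "cytokine level" y from a single protein measurement x.
def fit_ols(xs, ys):
    """Return (slope, intercept) minimizing squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Toy, noise-free training data following y = 2x + 1, for illustration only.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_ols(x_train, y_train)
prediction = slope * 4.0 + intercept  # predicts 9.0 for x = 4.0
```

The point of the sketch is the one Stolovitzky finds unsatisfying: the fit says nothing about *why* the two quantities are related, yet on held-out data this kind of purely statistical predictor can outperform mechanistic models.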
DREAM, of course, is patterned after the decade-plus-old CASP (Critical Assessment of Techniques for Protein Structure Prediction), a similar effort organized to improve methods for the in silico prediction of protein structures from their amino acid sequences alone. CASP is a biennial competition.
Having successfully run two DREAM exercises, Stolovitzky and Prill are looking forward. Despite the relatively poor individual performances, when all participants’ results are aggregated and scored appropriately, the community does a good deal better than chance, which is the yardstick.
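One simple way to picture this community-aggregation effect is rank averaging--an illustrative assumption here, not DREAM’s actual integration scheme. Each hypothetical team scores candidate interactions; the consensus ranks each candidate by its average rank across teams, so one noisy team is outvoted by the others.

```python
# Sketch of community aggregation by rank averaging (hypothetical scores).
def rank(scores):
    """Rank candidates by score; highest score gets rank 1."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {cand: i + 1 for i, cand in enumerate(order)}

def aggregate(team_scores):
    """Average each candidate's rank across all teams (lower is better)."""
    ranks = [rank(s) for s in team_scores]
    return {c: sum(r[c] for r in ranks) / len(ranks)
            for c in team_scores[0]}

# Three hypothetical teams scoring candidate interactions A-D;
# team3 is noisy, but the consensus still favors A and B.
team1 = {"A": 0.9, "B": 0.7, "C": 0.2, "D": 0.1}
team2 = {"A": 0.8, "B": 0.6, "C": 0.3, "D": 0.2}
team3 = {"A": 0.1, "B": 0.9, "C": 0.8, "D": 0.2}
consensus = aggregate([team1, team2, team3])
# A and B end up with the best (lowest) average ranks.
```

Under this toy scheme the consensus beats the noisy team even though no individual predictor is changed--a rough analogue of the better-than-chance community result Stolovitzky describes.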
“There are things we would like to change and things we wouldn’t like to change,” says Stolovitzky. “There was one type of challenge, the in silico challenge, which delights the computational people and is abhorred by the biologists. That challenge seems to be useful and has the beauty of highlighting how people are doing because we can see whether the community as a whole is improving or not.”
On change: “I think we definitely want to create challenges that are more biologically meaningful, that teach us something new about the biology that we are looking at. You know the teams have the ability to do much more. We should be able to tap that intelligence and [we are] still struggling to find out how.”
It should be noted that several big pharma companies have participated in DREAM challenges, but have not been identified as such by the organizers. Stolovitzky says he also wishes more “tool” companies would participate, though they, like the biopharma world, are loath to have shortcomings revealed.
Just three years old, DREAM deserves the support of the computational biology community. Its fledgling efforts can help determine best approaches to network inferencing, gradually accumulate and promulgate well-characterized algorithms, and uncover new biology.
* The Challenges of Systems Biology: Community Efforts to Harness Biological Complexity represents work by two community efforts: the European Network of Excellence (ENFIN) and the DREAM project. It provides an overview of the state of the art in subdisciplines within systems biology and a clearer understanding of the strengths and weaknesses of algorithm development, the art and science of modeling in present-day biology, and the role of community efforts in today’s information-intensive biological research.
ENFIN addresses discrete function prediction, network reconstruction, and systems-level modeling, emphasizing the importance of strong collaboration between “dry” and “wet” laboratories. The DREAM project aims to foster collaboration between computational and experimental biologists to understand the limitations, and to enhance the strengths, of efforts to model and reverse engineer cellular networks from high-throughput data.
This article first appeared in Bio-IT World’s Predictive Biomedicine newsletter.