GUEST COMMENTARY
By Robert J. Prill and Gustavo Stolovitzky
IBM Computational Biology Center, IBM T.J. Watson Research Center
Dec. 4, 2008 | Between October 29 and November 2, 2008, a motley crew of 500 experimental biologists, computational biologists, algorithm designers, and assorted other practitioners of systems biology met at the Broad Institute in Cambridge, MA. The occasion was a nucleation of three sister conferences: the fourth RECOMB Systems Biology conference (chaired by Andrea Califano), the fifth RECOMB Regulatory Genomics conference (chaired by Manolis Kellis), and the third annual Dialogue on Reverse Engineering Assessments and Methods (DREAM) Conference, or DREAM3 (chaired by Gustavo Stolovitzky). DREAM3 convened on October 31.
Several months earlier, DREAM participants downloaded voluminous data sets from the DREAM database and attempted this year’s reverse engineering challenges, a set of prediction problems inspired by current trends in experimental biology research. In 2008, 40 teams submitted a total of 413 predictions in one or more of the four challenge categories: Signaling Cascade Identification, Signaling Response Prediction, Gene Expression Prediction, and the In Silico Network Challenges. Participants predicted measurements that had been withheld or, in the case of the In Silico Challenges, predicted the network structures underlying computer-generated “measurements.”
The spirit of the conference was light-hearted, inclusive, and fun. Participation was anonymous, so there were no professional repercussions for trying a crazy idea to solve a challenge. Only the best performers in each category were identified by name, and it has become customary for them to reveal their winning strategies to their peers at the conference. In addition, there were presentations by the data producers, without whose efforts there would be nothing to predict. This year’s data producers were Gregoire Bonnet (MSKCC, Challenge 1), Peter Sorger, Julio Saez-Rodriguez and Leonidas Alexopoulos (Harvard, Challenge 2), Guillaume Bourque and Neil Clarke (Genome Institute of Singapore, Challenge 3), and Daniel Marbach (EPFL, Challenge 4).
It can be argued that the entire human endeavor to understand the world is a grand exercise in reverse engineering. Based on observations, we construct simplified models of the world that enable the mind to grasp complex concepts. And even though models can be useful abstractions, they can sometimes be misleading. The best test of a model is its ability to predict a situation that has not yet been observed. In practice, to validate models we often settle for a prediction of something that is known, but set aside as if it were not (e.g., “leave some out” style validation). It is more scientifically rigorous, however, if the people making the prediction are not allowed to look at the answer while they work.
Because of a paucity of well-validated data, the typical situation in reverse engineering research is to predict an answer that we knew all along. This can confound the validation design. The problem is not that the researcher is dishonest. The problem is that knowledge of the answer inevitably creeps into our predictions despite rigorous technical controls to keep it out. When we know the answer in advance, it is not impossible to convince ourselves that we are gaining traction on a problem when, in fact, we are not. As the adage goes, “If you torture the data long enough, it will confess to anything.”
Post-genomic measurement technologies have unleashed an explosion in the volume of quantitative cell biology readouts, and the trend is accelerating. There has been a concomitant rise in the application of machine learning, and similar frameworks, to infer the structure and function of the cell based on quantitative cell biology measurements. Despite enthusiasm for this work, there are reasons to doubt that the individual efforts of research groups working in isolation will be sufficient to move this body of work towards the ultimate goal of reverse engineering the cell. Every journal article and every conference paper reports an improvement in the state of the art, but is the field as a whole really making progress relevant to the ultimate goal?
Patterned after the time-honored Critical Assessment of Protein Structure Prediction (CASP) conference and associated challenges, DREAM fills a void left by the individual efforts of isolated research groups. The DREAM organizers provide an unbiased (blind) assessment of participants’ predictions on a set of annual reverse engineering challenges inspired by current biological research. However, beyond functioning as a referee and scorekeeper, DREAM achieves a higher purpose. It is a barometer of the efficacy of the current state of the art in reverse engineering algorithms in systems biology. If the community as a whole is not moving in a productive direction, at least there is some hope that the problem can be identified and the course can be altered.
A fundamental question that DREAM is in a position to answer is: Are the algorithms improving from year to year? Given only two years of challenges, there is not enough data to make a definitive statement. However, it is a curious fact that teams that predicted the In Silico networks well in 2007 did not predict well in 2008. There may be many reasons, but one possibility is that in 2007, In Silico “measurements” were generated in the absence of noise, whereas in 2008 additive Gaussian noise was present. The 2008 best performer on the In Silico network challenge specifically acknowledged the noise and turned it into a feature for learning the network. Upfront exploratory data analysis was necessary to identify appropriate features for learning.
Another lesson to emerge from the DREAM challenges is that an eclectic mix of methods can predict better than any individual method. When the individual predictions in the DREAM3 signaling cascade identification challenge (Challenge 1) were combined into an overall community prediction, the community’s accuracy was statistically significant, whereas no individual team’s prediction was. The community seems to have identified subtle signals in the provided flow cytometry data that were informative of the underlying signaling network structure.
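The commentary does not say exactly how the individual predictions were merged. One simple way to build a community prediction is rank averaging, sketched below under stated assumptions: each team is assumed to assign a confidence score to the same list of candidate network edges, each team's scores are converted to ranks, and the ranks are averaged per edge. The function names and the toy scores are hypothetical and for illustration only; this is not presented as the DREAM3 aggregation procedure.

```python
def ranks_within_team(scores):
    """Convert one team's confidence scores into 0-based ranks.

    The lowest score gets rank 0 and the highest gets len(scores) - 1.
    Ties are broken by edge order, because sorted() is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, edge in enumerate(order):
        ranks[edge] = rank
    return ranks


def community_prediction(team_scores):
    """Average each candidate edge's rank across all teams (hypothetical scheme).

    team_scores: list of per-team score lists, all covering the same edges
    in the same order. Higher returned values mean stronger community support.
    """
    n_edges = len(team_scores[0])
    if any(len(s) != n_edges for s in team_scores):
        raise ValueError("every team must score the same candidate edges")
    per_team = [ranks_within_team(s) for s in team_scores]
    return [sum(team[e] for team in per_team) / len(per_team) for e in range(n_edges)]


# Toy example: three hypothetical teams scoring four candidate edges.
teams = [
    [0.9, 0.1, 0.5, 0.3],
    [0.6, 0.2, 0.8, 0.1],
    [0.7, 0.4, 0.3, 0.2],
]
consensus = community_prediction(teams)
ordering = sorted(range(len(consensus)), key=lambda e: -consensus[e])
print(consensus)  # mean rank per edge
print(ordering)   # edges from most to least supported by the community
```

Working with ranks rather than raw scores puts every team on a common scale, so a team that happens to report very large confidence values cannot dominate the combined prediction. In this toy example the community ranks edge 0 first, then edges 2, 1, and 3.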
Whereas a handful of teams score very high on some challenges, a sobering result, consistent from year to year, is that the vast majority of teams do not achieve accuracy better than random guessing, no matter what the challenge. For a practitioner, the value of this kind of instant feedback on the efficacy of a method should not be underestimated. It may be the most tangible benefit of participating in DREAM.
DREAM is a dialogue because we are all still learning what it is to reverse engineer biological systems. At this early juncture, dialogue is what is needed most.
----------------------------------
This article first appeared in Bio-IT World’s Predictive Biomedicine newsletter.