

Interpreting DREAM3



GUEST COMMENTARY

By Robert J. Prill and Gustavo Stolovitzky
IBM Computational Biology Center, IBM T.J. Watson Research Center

Dec. 4, 2008 | Between October 29 and November 2, 2008, a motley crew of 500 experimental biologists, computational biologists, algorithm designers, and assorted other practitioners of systems biology met at the Broad Institute in Cambridge, MA. The occasion was a nucleation of three sister conferences: the fourth RECOMB Systems Biology conference (chaired by Andrea Califano), the fifth RECOMB Regulatory Genomics conference (chaired by Manolis Kellis), and the third annual Dialogue on Reverse Engineering Assessments and Methods (DREAM) Conference, or DREAM3 (chaired by Gustavo Stolovitzky). DREAM3 convened on October 31.

Several months earlier, DREAM participants downloaded voluminous data sets from the DREAM database and attempted this year’s reverse engineering challenges, a set of prediction problems inspired by current trends in experimental biology research. In 2008, 40 teams submitted a total of 413 predictions in one or more of the four challenge categories: Signaling Cascade Identification, Signaling Response Prediction, Gene Expression Prediction, and the In Silico Network Challenges. Participants predicted measurements that had been withheld, or in the case of the In Silico Challenges, predicted the network structures underlying computer-generated “measurements.”

The spirit of the conference was light-hearted, inclusive, and fun. Participation was anonymous, so there were no professional repercussions for trying a crazy idea to solve a challenge. Only the best performers in each category were identified by name, and it has become customary for them to reveal their winning strategies to their peers at the conference. In addition, there were presentations by the data producers, without whose efforts there would be nothing to predict. This year’s data producers were Gregoire Bonnet (MSKCC, Challenge 1), Peter Sorger, Julio Saez-Rodriguez and Leonidas Alexopoulos (Harvard, Challenge 2), Guillaume Bourque and Neil Clarke (Genome Institute of Singapore, Challenge 3), and Daniel Marbach (EPFL, Challenge 4).

It can be argued that the entire human endeavor to understand the world is a grand exercise in reverse engineering. Based on observations, we construct simplified models of the world that enable the mind to grasp complex concepts. And even though models can be useful abstractions, they can sometimes be misleading. The best test of a model is its ability to predict a situation that has not yet been observed. In practice, to validate models we often settle for a prediction of something that is known, but set aside as if it were not (e.g., “leave some out” style validation). It is more scientifically rigorous, however, if the people making the prediction are not allowed to look at the answer while they work.

Because of a paucity of well-validated data, the typical situation in reverse engineering research is to predict an answer that we knew all along. This can lead to a confounded validation design. The problem is not that the researcher is dishonest. The problem is that knowledge of the answer inevitably creeps into our predictions despite rigorous technical controls to keep it out. When we know the answer in advance, it is all too easy to convince ourselves that we are gaining traction on a problem when, in fact, we are not. As the adage goes, “If you torture the data long enough, it will confess to anything.”

Post-genomic measurement technologies have unleashed an explosion in the volume of quantitative cell biology readouts, and the trend is accelerating. There has been a concomitant rise in the application of machine learning, and similar frameworks, to infer the structure and function of the cell based on quantitative cell biology measurements. Despite enthusiasm for this work, there are reasons to doubt that the individual efforts of research groups working in isolation will be sufficient to move this body of work towards the ultimate goal of reverse engineering the cell. Every journal article and every conference paper reports an improvement in the state of the art, but is the field as a whole really making progress relevant to the ultimate goal?

Patterned after the time-honored Critical Assessment of Protein Structure Prediction (CASP) conference and associated challenges, DREAM fills a void left by the individual efforts of isolated research groups. The DREAM organizers provide an unbiased (blind) assessment of participants’ predictions on a set of annual reverse engineering challenges inspired by current biological research. However, beyond functioning as a referee and scorekeeper, DREAM achieves a higher purpose. It is a barometer of the efficacy of the current state of the art in reverse engineering algorithms in systems biology. If the community as a whole is not moving in a productive direction, at least there is some hope that the problem can be identified and the course can be altered.

A fundamental question that DREAM is in a position to answer is: Are the algorithms improving from year to year? Given only two years of challenges, there is not enough data to make a definitive statement. However, it is a curious fact that teams that predicted the In Silico networks well in 2007 did not predict well in 2008. There can be many reasons, but one possibility is that in 2007, In Silico “measurements” were generated in the absence of noise, whereas in 2008 additive Gaussian noise was present. The 2008 best performer on the In Silico network challenge specifically acknowledged the noise and turned it into a feature for learning the network. Upfront exploratory data analysis was necessary to identify appropriate features for learning.
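To make the difference between the two years concrete, here is a minimal sketch of what "additive Gaussian noise" means for simulated measurements. The function name, the noise level, and the example values are hypothetical and chosen only for illustration; they are not the settings used to generate the DREAM data.

```python
import random

def add_gaussian_noise(values, sigma, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise added to each entry."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [v + rng.gauss(0.0, sigma) for v in values]

# Hypothetical noise-free expression levels for one simulated gene over time.
clean = [0.10, 0.50, 0.90, 0.50, 0.10]

# The same trajectory with additive Gaussian noise (sigma is an assumed value).
noisy = add_gaussian_noise(clean, sigma=0.05)
```

A method tuned on trajectories like `clean` may latch onto small differences that, in `noisy`, are indistinguishable from measurement error, which is one plausible reason a strong result on noise-free data need not carry over.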

Another lesson to emerge from the DREAM challenges is that an eclectic mix of methods can predict better than any individual method. When the individual predictions in the DREAM3 Signaling Cascade Identification challenge (Challenge 1) were combined into an overall community prediction, the community’s accuracy was statistically significant, whereas no individual team’s prediction was. The community seems to have identified subtle signals in the provided flow cytometry data that were informative of the underlying signaling network structure.
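One simple way to form a community prediction is to average each candidate edge's rank across teams. The sketch below is an illustration under that assumption; the commentary does not say how the DREAM3 predictions were actually combined, and the team scores and edge names here are invented.

```python
from statistics import mean

def rank_edges(scores):
    """Map each edge to its rank (1 = most confident) under one team's scores."""
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return {edge: i + 1 for i, edge in enumerate(ordered)}

def community_prediction(team_scores):
    """Order edges by their average rank across all teams (lowest first)."""
    ranks = [rank_edges(s) for s in team_scores]
    edges = ranks[0].keys()
    avg = {e: mean(r[e] for r in ranks) for e in edges}
    return sorted(edges, key=lambda e: (avg[e], e))

# Hypothetical confidence scores from three teams for the same candidate edges.
teams = [
    {"A->B": 0.9, "B->C": 0.4, "A->C": 0.1},
    {"A->B": 0.3, "B->C": 0.8, "A->C": 0.2},
    {"A->B": 0.7, "B->C": 0.6, "A->C": 0.5},
]
consensus = community_prediction(teams)
```

Working with ranks rather than raw scores sidesteps the fact that different methods report confidences on different scales, so no single team dominates the average simply by using larger numbers.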

Whereas a handful of teams score very high on some challenges, a sobering result consistent from year to year is that the vast majority of teams do not achieve accuracy better than random guessing, no matter what the challenge. For a practitioner, the value of this kind of instant feedback on the efficacy of a method should not be underestimated. It may be the most tangible benefit of participating in DREAM.
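A practitioner can run a rough version of the "better than random guessing" check at home with a permutation test. The sketch below assumes the area under the ROC curve as the accuracy measure and uses invented scores and labels; it is not a description of the scoring procedure the DREAM organizers applied.

```python
import random

def auroc(scores, labels):
    """Chance that a random true item outscores a random false one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """Fraction of random label shufflings that score at least as well as the real labels."""
    rng = random.Random(seed)
    observed = auroc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auroc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical prediction confidences and the withheld truth (1 = real, 0 = not).
scores = [0.95, 0.90, 0.80, 0.70, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

observed = auroc(scores, labels)
p_value = permutation_p_value(scores, labels)
```

A small `p_value` means that few random orderings do as well as the submitted one. The caveat raised earlier still applies: the test is only meaningful if `labels` were genuinely unseen while the method was being developed.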

DREAM is a dialogue because we are all still learning what it is to reverse engineer biological systems. At this early juncture, dialogue is what is needed most.

----------------------------------

This article first appeared in Bio-IT World’s Predictive Biomedicine newsletter.
