Sage Bionetworks and the Thriving Ecosystem of Challenge-Based Discovery

By Maxine Bookbinder 

June 2, 2014 | “How do you get a million eyeballs on your project?” asked Stephen Friend, president, co-founder and director of Sage Bionetworks, in his keynote at the 2014 Bio-IT World Expo. “What is the way you can get the rest of the world to see and try to help you solve something? What’s the most efficient way to get the truth?”

The answer, Friend believes, is scientific contests or challenges.

He’s not alone. The culture of privacy in healthcare, pharma, academia, and government-supported research is yielding to broader data sharing and an influx of new open challenges aimed at finding therapies more cheaply, more quickly, and with fewer resources.

The scientific intent of each challenge varies, but the overarching goals are to drive innovation and benchmarking. “We have so much data and so many unsolved questions, we need more people developing computational methods that drive modern biomedicine forward,” says James Costello, assistant professor at the University of Colorado Anschutz Medical Campus and Director of Computational and Systems Biology Challenges in the Sage Bionetworks/DREAM organization. “Challenges push science to be open.”

According to a 2013 paper authored by Costello and Gustavo Stolovitzky, manager of functional genomics and systems biology at IBM and director and founder of the DREAM (Dialogue for Reverse Engineering Assessments and Methods) initiative, “challenge-based competitions refer to a framework for addressing fundamental research questions in which the community is presented with a challenge, the data to address the challenge, and independent, unbiased assessment to rank submitted solutions.” (Clin Pharmacol Ther. 2013 May; 93(5):396-8.)

A few items from the growing list of challenge topics include text mining; systems biology; predicting protein structure, drug sensitivity, and biomarkers for Alzheimer’s Disease; clinical genome interpretation; visualization (see, “Illumina Showcases New Visions in Genomic Interpretation”); cellular network inference; species translation (see, “sbv IMPROVER Launches Species Translation Challenge”); and creating patient-centric websites and apps (see, “Lilly’s Design Challenges Take on Patient User Experience”).

“Challenges bring diversity by leveraging the phenomenon of the ‘wisdom of the crowds,’” says Costello.

“There are a lot of really smart people out there,” he adds. “In the span of only several months, if a challenge attracts 100 groups, and each group spends 100 hours on the problem, then that’s 10,000 person-hours. This out-performs what most companies can do on their own.”

James Costello, Director of Computational and Systems Biology Challenges at Sage Bionetworks.

The excitement of entering a challenge, solving a problem, and winning a prize encourages collaborative competition. While some prizes are monetary, others include authorship, peer recognition, and a stronger footing for future grant applications.

For a recent DREAM breast cancer challenge, Science Translational Medicine agreed—“remarkably!”—to reserve a publication slot for the winning team, Friend said at the Bio-IT World Expo (see, “Topple the Walls, Open the Data”). “The interest here has changed the incentive-rewards structure. So instead of the classic peer review, why not have a challenge-assisted review? Why not let the person who performed the best get the paper, instead of the standard peer review process?” he proposed.

However, the biggest motivation is becoming part of a community addressing a common problem, and the greatest reward is actually solving the problem. “People want the methods they develop to contribute to scientific discovery,” notes Costello. “The open science idea drives innovation quicker. The motivation is to nucleate a community around a set of challenges and find a solution to the problem. I have interacted with many people I never would have met outside these challenges.”

Traditional research has its limits: data can take years to gather and may not be published for years after that; research, and potential discovery, is tightly held; and in-house researchers may inflate their evaluations of methodologies, highlighting their methods’ strengths and ignoring their weaknesses, a bias termed the “self-assessment trap.”

Challenges, on the other hand, offer consistent, unbiased assessment of methods: evaluations are blinded, participation is diverse, and innovation is cumulative. The push to tackle a challenge within a specific timeframe, using current data and the best available methods, motivates hard work, and community collaboration on fundamental research questions brings new voices to the conversation. Individuals worldwide, with different skills and mindsets, can take part in solving problems. For example, physicists may analyze and approach problems differently than biologists, physicians, or engineers.

Open challenges make high-quality and well-annotated data accessible, allowing other researchers to continue or even complete previously unsolved challenges. Some challenges implement code sharing. For an informatics challenge, an entrant in New York can submit code, someone in Sweden can modify the code and out-perform the original entrant, and someone else in India can then further improve on that submission.

Not everyone is on board, however.

Critics complain that challenges are too short and could negatively shape how researchers approach scientific problems, encouraging a rush to results. Other researchers remain protective of personal data, ideas, and potential earnings.

In response, some challenge sponsors, such as Sage Bionetworks, have initiated collaborative contracts: agreements delineating how participants can share information while still maintaining data privacy and first publishing and production rights. DREAM challenges also track the provenance of participant submissions, which gives contributors appropriate, and deserved, publication and recognition opportunities.

The 2012 Sage-DREAM breast cancer prognosis challenge required participants to submit their methods as open-source R code. Submitted code was run on a gold-standard evaluation data set, was viewable to all participants, and produced scores that were reported immediately to a leaderboard. “The immediate feedback and code sharing allowed participants to better their performance by resubmitting improved code,” says Costello.
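To make that workflow concrete, here is a minimal sketch of what challenge-style scoring can look like in R, the language the challenge required. It assumes the leaderboard compares each submission’s predicted risk scores against held-out survival outcomes using a concordance index; the function names, feature columns, and toy model below are illustrative assumptions, not the challenge’s actual interface.

```r
library(survival)

# A participant's submission: any function that returns one risk score per
# patient (higher score = worse predicted prognosis). Toy baseline only;
# the feature columns here are hypothetical.
predict_risk <- function(features) {
  features$age + features$tumor_size
}

# Organizer-side scoring: run the submitted function on the gold-standard
# evaluation set and compute a concordance index against observed survival.
score_submission <- function(submission, features, gold) {
  risk <- submission(features)
  fit <- survConcordance(Surv(gold$time, gold$event) ~ risk)
  unname(fit$concordance)  # single number posted to the leaderboard
}
```

Because the score is a fixed function of the submission and a common evaluation set, every team is measured against the same yardstick, and an improved resubmission can be ranked the moment it arrives, which is what makes an immediate-feedback leaderboard workable.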