Phil Bourne: Open Data Evangelist on NIH Data Plan

May 4, 2015

By Bio-IT World Staff 

May 5, 2015 | Before we get ahead of ourselves, Philip Bourne, Associate Director for Data Science at NIH, emphasized that in the 6D framework of patient-centered health we are still mired in “deception,” step two of six, and haven’t yet reached disruption. Democratization, the final stage, is still far on the horizon.

Bourne opened the 2015 Bio-IT World Conference & Expo in Boston with the plenary keynote, delivered to a packed auditorium. He last graced the Bio-IT World stage as the 2009 winner of the Benjamin Franklin Award for Open Science, recognized for his work on the Protein Data Bank, his service as a past president of the International Society for Computational Biology, and his role as founding editor of PLoS Computational Biology. At NIH, his commitment to open science remains firm.

The mission statement of NIH’s Office of Biomedical Data Science is to foster an open ecosystem that enables biomedical research to be conducted as a digital enterprise that enhances health, lengthens life, and reduces illness and disability, and to train the next generation of data scientists.

It’s a vision that will be central to the Precision Medicine Initiative.

Bourne sketched what the Initiative might achieve over the next ten years. A 50-year-old patient with diabetes may soon monitor her blood glucose wirelessly via an implantable chip. In five years she may adjust her drug dosing based on her sequencing results. In ten years her disease may be well controlled thanks to personalized dosing information.

But for those predictions to become reality, communities, policies, and infrastructure must work together as a balanced “three-legged stool”.

People are coming together spontaneously to build communities, Bourne said, citing the Global Alliance for Genomics and Health (GA4GH); NIH’s Big Data to Knowledge (BD2K) program, made up of 12 centers; FORCE11; and others. Hackathons and gamers are building the societies of the day, he said, and should be supported, encouraged, and funded.

Of course, he pointed out, the NIH funding ecosystem is fragile. Gamers bring “sheer energy and delight” to bioinformatics problems, Bourne said, but when NIH hosted gamers to tackle some of the most pressing big data problems, there was immediate pushback. One senator wrote an op-ed condemning the NIH for “playing games”.

Though tempted to ignore the criticism, Bourne quickly realized that a dismissive response could have serious consequences for all research funding. The episode shows how much progress depends on policy: both top-down policies from the government and bottom-up policies implemented within communities.

Bourne outlined some of his own suggestions.

Every organization, regardless of size, needs a machine-readable data-sharing plan. Data should be cited directly, legitimizing data as a form of scholarship. Software needs provenance.

Bourne pointed to a few very recent steps forward in this regard. In early April, NIH approved cloud hosting for dbGaP data. And Bourne outlined plans for the Commons, an ecosystem of data that is not restricted to research NIH has funded but is open to any participant.

The vision is a web of digital objects with unique identifiers, hosted in public clouds, on supercomputing platforms, and possibly in in-house or private clouds, all accessible to indexers and therefore searchable.

Soon, NIH will experiment with the idea of credit grants, Bourne said, encouraging grantees to spend those credits on Commons-compliant resources. The hope, Bourne said, is to drive competition within the marketplace and to have a “much better handle” on what data exist and where they are. NIH could also measure data usage over time and might “potentially” choose to sustain datasets after grants run out, Bourne said.

Finally, the new digital enterprise needs upgraded infrastructure. Here Bourne focused on data standards, particularly within the Commons, and on training.

There’s a need for data science specialists, Bourne said, and recruiting and training new workers needs attention. For instance, how does one choose among the various online and in-person data science courses? What career-path workshops are available? Bourne called for schemes to measure the utility of courses.