Hacking Bio-IT: Bio-IT World Launches First Hackathon

By Allison Proffitt

May 10, 2017 | At the end of May, the Bio-IT World Conference & Expo will host our first Bio-IT Hackathon, this year focusing on FAIR data: data that are findable, accessible, interoperable, and reusable.

Registration is free to individuals and teams who want to participate, and includes access to the Bio-IT World Expo Hall, all plenary keynote sessions, and several session talks on FAIR data. Three teams will receive cash prizes for their work.

In March 2016, a group of authors published a comment in Nature outlining “a concise and measureable set of principles that we refer to as the FAIR Data Principles.” The FAIR principles differ from other open data initiatives that focused on the human scholar, the authors said, instead putting “specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.”

Good data management shouldn’t be the goal in itself, the authors argued. Instead, data stewardship is merely the conduit for discovery, innovation, and data use. It’s a position that aligns well with the Bio-IT World community and our vision: connecting people and data for innovation and advances in precision medicine.

“We feel strongly that wider recognition of the value of FAIR Data principles and their application could have real benefits for improving the value of datasets and contributing to better knowledge generation across biomedical fields, including clinical trials and healthcare,” said Phillips Kuhl, President of Cambridge Healthtech Institute, which produces the Bio-IT World Conference & Expo.

With the help of ONTOFORCE, a Belgian company that develops semantic search technologies, Bio-IT World is launching our first hackathon, an opportunity to learn about FAIR data principles and apply them to the problems and datasets in the Bio-IT community.

FAIR data isn’t completely novel. There have been similar ideas in the past, some successful, some not, explains Filip Pattyn, scientific lead and product manager at ONTOFORCE. “But we have a lot of things in the right place now,” he believes, for the idea to take off. “There is a need, there is really a need, because the amount of data is growing in all kinds of niches and domains… Things are consolidating around FAIR data.”

Hans Constandt, founder and CEO of ONTOFORCE, believes the biggest bottleneck slowing our progress toward medical advances is the quality of data. “You can make the nicest tool, but crap [data] in is still crap [data] out. Better open data will really have an impact on science… If everybody would do a little bit of work making their data better quality… then the world would be a lot better off.”

ONTOFORCE’s user-friendly DISQOVER platform runs federated semantic search across an unlimited number of internal, external, and third-party data sources, accessed through a scalable cloud architecture. Parts of the DISQOVER platform are free to access, but the architecture can be applied to internal and private datasets as well.

FAIR Start

The Hackathon is open to anyone interested in learning more about FAIR data, data management, or open data, and working on a project. “Part of the idea of the hackathon is to provide people with more hands-on exposure to the concepts and practice of FAIR Data, with the goal of attracting people with domain expertise, data scientists, and software coders to come together and work as teams on open data sets to address better solutions for specific use cases,” Kuhl said.

We expect some attendees will be researchers with datasets they’d like to share either within their organization—a kind of open data—or with the larger research community. Other registrants will come with a problem or question that they feel sure could be addressed if only they could get to the right data.

The Hackathon is structured to provide guidance on FAIR data principles and approaches as well as hands-on application. The programming includes tutorials on FAIR data from the Dutch Techcenter for Life Sciences (DTL) and presentations on applied FAIR data from AstraZeneca, Amgen, Biogen, and ONTOFORCE. Attendees and registrants will learn more about the concepts and apply them immediately.

ONTOFORCE has been involved in several hackathons, and Pattyn has advice for entrants. “Try to make something that’s working.” Of course projects will be in their earliest stages (“first quick hacks”), but Pattyn insists that teams focus on working projects that are sustainable, not just ideas. The aim is to educate everyone about the philosophy of FAIR data, he said, but he hopes the projects will be more than fun or a game, and will be really useful and sustainable. To support the sustainability of projects, DTL is offering to facilitate publication of datasets during the Hackathon.

But it is a game, and there are prizes. Teams will present their work on Thursday afternoon at a live session on the Expo floor. A jury will consider the entries, and award three prizes: a $1,000 gift card and a speaking slot at Bio-IT World Conference & Expo 2018; $500 and registration to the 2018 event; and $250.

Stumped for ideas? Pattyn was willing to share ONTOFORCE’s planned project to get your ideas flowing.

In February of 2016, Foundation Medicine published a dataset of mutated genes in their pediatric cancer samples. It’s a very nice dataset, Pattyn said, accessible through a website Foundation Medicine designed to visualize and interact with the data. But for a researcher who wants to combine the dataset with other data, it is downloadable only as an Excel file.

“It’s not really linkable. You have to do some tweaking of the data, some conversions, some mapping, some copy and pasting. The data is not immediately usable for a researcher to implement it somewhere else,” Pattyn said. ONTOFORCE hopes to reformat that dataset during the Hackathon.
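For readers wondering what that kind of reformatting might look like in practice, here is a minimal sketch, not ONTOFORCE’s actual approach. It assumes the spreadsheet has been exported to CSV with hypothetical columns named sample_id, gene, and alteration; the example rows are made-up placeholders, not Foundation Medicine’s data, and the base URIs are illustrative assumptions. The idea is simply to replace bare strings with stable identifiers that other datasets can link to.

```python
import csv
import io
import json

# Hypothetical base URIs, chosen for illustration only; they are not
# taken from the article or from the Foundation Medicine dataset.
SAMPLE_BASE = "https://example.org/sample/"
GENE_BASE = "https://identifiers.org/hgnc.symbol:"


def rows_to_linked_records(csv_text):
    """Turn spreadsheet rows into records keyed by URIs instead of bare strings."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalise gene symbols such as " tp53 " before building the link.
        gene = row["gene"].strip().upper()
        records.append({
            "@id": SAMPLE_BASE + row["sample_id"].strip(),
            "gene": {"@id": GENE_BASE + gene},
            "alteration": row["alteration"].strip(),
        })
    return records


# Made-up placeholder rows standing in for a CSV export of the spreadsheet.
EXAMPLE_CSV = "sample_id,gene,alteration\nS001, tp53 ,R175H\nS002,NRAS,Q61K\n"

if __name__ == "__main__":
    print(json.dumps(rows_to_linked_records(EXAMPLE_CSV), indent=2))
```

A real conversion would also need agreed vocabularies for the column meanings and a persistent home for the identifiers, which is the kind of mapping work Pattyn describes.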

Pattyn applauded Foundation Medicine’s release of the data, but hopes data generators will think more broadly about how others use datasets. “That’s the kind of thing we’d like to change.”