Data Science, AI, Genomics, & Much More: A Preview of #BioIT18

April 16, 2018 | We are in countdown mode to the Bio-IT World Conference & Expo. In just a few weeks, 4,000 attendees, exhibitors, and students will converge on the Seaport World Trade Center for our 17th meeting of the bio-IT community. Whether you are coming to tour the exhibit hall, watch the three awards programs, listen to the plenary speakers, network at Lawn on D, participate in the Hackathon, or dig deep into the latest science, you are sure to have a full and fruitful three days. We’re taking a break from our own meeting prep, awards judging, and other planning to mark our programs. Here’s a first pass at the events, talks, and programming we have flagged. –The Editors

There’s an all-star plenary program this year, starting with Mark Boguski, EVP and Chief Medical Officer at Liberty BioSecurity, on Tuesday evening. He’ll discuss creating a new healthcare ecosystem spanning population health to individualized care. On Wednesday morning, we’re hosting a panel of data science experts, John Reynders (Alexion Pharmaceuticals), Tanya Cashorali (TCB Analytics), Jerald Schindler (Alkermes), and Lihua Yu (H3 Biomedicine), who will share data science examples from across the enterprise (and answer some hiring questions). On Thursday, New York Times bestselling author Carl Zimmer will join us to discuss his new book, She Has Her Mother’s Laugh, and explore the complexity and controversy of heredity.

The awards programs at Bio-IT World never disappoint. Best of Show highlights the best new products exhibited on the Expo floor. Winners will be announced in several categories chosen by our judges, and the People’s Choice Award winner will be chosen by the Bio-IT community. (Finalists will be announced soon.) The Bio-IT World Best Practices Awards will again recognize examples of collaboration and innovation within the Bio-IT space, highlighting projects that push the life sciences forward. And finally, we are looking forward to hearing the Benjamin Franklin Award plenary address given by this year’s winner, Desmond Higgins.

Last year’s Bio-IT World Hackathon on FAIR data has given rise not only to another competition but also to a track on FAIR data. One of the keys to making data more findable, accessible, interoperable, and reusable is to make use of unique, permanent, and universally accepted identifiers and metadata, which link datasets semantically and allow users to draw useful inferences and learnings. We are looking forward to hearing from publishing executives from Elsevier and Nature Genetics on how to promote the wider use of FAIR data, as well as from speakers from AstraZeneca, Dell, and Mount Sinai about their FAIR data efforts.

John Jacquay of BioTeam will dig into iRODS. iRODS’ power is in its ability to virtualize the data, Jacquay told Bio-IT World earlier this year. To some extent, Jacquay said, "its flexibility is also part of its curse." Developers building tools on top of iRODS have to pay close attention to managing that metadata because with this type of system it is "garbage in, garbage out." Unless there is an effort to ensure that people put in good system metadata or manage it effectively, the ability to query it again is limited. He’ll share more about how organizations can leverage the features of iRODS to set up automated bioinformatics pipelines, optimize data storage media and access patterns, share and collaborate on data, and provide intelligent insight via data visualizations. Wednesday, May 16, 1:55 pm

In the new Machine Learning Track, Sanjay Joshi, CTO, Healthcare and Life Sciences, at H2O.ai, promises to cut through the hype about AI and life sciences with two clinical use-cases and then to walk through how similar principles can apply to financial and operational use-cases as well. Wednesday, May 16, 2:25 pm

As part of the NIH-funded Biomedical Data Translator project, Kimberly Robasky at the University of North Carolina, Chapel Hill, is working to integrate multiple, previously disparate datasets and empower investigators with new tools for data-driven patient subtyping. For example, through the Data Translator project, users can combine clinical records with exposure data in support of powerful models for classification. She will present the results from supervised and unsupervised machine learning models trained on real-world evidence (RWE) from asthma phenotypes curated in the Carolina Data Warehouse (CDW-H), combined with publicly available exposome data (e.g., PM2.5, ozone). Thursday, May 17, 10:40 am

Last year’s Bio-IT World Best Practices Winner in Knowledge Management was the Allotrope Foundation. Bayer is implementing the Allotrope Framework to escort scientific data throughout its complete lifecycle, from method development through acquisition, processing, reporting, and archiving to submission, using a common set of standard tools. Henning Kayser, R&D IT, Scientific Development IT, Bayer, will give an update. Wednesday, May 16, 11:30 am

In a lunch sponsored by Intel, Geraldine Van der Auwera, Broad Institute, will outline how the Broad worked to democratize access to its GATK Best Practices Pipelines. The approach: providing versions of production pipelines optimized for a range of platforms and priorities (e.g. cost vs. speed), validated by methods developers to ensure scientific equivalence. Wednesday, May 16, 12:40 pm

Christian Stolte of the New York Genome Center will discuss the visualization tools within MetroNome, NYGC’s variant repository database. MetroNome provides web-based tools that let users explore data through intuitive visualizations without writing code, while protecting privacy. It is built on a database that connects genomic and clinical data across projects and diseases. NYGC leverages public datasets for comparison and increased statistical power. Wednesday, May 16, 1:55 pm

Jill Chappell and Malika Mahoui, both at Eli Lilly, plan to present an integrated systems pharmacology approach that uses bioinformatics tools to aid the prediction of adverse drug reactions. High-quality annotated databases, supported by analytics and visual descriptive techniques, are being used to collate information about upstream and downstream proteins in a pathway along with their tissue distributions, and to integrate it with clinical information about other drugs known to interact with proteins in that same pathway and about those drugs’ adverse reactions. Wednesday, May 16, 2:25 pm

Taking a suggestion from last year’s plenary discussion, a Thursday morning panel will explore how to build a career in big data. Panelists Stephanie Hintzen (Dana-Farber Cancer Institute), Chrystal Mavros (Boston Children’s), Jeremy Jenkins (Novartis Institutes for BioMedical Research), Bino John (Dow-Dupont Ag Division), Joseph Lehar (Merck), Patrice Milos (Medley Genomics), Michelle Penny (Biogen), and Daniel Robertson (Indiana Biosciences Research Institute) will discuss what it takes to get hired and succeed in industry. Thursday, May 17, 10:40 am

Can AI Beat Cancer? That’s the question Jay (Marty) Tenenbaum, Founder and Chairman of the Cancer Commons, promises to tackle. AI can beat Go and drive cars, and Tenenbaum argues that it can help connect cancer doctors and patients to the right information at the right time. AI can plan and coordinate thousands of formal and informal treatment experiments that take place daily in oncology, optimizing individual outcomes and maximizing collective knowledge. Making this vision a reality will require global collaboration, and Tenenbaum will discuss opportunities for all to participate. Wednesday, May 16, 11:00 am

Luba Smolensky, Director of Data Science & Analytics, The Michael J. Fox Foundation, will describe the MJFF initiative for open source Parkinson’s disease research and data integration. MJFF is leading a Parkinson’s research data curation and standardization effort that will accelerate insights into the disease. The goal is to provide access to curated datasets across platforms for all researchers across academia, public institutions, and industry. Wednesday, May 16, 2:35 pm

William Van Etten, BioTeam, will describe a simple tool that leverages several AWS services (S3, Athena, Lambda, Cognito, IAM, CloudWatch) to let a biologist or geneticist drag and drop VCF and BAM files into an S3 bucket, then point a web browser at that bucket to get a scalable, serverless web UI for querying the reads and annotated variants within those files. He’ll explain what BioTeam learned from this proof-of-concept software development. Wednesday, May 16, 4:00 pm

Alexander Sherman, Harvard Medical School, will explore patient centricity and big data in clinical research. He’ll outline an implemented system for secure unique patient identification that allows for aggregation of information and data for people with diseases across studies and venues, thus creating a clinical and translational research ecosystem, in which clinical and phenotypical data are connected to biobanks, image banks, whole-genome sequences, -omics, patient-reported outcomes and mobile apps. Wednesday, May 16, 1:55 pm

At the Dana-Farber Cancer Institute, Catherine Del Vecchio Fitz and her colleagues built and deployed an automated clinical trial matching platform called MatchMiner (which won a 2017 Best of Show Award). The goal of MatchMiner is to match patients to relevant clinical trials efficiently and accurately using the patient’s genomic profile, rather than relying on manual trial pre-screening, which can result in missed opportunities. Automated matching against uniformly structured and encoded genomic eligibility criteria is essential to keep pace with the complex landscape of precision medicine clinical trials. Wednesday, May 16, 4:30 pm

Krista McKee (Takeda) and Raveen Sharma (Deloitte Consulting) will present Takeda’s Data and Analytics Hub platform, which was conceived, designed, and built to address issues of data transparency, trust, and accessibility. A critical use case of the platform—called Platypus—manages clinical data review/medical monitoring that will ultimately allow for efficient cross-study and cross-compound analysis. “We used powerful advanced analytics on the clinical data to give the medical reviewer that power of electroreception within the data. It’s the equivalent of swimming through muddy waters with their eyes closed, but still being able to see the food,” McKee told Bio-IT World earlier this month in a conversation about Platypus. Thursday, May 17, 3:00 pm

Enoch S. Huang, Head of Computational Sciences, Pfizer Worldwide Research and Development, plans to talk about roadblocks in his presentation on “Things I Didn’t Know I Needed to Know before Attempting to Implement a Cloud-Based Genomics Data Environment.” He’ll outline his experiences moving genomics data processing and analysis to a public cloud environment and the unanticipated challenges associated with implementation, he says, “most of which were not technical or funding-related. Nevertheless,” he continues, “I am optimistic about the future of this platform, and will be sharing the reasons why I believe that this strategy will ultimately produce sustainable solutions for pharma R&D.” Wednesday, May 16, 4:20 pm

Catherine Brownstein, Boston Children’s Hospital & Harvard Medical School, studies very early onset psychosis and other rare conditions, and it’s been a fruitful area of research that often requires going back and looking at—and sometimes revising—older cases. She’ll present a global platform for rare disease and give case studies and examples of how her team partners with clinicians to provide the best possible care. Wednesday, May 16, 2:25 pm