A Virtual Feast: A Preview of Bio-IT World 2020

September 1, 2020 | Bio-IT World Conference & Expo Virtual kicks off in a month. The program this year includes three days full of content, networking, and interaction across 16 tracks, plenary sessions, workshops, posters, awards, and happy hours. The virtual meeting platform offers integrated tools to stay connected with colleagues, reach out to speakers, ask questions during presentations and live Q&A panels, and tour virtual exhibit booths.

And for the first time ever, many presentations will be available on demand after their first showing. Attendees will be able to learn more, ask more questions, and derive more value from the program for weeks and months to come.

We are already busy marking our agendas, with notations we’ll be able to transfer to the virtual platform soon. Here’s some of what is on our current list. --The Editors

As usual, the event kicks off with workshops designed to be instructional, interactive, and in-depth. This year’s topics include data management for biologics, a crash course in AI, data science-informed decisions, data at scale, and more.

The 2020 Bio-IT World plenary program is exceptional. Susan Gregurick, NIH’s Office of Data Science Strategy, and Rebecca Baker, Director, HEAL, will share NIH’s strategic vision for data science, and how the pandemic will shape NIH’s future. Tuesday, October 6

Robert Green, Brigham & Women’s Hospital and Harvard Medical School, will share fascinating news on his latest sequencing project. And Natalija Jovanovic, Sanofi Pasteur, will share pharma’s view of the AI-enabled future. Thursday, October 8

A panel of experts including Seth Cooper, Northeastern University; Lee Lancashire, Cohen Veterans Bioscience; Pietro Michelucci, Human Computation Institute; and Jérôme Waldispühl, McGill University, will tackle how AI, citizen science, and human computation are working together with the help of gaming. Wednesday, October 7

And Trends from the Trenches—Bio-IT World’s annual deep dive into IT for life sciences—will celebrate its 10th year with an all-star panel including Vivien Bonazzi, Deloitte; Tim Cutts, Wellcome Trust Sanger Institute; Kjiersten Fagnan, Lawrence Berkeley National Laboratory; Matthew Trunnell, Data Commoner-at-Large; and—of course—Chris Dagdigian. Thursday, October 8

As commercial, governmental, and research organizations continue to move from manual pipelines to automated processing of their vast and growing datasets, they are struggling to find meaning in their repositories, says Terrell Russell, Chief Technologist, The iRODS Consortium at Renaissance Computing Institute (RENCI). Russell argues that, with an open, policy-based platform, metadata can do more than assist in search and discoverability. Metadata can associate datasets, help build cohorts for analysis, coordinate data movement and scheduling, and drive the very policy that provides the data governance. Data management should be data-centric and metadata-driven. Wednesday, October 7

Michael C. Conway, Technical Architect, Office of Data Science, NIH NIEHS, outlines NIEHS’ work to build its own Data Commons to manage today’s research data. Managing daily work while observing future trends and incorporating key capabilities, often in a tentative and piecemeal fashion, without losing sight of the big picture: this is the challenge we all face. Wednesday, October 7

Ian D. Harrow, Pistoia Alliance, will report on building a new toolkit to help the life science industry implement the FAIR (Findable, Accessible, Interoperable, Reusable) principles for data management and stewardship. The toolkit provides practical support by bringing together relevant methods for tools, training, and managing change, illustrated by use cases drawn mostly from the life science industry. These elements are assembled as one user-friendly and freely accessible website. Wednesday, October 7

In a collection of talks and a panel discussion, a team of Gen3 users and architects will present their experiences building patient platforms with Gen3. Speakers include Robert Grossman, University of Chicago; Christopher G. Meyer, University of Chicago; Gabriella Miller, Kids First Data Resource Center; Allison Heath, Children’s Hospital of Philadelphia; William Van Etten, BioTeam; and Daniel Huston, Bristol-Myers Squibb Co. For an inside look at the collaboration between BioTeam and Bristol-Myers Squibb to set up a Gen3 data commons, see Building A Commons: How Bristol-Myers Squibb And BioTeam Used Gen3 To Build A New Data Paradigm. Wednesday, October 7

Luis A. Mendez, Bristol Myers Squibb, will present advances in multiparameter flow cytometry analysis using machine learning algorithms. Both t-distributed Stochastic Neighbor Embedding (t-SNE) and FlowSOM algorithms are very effective in the comprehensive analysis and visualization of multiparameter flow cytometry data, resulting in a deeper understanding of disease biology at the single-cell level. Mendez will describe BMS’s cloud-based, high-performance compute environment coupled with GPU processing, deployed to overcome challenges with executing these CPU/RAM/GPU-intensive algorithms on large datasets. Wednesday, October 7

Rare disease patients suffer too often from long diagnostic delays and misidentified diseases. This creates a significant burden, not just for patients, but for healthcare systems. Tom Defay, Alexion, will share examples of collaboration with researchers and hospital systems to develop novel approaches for rare disease patient identification using tools like genomics, machine learning, and NLP. Thursday, October 8

Biomedical research over the last decade has become increasingly complex, and different disciplinary experts are needed to solve challenging scientific questions. No longer is a single disciplinary perspective enough for truly breakthrough research advances. L. Michelle Bennett, NIH NCI, explores how to build the most impactful interdisciplinary teams and how to keep them working effectively. (For more, see our conversation with Michelle at Training Scientists For Our Interdisciplinary Future.) Wednesday, October 7

John Quackenbush, Harvard Medical School, will outline how his group uses networks to understand genetic and genomic drivers of disease. By using innovative computational methods built around network representations of biological interactions, we can gain insight into the disease process, develop predictive biomarkers, and identify possible avenues of therapeutic intervention, he argues. Wednesday, October 7

Iman Tavassoly, Icahn School of Medicine at Mount Sinai, will present the mTOR system, a database he designed for exploring biomarkers and systems-level data related to the mTOR pathway in cancer. This database consists of different layers of molecular markers and quantitative parameters assigned to them through a current mathematical model and is an example of merging systems-level data with mathematical models for precision oncology. Wednesday, October 7

Alexander Sherman, Massachusetts General Hospital, will describe how MGH is pursuing patient centricity to bridge clinical trials data with real-world data (RWD), such as data from EHRs, DNA sequences, image banks, biobanks, and -omics. MGH is introducing patient-centric approaches with unique, secure patient identification and aligning incentives for all players in the research continuum, including academia, industry, government, patient advocates, and patients. Wednesday, October 7

Exploratory visualizations generated from clinical trials and real-world data sources provide important insights into safety, efficacy, and biomarker responses to novel and standard-of-care treatments, says Philip Ross, Bristol-Myers Squibb. He’ll explain how automation of data updates in near-real time increases the impact of this information on decision-making and can drive clinical and biomarker exploration. Wednesday, October 7

Two presenters from Takeda Pharmaceuticals (Yan Ge, Director, Data Analytics, Data Science Institute, and Erik Koenig, Principal Scientist, Translational Oncology, Head Strategy Innovation Management) will present how a knowledge-base analytics platform has empowered data-driven decision making and is transforming translational research. The Takeda R&D Data Hub has been established to maximize the value of data, make them FAIR, increase access for efficient analysis, and drive data-driven decision making. The Strategic Translational Oncology Research Knowledge-base (STORK) platform is a mission-critical strategic application leveraging both the R&D Data Hub and leading-edge Big Data technologies to harmonize the increasing data density of Immuno-Oncology Research and Development. STORK provides better-catalogued and enriched biomarker assay data and allows researchers to intuitively and easily query internal preclinical data, clinical trials data, and external data like full-text literature and clinicaltrials.gov sources using NLP. Furthermore, STORK’s self-service visualizations enable more efficient benchmarking, cross comparisons, and forward and reverse translational insights to support key decision-making throughout the therapeutic lifecycle. Thursday, October 8

Matthew Trunnell, Data Commoner-at-Large (former Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center), will present the Cascadia Data Discovery Initiative and its goal to accelerate health innovation and cancer research through collaboration, data sharing, and data-driven research. Wednesday, October 7

The future of the intersection of healthcare and the life sciences will be data- and process-focused, not application- or software-focused. “Bringing the analytics to data” is the challenge from an infrastructure and methods perspective. According to the FDA, Real-World Evidence (RWE) is defined as “the clinical evidence regarding the usage and potential benefits or risks of a medical product derived from analysis of Real-World Data (RWD): e.g., effectiveness or safety outcomes from an RWD source in randomized clinical trials or in observational studies.” Sanjay Joshi, Dell EMC, leads a topical, honest, and “real-world” panel to discuss the sources of RWD (EHR, Claims & Billing, Registries, Patient Reported Data, etc.) and their process implications for RWE and the future of clinical trials themselves. Wednesday, October 7

Many critical facts required by healthcare AI applications are locked in unstructured free-text data. Recent advances in deep learning, using novel healthcare-specific networks and models, have raised the bar on achievable accuracy for tasks such as named entity recognition, entity resolution, and de-identification. Vishakha Sharma, Roche Molecular Systems, will discuss how Roche applies the greatest advances in AI for healthcare to extract clinical facts from pathology and radiology reports. She will then detail the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale. Wednesday, October 7

Researchers use biomarker and outcomes data to model and predict adverse events. However, access restrictions to safeguard patient privacy necessarily slow down the rate of discovery and increase research costs via IRB review. Kimberly Robasky, Renaissance Computing Institute (RENCI), makes an argument for synthetic data that preserve patient-variable relationships. She will discuss current advances made by generative models in this area and the breakthrough AI technologies accelerating those advances. Wednesday, October 7

Digital transformation is still a driving principle in pharma R&D with the ultimate goal being to streamline processes and enable precision medicine. Anastasia Christianson, Janssen Pharmaceuticals, will present examples of digital technologies driving transformation and tangible results in R&D. Wednesday, October 7

While the value of FAIR data has been established, as have the costs of un-FAIR data, adoption lacks easy routes. Tom Plasterer, AstraZeneca, presents the Pistoia Alliance FAIR data toolkit and the Innovative Medicines Initiative (IMI) FAIRplus Cookbook, which offer frameworks for getting started. Key decisions on what to name things (e.g., identifiers) and their semantics (e.g., vocabularies) are critical at the start of the journey. Once established, FAIR knowledge graphs and FAIR analytic services become enterprise data-centric enablers. Wednesday, October 7

In less than a decade, CRISPR has evolved from a bacterial immune system to the foundation of a powerful, flexible genome editing technology that has already transformed biomedical research, spawned a $10-billion biotech industry, and is poised to make major strides in the clinic. Kevin Davies, the founding editor of Bio-IT World, has spent the past few years working closely with the CRISPR community as the Executive Editor of The CRISPR Journal. In this talk, he shares highlights of his new book, EDITING HUMANITY, to be released October 6, which explores the genesis of the CRISPR revolution, its impact on the gene therapy field, and the recent scandal involving the birth of CRISPR babies in China. Wednesday, October 7

Lara M. Mangravite, Sage Bionetworks, will describe a radically open approach to diversifying the Alzheimer’s disease (AD) drug portfolio. Using multi-omic and genetic models of disease built from human brain data, a suite of emerging therapeutic hypotheses is generated to complement the small set already in drug development. To catalyze rapid evaluation of these targets, target enabling packages containing computational and experimental resources, including prototype drug compounds, are developed and openly distributed for use across the research community. Thursday, October 8

Tanya Cashorali, TCB Analytics, leads a panel discussion on how to create an effective data and analytics strategy answering the questions: Which data are actionable? What is the end goal? How do you build out an organization? Are you sure you know what problem you are trying to solve? How do you set up an analytics environment? She’s joined by panelists Lauren Young, Beam Therapeutics, and Heather Shapiro, Pear Therapeutics. Wednesday, October 7

The Hutch uses EasyBuild for building software containers and all software for its computer cluster. John Dey, Fred Hutchinson Cancer Research Center, will share more details about the software stack and how the Hutch shares its work with the global community of EasyBuild users. Wednesday, October 7

Carolina Nobre, Harvard University, reports on the state of the art in visualizing multivariate networks. Multivariate networks are made up of nodes and their relationships (links), but also data about those nodes and links as attributes. Most real-world networks are associated with several attributes, and many analysis tasks depend on analyzing both relationships and attributes. Visualization of multivariate networks, however, is challenging, especially when both the topology of the network and the attributes need to be considered concurrently. Nobre will analyze current practices and classify techniques along four axes: layouts, view operations, layout operations, and data operations. She will also provide an analysis of tasks specific to multivariate networks and give recommendations for which technique to use in which scenario. Finally, she will survey application areas and evaluation methodologies. Wednesday, October 7

Ritu Kamal, Illumina, will present Illumina’s TruSight Software Suite, offering ready-made infrastructure to analyze and interpret rare disease variants. Powered by DRAGEN variant-calling, this software platform can evaluate all rare disease variant types within a single interface. Intuitive variant filtering, visualization and curation enable laboratories to perform streamlined interpretation and generate customizable reports. Wednesday, October 7

Biomedical researchers have access to many data sources, but finding data with specific characteristics remains a challenge. Datasets have different metadata, format, and structure. The Broad Institute envisions a simpler and more comprehensive search capability to allow researchers to find and reuse data across many datasets. Kathy Reinold proposes a cross-domain data model built specifically to facilitate search and reuse and will share methods, lessons learned, and status. Wednesday, October 7

Kjiersten Fagnan, Lawrence Berkeley National Laboratory, will present the National Microbiome Data Collaborative: A FAIR data resource for microbiome research. This multi-lab collaborative partnership will pilot an integrated, community-centric framework within 27 months to fully leverage existing microbiome data science resources and high-performance computing systems available within the DOE complex for data access, integration, and advanced analyses, Fagnan says. She will cover some of the challenges in microbiome data science and how the partnership aims to overcome these by creating a large, open-access repository of FAIR data. Wednesday, October 7

In a pair of talks from Bristol-Myers Squibb speakers, Ajay Shah and Albert Wang will present Sage, a comprehensive platform for innovation with data, and how BMS researchers use Sage to maximize real-world assets. Shah will start by giving an overview of essential components of the platform, such as uniform high-quality data ingestion, data lake enhancement with semantic integration and conformance of data, and a reproducible research framework. Wang will highlight how Sage catalogs, models, integrates, conforms, and presents patient-level metadata across all RWD assets to facilitate downstream cross-dataset analysis within an integrated managed analytics environment. The talks will touch on the business drivers for this initiative, current progress, and some lessons learned. Wednesday, October 7