Data Visualization's Use In A Data-Driven World

April 4, 2019

Complex concepts require researchers to communicate information clearly and efficiently. Data visualization makes the vast amounts of data researchers work with accessible across numerous areas, such as drug discovery and clinical development.

No one understands that more than Jeremy Goecks, an assistant professor of Biomedical Engineering and Computational Biology and head of the Goecks Lab at Oregon Health and Science University. Goecks’ lab helps lead development of Galaxy, a Web-based scientific analysis platform used by thousands of scientists throughout the world to analyze large biomedical datasets, including genomic, proteomic, metabolomic, and imaging data.

As medicine becomes increasingly data-driven, data visualization will be key to the progress of its various fields.

On behalf of Bio-IT World, Mana Chandhok spoke with Goecks about data visualization’s impact on oncology, various challenges he encountered as he developed Galaxy, and what technologies might revolutionize the field in the next 5 years.

Editor’s note: Mana Chandhok, a Conference Producer at Cambridge Healthtech Institute, is planning a track dedicated to Data Visualization & Exploration Tools at the upcoming Bio-IT World Conference & Expo in Boston, April 16-18. Goecks will be speaking on the program. Their conversation has been edited for length and clarity.

Bio-IT World: What is the one thing that you want to tell the world about data visualization, especially in the oncology field?

Jeremy Goecks: Oncology has become increasingly data-driven, using tumor molecular profiling to understand and guide treatment decisions. Good visual design and visualizations are key tools that help physicians and patients understand and use molecular results appropriately for making treatment decisions. Complex concepts such as gene copy number gain/loss and pathway activity levels require effective visualizations that report individual patient results, show those results in the context of a larger cohort, and connect results to potential treatments and likelihood of success.
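
To make that idea concrete, here is a minimal sketch of the kind of cohort-context visualization Goecks describes: an individual patient's result shown against the distribution for a larger cohort. Everything in it is illustrative; the cohort values are synthetic and the patient's log2 copy-number ratio is a made-up example, not data from any real study.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic cohort: log2 copy-number ratios for one gene across 200 patients.
# In practice these values would come from a molecular profiling pipeline.
rng = np.random.default_rng(0)
cohort_log2_ratios = rng.normal(loc=0.0, scale=0.4, size=200)

# Hypothetical individual patient result: a copy-number gain for this gene.
patient_log2_ratio = 1.2

fig, ax = plt.subplots(figsize=(6, 3))
ax.hist(cohort_log2_ratios, bins=30, color="lightgray", edgecolor="white",
        label="Cohort (n=200)")
ax.axvline(patient_log2_ratio, color="crimson", linewidth=2,
           label=f"Patient (log2 ratio = {patient_log2_ratio})")
ax.set_xlabel("log2 copy-number ratio (hypothetical gene)")
ax.set_ylabel("Number of patients")
ax.set_title("Individual result shown in cohort context")
ax.legend()
plt.tight_layout()
plt.show()
```

Placing the patient's value against the cohort distribution is what lets a physician read a single number as "gain" or "loss" relative to other patients rather than in isolation.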

You helped to develop Galaxy, a Web-based scientific analysis platform. Can you explain some of the challenges that you encountered and some of your key takeaways for others who are developing platforms that analyze many types of data?

Ensuring that Galaxy can adapt as new data types, analysis tools, and modes of computing (e.g., cloud computing) are developed for biomedical data science is an ongoing challenge. The Galaxy team addresses this challenge by using a modular software architecture that enables novel data types, tools, and compute resources to be “plugged in” to Galaxy in a general way. Another key challenge is scaling Galaxy to support analyses of large numbers of datasets such as entire patient cohorts, which are increasingly common as it has become easy to generate genomics, proteomics, and imaging data for many individuals. For the past several years we have worked to extend Galaxy so that users and bioinformaticians can analyze data in “collections” of hundreds or thousands of files that can be processed as a single unit. This greatly simplifies analysis of large numbers of datasets.
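
Galaxy's actual collection machinery is considerably more involved, but the underlying pattern, treating many files as one analyzable unit, can be sketched in a few lines of Python. The `count_records` analysis and the `cohort_vcfs` directory below are hypothetical stand-ins, not Galaxy's API.

```python
from pathlib import Path

def count_records(path: Path) -> int:
    """Toy per-file analysis: count non-comment lines (stand-in for a real tool)."""
    with path.open() as handle:
        return sum(1 for line in handle if not line.startswith("#"))

def run_on_collection(paths: list[Path]) -> dict[str, int]:
    """Apply the same analysis to every dataset in a 'collection' as one unit."""
    return {path.name: count_records(path) for path in paths}

# Hypothetical collection of per-patient variant files.
collection = sorted(Path("cohort_vcfs").glob("*.vcf"))
results = run_on_collection(collection)
for name, count in results.items():
    print(f"{name}: {count} records")
```

The point of the pattern is that the user invokes one operation on the collection rather than launching hundreds of per-file jobs by hand; the platform handles the mapping.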

What new technologies or strategies do you think will revolutionize the field in the next 5 years and why?

Visual analytics, the coupling of interactive visualization with analysis tools, will dramatically speed up the data analysis loop and enable much more rapid data exploration, hypothesis generation, and evidence-based decision making. Visual analytics will be aided by increased access to large computing clusters and cloud computing through a Web browser. In oncology, multi-scale computational modeling—integrated models from the individual cell up to the entire tumor microenvironment of tumor, stromal, and immune cells—will enable us to understand and predict tumor behavior at much higher fidelity than we currently can, as well as design new treatment approaches more rapidly. Biomedical computing will also benefit from robust workflow platforms that enable scalable and reproducible computing on large datasets, making it possible to perform unbiased comparisons of different methods across tens of thousands of datasets and to integrate numerous well-performing approaches to produce the best results.