#BioIT18: Data Science Definitions

May 29, 2018

By Joe Stanganelli

Data science was one of the driving themes at this year's Bio-IT World Conference & Expo, so it only makes sense that one of the plenary keynote sessions focused on defining what "data science" actually means.

Kicking off day two of the conference, four data-science executives convened for a panel where they were asked to present their own definitions of this murky buzzterm and to expound upon them. Here's what they had to say:


Cashorali: Optimize the Results

[Image: DataScience1]

This hedging answer by TCB Analytics founder-CEO Tanya Cashorali, which addresses what data science can be and what its goals are instead of what it actually is, reflects her philosophy on the fluidity of data-science practice.

"I don't think anybody really has a gold standard of what data science is," said Cashorali. "It's everything from… the dirty janitorial work to summarizing the core data."

More to the point, Cashorali emphasized her view that data science is (or should be) more outcome-driven than process-driven.

"If you can't make a decision or take an action on your analysis, it's essentially rendered useless," said Cashorali. "Ultimately, it's all about the quality[.]"

On this point, Cashorali shared one of her war stories about a manufacturing arm of a pharmaceutical company that would shut down its production for six months whenever there was a corruption in a batch, while a team of five attempted to diagnose the problem by manually plugging in SQL queries. After Cashorali helped them better visualize and connect the data they had, she reported, a batch corruption now requires only a single person, who can address the problem in less than a day.

"It's really like an insurance policy that you're buying," said Cashorali. "This stuff is just thinking with common sense about your data and how [to] optimize processes like this."

 

Reynders: Optimize the Question

[Image: DataScience2]

At the other extreme, Alexion's Vice President of Data Science, Genomics, and Bioinformatics, John Reynders, presented (right or wrong) perhaps the most rigorous "data science" definition among the panelists. Moreover, for Reynders, data science is about the process rather than the outcome.

"What data science is not, once you have the answer, [is asking], 'Well, how do we repeat this? How do we build a tool? How do we build a platform?' That's very different from data science," said Reynders. "Data science is: 'What is the question that you're trying to ask?' … [Then] you ask the question, you use all these tools that you see on the screen, and you've got the answer."

The corollary to this, Reynders explained, is that data science is about identifying askable and answerable questions.

"We have an incredible toolbox at our disposal," said Reynders. "The first step is not asking the question, but helping colleagues know what questions they can [ask and] answer."

 

Schindler: Democratize Understanding

[Image: DataScience3]

As much as Reynders may have found his position at odds with Cashorali's, Jerald Schindler, Alkermes's Vice President of Biostatistics, disagreed with Reynders much more (or agreed, depending upon one's point of view). For Schindler, data science is neither process nor outcome—which is to say that it is process and outcome…and the kitchen sink.

"I think [data science] is the whole package. It's everything. It's not just the people at the end of the line… [but it's also, respectively,] people who are collecting the data, storing the data, building the process and platforms to store the data… [and] analyzing the data and looking at it to make insights and decisions," said Schindler. "All these people are data scientists."

As fluffy as this may seem, Schindler had a justifiable bone to pick with the real-world impracticalities of the academic explanations of his peers.

"I think people sometimes get bogged down in the formula of the algorithm," said Schindler. "It's not the formula that you use as the explanation. [So] why did you use it?"

Indeed, for Schindler, data science appears to be as much about understanding as it is about flexibility.

"I've asked people, 'Why did you pick this value to represent what you want to do with that data?'" said Schindler. "They say, 'Well, that's what they always do!'"

"Why is it appropriate for the situation that you're in?" Schindler continued. "There are people who are so busy that all they do is push buttons [and] run the right program… You really have to understand what's inside."

Accordingly, Schindler's explanation of data science is reconciled with the caveat present in his original definition—that data science stems from a cross between two rigorous STEM fields where logic is everything.

 

Yu: Optimize and Democratize Everything

[Image: DataScience4]

Amid this respectful discord, Lihua Yu, H3 Biomedicine's Chief Data Science Officer, found common ground.

"I don't disagree with what people have said already," said Yu. "I agree with everyone; data science is everything… Data science has to focus on the questions…and, most importantly, the solutions."

To these competing definitions, Yu added the idea that data science represents, on top of everything else, enablement via communication and facilitation. Yu described vigorous conversations in which her data-science team gets internal business partners to think about and properly reframe their questions, leading to better results.

Indeed, Yu perhaps found herself most in alignment with Schindler, right down to his caveat on understanding: she related that she has a sign in her office that reads, "If you cannot explain it simply, you do not understand it."