Atomic Changes: Reshaping Our Data Journey in Small Steps

By Allison Proffitt

November 22, 2022 | Mike Tarselli describes his as a “journeyman’s career in the life sciences.” He started college pre-med and got a job as an emergency medical technician. “I realized things like blood and wounds made me hit the deck,” he said. “That’s not a good career path, maybe, if you’re going to faint when you see your proximal case in front of you.”

Tarselli needed a new plan, so he pursued synthetic chemistry, interned with biotech and pharma, and finally decided that the data world offered the most fascinating breadth of projects. Today, Tarselli is Chief Scientific Officer at TetraScience.

He recently sat down with Stan Gloss, founder of BioTeam, for Bio-IT World’s Trends from the Trenches podcast to explore the latest trends in how the life sciences industry talks and thinks about data.

They started with an idea that Gloss has been exploring for the past year: the theory that while small biotechs view data as fuel, big pharma companies generate data almost as a byproduct of their drug discovery efforts.

With his broad experience, Tarselli’s take was generous. He recalled Jay Bradner, then-president of the Novartis Institutes for BioMedical Research, addressing that issue at a company town hall.

“He stood up and, instead of saying, ‘Let’s talk about our drug programs. Let’s talk about our modalities or our basic research,’ he said, ‘Guys, look. We produce data. Data is our outcome… We’ve got to get to the data, and we’ve got to understand how to house it, what to do with it, etc.’ It was very refreshing to hear a large pharma leader coming around to that.”

That was 2018, Tarselli said, but he is still not sure it’s the primary approach in most large pharmas. The perspective is probably more common for small biotechs without the stability and existing product line that pharma has.

Another shift in drug discovery has been the move from a project mindset that yields outputs to a product mindset that yields outcomes, Gloss continued. Defining a “product” as anything that can be consumed, Gloss challenged researchers to think of any dataset as a product that future users will compute against. “Wouldn’t the person who’s doing a complicated AI and machine learning run benefit from us thinking of the dataset they’re about to receive, [and having] pride in producing a high-quality product?”

“I love the product mindset; it’s very valuable,” Tarselli agreed. No one consumes projects except other teams, investors, and project managers. Products, on the other hand, are built to specifications, are high quality, and can be reused, he added.

But Tarselli raised the bar further. Shared Data Use policies, another recent buzzword and impactful effort, still aren’t productized. Simply keeping some gels and raw data is not “an encapsulation of what that data means and how you’ll wrap it up so that it’s useful for the end consumer the way it would be if you put it, say, as a package on GitHub, or as an open source Python library,” he said. “Those are products.”

Gloss and Tarselli joked about companies announcing new initiatives to become data-driven discovery companies. “Weren’t you already data-driven? Haven’t you always been deriving data in some way?” Gloss quipped. They suggested “data-informed” instead. “Ultimately what we’re really trying to get to… is man-machine symbiosis: this whole idea of augmented intelligence where the human and machine work together as a partnership,” Gloss said.

Start Small

That vision will require data we can put to work, and Tarselli outlined his advice for that transformation.

“First, pause. Pause and look around,” he said. “Every datum that you’re going to generate is a reflection of a human or physical or mental process. You should probably look around and say, ‘Is this the way I want my lab to run, I want my process to run, etc.,’” Tarselli said. He recommends “getting your process house in order” first, which may mean acquiring new equipment or automating steps in your process.

Only then does he advise looking carefully at data: What kinds of data come in? From where? To whom? He recommends thinking up front about universally unique identifiers (UUIDs); standard ontologies, definitions, and glossaries; and standard APIs for microservices. “Doing these very, very basic things sounds a little bit pedestrian, but if everybody did these little 1% and 5% hacks here and there, you would have a 20x better laboratory in two years.”

Digital or data transformation initiatives tend to fail when they go too big, Tarselli noted. Trying to change the whole business at once is too large a project and overwhelms teams. The secret to success, he believes, is small, additive changes.