Trends from the Trenches: Data, Discovery, Politics
By Allison Proffitt
May 13, 2025 | Ari Berman, CEO of BioTeam, opened the Trends from the Trenches session at the Bio-IT World Conference & Expo by following the data journey from generation to application and outlining the pain points along the way. At the end of his presentation, he got political, challenging the audience to stay engaged in science and the scientific community no matter the political winds.
Berman began with a striking historical perspective, noting that before 1865, the average human lifespan was just 35 years. Today, the average American lives to 79.3 years—a dramatic increase largely attributable to advancements in life sciences research. "The first major inflection point in 1865 was the onset of germ theory and its use in surgery; learning how to wash our hands doubled the human lifespan," Berman explained, underscoring how fundamental scientific discoveries have transformed human health.
The Data Explosion Challenge
A central theme of Berman's talk was the unprecedented volume of data being generated by modern scientific instruments. He noted that humanity produced approximately 24 zettabytes of data last year alone—the equivalent of 12 gigabytes per person per day. Life sciences research is a significant contributor to this data deluge.
"If you just multiply that out, you have to be prepared as a nation for 166 exabytes per year being produced just from laboratories," Berman said, emphasizing the scale of the challenge.
This explosion of data comes from numerous sources including genomic sequencing, high-resolution microscopy, high-content screening for drug discovery, and various scanning technologies. Each laboratory might generate over two petabytes of data annually, creating massive storage and analysis challenges.
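To make the scale concrete, a quick back-of-envelope calculation (illustrative only, not from Berman's slides) shows how a per-lab figure of roughly two petabytes per year relates to the national number he cited:

```python
# Back-of-envelope scaling, illustrative only: the 2 PB/lab and 166 EB/year
# figures are from the talk; the implied lab count is simply their quotient.
PB = 10**15
EB = 10**18

per_lab_per_year = 2 * PB        # roughly what one data-intensive lab generates annually
national_per_year = 166 * EB     # Berman's "be prepared as a nation" figure

labs_implied = national_per_year / per_lab_per_year
print(f"{labs_implied:,.0f} labs x 2 PB/year ~ 166 EB/year")   # about 83,000 labs
```

Under these assumptions, it takes on the order of tens of thousands of such laboratories to reach the national figure—well within the realm of plausibility for US research output.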
The IT-Science Disconnect
The storage and analysis required to manage data at that volume cannot be treated as merely an IT problem, Berman said. He called the fundamental disconnect between IT departments and scientific researchers a persistent problem. "Low IT budgets, or flat IT budgets, it’s a cost to be controlled. There's not enough infrastructure for scientists, and so they do it themselves, or find someone to do it for them," he explained, a dynamic that results in the "shadow IT" phenomenon.
This disconnect manifests in several ways:
- Low mission alignment between IT and organizational goals
- Poor communication between scientists and IT professionals
- Mutual mistrust about resource needs and allocations
- Lack of understanding about the importance of computing as a laboratory tool
"The reality, though, is that high performance computing, computing in general, advanced computing, it is a laboratory tool now," Berman emphasized. "And it needs to be treated as such by IT, by an organization, because you cannot do modern analytics without some form of advanced compute."
The Data Stewardship Challenge
Another critical issue Berman addressed was the struggle to implement FAIR (Findable, Accessible, Interoperable, Reusable) data principles in life sciences. Despite recognizing the importance of these principles, implementation remains limited.
"Why can't we get to FAIR data universally right now? The short answer is people, not technology," Berman stated bluntly. He criticized the prevailing culture where data stewardship is delegated to specialists rather than being embraced as a responsibility by the scientists generating the data.
"If you're generating data, it's your job to take care of it. It's your job to be a good data steward and have a mature data culture. Asking IT to curate, maintain, and make decisions about your data is crazy," he argued.
Science DMZs, an approach BioTeam has recommended in the past, are not a solution either, he said. “Science DMZs on the network, which is basically move all your science outside of the main network and make that a fast path for your data, it's a band-aid. The guy who invented it says it's a band-aid.”
Cloud, of course, is another solution. Berman described a "resurgence of on-prem versus cloud-based planning" in scientific computing. While cloud providers like AWS, Google Cloud Platform, Azure, and Oracle offer increasingly sophisticated high-performance computing options, he recommended a hybrid approach for established organizations.
"Build on-prem HPC for things that are 80% utilization or greater. That's going to be a much, much more cost-effective way to do things," he advised, while suggesting cloud resources for less frequently used applications.
AWS still has the most mature cloud services for life sciences, Berman reported, but he highlighted Oracle’s offering as particularly intriguing.
“They have a vast and mature HPC offering that understands life sciences,” he said. “They have dense and available GPUs and partnerships with all the other cloud providers, which is interesting. They even waive egress fees to and from GCP and Azure.”
What Really Works
The real solution to data volume problems, Berman said, is to create a dialogue between IT and research: understanding science’s needs, taking inventory of the data, and developing an actionable strategy. He recommends listening to consultants and internal champions, the squeaky wheels who have been flagging problems all along.
In true Trends from the Trenches fashion, Berman named names. Starfish Storage and Hammerspace can help users understand how and where data are distributed. Hybrid storage and compute environments are preferred, he said; you can and should have both cloud and on-premises options. Use data platforms to consolidate data, data management policies, accessibility, and usability. Use open standards. Use archive tiers and software-defined storage systems in concert with data platforms, he advised. “We see a lot of VAST, a lot of WEKA out there and they’re good systems and they work.” Invest in your network, he emphasized, recommending modern software-defined networking, zero-trust networks, and micro-segmentation.
“If we don't solve this problem, you can't get to FAIR, AI, or any of the other ‘Promised Lands,’” Berman said. “Don't focus on AI readiness. Make your data analysis ready. Do all of your data… Recast IT services as a mission capability and make investments on it, not an IT budget to be controlled.”
The AI Incentive
In some ways, Berman said, AI hype is actually helping incentivize data stewardship and data management. “Now people are like, oh crap, we have to do this, or we can't do AI,” he said. “So it's getting marginally better.”
But he still characterized AI as "a buzzword meant to make money fall from the sky" and placed most AI technologies at the "peak of inflated expectations" on the Gartner hype cycle.
For organizations pursuing AI initiatives, Berman outlined several prerequisites for success:
- Budget allocation for people, compute, and storage with an identified return on investment
- Well-defined use cases with testable outcomes
- High-quality curated training data
- Advanced computing infrastructure with GPUs
- Expertise in AI engineering
He also reminded the audience that AI represents less than 20% of all life sciences workloads, cautioning against neglecting the other 80%. "Only 30% of life sciences codes are accelerated for GPUs," he noted, warning that exclusively focusing on GPU infrastructure would "alienate 70% of your researchers."
Looking to the Future
In the final portion of his talk, Berman expressed concern about the impact of political changes on life sciences research, particularly cuts to agencies like NIH, NSF, and FDA.
He highlighted the importance of diversity in science: "Diversity is not an initiative, it is a scientific fact. People from different backgrounds think differently, and they have different perspectives that are required in order to solve the world's problems."
He also predicted a flood of talent coming to the job market from funding cuts. “At every organization I've talked to, layoffs are being planned, okay, because they don't know how they're going to do it. Scientific progress is going to slow. The blast radius is huge… If you can hire, grab the incredible talent that's going to flood the market very shortly.”
Finally, he urged the community to persist in supporting scientific progress despite uncertainty. He recommended posting datasets to public repositories—anonymously if necessary—so they won’t be lost, and actively looking for alternative funding sources.
"Keep doing and supporting as much science as possible for as long as you can," he encouraged. "Educate as much as possible. Help the general public understand the benefits to them of scientific progress."