Open Source ‘Floats All Boats’ in AI-Driven Drug Discovery
By Deborah Borfitz
March 17, 2026 | More than two dozen “grand challenges” remain in small-molecule drug discovery, spanning everything from target selection to developing a medicine that can be safely used in the clinic, according to Woody Sherman, Ph.D., founder and chief innovation officer at PsiThera. No single company can solve all these problems alone, which underscores the importance of computational tools, many of which are built on open-source software.
At the same time, drug discovery companies have grown increasingly interested in automated “closed-loop” systems that combine artificial intelligence (AI), machine learning (ML), robotics, and rapid bioassays to accelerate discovery. The goal is to optimize the design-make-test-analyze cycle in near real time.
These closed-loop cycles can be built with or without open-source tools, Sherman notes, but they typically involve large, interconnected workflows linking multiple software platforms with laboratory automation and continuously learning from experimental data. While the concept is compelling, most implementations today remain early and are rapidly evolving. Fully operational systems are still rare because building them is “enormously complex,” he says.
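To make the design-make-test-analyze idea concrete, here is a deliberately tiny sketch of such a loop. It is not how any company named in this article builds its system. The one-dimensional "design parameter," the `simulated_assay` function standing in for synthesis, robotics, and bioassays, and the simple step-halving search rule are all assumptions made for illustration.

```python
from typing import List, Tuple


def simulated_assay(x: float) -> float:
    """Hypothetical stand-in for the make and test steps.

    A real loop would synthesize a compound and measure it; this toy
    function simply scores a design parameter, peaking at x = 0.7.
    """
    return -(x - 0.7) ** 2


def design(best_x: float, step: float) -> List[float]:
    """Design step: propose two candidates around the current best, kept in [0, 1]."""
    return [max(0.0, best_x - step), min(1.0, best_x + step)]


def run_closed_loop(cycles: int = 20, start: float = 0.1,
                    step: float = 0.2) -> Tuple[float, List[Tuple[float, float]]]:
    """Run a toy design-make-test-analyze loop and return the best design and all results."""
    best_x, best_score = start, simulated_assay(start)
    history = [(best_x, best_score)]
    for _ in range(cycles):
        candidates = design(best_x, step)                        # Design
        results = [(x, simulated_assay(x)) for x in candidates]  # Make + Test
        history.extend(results)
        top_x, top_score = max(results, key=lambda r: r[1])      # Analyze
        if top_score > best_score:
            best_x, best_score = top_x, top_score
        else:
            step /= 2  # no improvement: search closer to the current best
    return best_x, history


if __name__ == "__main__":
    best, results = run_closed_loop()
    print(f"best design parameter: {best:.3f} after {len(results)} measurements")
```

Each pass through the loop feeds measured results back into the next round of designs. That feedback is what makes the loop "closed." The hard part Sherman describes is everything this sketch replaces with a single function call.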
The topic will be explored in depth during a panel discussion at next month’s Drug Discovery Chemistry conference in San Diego. Sherman will join representatives from Terray Therapeutics, Eli Lilly, NVIDIA, and the OpenFold and OpenADMET consortia to discuss real-world progress with closed-loop systems and the enabling role of open-source software. The session will be moderated by Anthony Bradley, Ph.D., assistant professor of chemistry at the University of Liverpool and a contributor to the RCSB Protein Data Bank, a foundational dataset for structural biology, protein design, and AI-driven drug discovery.
Automation Challenges
Automation in drug discovery remains challenging, particularly in assays and chemistry. “A typical drug discovery program involves dozens of assays and complex synthetic workflows,” says Sherman. Reactions require physical reagents that must be ordered, shipped, and stored under different conditions. “Then there’s all the mixing, heating, and exploring various reaction conditions. Once you complete a reaction, you still must purify the molecule of interest.”
For these reasons, closed-loop systems have not yet been fully automated in production for drug discovery, although progress is being made. Sherman points to initiatives such as Ginkgo Bioworks’ Virtual Cell program, which aims to screen at least 100,000 compounds and generate billions of data points. That effort focuses primarily on assay data, while other groups are tackling the chemistry side of automation, often using tools with open-source components.
Closed-loop systems ultimately operate in the physical world, where materials must be purchased and instruments built. The companies developing them are typically well-funded ventures investing heavily in automation infrastructure.
Software layers within these systems, however, can often be open source. That flexibility allows researchers to adapt code for different assays, targets, or workflows. Commercial software can be easy to deploy, Sherman says, but it can be harder to customize. “If you need a new capability, you often have to bring in a specialist or submit a feature request that may not be a priority for the vendor.” Automation also becomes more difficult as biological systems grow more complex.
Layering in Physics and AI
PsiThera focuses on developing oral small-molecule drugs for inflammation and immunology targets that have traditionally been treated with injected antibodies. Its QUAISAR platform, developed over more than a decade, uses biomolecular simulations to identify biologically relevant protein motions that reveal new opportunities for drug binding. It also identifies novel chemical matter and optimizes drug properties to move programs efficiently toward the clinic.
But computation alone is not sufficient, Sherman emphasizes. “We have wet labs to perform mechanistic biology studies, solve X-ray and cryo-EM structures, and medicinal chemists designing and synthesizing molecules,” he says. “It’s a fully integrated approach where the computational engine works alongside experimental science.”
Sherman’s career reflects that integration of computation and drug discovery. After earning his Ph.D. from MIT, he joined Schrödinger, where he eventually became head of global applications science. He later helped found Silicon Therapeutics, where the QUAISAR platform first began to take shape.
“We advanced a small-molecule STING agonist from concept to clinic in about three years, which is close to the speed limit for drug discovery when starting with novel chemical matter,” Sherman says. The company was acquired by Roivant, where he served as chief computational scientist before helping spin out PsiThera in 2022 to further develop the platform and pipeline.
Open-Source Avenues
QUAISAR integrates computational predictions with experimental work and relies heavily on open-source tools. Sherman says PsiThera often chooses to build new capabilities internally or adapt open-source software when tackling broad scientific problems.
“I’ve been a big proponent of open science since my days at Silicon Therapeutics,” he says. The company launched an Open Science Fellows program that partnered with academic developers of open-source tools. Those tools were scaled internally to enterprise-level software, and many improvements were contributed back to the community.
As Sherman puts it, “A rising tide floats all boats. You just have to make sure your boat is fast and agile, and that you have the right crew.”
Sherman recently became chair of the OpenFold Consortium, which aims to democratize AI for structural biology. The project began by reproducing Google DeepMind’s AlphaFold model after the original system became less accessible as proprietary development shifted to Isomorphic Labs. The consortium is now working to extend OpenFold capabilities beyond AlphaFold.
OpenFold operates within the broader Open Molecular Software Foundation ecosystem, which also includes projects such as the Open Free Energy Initiative for binding free-energy calculations, the Open Force Field Initiative for improved molecular force fields, and OpenADMET for open models and datasets addressing drug absorption, distribution, metabolism, excretion, and toxicity.
Other open-source tools used at PsiThera include GIST (grid inhomogeneous solvation theory), which analyzes the thermodynamics of water molecules within protein binding sites, a key factor influencing binding affinity and selectivity.
The original work on GIST by Tom Kurtzman at the City University of New York provided the foundation for water thermodynamics analysis tools now used within PsiThera. The company has also contributed improvements back to the open-source ecosystem, including code for binding free-energy calculations and its molecular dynamics engine STORMM (Structure and Topology Replica Molecular Mechanics), designed to scale efficiently on modern GPU architectures.
Scaling Solutions
A major challenge for AI models in drug discovery is data scarcity. As a result, some organizations are exploring federated learning approaches, where models are trained across distributed datasets without the underlying proprietary data ever leaving company servers.
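The general idea behind federated learning can be sketched with federated averaging, one common scheme. This is a toy illustration, not the method used by any initiative named in this article. The linear model, the three hypothetical sites, and their made-up data points are all assumptions. Each site updates the shared model on its own data, and only the resulting model weights, never the data, are sent back to be averaged.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (feature, measured value) pairs held privately by one site
Weights = Tuple[float, float]        # (slope, intercept) of a toy linear model


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run a few gradient-descent steps on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine site models, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    w = sum(wt[0] * n for wt, n in updates) / total
    b = sum(wt[1] * n for wt, n in updates) / total
    return w, b


def train_federated(sites: List[Dataset], rounds: int = 50) -> Weights:
    """Coordinate training: only weights and sample counts cross site boundaries."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_update(weights, data), len(data)) for data in sites]
        weights = federated_average(updates)
    return weights


# Hypothetical, noise-free data from three sites, all generated from y = 2x + 1.
SITE_A: Dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
SITE_B: Dataset = [(-1.0, -1.0), (0.5, 2.0)]
SITE_C: Dataset = [(1.5, 4.0), (-0.5, 0.0), (3.0, 7.0), (-2.0, -3.0)]

if __name__ == "__main__":
    slope, intercept = train_federated([SITE_A, SITE_B, SITE_C])
    print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

In this sketch the coordinating function sees only each site's weights and sample count. Production systems typically add further protections such as secure aggregation or differential privacy, which are omitted here.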
One example is the AI Structural Biology Initiative (AISB), which aggregates structural data from multiple pharmaceutical companies while protecting sensitive information. Eli Lilly’s TuneLab platform similarly gives partners access to machine learning models trained on Lilly’s internal data in exchange for contributing data of their own.
Other efforts are focused on generating new datasets. Ginkgo Bioworks’ Virtual Cell initiative is one example. The Broad Institute’s open-source CellProfiler software for analyzing high-throughput cellular imaging data is another.
“There are open-source tools for almost everything,” Sherman says, “but they vary widely in quality.” Many originate in academic labs with strong ideas but require additional engineering, parameterization, and data to scale to enterprise environments.
Despite those challenges, Sherman describes the drug discovery open science ecosystem as healthy and growing, similar to the evolution of the Linux operating system, which began as an academic project and later became foundational enterprise infrastructure supported by companies like Red Hat.
Major industry players have also contributed to open-source software in drug discovery. Novartis has long supported development of the RDKit, now one of the most widely used open-source cheminformatics platforms. NVIDIA has invested heavily in optimizing open-source software for modern GPU architectures through its BioNeMo platform and inference microservices.
Open datasets also play a critical role. The RCSB Protein Data Bank, for example, provided the structural data that enabled AlphaFold’s breakthrough performance.
Ultimately, Sherman says, open-source tools are a powerful starting point, but the goal is not software itself. “The endpoint isn’t building tools,” he says. “The endpoint is discovering new medicines. That requires combining the right tools with the right people and experimental capabilities in an environment that can scale.”
Editor’s Note: Woody Sherman will also be a panelist for a session entitled “The Collaboration Breakthrough: How Federated Learning Is Rewriting the Rules of Drug Discovery,” at the Bio-IT World Conference & Expo May 19-21 in Boston.


