Building AI for the Drug Discovery Learning Problem

By Allison Proffitt

October 19, 2022 | BERLIN—At the Bio-IT World Conference and Expo Europe yesterday, Richard Law, chief business officer for Exscientia, kicked off the event with an AI reality check. The term AI barely means anything anymore, he said, but he’s not the least bit soured on its potential. We just need to shift our expectations.

We all know the bleak numbers. Depending on how you crunch the data, drug discovery takes an average of 13 years, fails 96% of the time, and costs about $5 billion per successful drug. At the height of the hype cycle, AI was (and sometimes still is) hailed as a panacea for those numbers. AI would “fix” drug discovery: 0% failure!

“The expectations have become slightly ridiculous,” Law said, but if the current drug discovery success rate is 4%, even small improvements would be dramatically helpful. “If you changed it to just 10%, you would get a 600% increase on the return on investment. We’re really talking about how can we make better decisions so that we—over time—change this paradigm of failure to one of slightly-more-often-success.”
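Law did not spell out the cost and revenue assumptions behind his 600% figure, so that number is not reproduced here. The part of the arithmetic that needs only one simplifying assumption is easy to check: if every program costs the same and succeeds independently with probability p, the expected number of programs per approved drug is 1/p. A minimal sketch under that assumption:

```python
def programs_per_approval(success_rate_pct: float) -> float:
    """Expected programs needed per approved drug, assuming independent programs that
    each succeed with the same probability (the mean of a geometric distribution)."""
    return 100 / success_rate_pct

today = programs_per_approval(4)      # 4% success rate
improved = programs_per_approval(10)  # 10% success rate
saving = 1 - improved / today         # drop in cost per approval, at a fixed cost per program
print(f"{today:.0f} -> {improved:.0f} programs per approval; cost per approval falls {saving:.0%}")
```

Under these assumptions, moving from 4% to 10% cuts the expected number of programs per approved drug from 25 to 10: a 60% lower cost per approval, or 2.5 times as many approvals for the same spend.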

Law believes that is happening now, though there has not yet been an approved AI-designed drug. But he’s also realistic about the burden of proof. Even when there are approved AI-designed drugs, companies like his will still need to show their work.

Encode and Automate

Exscientia, Law explained, calls itself a “pharmatech” company, half pharma and half tech, and the makeup of its roughly 500 employees backs up that description. The company does most of its science and its AI in-house. “Our mission is to encode and automate every stage of the drug design and development process,” Law said.

Exscientia was founded in 2012 by Andrew Hopkins, now CEO, and a small team of co-founders out of his lab at the University of Dundee. Hopkins and the others had just published a landmark paper in Nature Chemistry on the druggable genome, and the vision for AI-enabled drug discovery was still in its infancy. “No one was talking about AI in drug discovery ten years ago,” Law said. The Exscientia team did not raise outside money for the first five and a half years, preferring instead to “build the proof points of the company” before investors started seeking their own ROIs.

“What they were basically doing was running around pharma companies doing tiny comp chem service projects to get money in the door to help them develop a platform,” Law said.

But those projects started yielding larger pharma partnerships with Sumitomo Dainippon, Sanofi, BMS, and more. Those early partnerships then led to larger, round-two agreements as well as more formal funding rounds and investments from Evotec (where Law was working at the time) and others.

The company’s IPO, in October 2021, raised $350 million with additional concurrent private funding. In January 2022, Exscientia announced an additional agreement with Sanofi potentially worth $5.2 billion to develop an AI-driven pipeline of up to 15 novel small-molecule candidates across oncology and immunology.

Currently, Law said, the company has four molecules in the clinic—all for central nervous system indications—and two more coming soon in oncology.

The System You Build Around It

Perhaps one of the most attractive things about Exscientia in the current landscape of life sciences AI companies is its insistence that AI should never be a black box. “We still need to prove that it works,” Law said. “And I think we’re going to be asked to prove that it works constantly for probably the next decade.”

For the IPO, Law recalled, “We spent six months (I tell you, six months!) writing the F-1. It was really terrible… but the reason we wanted to do that was because we wanted to be as open and as genuine as we could, and try to actually explain to people how it works.” The team didn’t actually publish its algorithms, but Law said it was important to them to explain the kind of AI they use and how it fits into a larger system.

Even if the company did choose to give away the code, Law contended, “People still wouldn’t be able to use it and it wouldn’t make any difference, because it’s about the system that you build around [the AI] and how you use [all of it].”

The Exscientia secret sauce, Law says, is how the company approaches the problem of drug discovery. It is not, he emphasized, a screening problem. It is not even a big data problem. It’s a learning problem.

“This, I think, is one of the biggest confusions in how people try to apply AI to [drug discovery],” Law said. This isn’t Tesla, where you already know what you want the AI to do and you have trillions of similar data points to train it on, he said. “At the beginning of the project, you actually have very little data on the specific project you’re working on… This is a sparse data matrix problem.”
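To make the “sparse data matrix” concrete: early in a project, a table of compounds against assays is mostly empty, and the gaps matter as much as the values. A minimal illustration, with hypothetical compound IDs, assay names, and values, that stores only the measurements that exist:

```python
from typing import Dict, List, Tuple

# Hypothetical early-project data: only (compound, assay) pairs that were measured are stored.
measurements: Dict[Tuple[str, str], float] = {
    ("cmpd-001", "target_pIC50"): 6.2,
    ("cmpd-002", "target_pIC50"): 5.1,
    ("cmpd-002", "solubility_uM"): 40.0,
    ("cmpd-003", "target_pIC50"): 7.0,
}
compounds: List[str] = ["cmpd-001", "cmpd-002", "cmpd-003", "cmpd-004", "cmpd-005"]
assays: List[str] = ["target_pIC50", "selectivity", "solubility_uM", "microsomal_stability"]

def fill_fraction(data: Dict[Tuple[str, str], float], rows: List[str], cols: List[str]) -> float:
    """Fraction of the compound x assay matrix that has actually been measured."""
    observed = sum((r, c) in data for r in rows for c in cols)
    return observed / (len(rows) * len(cols))

def missing_for(compound: str) -> List[str]:
    """Assays not yet run for one compound: the gaps a learning system must reason across."""
    return [a for a in assays if (compound, a) not in measurements]

print(f"{fill_fraction(measurements, compounds, assays):.0%} of the matrix is filled")
print("cmpd-002 still needs:", missing_for("cmpd-002"))
```

Here only 4 of 20 cells (20%) hold data, which is the opposite of the dense, abundant training data a self-driving car project starts with.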

At Exscientia, Law says, the system is flipped. Instead of the human telling the AI what to output and judging whether the output is successful and correct, the human’s job is to make sure the AI has the right input data. And from there the AI system is designed as a non-linear learning process, iterating on its own conclusions.

Another key point in the system: different AI tools for different parts of the learning loop. The core is molecular design: “generative learning and active learning to design small molecules,” Law said, or “the chemistry AI.” Different AI systems are used in target selection (a knowledge graph-based platform called Centaur Biology) and patient microscopy image analysis.

Finally, Law is extremely dismissive of starting the research process with cell lines or animal models. “Animal models are bad. We kind of all know this, but every single project involves one anyway.”

Instead, the Exscientia approach begins with patient data: “using the patient data to select what we even start to work on in the beginning,” he explained. “When we start a drug discovery project, we actually have already selected the patients we’re going to work on before we do a single assay set up, before we do a single round of molecular design.”

Learn First, Optimize Later

The whole ethos of the company is designed around creating better, more effective, more patient-specific drugs, Law said. “It’s just that, as a positive side effect of how the AI works, ‘faster’ happens as well.” Faster is useful, he said, because it allows you to test more hypotheses, find out when you are wrong quicker, and gain competitive advantage by using other people’s data to catch up.

Law reports that going from the start of a project to synthesis of the development candidate takes about 12 months. Most of that time, he added later, goes to synthetic chemistry, a step Exscientia is also working to automate with robotics. He calls this speed a “massive improvement” over the industry standard.

To ensure better, not only faster, Exscientia relies on generative and active learning. Initially, at least, the AI is “not trying to make better molecules, it’s trying to learn.”

Projects usually have around 2,500 models, and the goal, Law said, is to generate models that are useful, not good. Generative design is driven by learning, and the system scores itself. Models are assigned an extremely harsh merit score—“basically a multi-parameter optimization score”—based on the model’s ability to improve the usefulness of future models, not whether or not it’s “good” on its own.
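Law did not say how Exscientia computes its merit score, and the score he describes rates models by their usefulness to future models rather than rating molecules by their properties. For readers unfamiliar with the term he used, one common, generic form of multi-parameter optimization (MPO) maps each property onto a 0-to-1 desirability and combines them with a geometric mean. It is harsh by design: a single failing parameter zeroes the whole score. A minimal sketch of that generic technique (Python 3.8+ for `math.prod`), with hypothetical property names and target ranges:

```python
import math
from typing import Dict, Tuple

# Hypothetical targets: property -> (low, high, ramp). Values inside [low, high] are fully
# desirable; desirability falls linearly to zero over `ramp` units outside the range.
TARGETS: Dict[str, Tuple[float, float, float]] = {
    "potency_pIC50": (6.0, 12.0, 1.0),
    "logP": (1.0, 3.0, 1.0),
    "solubility_uM": (10.0, 200.0, 10.0),
}

def desirability(value: float, low: float, high: float, ramp: float) -> float:
    """Piecewise-linear desirability in [0, 1]."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / ramp)

def mpo_score(props: Dict[str, float]) -> float:
    """Geometric mean of desirabilities: one failing property pulls the whole score to zero."""
    ds = [desirability(props[name], *limits) for name, limits in TARGETS.items()]
    return math.prod(ds) ** (1.0 / len(ds))

balanced = {"potency_pIC50": 7.5, "logP": 2.0, "solubility_uM": 50.0}
weak = {"potency_pIC50": 5.5, "logP": 2.0, "solubility_uM": 50.0}
potent_but_greasy = {"potency_pIC50": 9.5, "logP": 4.5, "solubility_uM": 5.0}
print(mpo_score(balanced), mpo_score(weak), mpo_score(potent_but_greasy))
```

The balanced compound scores 1.0, the slightly weak one about 0.79, and the highly potent but overly lipophilic one scores 0.0 despite its potency, which is the kind of unforgiving trade-off an MPO score enforces.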

Law likened it to being dropped out of a helicopter in an unknown land with an unfamiliar map. While humans may strike off toward wherever they guess “home” to be—right or wrong—the AI system’s first goal is not to get home. The first goal is to figure out where it is on the map, to learn the topography and the landmarks.

The first three to five molecule design cycles are pure exploration, Law said. Only then does the system start to optimize.
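The explore-then-optimize schedule can be illustrated with a toy active-learning loop. This is a generic sketch, not Exscientia’s system: the hidden scoring function, the one-dimensional design space, the nearest-neighbour “model”, and the switch after four cycles (within the three-to-five range Law cited) are all assumptions for illustration. While exploring, the loop tests the candidate farthest from anything already tested, where it knows least; afterwards it tests the candidate with the best predicted result.

```python
from typing import List, Tuple

Result = Tuple[float, float]  # (candidate position, measured result)

def hidden_assay(x: float) -> float:
    """Stand-in for a real experiment; the loop never sees that the optimum is at x = 6.6."""
    return -(x - 6.6) ** 2

def nearest(x: float, tested: List[Result]) -> Result:
    """Closest already-tested candidate; on a tie, the earliest-tested one wins."""
    return min(tested, key=lambda t: abs(t[0] - x))

def predict(x: float, tested: List[Result]) -> Tuple[float, float]:
    """Nearest-neighbour 'model': x is predicted to score like its closest tested
    neighbour; the second element prefers candidates closer to that neighbour."""
    pos, value = nearest(x, tested)
    return (value, -abs(pos - x))

def run(candidates: List[float], explore_cycles: int = 4,
        total_cycles: int = 8) -> Tuple[List[Tuple[str, float]], List[Result]]:
    tested: List[Result] = [(candidates[0], hidden_assay(candidates[0]))]  # seed compound
    picks: List[Tuple[str, float]] = []
    for cycle in range(1, total_cycles + 1):
        done = {pos for pos, _ in tested}
        pool = [x for x in candidates if x not in done]
        if cycle <= explore_cycles:
            # Explore: test where the data say least (farthest from anything tested).
            mode, pick = "explore", max(pool, key=lambda x: abs(nearest(x, tested)[0] - x))
        else:
            # Optimize: test the candidate with the best predicted result.
            mode, pick = "optimize", max(pool, key=lambda x: predict(x, tested))
        tested.append((pick, hidden_assay(pick)))
        picks.append((mode, pick))
    return picks, tested

candidates = [i * 0.5 for i in range(21)]  # hypothetical design space: 0.0, 0.5, ..., 10.0
picks, tested = run(candidates)
for mode, x in picks:
    print(mode, x)
print("best tested:", max(tested, key=lambda t: t[1])[0])
```

On this toy problem the exploration picks (10.0, 5.0, 2.5, 7.5) spread out to map the space, like learning the landmarks after the helicopter drop, and the optimizing picks then move from 7.5 to 7.0 to 6.5, the best candidate on the grid. A real system would use a trained model with an uncertainty estimate rather than distance to the nearest tested point.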

AI is just a problem-solving tool, he stressed. But while humans struggle to juggle more than a few parameters, AI excels at weighing many competing parameters at once and charting a path through them.

“It’s not designing molecules to be incrementally better; it’s designing molecules to optimize learning. What this ends up doing is creating—sometimes—really interesting solutions,” Law said. “And that’s what’s really, really cool.”