By Stan Gloss and Allison Proffitt
November 17, 2021 | As a “digital native,” Recursion is working to reshape the drug discovery industry. Quite literally. Gone is the vision of a traditional discovery funnel, wide at the top and slowly narrowing. Instead, Recursion proposes a drug discovery machine powering a T-shaped funnel where many options are considered, but failures are meant to come early, and the candidates that advance have a potentially higher likelihood of becoming successful treatments.
Easier said than done.
The funnel image is pervasive because it is accurate. If a company tries to advance everything, it will “see sequential winnowing,” says Imran Haque, VP of Data Science at Recursion. “There are some things that will fail at different stages and you're going to follow the traditional funnel.”
Recursion is taking on the challenge of creating a model of biology that will predict the failures and the successes. The Salt Lake City company, founded in 2013, raised $502 million in its April 2021 IPO on the vision of decoding biology with its drug discovery operating system.
“You can have the data lead you to where you actually need to go. So, you look at everything up front. And then you have the data to tell you, ‘We should not be considering most of these things, because if we did, they're just going to fall apart in late discovery. They're going to fall apart in pre-clinical. They're going to fall apart in Phase One. They're not going to be efficacious,’” Haque said.
Diverse input data helps. Traditional drug design, Haque observes, tends to start with either biology or chemistry: focusing on a biological pathway or a library built from interesting chemistry. It’s an inherently narrow approach, he says. So Recursion conducts millions of wet-lab experiments. But high data volumes don’t, alone, guarantee success.
“What are the things that you want to put forward that are going to go all the way rather than falling out downstream?” Haque said. “We want to be able to ask… the right questions to get the right answer.”
“Biology has historically been an extremely brute force science,” he continues. “You tinker with some things and you try to figure out [through] brute force experimentation, what will work, what will not. It’s difficult to engineer; it’s difficult to predict... If you have the ability to build models that are actually predictive, as opposed to simply treating biology as a black box, then you have decoded it. Then you know how the system will respond when you do something new, or you have a new condition. And you can go about manipulating that biological system in a much more efficient manner, right? Lower failure rates, higher rates of actually doing the thing that you want to do.”
The vision isn’t fully reality, Haque concedes. But the T-shaped funnel is the goal, and Recursion has built the company around empowering that sort of early—and accurate—predictive modeling.
Recursion calls the platform reshaping the funnel the Recursion Operating System, which combines an advanced infrastructure layer to generate what the company believes is one of the world’s largest and fastest-growing proprietary biological and chemical datasets, and the Recursion Map, a suite of custom software, algorithms, and machine learning tools that the company uses to explore foundational biology unconstrained by human bias and navigate to new biological insights.
Wet-lab biology and in silico tools work together as a suite of products, explains Mason Victors, Chief Product Officer. “Think of a car and all of the components in your engine and your drive train and everything else. Those are all individual products themselves that really serve a greater purpose when they interlock together and work with other products. And so Recursion's operating system consists of a large suite of products, some explicitly named, some implicit that we've created along the way.”
Victors shares a few examples. “You can think of our high throughput automation lab as a cohesive product. It's a set of interlocking components for our assays that has been developed to be highly reproducible and reusable,” he says. Other examples include the company’s scaled CRISPR technology; Moving Pictures, which syncs data off instruments into the cloud; Pipeworks, which processes data in the cloud and scales computational nodes; and PhenoDetect, which identifies robust, statistically significant pheno-prints, or disease signatures.
Pharma has traditionally thought in terms of projects, not products. The weakness of that approach, Victors says, is that “you're not focused on reusability. You're not focused on extensibility or repeatability of something, it's one-and-done to get to the end result.” For Recursion to achieve a T-shaped funnel, data must be reused to refine future predictions. “Rather than just focusing on each drug candidate and how we move it along, we focus on how to build a cohesive machine that is able to be reused again and again and again, where data is both fuel into the machine and then comes out of the machine to refuel it,” Victors says.
The Soul of the Machine
“From our perspective, data is key, and so you need to be able to generate more relatable data in a consistent manner,” explains Ben Mabey, Recursion’s Chief Technology Officer. “It's not that you can just take a public dataset, and find the magic molecule, the right molecule. You have to be able to iterate on it and have really tight feedback loops between the dry lab and the wet lab.”
To accomplish that, Recursion has invested in systems to generate, characterize, and analyze data so that it can be reused: “fuel, not just exhaust,” Victors explains.
Haque’s data science team is embedded within the cross-functional groups of the company—for example, in high throughput screening alongside biologists and chemists. “One of our company values is something that we call One Recursion, which is that we’re all in this for one mission together,” he said. “For example, if it’s an early-stage discovery program, you might have one or two data scientists, you’ll have a couple of biologists, you’ll have some chemists, you may even have folks from the high-throughput screening lab, if there’s a huge amount of data generation happening in that project, all working together on that one goal,” Haque explains. They are building predictive models and developing the company’s maps “hand-in-hand” with researchers.
Mabey also sees the One Recursion view as a key differentiator. “The fact that you have teams of biologists, data scientists, and engineering, all working together as one—One Recursion—is a pretty key differentiator, I think, between us and other, more mature companies who have naturally siloed [teams] away,” Mabey says.
The result is a large volume of data, created and tagged with metadata in a purposeful way for reuse. The data are FAIR (findable, accessible, interoperable, and reusable), though Haque says the acronym isn’t frequently invoked. The practices are, instead, part of the company’s culture.
“The whole goal here is not to generate an experiment, and then use it once and never look at it again,” Haque says. “The point is to do experiments that really drive that cycle of learning—that you can use to build up a predictive understanding—and then feed that back in order to improve your model. If your data is not reusable, if it’s not findable, if it’s not accessible, you can’t do that.”
Named at Creation
Data are generated for the models, Mabey explains. “We know the model, that’s step one. Step two is then to go to the lab and have them generate the data for us. We run up to 1.7 million wet-lab experiments a week, and they’re all intentional. We know where it’s going to go in our models, and how we’re going to be using it.”
Metadata is crucially important here. Recursion has worked diligently to tag data coming off instruments—both in automated ways and with human infrastructure for quality control. “Being able to track the data lineage back to the lab is really key,” Mabey says, to the integrity of the predictive model being built. If something doesn’t look right, Recursion researchers can trace datasets back to their creation and look for any confounding variables. “It may turn out to be that it was the temperature of the lab on that date. If you keep track of all that provenance, you’ll be able to do things like that.”
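The kind of lineage tracing Mabey describes can be pictured as a walk back through parent records. The Python sketch below is only an illustration under stated assumptions: the `Record` structure, the record IDs, the `lab_temp_c` field, and the temperature value are all hypothetical and are not drawn from Recursion's systems.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One node in a hypothetical provenance graph."""
    record_id: str
    kind: str                      # e.g. "plate_run", "image", "embedding"
    parents: tuple = ()            # IDs of the records this one was derived from
    metadata: dict = field(default_factory=dict)


def trace_lineage(record_id, store):
    """Return the record and all of its ancestors, nearest first (breadth-first)."""
    seen, ordered, frontier = set(), [], [record_id]
    while frontier:
        current = frontier.pop(0)
        if current in seen:
            continue
        seen.add(current)
        record = store[current]
        ordered.append(record)
        frontier.extend(record.parents)
    return ordered


def collect_metadata(record_id, store, key):
    """Gather one metadata field from every ancestor that recorded it."""
    return {
        r.record_id: r.metadata[key]
        for r in trace_lineage(record_id, store)
        if key in r.metadata
    }


def example_store():
    """Made-up records: a plate run, an image from it, and a derived embedding."""
    records = [
        Record("run-001", "plate_run", metadata={"lab_temp_c": 24.5}),
        Record("img-001", "image", parents=("run-001",)),
        Record("emb-001", "embedding", parents=("img-001",)),
    ]
    return {r.record_id: r for r in records}
```

Calling `collect_metadata("emb-001", example_store(), "lab_temp_c")` walks from the derived embedding back to the plate run and surfaces the recorded lab temperature, which is the sort of confounder check the quote alludes to.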
Partly to lessen the lift for data science, the company has put standard operating procedures and templates in place so that data are always prepared for long-term use. Following the templates ensures that data are analyzable.
“The lab is essentially a factory of biological data, and the robots, and all the other instruments in the machine, they produce logs,” Mabey explains. “Those logs turn into events that we send to the cloud. For example, when we take an image, that triggers an event, that then we upload that image to the cloud, and then that triggers another event, in which we process that image for QC reasons. And then once we had all the images for a plate, we then run that through our deep learning models.”
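The chain of events Mabey describes (image captured, image uploaded, image checked, plate processed once all of its images are in) can be sketched as a small event-driven pipeline. This is a minimal in-process illustration, not Recursion's implementation: the `EventBus` class, the event names, and the handlers are hypothetical stand-ins for cloud services, and the upload, QC, and model steps are placeholders.

```python
from collections import defaultdict, deque


class EventBus:
    """Tiny in-process stand-in for a cloud event queue (hypothetical)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.log = []  # every event type, in the order it was dispatched

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, **payload):
        self.queue.append((event_type, payload))

    def run(self):
        # Dispatch events in FIFO order until no handler publishes more.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.log.append(event_type)
            for handler in self.handlers[event_type]:
                handler(self, payload)


def build_pipeline(images_per_plate):
    bus = EventBus()
    checked = defaultdict(set)  # plate -> wells whose images passed through QC

    def on_image_captured(bus, p):
        # Placeholder for uploading the image to cloud storage.
        bus.publish("image_uploaded", plate=p["plate"], well=p["well"])

    def on_image_uploaded(bus, p):
        # Placeholder for per-image QC; here every image is simply recorded.
        checked[p["plate"]].add(p["well"])
        if len(checked[p["plate"]]) == images_per_plate:
            bus.publish("plate_complete", plate=p["plate"])

    def on_plate_complete(bus, p):
        # Placeholder for running the whole plate through a model.
        bus.publish("plate_embedded", plate=p["plate"])

    bus.subscribe("image_captured", on_image_captured)
    bus.subscribe("image_uploaded", on_image_uploaded)
    bus.subscribe("plate_complete", on_plate_complete)
    return bus


def simulate_plate(plate, wells):
    """Feed one plate's images through the pipeline and return the event log."""
    bus = build_pipeline(images_per_plate=len(wells))
    for well in wells:
        bus.publish("image_captured", plate=plate, well=well)
    bus.run()
    return bus.log
```

In this sketch the plate-level step fires exactly once, and only after every image for the plate has been seen, which mirrors the ordering in the quote.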
But Haque also emphasized that the company is not tethered to the SOPs. “If you’re looking to do something that is unusual or custom, we absolutely want to do that, because that’s where a lot of really interesting discovery happens,” he says. “But those are places where, ‘Hey, let’s partner with the data scientists. Let’s partner with a statistician or machine learning scientist, and make sure that we’ve designed the experiment, the number of replicates, the randomization, the controls, and so on, in order to be able to support the analysis that we want downstream.’”
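As a rough illustration of the design elements Haque lists (replicates, randomization, controls), the Python sketch below assigns conditions to wells on a plate at random. It is an assumption-laden example rather than a description of Recursion's practice: the function name, the compound labels, and the plate size are all hypothetical.

```python
import random
from collections import Counter


def randomized_layout(treatments, replicates, n_controls, wells, seed=0):
    """Assign each treatment `replicates` times, plus controls, to random wells."""
    conditions = [t for t in treatments for _ in range(replicates)]
    conditions += ["control"] * n_controls
    if len(conditions) > len(wells):
        raise ValueError("not enough wells for the requested design")
    rng = random.Random(seed)  # seeded so the layout can be reproduced later
    chosen = rng.sample(wells, len(conditions))  # random wells, random order
    return dict(zip(chosen, conditions))


def design_summary(layout):
    """Count how many wells each condition received."""
    return Counter(layout.values())


# Hypothetical 4 x 6 plate and made-up compound labels.
wells = [f"{row}{col}" for row in "ABCD" for col in range(1, 7)]
layout = randomized_layout(["cmpd_a", "cmpd_b", "cmpd_c"], replicates=4,
                           n_controls=6, wells=wells, seed=42)
```

Randomizing well positions keeps plate-position effects from lining up with any one treatment, and recording the seed means the same layout can be regenerated when the analysis is revisited.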
There is more than just science in the Recursion secret sauce. Haque highlights the cultural advantages for his data scientists. “A lot of my academic friends will complain, ‘Oh, the statistician or the bioinformatician, they’re just the person who gets handed some mess at the end of the project and asked to clean it up!’” Haque says, “You don’t want to be in that kind of relationship; you want to have that partnership right from the beginning.”
But there are also challenges to working in cross-functional groups. “People are more comfortable in homogeneous settings,” Victors observes, where their colleagues speak the same languages and have the same challenges. “But it’s limiting,” he warns.
In fact, Haque spends a lot of his time working to find the right data scientists to help build “this data-driven future of drug discovery.” He’s looking for team members willing to take an asset or project approach and to design experiments always with the bigger picture in mind, which, in some cases, means throwing away low-performing data. “Sometimes the right answer is, ‘There’s nothing here.’ And you have to do a different experiment. But there’s often a lot of attachment or sunk-cost fallacy around stuff that you've already collected,” Haque observes.
In many ways, the culture of following the data, more than any operating system or funnel redesign, is what will drive Recursion’s future. “Are you going to be more attached to the hypothesis that you started with and the particular research direction that you want to explore? Or are you going to follow the mission that you had and where the data is actually leading you on that mission?” Haque asks. Only then will the tools help “parse that whole set of questions that you could ask and narrow down to the ones that are most important and most applicable.”