Owkin’s Generative AI Vision for Understanding Biology

November 15, 2023

By Allison Proffitt 

November 15, 2023 | Generative AI in drug discovery is primarily a patient stratification play, says Thomas Clozel, CEO of Owkin. “People generate new molecules; that’s cool. But I think what you need to generate first with Gen AI is knowledge.” Identifying which group of patients might best respond to a particular drug can still be foggy at best. “Today what AI brings is a new way to characterize certain groups of patients that will see high value and benefits from a drug.” 

This is the problem Clozel hopes to solve with Owkin, a techbio company he co-founded in 2016. But while he calls Owkin tech-first and his drug discovery platform is in silico, he prioritizes gathering new knowledge from patient biology.

Editor’s Note: Owkin will be represented on the plenary panel at Bio-IT World Europe in London at the end of this month. For more information, see www.bio-itworldeurope.com.  

“A lot of people are doing generative AI without having a biology step: ‘I'm building a new molecule for fibrosis, but I haven't understood anything new about fibrosis biology,’” says Clozel. The Owkin approach, instead, prioritizes better biology through AI across the entire drug discovery and development pipeline.  

“We find new biology. We try to understand which biology matches the condition. We work with CROs to find the right molecule. And then we do improved clinical trials. We understand how to use the subgroups to do clinical trials. We use biomarkers from the subgroups to diminish the variability of trials and build new types of synthetic control arms, and then we can deploy the subgroups as diagnostic tools,” he says.  

Patient Data Strategy

Clozel is a physician himself, a former Assistant Professor of Clinical Onco-Hematology at Hôpital Henri Mondor in Paris and a former member of Ari Melnick’s lab at Weill Cornell Medical College, where he co-led several projects focused on predicting resistance to chemotherapy in B-cell lymphoma. This background informs his prioritization of patient data.

“Our data angle is patient data is better than other data to find discoveries because you have less functional gaps,” he explains. Discoveries in cell lines are notorious for not working in actual patients, he argues. “What we discover is right in the patient data from the academic hospitals, so you always start with the patient data.” 

Clozel is also a champion for multimodal data. “The reality is the best biomarkers is multi-modal. And if you want to understand the population that will benefit a lot from a drug using AI, you need this multi-modal approach. We use pathology, multi-omics, and clinical data usually to really incorporate a model.” 

Owkin accesses these varied patient data via a federated learning model. “We built this federated data network with the largest academic hospital in the world. We have 35 of the 100 largest hospitals in the world in Europe and the US,” Clozel explains, listing the Cleveland Clinic, New York University, Sloan-Kettering, Pittsburgh, the UK’s National Health Service, and Gustave Roussy in France as examples. “We never own data (we own zero), but we have access to the largest database and the patient samples. And the samples give us the possibility to enrich [the datasets] every year with new analyses,” he says.

Owkin uses these data to train its generative AI models, a plan that the hospitals appreciate. “Hospitals don't want to be data brokers anymore,” he says. The federated network lets hospitals keep their own patient data onsite, preserving privacy, while Owkin still benefits from the data volume. “We invest to create new datasets that the hospital can still use for internal research. It's a win-win. The only thing we keep is the IT and the discoveries of the model,” he adds.
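To make the federated idea concrete, here is a minimal sketch of federated averaging, a common way to train across sites without moving raw data. It is an illustration only and not Owkin's implementation: the article does not describe the company's algorithm, and the function names, toy linear model, and two made-up "hospital" datasets are all assumptions. Each site fits the shared model on its own records and returns only the updated weights, which a coordinator averages in proportion to each site's sample count.

```python
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, float]]  # (features, target) pairs held by one site


def local_update(weights: Vector, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Vector:
    """Run a few epochs of gradient descent on one site's private data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(updates: List[Vector], sizes: List[int]) -> Vector:
    """Combine site models, weighting each by its number of local samples."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[j] * n for u, n in zip(updates, sizes)) / total
            for j in range(dim)]


def train(sites: List[Dataset], dim: int, rounds: int = 50) -> Vector:
    """Only model weights leave a site; the raw records never do."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


# Hypothetical toy data following y = 2*x + 1 (second feature is a bias term).
site_a = [([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
site_b = [([2.0, 1.0], 5.0), ([3.0, 1.0], 7.0), ([4.0, 1.0], 9.0)]
weights = train([site_a, site_b], dim=2)
```

In this toy setup the coordinator only ever sees weight vectors, which mirrors the arrangement described above in which hospitals keep patient data onsite. Production systems typically add safeguards such as secure aggregation that this sketch omits.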

Output on Schedule

Owkin is taking an end-to-end, full-stack approach, Clozel explains. The drug discovery space is a bit of a feeding frenzy, he says, referring to the number of AI companies focusing on drug discovery. Very few of the techbio companies are looking at clinical trials and diagnostics, he adds.

Owkin aims for regular outputs of the model, both therapeutics and diagnostic tools. Clozel plans to release a “new phase one, phase two, or new diagnostic tool” to patients every year. Validation for the most recent, an AI-driven digital pathology pre-screening tool called MSIntuit that is aimed at optimizing the precision of diagnosis and treatment of colorectal cancer, was published in Nature Communications earlier this month (DOI: 10.1038/s41467-023-42453-6). Another diagnostic in the works predicts relapse of triple-negative breast cancer.

Owkin’s first phase one therapeutic is being developed with a pharma partner. “We had a pharma that came to us saying, ‘We have a great molecule, but we don't know the right subgroup and the right indication, and we want to improve the clinical trial,’” he says. (Owkin has announced a few of its pharma partnerships, including with Bristol Myers Squibb, Sanofi, and Amgen.) Owkin is using its AI platform to “understand the right indication, which gets us to go to the subgroup and how we can improve the clinical trial.”


Owkin’s most recent large project, MOSAIC, aims to create the world’s largest spatial multiomics dataset in oncology. Owkin believes that the convergence of AI and spatial omics combined with patient data will spark new cutting-edge research and discoveries in cancer. 

Spatial omics “is very good for AI because you have gene expression and structure of the tissue—pathology—on the same slides,” Clozel says. “You can really connect genes to tissue; it’s very fine for computer vision.”  

Launched in June, MOSAIC represents a $50 million investment from Owkin and collaborations with Gustave Roussy, the University of Pittsburgh through the UPMC Hillman Cancer Center, Lausanne University Hospital, Uniklinikum Erlangen/Friedrich-Alexander-Universität Erlangen-Nürnberg, Charité - Universitätsmedizin Berlin, NanoString Technologies, 10x Genomics, and more.

The goal: a multimodal dataset built from 7,000 patient samples with seven types of cancer, a cohort 100 times larger than any similar dataset, the company says. For each patient sample, MOSAIC aims to collect data in at least six data modalities: spatial transcriptomics, single-cell RNA-seq, bulk RNA-seq, whole exome sequencing, digitized hematoxylin and eosin (H&E) stained slides, and clinical data.
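One way to picture a record in such a dataset is as a per-sample structure with one slot for each of the six modalities. The sketch below is purely illustrative: the class, field names, and identifier strings are hypothetical and do not reflect MOSAIC's actual schema, which the article does not describe. It simply checks whether a given sample has all six modalities attached.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The six modalities named in the article; the key spellings are assumptions.
REQUIRED_MODALITIES = (
    "spatial_transcriptomics",
    "single_cell_rna_seq",
    "bulk_rna_seq",
    "whole_exome_sequencing",
    "h_and_e_image",
    "clinical_data",
)


@dataclass
class PatientSample:
    sample_id: str
    cancer_type: str
    # Maps a modality name to a file path or record identifier (hypothetical).
    modalities: Dict[str, str] = field(default_factory=dict)

    def missing_modalities(self) -> List[str]:
        """List the required modalities not yet collected for this sample."""
        return [m for m in REQUIRED_MODALITIES if m not in self.modalities]

    def is_complete(self) -> bool:
        """True once all six modalities are present."""
        return not self.missing_modalities()


# Hypothetical sample with only two modalities collected so far.
sample = PatientSample(
    sample_id="S-0001",
    cancer_type="example",
    modalities={"bulk_rna_seq": "rna/S-0001", "clinical_data": "clin/S-0001"},
)
gaps = sample.missing_modalities()
```

A check like this would let a project track how many of its samples are fully profiled before they are used for multimodal model training.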

Spatial omics data will change the way we look at oncology, the company believes. Owkin is partnering with NanoString and 10x Genomics for these data. NanoString gives a high-definition view of the data, Clozel explains. The two 10x platforms, Visium for spatial transcriptomics and Chromium for single-cell analysis, give a broad look.

Owkin and the MOSAIC partners will mine this data resource for immuno-oncology disease subtypes in pursuit of biomarkers and novel therapies. They expect to discover new disease biology and better understand the tumor microenvironment, differentiate patient subtypes and tumor-immune cell interactions, improve diagnostics with new biomarkers, find patient-specific drug targets, and match the right patient with the right treatment.

“If you want to bring precision medicine live, you need to go to understand which biology matches which category of patients and deploy drugs within this specific category,” Clozel says, and he believes MOSAIC will go a long way to helping illuminate the biology. “AI can really help to see this thing a bit differently.”