Chasing Digital Twins: Pilot Project Tackles Cancer Patient Twins in Neuro-Oncology

June 1, 2022

By Allison Proffitt 

At the Bio-IT World Conference & Expo last month, a team of researchers reported on their 12-week pilot effort to build cancer patient digital twins to aid in oncologist decision-making.

“To me what’s really groundbreaking [about this project] is the adoption,” said Eric Stahlberg, Director, Biomedical Informatics and Data Science, Frederick National Laboratory for Cancer Research. “As we looked over this time, it’s how do you actually impact the clinical workflow in a favorable way that’s received well [by clinicians] rather than being seen as a threat.”  

Stahlberg and colleagues from Stanford University, the National Cancer Institute, and Lawrence Livermore National Laboratory outlined a framework for Cancer Patient Digital Twins last November in a correspondence in Nature Medicine (DOI: 10.1038/s41591-021-01558-5).  

They proposed a Cancer Patient Digital Twin (CPDT) framework that would integrate “individual-level data, such as proteome and clinical characteristics, with other factors like clinical trials and population studies to create a multiscale and multimodal data set for model training. To ensure rapid and comprehensive data integration, data must be captured under FAIR (Findability, Accessibility, Interoperability, Reusability) principles and across diverse populations to make sure that all patients equally benefit.”

There are many active pilots underway, Stahlberg said, highlighting the Department of Energy’s efforts to start pilots at Georgetown University, University of South Carolina, Stanford University, UMass Amherst, and more. The particular project the panel explored is a consortium of groups that came together to improve neuro-oncology.

Neuro-Oncology in New Delhi  

This project began with a trip to India for Martin Deutsch about four years ago. Deutsch is co-founder of the Open Health Systems Lab (OHSL), a public benefit corporation that builds, supports and manages project teams to improve research and diagnosis outcomes. He traveled to India to meet Kunhiparambath Haresh, MD, All India Institute of Medical Sciences.  

OHSL’s goal was to build tools to help Dr. Haresh be a better doctor, Deutsch said, but he admitted that he was wholly unprepared for the challenges of medicine in India.  

Haresh practices neuro-oncology in New Delhi at All India Institute of Medical Sciences. His patients—many of them illiterate—come to his office with stacks of paper collected along their clinical odyssey. Hospital bills, notes from friends, and hand-written treatment reports are all mixed together, Haresh said. There’s rarely chronology and almost never radiology or pathology reports or genomic data, he said. These case files are difficult to read—“people have poor handwriting!”—and more difficult still to comprehend.  

Several years ago, Haresh, with help from Ken Buetow of Arizona State University, Anil Srivastava of OHSL, and Ravi Chamria of Zeeve, began brainstorming an online database to gather clinical features, treatment courses, and follow-up details. “It is basically a structured database where we capture the information in specified columns that are in a machine-analyzable format,” Haresh explained.
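To make the idea of “specified columns” concrete, here is a minimal sketch in Python of what one structured case record might look like. The field names and the sample values are hypothetical and are not taken from the OHSL database; the point is only that every record carries the same named fields, so the data can be exported and analyzed by machine.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CaseRecord:
    """One row of a structured case database (hypothetical columns)."""
    patient_id: str
    diagnosis: str
    age_at_diagnosis: int
    treatment_course: str
    last_follow_up: str  # ISO 8601 date, e.g. "2022-01-15"


def to_csv(records):
    """Serialize records to CSV text with one fixed column per field."""
    buffer = io.StringIO()
    column_names = [f.name for f in fields(CaseRecord)]
    writer = csv.DictWriter(buffer, fieldnames=column_names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Invented sample data, for illustration only.
sample = [CaseRecord("P001", "glioblastoma", 54, "surgery; radiotherapy", "2022-01-15")]
csv_text = to_csv(sample)
```

Because every record shares the same columns, a stack of handwritten case files becomes something that can be filtered, sorted by date, and fed to downstream models.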

Currently there are more than 2,000 patients in the OHSL database, Haresh continued, and the number is growing day by day. The initiative has helped his team solve many of its issues, Haresh said, but now it is time to move to the second part of the initiative: to incorporate digital twin technology into the existing database.

“We aim to have smart and lively electronic health records for neuro-oncology,” Haresh explained, organizing all of a patient’s data linearly, “so it’s easy for us to interpret… We wanted the system to be upgraded to a bigger repository for all the images, all the clinical data, so that it is a multi-omic platform for neuro-oncology patients.” Haresh envisioned including radiology images, genomic data, pathology photomicrographs, and clinical photos.  

“We should be able to get intelligent suggestions at each and every step of the journey of the patient with cancer,” Haresh said. He envisioned quick access to protocols and institutional guidelines for particular diagnoses. “If a system can give us an auto-summary in a short format—a few lines, four or five lines—we can refresh ourselves with the case and move on.”  

But more than just having data organized and accessible, Haresh hoped for digital twins or a digital family: real-time insight into similar patients (“similar for age, similar for tumor size, similar for mutations… similar for other treatments”) that could be used to model or predict the right course of action for the patient before him and offer treatment guidance based on the results from local cases. “The expectation is that whenever a new patient comes into the system, he or she is going to guide a future patient!”

Haresh cast a bold vision. “In short, we wanted to achieve a one-stop solution for guiding neuro-oncology management.”  

Building Out the Platform 

The task of building out this digital twin fell to C3 AI along with input from Microsoft and Intel, but from the start Nicholas Siebenlist, Solutions Leader for Public Health at C3 AI, emphasized, “the value of Dr. Haresh’s involvement in guiding the translation from how this medicine is being used in the clinic to what that would mean for technological needs, because that sort of end-user engagement really helps drive adoption and informs the right way to build this application, not just at the moment but in the long run.” 

For this project, C3 AI devoted twelve weeks and two full-time software developers. The application was intended to filter patients on clinical criteria, accelerate treatment planning with clinical decision support, develop AI/ML models from a unified data image, identify clinically similar patients, visualize similar patient disease trajectories, and read AI/ML-identified patient-specific clinical literature.

“As [Dr. Haresh] emphasized, there’s too much data and too many patients and not enough time,” Siebenlist said. “The real emphasis is on delivering the key pieces of information in a single place to accelerate the treatment decision and remove a bunch of the biases that come from the fact that we’re just humans, and the heuristics of experiences that are actually not the optimal solution in a lot of cases.”

The platform took as data inputs chemotherapy encounters, DICOM images, genomic data, biospecimens, and more. C3 AI’s natural language processing abstracts key pieces of medical information from text files and clinical encounter notes to ingest details about treatments and other context.  

Choosing which data to include in the model—and at what thresholds—was done through lengthy conversations with the clinical team, Siebenlist said, again highlighting the importance of having the clinical team involved in every step of development. These data thresholds are arbitrary, Siebenlist conceded, but they serve as a starting point.   

Data harmonization was the most challenging part, Siebenlist said. We “leveraged existing interoperability standards, whether that’s working with the NCI Thesaurus or having a distribution of the data that uses the mCODE model, because we’re looking at other standards that people can connect to, particularly with APIs, because we want to accelerate the connectivity in that space.” 
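As a rough illustration of what harmonization involves, the Python sketch below maps free-text diagnosis labels onto one canonical term and code. It is not C3 AI’s method: the synonym table is hypothetical, and the codes are placeholders invented for this example, not real NCI Thesaurus identifiers or mCODE elements.

```python
from typing import Optional, Tuple

# Placeholder vocabulary. These codes are invented for illustration and
# are NOT real NCI Thesaurus identifiers.
CANONICAL_CODES = {
    "glioblastoma": "EX-0001",
    "meningioma": "EX-0002",
}

# Hypothetical local spellings and abbreviations mapped to canonical terms.
SYNONYMS = {
    "gbm": "glioblastoma",
    "glioblastoma multiforme": "glioblastoma",
}


def harmonize(label: str) -> Optional[Tuple[str, str]]:
    """Return (canonical term, code), or None if the label is unknown."""
    # Lowercase and collapse whitespace so spelling variants compare equal.
    key = " ".join(label.lower().split())
    term = SYNONYMS.get(key, key)
    code = CANONICAL_CODES.get(term)
    return (term, code) if code is not None else None
```

Labels that fail to map return None rather than a guess, so they can be routed to a person for review instead of silently entering the model under the wrong term.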

C3 AI combined data from the OHSL database with publicly available data from the Genomic Data Commons, then, using the C3 AI Suite, integrated the types of data and presented the patient digital twin.

The digital twin is a visualization of clinically similar patients and groups of patients and their disease trajectories and timelines. Patient similarity scores show how closely two patients match, and the platform offers a human language breakdown of how that score was calculated. There isn’t a ground truth to the similarity model, Siebenlist said, but there is ground truth on each individual component, for example the genomic data or individual pieces of clinical information.  
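The article does not describe how the platform computes its similarity score. As a minimal sketch of the general idea, the Python example below combines per-component matches (age, tumor size, mutations, prior treatments, the dimensions Haresh named) into a weighted score and returns a plain-language breakdown alongside it. The weights, scales, field names, and sample patients are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    age: int                    # years
    tumor_size_cm: float        # largest diameter
    mutations: frozenset[str]   # genes with a reported mutation
    treatments: frozenset[str]  # treatments received so far


def closeness(a: float, b: float, scale: float) -> float:
    """Map an absolute difference to [0, 1]; 1.0 means identical."""
    return max(0.0, 1.0 - abs(a - b) / scale)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical weights (summing to 1.0), chosen only for illustration.
WEIGHTS = {"age": 0.2, "tumor_size": 0.2, "mutations": 0.4, "treatments": 0.2}


def similarity(p: Patient, q: Patient) -> tuple[float, list[str]]:
    """Return a weighted score in [0, 1] and a readable breakdown."""
    parts = {
        "age": closeness(p.age, q.age, scale=50.0),
        "tumor_size": closeness(p.tumor_size_cm, q.tumor_size_cm, scale=5.0),
        "mutations": jaccard(p.mutations, q.mutations),
        "treatments": jaccard(p.treatments, q.treatments),
    }
    score = sum(WEIGHTS[name] * value for name, value in parts.items())
    breakdown = [
        f"{name}: {value:.2f} match, weight {WEIGHTS[name]:.1f}"
        for name, value in parts.items()
    ]
    return score, breakdown


# Invented patients, for illustration only.
current = Patient("A", 50, 3.0, frozenset({"IDH1"}), frozenset({"surgery"}))
candidate = Patient("B", 60, 3.0, frozenset({"IDH1", "TP53"}), frozenset())
score, breakdown = similarity(current, candidate)
```

Keeping each component separate reflects Siebenlist’s point: the overall score has no ground truth, but each input can be checked on its own, and the breakdown shows a clinician which components drove the match.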

Comparing a current patient to the most similar “twins” in the database informs decisions at the point of care and gives a more nuanced view of likely prognosis.  

Iterative Approach 

For the entire project, iteration and the ability to build over time are essential, Siebenlist emphasized. The Patient Digital Twin model is based on continuous improvement: new data for patients are constantly fed back into the model, and external data contextualizing the patient are added as well to train, tune, and test the model.

“With all works like this, what is going to be possible in one year is going to be very different from what is possible in two years, etc. Our ability to incorporate new pieces of information and continuously improve both the models and give individuals the tools to build them themselves and deploy and compare—this is how you empower users,” Siebenlist said.  

While this pilot is focused on treatment planning for cancer, Siebenlist outlined opportunities for the model in drug target discovery, molecular knowledge graphs, diagnosis, disease surveillance, treatment simulation, and clinical trial enrollment.  

Consortium members currently include OHSL, Microsoft, Intel, Yale New Haven Health, Arizona State University, Internet 2, University of Virginia, and more. The public-private partnership is focused on bringing more individuals and institutions into the consortium, Siebenlist said.

The consortium welcomes new datasets to integrate, new infrastructure to support the consortium’s federated learning model, and new clinician end users. “We are very concerned with continuous ethical review,” Siebenlist added, “so anyone who wants to be involved with either making sure that the data which goes into any AI/ML development is appropriate, whether the models ultimately achieve the algorithmic transparency and justice that we’re shooting for… these are all areas where individuals can contribute.”  

The intellectual property developed on the platform belongs to the consortium, and the goal is for users to be trained on the platform so they can further develop it.  

H. Kim Lyerly of Duke University and OHSL chaired the panel after the individual presentations and advocated for the consortium model. “We think this would be a really exciting model for people to rally around because it would set a fantastic example and really meet a need that would have an impact,” he said. “Then commercial applications and opportunities afforded by the success would be opportunities that would extend beyond that.”