AI-Based Approach Crunches Real-World Data To Emulate Clinical Trials

By Deborah Borfitz

February 4, 2021 | A computer scientist at The Ohio State University (OSU) recently demonstrated the potential of artificial intelligence to crunch real-world data (RWD) and emulate randomized clinical trials, speeding the pace of drug repurposing. The initial use case was focused on preventing heart failure and stroke in patients with coronary artery disease, but the model could be applied to any disease with a definable outcome, says Ping Zhang, who leads the Artificial Intelligence in Medicine Lab at OSU.

Zhang is senior author of a recently published study in Nature Machine Intelligence (DOI: 10.1038/s42256-020-00276-w) in which a deep learning algorithm ingested insurance claims on nearly 1.2 million deidentified patients to identify existing medications whose therapeutic effect on coronary artery disease (CAD) had not been established. Examples include the diabetes drug metformin and the antidepressant escitalopram, both of which were already being tested for their effectiveness against heart disease, as well as two medications (lisinopril and atorvastatin) found to be effective only when used together.

Claims data provided information on assigned treatments, disease outcomes, and potential confounders, says Zhang. Drugs were input by their active ingredients, whether a given ingredient was dispensed on its own or combined with others in the same pill.
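To illustrate what an ingredient-level input might look like, here is a minimal Python sketch. It assumes a hypothetical lookup table, `PRODUCT_INGREDIENTS`, from dispensed products to active ingredients; the product keys are illustrative and not the study's drug vocabulary. A combination pill contributes each of its ingredients, so a patient who fills it counts as exposed to both.

```python
# Hypothetical lookup from a dispensed product to its active ingredients.
# A fixed-dose combination pill maps to more than one ingredient.
PRODUCT_INGREDIENTS = {
    "metformin_500mg_tab": {"metformin"},
    "lisinopril_10mg_tab": {"lisinopril"},
    "lisinopril_hctz_tab": {"lisinopril", "hydrochlorothiazide"},
    "amlodipine_atorvastatin_tab": {"amlodipine", "atorvastatin"},
}


def ingredient_exposure(fills):
    """Map (day, product) prescription fills to the first day each ingredient was dispensed."""
    first_fill = {}
    for day, product in sorted(fills):
        for ingredient in PRODUCT_INGREDIENTS.get(product, set()):
            # Keep only the earliest fill; later fills of the same ingredient are ignored.
            first_fill.setdefault(ingredient, day)
    return first_fill


fills = [(40, "lisinopril_hctz_tab"), (10, "lisinopril_10mg_tab"), (55, "metformin_500mg_tab")]
print(ingredient_exposure(fills))
# {'lisinopril': 10, 'hydrochlorothiazide': 40, 'metformin': 55}
```

Keying exposure on the ingredient rather than the product means lisinopril counts from day 10 regardless of whether it later arrives alone or in a combination pill.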

The high-throughput, computational drug repurposing framework factored in the sequence of events for individual patients—including when they were assigned a diagnosis code, went to the hospital, had a lab test done, or filled a prescription, he says. It also “upgraded” traditional propensity scoring methods to correct for most biases inherent to RWD, including confounding (demographics, comorbidities, and co-prescribed drugs) and selection biases.
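A patient's claims history can be viewed as a time-ordered sequence of visits, each holding the diagnosis codes, prescriptions, lab tests, and hospitalizations recorded that day. The sketch below is a hypothetical encoding, not the framework's own: the `Event` class, the event kinds, and the day-count convention are assumptions, and `I25.10` (an ICD-10-CM code for coronary atherosclerosis) is used only as an example code. Only events before the index day (treatment start) are kept, so the history used for confounding adjustment cannot be influenced by the treatment itself.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Event:
    day: int   # days since enrollment start (hypothetical time unit)
    kind: str  # e.g. "dx", "rx", "lab", "inpatient"
    code: str  # illustrative code string


def visit_sequence(events, index_day):
    """Group pre-index events into time-ordered visits of sorted 'kind:code' tokens."""
    history = sorted((e for e in events if e.day < index_day), key=lambda e: e.day)
    return [
        (day, sorted(f"{e.kind}:{e.code}" for e in group))
        for day, group in groupby(history, key=lambda e: e.day)
    ]


events = [
    Event(5, "dx", "I25.10"),
    Event(5, "rx", "atorvastatin"),
    Event(30, "lab", "lipid_panel"),
    Event(30, "inpatient", "admit"),
    Event(90, "rx", "metformin"),  # index day: the drug under study starts here
]
print(visit_sequence(events, index_day=90))
# [(5, ['dx:I25.10', 'rx:atorvastatin']), (30, ['inpatient:admit', 'lab:lipid_panel'])]
```

A sequence model such as an LSTM can then read these visits in order, which is how the timing of events, and not just their presence, can inform the adjustment.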

As described by Zhang, the workflow starts from a disease cohort and extracts candidate drug ingredients for repurposing from the observational medical database. For each ingredient, the framework identifies a user sub-cohort (the active drug arm) and a non-user sub-cohort (the comparison arm, standing in for placebo) and computes the many confounding factors for patients in both groups.
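The per-ingredient split and confounder computation might be sketched as follows. This is a simplified illustration under stated assumptions: the `Patient` fields, the vocabularies, and the age scaling are hypothetical, and a real pipeline would compute covariates only from each patient's pre-index history rather than from all recorded data. The confounder types (demographics, comorbidities, co-prescribed drugs) follow the ones named in the article.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    pid: str
    age: int
    female: bool
    comorbidities: set = field(default_factory=set)  # e.g. {"acute_mi", "dysrhythmia"}
    ingredients: set = field(default_factory=set)    # active ingredients ever filled


def split_arms(cohort, ingredient):
    """Split a disease cohort into user (treated) and non-user (comparison) arms."""
    users = [p for p in cohort if ingredient in p.ingredients]
    non_users = [p for p in cohort if ingredient not in p.ingredients]
    return users, non_users


def confounders(p, ingredient, comorbidity_vocab, drug_vocab):
    """Covariate vector: scaled age, sex, comorbidity flags, and flags for
    co-prescribed drugs (the ingredient under study is left out)."""
    row = [p.age / 100.0, 1.0 if p.female else 0.0]
    row += [1.0 if c in p.comorbidities else 0.0 for c in comorbidity_vocab]
    row += [1.0 if d in p.ingredients else 0.0 for d in drug_vocab if d != ingredient]
    return row


cohort = [
    Patient("p1", 64, False, {"acute_mi"}, {"metformin", "atorvastatin"}),
    Patient("p2", 58, True, {"dysrhythmia"}, {"atorvastatin"}),
    Patient("p3", 71, True),
]
users, non_users = split_arms(cohort, "metformin")
print([p.pid for p in users], [p.pid for p in non_users])  # ['p1'] ['p2', 'p3']
print(confounders(cohort[0], "metformin", ["acute_mi", "dysrhythmia"], ["metformin", "atorvastatin"]))
# [0.64, 0.0, 1.0, 0.0, 1.0]
```

Running the same loop once per candidate ingredient yields one user/non-user pair per emulated trial.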

The model tracked patients for two years, comparing their disease status at that end point to whether they took medications, which drugs they took, and when they started the regimen, he says. Treatment effects were estimated by mimicking a randomized clinical trial for each ingredient.
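Labeling each patient's outcome over the two-year window could look like the sketch below. The 730-day window, the `last_observed_day` argument, and the choice to return `None` (exclude) for patients whose observation ends early without an event are assumptions made for illustration; the article does not say how incomplete follow-up was handled. In the CAD use case the outcome would be heart failure or stroke, and the index day would be the day the patient started the regimen.

```python
FOLLOW_UP_DAYS = 2 * 365  # two-year window; the exact day count is an assumption


def outcome_label(index_day, outcome_days, last_observed_day):
    """Label one patient for an emulated trial.

    Returns 1 if an outcome event occurs within two years after the index day,
    0 if the patient is observed event-free for the whole window, and None if
    observation stops early with no event (hypothetically excluded here).
    """
    window_end = index_day + FOLLOW_UP_DAYS
    observed_end = min(window_end, last_observed_day)
    if any(index_day < day <= observed_end for day in outcome_days):
        return 1
    if last_observed_day >= window_end:
        return 0
    return None


print(outcome_label(100, [400], 1000))  # 1: event on day 400, inside the window
print(outcome_label(100, [900], 1000))  # 0: event falls after the window closes on day 830
print(outcome_label(100, [], 500))      # None: follow-up ends before day 830
```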

Researchers ran a simulated trial on each of 55 drugs identified in prescription drug claims, estimating their treatment effects, says Zhang. These included six medicines with a known CAD indication (metoprolol, fenofibrate, hydrochlorothiazide, pravastatin, simvastatin, and valsartan), and all six ranked within the top eight by effectiveness against the disease.
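Once each emulated trial produces an effect estimate, screening reduces to ranking the drugs and checking where a reference set lands. The sketch assumes the effect is an adjusted risk difference (treated minus untreated outcome rate, so more negative means more protective); the study may report a different metric. Drug names and numbers are placeholders, not results.

```python
def rank_candidates(effects):
    """Order drugs from most to least protective.

    `effects` maps drug -> adjusted risk difference (treated minus untreated
    outcome rate); more negative means fewer outcomes among users.
    """
    return sorted(effects, key=effects.get)


def hits_in_top_k(ranking, reference, k):
    """Count how many drugs from a reference set appear among the top k."""
    return sum(1 for drug in ranking[:k] if drug in reference)


# Hypothetical effect estimates for four placeholder drugs.
effects = {"drug_a": -0.031, "drug_b": 0.004, "drug_c": -0.012, "drug_d": -0.020}
ranking = rank_candidates(effects)
print(ranking)  # ['drug_a', 'drug_d', 'drug_c', 'drug_b']
print(hits_in_top_k(ranking, {"drug_a", "drug_c"}, k=2))  # 1
```

Counting reference drugs near the top of the list is one simple way to sanity-check a screen: a method that ranks established therapies highly is more credible when it also ranks an unexpected drug highly.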

Outperforming Preclinical-Based Methods

The new algorithm is designed to speed up hypothesis generation and to reduce translational problems, because it leverages observational data on humans rather than preclinical data on animals, says Zhang. For the CAD test case, the OSU research team demonstrated that its method could outperform three existing preclinical drug repurposing methods, which construct binary vectors of chemical structures, binary vectors of protein targets, and continuous vectors of chemical–protein interactome docking scores.
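The article names the feature vectors these preclinical baselines build but not how they score candidates. One common approach, assumed here for illustration, is guilt by association: a candidate scores highly if its vector resembles that of a drug already used for the disease, with Tanimoto similarity for binary vectors and cosine similarity for continuous ones. Target and protein names below are hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between binary vectors stored as sets of 'on' features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity between continuous vectors, e.g. docking scores on the same protein panel."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def repurposing_score(candidate, known_drugs, similarity):
    """Score a candidate by its highest similarity to any drug already used for the disease."""
    return max(similarity(candidate, known) for known in known_drugs)


# Hypothetical protein-target sets for one candidate and two reference drugs.
candidate_targets = {"P1", "P2", "P3"}
reference_targets = [{"P2", "P3", "P4"}, {"P5"}]
print(repurposing_score(candidate_targets, reference_targets, tanimoto))  # 0.5

# Hypothetical docking-score vectors over the same two proteins.
print(repurposing_score([1.0, 0.0], [[1.0, 1.0], [0.0, 1.0]], cosine))
```

Approaches like this rely only on drug properties, which is the contrast the article draws with the RWD-based method that learns from observed patient outcomes.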

Although Zhang has not yet had any discussions with the U.S. Food and Drug Administration (FDA) about the new AI-based drug repurposing approach, such discussions may only be a matter of time. Under the 21st Century Cures Act, the FDA launched a Real-World Evidence Program in 2018 that made plain its intention to start using RWD in support of regulatory decisions.

The recently emulated trial, using data on almost one-fourth of the U.S. population, is certainly larger than traditional randomized clinical trials, but whether it is better is not his judgment to make, says Zhang. He intends to next tackle use cases for more “difficult” diseases with limited therapeutic options.

CAD was chosen for the initial study because it afflicts so many people and has a lot of associated data, including information on multiple marketed medicines, he continues. But the drug repurposing framework could also be applied to diseases such as Huntington’s disease and amyotrophic lateral sclerosis, potentially finding medicines that are curative or that treat psychiatric symptoms affecting some patients more than others.

Working With RWD

To date, AI-based drug repurposing methods have focused primarily on the structural features of compounds or proteins, genome-wide association studies, transcriptional responses, and gene expression, says Zhang. Scientists in GlaxoSmithKline’s computational biology department were already pioneering the field nine years ago, when he was an intern there.

What is new here is the addition of longitudinal RWD as a data source—in this case, everyday phenotypic data such as whether patients have a disease, if they are taking a drug, their age and gender, and comorbidities such as acute myocardial infarction and cardiac dysrhythmias.

Secondary use of existing RWD such as electronic health records (EHRs) and claims databases for drug discovery is a lower-cost, more scalable alternative to randomized clinical trials and “better represents heterogeneity in the population,” says Zhang. But without AI, no human analyst could keep track of the hundreds or thousands of variables that might be at play.

RWD also has many types of biases that need to be addressed, as OSU researchers have done by combining causal inference theory with deep learning methods, Zhang says. Among these are protopathic bias (e.g., an analgesic prescribed for the symptoms of a not-yet-diagnosed tumor is erroneously blamed for causing the tumor), indication bias (e.g., metformin is more likely to be associated with hyperglycemia in health records because it is a medication for diabetes), and selection bias (e.g., insured people or hospitalized patients in health records do not represent the target population).
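Indication bias, and how inverse probability of treatment weighting (IPTW) counters it, can be shown with a small worked example. The counts below are invented for illustration: sicker patients are both more likely to receive the drug and more likely to have the outcome, while within each severity stratum the drug has no effect at all. Here the propensity is known exactly per stratum; in the OSU framework it is instead estimated from many covariates.

```python
# Each stratum: (name, treated, treated_events, untreated, untreated_events).
# Illustrative counts: severity drives both prescribing and outcome, but the
# drug has no effect within a stratum (same event rate in both arms).
STRATA = [
    ("mild", 200, 10, 800, 40),     # 5% event rate in both arms
    ("severe", 800, 240, 200, 60),  # 30% event rate in both arms
]


def naive_risk_difference(strata):
    """Crude treated-minus-untreated event rate, ignoring severity."""
    treated = sum(s[1] for s in strata)
    treated_events = sum(s[2] for s in strata)
    untreated = sum(s[3] for s in strata)
    untreated_events = sum(s[4] for s in strata)
    return treated_events / treated - untreated_events / untreated


def iptw_risk_difference(strata):
    """Weight each patient by 1 / P(own treatment | stratum), then compare arms."""
    t_weight = t_events = u_weight = u_events = 0.0
    for _, treated, t_ev, untreated, u_ev in strata:
        n = treated + untreated
        w_treated = n / treated      # 1 / propensity
        w_untreated = n / untreated  # 1 / (1 - propensity)
        t_weight += treated * w_treated
        t_events += t_ev * w_treated
        u_weight += untreated * w_untreated
        u_events += u_ev * w_untreated
    return t_events / t_weight - u_events / u_weight


print(f"naive: {naive_risk_difference(STRATA):.3f}")  # naive: 0.150
print(f"IPTW:  {iptw_risk_difference(STRATA):.3f}")   # IPTW:  0.000
```

The crude comparison makes a drug with no effect look as if it raises risk by 15 percentage points, the same pattern as metformin appearing to "cause" hyperglycemia. Reweighting builds a pseudo-population in which severity no longer predicts treatment, and the spurious effect disappears.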

While randomized clinical trials represent the strongest evidence for drug discovery, he adds, they are “costly, slow, and often impractical to generate evidence for many important questions.” As pointed out a decade ago in an article in Critical Care Medicine (DOI: 10.1097/CCM.0b013e3181f208ac), this is especially true in the challenging intensive care unit (ICU) environment.

OSU researchers have recently used AI and RWD from EHRs to predict which patients in the emergency department and ICU will develop sepsis, as described in an article (DOI: 10.1101/2020.09.21.20198895) that has just been accepted by Patterns. Many patients with COVID-19 end up in the ICU, says Zhang, and those who die there often die of sepsis.

Collaborators Wanted

For the new deep learning framework, researchers adopted long short-term memory (LSTM), a popular type of recurrent neural network traditionally applied to classification problems such as distinguishing people who do or do not have a disease, says Zhang. OSU scientists instead used an LSTM-based “inverse probability of treatment weighting” (IPTW) approach, a causal inference technique in which each patient’s estimated probability of receiving the drug is used to reweight the cohort, to derive drug-disease associations.
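The sketch below shows the general shape of an LSTM-based propensity model: a single LSTM layer reads a patient's visit vectors in order, a logistic output turns the final hidden state into P(treated | history), and that probability becomes an IPTW weight. It is a hypothetical, dependency-free illustration, not the study's code (which is on GitHub). Weights are random and untrained; in practice they would be fitted by minimizing cross-entropy between the predicted propensity and each patient's actual treatment assignment.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM mapping a sequence of visit vectors to a propensity score."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate over the concatenated [x_t, h_{t-1}] input:
        # i = input gate, f = forget gate, o = output gate, g = candidate cell state.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                   for _ in range(n_hidden)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * n_hidden for gate in "ifog"}
        # Logistic output layer on the final hidden state.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def _gate(self, gate, z, act):
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.W[gate], self.b[gate])]

    def propensity(self, visits):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in visits:
            z = list(x) + h
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        logit = sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


def iptw_weight(treated, propensity):
    """Inverse probability of treatment weight for one patient."""
    return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)


model = TinyLSTM(n_in=3, n_hidden=4)
history = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # two visits over three hypothetical binary codes
p = model.propensity(history)
print(p, iptw_weight(True, p), iptw_weight(False, p))
```

Using a recurrent model for the propensity step lets the order and timing of visits, not just their counts, inform how strongly each patient is reweighted.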

The source code for the algorithm is available for download from the GitHub repository. Zhang says he hopes to find an academic or industry (pharmaceutical or IT company) collaborator to scale up the drug repurposing framework to a wider range of diseases, as well as to develop new, more powerful algorithms combining RWD with information on the biological function of drugs.

Zhang’s graduate student Ruoqi Liu and research assistant professor Lai Wei, both at Ohio State, were co-authors on the Nature Machine Intelligence study.