Unchanging Rules Of Gene Expression Could Improve Drug Approval Odds
By Deborah Borfitz
August 13, 2021 | Network theory holds that everything is connected, including people (e.g., Facebook), but few will have many connections, and most will have few. The same rule applies if the “nodes” happen to be human cells, genomes, proteomes, or transcriptomes, says Pradipta Ghosh, M.D., professor in the departments of medicine and cellular and molecular medicine at the University of California San Diego (UCSD) School of Medicine as well as cofounder of the Institute for Network Medicine (iNetMed) endeavoring to chart the most powerful connections.
It is “pure, algorithm-run precision math,” she says, and an entirely new approach to big data science that could erase the declining odds that a drug entering clinical studies will exit a winner. “Sadly, more often now than in the 1970s… biologists have been wrong” when subjectively making a hypothesis based on what has been published in the literature or observed under a microscope.
Given the big data explosion of the past few decades, and substantial reductions in the cost of genome sequencing, “we should have done better,” Ghosh says. Big data has not lived up to its promise because computer scientists are not trained in basic biology or medicine and biologists don’t question a target when they’re told to run with it.
Two engineers and two biologists, one of them (Ghosh) also a physician-scientist, started iNetMed to address the “big disconnect,” she says. Researchers here are focused on righting the ship by mapping the network connections to identify good drug targets and the best animal models to use in preclinical studies, as well as providing cross-disciplinary training in the methodology.
In the same vein, Ghosh adds, “engineers and mathematicians are huddling with the biologists to understand the most decisive parts of our eukaryotic cells’ communication network and use those insights to build smarter machines.”
The iNetMed approach identifies drug targets based on invariant rules of gene expression that are fulfilled by every patient across disease datasets and introduces a human “phase 0” preclinical efficacy test using personalized, organoid-based models, she explains. Artificial intelligence (AI)—in particular, the Boolean Network Explorer (BoNE)—enables exploration of the connections and identification of targets. The computational platform was invented by Debashis Sahoo, Ph.D., associate professor of computer science and engineering at UCSD.
The phase 0 approach is “much more tedious, as it needs patient-derived stem cells to first model the disease-in-a-dish accurately, and then uses the model to test therapeutic efficacy,” Ghosh says. Modeling the complexity of human disease requires a team effort (scientists, clinicians, caregivers, and patients) and expertise in 3D biology, stem cell technology, and, most importantly, “insights into the complex interplay between the host immune system and the trillions of pathogens that co-reside within us.”
BoNE is based on Boolean logic, the oldest language in computer science, which reduces the range of gene expression in any biological sample to 0s and 1s based on whether gene expression is low or high, says Ghosh. Then, through a series of precision algorithms, BoNE allows the scientists to visualize any disease as a progressive continuum of changes such that clusters of genes, all interconnected, change in an invariant pattern in all samples. “When used at the very first step of drug discovery when a target is identified, the approach also provides guidance on the very last step of the discovery process… regulatory approval,” which is tied to a medicine’s efficacy in large and expensive phase 3 clinical trials.
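To make the reduction to 0s and 1s concrete, here is a minimal sketch in Python. It is an illustration only, not the BoNE implementation: the gene names, the expression values, and the choice of the midpoint of each gene's range as the low/high cutoff are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def binarize(values: List[float], threshold: Optional[float] = None) -> List[int]:
    """Map expression values to 0 (low) or 1 (high).

    Assumption for illustration: if no threshold is given, use the
    midpoint between the smallest and largest value for this gene.
    """
    if threshold is None:
        threshold = (min(values) + max(values)) / 2.0
    return [1 if v > threshold else 0 for v in values]


# Hypothetical expression levels for two genes across four samples.
expression: Dict[str, List[float]] = {
    "GENE_A": [1.2, 1.5, 7.8, 8.1],
    "GENE_B": [0.9, 6.5, 7.0, 7.2],
}

# Each gene becomes a row of 0s and 1s, one entry per sample.
binary = {gene: binarize(vals) for gene, vals in expression.items()}
print(binary)
```

Once every gene is a string of 0s and 1s, relationships between genes can be checked with simple logical tests rather than continuous statistics.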
“We usually build [maps of the invariants] in publicly available datasets, so we can show everybody that it is not the information that was different, it was the methodology that mattered to have resulted in different insights—in this case, actionable targets that are more likely to succeed,” Ghosh says. The first step is to use that data to build a model of the disease as a progression from health toward disease, then find the invariant rules, and finally rigorously test the rules on every available dataset in the disease space.
The model achieved its high predictive power because the approach relies on Boolean invariants and testing on diverse heterogeneous datasets, she stresses.
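The idea of an invariant rule that must hold in every dataset can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the team's published method: it assumes the data are already reduced to 0s and 1s, it tests only one kind of rule ("if gene A is high, gene B is high"), it tolerates no exceptions, and the cohorts and gene names are hypothetical.

```python
from typing import Dict, List

# One dataset maps each gene name to its 0/1 values, one per sample.
Dataset = Dict[str, List[int]]


def implication_holds(a: List[int], b: List[int]) -> bool:
    """True if no sample has gene A high (1) while gene B is low (0)."""
    return all(not (x == 1 and y == 0) for x, y in zip(a, b))


def is_invariant(datasets: List[Dataset], gene_a: str, gene_b: str) -> bool:
    """True only if 'A high implies B high' holds in every dataset."""
    return all(implication_holds(d[gene_a], d[gene_b]) for d in datasets)


# Hypothetical, already-binarized cohorts.
cohort_1: Dataset = {"GENE_A": [0, 0, 1, 1], "GENE_B": [0, 1, 1, 1]}
cohort_2: Dataset = {"GENE_A": [0, 1, 1], "GENE_B": [1, 1, 1]}
cohort_3: Dataset = {"GENE_A": [1, 0], "GENE_B": [0, 0]}

holds_in_two = is_invariant([cohort_1, cohort_2], "GENE_A", "GENE_B")
holds_in_all = is_invariant([cohort_1, cohort_2, cohort_3], "GENE_A", "GENE_B")
print(holds_in_two, holds_in_all)
```

The third cohort contains a sample that breaks the rule, so the rule is rejected as an invariant. That mirrors the point made above: a pattern counts only if it survives testing on diverse, heterogeneous datasets.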
Three centers make up iNetMed and two of them—the Center for Precision Computational Systems Network (finds fundamental disease patterns and identifies high-value biomarkers and drug targets) and the HUMANOID Center of Research Excellence (reverse-engineers complex human organs and tissues to test predictions)—are collaborating on a series of studies demonstrating the potential of the human-centered approach to drug discovery, Ghosh says. The third, the Consortium for Cell-Inspired Systems Engineering, is focused on building machine learning algorithms that emulate the decision-making capabilities of cells in the body.
Flocking With Success
The Human Genome Project created a Pandora’s box of potential drug targets without a good way to “pick the horse that is most likely to win,” Ghosh says. The current, state-of-the-art method is to generate a lot of big data, create cohorts for disease and health, and strive to suppress (or elevate) what is found to be high (or low) in the disease group.
But the chosen target in a disease cohort may not translate to other independent cohorts, she continues. Laboratory animals in one cage and facility all have the same genetic background, but humans tend to be highly heterogeneous in terms of their genetic makeup, microbiome, diet, sleep patterns, immunity, disease subtypes, and the rates at which the disease progresses, relapses, or goes into remission.
Ghosh and her colleagues are cleverly using machine learning to factor “the thunderous noise of heterogeneity in the clinic” into the drug discovery process and locking in the invariant parameters of disease. “Once we do that, we can move mountains later,” she says.
Dealing with heterogeneity is particularly important when tackling inflammatory bowel disease (IBD). Clinically, IBD is broadly classified as Crohn's disease or ulcerative colitis, but the condition does not progress in a single direction like cancer, she notes. Flares occur randomly and unpredictably.
As detailed in a study recently published in Nature Communications (DOI: 10.1038/s41467-021-24470-5), the BoNE platform was used to guide discovery of therapies (PRKAB1-specific agonists) that protect the gut epithelial barrier in IBD as well as against an onslaught of pathogenic bacteria. The findings imply such drugs could be used both to treat and prevent acute flares.
Machine learning identified invariant gene expression patterns across all publicly available IBD datasets. Target validation used IBD-afflicted guts-in-a-dish created from biopsy tissues taken during colonoscopy procedures on consenting IBD patients. The disease model was developed by Soumita Das, Ph.D., associate professor of pathology at the UC San Diego School of Medicine.
IBD is the first of many diseases where iNetMed intends to prove the superiority of its predictive approach in head-to-head comparisons with existing methodologies, Ghosh says. In a paper recently published online in The Lancet’s EBioMedicine (DOI: 10.1016/j.ebiom.2021.103390), the research team trained a model on pandemics of the past—very few COVID-19 datasets were available in the early days of the pandemic, she notes—to discover the shared host immune response to respiratory viruses and use that knowledge on COVID-19 samples to identify viral targets.
One of those treatments, antibodies to the SARS-CoV-2 spike protein (such as those developed by Regeneron), has shown >80% efficacy in preventing symptomatic disease among household contacts of SARS-CoV-2-infected individuals in a large phase 3 trial, and the other, molnupiravir (an antiviral developed by Merck and Ridgeback Biotherapeutics), is already showing promising results in a phase 2/3 trial, says Ghosh.
Separately, the iNetMed team is working with colleagues in the pharmaceutical industry and on the UCSD campus to test a novel COVID-map-inspired target and drug, she continues. That may help in preparing for a third or fourth wave of COVID-19 by zeroing in on what is likely to work on most patients in the current as well as future viral pandemics.
As was also discussed in the paper, identifying the gene expression invariants enables researchers to pick the best animal model in which to test a given drug. Among about 10 mouse models of colitis examined for their ability to mimic the human gut epithelial barrier, “it appeared that a few models were good but the rest of them were terrible.” A drug tested in the wrong model would probably have been mistakenly set aside, she adds.
Among the long list of other diseases that iNetMed aims to conquer are colorectal cancer, breast cancer, gastric cancer, esophageal cancer, childhood cancers (leukemias, brain tumors, and neuroblastomas), preneoplastic conditions, micro polyps, polyposis syndromes, Alzheimer’s disease, age-related macular degeneration, and non-alcoholic steatohepatitis (NASH)—in addition to human performance studies to understand “what makes super athletes,” Ghosh says. “The best way for us to test if this repeats itself is to pick challenging areas that either have no known therapies or have ineffective ones and provide insights into biomarkers and/or prognostic signatures.”
For the IBD study, the network-based model could accurately classify FDA-approved vs. failed drug targets because “we have a few winners that made it to the finish line,” she notes. That kind of prediction isn’t possible for diseases where there are no good or approved treatments, including Alzheimer’s disease and NASH.
With IBD, the researchers could use their Boolean Network map to predict which drugs were “flocking with success or clustering with the failures,” says Ghosh. The same approach could be used to vet upcoming treatment targets.
iNetMed’s AI-assisted approach to precision drug discovery faces barriers to adoption, including skeptics who think it sounds too good to be true, says Ghosh. The methodology itself is not hard to learn, but “resistance to anything new stands in the way.”
The road to acceptance, she says, is through many peer-reviewed publications that “consistently demonstrate rigor, provide evidence of superiority in head-to-head comparisons against existing methodologies, and reveal previously unforeseen therapeutic avenues in diverse disease conditions [i.e., new targets/drugs paired to diseases].”
The iNetMed team is holding nothing back when it comes to creating a deep pipeline of many such programs in diverse therapeutic areas. Since the researchers generally use publicly available datasets, the resulting disease maps, gene signatures, source data, and software codes are also freely available to the public.
“Several biotechs and pharmaceutical companies have approached us for help with vetting their pipelines,” Ghosh adds, although their own entrenched drug discovery processes may be difficult to uproot. It could well take a decade for the mathematical model to become mainstream. “There is no better proof than FDA approvals and that takes time, so we have to be patient.”