March 19, 2010 | Iya Khalil is a fast talker, she readily admits. But she’s not afraid to admit that her company, Gene Network Sciences (GNS), has had to pause and change its tune over the past decade in order to establish fruitful business partnerships with pharma (see, “GNS Charts ‘Unknown’ Biology,” Bio•IT World, Oct 2006). However, the latest version of its supercomputer-based scientific platform called REFS (Reverse Engineering, Forward Simulation) and efforts to harness complete ‘omic and clinical data are putting GNS on the right track.
“Our big focus is building models from patient data,” says Khalil, GNS’ executive vice president, from her spacious offices in Cambridge, Mass. “Now that you can measure what’s happening in a human system, there’s no reason you can’t collect the data directly from patients and build models.” And that’s important, for while “someone might get lucky” and identify another blockbuster drug, she insists that’s not the way to go. “We have to think about matching therapies to patients, genotype or molecular profiles or disease pathology.”
Khalil says the ability to collect genetic and expression data, as well as data from proteomics, imaging, and the clinic, is coming online. The challenge is how to use those data. One approach is to identify key markers that indicate who would (and would not) respond to a specific therapy. “Matching patients to drugs is one of the primary focal points; it’s not that there’s a lack of drugs out there, but finding who would match the drug is the challenge,” says Khalil.
Established by Khalil and fellow Cornell physicist Colin Hill, GNS brings together specialists in computer science, biology, genetics, and computational physics. It has established partnerships with several top pharma companies, including Pfizer, Johnson & Johnson, and Biogen Idec, as well as the personalized medicine start-up CombinatoRx (see, “The Odd Couple,” Bio•IT World, Sept 2003).
GNS started out in 2000 doing mechanistic modeling based on the literature, and created a patented language to model the interactions. GNS completed a computational model of a colon cancer cell, but eventually realized that it could only describe a small fraction of the total interactions. “The ‘known’ biological circuitry is context dependent, horribly incomplete, and may sometimes be flat out wrong, so you’ve got to discover the missing links and find the relevant parameters, and you’re not going to get that from the literature alone,” says Khalil. Meanwhile, many pharma companies were building their own in silico modeling groups, which didn’t exactly lift demand for GNS’ services.
Other strategies have their pitfalls as well. Hiring knowledgeable domain experts to identify key targets is useful but hardly scalable, Khalil says. Another strategy is to use standard statistics and informatics to identify the most significant changes in a dataset. The problem there is that “the things that pop out may not be the most important players when it comes to identifying the best targets or predictive markers. And stats alone won’t separate causal from reactive.”
GNS chose to invest in an industrial-scale software platform and in optimizing its use of supercomputers. Khalil shows me two familiar graphs: one shows the plummeting cost of DNA sequencing over the past ten years, the other the rapidly rising capacity of supercomputing. It is these twin trends that GNS wants to exploit. “This is the opportunity for biology as we see it. We see those two concepts being tied conceptually. Now we can go into the system and learn what’s going on from the data using an inference approach.”
Khalil identifies four main problems that GNS believes it can help pharma address, not merely on the R&D side but also the commercial side and the health care sector.
1) Creating new drug programs matched to specific sub-populations, using genetics as a major part of the modeling;
2) Prioritizing drug targets;
3) Identifying biomarkers to separate responders from non-responders; and
4) Identifying combination therapy opportunities.
Using supercomputers and machine learning, integrating genetic, genomic, and clinical data, GNS seeks to learn the disease models before simulating them. Much of the effort focuses on using measurements of DNA variation, molecular changes and clinical outcomes.
“First, we compute the local models in the system,” Khalil explains. “There are hundreds of thousands of genetic changes, thousands of gene expression changes and clinical phenotypes that produce an astronomically large number of possible relationships. We enumerate trillions of these possible network fragments and score these against the data. The platform will ask: Does the data support these relationships? Do those transcripts interact? Does this SNP drive changes in that transcript?”
The REFS platform investigates these questions, leveraging massive supercomputing power. The global optimization portion of the REFS platform samples from these network fragments and assembles them initially randomly, distributing the computation over thousands of processors. It then evolves those models, switching network fragments—akin to puzzle pieces—in and out of the models. After each ‘puzzle piece’ change, the score of each hypothesized model is updated, and changes that result in a better-scoring model are retained.
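The search described above (assemble fragments at random, swap pieces in and out, keep the changes that improve the score) resembles a stochastic hill climb. The Python sketch below illustrates that general idea only; it is not GNS’ REFS code. The fragment names, the scores, and the additive scoring function are all invented for illustration, whereas the real platform scores fragments against patient data and spreads the work over thousands of processors.

```python
import random

# Hypothetical fragment scores: (driver, target) -> how well the data supports the link.
# All names and numbers are made up for illustration.
FRAGMENT_SCORES = {
    ("SNP_1", "transcript_A"): 4.0,
    ("SNP_2", "transcript_A"): 0.5,
    ("transcript_A", "transcript_B"): 3.0,
    ("transcript_B", "swollen_joints"): 5.0,
    ("SNP_3", "swollen_joints"): 1.0,
    ("transcript_C", "tender_joints"): 2.0,
}

def score(model):
    """Toy model score: the sum of the scores of its fragments (an assumption)."""
    return sum(FRAGMENT_SCORES[f] for f in model)

def evolve(model_size, steps, rng):
    """Start from a random assembly, then swap one fragment at a time,
    keeping a swap only when it improves the score."""
    fragments = sorted(FRAGMENT_SCORES)
    model = set(rng.sample(fragments, model_size))
    best = score(model)
    for _ in range(steps):
        piece_out = rng.choice(sorted(model))
        piece_in = rng.choice([f for f in fragments if f not in model])
        candidate = (model - {piece_out}) | {piece_in}
        candidate_score = score(candidate)
        if candidate_score > best:
            model, best = candidate, candidate_score
    return model, best

rng = random.Random(0)
# Several independent runs, each starting from a different random assembly.
ensemble = [evolve(model_size=3, steps=2000, rng=rng) for _ in range(5)]
for model, model_score in ensemble:
    print(model_score, sorted(model))
```

In this toy the additive score has a single optimum, so the runs settle on the same three fragments. With noisy data and a richer score, independent runs could plausibly end at different networks of comparable quality, which is one way a population of models might arise.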
GNS uses the IBM Blue Gene supercomputing resources. “IBM provided a great easy platform for us to run the REFS platform—the machines work really well,” says Khalil. “Eventually cloud computing will become important and we’ll use that as well.”
Khalil says there is never enough data to capture a single “best” model, so GNS produces a population of models. “Once we’ve learned the ensemble, we run the simulations. Because of the distribution of models, we can assign p values and confidence levels.” The key part is learning the network structure and the underlying parameters. “If we’re collecting genotype data that can be specific to a patient, we can set genotypes in our models to reflect a specific patient background and ask: What are the nodes at which I can intervene that would shift that clinical phenotype towards greater efficacy?”
The engine infers the connections between genotype, expression, and phenotype. “Once you’ve learned the models, you can extract hypotheses in a very high-throughput way. This part isn’t as computationally intensive. We are able to convert the simulation aspect of the models into software that can run on our partners’ desktops, called Models In A Jar.”
One of GNS’ early successes in leveraging patient data with the REFS platform has come in a rheumatoid arthritis (RA) collaboration with Biogen Idec that began in late 2008. Biogen Idec scientists have been searching for new drug programs for the 40% of RA patients who do not respond to standard anti-TNF therapies. Khalil says the study identified CD86, which happens to be a clinically validated target for the Bristol-Myers Squibb biologic Orencia.
Aside from identifying targets, the GNS model simulation also identified some new hypotheses that could do better than the existing drug. “The concept is, can you learn something new from your clinical trial? You can collect patient samples and measure genetic variation, changes in disease severity, measure what happens after the patients are given drugs and their molecular profiles change? Can we learn a model describing disease in patients, before or after they’ve been treated? That’s the big opportunity—we can now go into patients and do exactly that.”
By analyzing thousands of transcripts, GNS computed some 1.3 trillion local models that addressed whether SNPs or transcripts directly drive changes in transcript levels or clinical phenotypes. “We identified 3 million local models or network fragments from those trillion,” Khalil continues. The platform ended up with 1,000 models that best described the data, which in turn were consolidated into a consensus network for a graphical view. Each question put to the ensemble yields 1,000 answers, one per model. For example: What happens to clinical endpoint A in a patient with genetic background X when gene Y is knocked down by 70%? The range of these 1,000 answers allows GNS to provide an assessment of confidence in each model prediction.
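To make the ensemble idea concrete, here is a minimal Python sketch of how 1,000 answers to a single knockdown question could be summarized. Everything in it is an assumption for illustration: the linear toy model, the invented effect sizes, and the use of an empirical 95% interval as a stand-in for whatever confidence measure GNS actually reports.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical ensemble of 1,000 models. Each has its own estimate of how strongly
# gene Y's expression drives clinical endpoint A (invented numbers).
ensemble = [{"effect_of_y": rng.gauss(2.0, 0.5)} for _ in range(1000)]

def predicted_change(model, knockdown):
    """Toy linear model: change in endpoint A when gene Y expression drops by `knockdown`."""
    return -model["effect_of_y"] * knockdown

def summarize(predictions, lower=0.025, upper=0.975):
    """Median prediction plus an empirical interval across the ensemble."""
    ordered = sorted(predictions)
    last = len(ordered) - 1
    return {
        "median": statistics.median(ordered),
        "low": ordered[int(lower * last)],
        "high": ordered[int(upper * last)],
        "fraction_decreasing": sum(p < 0 for p in ordered) / len(ordered),
    }

# One answer per model to the question: what happens to endpoint A at 70% knockdown of gene Y?
answers = [predicted_change(m, knockdown=0.70) for m in ensemble]
summary = summarize(answers)
print(summary)
```

A narrow interval, or a large fraction of models agreeing on the direction of the change, would suggest a prediction worth testing; a wide spread would flag one that the data do not pin down.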
Starting with hundreds of thousands of components, only a handful turns out to be critical. The model predicts connections between genotypes, transcripts, and clinical phenotypes such as swollen joints and tender joints. “You can have interactions between your genetic components, transcripts, and swollen joints. And tender joints could be explained by swollen joint count, SNPs, and transcripts,” says Khalil.
Next, GNS sets each of those genotypes to match a patient, obtaining a distribution of scores for, say, swollen joints, while asking: What predictors are key? What happens if CD86 is modulated, for example? By doing this analysis for all 70 patients in the trial, GNS predicted various key intervention points modulating the clinical phenotype. “We predict the clinical outcome of patients on anti-TNF; some have responded, many still have tender joints,” says Khalil. “We predict modulation of CD86 will reduce tender joints and swollen joint scores, so we have generated a hypothesis that this should be a good intervention.” That was a new prediction, not based on prior knowledge.
The GNS modeling exercise also identified other potential targets, notably a member of a gene family well-known in cancer that ranked higher than CD86, and a novel regulator of the development of certain inflammation-related cells.
Predicting What’s Predictive
Despite recent progress in identifying biomarkers that help personalize medicine, Khalil cautions that, even for targeted therapies, it is not usually just a single genetic region that ends up being predictive. In cancer, for example, it could be EGFR, KRAS, gene amplifications, or mutations. “What combination of factors ends up being predictive for that patient? For many inhibitors, it’s not something in that pathway or target. It could be downstream or a crosstalk pathway that ends up being predictive.”
From the pharma side, most requests to GNS come in two flavors. On the biomarker side, pharmas want to match drugs in development to patients who will respond, by identifying the factors predictive of response. Pharmas also want to use data from clinical studies to help identify targets in the context of existing sub-populations of patients, given that not everyone will respond to the same intervention points. GNS is getting repeat partnerships. “A computational engine is still a black box to many of our partners, so we make the process transparent by providing the Models In A Jar that result from the process, which can be simulated and whose predictions can be verified. Ultimately, the most relevant evidence we have that our platform works is that our partners want to use it again.”
Another constituency is the health care community, where the challenge is identifying who is going to benefit once the drug is approved and who is going to pay. Working with payors and studying clinical observational data could help identify unmet medical needs. Although not yet formally announced, GNS is starting a collaboration with a leading payor organization to use patient records to identify potentially adverse drug combinations and to identify optimal treatments for individual patients, looking at clinical data in concert with claims and other data to predict informative markers.
“In the short term, we can help a drug get to market and help improve patient health by matching the right drug to the right patient, and vice versa,” Khalil concludes. “In the long term, we believe this approach can be part of how you do discovery from the very beginning. It’s not that you’ve identified some interesting phenomenon in a cell assay and now you’ve got to identify all the biology around it. Instead, the biology is transparent in the models you generate directly from patient cohort data. Then you base your discovery program from the beginning around targets and interventions for sub-populations identified from these models.”
Finally, it sounds like Khalil is speaking pharma’s language.
This article also appeared in the March-April 2010 issue of Bio-IT World Magazine.