Pharma’s AI Future

By Allison Proffitt

December 12, 2018 | Why not pharma?

That was Peter Henstock’s challenge to the audiences last week at the AI World Conference & Expo in Boston. Henstock, the AI and Machine Learning lead at Pfizer, argues that pharma isn’t new to AI. The industry has been doing bioinformatics, business intelligence, cheminformatics, text analytics, and QSAR—quantitative structure–activity relationship—models for years.

Now is the time for pharma to embrace AI and start realizing some of the benefits other industries are enjoying. For example, Forbes suggests that some 35% of Amazon’s revenue is generated by its recommendation engine. And results on the ImageNet benchmark suggest that AI can now classify images better than humans.

The possibilities are vast, but what are the roadblocks? Henstock listed several possible culprits: data volume, integration challenges, and the skills landscape. To chart a way forward, he shared his vision of the AI Hierarchy of Needs. The foundation is data at sufficient volume for AI. Above that come security needs, collaborations with other groups, robust data science, and machine-learning capability, with an AI-driven company at the pinnacle.

But even Henstock’s foundation layer—data volume—isn’t a solved problem. “The molecular space is huge, and our datasets are still very small,” said Ed Addison, CEO of Cloud Pharmaceuticals. Cloud is using a mix of AI and prior knowledge for lead design because for some questions, the datasets aren’t yet big enough.

We tend to break problems down into sub-questions, Addison said, look at the data we have available, and then pick the right algorithm based on our data depth and the questions we’d like to answer.

The data-first approach is a wise one, said Robert Bogucki, CTO at deepsense.ai. In an executive roundtable at the event, he was asked how the development process happens in an AI or analytics project. First, he advised, make sure you have the data. “If I’m just thinking about collecting the data now, maybe I should wait on this project until I have the data sources.”

Joe Cheng, associate director of data and statistical sciences at AbbVie, argued for more freely available data to explore. Clinical trial datasets are a huge asset, he said, but the data are mostly “hidden away.” He acknowledged data sharing initiatives, but said most release data only in response to a proposal with a specific hypothesis. Users usually can’t just explore the dataset, he lamented. He encouraged pharma to find a way to make the data available to play with and explore.

GlaxoSmithKline is first in line.

ATOM, the Accelerating Therapeutics for Opportunities in Medicine consortium, was started in 2015 by GSK, the US Department of Energy, and the National Cancer Institute. The consortium is investing in building up the data foundation for all of pharma in an effort to jumpstart the AI returns.

ATOM is focused on modulating the biology, said John Baldoni, senior vice president of in silico drug discovery at GSK and founder and governing board co-chair at ATOM. This includes molecular design, human-relevant assays, and ADME-tox, he said. GSK donated two million failed compounds to launch ATOM; the consortium now has 150 model-ready datasets.

“Industry is really the repository of the data we use to better understand biology,” Baldoni said, arguing that the pharma industry is “by far” the biggest repository of control experiments. He believes pharma has an obligation to use that data for humanity, even if that means disrupting business structures.

Accelerant AI

At AstraZeneca, Tom Plasterer, US cross-science director of R&D information, and Jonathan Dry, director of bioinformatics, oncology, view artificial intelligence as an “accelerant” for oncology informatics. The challenge they’ve found is passing data between different parts of the pipeline, or achieving what they called seamless information connectivity across domain nodes. Plasterer, in particular, is a proponent of FAIR data: data that is Findable, Accessible, Interoperable, and Reusable. AstraZeneca has found that AI can help with pre-processing data to ensure clean data networks. Where this becomes AI instead of just fancy text mining, Plasterer clarified, is in how a system can infer relationships between terms.

The two agreed with Addison: by deep learning standards, pharma doesn’t have big datasets. But that doesn’t mean AI doesn’t have a role to play. When aligning AI in service of FAIR goals, Plasterer and Dry had this advice: clearly define business objectives for each AI trial, and apply AI across the span of the pipeline.

Eric Neumann agrees. CEO and founder of Aidaka, Neumann believes drug discovery can certainly leverage deep learning today. There are limitations to deep learning technology, he acknowledged. It is opaque: predictions often come without explanation, and there is no obvious way to incorporate prior knowledge.

To address the opacity, Neumann recommends abductive reasoning. Build the model so that outcomes predict the mechanisms behind them, uncovering causal relationships. The approach leverages the ingenuity of deep learning to find optimal explanations, he said. Regulatory bodies want more than a black box, he added. Machine learning has already been accused of being alchemy. Instead, including a “formalism” inside deep learning helps preserve prediction and hypothesis testing that reassures all of us.

We are also limited, Neumann believes, by current architectures. He recommends Graph Nets, work done by Peter Battaglia and colleagues from DeepMind, Google Brain, MIT and others that Neumann believes will be particularly applicable to cellular and biomolecular systems.

“We present a new building block for the AI toolkit with a strong relational inductive bias—the graph network—which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors,” Battaglia et al. write in their arXiv abstract published in June 2018. The Graph Nets library is available on GitHub.
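
For readers who want a feel for what a graph network block does, the following is a minimal sketch in plain Python, not the Graph Nets library API. It assumes scalar attributes on nodes, edges, and the graph as a whole, and it uses simple sums where a real graph network would use learned update functions. The names `Graph` and `gn_block` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Graph:
    nodes: List[float]                    # one scalar attribute per node
    edges: List[Tuple[int, int, float]]   # (sender index, receiver index, attribute)
    global_attr: float                    # one scalar attribute for the whole graph


def gn_block(g: Graph) -> Graph:
    """One pass of a toy graph network block (assumption: sums replace learned functions)."""
    # 1. Edge update: each edge sees its own attribute, both endpoints, and the global.
    new_edges = [
        (s, r, e + g.nodes[s] + g.nodes[r] + g.global_attr)
        for s, r, e in g.edges
    ]

    # 2. Node update: aggregate updated incoming edges per receiver,
    #    then combine with the node's own attribute and the global.
    incoming = [0.0] * len(g.nodes)
    for _, r, e in new_edges:
        incoming[r] += e
    new_nodes = [v + agg + g.global_attr for v, agg in zip(g.nodes, incoming)]

    # 3. Global update: aggregate all updated edges and nodes with the old global.
    new_global = g.global_attr + sum(e for _, _, e in new_edges) + sum(new_nodes)

    return Graph(new_nodes, new_edges, new_global)


# Toy example: two "atoms" joined by one directed "bond".
molecule = Graph(nodes=[1.0, 2.0], edges=[(0, 1, 0.5)], global_attr=0.0)
updated = gn_block(molecule)
print(updated)
```

The point of the structure is that information flows along the graph's own connections: edges are updated from their endpoints, nodes from their incoming edges, and the graph-level attribute from everything. That is what makes the approach a natural candidate for molecules and cellular networks, where the relationships are as informative as the entities.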