Pharma, Life Sciences Partnerships Driven By NVIDIA AI and Processors
By Allison Proffitt
April 12, 2021 | At NVIDIA’s GTC event starting this week, NVIDIA CEO Jensen Huang again shared the latest in high performance computing from his kitchen. In his opening keynote, which ran more than an hour, Huang explored Omniverse, NVIDIA’s platform for creating virtual worlds; the latest in high performance data centers; AI for 5G; and NVIDIA’s work with the auto industry.
In the healthcare space, many of the announcements focused on the expansion of Clara Discovery, NVIDIA’s computational platform for healthcare. “Clara is a collection of pre-trained models, AI application frameworks, and reference applications so we can bring these capabilities into the domain of healthcare,” explained Kimberly Powell, NVIDIA’s VP of healthcare, in a press briefing. “This is very domain-specific,” she added.
Within Clara Discovery, NVIDIA announced four new pre-trained models: MegaMolBART, an open source transformer-based generative AI model; GatorTron, the world’s largest clinical language model; AlphaFold 1, which predicts the 3D structure of a protein from its amino acid sequence; and an ATAC-seq model for denoising single-cell genomics data.
Huang announced—and Powell elaborated on—several healthcare and drug discovery partnerships driven by Clara Discovery.
Schrödinger: Optimized Computational Drug Discovery
Schrödinger is already “a heavy user of NVIDIA GPUs” in their drug discovery and materials science business, Huang explained, even recently entering into an agreement to use hundreds of millions of NVIDIA GPU hours on the Google Cloud.
But today, Huang announced a partnership to serve Schrödinger customers who cannot use the cloud. NVIDIA plans to optimize Schrödinger’s FEP+ computational drug discovery platform—designed to model and predict the properties of novel molecules—for the NVIDIA DGX SuperPOD, which is built with NVIDIA DGX A100 systems and NVIDIA InfiniBand HDR networking. By optimizing the platform for the SuperPOD, Powell said, “we’ve essentially accelerated the ability to do the work by five times.”
The work includes the physics-based modeling in Schrödinger’s product suite, as well as support for NVIDIA Clara Discovery. The companies also plan to partner on scientific and research breakthroughs to further advance physics-based computing and machine learning for drug discovery.
“The world’s top 20 pharmas use Schrödinger today. Their researchers are going to see a giant boost in productivity,” Huang said. Powell put it in more concrete terms: “We can simulate over one million drug candidates in a year. To put that in perspective, if you were to do this in the lab, it would cost you well over $100 million and take well over five years to do it.”
AstraZeneca: AI Learns Language of Chemistry
NVIDIA is also collaborating with AstraZeneca on a transformer-based generative AI model for chemical structures used in drug discovery that will be among the very first projects to run on Cambridge-1, which is soon to go online as the UK’s largest supercomputer. The model—called MegaMolBART—will be open sourced, available to researchers and developers in the NVIDIA NGC software catalog, and deployable in the NVIDIA Clara Discovery platform for computational drug discovery.
MegaMolBART is based on AstraZeneca’s MolBART transformer model and has been pretrained on one billion molecules from the ZINC chemical compound database—using NVIDIA’s Megatron framework to enable massively scaled-out training on supercomputing infrastructure. “We used 32 DGX A100s to train these very large models,” Powell reported.
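As context for how a BART-style model like MolBART is pretrained on molecules: it learns by reconstructing corrupted SMILES strings. The sketch below is a simplified, hypothetical illustration of that denoising setup (the function name, character-level tokenization, and mask rate are assumptions for illustration, not MegaMolBART’s actual tokenizer or masking scheme).

```python
import random

def corrupt_smiles(smiles, mask_token="<mask>", mask_prob=0.15, seed=0):
    """Illustrative BART-style denoising pair for a SMILES string.

    Randomly masks tokens in the input; during pretraining the model
    learns to reconstruct the original string from the corrupted one.
    Character-level tokenization is a simplification; real chemical
    language models use SMILES-aware tokenizers.
    """
    rng = random.Random(seed)
    tokens = list(smiles)
    corrupted = [mask_token if rng.random() < mask_prob else t for t in tokens]
    return corrupted, smiles  # (corrupted input tokens, reconstruction target)

# Example: aspirin's SMILES string as a training pair
src_tokens, target = corrupt_smiles("CC(=O)Oc1ccccc1C(=O)O")
```

Training on a billion such pairs is what requires the massively scaled-out Megatron infrastructure Powell describes.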
Huang reported that MegaMolBART has seen recent success with Insilico Medicine, which used the model “to find a new drug in less than two years.” And Powell was just as enthusiastic.
“You can do amazing things with it,” Powell said. The model can handle reaction prediction, molecular optimization—synthesizing “properties that we just couldn’t imagine designing for before”—and de novo molecular generation. “We know there are more than 10^60 potential molecules out there, a completely intractable number,” Powell said. “If we can take our view beyond the chemical databases, we are going to discover more novel molecules that are needed to treat over 10,000 diseases that still go without treatment.”
And these tasks are just the beginning, Powell said. “Once you have these very large pretrained models, you can use them for many, many subsequent fine-tuned tasks that all will help in predictive models for drug discovery and drug development.”
University of Florida: GatorTron Reads EHRs
Thanks to a gifted DGX SuperPOD of over 140 nodes—named HiPerGator—the University of Florida and NVIDIA used the Megatron training framework to read 300 million unstructured notes spanning two million patients and 50 million patient encounters.
“The big idea here was, can we train the model to read a gigantic corpus of doctors’ notes in medical records? And we did just that,” Powell said. In seven days, the model achieved state-of-the-art performance on named entity recognition, she reported. It even improved the University of Florida’s own patient de-identification (anonymization) methods.
“The downstream applications of having a state-of-the-art clinical model are unbounded,” Powell said. Being able to search and query all EMR data could help match patients to trials, predict life-threatening diseases so clinicians can intervene early, create health summaries for clinicians and patients, and support clinical decision-making.
“What this really means is, the combination of NVIDIA DGX SuperPOD, which is essentially a data center in a box, with NVIDIA’s Megatron training framework, we have essentially democratized the ability for every academic medical center to be able to build their own clinical language models,” Powell added. “And they want to do that!”
In addition to these partnerships and deployments, Huang and Powell mentioned NVIDIA GPUs driving science at Oxford Nanopore, where models for the sequencing process are trained on a DGX SuperPOD and inference is done with AI tools; at Recursion Pharmaceuticals, with its BioHive-1 DGX SuperPOD; and at Vyasa, with Layar, a deep learning AI data fabric architecture.
“For the first time in history, with biology being digitized, we can apply the power of computing to tackle humanity’s greatest challenges,” Powell said. “All of the pieces are really coming together here!”