NVIDIA Announces Plans to Expand Large Language Models for Biology

By Bio-IT World Staff

September 22, 2022 | At NVIDIA’s annual GTC event two days ago, the company made two announcements of particular interest to the life sciences. First, NVIDIA announced it will make Clara Parabricks available on the Broad Institute’s Terra cloud platform. Second, the company revealed plans to expand large language models to biology, announcing BioNeMo.

Large Language Models that Speak Biology

One outgrowth of the NVIDIA-Broad partnership will be the development of BioNeMo, NVIDIA’s newly announced large language model for biology. It is an extension of the NeMo Megatron framework and part of NVIDIA Clara Discovery, a collection of frameworks, applications, and AI models for drug discovery.

Large language models have proven adept at meeting unsupervised learning objectives with extremely large datasets, and Kimberly Powell, NVIDIA’s vice president of healthcare, said in a media pre-briefing that we are merely “scratching the surface” of what will be possible. NVIDIA previously developed NeMo Megatron, a framework that uses model and data parallelism to scale LLMs to hundreds of billions or even trillions of parameters. Now the company has extended that framework and tuned it for biology, reducing training time from months to days.

“Luckily, digital biology data comes with its own language amenable to these unsupervised learning approaches and language models,” Powell said. “For DNA, it’s nucleic acid sequence. For proteins, it’s amino acid sequence. And for chemicals, it’s SMILES strings.”
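To make Powell's point concrete, the sketch below shows how each of the three data types can be split into tokens and mapped to integer IDs, the form a language model consumes. It is an illustration only and does not represent BioNeMo's actual tokenizers: the alphabets, the simplified SMILES pattern, and the function names (`tokenize_chars`, `tokenize_smiles`, `encode`) are all assumptions made for this example.

```python
import re

# Assumed, simplified alphabets for illustration (no ambiguity codes).
DNA_ALPHABET = "ACGT"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Simplified SMILES pattern (an assumption, not a full grammar):
# bracket atoms first, then two-letter halogens, then any single character.
SMILES_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize_chars(sequence, alphabet):
    """Split a DNA or protein sequence into one token per residue."""
    unknown = set(sequence) - set(alphabet)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    return list(sequence)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom, bond, and ring/branch tokens."""
    return SMILES_PATTERN.findall(smiles)


def encode(tokens):
    """Map tokens to integer IDs using a vocabulary built from the tokens."""
    vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
    return [vocab[tok] for tok in tokens], vocab


dna_tokens = tokenize_chars("GATTACA", DNA_ALPHABET)
protein_tokens = tokenize_chars("MKVLA", PROTEIN_ALPHABET)
smiles_tokens = tokenize_smiles("CC(=O)Cl")  # acetyl chloride

dna_ids, dna_vocab = encode(dna_tokens)
print(dna_tokens, dna_ids)
print(protein_tokens)
print(smiles_tokens)
```

In practice a vocabulary would be fixed ahead of time across a whole corpus rather than rebuilt per sequence; building it per call here just keeps the example short.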

NVIDIA expects that researchers at both Broad and NVIDIA will work with BioNeMo, creating new models to add to the collection and make available on the Terra platform.