NVIDIA Launches Generative AI-in-a-Box

June 11, 2024

By Allison Proffitt 

At COMPUTEX Taipei earlier this month, NVIDIA launched NIM—NVIDIA Inference Microservices—a simple, standardized way to add generative AI to applications.

Data generation—rather than retrieval—is poised to take over computing, said Jensen Huang, founder and CEO of NVIDIA, in a wide-ranging keynote presentation that included this and other announcements. “In the future, your computer will generate as much as possible and retrieve only what’s necessary,” he said. “The reason for that is because generating data requires less energy than going to fetch information.”

But not every enterprise has teams of AI researchers to build its own generative AI tools. That’s where NIM comes in. NVIDIA NIMs are inference microservices that package models as optimized containers that can be deployed on clouds, data centers, or workstations, giving developers the ability to build generative AI applications for copilots, chatbots, and more in minutes rather than weeks.

Huang called NIM “a new type of software”: a pre-trained model that is essentially AI-in-a-box. “Inside this container is CUDA, TensorRT, Triton Inference Server. It is cloud native so that you could autoscale in a Kubernetes environment. It has management services and hooks so that you can monitor your AIs. It has common APIs—standard APIs—so that you could literally chat with this box.”
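Those standard APIs follow the familiar OpenAI-style chat-completion convention, which is what lets developers “chat with the box” from any HTTP client. The sketch below is illustrative, not official sample code: it assumes a Llama 3 8B NIM container is already running locally and serving on port 8000, and the endpoint path and model identifier (`meta/llama3-8b-instruct`) follow NVIDIA’s published examples but should be checked against your own deployment.

```python
import json
import urllib.request

# Assumed local address of a running NIM container (adjust for your deployment).
NIM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt: str,
                       model: str = "meta/llama3-8b-instruct") -> dict:
    """Build an OpenAI-style chat-completion payload for a NIM endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def chat(prompt: str) -> str:
    """POST the prompt to the NIM's chat endpoint and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        NIM_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the generated text here.
    return body["choices"][0]["message"]["content"]
```

Because the request and response shapes match the OpenAI API, existing client libraries and tooling built around that convention can generally point at a NIM endpoint with only a base-URL change.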

NIM also enables enterprises to maximize their infrastructure investments, the company says. For example, running Meta Llama 3-8B—Meta’s openly available state-of-the-art large language model—in a NIM produces up to 3x more generative AI tokens on accelerated infrastructure than without NIM. This lets enterprises boost efficiency and use the same amount of compute infrastructure to generate more responses. 

Enterprises can deploy AI applications in production with NIM through the NVIDIA AI Enterprise software platform. Starting next month, members of the NVIDIA Developer Program can access NIM for free for research, development and testing on their preferred infrastructure. 

Early Adopters in the Life Sciences 

Over 40 NVIDIA and community models are available to experience as NIM endpoints on ai.nvidia.com, according to the company, including Databricks DBRX, Google’s open model Gemma, Meta Llama 3, Microsoft Phi-3, Mistral Large, Mixtral 8x22B and Snowflake Arctic. 

Techbio and pharmaceutical companies, along with life sciences platform providers, use NVIDIA NIM for generative biology, chemistry and molecular prediction. With the Llama 3 NIM for intelligent assistants and NVIDIA BioNeMo NIMs for digital biology, researchers can build and scale end-to-end workflows for drug discovery and clinical trials.  

Deloitte is using its Atlas AI drug discovery accelerator, powered by the NVIDIA BioNeMo, NeMo, and Llama 3 NIMs, to more efficiently extract data-based insights from gene to function for research copilots, scientific research mining, chemical property prediction, and drug repurposing.

Transcripta Bio harnesses Llama 3 and BioNeMo for accelerated intelligent drug discovery. Its proprietary artificial intelligence modeling suite, Conductor AI, uses its Drug-Gene Atlas to help discover and predict the effects of new drugs at transcriptome scale. 

Ecosystem Partners 

Platform providers including Canonical, Red Hat, Nutanix and VMware (acquired by Broadcom) are supporting NIM on open-source KServe or enterprise solutions. AI application companies Hippocratic AI, Glean, Kinetica and Redis are also deploying NIM to power generative AI inference. 

Leading AI tools and MLOps partners — including Amazon SageMaker, Microsoft Azure AI, Dataiku, DataRobot, deepset, Domino Data Lab, LangChain, Llama Index, Replicate, Run.ai, Saturn Cloud, Securiti AI and Weights & Biases — have also embedded NIM into their platforms to enable developers to build and deploy domain-specific generative AI applications with optimized inference. 

Global system integrators and service delivery partners Accenture, Deloitte, Infosys, Latentview, Quantiphi, SoftServe, Tata Consultancy Services (TCS) and Wipro have created NIM competencies to help the world’s enterprises quickly develop and deploy production AI strategies.