Ginkgo Datapoints Launches Virtual Cell Pharmacology Initiative

By Allison Proffitt

November 21, 2025 | Ginkgo Datapoints has launched V-Ref293, a novel engineered cell line designated specifically as the reference standard for virtual cell research. Along with the cell line, the group is launching the Virtual Cell Pharmacology Initiative (VCPI), the first project under The Virtual Cell bio-AI community platform from Ginkgo Datapoints.

Virtual cells, AI-powered digital twins of biological cells, hold great promise as a tool for drug discovery. However, a lack of standardization, of reliable wet-lab methods and of appropriate pharmacology data keeps these models from reliably predicting how drugs will affect cells, Ginkgo Datapoints contends. The company says VCPI is built to close this foundational gap.

“We often think of AI models as ‘black boxes’: we can observe their outputs, and we may even understand their training data and architectures, but we don’t always know why they make specific predictions. Biology is similar. Even after decades of molecular and cellular research, we still struggle to predict how a cell will respond to a perturbation,” explained John Androsavich, PhD, General Manager of Ginkgo Datapoints, in an email to Bio-IT World.

“Virtual cell research aims to bridge these two black boxes. Even if we don’t fully understand the internal workings of AI models or cellular systems, using one to predict the behavior of the other can shift discovery from empirical screening toward rational design—and accelerate the path to new therapeutics.”

The Virtual Cell Pharmacology Initiative is designed as an open-source platform for building the first standardized framework for virtual cell modeling in drug discovery. It brings together researchers, pharmaceutical companies and AI developers in a community-driven effort to create the largest public dataset of its kind, with the goal of testing at least 100,000 compounds and generating more than 12 billion data points.
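For a sense of scale, dividing the two stated targets gives an average yield per compound. This is a back-of-envelope sketch only: the article does not say how a "data point" is counted (for example, genes measured times doses times replicates), and the function name is hypothetical.

```python
# Back-of-envelope scale check for the VCPI targets quoted above.
# Both figures are stated goals ("at least" / "more than"), so the ratio is
# a rough average, not a description of the actual assay design.
TARGET_COMPOUNDS = 100_000
TARGET_DATA_POINTS = 12_000_000_000


def data_points_per_compound(data_points: int, compounds: int) -> float:
    """Average number of data points per tested compound (hypothetical helper)."""
    return data_points / compounds


ratio = data_points_per_compound(TARGET_DATA_POINTS, TARGET_COMPOUNDS)
print(f"{ratio:,.0f} data points per compound")  # 120,000 data points per compound
```

Roughly 120,000 data points per compound is consistent with whole-transcriptome readouts, but how that total breaks down is not specified in the announcement.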

VCPI plans to let contributors take part at the data-generation stage, not just after release, by offering free high-throughput RNA profiling of their compounds through Ginkgo Datapoints.

“The AI and biology communities haven’t always been aligned on what makes good training data,” added Androsavich. “Current approaches rely on large quantities of low-quality data. It’s like empty calories — lots of data, but it’s noisy and may not be reproducible. We believe VCPI will prove that quality and quantity can co-exist. We’re offering both, with a method specifically designed for the pharmacology applications that matter most to drug developers.”

A Defined Cell; Consistent Wet-Lab Techniques

The first major obstacle to virtual cell research that Ginkgo Datapoints identified is the lack of a standard cell. Its answer is V-Ref293, which it offers as a robust, controlled reference for building the first cellular digital twin. V-Ref293 is an engineered HEK293 derivative, the company said in an unbylined blog post published alongside the news: “It has a clonal, mapped Ad5-E1A insertion to help sync proliferation and is serum-free for improved reproducibility.” Ginkgo Datapoints plans to make master cell bank vials available to the community in 2026 so that labs worldwide can generate comparable results.

The company identifies laboratory methods as the next limiting factor and argues that DRUG-seq, which it calls the best-validated high-throughput bulk RNA sequencing method, delivers clearer, more reliable data for drug screening than single-cell approaches.

“Single-cell sequencing has become the default tool for virtual cell research, not because it’s always the right tool, but because it can measure a lot, so we keep using it for everything. It’s incredibly powerful for observational profiling and genetic perturbations, but it doesn’t translate well to drug screening. It’s too expensive, too low-throughput, and not designed for testing large libraries of compounds,” Androsavich explained.

DRUG-seq fills that gap. Developed by the Novartis Institutes for BioMedical Research, DRUG-seq (Digital RNA with pertUrbation of Genes) is a high-throughput bulk sequencing method. “We developed a massively parallelized, automated, low-cost next-generation sequencing-based method to profile whole transcriptome changes under chemical and genetic perturbations and successfully applied it in an industrial high-throughput screening environment,” its authors wrote when the method was first published in Nature Communications in 2018 (DOI: 10.1038/s41467-018-06500-x).

“It brings scalable transcriptomic profiling to chemical screens, where traditional scRNA-seq falls short,” Androsavich added.

DRUG-seq is also a mainstay of Ginkgo Datapoints’ work. The company processes more than 100 384-well plates a week, handling everything from cell culture and perturbation through sequencing and data analysis. It has invested more than $1 billion in automation infrastructure, which lets it generate data at a scale few organizations can match, and it is making that capability freely available to the research community through VCPI.
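The plate figure translates directly into weekly well capacity. The sketch below does that multiplication and then estimates how many weeks a compound set would take. It assumes every well is usable, and `wells_per_compound` is a hypothetical parameter: the article does not say how many doses, replicates or control wells each compound receives, so the one-well case is only a floor.

```python
import math

# Figures from the article: more than 100 plates a week, 384 wells per plate.
PLATES_PER_WEEK = 100
WELLS_PER_PLATE = 384


def wells_per_week(plates: int, wells_per_plate: int = WELLS_PER_PLATE) -> int:
    """Total wells processed per week, assuming every well is used."""
    return plates * wells_per_plate


def weeks_needed(compounds: int, wells_per_compound: int, weekly_wells: int) -> int:
    """Whole weeks needed to profile `compounds` at a given weekly capacity.

    `wells_per_compound` is hypothetical: the article does not state how many
    doses, replicates or control wells each compound uses.
    """
    return math.ceil(compounds * wells_per_compound / weekly_wells)


capacity = wells_per_week(PLATES_PER_WEEK)
print(capacity)                            # 38400
print(weeks_needed(100_000, 1, capacity))  # 3 (floor: one well per compound)
```

Raising `wells_per_compound` to reflect dose-response series or replicates scales the timeline linearly, which is why throughput matters so much for a 100,000-compound target.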

Community Consortium

Researchers and companies can contribute compounds for free testing, and the resulting data will be released publicly on a rolling basis under a Creative Commons Attribution license (CC BY 4.0). “We are releasing the data under Creative Commons license, so we expect the community will come up with really unique and wonderful ways to combine it with other data sources,” Androsavich said.

Ginkgo Datapoints is offering the community free RNA profiling for up to 100,000 compounds. Once they join, participants can send 50 µL of each compound at a 10 mM concentration. “You cover the shipping; we’ll handle all the rest,” the company wrote in the blog post.
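For contributors planning a submission, the requested 50 µL at 10 mM works out to a fixed molar amount, since 1 µL × 1 mM equals 1 nmol. The sketch below computes that amount and converts it to mass; the molecular weight of 400 g/mol is a hypothetical example, not a figure from the initiative, and the helper names are illustrative.

```python
def amount_nmol(volume_ul: float, concentration_mm: float) -> float:
    """Amount of compound in a sample, in nanomoles.

    1 µL x 1 mM = 1e-6 L x 1e-3 mol/L = 1e-9 mol = 1 nmol,
    so the product of the two numbers is already in nmol.
    """
    return volume_ul * concentration_mm


def mass_ug(nmol: float, molecular_weight_g_per_mol: float) -> float:
    """Mass in micrograms: nmol x g/mol gives ng; divide by 1,000 for µg."""
    return nmol * molecular_weight_g_per_mol / 1_000


sample = amount_nmol(50, 10)    # VCPI request: 50 µL at 10 mM
print(sample)                   # 500
print(mass_ug(sample, 400))     # 200.0 (hypothetical MW of 400 g/mol)
```

In other words, each submission is 0.5 µmol of compound, or a few hundred micrograms for a typical small molecule.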

Contributors also become part of the initiative’s community: they will be able to vote on prioritization, share models, take part in future competitions and join a community discussion forum. Active contributors can earn “super user” status, which brings early data access.

Androsavich explained that two outcomes matter most to the company: the scientific value of learning directly from the data, and the capabilities that can emerge when open access enables collaborative research.

“We want to learn from the data directly (new signatures, mechanistic insights, unexpected relationships), not just treat the dataset as training fodder. If this becomes a resource that biologists cite and build upon for years, we’ve done something meaningful,” he said. He also noted that “with open access, researchers everywhere can explore different architectures and training strategies. Competitions and benchmarks will help surface the best models and drive progress.”

Interested groups can learn more about the Virtual Cell Pharmacology Initiative at https://thevirtualcell.com. Data generation will begin immediately, with the first public data releases expected in early 2026.

“Cells are deep systems,” Androsavich said. “This is a starting point.”