Virtual Cell Challenge Announces Prize for AI Models of Cellular Response

By Bio-IT World Staff

July 3, 2025 | In a commentary published last week in Cell, researchers from the Arc Institute introduced the “Virtual Cell Challenge,” a public competition with a grand prize worth $100,000 for the machine learning model that best predicts how cells will respond to genetic perturbations. Arc hopes to incentivize progress in artificial intelligence and biology by accelerating the creation of high-quality datasets and sparking a conversation about rigorous standards for assessing how well AI models simulate cellular behavior.

“CASP competitions transformed protein structure prediction over 25 years, ultimately enabling breakthroughs like AlphaFold,” said Arc Co-Founder and Core Investigator Patrick Hsu in a statement. “We believe Arc can use the same approach to accelerate progress toward comprehensive virtual cells that could fundamentally change how we study biology and identify targets to better treat complex diseases.”

The Arc Institute’s intent is for the Virtual Cell Challenge to be a recurring and open benchmark competition that will provide an evaluation framework, purpose-built datasets, and a venue for accelerating model development, the authors wrote in the Cell commentary (DOI: 10.1016/j.cell.2025.06.008). Arc plans to repeat the Virtual Cell Challenge each year with new single-cell transcriptomics datasets comprising different cell types, and with stiffer challenges requiring entrants’ models to predict the effects of more complicated biological changes.

Year One

The Virtual Cell Challenge is sponsored by NVIDIA, 10x Genomics, and Ultima Genomics. For the inaugural challenge, Arc has generated a new single-cell transcriptomics dataset of 300,000 H1 human embryonic stem cells (H1 hESCs) with 300 genetic perturbations chosen to span a broad range of phenotypic responses. The dataset was optimized for the reproducibility of observed effects.

“This effort is intended to create a level playing field, drive community engagement, and accelerate progress by providing high-quality benchmark datasets, a public leaderboard, and a mechanism for reproducible and fair comparison,” the authors wrote in the commentary.

The dataset will be released in segments over the course of the competition for fine-tuning, validation, and testing. Competitors are invited to train models on gene expression data for over half a billion cells included in the Arc Virtual Cell Atlas, as well as on other public datasets. The challenge will evaluate how well models can predict changes in gene activity when individual genes are silenced. Competitors will submit predictions of these effects during the middle phase of the competition, with their interim performance shared on a live leaderboard, before a final assessment and a public announcement of the winners.

“A team’s success will depend on their model’s ability to generalize to a new cell context. This is a difficult task, so we have structured this first competition as a few-shot learning challenge by releasing a training subset of H1 hESCs,” said Dave Burke, Arc’s Chief Technology Officer, in the same statement. “The ability for models to generalize to new cell contexts is ultimately key to unlocking virtual cells for drug discovery and we hope this challenge will ultimately help accelerate progress towards that goal.”

In developing the competition’s evaluation framework, Arc also aims to provide consistent benchmarks for virtual cell model performance for the field. While rapid advances in single-cell technologies and machine learning have created new opportunities to model cellular behavior, researchers have struggled to compare different approaches due to inconsistent evaluation methods and varying dataset quality.

Ready, Set, Model

Registration for the Challenge opened last week at virtualcellchallenge.org, with teams receiving training data at sign-up. Final rankings will be determined solely by model performance on the final test set, which will be released in late October, one week before the final submission deadline. Winners will be announced in December.

Individual contributors as well as teams from academic institutions, biotechnology companies, and independent research organizations are eligible to participate. Competitors with experience in computational modeling or single-cell biology are particularly encouraged to enter. The teams behind the top three models will receive prizes valued at $100,000, $50,000, and $25,000, respectively, combining cash awards and NVIDIA DGX Cloud credits.

“Virtual cells are poised to become foundational tools for biology, and to ensure that they reach their potential, we need clear and rigorous evaluations,” the authors wrote in the Cell commentary. “The Virtual Cell Challenge aims to provide just that: a fair, open challenge to surface the best models, clarify the state of the art, and engage the community. We invite the community to engage with this first iteration and help better define the contours of predictive cellular modeling as a scientific discipline.”