NIST Releases First Cancer Cell Line for Public Genomic Research
By Allison Proffitt
July 22, 2025 | The National Institute of Standards and Technology (NIST) has released the first cancer cell line in its Genome in a Bottle (GIAB) consortium. The new resource features a pancreatic cancer tumor cell line along with corresponding normal tissue samples, all obtained with explicit patient consent for public genomic data sharing. The team described the process last week in Scientific Data (DOI: 10.1038/s41597-025-05438-2).
The comprehensive dataset, characterized using 17 different genomic sequencing technologies, represents the most thoroughly analyzed cancer cell line to date and is now freely available to researchers worldwide. As with other GIAB initiatives, the work aims to accelerate cancer research by providing a standardized reference that scientists can use to benchmark their genomic analysis tools and methods.
"The goal of this initial paper is to basically present a really large amount of genomic data on a new pancreatic cancer cell line and the corresponding normal tissue, and make that available to the public so that anyone can do whatever they want with these data," explained Justin Zook, Co-Leader, Biomarker and Genomic Sciences Group at NIST and senior author on the paper.
This marks the first time the Genome in a Bottle program, which has been developing genomic reference materials for normal human genomes for years, has included a cancer sample.
Overcoming Consent Challenges
One of the project's most significant achievements was obtaining proper consent for public data sharing. Other cancer cell lines have been characterized by various research groups, most famously the cervical cancer cells taken from Henrietta Lacks in 1951 without her consent. HeLa cells have proven unique in their immortality, and in 2013, the National Institutes of Health announced a data use agreement for the genomic data from HeLa cells overseen by a working group including members of the Lacks family.
But there are other cancer cell lines that are also decades old, and NIH determined that researchers could publicly share genomic data from legacy cancer cell lines in the Cancer Cell Line Encyclopedia despite having no or limited consent, because most of these cell lines already had public genomic data available.
The GIAB consortium, however, requires explicit consent for genomic data sharing. NIST worked with collaborators at the Liss Laboratory at Massachusetts General Hospital to update their standard biobank consent forms to explicitly include permissions for genomic data sharing and the creation of an immortalized cell line.
NIST published portions of the MGH IRB-approved wording last week. For example:
“We plan to do genetic research on the DNA in your tissue sample. DNA is the material that makes up your genes. All living things are made of cells. Genes are the part of cells that contain the instructions which tell our bodies how to grow and work, and determine physical characteristics such as hair and eye color. Genes are passed from parent to child.”
and
“Your tissue sample may be used to create living tissue samples (including cell lines) that can be grown in the laboratory. This allows researchers to have an unlimited supply of your cells in the future without asking for more samples from you. The living tissue samples will be shared with the academic, non-profit research, and for-profit communities. Cancer researchers will use these living tissue samples to study better ways to detect and treat cancer. Companies may use the living tissue samples to develop products that are used to improve the detection and treatment of cancer to benefit patients. The living tissue samples will be sent with the random number corresponding to your original tissue and will not be sent with identifiable health information.”
The Liss Lab operates the Pancreatic Tumor Bank at MGH. With this updated language, one patient, a 61-year-old woman with pancreatic cancer, consented to both biobanking and genomic data sharing before her tumor and surrounding tissue were removed, ensuring the data could be made publicly available without restrictions.
"We want to have all the data public so that people can easily benchmark their results without having to go through approvals access to the data," Zook said.
Comprehensive Genomic Characterization
The research team employed 17 sequencing technologies to characterize the cell line, including whole genome sequencing, Hi-C, single cell sequencing, karyotyping, and specialized library preparation methods. The dataset includes genomic data from the tumor cells, normal pancreatic tissue, and bordering duodenal tissue, all from the same patient.
"We tried to characterize with as many of the technologies as we could," Zook explained. "Essentially all of the technologies where they're at the stage where they could make their data public are included."
NIST used sequencing from Illumina, Oxford Nanopore, Pacific Biosciences, Element Biosciences, and Ultima, and other characterization with tools and kits from Dovetail Genomics, Phase Genomics, Ultima, Bionano, Arima, and more. Zook welcomed any other groups who want to work with the data. “Anyone who’s interested in working with us on helping to analyze these data, we are happy to work with as we develop these new benchmarks,” he said.
Developing Cancer Benchmarks
NIST's ongoing work involves developing benchmarks similar to those the Genome in a Bottle program has created for normal genomes. These benchmarks will help researchers evaluate how well their tools detect the different types of genetic variants that occur in tumors, including small somatic variants, structural variants, and copy number variants.
"Our primary goals going forward with these data are to develop benchmarks, like what Genome in a Bottle has developed for normal cells, but in this case it will be somatic variant benchmarks," Zook said. The team has already released preliminary versions of these benchmarks to gather community feedback.
"Every tumor is different from every other tumor to some degree. But also, if you're able to measure the somatic variants in this particular tumor, it gives you more confidence in your ability to measure somatic variants in other tumors as well," Zook explained.
Challenges and Future Directions
The genomic characterization of the new NIST cell line was done at a much lower passage number than genomic analyses of older cell lines, meaning it had undergone fewer cell divisions in culture. It should therefore more closely resemble the original tumor, though that is still to be determined.
“Future work will be needed to understand the genomic stability of the tumor cell line,” the authors write in the paper. “Our initial data from different passages can be used to gain a preliminary understanding of stability, and we plan to add more data from different passages performed at different laboratories in the future.”
Developing tumor cell lines from patient samples also remains technically challenging, with most attempts failing to produce viable, continuously growing cell cultures. NIST has successfully developed cell lines from only two individuals so far, with a second pancreatic cancer cell line currently showing promise.
The team also attempted to immortalize normal cells from the first patient to create a paired normal cell line, but this effort was unsuccessful. They plan to try multiple approaches with the second patient, including traditional Epstein-Barr virus immortalization and alternative methods.
The work ahead is broad. Zook says, “The goal is to build an array of different tumor cell lines and normal cell lines from those same individuals where we’ve characterized these different types of somatic variants. But even just having this first tumor cell line that we’ve characterized deeply will start to help people to understand how well they detect these different types of variants that occur across tumor types.”