By Allison Proffitt
July 17, 2013 | On Monday, researchers released the largest database of cancer-related genetic variations—the genomes of the 60 cancer cell lines represented by the NCI-60 list. The project was published online in Cancer Research (2013;73:4372-4382. Published OnlineFirst July 15, 2013).
The NCI-60 cell lines were derived from cancers from nine tissues of origin. “These tissues of origin were selected because they are hard to treat,” and include lung cancer, melanoma, ovarian cancer, renal cancer, colon cancer and others, Yves Pommier, chief of the Laboratory of Molecular Pharmacology at the National Cancer Institute, told Bio-IT World.
Pommier and his collaborators performed whole exome sequencing on the 60 cell lines using an Illumina Genome Analyzer IIx instrument and catalogued the variants they found.
“A small number of genes had been sequenced by Sanger several years ago… and when we compare, the matches are beautiful, the data match perfectly with the known few genes that had been sequenced by classical methods. But now we have the whole genomes.”
The cell lines have historically been used to screen chemicals and anticancer agents for possible development into cancer therapies. Over the years there have been thousands of compounds screened against the list, Pommier said, but having full sequences for the 60 cell lines expands the possibilities.
Cancer cell lines are certainly different from tumors, but the genetic results are still very similar, Pommier said. “When we compare the gene expression profiles of these cancer cell lines to their tissues of origin, to a good fraction they retain the tissue of origin signature. If you look at the melanoma cell line, the [cells] still look like melanoma,” he said.
“The main difference is the cell lines are homogeneous: they are clonal; they are developing all the same in the tissue culture flask. A real cancer is very heterogeneous. That has pros and cons. The advantage of the cell line [is that] because they are very homogeneous, it is easier to interpret the results. The gene mutations and expressions are the results of one population, where when you have a real tumor you have an average of everything.”
The dataset is ripe for query. The authors did some initial data mining, but are releasing the entire dataset to the research community. The data are available through CellMiner, NCI DTP and Ingenuity Systems.
“We’re well aware that there are so many questions to be asked,” Pommier said. “You can enter the data in different ways. You could be a drug-minded person and look at a specific drug and see how the drug activity matches different mutations and gene expression. You can be a gene person and you can enter from the gene side. There are 21,000 genes! How many possibilities are there?
“That’s why these things need to be publicly available, so people can use it as a platform to ask their own questions and find their own discoveries. It’s essential that this is publicly available, because it’s a tool.”
The dataset is a tool that should be accessible for all, the authors believe—not just bioinformaticians.
“One of the intents here is to enable people who don’t have a bioinformatics team to look at these data and look at biological and drug insight,” Pommier said. The website is for “regular people,” he contends. All of the data are delivered in Excel spreadsheets and can be stored and manipulated according to the user’s needs. Next steps for the team are to create tools to enable more mining by anyone.
On Monday, “in the early afternoon already 230 queries of the whole dataset had been put in, and that was just a few hours from the release of the paper,” Pommier said. “It’s extremely active.”