Plotting the Discovery Curve: 33 New Cancer-Causing Mutations

By Allison Proffitt

January 28, 2014 | The Broad Institute highlighted an expanded universe of cancer mutations in a Nature paper published last week (10.1038/nature12912). The work, part of The Cancer Genome Atlas (TCGA), expanded the list of genes known to play causal roles in tumors from 135 to 168 and modeled how that list may continue to grow.

Collaborators looked at 5,000 total tumor samples representing 21 different types of cancer. The samples were all primary tumor samples taken at diagnosis and fed into the TCGA pipeline. Mike Lawrence, a computational biologist at the Broad and first author of the paper, guessed that some samples could have been five years old or more, dating back to TCGA’s birth in 2008. Most samples were sequenced at the Broad Institute on Illumina GAII instruments, though some were sequenced at the Genome Institute at Washington University and at Baylor.

Lawrence highlighted that the study looked only at random mutations that had arisen in tumors or blood cancers. “In our study what we looked at were not mutations you were born with… they were random mutations that happened in an individual cell in the body at some point well after birth.” These aren’t the “genes for” cancer, he stressed. In comparing tumor genomes with their normal counterparts, Lawrence said, the group actually threw a lot of data away. Any point mutation that appeared in both normal and tumor cells was discarded.

What the team was left with was a list of 33 new mutations. The growth of the list by roughly 25% reinforced for Lawrence that “we’re still in the discovery phase.”

To try to gauge where in the discovery phase we are, Lawrence modeled the discovery curve. “We did this experiment where we artificially looked at smaller and smaller subsets of our final dataset… Given that we had 5,000 samples and we found all these genes, how many would we have found if we only had 1,000 samples or 2,000? We took sampling points all the way back to having no data and that allowed us to get a curve of what the progress of discovery looks like.”
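The down-sampling idea Lawrence describes can be illustrated with a toy simulation. The sketch below is not the paper's method: it invents a synthetic cohort, and it calls a gene "discovered" once it is mutated in at least `min_hits` samples, a hypothetical stand-in for a real statistical significance test. The function name, the gene frequencies, and every parameter are assumptions made purely for illustration.

```python
import random


def discovery_curve(gene_freqs, n_samples, subset_sizes, min_hits=5, seed=0):
    """Toy down-sampling: count genes 'discovered' at each subset size.

    gene_freqs maps a (hypothetical) gene name to the fraction of
    patients assumed to carry a mutation in that gene.
    """
    rng = random.Random(seed)
    # Simulate the full cohort: each sample is the set of genes mutated in it.
    cohort = [
        {gene for gene, freq in gene_freqs.items() if rng.random() < freq}
        for _ in range(n_samples)
    ]
    # Shuffle once and take prefixes, so every smaller subset is nested
    # inside the larger ones.
    rng.shuffle(cohort)
    curve = []
    for size in subset_sizes:
        counts = {}
        for mutated in cohort[:size]:
            for gene in mutated:
                counts[gene] = counts.get(gene, 0) + 1
        # Crude discovery rule (an assumption, not a significance test).
        discovered = sum(1 for c in counts.values() if c >= min_hits)
        curve.append((size, discovered))
    return curve


if __name__ == "__main__":
    # Hypothetical frequencies: a few common genes and a long rare tail.
    freqs = {f"G{i}": 0.2 / (i + 1) for i in range(50)}
    for size, found in discovery_curve(freqs, 5000, [0, 1000, 2000, 5000]):
        print(size, found)
```

Because the subsets are nested, the number of discovered genes can only rise as samples are added; whether the curve flattens depends on how many rare genes sit in the tail.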

And what does the future look like for cancer mutation discovery? “The curve hasn’t leveled out,” Lawrence said. “It’s increasing steadily.”

Lawrence doesn’t make any numerical predictions for a mutation cap. More data, of course, is needed to refine any such estimate, but not that much more data. The discovery curve predicts that creating a comprehensive catalog of cancer genes for scores of cancer types is feasible with as few as 100,000 patient samples, or about 2,000 samples for each of roughly 50 tumor types.

Then, Lawrence points out, there’s a question of coverage. p53, an infamous example, is involved in almost every type of cancer; Dis3 seems unique to multiple myeloma. Earlier studies sought genes involved in at least 20% of patients with any given tumor type. Now, Lawrence said, “we’re starting to saturate the 10% level.”

The goal, he says, should be around 2%.
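A rough way to see why lower frequency thresholds demand more samples is a simple binomial model. The sketch below is an illustration only, not the analysis used in the paper: it assumes each patient independently carries a mutation in a given gene with probability `freq`, and it treats `min_hits` mutated samples as a hypothetical bar for noticing the gene. Real analyses must also account for the background mutation rate.

```python
from math import comb


def detection_probability(n_samples, freq, min_hits):
    """P(at least min_hits of n_samples carry the mutation), binomial model."""
    # Probability of seeing fewer than min_hits mutated samples.
    p_fewer = sum(
        comb(n_samples, k) * freq**k * (1 - freq) ** (n_samples - k)
        for k in range(min_hits)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical cohort sizes and threshold, chosen only for illustration.
    for n in (100, 500, 2000):
        for freq in (0.20, 0.10, 0.02):
            print(n, freq, round(detection_probability(n, freq, 10), 3))
```

Under these assumptions, a gene mutated in 20% of patients clears the bar in a small cohort, while a 2% gene needs a cohort many times larger to be seen as often.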

“There are many important genes that are involved in only a few percent of patients: 2% or even 1% of patients. That may sound like you’re starting to get down to the bottom of the barrel, but that still represents millions of people!” Lawrence said.

Most patients have somewhere on the order of 10 “driver mutations,” Lawrence said. A few of those will be the well-known players: p53 and its ilk. But seven or eight of each patient’s “driver mutations” will be “these stragglers that will be late in discovery,” Lawrence believes. “The vast majority of driver mutations will be in the class of what we call intermediate frequency mutations, around 2% to 5%. It’s very important to understand these genes and get them into drug development pipelines.”