DNAnexus Reanalyzes Cancer Genome Atlas Data

December 9, 2016

By Bio-IT World Staff

December 9, 2016 | DNAnexus announced today that it has performed uniform reanalysis and mutation calling on the world’s largest pan-cancer dataset, encompassing 10,487 patients across 33 cancer types from The Cancer Genome Atlas (TCGA). The mutation data has undergone additional quality control and filtering and is available as an open access dataset for download at the NCI Genomic Data Commons and on Synapse. The pipelines used are also fully available to researchers who wish to reproduce this TCGA mutation discovery via DNAnexus or a GitHub repository.

“In the past, mutation calling for TCGA samples was primarily done for individual tumor types, with projects using different mutation callers or different versions of the callers, meaning the data wasn’t uniform,” said Carolyn Hutter, PhD, Program Director, Division of Genomic Medicine at the NHGRI, in a statement. “We now believe the best way to do analysis is to have a uniform set of calls generated by multiple mutation callers, with quality control and filtering, across multiple cancer types. That’s why the TCGA team decided to go back and recall the over 10,000 exomes in TCGA and produce this multi-caller somatic mutation dataset.”

The compute resources needed for mutation calling across cancer types were not in place at TCGA member institutes. Key requirements for the project included patient data security, a scalable environment that could handle tens of thousands of exomes, and reproducibility of results. The DNAnexus Platform met these requirements.

“This was a massive undertaking, and a prime example of the benefits of the DNAnexus Platform,” said Richard Daly, CEO of DNAnexus, in the same statement. “Over a four-week period approximately 1.8 million core-hours of computational time were used to process 400 TB of data to yield reproducible results. Now consistent data is available to researchers worldwide to interpret genomic features, oncogenic signatures, and potential treatment targets shared across multiple cancer types.”

“Realigning TCGA data with a single methodology across new standardized mutation callers will make the tumor data much more relevant to the community. DNAnexus created uniform, and sensitive, analytical treatment through version-controlled analyses and tools, that would have been challenging to replicate at any single facility in a reasonable time frame,” said David Wheeler, PhD, Professor, Department of Molecular and Human Genetics at Baylor College of Medicine. “With this standardized set of mutation calls obtained by several callers, we’ll be able to identify genetic alterations contributing to cancer that are shared between tumors independent of the tissue-of-origin. We are optimistic that having access to such information will spur advancement in precision medicine.”

Researchers can now access the TCGA pipelines through the DNAnexus Platform as well as a GitHub repository. DNAnexus works to ensure that the mechanisms for handling data access requests and vending data to approved requestors meet the security standards for dbGaP and TCGA data in the cloud.