Decontaminating Sequencing Samples

By Bio-IT World Staff

December 12, 2013 | When trying to pin down the mutations that underlie a cancer case, sequencing a tumor sample alone is not enough. That sample has to be compared against the normal genome of the patient, and there are inherent risks in this process. “When DNA samples are collected and worked up prior to sequencing, or even during sequencing, there’s a risk that those samples will be cross-contaminated,” says Trevor Heritage, VP of Corporate Development and Strategy at Appistry. “Some of the variants that you find may in fact not be truly related to the origins of the cancer.”

To address the problem of cross-contamination, Appistry has released a new tool, ContEst, embedded in the Cancer Genome Analysis (CGA) Suite. ContEst performs a probabilistic analysis of a sample's genotype to estimate that sample's overall level of contamination. This estimate can be used to filter out rare variants that are more likely to result from cross-contamination than to exist in the target genome – a crucial layer of validation in cancer genomics, which is especially reliant on understanding low-frequency variants.
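As a rough illustration of what a probabilistic contamination estimate can look like, the Python sketch below fits a single contamination fraction by maximum likelihood. It is a simplified toy model, not ContEst's actual algorithm: it assumes we only look at sites where the sample is homozygous for the alternate allele, that a contaminating read carries the reference allele with probability one minus the population alternate-allele frequency, and that sequencing errors are symmetric. The function name, the input layout, and the default error rate are all hypothetical.

```python
import math


def estimate_contamination(sites, error_rate=0.001, grid_size=1001):
    """Grid-search maximum-likelihood estimate of the contamination fraction.

    sites: iterable of (ref_reads, alt_reads, pop_alt_freq) tuples, one per
           site where the sample is assumed to be homozygous-alternate.
    Returns the fraction c in [0, 0.5] that maximizes the likelihood.
    """
    sites = list(sites)
    best_c, best_ll = 0.0, -math.inf
    for i in range(grid_size):
        c = 0.5 * i / (grid_size - 1)  # candidate contamination fraction
        ll = 0.0
        for ref_reads, alt_reads, pop_alt_freq in sites:
            # A read truly carries the reference allele only if it comes
            # from the contaminant AND the contaminant has that allele.
            p_ref_true = c * (1.0 - pop_alt_freq)
            # Fold in symmetric sequencing error (assumed model).
            p_ref = p_ref_true * (1 - error_rate) + (1 - p_ref_true) * error_rate
            ll += ref_reads * math.log(p_ref) + alt_reads * math.log(1 - p_ref)
        if ll > best_ll:
            best_c, best_ll = c, ll
    return best_c


# Made-up counts: 5 reference reads out of 100 at each of 20 sites,
# with a population alternate-allele frequency of 0.5.
example_sites = [(5, 95, 0.5)] * 20
print(estimate_contamination(example_sites))
```

With these invented counts the estimate lands just under 10 percent, because only about half of the contaminating reads are expected to show the reference allele.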

Like the other tools in the CGA Suite, ContEst was originally developed by the Broad Institute of MIT and Harvard for internal research, and later adapted by Appistry for commercial use. This is part of a larger partnership that also produced the Genome Analysis Toolkit (GATK) for general-use variant calling and annotation. The collaboration aims to put state-of-the-art bioinformatics tools into the hands of users without data science expertise, especially in the area of patient care. “Genomics is heading down the road towards more general clinical use,” says Heritage. “The Broad as an organization is not focused on deployment to those environments… Appistry is an ideal partner for the Broad in that respect.” Appistry provides both pre-constructed workflows to ease user management of Broad-developed tools, and the security and clinical validation needed to introduce those tools to medical practice.

Presently, most software that analyzes cancer genomes asks researchers to input an educated guess as to levels of contamination in their samples. These crude estimates can result in true variants being withheld from reporting, or in inaccurate calls being included in the analysis. ContEst introduces greater precision to this process, acting as a go-between from the raw variant reads of GATK to MuTect, another Broad-developed tool in the CGA Suite that identifies rare point mutations.
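To make the downstream use of a contamination estimate concrete, here is a minimal Python sketch of one way a caller could test a candidate variant against it. This is not MuTect's actual filter. It assumes the worst case, in which every contaminating read carries the variant allele, and asks how likely it is to see at least the observed number of variant reads from contamination alone. The function names and the `alpha` threshold are hypothetical.

```python
from math import comb


def contamination_tail_prob(alt_reads, depth, contamination):
    """P(X >= alt_reads) for X ~ Binomial(depth, contamination).

    Worst-case assumption: each read is a contaminating, variant-carrying
    read with probability equal to the contamination fraction.
    """
    return sum(
        comb(depth, k) * contamination**k * (1 - contamination) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )


def keep_variant(alt_reads, depth, contamination, alpha=1e-3):
    """Keep the call only if contamination alone is an unlikely explanation."""
    return contamination_tail_prob(alt_reads, depth, contamination) < alpha


# Illustrative numbers only: 2% estimated contamination, 100x depth.
print(keep_variant(20, 100, 0.02))  # many more variant reads than expected
print(keep_variant(3, 100, 0.02))   # close to what contamination would give
```

The point of the sketch is the trade-off described above: an estimate that is too high discards true low-frequency variants, while one that is too low lets contaminant alleles through.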

Although the CGA Suite is marketed to researchers and clinical labs in the cancer space, ContEst can estimate contamination levels in any genetic sample; Heritage suggests forensics as another area where small levels of contamination can make a serious difference. The Broad Institute and Appistry continue to work together on developing ready-to-use genomic workflows for other research scenarios, including population studies and studies of genetic disease inheritance. Future releases will follow the model of the CGA Suite, providing a pre-assembled package of tools designed for a specific use case, which more expert users can then repurpose. Says Heritage, “We’ll continue to follow that same blueprint where we provide something that’s fully empowering.”