GCAT Team Publishes Methods, Results from Benchmark Tests of Aligners and Variant Callers

February 26, 2015

By Bio-IT World Staff

GCAT, the Genome Comparison & Analytic Testing platform for benchmarking the performance of basic informatics tools, now has a formal home in the scientific literature with the publication of a paper in Nature Communications describing the platform, its purpose and certain major results. Authors Gareth Highnam, Jason J. Wang, Dean Kusler, Nir Leibovich and David Mittelman hail from Arpeggi, the Texas-based sequencing analytics company that first unveiled GCAT as a public service to the bioinformatics community at the Bio-IT World Conference & Expo in 2013. (See “Arpeggi’s Harmonious Approach to NGS Data Analysis.”) They are joined by Justin Zook, who plays a prominent role in the Genome in a Bottle Consortium, and by Vinaya Vijayan of Virginia Tech.

GCAT emerged as a response to the proliferation of aligners and variant callers for interpreting raw next-generation sequencing data. Aligners map short DNA fragments against known reference genomes to place sequencing data in context, while variant callers pinpoint where the genetic code differs from individual to individual. Both are foundational to all studies using next-generation sequencing. However, it has been difficult to evaluate how well these tools perform relative to one another, both because few sample genomes are characterized well enough to show the absolute accuracy of any one tool, and because different sequencing experiments are not directly comparable.

The GCAT platform, freely available online at www.bioplanet.com/gcat, addresses these problems by offering highly curated reference genomes, along with sets of both real and simulated short reads taken from those genomes, to test how multiple informatics pipelines perform when fed exactly the same data. As the authors write in their Nature Communications paper, “GCAT functions as a ‘data playground’, in which users can compare tools and then dive deep into the comparison to narrow in on benefits and limitations of various tools.” Reports generated in these test runs can then be made publicly visible to all visitors to the GCAT site.
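To give a rough sense of what it means to score several pipelines against the same data, the Python sketch below compares two sets of variant calls with a shared truth set and reports precision and recall for each. It is an illustration only, not GCAT's actual scoring method: the `score_calls` function, the variant tuples and the pipeline names are all hypothetical, and a real benchmark would also have to handle details such as differing variant representations and genotype matching.

```python
from typing import Dict, NamedTuple, Set, Tuple

# Hypothetical minimal variant record: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Score(NamedTuple):
    true_pos: int
    false_pos: int
    false_neg: int
    precision: float
    recall: float


def score_calls(calls: Set[Variant], truth: Set[Variant]) -> Score:
    """Score one pipeline's calls against a curated truth set (exact-match only)."""
    true_pos = len(calls & truth)   # called and present in the truth set
    false_pos = len(calls - truth)  # called but absent from the truth set
    false_neg = len(truth - calls)  # present in the truth set but missed
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return Score(true_pos, false_pos, false_neg, precision, recall)


# Made-up truth set standing in for a well-characterized sample genome.
truth: Set[Variant] = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 900, "T", "C"),
}

# Made-up call sets from two hypothetical pipelines run on the same reads.
pipelines: Dict[str, Set[Variant]] = {
    "pipeline_a": {
        ("chr1", 100, "A", "G"),
        ("chr1", 250, "C", "T"),
        ("chr2", 75, "G", "A"),
        ("chr3", 10, "A", "T"),  # not in the truth set: a false positive
    },
    "pipeline_b": {
        ("chr1", 100, "A", "G"),
        ("chr2", 900, "T", "C"),
    },
}

results = {name: score_calls(calls, truth) for name, calls in pipelines.items()}

for name, score in results.items():
    print(f"{name}: precision={score.precision:.2f} recall={score.recall:.2f}")
```

Because both pipelines are judged against the same truth set, their scores are directly comparable. In this toy data, one pipeline finds more of the true variants but also makes a false call, while the other makes no false calls but misses half of the truth set.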

While GCAT was originally envisioned as an internal testing ground for Arpeggi’s custom tools, it has become a widely shared resource and has continued to add new reports and reference sets, even after Arpeggi was acquired by Gene by Gene and members of the original team moved on. The site now hosts thousands of benchmark reports, helping bioinformaticians choose the best pipelines for their experiments. Meanwhile, the GCAT Project has joined forces with the Global Alliance for Genomics and Health and plans to add new capabilities for visualizing the results of benchmarking tests, in an effort to publicize universal standards for genomics toolkits.