By Bio-IT World Staff
August 18, 2014 | A team from Cold Spring Harbor Laboratory has released an algorithm, called Scalpel, for finding insertions and deletions in next-generation sequencing data sets. Scalpel, which is open source and available for download on SourceForge, outperformed the popular tools GATK HaplotypeCaller and SOAPindel in test runs on both simulated and real whole human exomes.
Like other indel callers, Scalpel works by performing de novo assembly of regions of interest, so that misalignment to the reference genome cannot obscure the presence of an insertion or deletion. Scalpel's innovation is to repeatedly check its assembly before comparing it to the reference genome, to account for simple sequence repeats that are a regular source of error in indel calling. When Scalpel assembles an exon, it collects reads that map to that exon (including partial matches), splits them into k-mers, and creates a de Bruijn graph to span the exon. If it detects repeats in the graph, however, it iteratively increases the size of the k-mers by one base until the repeats are eliminated. This ensures that the final assembly of the exon is highly accurate while minimizing compute time.
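The k-mer loop described above can be illustrated with a short Python sketch. This is not Scalpel's code: the function names (`build_de_bruijn`, `has_cycle`, `choose_k`) are hypothetical, and the sketch makes the simplifying assumption that a "repeat" is any cycle in the de Bruijn graph, which may differ from how Scalpel itself defines and detects repeats. It shows only the general idea of raising k by one until the graph is repeat-free.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer
    found in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def has_cycle(graph):
    """Return True if the directed graph contains a cycle.
    Assumption for this sketch: a cycle stands in for a repeat."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = defaultdict(int)
    for start in list(graph):
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if color[child] == GRAY:
                    return True  # back edge: the path revisits a node
                if color[child] == WHITE:
                    color[child] = GRAY
                    stack.append((child, iter(graph.get(child, ()))))
                    break
            else:
                # All successors explored; this node is done.
                color[node] = BLACK
                stack.pop()
    return False


def choose_k(reads, k_start, k_max):
    """Increase k by one base at a time until the graph has no cycle.
    Returns (k, graph), or (None, None) if no k up to k_max works."""
    for k in range(k_start, k_max + 1):
        graph = build_de_bruijn(reads, k)
        if not has_cycle(graph):
            return k, graph
    return None, None


# Toy reads covering a region in which "ACG" occurs twice.
reads = ["TTACGGAC", "ACGGACGCA"]
k, graph = choose_k(reads, k_start=3, k_max=7)
print("smallest repeat-free k:", k)
```

In this toy example the repeated "ACG" creates a cycle at k = 3 and k = 4, and the loop settles on k = 5. Starting with a small k and growing it only when needed is one way to read the article's point about accuracy versus compute time: smaller k-mers need less read overlap to connect the graph, while larger ones are required to resolve repeats.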
The Cold Spring Harbor team's validation of Scalpel, published over the weekend in Nature Methods, compares Scalpel's performance on a real whole exome against that of HaplotypeCaller and SOAPindel. The donor is an individual with serious neurological disorders, which may be linked to a high incidence of indels. One thousand indels from this individual's exome, called by one or more of the informatics pipelines, were selected for focused resequencing. This resequencing revealed a 77% true positive rate for Scalpel calls, dramatically better than the rates for either of the competing tools; Scalpel performed especially well with indels longer than five base pairs, a traditional weak point for indel callers.
Finally, the authors demonstrate Scalpel's use on a large set of genetic data from nearly 600 families who donated samples to the Simons Simplex Collection, a project of the Simons Foundation Autism Research Initiative. Scalpel found a very high enrichment for indels in children affected by autism, compared with their unaffected siblings, a pattern that persisted even after excluding common variants.