Truth Challenge v2: Latest Challenge Results From Genome In A Bottle

By Allison Proffitt

September 2, 2020 | PrecisionFDA and the Genome in a Bottle (GIAB) consortium have announced the results of the Truth Challenge V2: Calling Variants from Short and Long Reads in Difficult-to-Map Regions such as segmental duplications and the Major Histocompatibility Complex (MHC).

The PrecisionFDA Truth Challenge ran from May 1 to June 15 and attracted 64 submissions from 20 teams. Participants used short reads from the Illumina NovaSeq, PacBio HiFi reads from the Sequel II System, and long reads from the Oxford Nanopore PromethION.

This is the second Truth Challenge from PrecisionFDA and the Genome in a Bottle (GIAB) consortium, explains Justin Zook, who co-leads the Genome in a Bottle Consortium’s work at the National Institute of Standards and Technology (NIST). The first version of the Truth Challenge happened about four years ago, and a lot has changed since then.

“That [2016] version [of the] benchmark relied primarily on short read sequencing technologies to form the benchmark,” Zook explained. “It excluded things that were really difficult for short reads to measure.” Those difficult regions included segmental duplications and the medically important, highly polymorphic region called the Major Histocompatibility Complex (MHC). “After the first Truth Challenge, the methods were really accurate within the regions that were accessible at that time with the short reads. They were getting well over 99% accuracy,” Zook said. “But that was for defined regions of the genome. Outside those regions there was still a lot of room for improvement.”

Over the past few years, GIAB has been building a new version of the benchmark that includes long-read sequencing technologies, “in particular the PacBio HiFi circular consensus sequencing as well as the 10x Genomics linked reads sequencing,” Zook explained.

Zook and colleagues posted details of the newest version of the benchmark—V4.2—to bioRxiv in July. “Our new benchmark adds more than 300,000 SNVs, 50,000 indels, and 16% new exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., PMS2),” the authors write. “We increase coverage of the GRCh38 assembly from 85% to 92%, while excluding problematic regions for benchmarking small variants (e.g., copy number variants and assembly errors) that should not have been in the previous version.” V4.2 still excludes highly similar segmental duplications, satellite DNA such as the centromeres, many mid-sized indels >15 bp, and structural variants and copy number variants.

Family of Data

The subjects of the Challenge and the GIAB benchmark are the genomes of an Ashkenazi trio from the Personal Genome Project: HG002, a son, and HG003 and HG004, his parents. The HG002 sample was the blinded sample in the 2016 challenge, but this time was the focus of methods development. Benchmarks for the HG003 and HG004 samples had been the starting point in Truth Challenge V1; this time these were the blinded samples.

For each of the three genomes, participants were provided three FASTQ file datasets: whole genome sequencing from Illumina, PacBio, and Oxford Nanopore (ONT). They could use one or more of the three datasets to create VCF files based on their mapping and alignment algorithms and variant callers.

“This time around we asked the participants to analyze all three of those individuals,” Zook said. “They could do a comparison to our new benchmark for the son but could not do it yet for the parents.” The new benchmarks, or truth sets, for HG003 and HG004, were released after the submission period closed.

The GIAB benchmarks or truth sets themselves are generated in ways similar to what the competition participants use. “We do use some of the same alignment and variant calling methods; we use some of the same data as well,” Zook said. But the GIAB team has more data available. Their sequencing data has more coverage than the Challenge participants’ 35x sequencing coverage, and the GIAB team can make use of other publicly available data that participants can’t use during the competition.

Winning Data, Winning Teams

Once the submission period closed, submissions were benchmarked following best practices from the Global Alliance for Genomics and Health (GA4GH), using GIAB’s new V4.2 HG003 and HG004 benchmark sets and the V2.0 GIAB genome stratifications. Performance was evaluated in three categories: accuracy in calling the MHC, accuracy in “difficult-to-map” regions and segmental duplications, and accuracy across all benchmark regions. PrecisionFDA and NIST worked together to calculate F1 scores (the harmonic mean of precision and recall) for SNVs and indels together, then averaged the results for HG003 and HG004.
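For readers unfamiliar with the metric, the scoring described above can be sketched roughly as follows. This is an illustrative sketch only, not the organizers' pipeline: the `Counts` class, the function names, and the example numbers are all hypothetical, and it assumes that SNV and indel calls are pooled into a single set of true-positive, false-positive, and false-negative counts per sample before the F1 score is computed.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical pooled SNV + indel counts for one sample in one region set."""
    tp: int  # variant calls that match the benchmark
    fp: int  # variant calls absent from the benchmark
    fn: int  # benchmark variants that were not called


def f1_score(c: Counts) -> float:
    """Return the harmonic mean of precision and recall."""
    if c.tp == 0:
        return 0.0  # no correct calls, so precision and recall are both 0 or undefined
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return 2 * precision * recall / (precision + recall)


def challenge_score(hg003: Counts, hg004: Counts) -> float:
    """Average the per-sample F1 scores of the two blinded samples."""
    return (f1_score(hg003) + f1_score(hg004)) / 2


if __name__ == "__main__":
    # Made-up counts, chosen only to keep the arithmetic easy to follow.
    hg003 = Counts(tp=90, fp=10, fn=10)
    hg004 = Counts(tp=80, fp=20, fn=0)
    print(f"HG003 F1: {f1_score(hg003):.4f}")
    print(f"HG004 F1: {f1_score(hg004):.4f}")
    print(f"Averaged score: {challenge_score(hg003, hg004):.4f}")
```

Because F1 is a harmonic mean, a submission cannot score well by maximizing recall at the expense of precision, or the reverse; both have to be high.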

A public table with detailed performance metrics for all submissions is available online. The table includes the combined SNV and indel F1 metrics used for the awards, as well as other metrics like precision and recall, and metrics stratified by SNV and indel. A description of the table values is available. Participants who did not wish to be identified are listed under a unique five-letter identifier.

Participants primarily used Illumina data (24 submissions) and PacBio data (17 submissions), with 20 additional submissions using multiple datasets. Top performers for each technology (Illumina, PacBio HiFi, ONT, or multi-technology) were named for each variant calling category: all benchmark regions, difficult-to-map regions/segmental duplications, and MHC. Winners included teams from Sentieon, UCSC CGL and Google Health, the DRAGEN team at Illumina, Seven Bridges, Roche Sequencing Solutions, the Genomics Team in Google Health, and Wang Genomics Lab.

No single group was the top performer in every category for a given technology, and the F1 scores for the multi-technology entries were consistently higher than those for any single sequencing technology alone, particularly for the MHC region.

Some teams worked with only one type of data. For example, the DRAGEN team at Illumina focused on Illumina data and won two of the available best performance awards: for difficult-to-map regions and all benchmark regions. Seven Bridges won the third performance award for Illumina-only data. Other groups submitted entries with both single-technology and multi-technology approaches. Sentieon won more best performance awards than any other group, including for difficult-to-map regions and MHC using only PacBio HiFi reads, and also for MHC and all benchmark regions (as part of a three-way tie) using a combination of Illumina, PacBio HiFi, and ONT data.

Challenge Culture

Sentieon won the most individual performance awards of any group in the Truth Challenge V2, and also won the PrecisionFDA Consistency Challenge in 2016 where they were awarded the top overall performance prize as well as the highest reproducibility award. Brendan Gallagher, Sentieon’s Business Developer Director, served on both the 2016 and 2020 teams.

“In genomics or genetics, individual samples don’t have a direct method to measure truth,” Gallagher pointed out. Sequencing results vary based on both the individual sample and the platform. That’s why these challenges are so important, he explained.

“You need the truth standard to be able to evaluate the tools and the total workflows, otherwise you don’t know what is accurate. The work that Justin does, and the team at NIST does, is really important,” Gallagher said. “It’s a really important resource for the entire world to believe in their results and believe in the results they are producing for a patient.”

Gallagher said Sentieon participates in challenges both to test and validate its work internally and to broadcast its tools’ strengths to customers. The company is lean, he said, with only 13 employees. The PrecisionFDA challenges give the company the opportunity to further improve its tools thanks to the truth sets made available.

“The continued development and effort put in by the NIST team absolutely helps us improve our product and also gives our customers confidence that the product they’re using is world class,” he said. “Our goal is to provide people with easy-to-use, fast and efficient software that’s also accurate. Grading accuracy is completely reliant, basically, on NIST and the Genome In A Bottle work.”

Progress Made, Next Steps

Zook is pleased with how the field is progressing and the advances that the Truth Challenge highlights. “There has been a lot of improvement in variant calling in some of these more difficult regions of the genome,” he said, crediting both new bioinformatics and new chemistries.

“For the Illumina data, the DRAGEN team at Illumina developed some new methods for accessing difficult-to-map regions of the genome, and then the Seven Bridges Genomics team developed [a] graph-based method that makes better variant call[s] in the MHC,” he said. “All of those methods were not available at the time of the initial Truth Challenge; some of them were even refined or first developed as part of this challenge.”

“On the sequencing technologies side, now you can—particularly in these difficult regions—get much better variant calls from the long-read sequencing technologies like PacBio and Oxford Nanopore,” Zook added. “That’s thanks to development on the technology side from the sequencing… as well as a lot of development in machine learning and AI-based methods to analyze those data to develop really highly-performing variant calling methods relatively quickly after these new technologies are developed.”

Zook wouldn’t commit to any timeline for the next PrecisionFDA Challenge, but he did give some hints on where Genome in a Bottle has been focusing. While the Truth Challenge V2 addressed some difficult regions of the genome, Zook pointed out that there are harder regions still to navigate and there is work yet to be done.

Benchmarking larger and more complex variants is one area of focus. “Both of these challenges were targeting small variants: SNPs and small insertions and deletions that were less than 50 base pairs in size. We haven’t had any challenges for larger variants at this point,” Zook said.

He also said the GIAB Consortium is interested in using de novo assembly methods to form benchmarks—“both how you benchmark de novo assemblies, as well as how you can form variant benchmarks from the de novo assemblies.” Within the next few months he hopes to have an assembly-based benchmark that will cover some of the difficult, medically-relevant genes that are not currently covered in benchmarks. A whole genome de novo assembly-based benchmark is “more like a year away.”