Updates from the Genome in a Bottle Consortium

August 25, 2014

By Bio-IT World Staff 

At last week’s Next Generation Dx Summit in Washington, DC, Justin Zook of the National Institute of Standards and Technology (NIST) discussed the progress and future goals of the Genome in a Bottle project, which aims to create near-perfectly characterized human genome sequences for use as reference standards. Members of the Genome in a Bottle Consortium have chosen several human DNA samples to sequence repeatedly using multiple technologies and bioinformatics pipelines, aiming to capture variants ranging from single-nucleotide polymorphisms (SNPs) to massive structural changes with very high accuracy. NIST could then distribute the same samples to outside organizations for resequencing, to help identify genomic regions or types of variants that tend to be called incorrectly.

“There is not really any widely-accepted set of metrics to characterize your variant calls,” explained Zook at the summit, “so we’re developing standards to address this… We’re also developing tools and methods to use these reference materials.” He stressed the need to smooth out inconsistencies in current sequencing technologies, which are not only a barrier to confidence in genetic research, but also a major obstacle to moving next-generation sequencing into clinical practice, a process for which the FDA demands stringent standards of analytical validity. (The FDA is providing a large share of the Consortium’s funding, in part to help clarify the accuracy of NGS-based diagnostics.)

“Different sequencing technologies give different answers,” said Zook. “If you compare three different platforms across a whole genome, you wind up with about 80% of the SNPs in the intersection, and hundreds of thousands of SNPs specific to one technology or two technologies. And this is also true for bioinformatics programs, even if you run them on the same sequencing data.” This roughly 20% disagreement is a big concern for geneticists, especially given that SNPs are the easiest type of variant to detect.
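
To make the arithmetic behind a figure like "about 80% in the intersection" concrete, here is a minimal Python sketch. It assumes the percentage is taken over the union of all SNPs called by any platform, which the talk as reported does not specify. The platform names and toy variants are hypothetical and are not Genome in a Bottle data.

```python
def concordance(call_sets):
    """Summarize agreement between several SNP call sets.

    call_sets maps a platform name to a set of (chrom, pos, ref, alt) tuples.
    Returns the fraction of all called SNPs that every platform reported,
    plus the set of SNPs that at least one platform did not report.
    Assumption: the denominator is the union of all call sets.
    """
    sets = list(call_sets.values())
    union = set().union(*sets)
    if not union:
        return 0.0, set()
    shared = set.intersection(*sets)
    discordant = union - shared
    return len(shared) / len(union), discordant


# Hypothetical toy call sets: five distinct SNPs, four seen by all platforms.
common = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 900, "T", "C"),
}
calls = {
    "platform_a": common | {("chr3", 77, "A", "C")},  # one platform-specific call
    "platform_b": set(common),
    "platform_c": set(common),
}

fraction, discordant = concordance(calls)
print(f"{fraction:.0%} of SNPs called by all platforms; {len(discordant)} discordant")
```

The same function works for comparing bioinformatics pipelines run on identical sequencing data: each pipeline's output simply becomes one entry in `call_sets`.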

Zook’s address came less than a week after the Genome in a Bottle Consortium held a public workshop at NIST headquarters to hear user feedback and consider new initiatives. Zook reviewed the Consortium’s efforts around its three major reference materials (the cell line NA 12878, plus an Ashkenazi trio and an Asian trio supplied by the Personal Genome Project) as well as the possibility of expanding into new materials. Workshop participants recommended adding an African American trio and a Hispanic trio, additions that Zook said the Consortium favors, although these resources are not currently available through the Personal Genome Project. Zook also suggested that a large family, including five or six children, would be a valuable resource the Consortium may pursue in the future.

NA 12878 is currently the best-characterized reference in the Genome in a Bottle portfolio, with SNP and indel calls for several regions available for public use. The Consortium’s next goal is to map large structural variants in regions of this genome, with releases to begin by the end of the year. Sequencing of all three family members in the Ashkenazi trio has also begun, and short-read data will be made available in late 2014 or early 2015. Zook also mentioned that this trio will provide a test bed for emerging sequencing technologies. “We’re also doing 100x PacBio sequencing on this genome,” he said. “We anticipate that this should be really useful, especially as we get into the harder parts of the genome and structural variants.”

The Asian trio will go through a similar sequencing process in the coming months, on a slightly later schedule than the Ashkenazi trio.

Because the Consortium has already released genotyping data for SNPs and small indels in NA 12878, Zook was able to highlight ways that Genome in a Bottle references are already being used in the wider community. Among other achievements, Genome in a Bottle data played a role in the first FDA approval of a next-generation sequencer, the MiSeqDx, for clinical use last November. Mount Sinai School of Medicine has also used this data for internal validation of its clinical sequencing programs.

The Consortium has produced 8400 vials of NA 12878 DNA for distribution to users, and the cell line itself can be ordered through the Coriell Institute. However, Zook stressed that the Genome in a Bottle calls on this sample remain a work in progress. “Our high-confidence regions right now are probably the easier regions of the genome to sequence,” he said. “We’re excluding a lot of the places where none of the next-gen sequencing technologies do very well, and if you’re trying to make calls in those regions, they may or may not be correct.”
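
A natural way to respect that caveat when benchmarking is to compare calls against the reference only inside the high-confidence regions. The sketch below is a hypothetical illustration and not the Consortium's own tooling: it assumes sorted, non-overlapping, half-open intervals on a single chromosome, and all coordinates are made up.

```python
from bisect import bisect_right


def in_confident_region(pos, starts, ends):
    """Return True if pos falls inside one of the half-open intervals.

    starts and ends are parallel lists describing [start, end) intervals,
    sorted by start and non-overlapping (an assumption of this sketch).
    """
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


# Hypothetical high-confidence intervals and variant positions.
regions = [(100, 200), (500, 600)]
starts = [start for start, _ in regions]
ends = [end for _, end in regions]

positions = [50, 150, 199, 200, 550]
kept = [p for p in positions if in_confident_region(p, starts, ends)]
print(kept)  # [150, 199, 550]
```

Calls that fall outside the intervals are set aside rather than counted as errors, since the reference makes no claim about those positions.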

More detail on the Genome in a Bottle Consortium’s validation process, which integrates data from different sequencing pipelines to make high-confidence calls, can be found in an open access Nature Biotechnology paper published earlier this year. The Consortium plans to hold new public workshops twice a year, in January and August, to continue receiving input from the scientific community.