De Novo Human Genome Assembly Using Long Reads

November 10, 2014

Evan Eichler of the University of Washington and colleagues have published a paper in Nature detailing a new whole human genome assembly built with the long-read SMRT sequencing technology of Pacific Biosciences. The source of the genetic material was CHM1, a haploid human cell line currently under investigation by Eichler's team and researchers at Washington University in St. Louis as a potential alternate human reference, known as the "platinum genome." (See "The Hunt for a New Human Reference Genome.")

The new Nature paper builds on earlier work within PacBio on a de novo assembly of CHM1, applying further quality control to the finished product and examining areas where SMRT sequencing could provide a more complete picture of the human genome sequence than the current reference genome. Thanks to its long reads, which frequently exceed 10,000 base pairs, SMRT sequencing has been used to resolve complex structural variations and highly repetitive regions that are difficult to capture through other sequencing methods.

Eichler and colleagues report that their assembly extends into more than half of the gaps that still remain in the reference genome, in many cases closing them completely. The authors suggest that some of the new sequence occurs in exons or plays a regulatory role in gene expression. Comparison with existing assemblies also reveals over 26,000 structural variants in CHM1, a large majority of which had never before been detected in any genome. This level of structural variation is likely typical of human genomes, but it is rarely seen with this clarity because reference-guided mapping of short reads has difficulty resolving large insertions, deletions, and duplications.