Illumina Releases Long-Read Apps and Library Prep Kits

By Bio-IT World Staff

June 27, 2014 | Illumina is looking to expand its footprint in long-read sequencing applications, with big announcements this week about its TruSeq synthetic long-read technology. The ability to sequence DNA in fragments thousands of bases long has opened up niche applications for competitors like PacBio, whose RS II sequencer now routinely delivers reads 10,000 base pairs long. Oxford Nanopore, whose MinION sequencer is now in early access trials, is making a bid for the same applications — nearly the only field in gene sequencing not dominated by Illumina, whose MiSeq and HiSeq instruments are rapid, highly accurate, and relatively cheap to run, but split samples into reads no more than a few hundred base pairs long. (For the latest on how long-read technology is being used, see "PacBio Users Share New Tools and Applications at Meeting in Baltimore.")

Illumina took the first steps toward offering its own long-read service in December 2012, when it acquired a tiny startup called Moleculo. (See "Moleculo Man: Mickey Kertesz on Illumina's Sub-Assembly Acquisition.") Moleculo's process, which Illumina has since rebranded as TruSeq, uses a combination of clever library prep and computational techniques to turn the short reads from HiSeq instruments into synthetic long reads. A DNA sample is first cut into fragments roughly 10,000 base pairs long, and those fragments are sorted into 384 separate wells. The contents of each well receive a different chemical barcode, which can later be used to tell which well any given read came from. The fragments are then chopped up and sequenced normally as short reads, and a computer uses the barcodes to reassemble the longer fragments.
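The computational half of that process starts with sorting reads back into their wells. The Python sketch below illustrates only that bucketing step, under simplifying assumptions: each read is assumed to arrive as a (barcode, sequence) pair, and the function name, barcode labels, and toy sequences are hypothetical. It is not Illumina's pipeline, and the per-well assembly that would follow is not shown.

```python
from collections import defaultdict


def group_reads_by_well(tagged_reads):
    """Bucket (barcode, sequence) pairs by barcode, keeping input order."""
    wells = defaultdict(list)
    for barcode, sequence in tagged_reads:
        wells[barcode].append(sequence)
    return dict(wells)


# Toy input: three short reads carrying two hypothetical well barcodes.
tagged_reads = [
    ("well_001", "ACGTAC"),
    ("well_002", "TTGACA"),
    ("well_001", "TACGGA"),
]

pools = group_reads_by_well(tagged_reads)
for barcode, reads in pools.items():
    print(barcode, reads)
```

Because each well holds only a small number of long fragments, the reads in one pool can in principle be assembled separately from the rest of the sample, which is what makes the longer fragments recoverable.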

Illumina has been offering TruSeq as a service since mid-2013, but until recently customers could not perform the process in their own labs. That has finally changed, with the release of a TruSeq DNA library prep kit for HiSeq users this Monday. At the same time, BaseSpace, Illumina's cloud environment for bioinformatics built into every MiSeq and HiSeq device, launched two new apps that cover the computational side of TruSeq. One app assembles the short reads into synthetic long reads, while the second performs haplotype phasing for human genomes.

Illumina also released the first public data generated with its TruSeq assembly app this Wednesday, using the rice genome as a test run. The app was able to combine short HiSeq reads into synthetic long reads with an N50 length of over 7,000 base pairs, roughly comparable to a PacBio run; readers with BaseSpace accounts can examine all the data here. Illumina has also performed a TruSeq assembly of the model organism C. elegans.
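For readers unfamiliar with the N50 statistic cited above: it is the length L such that reads (or contigs) of length L or longer account for at least half of all sequenced bases. A minimal Python sketch of the standard calculation follows; the function name and the toy lengths are illustrative only and are not drawn from Illumina's rice data.

```python
def n50(lengths):
    """Return the N50 of a collection of read or contig lengths."""
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest until half the total bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: lengths sum to 20, and the two longest (6 + 5) pass the halfway mark.
example = n50([2, 3, 4, 5, 6])
print(example)
```

An N50 above 7,000 therefore means that at least half of the bases in the synthetic long reads sit in reads longer than 7,000 base pairs, not that the typical read is that long.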

These results invite comparison with the de novo whole genome assemblies released by PacBio over the past year, although Illumina's TruSeq runs have not yet been assembled into whole genomes. PacBio has made public its assemblies of several bacterial genomes, Drosophila, spinach, and even a haploid human sample, all without guidance from a reference genome. This kind of de novo assembly has recently emerged as a major application for long-read technology. (A preprint paper first posted to bioRxiv this January, and updated last week, does describe a de novo assembly of Drosophila using TruSeq. The paper, whose authors are led by Rajiv McCoy of Stanford, has not yet been published or formally peer-reviewed. The authors were able to assemble the entire fruit fly genome in just over 5,000 contigs, but struggled with tandem repeats, a limitation of short-read technologies that may extend to TruSeq as well.)

It seems unlikely in the short term that TruSeq will become a go-to technology for long-read applications, but at a minimum, the new kits and apps will extend long-read use cases to the many labs that only have access to Illumina machines.