PacBio Aims for Haplotyped Whole Genome Assemblies in Partnership with RainDance
By Aaron Krol
May 29, 2015 | Pacific Biosciences has worked hard over the past year and a half to demonstrate that its DNA sequencer, the RS II, can do something no other instrument can: assemble a whole human genome from scratch, without using an existing reference genome as a template. The process, known as de novo assembly, is harder and more costly than mapping to a reference genome, but it also reveals important information about the most complex types of variation in our genetic makeup.
PacBio can pull off this feat because its sequencer produces “long reads,” fragments of the genome spanning thousands of DNA base pairs. These reads also make PacBio’s technology ideal for haplotyping: figuring out which genetic variants in an individual’s DNA come from chromosomes inherited from the father, and which from the mother. Recently, however, some enterprising companies have taken up the challenge of turning the “short reads” produced by PacBio’s competitors, just a few hundred base pairs long, into the kind of information needed to power haplotyping and whole genome assembly. Most prominently, this February, 10X Genomics introduced GemCode, a platform meant to be combined with market leader Illumina’s short-read sequencers. (See, “10X Genomics at AGBT.”)
“We are the industry leading technology in terms of de novo assemblies, and our customers are very happy with those capabilities,” PacBio Chief Scientific Officer Jonas Korlach tells Bio-IT World. “But that doesn’t mean we’re going to rest on our laurels. We always want to push the limits.”
PacBio has now announced a collaboration with RainDance Technologies to develop a solution very similar to GemCode, which will allow the already sizeable RS II reads to be extended to 100 kilobases or more. Like the GemCode platform, this solution will involve isolating extremely long DNA fragments ― using RainDance’s “digital droplets” system for isolating single molecules ― and attaching short DNA barcodes to those fragments. After the fragments have been chopped up and sequenced on the RS II, a software program will recognize the barcodes and use them to reassemble the longer DNA elements captured in the RainDance instrument.
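The software step described above, recognizing barcodes and using them to regroup reads, can be illustrated with a minimal Python sketch. It is not the PacBio-RainDance software, whose details have not been published. It assumes, purely for illustration, that each read begins with a fixed-length barcode identifying the long fragment it came from; the function name, the barcode length, and the toy sequences are all hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 8  # hypothetical barcode length, chosen only for illustration


def group_reads_by_barcode(reads, barcode_len=BARCODE_LEN):
    """Bin reads by their leading barcode and strip the barcode off.

    Assumes (hypothetically) that the barcode is the first `barcode_len`
    bases of every read and was sequenced without errors.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_len:
            continue  # too short to carry a barcode plus any sequence
        barcode, insert = read[:barcode_len], read[barcode_len:]
        groups[barcode].append(insert)
    return dict(groups)


# Toy reads: two share a barcode, so they came from the same long fragment.
reads = [
    "AAAACCCC" + "GATTACA",
    "GGGGTTTT" + "CCGGA",
    "AAAACCCC" + "TTAGC",
]
groups = group_reads_by_barcode(reads)
for barcode, inserts in groups.items():
    print(barcode, inserts)
```

Each bin holds only the reads from one original long fragment, so a downstream assembler could stitch them together without confusing them with similar sequence from elsewhere in the genome. A real pipeline would also have to tolerate sequencing errors within the barcodes and cases where two fragments receive the same barcode, which this sketch ignores.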
Because this procedure is expected to work on existing devices, the development cycle may be fairly fast. Both partners say that, while the collaboration is in its early stages, the key elements are past proof of principle. They also expect their solution to be better suited to de novo assembly than the 10X Genomics pipeline, which Korlach points out is “still based on short-read Illumina sequencing, with all the limitations in terms of read length… We absolutely see that there’s a big advantage to sequencing 10 to 30kb pieces and stitching those together into longer contiguous elements, rather than relying on something that’s 250 or 300 bases.”
One difference between the two technologies may be their ability to handle regions of the genome made up of short tandem repeats, small DNA sequences that are repeated many times in a row, which among other things are relevant to genetic diseases like Huntington’s and fragile X syndrome. While GemCode has not been on the market long enough for much information on its performance in these areas to be available, studies have shown that another barcoding system based on short reads, Moleculo, has been unable to perfectly resolve short tandem repeats.
Still, the GemCode platform should be well suited to resolving many other types of long structural variants, as well as to haplotyping. These applications have been among the biggest selling points for PacBio in a challenging market, which makes it critical for PacBio to stay a step ahead on delivering the most complete long-range genomic information.
The resemblance of the PacBio-RainDance partnership to GemCode is not a coincidence. The microfluidics system used by 10X in its GemCode instrument works by capturing single DNA molecules in beads of oil and separating them into microwells, a process that RainDance has used for other applications since the launch of its first instrument in 2008. In fact, RainDance believes 10X is infringing on its patents and has filed a lawsuit against the company.
“We have over 175 patents in our portfolio that cover all aspects from droplet formation and manipulation to barcoding [and] analysis,” says RainDance President and CEO Roopom Banerjee. “Candidly, we’ve published on certain applications of incarnations of what 10X has done years before 10X was founded.”
Banerjee is also extremely bullish on how the final PacBio-RainDance solution will stack up against GemCode. Among other advantages, he predicts that the RainDance instrument will allow for higher throughput, longer reassembled DNA fragments, and more diverse barcodes, which will allow a greater number of genomic targets to be parallelized in each sequencing run.
Most importantly, Banerjee expects his company to undercut 10X on cost. “I’ve already heard from many customers who are evaluating 10X’s technology that $500 a sample is just not economically feasible for them to be able to deploy commercially,” he says, citing 10X’s quoted price. “You can do a whole exome today for roughly $300 to $400 a sample, with the exome capture being about $50. So if it now costs $500 to haplotype and phase that exome, who’s going to do that?”
The economic argument only goes so far: low-cost whole exomes and genomes are primarily a hallmark of Illumina, which is a major reason short-read sequencing is dominant. Even with an efficient barcoding system, de novo assembly will remain something of a luxury in genomics for some time to come. The partnership between RainDance and PacBio is non-exclusive, however, and Banerjee anticipates that more collaborations are in the cards. (It’s worth noting that there is no formal relationship between 10X Genomics and Illumina; 10X simply designed GemCode as an add-on service for the huge market of Illumina users.)
In the meantime, a number of genomics labs might be willing to pay a premium for a higher quality of assembly with barcoded long reads. Production-scale centers like Human Longevity, Inc., and the Broad Institute of MIT and Harvard use RS II sequencers to support their higher-throughput batteries of Illumina instruments, and are increasingly interested in the large structural variants that short reads can’t resolve. The RainDance instruments also promise to capture useful amounts of DNA from very small samples, which could help in applications like sequencing tumor or pathogen DNA from whole blood.
“We’re really excited about translational applications,” says Banerjee, though he notes that both PacBio and RainDance instruments are currently sold for research use only. In the short term, the partners expect to see most of their uptake in basic research, from health-related fields like oncology to far-flung areas like agricultural genomics. Banerjee suggests one exception might be HLA typing for organ and tissue transplants, which has lower regulatory barriers than diagnostic testing and badly needs long-range information to resolve the intricate structure of the human genome’s immune-related complexes.
Korlach agrees that this technology will be primarily used for research, at least at the outset, although he also points out that PacBio recently completed its second development milestone in a partnership with Roche aimed at developing mass market diagnostics.
Like Banerjee, Korlach is confident that the partnership will deliver the best long-range sequencing information available, maintaining PacBio’s position as the go-to technology for assembling whole genomes. “To us, it’s quite clear, and the data is also in the scientific literature that shows there are quite a lot of limitations to trying to do this with short reads.”