September 28, 2010 | PROVIDENCE, RI—As a medical student at Brown University, Barrett Bready set up the first biotech curriculum in the country where medical students could rotate through a for-profit entity. Bready relished the experience, and quickly identified genomics and regenerative medicine as the two big drivers for the coming decades. As he set out to find the most promising technology on the sequencing side, it turns out he didn’t have to look far.
Bready is president and CEO of NABsys, one of an ever-growing crop of 3rd- (or 4th-) generation sequencing technologies vying for commercial—and in his case clinical—success.
“We are developing the first clinically relevant DNA sequencing technology,” says Bready. “We take that to mean that our instruments will have the ability to do large-scale clinical studies that allow for the elucidation of the genetic basis of complex disease and the use of that information in clinical care. Implicit in this is our belief that the current technology does not meet that standard.”
“Clinically relevant doesn’t just mean relevant in the clinic. It means relevant for the clinic,” says John Oliver, NABsys vice president of research and a former Brown University professor. “Research being done now is relevant to the clinic in the future, meaning that clinical-grade accuracy is needed in advance of the adoption of sequencing in the clinic.”
In a fiercely competitive environment, the NABsys strategy is to marry the advantages of nanopores with those of sequencing by hybridization (SBH). “In its classical array form, SBH works for de novo sequencing, but only for short fragments,” explains Bready. “For a human genome, the library size would approach 10^20. So it works, but it doesn’t scale.” Nanopore sequencing, though, “scales well but doesn’t actually work. We have something we think both scales and works.”
Since winning the first NIH “$1000 genome” award for an electronic technology back in 2007, NABsys has kept a fairly low profile. The company is still relatively small, about 20 employees under the watchful eye of Bready’s pet Cavalier King Charles Spaniel, Charlie, but it is growing.
The NABsys brain trust argues that there are five critical metrics for genome sequencing: accuracy, completeness, scalability, speed and cost. Speed and cost attract most of the media’s attention, but Bready and Oliver insist that accuracy on all length scales, from single bases to large-scale rearrangements, is a critical factor (see “State of Assembly”).
“Except for Sanger sequencing, we don’t have a clinically relevant sequencing technology,” insists Oliver. “Do you think we’re getting correct genomes out of 2nd-generation sequencing technology? You’re getting tons and tons of data for lower and lower cost, but if you can’t get a correct genome out of it, is it clinically relevant? A short read can be 100% accurate and irrelevant, because it appears many times in the genome and you don’t know where it [belongs in the final assembly].”
Another issue is scalability. “Can your platform produce enough data to meet the worldwide demand once clinical applications are enabled?” Bready asks rhetorically. A major application is certain to be oncology, given the wholesale rearrangements of a cancer cell’s genome, not merely the cataloguing of single mutations. “That heterogeneity ends up killing patients. The ability to get information on single molecules and assemble genotypes is very exciting.”
To make his point, Bready flashes a slide with a very large number on it: 10^20. “This is our estimate for the number of bases of raw sequence data required each year in the developed world to handle the cancer genomics problem,” he says. (The calculation goes like this: 10^10 bases in a diploid genome x 20-fold coverage x 100 genotypes/tumor x 5 million cancer diagnoses per year.) It’s also a good estimate for the number of grains of sand on the planet.
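The arithmetic behind that slide is easy to check. The short Python sketch below simply multiplies the four figures Bready quotes; the variable names are illustrative, and the rounded genome size is the article's, not an independent estimate.

```python
# Back-of-envelope check using only the figures quoted in the article.
BASES_PER_DIPLOID_GENOME = 10**10   # rounded figure from Bready's slide
COVERAGE = 20                       # 20-fold coverage
GENOTYPES_PER_TUMOR = 100           # genotypes sampled per tumor
DIAGNOSES_PER_YEAR = 5_000_000      # cancer diagnoses per year


def annual_raw_bases() -> int:
    """Raw bases of sequence needed per year under the quoted assumptions."""
    return (BASES_PER_DIPLOID_GENOME * COVERAGE
            * GENOTYPES_PER_TUMOR * DIAGNOSES_PER_YEAR)


total = annual_raw_bases()
print(f"{total:.0e} bases per year")
```

The four factors multiply out to exactly 10^20, which matches the number on the slide.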
Bready follows that with a chart estimating the number of competitors’ machines that will be required to generate that amount of raw sequence—something in the millions—compared to the dozens or hundreds of each vendor’s machines currently deployed. “This won’t scale using optical technologies with Hubble-like cameras and lasers,” Bready says. By contrast, the throughput of the semiconductor industry scales beautifully. Looking to distance himself from nanopore companies such as Oxford Nanopore, Bready says: “By moving beyond nanopores to detectors that have higher data throughput, improved resolution, and larger critical dimensions, we think we’ve developed the first and only technology that can do single-molecule sequencing and is compatible with standard semiconductor fabrication techniques.”
The idea is to do SBH in solution, so that the single-stranded DNA template is studded with bound oligonucleotides. (The oligos would be short, less than 10 bases each.) By passing the complex through what Bready terms a nanoscale solid-state detector, the relative positions of the bound oligos can be determined.
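As a rough illustration of what a map of bound oligos looks like, the toy Python sketch below finds where a short probe would pair with a single-stranded template and reports the spacing between binding sites. It is an assumption-laden stand-in, not NABsys's method: the real system measures positions electronically rather than by string search, and the template, probe, and function names here are invented for the example. Only perfect matches are considered.

```python
# Toy model: a probe binds wherever the template contains the probe's
# reverse complement. Sequences and names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_positions(template: str, probe: str) -> list[int]:
    """Start indices on the template where the probe would pair perfectly."""
    site = reverse_complement(probe)
    positions = []
    start = template.find(site)
    while start != -1:
        positions.append(start)
        start = template.find(site, start + 1)  # allow overlapping sites
    return positions


def inter_probe_distances(positions: list[int]) -> list[int]:
    """Distances between consecutive binding sites, in bases."""
    return [b - a for a, b in zip(positions, positions[1:])]


template = "GGCAATTTGCAAACCCCGCAAT"  # made-up single-stranded template
probe = "TTGC"                       # made-up 4-base probe

positions = binding_positions(template, probe)
distances = inter_probe_distances(positions)
print(positions, distances)
```

In this toy case the probe binds at three sites, and the list of distances between them is the kind of positional information the article describes, here reduced to a few bases.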
Among the putative advantages of the NABsys system are: long imputed read lengths; solid-state materials; no reliance on enzymes; and single molecule templates. The signal-to-noise ratio as a bound oligo passes through the nanopore has jumped from 13 (in 2007) to about 140 in 2008—more than sufficient, says Oliver. The DNA molecules speed through the nanopore at about 1 million bases/second. Oliver says his team has slowed it down about 20-fold, which is “about as slow as we want. If you go much slower, you begin to worry about Brownian motion.”
Bready shows an unrecognizable picture of the Mona Lisa as if assembled using short reads. The NABsys approach is equivalent to reconstituting the image by overlaying different color maps, until the final image comes dramatically into focus. “What we’re doing at NABsys is like a scaled-up version of paired-end reads, where there are many known sequences separated by lengths that we measure electronically. Instead of dictating insert sizes to bridge repeats, we’re letting the genome tell us at what distances those repeats lie.”
NABsys has taken a different approach than most next-gen companies, which typically find some interesting chemistry first, scale up, and finally hand off to the informatics team. The company did much of its informatics and simulation work before building the detector. “If your goal is to have chromosome-length contigs with zero [sequence] mistakes, what kind of information would you want? What sort of resolution? What kind of probe libraries? We explored that space in silico before building anything,” says Bready.
Two key personnel in that process were Brown University computational biologists Franco Preparata and Eli Upfal. Years ago, they founded a company called GeneSpectrum with Oliver, based on SBH algorithms and chemistry, which NABsys acquired in 2006. NABsys has since bolstered its board with the addition of Lee Hood, Stan Rose, and Ray Stata (founder of Analog Devices).
The computational task of reassembling a genome sequence is illustrated in a graph, where the diagonal y=x line represents the correct assembly. Lines diverging vertically represent other mathematically valid sequences. The degree to which they flirt with the central line indicates how much CPU time is required, although Bready says that these reconstructions can be performed on a desktop.
Oliver has viral genome assemblies down to zero mistakes. In principle, he says NABsys can sequence a whole human genome—the target being to generate 100-fold coverage of a human genome in a few hours—but the probe design is not without its challenges. “As you use algorithmic and physical criteria to construct the pools of probes, it may be advantageous to pull some tricks with some of the probes to make sure the majority of the hybridizations are [satisfactory]... There’s development work ahead.” For example, the chemical backbones can be tweaked in order to maximize duplex stability.
Oliver admits he has not finalized the design of the probes, but adds, “We’ve shown that it’s possible to bind probes to DNA, see that hybridization in a nanopore, measure distances between them, and that distance measurement is accurate enough to reconstruct a human genome with extremely high accuracy. Can we now build an instrument with all the integrated electronics to do it in an hour? The answer is probably yes. But there’s still development work in the chemistry, detector design, and the algorithms.” I ask whether it is likely to be a small instrument. “Could be!” chuckles Bready. “It depends on how we choose to define it, but potentially it could get very small… The detectors are vanishingly small. The only thing that takes up space is liquid handling.”
The question of how many nanopores will be in the prototype prompts a whispered conference between Bready and Oliver, like panelists conferring on a game show. “Which number do you want to give him?” Oliver asks. Finally, Bready says: “We’re not saying this is what we’re doing, but with 64 detectors per chip, you can do 30 human genomes at 30x coverage per instrument per day. In the semiconductor world, 64 is a pretty modest number.”
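To put Bready's hypothetical figure in per-detector terms, the Python sketch below converts "30 human genomes at 30x coverage per instrument per day" into bases per second per detector. Three inputs are assumptions that the article does not state: a genome size of about 3 billion bases, a single 64-detector chip per instrument, and detectors that run around the clock.

```python
# Rough per-detector throughput implied by the hypothetical 64-detector chip.
GENOME_BASES = 3 * 10**9    # ASSUMPTION: ~3 billion bases per human genome
GENOMES_PER_DAY = 30        # quoted by Bready
COVERAGE = 30               # quoted by Bready (30x)
DETECTORS = 64              # quoted; ASSUMPTION: one chip per instrument
SECONDS_PER_DAY = 86_400    # ASSUMPTION: continuous 24-hour operation

bases_per_day = GENOMES_PER_DAY * COVERAGE * GENOME_BASES
per_detector_per_second = bases_per_day / DETECTORS / SECONDS_PER_DAY

print(f"{bases_per_day:.1e} raw bases per instrument per day")
print(f"{per_detector_per_second:,.0f} bases per second per detector")
```

Under those assumptions each detector would need to cover a little under half a million bases per second. Changing the genome size or the number of chips per instrument scales the result proportionally.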
On the question of intellectual property, Bready says: “We are aware of nanopore IP held by other institutions. It does not affect us. We have moved beyond nanopores both in terms of the information we use electronic detection to obtain as well as the types of detectors we use.” He concludes with the notion that NABsys’ technology will change the way medicine is practiced.
“There’s this notion that DNA sequencing has or will soon be commoditized. We don’t think that’s the case at all. The only thing that’s been commoditized is sequencing-by-synthesis reads. The reason there’s not much impetus to switch from Illumina to SOLiD is, in general, they’re offering the same thing—similar read lengths and cost points. We think we’re offering something totally different. For one thing, it starts with the right answer. The algorithmic work combined with our electronic detection makes us think we can get to something like chromosome-length contigs where all the bases are correct. That’s just a totally different game than what’s being played now. This can give clinical medicine the same dominance over systemic diseases such as cancer that we now enjoy over infectious disease.”
As for the launch date, Bready hedges: “Some people think single-molecule detection is a five-year proposition. That’s not the case. It’ll be significantly less than five years.”
State of Assembly
John Oliver bristles at the notion that the current accepted state of human genome sequencing (30x coverage) is adequate. At sequencing conferences, he says he regularly hears scientists complaining, “You can’t do anything with this data. You can’t assemble a human genome.” Even scientists tackling medium-sized genomes are shooting for 150x coverage and four different library sizes.
The media and marketing emphasis on the cost per base is a holdover from the genome wars, when it was all about which center could produce the greatest amount of data least expensively. “That’s bled over,” says Oliver. “What was lost was, when you change the technology, the metric should be its cost per correct genome, not cost per base.”
While the first complete cancer genomes are providing some useful information, Oliver insists it’s not close to what we should be getting. “How relevant is it to the questions you want to answer? Can you use short-read data to accurately reconstruct genomic rearrangements happening in cancer? There is no pipeline. You can’t take the data, stick it on a server and tell me what the assembly is.”
Assembling a medium-sized genome requires months of labor and continual manual intervention. “The data is so low in information, you have to tweak the assembler every time you do a run,” says Oliver. “On the back end, there’s a huge amount of computation and personnel time spent trying to put the data together in the most believable scenario.”
The implication is that NABsys’ ultra-long imputed reads will take the guesswork out of sequence assembly and annotation. K.D.
This article also appeared in the September-October 2010 issue of Bio-IT World Magazine.