Sept 15, 2005 | Six years ago, Jonathan Rothberg's newborn son was rushed into intensive care. The founder of CuraGen spent the night pondering new approaches to DNA sequencing that could enable personalized genome information. His new company, 454 Life Sciences Corp., was born the following year, investing about $50 million into a new nanoscale sequencing platform that was formally unveiled last month.
In an exciting advance for DNA sequencing technology, Rothberg's team at 454 has sequenced and assembled nearly all of a bacterial genome from a single 4-hour run on the company's proprietary $500,000 instrument. And in a related advance, George Church and colleagues at Harvard Medical School have described a novel method for resequencing DNA that reduces costs roughly 9-fold. The process has been licensed to Agencourt Bioscience.
454's rapid generation of 25 million bases of bacterial genome data using its "sequencing by synthesis" approach enabled the assembly of nearly the complete 580,000-base genome of Mycoplasma genitalium with greater than 99 percent accuracy. The group, reporting in the online edition of Nature, claims a 100-fold increase in efficiency over traditional sequencing methods.
"This paper struck me as approaching one of the quantum leaps the National Human Genome Research Institute asked for in its vision for the future of genome research a few years ago," said Nature senior editor Chris Gunter. "Here's the first new technology since Sanger sequencing."
"It's completely analogous to personal computers displacing mainframes," enthused Rothberg. "Now, anyone can have their own genome center. If you can miniaturize something, then everything gets cheaper and faster."
DNA sequencing chemistry hasn't changed since the invention of the dideoxy method by Nobel laureate Fred Sanger in 1977. Companies such as Solexa, Helicos, and 454 have been developing new "single-molecule" approaches to virtually assemble genomes from hundreds of thousands of DNA fragments (see "Single Molecule Signals," May 2005 Bio-IT World, page 6).
The 454 approach builds a picture of newly synthesized DNA fragments on microbeads, one base at a time. Rothberg explains: "[We] nebulize the DNA into little fragments, shake it in oil and water, so each DNA fragment goes into a separate water droplet. So instead of bacteria, we separate the DNA into drops. Then we do PCR, so every drop has 10 million copies. Then we put in a bead, drive the DNA to the bead, so instead of the cloning and robots, one person can prepare any genome."
The DNA-coated beads are loaded into a fiber-optic slide that contains about 1.6 million microscopic hexagonal wells. In 454's benchtop instrument, solutions containing each nucleotide are applied over the wells in cycles - T-C-A-G - repeated dozens of times. As each base is incorporated into the new DNA strand, pyrophosphate is released, triggering a light-producing reaction whose photons are measured by a CCD sensor under the slide.
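The cyclic T-C-A-G flow described above can be illustrated with a minimal Python sketch. It is an idealized, error-free model, not 454's software: it assumes the light signal in each flow scales exactly with the number of identical bases incorporated, and the function names simulate_flowgram and call_bases are hypothetical.

```python
FLOW_ORDER = "TCAG"  # nucleotide flow order quoted in the article


def simulate_flowgram(read, cycles=42, flow_order=FLOW_ORDER):
    """Return one (nucleotide, count) pair per flow for an idealized run.

    `read` is the strand being synthesized in a single well. Assumption:
    each flow extends the strand through every consecutive matching base,
    and the signal equals that count exactly (no noise, no missed bases).
    """
    flows = []
    pos = 0
    for _ in range(cycles):
        for nuc in flow_order:
            run = 0
            # Incorporate as many bases as match the nucleotide being flowed.
            while pos < len(read) and read[pos] == nuc:
                run += 1
                pos += 1
            flows.append((nuc, run))
    return flows


def call_bases(flows):
    """Rebuild the sequence from per-flow counts (zero means no light)."""
    return "".join(nuc * count for nuc, count in flows)


flows = simulate_flowgram("TTCAGGGA", cycles=3)
recovered = call_bases(flows)
```

In this toy run the first flow (T) reports a count of 2 and the fourth (G) a count of 3, which hints at why runs of identical bases are a natural challenge for this kind of chemistry: they must be inferred from signal strength rather than from separate events.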
The automated sequencing-by-synthesis reactions (42 T-C-A-G cycles) took just 4 hours. With an average read length of 110 bases and 40-fold sequence coverage, and correcting for accuracy, the researchers covered 96.5 percent of the bacterial genome with 99.96-percent accuracy from a single instrument run.
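The figures quoted in this article can be cross-checked with a little arithmetic. The sketch below uses only numbers reported here (25 million bases, a 580,000-base genome, 110-base average reads, a 4-hour run); the variable names are illustrative, and the read count is an estimate derived from those figures rather than a number taken from the paper.

```python
total_bases = 25_000_000   # raw bases generated in the run
genome_size = 580_000      # Mycoplasma genitalium genome, in bases
avg_read_length = 110      # average read length, in bases
run_hours = 4              # duration of the instrument run

# Fold coverage: how many times, on average, each genome position was read.
coverage = total_bases / genome_size

# Approximate number of reads implied by the totals above.
estimated_reads = total_bases / avg_read_length

# Raw throughput of the instrument.
bases_per_hour = total_bases / run_hours
```

The raw coverage works out to about 43-fold, in line with the roughly 40-fold coverage the researchers report, and the run implies on the order of 227,000 reads at about 6.25 million bases per hour.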
"The referees felt that the improvement in sequencing technology was very important and would make big strides in the field, to the point of changing how sequencing centers are set up and run," says Nature's Gunter. 454 instruments are being tested at the Broad Institute, the J. Craig Venter Institute Joint Technology Center, the Wellcome Trust Sanger Institute, and other leading sequencing centers.
Rothberg concedes that there are still key improvements to be made in read length and accuracy. For example, while the average read is roughly 100 bases, the instrument can generate accurate runs of 200 bases, occasionally up to 500 bases.
Noting parallels with Moore's Law, Rothberg's team concludes: "Future increases in throughput, and a concomitant reduction in cost per base, may come from the continued miniaturization of the fiber-optic reactors, allowing more sequence to be produced per unit area - a scaling characteristic similar to that which enabled the prediction of significant improvements in the integrated circuit at the start of its development cycle."
Among the 56 co-authors on the Nature article is Gene Myers, the Berkeley computer scientist who led the assembly of the human genome at Celera Genomics.
For those who balk at a $500,000 price tag, the Harvard Medical School team led by George Church uses readily available tools and reagents that cost about one-third of that sum. Church estimates his method offers a 9-fold reduction in the theoretical cost of sequencing a human genome, from $20 million to about $2.2 million. "Improvements are coming very quickly," he said. "The [desirable] cost of $1,000 for a human genome should allow prioritization of detailed diagnostics and therapeutics."
The Harvard method, described in Science, produces very small fragments of new DNA sequence, so it is better suited for DNA resequencing - comparing the new sequence to a reference sequence. Nevertheless, many likely lab applications - from genotyping haplotypes in a disease study to searching for mutations in cancer resistance or identifying microbial strain variants - would fall into this category.
The Harvard group sequenced a novel strain of the bacterium Escherichia coli. The process adapts a commonly available microscope with a digital camera. Some 14 million DNA-coated beads are squeezed onto a slide the size of a dime. Short 9-base DNA tags, with each of the four bases present at a specific query location, are ligated onto the bead-attached DNA fragments. A different fluorescent dye identifies each of the four bases in the query position. An algorithm aligns the short DNA sequences onto the appropriate reference sequence.
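The final step, placing short reads onto a reference and noting where they differ, can be sketched in a few lines of Python. This is a naive illustration of resequencing in general, not the Harvard group's algorithm: it assumes ungapped reads, tries every position in the reference, and uses hypothetical names (best_alignment, call_variants) and a made-up toy reference.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_alignment(read, reference, max_mismatches=1):
    """Return (start, mismatches) for the best ungapped placement, or None."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[start:start + len(read)])
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (start, d)
    return best


def call_variants(reads, reference, max_mismatches=1):
    """List (position, reference_base, read_base) differences seen in reads."""
    variants = set()
    for read in reads:
        hit = best_alignment(read, reference, max_mismatches)
        if hit is None:
            continue  # read could not be placed confidently; skip it
        start, _ = hit
        for offset, base in enumerate(read):
            ref_base = reference[start + offset]
            if ref_base != base:
                variants.add((start + offset, ref_base, base))
    return sorted(variants)


# Toy data: one read matches exactly, the other carries a single A-to-T change.
reference = "ACGTTGCAAGCTTAGGCTA"
reads = ["GTTGCAAGC", "GCATGCTTA"]
variants = call_variants(reads, reference)
```

Because each read is only compared with a known sequence, very short reads can still be informative, which is why the approach suits resequencing better than de novo assembly. A real pipeline would also need indexing for speed and multiple overlapping reads to confirm each difference.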
The Harvard group also sees room for improvement. They write: "We collected [approximately] 786 gigabits of image data from which we gleaned only [about] 60 megabits of sequence. This sparsity - one useful bit of information per 10,000 bits collected - is a ripe area for improvement. The natural limit of this direction is single-pixel sequencing, in which the commonplace analogy between bytes and bases will be at its most manifest."
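The sparsity figure in that quotation can be checked directly. The sketch below assumes decimal prefixes (giga = 10^9, mega = 10^6) and treats the approximate totals quoted above as exact.

```python
image_bits = 786 * 10**9     # approximately 786 gigabits of image data
sequence_bits = 60 * 10**6   # approximately 60 megabits of sequence

# Bits of image data collected per useful bit of sequence.
bits_per_useful_bit = image_bits / sequence_bits
```

The ratio comes to about 13,000 to 1, the same order of magnitude as the "one useful bit of information per 10,000 bits collected" the authors cite.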
Perhaps the biggest question is the relative value of de novo sequencing versus resequencing. Rothberg insists that, "In the real world - all the genome centers, most academic and commercial labs, and customers we and Roche have talked to - you must do at least some de novo sequencing to be of any real value."
Margulies, M. et al. "Genome sequencing in microfabricated high-density picolitre reactors." Nature. DOI: 10.1038/nature03959
Shendure, J. et al. "Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome." Science online 4 August, 2005. DOI: 10.1126/science.1117389