Companies unveil next-gen sequencers and data.
By Kevin Davies
June 10, 2008 | SAN DIEGO—At the latest CHI next-generation sequencing meeting*, the focus was split between the ever-improving performance of current next-gen platforms, and the prospects for the future.
Patrice Milos, Helicos Biosciences’ CSO, opened the meeting by noting that the company’s recent paper in Science, on the single-molecule sequencing of the M13 virus, was “proof of principle” work using old chemistry that was actually completed 18 months ago. The latest technology adds reversible terminators and improves template density to one molecule per square micron. Moreover, a new strategy is being developed to extend the sequence reads using a “controlled dark fill” of unlabeled nucleotides, then extending the sequence for another 24 “quads.” This technique could enable paired reads for larger DNA fragments of up to 8-10 kilobases.
Abizar Lakdawalla (Illumina senior product manager) surprised some in the audience by announcing that the current Genome Analyzers (GAs) were now designated “classic,” with the upgraded version designated GAII. Subsequently, representatives from genome sequencing centers at Baylor; the Broad, Venter, and Joint Genome Institutes; and Washington University in St. Louis all said they were upgrading to GAIIs. The GAII can generate 3 gigabases using paired-end reads, with individual read lengths poised to increase from 35 to 50 bp. Lakdawalla said that by this summer, base calling will be done in real time.
The daily output of the GAII is 750 million bases, with 80% of the individual reads having no errors. The GA is being used for many applications, including the “killer app” of epigenomics. Lakdawalla also said that Illumina’s “$100,000 genome” HapMap sample, sequenced by David Bentley’s U.K. group in six weeks last Christmas, consisted of 27 runs and 70 gigabases (GB) of data. Since then, an additional 48 GB has been generated from 13 more runs. “The data will be in the public domain very soon,” he said. Lakdawalla noted that some 250 GA systems have been installed so far, but significantly, it is the smaller non-genome centers that are driving much of the demand.
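The figures Lakdawalla quoted lend themselves to a quick back-of-the-envelope check. The Python sketch below is illustrative only: the function names are hypothetical, the haploid human genome size of roughly 3 gigabases is an assumption not stated in the article, and treating the GAII's 3 gigabases as a per-run figure is likewise an assumption. The coverage number reflects raw yield, not mapped or filtered data.

```python
# Back-of-the-envelope arithmetic on the sequencing figures quoted in the talk.
# Assumption (not from the article): a haploid human genome of ~3 gigabases.
HUMAN_GENOME_GB = 3.0


def per_run_yield(total_gb: float, runs: int) -> float:
    """Average gigabases produced per instrument run."""
    return total_gb / runs


def fold_coverage(total_gb: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Raw fold coverage: total bases sequenced divided by genome size."""
    return total_gb / genome_gb


# Initial HapMap sample: 27 runs, 70 gigabases.
initial_yield = per_run_yield(70, 27)
# Follow-up data: 13 runs, 48 gigabases.
followup_yield = per_run_yield(48, 13)
# Raw coverage across all 40 runs, under the ~3 gigabase genome assumption.
combined_coverage = fold_coverage(70 + 48)
# Assumption: the GAII's 3 gigabases is a per-run figure; at 0.75 gigabases
# per day, that implies the run length computed here.
days_per_run = 3.0 / 0.75

print(f"Initial runs:   {initial_yield:.2f} GB/run")
print(f"Follow-up runs: {followup_yield:.2f} GB/run")
print(f"Raw coverage:   {combined_coverage:.1f}x")
print(f"Run length:     {days_per_run:.1f} days")
```

Under these assumptions, the later runs averaged noticeably more data per run than the first 27, which is consistent with the platform improvements described in the talk.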
Todd Arnold (Roche 454 director of molecular biology) also revealed platform enhancements to the GS FLX arriving later this year. The GS FLX “Next-Gen” will boost sequence output fivefold (to 500 Mb/day) and increase individual read lengths up to about 400 bases. Technical enhancements include increased microwell density, reduced chemical crosstalk/background, and new paired-end strategies.
Representatives from five leading genome centers shared their experiences on the next-gen pipeline and workflow front. At Baylor College of Medicine, Donna Muzny said the institute is running ten GS FLX instruments, producing 42 GB/month. There are also two Illumina GAs and two SOLiDs (Applied Bio later announced it was shipping another six instruments to Baylor). Muzny said there were of course technical problems: blocked valves, leaky connectors, and some software issues, solved by moving the analysis offline. While 454 remains the workhorse for high-quality draft genome assemblies, it is supplemented by Sanger, Illumina, and SOLiD data.
Andrew Barry noted that the Broad Institute was taking delivery of its first Polonator (See, “PacBio Sparks Florida Fireworks,” Bio•IT World, March 2008). “Spring has sprung in Boston, and members are very anxious to polonate,” he joked, though he later admitted that, given the institute’s 20 GA instruments, it was a kind of “toy for technology development.” At the Joint Genome Institute, Daniel Rokhsar relies on two GS FLX machines and a pair of Illumina GAs for mostly microbial and plant genome sequencing. The highly repetitive nature of genomes such as that of maize puts a premium on read length for plant genetic analysis.
At Washington University, Vincent Magrini and colleagues use a dozen Illumina GAs, five 454s, and one SOLiD (with more en route). His team has estimated the cost of human genome assemblies using various platforms, but despite recent five-figure claims, still produces figures in the $280,000-600,000 range. Applied Bio, meanwhile, claimed to have internal results producing 17 GB of mappable sequence on its SOLiD 2.0 systems, which launched last month. Beta testers have reportedly achieved similar outputs.
Future Technologies and Applications
While Stephen Turner (CSO, Pacific Biosciences) presented stunning early data on the zero-mode waveguide platform that he predicts will produce the 15-minute human genome in five years (See, “PacBio Sparks Florida Fireworks,” Bio•IT World, March 2008), there was also considerable interest in the presentation from William Glover, president of ZS Genetics. Using transmission electron microscopy, Glover can visualize single linearized molecules of DNA, in a process he likened to “untangling spaghetti.” Whereas natural DNA is invisible under transmission electron microscopy, heavy atoms (such as iodine and bromine) can be substituted in each base to render DNA strands visible and provide reads that could potentially run to 5 kilobases or more. Some of Glover’s photographic data did, he admitted, resemble the tire tracks of a lunar lander, but it was possible to be persuaded that he was presenting stretched out ladders of DNA.
Many impressive applications were presented, including Roger Maslen (Venter Institute) discussing single-cell DNA sequencing. Steve Kingsmore (NCGR) presented mutation data on mesothelioma patients, and digital gene expression studies on schizophrenia, in collaboration with Gary Schroth’s group at Illumina. These studies have identified candidate genes, differentially expressed in patients, that were not picked up using Affymetrix chips. In some cases, these genes show association and/or gene variants correlating with the disease.
Nicholas Schork (Scripps Genomic Medicine) discussed the “GWAS (genome-wide association studies) craze,” which he said has limitations. His group is exploring GWAS strategies at the DNA sequencing level, although he admits it would be computationally demanding. (A one-hour sequence-level GWAS study in humans would need 10^8 processors!) Schork also said the Cancer Genome Atlas project “is destined for failure” unless a suitable pipeline to make sense of the data is put in place.
Also in San Diego were representatives from several interested parties, including Complete Genomics, Invitrogen, and Oxford NanoLabs, some of whom might have something to present by the time of CHI’s next next-generation meeting, this September in Providence, RI.
Will single-molecule sequencing systems eventually replace the original next-gen platforms from the likes of 454 and Illumina?
“I don’t think the single molecule is the Holy Grail, I think the $1000 genome is the Holy Grail! If the single molecule gets you there, great, if it doesn’t, there are other ways forward potentially… We started off as a single molecule company [Solexa] way before. It’s kind of a safety in numbers issue for us. If you have many copies of a molecule, your data quality improves substantially. There’s no penalty in time per se. The big benefit of the single molecule [approach] is you save the time for the amplification process. Our cluster prep takes around five hours, so you’re not saving a lot of time doing a single molecule, but you get all the benefits of a statistical average [in our approach], and the cost of reagents goes down, because you don’t need super high purity reagents… There might be breakthroughs in the future which hopefully the phenomenal Illumina engineers will be solving. But right now, I think we’re pretty comfortable with what we have.”
Abizar Lakdawalla, Illumina
*CHI's Next Generation Sequencing Case Studies and Applications; San Diego, April 23-24, 2008.
This article appeared in Bio-IT World Magazine.