By Kevin Davies
September 15, 2009 | A decade ago, researchers involved in the Human Genome Project held a party to mark the sequencing of one billion bases. This summer, the Wellcome Trust Sanger Institute surpassed 10 Terabases of mapped DNA sequence, with a current weekly output of about 400 Gigabases—equivalent to about 7 human genomes—and that’s just one genome center. Close to 1000 2nd-generation sequencing instruments (454, Illumina, Applied Biosystems, Helicos) are in use around the world. Illumina dominates for now, accounting for 75% of the sequence in the NCBI Sequence Read Archive. The stream of complete human genome papers now includes two Korean genomes, the first human genome on the Life Technologies SOLiD platform, the second cancer genome, and Stephen Quake’s sequence on the HeliScope.
This explosion of data puts added pressure on beleaguered informatics and IT teams. For this special 16-page report, which coincides with CHI’s Exploring Next-Generation Sequencing conference (September) and Bio-IT World Expo Europe (October), Kevin Davies interviewed some of the key leaders and evangelists in 2nd- and 3rd-generation sequencing informatics. On the user side, Bruce Martin at Complete Genomics [p 21] has the trivial task of building the IT infrastructure to support 10,000 human genomes in 2010. Also interviewed are the software men at two 3rd-generation companies: Clive Brown (Oxford Nanopore) [p 28] and Kevin Corcoran and Scott Helgesen (Pacific Biosciences) [p 34]. Their past experiences at Solexa/Illumina, Applied Biosystems and 454 Life Sciences, respectively, should prove instructive.
On the government/academia side, Jim Ostell and Martin Shumway discuss the NCBI’s Sequence Read Archive [p 36], the official repository of short-read sequence data. David Dooling describes how The Genome Center at Washington University, St Louis [p 30], is coping with continued expansion demands. Stanford’s Steve Quake [p 25], co-founder of Helicos, discusses the impact of his own genome, produced by single-molecule sequencing. Davies also gets the perspective of two software vendors that offer solutions for handling next-gen data: Jan Lomholdt (CLC bio) [p 32] and Ron Ranauro and Richard Resnick (GenomeQuest) [p 23].
What this report clearly illustrates is that the term “next-generation” has become obsolete. As one platform maker has noted, we really are talking “now generation.”
Complete Compute: An Interview with Bruce Martin
Taking Next-Generation Sequencing Data to the Cloud
A Single Man: Stephen Quake Q&A
What Can Brown Do for Oxford Nanopore?
David Dooling: Gangbusters at The Genome Center
CLC bio Satisfies Next-Gen Bioinformatics Cravings
SMRT Software Braces for the Pacific Biosciences Tsunami
NCBI’s Sequence Read Archive: A Core Enabling Infrastructure