By Kevin Davies
August 10, 2009 | Six years ago, Stanford University bioengineering professor Stephen Quake published a new method for sequencing single molecules of DNA, in which his team (then at Caltech) proudly managed to sequence precisely five nucleotides. This week, working with a pair of colleagues, Quake has a bit more to celebrate: the sequence of 2.5 billion bases (about 90%) of his personal genome.
Apart from being a major milestone in single-molecule sequencing (SMS), Quake says his group’s paper points to the democratization of genomic research. “This is the first case you haven’t needed a genome center to sequence a human genome,” Quake told Bio-IT World on the eve of his landmark publication. “What we’ve shown is that you can do it with a pretty modest set of resources—a single professor’s lab, one person doing the sequencing, one instrument, lower cost. Those are all order-of-magnitude improvements over what’s been published recently.” (See accompanying interview, “The Single Life: Stephen Quake Q&A.”)
The Quake genome report, which is published online today in Nature Biotechnology, also marks a major milestone for Helicos Biosciences, the company Quake co-founded in 2004 to commercialize his SMS technology.
“The vast majority of the whole-genome sequencing on [other next-gen] platforms requires it to be done in genome centers, because you need that infrastructure,” commented Helicos president Steve Lombardi. “Literally three people did this work. That’s a real harbinger of what we see the direction of this market going. It’ll be very interesting to see what Francis Collins, in his officially appointed role [as NIH Director], does with that!”
The sequencing took research technical manager Norma Neff just four runs on the Stanford HeliScope, at an estimated cost of $48,000, while a physics PhD student, Dmitry Pushkarev, developed a new variant-calling algorithm called UMKA for the bioinformatics analysis. “It’s the first large statement from someone other than the company that the technology works. That’s a really important thing for us,” said Lombardi.
Kevin Ulmer, a pioneer of single-molecule sequencing and a former consultant to Helicos, commented: “Little did I realize that the young postdoc working next to me at the bench in Steve Chu's lab at Stanford in 1993 would become the first person to have his genome sequenced by direct single-molecule methods.” Ulmer says he was considered “completely crazy” to have proposed such a scheme back in 1987, but feels vindicated now.
“I think Helicos deserves some kudos,” commented Clive Brown, vice president of informatics and IT at Oxford Nanopore Technologies. “They’ve stuck with it, and they’ve made it work about as good as it can work with single-molecule fluorescence and the camera they have. People have taken it outside and they’ve used it. That’s not trivial. If I was them, I’d have stuck to that. They should stick to the high ground – you can quote me on that.”
Brown, who was formerly with Solexa and Illumina, said it was misleading to compare the three co-authors on the Stanford paper with the 250 or so on the landmark 2008 Illumina publication in Nature on the first African genome, because “that paper was the culmination of eight years work.” He noted that an earlier 2008 Helicos publication had more than 20 co-authors to sequence a tiny viral genome. Brown also pointed out that some of the platform price comparisons were out of date, noting that Illumina has introduced a personal genome sequencing service, coincidentally for the same $48,000 price.
More Than Zero
The identity of “Patient Zero” -- the Caucasian DNA donor -- is not actually specified in the paper. “We wanted to retain some semblance of dignity for the scientific literature,” Quake joked. Earlier this year, however, Quake penned an op-ed in the New York Times announcing that he had sequenced his own genome. One of his prime motivations, he wrote, was to try to understand why his daughters suffered severe peanut allergies.
Quake got access to the HeliScope after a machine was purchased by the Stanford University Stem Cell Institute last year, and volunteered to be the whole-genome subject. He could not obtain a machine for his own lab as Howard Hughes Medical Institute and Stanford University conflict-of-interest policies bar collaborations with biotech companies. “The reason they bought it was not to sequence my genome, but to sequence cancer, tumor stem cell genomes,” Quake explains. “Mine was just to practice, to show that we could do it and to get the informatics into place.”
In four HeliScope runs, Neff generated 148 billion raw reads ranging from 24 to 70 bases in length, with an average length of 32 bases. Of that sequence, 63% of the reads could be aligned to about 90% of the reference human genome, using an open-source aligner called IndexDP, for a total useful genome coverage of 28X.
The error rate is put at 3.5%, which is higher than that of other next-gen platforms; more than half of those errors are deletions, attributable to the sporadic incorporation of “dark” non-fluorescing bases. The read alignment ratio of 63% is on the low side; however, the data were generated six months ago using only single reads, and the ratio is likely to improve quickly.
Pushkarev designed the UMKA program with the HeliScope’s known error profile in mind. It called 97% of SNPs with 99% accuracy, which the authors say is slightly better than the first leukemia genome and comparable to recent publications on the Chinese, Korean and Yoruban genomes, all sequenced on the Illumina GA II platform. Selecting a fairly stringent quality threshold, the authors documented more than 2.8 million SNPs, of which 76% are found in dbSNP. (Similar ratios have been reported for other personal genomes.)
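For readers who want to see how figures of this kind are typically derived, here is a minimal Python sketch. It is not the UMKA algorithm, and it is not taken from the paper. It assumes that "97% of SNPs" refers to sensitivity against an independent truth set and that "99% accuracy" refers to the fraction of calls that are correct; both readings are assumptions, as are the function names `snp_call_metrics` and `known_fraction` and the toy data.

```python
def snp_call_metrics(called, truth):
    """Return (sensitivity, precision) for a set of SNP calls.

    called, truth: sets of (chromosome, position, allele) tuples.
    Treating the article's percentages as sensitivity and precision
    is an assumption made for this illustration.
    """
    if not called or not truth:
        raise ValueError("both call set and truth set must be non-empty")
    true_positives = len(called & truth)
    sensitivity = true_positives / len(truth)   # share of true SNPs recovered
    precision = true_positives / len(called)    # share of calls that are correct
    return sensitivity, precision


def known_fraction(called, database):
    """Fraction of called SNPs already present in a reference catalogue."""
    if not called:
        raise ValueError("call set must be non-empty")
    return len(called & database) / len(called)


# Toy data, invented purely for illustration.
called = {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 50, "T"), ("chr2", 75, "C")}
truth = {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 50, "T"),
         ("chr3", 10, "A"), ("chr3", 20, "C")}
catalogue = {("chr1", 100, "A"), ("chr2", 75, "C"), ("chr9", 1, "G")}

sensitivity, precision = snp_call_metrics(called, truth)
print(sensitivity, precision, known_fraction(called, catalogue))
```

Applied to the article's numbers, the second function is the same calculation that yields the 76% dbSNP figure: the count of called SNPs found in the catalogue divided by the total number called.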
By assessing the depth of read coverage along each chromosome in 1-kilobase windows, Quake’s team was also able to detect copy number variants (CNVs). The team found 752 CNVs totaling 16 megabases in this way, of which only 54% have previously been catalogued in the Database of Genomic Variants.
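The windowed read-depth idea can be sketched in a few lines of Python. This is a simplified illustration, not the method used in the paper: the 1-kilobase window comes from the article, but the median baseline, the gain and loss thresholds (`gain_ratio`, `loss_ratio`) and all function names are assumptions chosen for the example, and a real analysis would also correct for factors such as GC content and mappability.

```python
from statistics import median


def window_depths(read_starts, chrom_length, window=1000):
    """Count aligned reads whose start position falls in each window."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window] += 1
    return counts


def label_windows(counts, gain_ratio=1.5, loss_ratio=0.5):
    """Label each window by its depth relative to the chromosome median.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalise")
    labels = []
    for count in counts:
        ratio = count / baseline
        if ratio >= gain_ratio:
            labels.append("gain")
        elif ratio <= loss_ratio:
            labels.append("loss")
        else:
            labels.append("normal")
    return labels


def merge_segments(labels, window=1000):
    """Merge runs of adjacent gain or loss windows into (start, end, label)."""
    segments = []
    current, start = "normal", 0
    # A trailing "normal" sentinel closes any segment still open at the end.
    for i, label in enumerate(labels + ["normal"]):
        if label != current:
            if current != "normal":
                segments.append((start * window, i * window, current))
            current, start = label, i
    return segments


# Toy chromosome of 5 kb: per-window depths of 10, 20, 20, 10 and 2 reads.
reads = [50] * 10 + [1500] * 20 + [2500] * 20 + [3500] * 10 + [4500] * 2
depths = window_depths(reads, chrom_length=5000)
print(merge_segments(label_windows(depths)))
```

On the toy data the middle windows carry twice the median depth and the last window a fifth of it, so the sketch reports one merged gain and one loss. Summing the lengths of such segments is how a total like the article's 16 megabases would be tallied.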
David vs Goliaths
Since going public in 2007, Helicos has struggled to overcome technical difficulties and problems convincing the market to accept a machine priced over $1 million in the face of more established, more affordable competition from Illumina, Applied Biosystems (Life Technologies) and Roche/454. Lay-offs, a tumbling stock price, and cash concerns hardly bode well. Nor did the return last year, by contract research organization Expression Analysis, of the first instrument Helicos had shipped to a customer.
However, fortunes appear to be turning around of late. After Ron Lowy took over as CEO (his predecessor, Stan Lapidus, remains on the board as chairman), the instrument cost was lowered below the $1-million mark, and the company recently announced its first HeliScope sale to a biotech company. Quake’s genome, following the recent publication of an African genome on the Life Technologies SOLiD platform, makes the HeliScope the fourth next-gen platform to sequence a human genome.
Quake characterizes the sequencing market as a “David vs Goliath” battle. “There are four commercial platforms out there right now, and three of them are billion-dollar companies. The fourth is Helicos, which is a scrappy little bunch -- they’re trying to hang on! I think they’re fantastic, and I’m hoping they’re going to end up at the top of the heap.”
Helicos chief science officer Patrice Milos says the company is continuing to focus on improvements in reagent chemistry and patterned surfaces, “both of which will allow us to further improve the performance of the instrument, read lengths, and strand aligned yield.”
Lombardi stresses this is still the “first generation of the technology.” With improvements in strand length and alignment yields, as well as greater strand density on the flow cells, Lombardi says: “We think we could easily move from where we are today – which is 20-25 Gigabases/run -- well into the several hundreds of Gigabases/run, with just chemistry. No hardware changes at all.”
The Quake sequence data were generated earlier this year, and Quake says, “We already have three more genomes in the can related to leukemia and cancer. We’re neck deep trying to analyze those and understand what they mean.”
Although not discussed in the Nature Biotechnology paper, Quake and his colleagues have been scouring his DNA sequence and trying to draw some preliminary conclusions about health and genetic traits. “Some of the doctors are starting to poke and prod me to see how they can couple my genome with medicine,” he said.
Among the early discoveries is a rare mutation associated with a heart disorder, for which there may be some family history. “If you know your uncle had something, you kind of discount that you can get it, but to see you’ve inherited the mutation for that is another matter altogether,” he said.
Quake also carries a variant in the CLOCK circadian rhythm gene tentatively associated with increased disagreeability. “You don’t need my genome to tell you that,” Quake quipped. “My wife could have told you that and certainly the dean could have as well.”