A Frightening Computational Problem

Oct. 10, 2007 | Tim Harris, Helicos’ research director, is a veteran of the single-molecule sequencing world. The Bell Labs veteran joined SEQ — the first commercial single-molecule biotech firm, founded by Kevin Ulmer in 1987 — as director of research in September 1996. During a recent talk at Harvard University, Harris asked rhetorically, “So why did this [Helicos technology] work? People have been trying to do this since 1987... When Stan Lapidus called me up, and said ‘Tim, I got just the job for you, when can you come to Boston?’ I said, ‘Stan, my friends and I have been failing at this for 15 years, are you sure?’ His answer was, ‘Yeah, I think it’ll work this time.’”

Harris describes the recipe for sequencing 90 million bases per hour as follows: first, shear genomic DNA with DNases into fragments of about 100 base pairs. Melt the strands and add terminal transferase to produce polyA tails, with a fluorescent tag at the end. Meanwhile, randomly immobilize a forest of polyT primers on a glass surface, about 1 per square micron. Then swish in the sample to hybridize to the slide. The “off rate” at 37 degrees C is about 1 week. “I can do a lot of sequencing [in that time],” says Harris.

According to Harris, there is 1 molecule of DNA per square micron, or one million per square millimeter. (Each spot is localized to within a mere 15 nanometers.) The camera takes a picture, the dye is removed, the next base is added, and the cycle repeats 100-200 times. The median read length is 25 bases, which takes two days to produce; 20 percent of DNA strands grow longer than 30 bases. But even under the best conditions, the platform still fails to detect incorporation 1-2 percent of the time, a problem that is actively being addressed.
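The density figure is a straightforward unit conversion, since a square millimeter holds 1,000 x 1,000 square microns. A minimal Python sketch of the arithmetic follows; the function names and the 25 square millimeter example area are illustrative assumptions, not Helicos specifications.

```python
MICRONS_PER_MM = 1_000  # 1 mm = 1,000 microns


def templates_per_mm2(templates_per_um2: float) -> float:
    """Convert a surface density from per square micron to per square millimeter."""
    return templates_per_um2 * MICRONS_PER_MM ** 2


def templates_on_area(templates_per_um2: float, area_mm2: float) -> float:
    """Total templates on a surface of the given area (the area is a hypothetical input)."""
    return templates_per_mm2(templates_per_um2) * area_mm2


if __name__ == "__main__":
    print(templates_per_mm2(1.0))  # 1000000.0
    # Hypothetical 25 mm^2 imaging area, chosen only for illustration.
    print(templates_on_area(1.0, 25.0))
```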

One of the barriers to getting single-molecule sequencing to work has been preparing a surface that can handle micromolar concentrations of dyes and then be rinsed so that only 10-20 dye molecules remain per 1000 DNA templates. Harris hired Mirna Jarosz (“a real smart lady”), who solved this problem of background fluorescence left behind by incomplete rinsing. Harris notes two other major keys to the success of the Helicos platform: an active nucleotide analog/polymerase combo that will incorporate bases, and reagents that are better purified than those commercially available.

Helicos has worked hard to solve the homopolymer problem. The commercial solution is to use kinetically engineered nucleotide analogues that have “no run through.” Helicos is also exploring two-pass sequencing, in which a new primer would be added to the other end of the strand after the first run, allowing the strand to be sequenced in the reverse direction. Because incorporation errors are random, Harris says, the same error is very unlikely to happen twice.
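As a rough illustration of why a repeated error is unlikely: if two passes each err at a given base independently with probability p, both err at that base with probability p squared. The sketch below assumes independence between passes and borrows the 1-2 percent missed-incorporation figure quoted earlier purely as an example rate; neither assumption is a Helicos specification.

```python
def double_error_probability(per_pass_error_rate: float) -> float:
    """Probability that two independent passes both err at the same base."""
    if not 0.0 <= per_pass_error_rate <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    return per_pass_error_rate * per_pass_error_rate


if __name__ == "__main__":
    # Example rates only: the 1-2 percent figure is reused here as an assumption.
    for p in (0.01, 0.02):
        print(f"per-pass {p:.0%} -> both passes {double_error_probability(p):.4%}")
```

Under these assumptions, a 1 percent per-pass rate falls to about 0.01 percent for a coincident error, and 2 percent falls to about 0.04 percent.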

Align and Repeat
Sequence alignment is “easy if the genome is small, and requires real creativity if the genome is big,” says Harris. “You take 7 or 9-mer pieces of your read, you go shooting through the genome, and ask, does the rest of the sequence look like it fits there? It’s a substantial computational problem.”
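The approach Harris describes resembles what is commonly called seed-and-extend alignment: look up short k-mers from the read in an index of the genome, then check whether the rest of the read fits at each candidate location. The Python sketch below is a toy version under stated assumptions; the function names, the mismatch threshold, and the example sequence are all hypothetical, and it is not Helicos' actual software.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict:
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def align_read(read: str, genome: str, index: dict, k: int,
               max_mismatches: int = 2) -> list:
    """Return sorted (start, mismatches) pairs where the read fits the genome.

    Each k-mer of the read is used as a seed; every seed hit proposes a
    start position, which is then verified by comparing the full read.
    """
    hits = set()
    checked = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset
            if start in checked:
                continue
            checked.add(start)
            if start < 0 or start + len(read) > len(genome):
                continue  # read would hang off the end of the genome
            window = genome[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up 40-base "genome" and a 20-base read with one substituted base.
    genome = "ACGTACGTTTGACCAGTAGGCATCGATCGGATTACAGGCT"
    read = "GACCAGTAGGTATCGATCGG"
    index = build_kmer_index(genome, k=7)
    print(align_read(read, genome, index, k=7))
```

The index makes each seed lookup cheap, but its size grows with the genome, which is one reason the problem becomes harder as genomes get larger.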

Harris calls the informatics infrastructure “pretty frightening.” The HeliScope will generate 20 terabytes of image data per day. “There’s not a plan to save it, which scares me, I have to say. I’ve never thrown away my data in real time,” Harris laughs nervously. “It generates 1 TB of actually analyzed data (read strands). It’s a bit of a frightening computational problem.” -- K.D.
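To put those volumes in perspective, the quoted figures can be converted to a sustained data rate. The sketch below assumes decimal terabytes (10^12 bytes) and assumes the 1 TB of analyzed data is also a per-day figure, which the quote does not state explicitly; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10 ** 12  # decimal terabyte (assumption)
BYTES_PER_MB = 10 ** 6


def sustained_rate_mb_per_s(tb_per_day: float) -> float:
    """Average rate in MB/s needed to keep up with a daily data volume."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB


def reduction_ratio(raw_tb: float, analyzed_tb: float) -> float:
    """How many times smaller the analyzed data is than the raw images."""
    return raw_tb / analyzed_tb


if __name__ == "__main__":
    print(round(sustained_rate_mb_per_s(20), 1))  # 231.5
    # Assumes the 1 TB of analyzed data is produced over the same day.
    print(reduction_ratio(20, 1))  # 20.0
```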


