By Kevin Davies
February 26, 2010 | MARCO ISLAND, FL—Pacific Biosciences will introduce what it describes as the first “third-generation” DNA sequencing instrument at the Advances in Genome Biology and Technology (AGBT) meeting later today. “It’s really the world’s most powerful real time, single-molecule microscope,” said PacBio CEO and chairman Hugh Martin.
During a preview of the highly anticipated instrument for the press, Martin hailed the machine as “a quantum leap” for the field. “We sold the ten beta units just like that—fully paid for!” PacBio intends to ship those first ten instruments to customers, all in North America, by this June.
The first thing one notices about the $695,000 machine is that, for an instrument sequencing single molecules of DNA, it’s awfully big. The floor-standing machine weighs about 1900 pounds and is 6 1/2 feet wide and 29 inches deep. The instrument is accompanied by a separate blade center for the real-time data processing, which sits apart from the main instrument.
“If you look at what’s inside there, it’s packed,” says Martin by way of justification. Much of the instrument’s girth is taken up by the robotic sample mixing and staging system. The base is packed with four high-speed cameras, optics, and a carbon-fiber stage good to +/- 5 nm in six dimensions. Martin says the PacBio machine’s spec sheet states it can be installed on any floor, but admits: “You need a pretty heavy duty floor.”
Talking about 3rd Generation
Martin argues that 2nd-generation technology is flattening out, despite the healthy competition between Life Technologies and Illumina. “454 Life Sciences is not moving very much relative to longer read applications. Helicos, I think, has become sort of irrelevant. Complete [Genomics]—it’ll be interesting to see what happens to their cost model, because I think a lot of people in the world are going to be enabled to be service providers using [Life and Illumina] boxes to compete with Complete.”
As for other companies preparing to launch new detection systems, such as Ion Torrent Systems and Avantome (Illumina), Martin argued they are still 2nd-generation technologies that involve pausing the DNA polymerase and doing some sort of an inspection. “You can speed that up a bit, you can make it inexpensive in the hardware and consumables, but you’re still on that performance curve of 2nd gen. What the world really needs is to move to a whole new curve… Customers definitely want longer read lengths and have huge issues with 8-10 day run times.”
So what exactly is his definition of “third-generation sequencing”?
Surprisingly, perhaps, Martin did not cite some profound conceptual differentiator such as the real-time hallmark of the platform. For Martin, his platform is “everything that 2nd gen is—throughput, cost per base, etc.—with the addition of very long read lengths, extremely low reagent or consumable cost and very fast run times. Those three.”
Of course, PacBio can expect some company soon: Martin acknowledged that systems under development at Oxford Nanopore, Life Technologies (StarLite), and perhaps Halcyon check the same three boxes as PacBio. In his view, however, Complete Genomics does not, as it employs short reads and longer run times. “It’s essentially a SOLiD system on steroids,” says Martin.
For the past six months, PacBio has used a dozen prototype systems for internal development and external customers, working with researchers at Stanford, The Genome Center (Washington University) and elsewhere. Martin promises a range of applications that go beyond DNA sequencing, including the study of transcription (RNA) and translation (ribosomes). And there is scalability with the promise of increased polymerase performance, yield, and multiplex time.
Welcome to the Machine
The PacBio machine is a marriage of advances in semiconductor processing, enzymology, surface chemistry, synthetic chemistry, hardware, parallel processing, optical and camera design, bioinformatics and software, not to mention product design that produces an undeniably handsome machine—an impressive achievement in just two years, even allowing for the $250 million raised and more than 300 employees.
According to PacBio’s Geoff Otto, the DNA sample prep takes place off the instrument in only 5-6 hours, requiring just 500 nanograms of starting material. The starting DNA is sheared into double-stranded linear fragments ranging in size from 200 bp to 10 kb, then attached to the SMRT adapters, which produce a topologically closed circle, enabling consensus sequencing of the same template if desired. For longer read lengths of several kilobases (kb), it is likely the sample would be read linearly. The DNA template is complexed with the polymerase, a stable assembly that does not require immediate processing, before loading onto the machine. (The method by which the DNA polymerase is fixed to the bottom of each well was not disclosed.)
The front of the machine contains two drawers for sample loading, which open with a satisfying whirring noise. One is for DNA and reagents, the other for up to 96 SMRT (single molecule real time) cells. Each SMRT cell houses 80,000 zero-mode waveguides (ZMWs), about one third of which are used in a given experiment. The SMRT cells are lined into strips of eight (dubbed ‘8-packs’); 12 of these strips can be loaded at a time, in a 96-well format. Each SMRT cell is individually sealed, so an instrument run could involve just a single SMRT cell, returning to the other cells in the 8-pack strip later.
The run time for an individual SMRT cell is about 15 minutes, depending on the desired read length. In the current version, the polymerase runs at about 1-3 nucleotides/second. “As soon as you start collecting data, you start processing. You can make iterations or changes in real time. If there’s a change in the protocol, you can make that happen in real time,” says Otto. Going from sample to instrument run to retrieving data takes less than a day.
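As a rough sanity check on these figures, the sketch below multiplies the quoted polymerase speed by the quoted run time. It assumes, purely for illustration, that the enzyme runs continuously at a constant rate for the full 15 minutes; the function name is hypothetical and is not part of any PacBio software.

```python
def bases_per_run(rate_nt_per_s: float, run_minutes: float) -> int:
    """Bases read by one polymerase, assuming a constant rate for the whole run."""
    return round(rate_nt_per_s * run_minutes * 60)


# Quoted range: about 1-3 nucleotides/second over a run of about 15 minutes.
low = bases_per_run(1, 15)
high = bases_per_run(3, 15)
print(f"{low} to {high} bases per polymerase per run")
```

Under that simplifying assumption, a single polymerase would cover roughly 900 to 2,700 bases in one 15-minute run.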
In addition to linear and consensus sequencing, PacBio offers a strobe method, which would be used for longer read lengths of 3-10 kb, producing multiple shorter reads interspersed with ‘dark’ segments to preserve the enzyme. “If you have [a] 10-kb fragment, you can target a 3-kb span, but then change the instrument specs without having to change the prep,” says Otto. In other words, a combination of all three modes can be programmed to run on different SMRT cells in the same run.
The strobe mode overcomes photophysical damage to the polymerase inflicted by excited fluorophores that occasionally “go into a bad state,” as Martin puts it. PacBio hopes to minimize such damage going forward by limiting oxygen exposure, introducing protective additives, and mapping and re-engineering the surface residues on the polymerase that are most prone to damage. Martin’s goal is to create “a sunburn-proof enzyme” that, in 2-3 years, will be able to produce read lengths of 40,000 bases, and more than 100,000 bases under strobe conditions.
Using a remote workstation running Windows, a user can design a run, monitor the instrument, and view completed results. The maximum run time for the machine initially is 12 hours, although that will increase.
Product manager Dana Underwood demonstrated a typical sample run. On the touch screen, from a list of plates created on a remote interface, he instructs the instrument what to run. In one drawer, he loaded two reagent plates, the mixing plate, and a sample plate. In the other drawer, he loaded the final 8-pack of SMRT cells. The instrument checks for any missing or misaligned plates, and if everything is in order, the “Start” button becomes active.
After the first SMRT cell is extracted from the 8-pack and moved to the prep station, one of two pipettors deposits reagents in the SMRT cell and sequencing begins. The run time remaining ticks down on the front display, along with a hypnotic portion of the multi-colored ZMWs flashing in real time. When the run is complete, a gripper removes the old SMRT cell, and the next one is swapped in.
Dwarfed by the instrument itself is the blade center, which handles the robotics and real-time data processing. Edwin Hauw, senior product manager, said PacBio’s software package “covers everything in the sequencing workflow from run design to instrument loading, run monitoring, primary and secondary analysis.”
The blade center, unusually slim (26 inches wide) and mounted from the top down to minimize the footprint, handles the instrument robotics as well as real-time data collection and signal processing. The system uses a hardware accelerator for the movie-to-trace step, which is performed in real time, but PacBio has not yet decided whether to settle on a GPU (graphics processing unit) or an FPGA (field-programmable gate array). There are four blades, each with dual Intel Nehalem quad-core processors, with a total of 192 GB of RAM and 12 TB of storage. One blade handles the robot; the other three handle data analysis, movie-to-trace, and trace-to-pulse-to-base calling. There is sufficient storage for 24 hours’ worth of base calls and quality values.
Real Time Sequencing
Martin did not preview trace data or discuss error rates in the machine preview. However, he said that at full release later this year the average read length will be 1000-1250 bases, fractionally longer than 454 or Sanger sequencing, with 5% of reads between 3 and 5 kb. For a targeted sequencing experiment, “you’ll get 5% of 30% [the Poisson limit] of 80,000 [ZMWs], so you get 1,000 reads in the 3-5 kb range for $99.” Despite the lower throughput compared to the high-end second-generation machines, Martin pointed to an advantage in flexibility, for example allowing diagnostic samples to be run without having to wait until there are enough samples to make a run on a 2nd-gen box worthwhile.
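Martin’s back-of-envelope figure can be reproduced in a few lines. The sketch below is illustrative only: the function names are hypothetical, and it assumes that polymerases land in wells independently, so that the fraction of wells holding exactly one enzyme follows a Poisson distribution. Under that textbook model the best achievable single-occupancy fraction is 1/e, or about 37%, so the 30% Martin quotes sits a little below the theoretical ceiling.

```python
import math


def single_occupancy(mean_per_well: float) -> float:
    """Poisson probability that a well holds exactly one polymerase."""
    return mean_per_well * math.exp(-mean_per_well)


def long_reads_per_cell(zmws: int, usable_fraction: float, long_fraction: float) -> int:
    """Expected number of long reads from one SMRT cell."""
    return round(zmws * usable_fraction * long_fraction)


# Theoretical ceiling: single occupancy peaks at a mean of one enzyme per well.
ceiling = single_occupancy(1.0)

# Figures quoted by Martin: 80,000 ZMWs, 30% usable, 5% of reads in the 3-5 kb range.
reads = long_reads_per_cell(80_000, 0.30, 0.05)

print(f"single-occupancy ceiling: {ceiling:.3f}")
print(f"expected 3-5 kb reads per cell: {reads}")
```

Taken literally, 5% of 30% of 80,000 comes to 1,200 reads, the same order as the 1,000 Martin cites.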
“We’re the first mover. There’s no one else out there in 3rd gen.” Martin noted that it has taken PacBio two years to progress from presenting a 50-base sequence run (at AGBT 2008) to unveiling an instrument. “Oxford [Nanopore] is a long way from showing you a 50-base trace,” he said. “Life [Technologies] may make some noises, but I don’t think they have one yet. So we have got at least 2-3 years of free running room, which is very exciting.”
PacBio is targeting customer shipments in the second half of 2010. For the most part, the customers “want to add end value to Illumina reads,” Martin said, “or they’re unplugging 454 or [Sanger sequencers], i.e. longer-read lengths. I think it’ll be additive to the market.”
Martin says he could have sold 30 instruments to early access clients, and admits there are some unhappy people not included in that first tranche, notably the Wellcome Trust Sanger Institute in the UK, and the Beijing Genomics Institute in China. “It’s difficult for us in a beta environment,” he says. “We want a lot of feedback from our customers. So we’re not doing ‘rest of world.’ There are some major centers not getting machines. They will. We’d planned on going rest-of-world in 2011. We’ve changed that: we’re going rest-of-world in 2010.”
Even with future upgrades, Martin says the current machine will not be the one that delivers the ‘15-minute genome,’ as PacBio founder Stephen Turner claimed two years ago. Although the number of ZMWs on a SMRT cell will be doubled to 160,000 over time, PacBio will need 1 million to get the genome. “It’s probably [capable of delivering] a 2- or 3-hour genome.” The V2 instrument will reach the 15-minute target, but it is not now scheduled for release until 2014.
Martin hinted strongly that he is positioning PacBio for a public offering, possibly in 2010. “We wanted to build a really big company, a publicly independent company. Everybody here probably came from one of the companies that might [hypothetically] acquire us, but we all left for a reason and none of us want to go back!”