Chad Nusbaum, Rob Nicol, and Toby Bloom share the secrets of the Broad Institute’s long-term sequencing success.
By Kevin Davies
September 28, 2010 | During a recent lecture, author/investor Juan Enriquez claimed that the center of the universe in Boston is not, as some might suggest, the pitcher’s mound at Fenway Park, but actually just across the Charles River, at the heart of Kendall Square, Cambridge. This is the site of world-renowned research institutes and departments bearing the names of Whitehead, Picower, McGovern, Koch, Stata and Broad, not to mention the likes of Novartis, Google and Microsoft eager to rub shoulders with potential academic collaborators and recruits.
For all the award-winning architectural splendor of the Eli and Edythe Broad Institute, the DNA sequencing prowess for which it is best known is actually housed a few blocks to the north, in an unassuming low-rise building next to a Checker taxi stand.
Chad Nusbaum joined what was then the Whitehead Institute/MIT Center for Genome Research in 1996. As co-director of the genome sequencing and analysis program, Nusbaum has seen it all, and had a major say in recruiting Rob Nicol, the Broad’s director of sequencing operations and technology development.
Nusbaum originally trained as a developmental biologist, but says his job is mostly to be an amateur engineer. “I could tell you I work on malaria, TB, cancer, human variation, human evolution, fungal evolution. All of that is true... but [that’s] not the major part of my job.” Every Monday morning, Nusbaum meets with a handful of colleagues to set sequencing priorities. Asked if there’s enough demand to keep the fleet fully occupied, Nusbaum laughs. “Oh, heavens yes! It’s very rare that a sequencer goes hungry.”
The Broad is currently replacing its 94 Illumina GA II sequencers with 51 HiSeq 2000s. Much of the sequencing fleet is devoted to research in the 1000 Genomes Project, The Cancer Genome Atlas and other cancer projects, either exome or whole genome shotgun sequencing. There are follow-ups for GWAS (genome-wide association studies), and projects in the microbiome and viral sequencing, including HIV, Dengue, and West Nile.
On a recent tour of the Broad sequencing floor, Nicol pointed out a row of HiSeqs under evaluation. The cute fluorescent strips on the front of each box lend a certain “Knight Rider” vibe, he said. These machines are already generating more than the promised 200 Gigabases of sequence per run, but they won’t be officially placed into production until Nicol gives the OK.
Although the Broad routinely tests every legitimate next-generation sequencing (NGS) platform that comes along, Nusbaum says it doesn’t make sense to run a “split shop.” It was that way 15 years ago, when the Applied Biosystems 3700 made its debut. “As soon as we decided we liked the 3700, the slab gels were gone in a month,” recalls Nusbaum. “When we passed to the 3730, the 3700s were gone. We really moved the 3730s out pretty fast as we moved to Illumina.” The Broad still employs 454 Life Sciences instruments for viral sequencing and some niche applications.
Rob Nicol came to the Broad in 2001 with an unusual resume. He previously spent years building refineries and power plants, and had just finished a program at MIT focused on leadership in manufacturing. At the time, Nusbaum was looking for someone who knew how to run a production operation. But there was a catch: “We knew everyone in the sequencing field and we realized we knew as much about sequencing as anyone else.”
Nicol admitted he didn’t know anything about sequencing per se, but as he started discussing the lessons to be learned from Japan’s approach to process, operations and manufacturing, Nusbaum was sold. “Sequencing just ain’t that hard. I can do it! What’s hard is doing process really well,” he says. Nusbaum credits Nicol with changing the face of genome sequencing by professionalizing the factories. “We started learning from manufacturing experts like Toyota and Boeing, people who know how to run a factory.”
Nicol has designed the organization’s processes to be as adaptable as possible. “We try to keep our eye on emerging technologies through testing and collaborations but also design existing processes for flexibility where possible.” Nicol takes the ability to stay nimble very seriously.
Nicol’s group oversees the implementation of any new technology into production, including the sequencers themselves and the upstream sample preparation, which he says is increasingly “where the action is.” Nicol’s team systematically conducts a “failure mode and effects analysis” for any new technology. Whenever a problem or “failure mode” is observed, Nicol says, “we’ll catalogue it, describe what we think caused it, try to reproduce it and understand the variables that caused it.” The issue is either reported to the vendor or built into the Broad’s knowledgebase and eliminated. The goal is to continuously improve performance, maximizing each instrument’s “up time.” It gets progressively harder to keep improving performance, but by then, “some new generation of instrument has probably come in.”
One of Nicol’s keys to success was merging the technology development group and the production group into a homogeneous outfit whose members are constantly looking to make their jobs easier and more effective. Long gone are the days of having a separate R&D team develop protocols, test instruments, then hand off to production staff. The Broad’s production scientists “understand how their process works and want to make it better. Stuff comes from the ground up as much as it comes from formal R&D,” says Nusbaum. A sabbatical system enables them to cycle onto other projects from time to time. “It’s a philosophy that runs through the whole organization of process improvement,” says Nusbaum.
Nicol’s team maximizes instrument up time by observing and reporting any minor glitch or problem during the sample prep and machine runs. Those observations are incorporated into the Broad’s standard procedures, perhaps cleaning a stage or refocusing the camera if the image has a certain pattern. Capturing and then disseminating that information is critical, says Nicol. “There’s a vital connection between the person paying attention but also thinking with an R&D mode.”
If anything, Nicol says his team has improved upon his original process ideas. “Initially, you could almost compare large-scale sequencing to a Henry Ford assembly line. The reason it’s not done that way anymore is you give up a lot of flexibility.” Sequencing has a “mind boggling” cycle time that is faster than semiconductor development or any other industrial process.
Nicol seems quite pleased when I remark how quiet, almost peaceful, the factory floor is. A meticulous level of organization is evident in the copious amounts of masking tape on the floors and benches, marking out the set positions for every instrument and reagent. Bench spaces are immaculately organized and uncluttered, enhancing productivity and minimizing mistakes.
The Broad enjoys a positive two-way relationship with Illumina, but it’s not a special relationship, notes Nicol—it extends to other vendors as well. Both user and vendor benefit from the early sharing of hardware, software, and information. The dialog with Illumina began in the early days of Solexa (see page 52). Before the British company had anything, Nusbaum recalls, he was pleading with the firm to “show us the data.” Nusbaum credits Solexa’s Clive Brown for eventually agreeing to share results and recognizing the benefit of letting Broad scientists such as David Jaffe analyze proprietary data.
“When Illumina acquired Solexa, we hoped that the Solexa corporate culture [would be] maintained in the sequencing—and it largely has been,” says Nusbaum. The Broad has organized an occasional training course with Illumina to help disseminate best practices. “It’s been very well received. We could give it every week and be sold out,” says Nusbaum.
The Broad is inevitably the first destination for any new sequencing prototype. It already has four Ion Torrent sequencers to play with, months ahead of commercial launch, and is preparing for the first Pacific Biosciences sequencer. “We purposely want to get very early-stage machines,” says Nicol. “Most of the machines we get are the first ones built, ever. That’s by design.” “As soon as a new platform is available, we need to understand it [and] to help guide where it’s going,” says Nusbaum. “It’s important to do as much of a mind meld with the vendors, so they understand what we’re thinking.” Consequently, the vendors benefit because, Nicol asserts, “The ultimate instrument that goes out into the community is that much better for it.”
“The HiSeqs didn’t work perfectly on Day 1. Every machine that comes in the building doesn’t work on Day 1,” says Nusbaum. “You plug it in, test it out, mess around with it, call the service engineer.”
Just because the Broad has relied on Illumina technology—and will continue to do so for some time—Nusbaum says it’s always possible that there will come a day, or a cost point, when it is time “to change horses,” no matter how disruptive. “If a technology showed us now that, in six months, it has a good chance of being way ahead of Illumina, we’ll be watching it. Of course, the cost of switching the actual hardware out and amortization has to be factored in, but if it still beats it, then the dollars say you got to go.”
For anyone starting a new NGS operation, Nusbaum says he “would test them head-to-head and run the numbers.” The right choice comes down to instrument cost (of course), but also whether the machines are bought or leased, the reagent contract, maintenance, labor and support costs, and the scientific goals. And that’s before the informatics considerations.
Head of informatics Toby Bloom faces many of the same pressures to adapt quickly to new platforms and applications (see “Bioinformatics in Full Bloom”). “She needs to synchronize with the downstream algorithms, feeding the massive amounts of data we’re producing through software pipelines that are changing just as rapidly and have tremendous complexity,” says Nicol. “Plus she also has the additional dimension of coordinating the necessary IT.” One concern is that sequencing output is accelerating faster than storage capacity.
“The cost of storage is coming down very slowly compared to sequencing costs,” says Nusbaum. “It’s not very hard to foresee a time when storage is half the total cost of sequencing.” Nicol offers another idea: why not store the data as DNA and just resequence it? It’s not as crazy as it sounds. “It’s been a couple of years since we saved the primary (raw image) data,” says Nusbaum. “It’s cheaper to redo the sequence and pull it out of the freezer. There are 5,000 tubes in a freezer. Storing a tube isn’t very expensive. Storing 1 terabyte of data that comes out of that tube costs half as much as the freezer!”
Nusbaum says scientists including Ewan Birney (European Bioinformatics Institute) are working on elaborate algorithms for storing data, “because you can’t compress bases any more than nature already has. The new paradigm is, the bases are here, only indicate the places where the bases are different... In 2-3 years, you’ll wonder about even storing the bases. And forget about quality scores.”
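The “only indicate the places where the bases are different” idea can be illustrated with a minimal sketch. This is not the algorithm Birney or the Broad uses; it assumes a sample and a reference of equal length that differ only by single-base substitutions, and the function names and toy sequences are hypothetical.

```python
# Illustrative sketch of reference-based storage: keep only the positions
# where a sample differs from a shared reference sequence.
# Assumption: substitutions only (no insertions or deletions), so both
# sequences have the same length. All names and data here are hypothetical.

def encode_against_reference(reference, sample):
    """Return a list of (position, base) pairs where sample differs."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only; lengths must match")
    return [
        (pos, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def decode_against_reference(reference, diffs):
    """Rebuild the sample by applying the stored differences to the reference."""
    bases = list(reference)
    for pos, base in diffs:
        bases[pos] = base
    return "".join(bases)


reference = "ACGTACGTACGT"
sample = "ACGTACCTACGA"

diffs = encode_against_reference(reference, sample)
restored = decode_against_reference(reference, diffs)
print(diffs)               # two differences instead of twelve bases
print(restored == sample)  # the sample is fully recoverable
```

The closer a sample is to the reference, the shorter the list of differences, which is why this approach suits resequenced human genomes far better than generic compression of the raw bases.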
Who’s on Third?
Nusbaum scoffs at the term “3rd-generation sequencing.” A veteran of 1st-generation Maxam and Gilbert sequencing, Nusbaum reckons we must be approaching the seventh or eighth generation. Ignoring the inevitable hype, Nusbaum says the emergence of any new technology “doesn’t mean they’re going to come in and take over the world in a week. But it does mean they create a platform on which there may be real game-breaking opportunities.”
Take the much-anticipated PacBio single-molecule sequencer, which claims to offer experimental runs of 20 minutes and read lengths up to 1,000 bases. “Those two things right there, there’s nothing else that can do that in a remotely affordable way. If you do nothing else with this machine, you can get your answer over lunch!”
The Broad’s perch as the largest genome center in the world is getting crowded, as BGI fills its Hong Kong facility with more than twice as many HiSeqs as the Broad (see p. 44). Nusbaum, however, says the Broad and BGI enjoy a friendly (if slightly competitive) relationship. “We’re building ongoing collaborations with them. Ideally we want them to be a sister center with us,” he says. “There’s so much sequencing in the world that needs to be done, right now, I don’t see any need to compete with them.”
While Nusbaum concedes the emergence of BGI “upsets the balance of power,” he thinks the added sequencing capacity is a positive trend. Of course, the spread of sequencing democracy in countless small labs also tilts the balance of power, perhaps even more disruptively.
Whatever the generation, Nusbaum hopes that the newest technologies won’t just provide “better, cheaper, faster” sequencing, but “will give us more power to answer the biological questions we want to ask.” It’s not enough to sequence human genomes affordably. “I’d like to be able to sequence human genomes really, really well, with structural accuracy—so you can really understand polymorphism,” he says. Of course, adds Nicol, “None of that is going to matter without the upstream sample prep.”
This article also appeared in the September-October 2010 issue of Bio-IT World Magazine.