April 1, 2008 | The Broad Institute of Harvard and MIT is running 20 Illumina Genome Analyzers, three 454 GS FLX instruments in production, and three ABI SOLiDs, according to Toby Bloom. She manages the informatics pipelines for the Broad's sequencing platforms - old and new - for applications ranging from medical re-sequencing to epigenomics to pathogen genome sequencing.
Bloom says most next-gen vendors provide "fairly sophisticated pieces of software," much of which the Broad staff uses, including image processing, while also recommending improvements with certain vendors, for example on quality scoring. "We may come up with our own algorithms and feed that back to the vendors," she says. "Of course, for assemblies, alignments, mutation calling, we're looking at our own software as well."
Despite the institute's considerable resources, Bloom's team has recently made sweeping changes to its data pipeline. "On the data management side, the old pipeline dealt with one read at a time," says Bloom. "Now, we deal with plate by plate or region by region or lane by lane. The data aren't stored in individual files but in batches." Another issue, she adds, is scale: "You're dealing with large numbers of small reads, not small numbers of large reads."
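The shift Bloom describes, from one file per read to one batch per lane, can be pictured with a small sketch. This is an illustration only, not the Broad's pipeline: the record layout, the function names `batch_reads_by_lane` and `format_batch`, and the FASTA-style output are all assumptions made for the example.

```python
from collections import defaultdict


def batch_reads_by_lane(reads):
    """Group (lane, read_id, sequence) records into one list per lane."""
    batches = defaultdict(list)
    for lane, read_id, sequence in reads:
        batches[lane].append((read_id, sequence))
    return dict(batches)


def format_batch(records):
    """Render one lane's reads as a single FASTA-style text block."""
    return "".join(f">{read_id}\n{sequence}\n" for read_id, sequence in records)


# Hypothetical short reads from two lanes of one run.
reads = [
    (1, "r1", "ACGTACGT"),
    (2, "r2", "TTGACCGA"),
    (1, "r3", "GGCATCAT"),
]
batches = batch_reads_by_lane(reads)
lane_1_text = format_batch(batches[1])
```

Storing each lane as one unit keeps the file count proportional to the number of lanes rather than the number of reads, which matters when a run produces very many short reads.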
Bloom says the core LIMS includes "added information about the new steps to help the lab track what their orders are. It's very different managing the lab to do large numbers of small projects. A mammalian genome would take several months to go through the lab using older technology... They now need more support for keeping track of everything."
To handle the storage demand, the Broad has 300 TB of Isilon high-speed parallel access storage, with more on the way. "We do a bunch of our work on SunFire 4500s, or Thumpers," says Bloom. These are reasonably inexpensive file servers with 15-20 usable TB each. "We actually use them to pull the images off the machines as they're being generated, so we don't have to stop the sequencers to do any processing on them between runs."
Bloom says the SunFires have "enough processing capability that we can do cycle by cycle processing." Once the image data are processed, the results are fed into the Isilon storage and core compute facility. Bloom says the images are stored "in case we need to go back to them, for a month or two. We leave them behind on the Thumpers - they never go anywhere else."
But even the Broad Institute can't store image files forever. "I don't think it's particularly useful; it's rare we'd ever go back to them," says Bloom. "What we do store forever is a sampling of the images on each run." Archiving a few images from each cycle enables troubleshooting of potential machine problems. --K.D.
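As a rough illustration of keeping "a sampling of the images on each run," the sketch below picks a few images from every cycle to archive. It is a guess at the general idea, not the Broad's method: the function name `sample_images_per_cycle`, the file names, the number kept per cycle, and the use of random selection are all assumptions.

```python
import random


def sample_images_per_cycle(images_by_cycle, keep_per_cycle=2, seed=0):
    """Choose up to keep_per_cycle image names from each cycle to archive."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    kept = {}
    for cycle, images in images_by_cycle.items():
        ordered = sorted(images)
        count = min(keep_per_cycle, len(ordered))
        kept[cycle] = sorted(rng.sample(ordered, count))
    return kept


# Hypothetical image file names for two cycles of one run.
run_images = {
    1: ["c1_t1.tif", "c1_t2.tif", "c1_t3.tif"],
    2: ["c2_t1.tif"],
}
archived = sample_images_per_cycle(run_images)
```

Everything not selected would be left on the file servers and allowed to expire after the month or two Bloom mentions.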
This article appeared in Bio-IT World Magazine.