June 8, 2011 | More than 2,000 attendees poured through the doors of Boston’s World Trade Center in April. As usual, a conference highlight was the “trends from the trenches” review by Chris Dagdigian (BioTeam).
“Next-generation sequencing (NGS) is still causing a lot of pain in data handling and storage. As sequencing continues to be commoditized, this will only get worse,” said Dagdigian, who noted the risk of friction between scientists and IT managers over the disposition of assets. Scientists might demand too much compute power for a given rack, but Dagdigian said they were “entirely justified [in their compute demands]! They need them.” When it comes to storage, “The sky is not falling,” he said. “Storage does not scare me too much in 2011.” That said, most users don’t operate at Broad Institute levels (200 TB/week of Tier 1 storage). “Petascale storage is not scary. It’s just an engineering and budget exercise. Six vendors on the [Bio-IT World Expo] floor will sell you petascale storage for less than $1 million.”
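Dagdigian's figures can be put in perspective with some back-of-envelope arithmetic. The sketch below is illustrative (not from the talk) and assumes a decimal petabyte and the $1 million price point he cited:

```python
# Rough arithmetic on Dagdigian's petascale figures (illustrative assumptions).
PB_IN_TB = 1000                   # decimal petabyte, in terabytes

price_per_pb = 1_000_000          # USD: "less than $1 million" per petabyte
broad_tb_per_week = 200           # Broad Institute Tier 1 ingest rate

cost_per_tb = price_per_pb / PB_IN_TB
weeks_to_fill_pb = PB_IN_TB / broad_tb_per_week

print(f"~${cost_per_tb:,.0f}/TB at petascale")       # ~$1,000/TB
print(f"Broad fills 1 PB in ~{weeks_to_fill_pb:.0f} weeks")  # ~5 weeks
```

In other words, at that price point petascale storage runs on the order of $1/GB, and even the Broad's ingest rate fills a petabyte in about five weeks.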
But as NGS platforms get cheaper, scientists are running more and more experiments. Even as the rate of essential data generation shows signs of abating, the rate of downstream storage consumption is only going up. “It makes me mad as an IT person: I can’t model the requirements of a Ph.D. bioinformatician 2-3 years down the road. Human beings are harder to predict and slot into a five-year roadmap.”
While showing a slide labeled “BIG SCARY GRAPH,” Dagdigian said, “Something is going to break: this is not sustainable.” The scientist is the ultimate curator who decides what to keep. “Scientists have no clue what the true cost of storage is. IT is not communicating the true cost.”
In a couple of consulting projects, Dagdigian said he has had success recommending affordable, medium-tier nodes (such as Isilon’s NL series) as primary storage, reserving top-of-the-line systems for when they are needed. “This works great, even if the Isilon sales reps are not thrilled,” he said. In a similar vein, Dagdigian said he likes BlueArc’s expensive Titan series, but doesn’t always need it. The reasonably priced Mercury line means he no longer fears replacing expensive systems if capacity demands suddenly explode.
Other trends from the trenches:
- The move of pNFS out of supercomputing and into data centers is one of the most exciting trends for 2012.
- Private Clouds are “still stupid in 2011.” Dagdigian blasted private Clouds as “90% empty hype and cynical marketing… I don’t see the value. It’s a thinly veiled sales pitch aiming to replace everything in your data center.”
- “Amazon is the infrastructure Cloud. Period. Time has gone. I don’t think anyone can catch up with Amazon.”
Dagdigian said his favorite data center of 2010 belonged to Biogen Idec, which was discussed by the pharma’s Carl Sipowicz. In 2009, Biogen Idec had six data centers—four in the U.S., plus Denmark and Tokyo. “That’s a lot to manage, lots of overhead,” said Sipowicz. The firm eyed considerable savings in cooling and power consumption by consolidating. In the firm’s data center at its new corporate headquarters in Weston, Mass., the IT staff built a green building with an in-row cooling solution. Cooling water is pulled from a nearby stone quarry that is 400 feet deep and 40 degrees year round.
More than 100 servers were moved to a new data center in Research Triangle Park, which features a dual glycol/water cooling system. Biogen Idec is vacating its San Diego data center this summer. Of Biogen’s remaining 1,200 machines, 65% are virtualized with a priority to optimize WAN and network resources.
Another successful data center story came from Vijay Samalam (HHMI Janelia Farm). The Howard Hughes Medical Institute’s data center is five years old, but investment decisions take into account cost, cultural constraints, and multidisciplinary scientific directions. Virtually all the data are open-source, so there are relatively few security concerns. Workloads include RNA folding, video and image analysis, MATLAB, and Perl and Python data management workflows.
InfiniBand is cheaper, but the case against it is that it needs specialized staff. 10 GigE has a large knowledge base, but it is more expensive. “I didn’t want this to be a Boston Big Dig project,” said Samalam, but he said the 10-GigE networking system brings new iWARP technology (Internet Wide Area RDMA Protocol). In the end, he took inspiration from—of all things—the fashion industry. “Ethernet has been adopted on catwalks in Milan. If they feel comfortable, shame on us!” As a result, the HHMI cluster recently sat at #349 on the Top500 supercomputing list.
The CIO of Genomic Health, Paul Aldridge, said the creator of the gene-expression Oncotype DX breast cancer diagnostic test plans to make the transition to NGS. The company hopes to “change the dialogue between physician and patient,” he said, noting that diagnostics make up only 2% of the pharmaceutical market, worth almost $500 billion.
Genomic Health currently tests 250 patients/day. “We’re moving away from 21 genes [expression] to whole transcriptome and whole genomes potentially,” said Aldridge, although the transition would likely be accompanied by changes in the regulatory environment and patients’ access rights. NGS was “a race to the bottom,” but in Aldridge’s opinion the critical factor won’t be sequencing costs but rather access to clinical samples.
Genomic Health stores about 50 GB of combined genomic and transcriptome data per patient, which amounts to 12.5 terabytes/day or 1 petabyte/quarter. It would take 15 exabytes to sequence every person in the US, or 3.5 zettabytes for the world population. But the aggregate storage manufactured in 2010 was only 600 exabytes, said Aldridge.
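The daily and national totals follow directly from the per-patient figure. A quick sanity check of the arithmetic, assuming decimal units, roughly 80 testing days per quarter, and a 2010-era US population of about 310 million (the last two are assumptions, not figures from the talk):

```python
# Sanity-check of the Genomic Health storage figures (units and population assumed).
GB = 1e9  # decimal gigabyte, in bytes

per_patient_gb = 50
patients_per_day = 250
us_population = 310e6       # ~2010 US population (assumption)
testing_days_per_quarter = 80  # assumption

daily_tb = per_patient_gb * patients_per_day * GB / 1e12
quarterly_pb = daily_tb * testing_days_per_quarter / 1000
us_total_eb = per_patient_gb * us_population * GB / 1e18

print(f"{daily_tb:.1f} TB/day")                  # 12.5 TB/day
print(f"~{quarterly_pb:.0f} PB/quarter")         # ~1 PB/quarter
print(f"~{us_total_eb:.1f} EB for the US")       # ~15.5 EB
```

The per-day and per-quarter figures match Aldridge's numbers, and the US total lands close to his 15-exabyte estimate.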
At Infinity Pharmaceuticals, Keith Robison was tasked with importing NGS technology for cancer genomics, particularly targeted sequencing, copy number analysis, transcriptome, expression profiling, and methylation. Planning for the future, Robison said, was like Alice in the Red Queen’s race. “You have to run as fast as you can to stay in place.”
“Illumina is the new IBM—no-one ever got fired for buying Illumina,” said Robison. But he outlined some major logistical obstacles for a medium pharma to run NGS in house. There’s the upfront investment in a platform (or two) that could soon be rendered obsolete. Amortizing a next-gen sequencer is slow. “Will you really keep a sequencer busy? If I don’t use it, I can’t save it for next week. If you have constant flow, it’s OK. But in my environment, I’m swamped then there’s a lull.”
There is now “cutthroat” competition in human exome sequencing. Complete Genomics is adopting a Henry Ford model—sequence any DNA as long as it’s a complete human genome. “I think it’s a brilliant business model. Scientifically it’s frustrating,” said Robison. Some users might not care about sending samples offshore, but on the other hand, Robison said that for some clinical trials, it is necessary to state where the samples will go, and that could pose a problem.
Joseph Szustakowski (Novartis Institute of BioMedical Research) said “It’s not NGS if it doesn’t break something. Quality control is really, really important.” NIBR’s interest in NGS is in clinical applications—finding markers to stratify patients in clinical trials or baseline predictors of response, tracking physiology of patients in drug treatments, pathogen sequencing, HLA-typing, metagenomics, and more.
For every terabyte of “release” data, NIBR needs 3 TB of long-term storage. Reliance on the “Nike network” leaves much to be desired: not only is trying to get 1-2 TB of data onto disk non-trivial, but shipping is also an adventure. Hard drive boxes have arrived at their destination covered in tire tracks or, worse, frozen.
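The appeal of shipping drives becomes clear when you work out network transfer times. A minimal sketch, assuming illustrative WAN link speeds and 80% sustained utilization (both assumptions, not NIBR figures):

```python
# Why sneakernet persists: hours to move 2 TB over typical WAN links (illustrative).
def transfer_hours(terabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to move `terabytes` of data over a `link_gbps` link at the given utilization."""
    bits = terabytes * 1e12 * 8           # decimal TB -> bits
    return bits / (link_gbps * 1e9 * efficiency) / 3600

for gbps in (0.1, 1.0, 10.0):
    print(f"2 TB over {gbps:>4} Gbps: ~{transfer_hours(2, gbps):.1f} h")
```

At 100 Mbps the transfer takes over two days; only at 10 Gbps does the network clearly beat an overnight courier.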
Szustakowski’s advice: “Take your finance people to lunch!” One needs to ask: “How much data? Where to store it? How to get it there? And who will analyze it?”
Steve Guise (Roche) provided a literal “hands on” demonstration of the exciting touch screen technology that was described in the keynote by Roche’s Bryn Roberts (see p. 26)—think Minority Report for molecules. The screen itself is manufactured by Perceptive Pixel.
The project started in 2009. Guise credited early initiatives from Microsoft and Mitsubishi, among others, before Perceptive Pixel captured attention (CNN’s John King used the screen during the 2008 presidential election). The device used at Roche features a large 90” screen and fast, scaling graphics that are optimized for GPUs. After visiting the Perceptive Pixel office in New York, Guise said his team concluded, “We got to have one of these!”
The initial unit was a 1-meter-deep rear-projection device, prompting Roche to build a room around the screen. (The unit on display in the crowded Accelrys booth on the Expo floor was a lot thinner.) “We could have this technology in every corner of the lab,” said Guise. But the initial goal is to leverage the technology, not just show off, and to prove that it bolsters research and innovation. “We’re a research informatics group. This isn’t just gimmicky but it adds value.”
One of the benefits of the touch screen is that it abolishes hierarchy. “No-one is in control, everyone is equal,” said Guise. To be certain that the device adds value, Roche is collaborating with a group at Oxford University to study human interactions around the touch screen compared to a regular meeting. “Until we do, it’s hard to justify further investment,” said Guise.
The screen features an infrared light that shines onto a mirror and a camera that transmits coordinates through a server/software layer to the main box. Limitless simultaneous touches can be processed, allowing developers to handle multiple events simultaneously; 3-4 people can stand in front of the screen and interact independently. Roche began rolling out the device to researchers in Basel in April. K.D.