By John Dodge
July 11, 2002 | CAMBRIDGE, MASS. – It’s December 2001. You have three months to procure and install an IT infrastructure for the world-class scientists who will be working in your organization’s sparkling new building. What do you do?
Consultant Chris Dagdigian faced just such a situation when Harvard University’s Bauer Center for Genomics Research dropped the IT ball onto his lap late last year. Established in 1999, the center was operating out of Harvard’s Biological Labs across the street from its new building, which opened in March.
To this point, the center’s IT infrastructure, consisting of e-mail and file servers and an installation of Rosetta Resolver, wasn’t exactly primed for computational biology, according to George Busby, the center’s director of administration.
“We had envisioned computational biology down the road when the need in the center was great enough,” says Busby. “Our goals are twofold -- to provide resources to the larger Harvard community and to foster collaborative work on problems requiring a number of disciplines.”
As Dagdigian launched phase one of the project to identify the systems and network architecture, he was guided by three broad directives. Don’t spend too much money (yes, this is the same Harvard with an $18-billion endowment). Build something collaborative that can easily be tapped by outside researchers.
“And they told me ‘Don’t be too radical. We don’t have an IT director yet,’” Dagdigian says.
Busby maintains Harvard was a bit more magnanimous. “He basically was given a pretty free rein to build what he thought would be an appropriate system to handle the needs of in-house fellows and support of the faculty,” Busby says. After all, the center had been training and supporting 200 research groups within the university on microarrays and wasn’t about to cut corners.
Unlike most bioscience companies, the center isn’t preoccupied with drug discovery. Rather, it performs basic research, and everything that research yields will be put in the public domain. “We have people trying to understand cell signaling and other aspects of the cell cycle such as protein-to-protein interactions. We’ll be doing some of the basic understanding that needs to be there in the field of drug discovery,” Busby says. For that, the center needed high-performance server clusters, reliable backup, terabytes of storage, and all of the server room trappings, such as power, air conditioning, and fire suppression. Dagdigian, a bio-IT consultant and partner in the new consulting firm BioTeam.net, went to work.
Dagdigian had already built several Linux clusters on commodity hardware as a technical consultant at Blackstone Computing. And he knew the applications -- BLAST, GeneWise, and database searching to find gene patterns -- were straightforward enough. By the end of January, Dagdigian was soliciting bids from hardware and software vendors. He had no idea how much the center wanted to spend, but no one blinked when he handed them the bids.
Dell was selected to provide 30 dual Pentium III PowerEdge 1550 servers (since succeeded by the 1650), plus four dual-processor PowerEdge 6450 Pentium III Xeon-based servers for administration, database (MySQL from a company by the same name), cluster management, and backup. The operating system is a “highly customized” version of Red Hat 7.2.
“The cluster elements are practically disposable. If a node goes down or breaks, the job goes somewhere else [thanks to Platform Computing’s Platform Load Sharing Facility software],” says Dagdigian. “They could be breaking left and right, and none of the users would notice a thing. They’re cheap, too. In the grand scheme of things, they’re the cheapest component at $2,200 each.”
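The idea behind "disposable" nodes can be illustrated with a toy model. The sketch below is not Platform LSF and does not use its API; the function names, node names, and job names are hypothetical, and the least-loaded placement rule is an assumption chosen for simplicity. It only shows the general pattern Dagdigian describes: when a node drops out, its jobs are handed to the surviving nodes.

```python
def dispatch(jobs, nodes):
    """Place jobs on nodes in simple round-robin order (illustrative only)."""
    if not nodes:
        raise RuntimeError("no nodes available")
    return {job: nodes[i % len(nodes)] for i, job in enumerate(jobs)}


def reschedule(placement, failed_node, nodes):
    """Move every job on the failed node to the least-loaded surviving node."""
    survivors = [n for n in nodes if n != failed_node]
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the work")

    # Count the jobs each surviving node is already running.
    load = {n: 0 for n in survivors}
    for node in placement.values():
        if node != failed_node:
            load[node] += 1

    updated = dict(placement)
    for job, node in placement.items():
        if node == failed_node:
            target = min(survivors, key=lambda n: load[n])
            updated[job] = target
            load[target] += 1
    return updated


# Hypothetical node and job names, for illustration only.
nodes = ["node01", "node02", "node03"]
jobs = ["blast-1", "blast-2", "blast-3", "blast-4"]

before = dispatch(jobs, nodes)
after = reschedule(before, "node01", nodes)

for job in jobs:
    print(job, before[job], "->", after[job])
```

In this model, users submitting the four jobs would see them all finish even though one of three nodes failed. A production scheduler also has to detect the failure and decide whether a job restarts from scratch, which this sketch leaves out.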
The big money was spent on backup and the Network Appliance F840 filer system, which will provide 4TB of Network Attached Storage (NAS). NAS was chosen over Storage Area Network (SAN) technology because it’s cheaper and more flexible for data sharing, which is a prerequisite in genomic and proteomic research.
“Scientists have an overwhelming need for shared read and write access to the same data,” says Dagdigian. Extreme Networks provided the networking.
Dagdigian estimates he’s spent between $300,000 and $400,000, but Busby says it’s north of $500,000. “I know because I sign the bills,” Busby says.
Regardless, the center got a powerful IT infrastructure for short money, by leveraging the Harvard name and by getting generous educational discounts. “If you paid list prices, you’d be well over $2 million with this system,” Busby says. “[Harvard] is advertising that money cannot buy.”
Other shrewd moves were made as well, such as purchasing gear at the end of several suppliers’ fiscal years, when the companies were anxious to move products. Also, the center bought a 360-tape/seven-drive Qualstar media library for the price of a 180-tape unit because another customer had just returned it, Busby says.
“We overbuilt the system. The center will never need another tape system, knock on wood,” says Dagdigian.
It’s tempting to say this powerful system could meet the needs of any organization doing genomic and proteomic research, but that’s not the case. Some outfits might want more powerful Unix servers such as AlphaServers or supercomputers. But it is safe to say the infrastructure Dagdigian has put together could serve as a model for small, cost-conscious organizations interested in gene expression research.
Where does the system go from here? Literally, it goes to the new server room from Dagdigian’s office, where insufficient power and cooling allowed only a third of the servers to be fired up. The server room itself had to be doubled in size from the original drawings, and the center is facing many of the usual shakedown problems.
“They really missed the boat in the air conditioning. We need 15 tons of cooling and the ceiling unit was taken up with one 3-ton unit,” Busby says.
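To put Busby's figures in more familiar units, the short calculation below converts tons of cooling to kilowatts. It assumes the standard definition of a ton of refrigeration (12,000 BTU per hour, roughly 3.517 kW); the 15-ton and 3-ton figures come from the quote, and the variable names are illustrative.

```python
BTU_PER_HOUR_PER_TON = 12_000   # standard definition of one ton of refrigeration
KW_PER_TON = 3.517              # approximate kilowatt equivalent of one ton


def tons_to_kw(tons):
    """Convert tons of refrigeration to kilowatts of heat removal."""
    return tons * KW_PER_TON


required_tons = 15    # cooling the server room needs, per Busby
installed_tons = 3    # capacity of the single ceiling unit, per Busby

shortfall_tons = required_tons - installed_tons
coverage = installed_tons / required_tons

print(f"Required:  {required_tons} tons = {tons_to_kw(required_tons):.1f} kW")
print(f"Installed: {installed_tons} tons = {tons_to_kw(installed_tons):.1f} kW")
print(f"Shortfall: {shortfall_tons} tons ({coverage:.0%} of the need was covered)")
```

By this arithmetic, the original design covered about a fifth of the cooling the room required.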
Until the eight fellows already on board, a staff of yet-to-be-hired computational biologists, and all of the outside groups at Harvard really thump the system, Dagdigian and Busby won’t know if 30 servers will be enough.
“We don’t know if it will be powerful enough. It may be perfectly adequate, and it may be too small. Only time and users will tell,” says a circumspect Dagdigian.
Busby, too, is careful not to call Dagdigian’s design a slam-dunk at this point.
“What Chris has built is a prototype,” he says. “As the scientists get more involved with it, we might find a need for algorithms that don’t exist yet. This prototype will direct us in terms of what other hardware and software we need to develop.”
By the way, the center’s new IT director, Al Daneau, started work on April 15.
“All I had to do is come in and admire the handiwork,” a beaming Daneau says.
Sidebar: Harvard’s Bauer Center: A Broad View of Genomics Research
Harvard University has 130 research programs, centers, and institutes, and the Bauer Center for Genomics Research, founded three years ago, reinforces that long tradition. In the area of biology and medicine, Harvard also has the Institute of Chemistry and Cell Biology and the Institute of Human Genetics, both in the Medical School, and the Center for Cancer Prevention in the School of Public Health.
The goal of the Bauer Center, an interdepartmental initiative at Harvard, is to find general principles underlying the structure, behavior, and evolution of cells and organisms, according to Laura Garwin, the former physics editor of Nature magazine and the center’s director of research affairs. “Scientists in the center define genomics in the broadest sense and are using approaches from many disciplines, including biology, physics, chemistry, mathematics, computer science, and engineering,” she says.
The center’s research is carried out by a group of academic fellows -- independent scientists with a broad range of expertise. Eight fellows have been appointed since January 2000, and the center expects to increase the number to 11 in the next 12 to 24 months. The center also has a technical staff to help researchers throughout the Harvard community learn and use new genomic technologies. This staff will be expanded during the next year to include a computational biology group, which will be the primary user of the just-built IT infrastructure. Ultimately, the center expects to have 70 to 80 people working within its new walls.
The director of the Bauer Center is Andrew W. Murray, a distinguished cell biologist. The center was founded by two Harvard faculty members and Howard Hughes Medical Institute investigators: Douglas A. Melton of the department of molecular and cellular biology and Stuart L. Schreiber of the department of chemistry and chemical biology. Melton is a renowned developmental biologist who, after his son’s diagnosis with juvenile (type I) diabetes, has devoted his research to the study of pancreatic development and stem cell research. Schreiber is a pioneer in the burgeoning field of chemical biology; it was at his bidding that the historic department added “chemical biology” to its name. –J.D.