Marshall's IT Plan for Janelia Farm



By Kevin Davies

Oct. 16, 2006 |  Driving north from Washington Dulles Airport towards the Potomac River, it's easy to miss Janelia Farm. The only road sign faces the opposite direction, belatedly guiding lost taxi drivers retracing their route in search of the campus. Outside a makeshift hut in the middle of a construction site, the security guard waves a visitor's taxi down a long, winding dirt road appropriately named Helix Drive. Around a corner, however, the scene changes dramatically.

In 2000, the Howard Hughes Medical Institute (HHMI), the nation's largest medical institute, paid $53 million to acquire the Janelia Farm estate, a historic riverfront property. HHMI has since expanded and transformed the nearly 700-acre estate into a spectacular Rafael Viñoly-designed research campus, which officially opens this month. The first of some two dozen scientific group leaders are already settling in - neuroscientist Karel Svoboda from Cold Spring Harbor, bioinformaticist Sean Eddy from Washington University in St. Louis, and former Celera informatics chief Gene Myers from UC Berkeley.

HHMI's goal is simply to set up outstanding investigators with the environment and resources to push back the boundaries of life sciences. Of critical importance in that quest - especially given a scientific ensemble that is expected to drive the data-intensive fields of imaging, visualization, and bioinformatics - is the IT infrastructure. And so to design and build a data center for the ages, HHMI turned to the best - Marshall R. Peterson.

One would expect the new VP of IT to call Janelia Farm the most exciting project he has ever participated in. But coming from the man who oversaw the impressive IT infrastructure to assemble the human genome at Celera Genomics six years ago, that is particularly noteworthy. "I firmly believe this will rapidly be the premier research facility in the life sciences on the planet," says Peterson. "My marching orders are to give these people what they need to get the job done."

Asked to describe his career highlights, Peterson quietly mumbles something about being an aeronautical engineer and his time in Sweden building SAP. Nary a mention of his helicopter sorties in Vietnam or six Purple Hearts. It was another Vietnam vet, J. Craig Venter, who recruited Peterson to Celera, where as Vice President for Infrastructure Technology, he designed the Compaq Alpha network that assembled both the fruit fly and human genomes.

The fruit fly project was a curious collaboration between Venter, who had few friends in the academic community, and Berkeley geneticist Gerry Rubin. After Peterson left Celera to join the Venter Institute, Rubin hired Peterson to consult on the Janelia Farm project, and subsequently hired him (and several of his ex-Celera IT colleagues) in December 2005 (see "Rubin's Risky Business").

Tour of Duty
It feels like a five-minute walk from Peterson's office, overlooking the duck pond, to the majestic 5,000-square-foot data center - and that's just half of the space at Peterson's disposal. He says it's a trip he shouldn't have to take too often if he's doing his job.

HHMI staff claim the data center network capacity is bigger than ESPN's, with the core matching Google and Yahoo as the fastest single-site networks. Indeed, with the exception of certain three-letter [US Government] agencies, Janelia Farm might be the biggest 10-Gb aggregation in the world. Visitors are unanimously impressed.

"We want to look professional - part of this is marketing too," Peterson admits. "As much as we like to think anyone who comes here wants to work here, there's lots of competition. They want to see that whatever they want, we're going to give them. I have a reputation for customer service - it's not going to stop."

The data center is completely fiber and boasts a multi 10-Gb network. "That's a constant question," says Peterson. "Am I going to get the data to my desktop fast? If I can't, then I'm going to start having people buying their own supercomputers and sliding it under their desk. I don't want that - it's not cost effective, and you can't manage it." He adds: "We're going to have very high-resolution graphics, and people are going to see it very fast. Just one set of microscopes will be generating 500 GB data/day. 24x7x365."
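To put the quoted microscope figure in network terms, here is a back-of-the-envelope sketch. It assumes decimal units (1 GB = 10^9 bytes), a perfectly steady stream, and an ideal 10-Gb link with no protocol overhead. None of those assumptions come from Peterson, and the function names are purely illustrative.

```python
# Back-of-the-envelope arithmetic for the "500 GB data/day" microscope figure.
# Assumptions (not from the article): decimal units and an ideal, overhead-free link.

GB = 10**9               # bytes in one decimal gigabyte
SECONDS_PER_DAY = 86_400


def average_rate_mbps(bytes_per_day):
    """Average sustained rate, in megabits per second, for a steady daily volume."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


def transfer_seconds(num_bytes, link_gbps):
    """Seconds needed to move num_bytes over a link running at link_gbps gigabits/s."""
    return num_bytes * 8 / (link_gbps * 1e9)


def yearly_terabytes(bytes_per_day, days=365):
    """Total volume per year in decimal terabytes."""
    return bytes_per_day * days / 10**12


daily = 500 * GB
print(f"average rate: {average_rate_mbps(daily):.1f} Mb/s")
print(f"one day's data over a 10-Gb link: {transfer_seconds(daily, 10):.0f} s")
print(f"per year: {yearly_terabytes(daily):.1f} TB")
```

Under these assumptions the stream averages only about 46 Mb/s, small next to a 10-Gb core, but it adds up to roughly 180 TB a year from that one set of instruments.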

Although he says that "networking changes a lot," Peterson relied extensively on his Celera and Venter Institute experience in designing the data center, as well as feedback from Myers, Eddy, and other group leaders. Janelia Farm has as much storage and "a lot more horsepower" in 2,000 square feet as Celera had in five times the space, for a lot less money.

Four Vendors
Selecting the vendors involved extensive rounds of competition, with a view, says Peterson, to "obviously trying to get the best price and the best technology." Peterson stipulated the key requirements, notably 10 Gb in the core and stringent security, given the plethora of lab instruments, administrative systems, and visiting scientists. He challenged the vendors: "How would you design a network that is high performance, scalable, and by the way, if we don't pick you for the whole thing, make sure that it's modular so that we can select you for the core, someone else for distribution, wireless, and so on?"

The allure of being selected for HHMI's new campus meant Peterson benefited from some "extremely aggressive pricing." Says Peterson: "This is an incredible place to have technology. What a showcase!" He might even let vendors bring potential customers in for a tour. But he points out another attraction: "My team has a history of being extraordinarily successful."

Ultimately Peterson went with a four-vendor solution:

  • Force 10 for the core and distribution
  • Foundry for the edge
  • Juniper for security firewalls
  • Meru for wireless.

A hallmark of the design - "signature Peterson data center" - is what's under the floor. "Nothing," says Peterson. "Stuff under the floor is tough to troubleshoot, blocks airflow. Power's overhead - the only thing under the floor is air."

The power supply runs to 200 watts per square foot - not as high as Peterson wanted, "but for budget purposes that's what it is." Peterson says that could be doubled "without bringing down the data center," but he hopes that won't be necessary.

With some 1,200 64-bit Intel Xeon processors in all, cooling was a major concern. Peterson explains: "We ended up going with Dell and Xeons, which are hot, but we did a calculation: given the price we got with them and given the increased power requirements, it still came in price effective. Having said that, we're very interested in the new generation of Intels and obviously AMD." The data center uses 142 tons of air conditioning.
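As a rough consistency check on the power and cooling numbers, the sketch below converts 142 tons of air conditioning into kilowatts and asks how much floor area that could serve at 200 watts per square foot. The conversion factor is the standard one (1 ton of refrigeration is 12,000 BTU/h, about 3.517 kW). The assumption that every watt delivered to the floor ends up as heat to be removed is ours, and this is not a description of how HHMI sized the facility.

```python
# Convert the quoted cooling capacity to kilowatts and compare it with the
# quoted power density. 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW.

KW_PER_TON = 3.516853      # standard conversion factor
WATTS_PER_SQ_FT = 200      # power density quoted in the article


def cooling_kw(tons):
    """Heat-removal capacity in kilowatts for a given tonnage."""
    return tons * KW_PER_TON


def area_supported_sq_ft(tons, watts_per_sq_ft=WATTS_PER_SQ_FT):
    """Floor area whose full electrical load the cooling could absorb.

    Assumption: all electrical power delivered to the floor becomes heat.
    """
    return cooling_kw(tons) * 1000 / watts_per_sq_ft


print(f"cooling capacity: {cooling_kw(142):.0f} kW")
print(f"area served at 200 W/sq ft: {area_supported_sq_ft(142):.0f} sq ft")
```

The result, just under 500 kW or about 2,500 square feet at full density, is in line with a data center that is roughly half occupied.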

Everything in the data center is designed to be ripped out and replaced if needed. "The idea is to design infrastructure that is cost effective and easy to replace. We try to be open source - everything is Linux-based, low stress. It helps hugely with the maintenance."

Storage Demands
Peterson selected three tiers and 150 TB of spinning disk storage from EMC. "We started small... seriously!" Peterson smiles. Tier 1 is 30 TB of SAN. Tier 2 is 70 TB of NAS. Tier 3 - the archive - consists of more NAS on disk plus tape. Peterson wants to expand tier 3. "We have capability of over 1 PB of tape," says Peterson. "I can grow to multi petabytes without adding another cabinet." He opens one of a long row of EMC cabinets to show rows of vacant racks.
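The tier figures can be tallied in a few lines. This sketch assumes that whatever part of the 150 TB is not in tiers 1 and 2 sits in the tier 3 archive; the article does not break that number out. It also borrows the 500 GB/day microscope figure quoted earlier to estimate how quickly the disk would fill if nothing else wrote to it and nothing moved to tape, which is a deliberately crude assumption.

```python
# Storage figures quoted in the article, in decimal terabytes.
TOTAL_DISK_TB = 150
TIER1_SAN_TB = 30
TIER2_NAS_TB = 70


def tier3_disk_tb(total=TOTAL_DISK_TB, tier1=TIER1_SAN_TB, tier2=TIER2_NAS_TB):
    """Disk left for the archive tier (assumption: the remainder is all tier 3)."""
    return total - tier1 - tier2


def days_to_fill(capacity_tb, ingest_gb_per_day):
    """Days until capacity is consumed at a constant ingest rate (1 TB = 1,000 GB)."""
    return capacity_tb * 1000 / ingest_gb_per_day


print(f"tier 3 disk (inferred): {tier3_disk_tb()} TB")
print(f"days to fill 150 TB at 500 GB/day: {days_to_fill(TOTAL_DISK_TB, 500):.0f}")
```

On those terms the inferred archive disk is 50 TB, and a single set of microscopes would fill all 150 TB in about 300 days, which helps explain the interest in growing incrementally.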

"We have lots of empty space," says Peterson. "I can start adding storage incrementally. I want to match my demand curve with the cost curve." He wants to make it easy for his "customers" to move data back and forth. "We'll get reports on what they're using and there are budget issues, but what we hope to encourage them to make effective use of storage."

"In many respects this is like Celera - we don't know what we're going to do, and we're not sure how we're going to do it."

The IT staff is holding extensive meetings with the incoming group leaders. "But remember, a lot of what they want to do is stuff that has never been done before," says Peterson. "We're going to do things that are risky. I don't want to buy a bunch of stuff and find out that it's wrong. So we talked with the vendors about long-term levels, and working with them - it's more of an engineering relationship than a vendor relationship."

As for the scientists' desktop preferences, Peterson is agnostic. "They can be anything they want," he says. "We give them Linux on the desktop, Mac, Windows... if you want X, we give them X. Our goal is to try to say, 'Don't tell us what you want, tell us what you want to do.'"

Peterson enthuses about Sun Grid Engine, which runs the compute cluster: "It's open source, we love it. Lots of people have experience with it. The idea is to develop a shared facility. It's hard to go to the COO and request a couple of million dollars worth of processors when you're only using 30 percent of what you've got. Using Sun Grid Engine, it's a shared facility. Stuff goes in a queue, maybe one time you get 10 processors, the next time 1,000 processors." And when there's a fault in a node, Sun Grid Engine re-routes jobs and pages the IT team.
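The shared-queue idea can be illustrated with a toy model. The sketch below is not Sun Grid Engine's scheduler, whose policies are configurable and far richer; it is a strict first-in, first-out allocator written only to show how one pool of processors can serve requests of very different sizes. The job names and requests are hypothetical, and the pool size of 1,200 is the processor count quoted above.

```python
from collections import deque


def start_jobs(free_slots, queue):
    """Start queued jobs in arrival order while their full request fits.

    queue holds (job_name, slots_wanted) pairs. Returns the list of started
    jobs and the slots still free; jobs that do not fit stay in the queue.
    """
    started = []
    while queue and queue[0][1] <= free_slots:
        name, wanted = queue.popleft()
        free_slots -= wanted
        started.append((name, wanted))
    return started, free_slots


# Hypothetical workload on a shared pool of 1,200 processors.
queue = deque([("assembly", 1000), ("blast", 10), ("imaging", 500)])
started, free = start_jobs(1200, queue)
print("started:", started)
print("free slots:", free)
print("still waiting:", list(queue))
```

In this run the 1,000-processor and 10-processor jobs start at once, while the 500-processor job waits for capacity to free up rather than prompting anyone to buy more hardware.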

Visualize This
Peterson has barely filled half of the 5,000-square-foot data center, but he has as much space again available when he needs it. "I've allocated a lot of money for visualization. I might take part of that other site and turn it into a cave."

Before long, Peterson hopes that Janelia Farm scientists will be virtually tracking neurons around 3-D representations of the fruit fly brain. Apple would be a logical partner in such an endeavor, but there are many others. "This is Phase 0," says Peterson. "One of Michael Dell's big things is video. They're really excited about working with us."

Peterson is clearly relishing his newfound freedom at HHMI. "We really are pure research," he says. "Thanks to [HHMI's] superb investment group, we can focus on enabling research, giving people the tools they need, and not dotting i's and crossing t's." A $14 billion endowment certainly buys a lot of freedom.

For now, Peterson says that file systems are his biggest concern. "EMC has a very interesting parallel file system. Panasas has a parallel file system, very fast. I've been testing this stuff for four years. Because of the demands of visualization, this is stuff I want to look at."

"The reason HHMI built this is because we want to give people resources they didn't have. In a lot of cases, they've never been exposed to someone coming saying, how many TB storage do you need?" He says he aims to give "Whatever they want within reason - and reason here is a capital R."

Ultimately, the question facing Peterson and colleagues will be: "How do we store and annotate this image data? Imagine 'flying' through an image that's 500,000 by 400,000 pixels, tracing a neuron, trying to see where it fires. The complexity we're facing is mind-boggling. In many respects, it makes sequencing and assembling the human genome look trivial!" 
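Some quick arithmetic gives a feel for the image Peterson describes. The sketch assumes one byte per pixel (8-bit grayscale, uncompressed), treats the first quoted dimension as the width, and uses 1,024-pixel square tiles; all three are illustrative choices rather than details from the article.

```python
import math

WIDTH_PX, HEIGHT_PX = 500_000, 400_000   # image size quoted in the article
BYTES_PER_PIXEL = 1    # assumption: 8-bit grayscale, uncompressed
TILE_PX = 1024         # assumption: square tiles of an arbitrary, common size


def raw_size_gb(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Uncompressed size in decimal gigabytes."""
    return width * height * bytes_per_pixel / 10**9


def tile_grid(width, height, tile=TILE_PX):
    """Columns and rows of tiles needed to cover the image (edge tiles are partial)."""
    return math.ceil(width / tile), math.ceil(height / tile)


cols, rows = tile_grid(WIDTH_PX, HEIGHT_PX)
print(f"pixels: {WIDTH_PX * HEIGHT_PX:,}")
print(f"raw size: {raw_size_gb(WIDTH_PX, HEIGHT_PX):.0f} GB")
print(f"tiles: {cols} x {rows} = {cols * rows:,}")
```

That works out to 200 billion pixels, about 200 GB for a single uncompressed image, split across roughly 191,000 tiles before any annotation is stored.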



For reprints and/or copyright permission, please contact  Tim McLucas, (781) 972-1342, tmclucas@healthtech.com .