Going Up: Cycle Launches 50,000-Core Utility Supercomputer in the Cloud

April 19, 2012

By Kevin Davies  

April 19, 2012 | In the latest in a string of cloud computing firsts, Cycle Computing has collaborated with Schrödinger and Nimbus Discovery to spin up a 50,000-core supercomputer in the Amazon Web Services (AWS) cloud.  

Cycle Computing CEO Jason Stowe and Schrödinger CEO Ramy Farid briefed Bio-IT World on today’s announcement, made at the AWS Summit in New York.   

The supercomputer, named ‘Naga’ after an Asian serpent god, is the world’s largest utility supercomputer to date, says Stowe. “We thought we’d have to crawl through AWS infrastructure to run this once, and the notion of a snake kind of stuck,” he explained.  

Stowe says he is not aware of any comparable claims for the largest utility supercomputer. “In order to be utility supercomputing, it must have a normal scheduling environment so you can run the same apps the same way without reprogramming anything… You have to push a button, provision, pay for what you use, and turn it off.”  

After successful projects over the past 12 months to spin up a 10,000-core computer in the cloud with Genentech and then a 30,000-core computer for a major pharma client, Stowe wanted to raise the bar again.

The idea Schrödinger brought to Cycle was to conduct a virtual screen of 7 million compounds in multiple conformations—a total of 21 million ligand structures docked against a protein target using a docking application called Glide.

The new run surpassed 50,000 cores distributed across seven AWS sites around the world—three in North America, and one each in Europe, Brazil, Singapore and Japan. About 80% of the workload was distributed across 5,000 servers at Amazon’s east coast facility in Virginia.  
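
For a sense of how such an embarrassingly parallel screen breaks down, the rough sketch below splits the 21 million ligand structures into work units and weights them across regions. Only the 21-million total, the seven-region footprint, and the roughly 80% share handled in Virginia come from the companies; the batch size and the remaining regional weights are illustrative assumptions.

    # Sketch: splitting a 21-million-structure docking screen into work units
    # and weighting them across AWS regions. Only the 21M total, the seven
    # regions, and the ~80% share for the Virginia site come from the article;
    # the batch size and the remaining weights are illustrative assumptions.

    TOTAL_STRUCTURES = 7_000_000 * 3      # 7M compounds x 3 conformations
    BATCH_SIZE = 1_000                    # assumed ligand structures per docking job

    region_share = {
        "us-east (Virginia)": 0.80,       # ~80% of the workload, per the article
        "north-america-2":    0.04,       # the other two North American sites
        "north-america-3":    0.04,       #   (assumed even split of the remainder)
        "europe":             0.03,
        "brazil":             0.03,
        "singapore":          0.03,
        "japan":              0.03,
    }

    total_jobs = -(-TOTAL_STRUCTURES // BATCH_SIZE)   # ceiling division: 21,000 jobs
    for region, share in region_share.items():
        print(f"{region:22s} ~{int(total_jobs * share):6,d} docking jobs")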

The experiment—the equivalent of 12.5 processor-years—was conducted in a mere three hours. The final cost was $4,828/hour, or 9 cents/core/hour. Previously, it would take Schrödinger about 11 days to run a similar analysis on its in-house 400-core cluster—stopping all other work in the process.  
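
Those figures hang together on the back of an envelope. The quick check below uses only the numbers quoted above; since the run "surpassed" 50,000 cores, the exact per-core cost depends on a moving core count.

    # Back-of-envelope check of the figures quoted above.
    CORES = 50_000                    # the run "surpassed" this; treat as nominal
    RUN_HOURS = 3
    COST_PER_HOUR = 4_828             # USD, as reported

    print(f"cost per core-hour:  ${COST_PER_HOUR / CORES:.3f}")   # ~$0.097, close to the quoted figure

    # The in-house comparison: a 400-core cluster running for about 11 days.
    inhouse_core_hours = 400 * 11 * 24                             # 105,600 core-hours
    print(f"in-house workload:   {inhouse_core_hours / (24 * 365):.1f} processor-years")  # ~12, consistent with above
    print(f"wall-clock speed-up: {11 * 24 / RUN_HOURS:.0f}x (11 days vs. 3 hours)")       # ~88x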

Going Up  

“The challenge in computational chemistry modeling as [it] pertains to drug discovery has been the same: we’re always having to make difficult decisions to balance accuracy with speed,” said Farid. “We’re always limited with compute [power]… In the ’80s, we couldn’t include hydrogen atoms in structures because computers were too slow! We’re still cutting other corners and not doing the best science possible.”

Farid says the cloud screening was an attempt to do “very good science” without being limited by compute resources. “We want to develop methods to search for more drug candidates, much more cheaply and quickly. Normal high-throughput screening can take months and millions of dollars. We did something very similar, in a few hours, on a much larger set of compounds.”   

Working with Nimbus Discovery—a company founded two years ago by Schrödinger and Atlas Venture that has also received funding from the Bill and Melinda Gates Foundation—Schrödinger set out to search for leads against a cancer target. But Farid points out that the rationale for building Naga was not merely to save time and money but also to run a more accurate mode of Glide, Schrödinger’s docking software, and to include many more structures.

With 50,000 cores available, Schrödinger was able to include three different conformations for a virtual library of 7 million compounds, essentially giving each compound more attempts to dock to the protein target—emulating the process in nature and reducing false negatives. “We identified compounds we would have missed [doing the analysis internally] because we were able to use the more accurate version of the program.”  
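
A toy sketch makes that logic concrete: if every compound is docked in several conformations and only its best score is kept, a molecule whose default conformer happens to dock poorly can still be recovered rather than discarded as a false negative. The function names, sign convention, and cutoff below are illustrative stand-ins, not Schrödinger's pipeline.

    # Toy illustration of multi-conformer docking reducing false negatives.
    # dock_score() stands in for a docking program such as Glide; the sign
    # convention (lower is better) and the cutoff are illustrative only.

    def best_score(conformers, dock_score):
        """Dock every conformer and keep the best (lowest) score."""
        return min(dock_score(c) for c in conformers)

    def screen(library, dock_score, cutoff=-7.0):
        """Return compounds whose best conformer scores at or below the cutoff."""
        return [compound
                for compound, conformers in library.items()
                if best_score(conformers, dock_score) <= cutoff]

    # Example with a fake scoring table: compound "B" is rescued only because
    # its second conformer docks well; with one conformer it would be missed.
    fake_scores = {"A1": -8.2, "B1": -3.0, "B2": -7.5, "C1": -4.1}
    library = {"A": ["A1"], "B": ["B1", "B2"], "C": ["C1"]}
    print(screen(library, dock_score=fake_scores.get))   # -> ['A', 'B']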

Farid says his company is involved in several collaborations with big pharma companies including Janssen. “This is the starting point. Are we guaranteed this virtual screen will lead to a drug on the market? Absolutely not. This is the start, but we’ve stacked the deck to get as many shots on goal as we can.”  

Sky’s the Limit  

Stowe praised Farid’s team as “way out in front thinking about this problem. We need to get everybody else thinking in this mode.”   

What excites the Cycle CEO is the ability to ask intractable or even impossible questions. “This was something you could never do practically internally,” says Stowe. “You couldn’t use the dedicated cluster; the prospect of doing this in house was really not feasible. We think we can continue to do this class of computation for new targets and conformations [in under a day]. It’s a classic example of what we were trying to get at.”  

One of Cycle’s keys to success is its software for automating the distribution process. “It took about two hours to get the full cluster up and running—the time to acquire 6,700 servers. We’re trying to make that time as fast as possible,” says Stowe.
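
Stowe did not describe the provisioning code itself, but the general pattern is familiar: issue instance requests in parallel and poll each node until it reports ready, so total wall time is dominated by the slowest machines to boot rather than by the number of requests. The sketch below is a generic illustration; launch_instance() and is_ready() are hypothetical stand-ins, not Cycle's software or the AWS API.

    # Generic sketch of parallel provisioning; launch_instance() and is_ready()
    # are hypothetical placeholders, not Cycle's software or the AWS API.
    import time
    from concurrent.futures import ThreadPoolExecutor

    def launch_instance(node_id):
        """Hypothetical stand-in for an API call that requests one server."""
        return {"id": node_id, "launched_at": time.time()}

    def is_ready(instance):
        """Hypothetical stand-in: treat a node as ready shortly after launch."""
        return time.time() - instance["launched_at"] > 0.01

    def provision(node_id):
        instance = launch_instance(node_id)
        while not is_ready(instance):        # poll until booted and configured
            time.sleep(0.01)
        return instance

    def build_cluster(n_servers=6_700, parallelism=500):
        # Issue requests concurrently so wall time is set by the slowest boots,
        # not by the total number of requests.
        with ThreadPoolExecutor(max_workers=parallelism) as pool:
            return list(pool.map(provision, range(n_servers)))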

There are plenty of researchers interested in running at ever larger scales, including the winners of Cycle Computing’s recent Big Challenge competition. So how many cores could realistically be assembled? Farid says that while marshaling 50,000 cores “allowed us something fantastic, if we had 500,000 or 1 million cores, we could do even better science. I’m not sure where the limit is.”  

Stowe said much depends on scaling the technology. Cycle has already introduced key changes to enable its software to work at this scale. “Many of those changes were planting a floor at this [50,000] number… We don’t see much limit. If someone comes up with a number of cores, we can write software to hit that.”

Stowe credited Opscode’s Chef as “a fabulous way of configuring and managing server infrastructure at large scale. We showed it can run on 6,700 servers that we didn’t have 90 minutes earlier.” Cycle supports various scheduling environments including GridEngine and Condor, which he said “works very well in cloud-based environments. It’s very tolerant of nodes coming and going.”
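
In practice, "tolerant of nodes coming and going" boils down to a simple behavior: if a node disappears mid-job, the scheduler puts that job back in the queue for another node to pick up, so transient cloud capacity can be used without losing work. The toy loop below illustrates the idea; it is a conceptual sketch, not Condor's implementation.

    # Toy model of the "nodes coming and going" tolerance Stowe describes:
    # a job interrupted by a vanishing node is simply re-queued. This is a
    # conceptual sketch, not Condor's implementation.
    import random
    from collections import deque

    def run_with_churn(jobs, node_loss_rate=0.05):
        queue, completed = deque(jobs), []
        while queue:
            job = queue.popleft()
            if random.random() < node_loss_rate:
                queue.append(job)         # node disappeared mid-job: retry elsewhere
            else:
                completed.append(job)     # job finished normally
        return completed

    # Every job eventually completes even though ~5% of attempts are lost.
    assert len(run_with_churn(range(1_000))) == 1_000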