By Kevin Davies
November 10, 2009 | By his own admission, BioTeam co-founder and technology director Chris Dagdigian is a cynical sort, dedicated to the productive use of IT resources, focused on results and cost—and pathologically averse to marketing spin. That makes his embrace of cloud computing all the more compelling. Just don’t call it “cloud computing.”
Delivering a masterful overview of the cloud at the inaugural Bio-IT World Europe conference in October, Dagdigian confessed: “‘Cloud computing’ is to me exactly the word I hate, just like I hated ‘Grid computing’ in the ’90s. The word itself has been perverted by way too many marketing organizations.” Multi-site Grid computing was “98% hype and 2% usefulness,” Dagdigian recalled. So why did Dagdigian “drink the Kool-Aid” over cloud computing?
The tipping point came in late 2007, when Dagdigian and his BioTeam colleagues realized that, without any managerial mandate, every consultant in the group was independently experimenting with Amazon Web Services (AWS) to solve customer problems. EC2 was ridiculously cheap, and offered almost infinite ways of controlling costs.
“It works because the pricing is smart,” said Dagdigian. “Amazon blew people out of the water. Others were charging $1-2/hr. They disrupted the market by starting at 10 cents/hr.” Experimenting was addictive and affordable. After a week of messing around, Dagdigian’s first bill from Amazon was less than $10.
While the pace of AWS development impresses even a cynic such as Dagdigian, the primary virtue of the cloud, he explains, boils down to this: “If you have a 100 CPU-hour research problem, you can fire up 10 servers and solve the problem in 10 hours for $40. Or fire up 100 servers and solve the problem in 1 hour for the same money. That’s the ‘Aha’ moment.”
Dagdigian asked rhetorically: “Do you have that type of virtual scaling power in your facility? And if you do, how much of your infrastructure are you willing to keep idle and running so you can actually burst onto that scale?”
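The arithmetic behind that “Aha” moment is simple enough to sketch. In this minimal cost model, the 40-cents-per-CPU-hour rate is back-calculated from the $40-for-100-CPU-hours figure in the quote, not an official AWS price, and the model assumes perfectly parallel scaling:

```python
def cloud_cost(cpu_hours, servers, rate_per_server_hour=0.40):
    """Return (wall-clock hours, total cost) for a perfectly parallel job.

    Assumes ideal scaling: adding servers shrinks the runtime while
    total cost stays constant. The 40-cent hourly rate is inferred
    from the article's example ($40 for 100 CPU-hours), not a quoted
    AWS price.
    """
    wall_hours = cpu_hours / servers
    cost = wall_hours * servers * rate_per_server_hour
    return wall_hours, cost

# 10 servers: 10 hours of wall-clock time, $40 total
print(cloud_cost(100, 10))
# 100 servers: 1 hour of wall-clock time, same $40
print(cloud_cost(100, 100))
```

The point of the model is the second line of output: for an embarrassingly parallel job, the bill depends only on CPU-hours consumed, not on how fast you want the answer.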
Dagdigian tries hard not to use the term ‘cloud,’ preferring instead utility computing or simply “The C word.”
“Amazon Web Services is the cloud,” said Dagdigian (see, “Amazon’s Big Cloud”). “Anybody who tries to claim otherwise is fooling themselves or believes their own marketing. Amazon has a multiple year head start on everybody else. Google is probably not going to catch up. Microsoft is probably not going to catch up. They probably have six months to catch up, and if they don’t do it in six months, [AWS] will rule the world.”
That could change, of course. The Googles and Microsofts of the world have the money, the smart people, and the infrastructure, and there could be disruptive changes ahead. “But Amazon has been doing this long enough and they’re improving their product rapidly enough, we’re talking a multiyear head start.”
Not Rocket Science
When Dagdigian talks about cloud computing, he says he’s not talking about SaaS, or SalesForce.com, or magical cloud terms. “I’m talking about utility computing as it resonates with an infrastructure person—servers, systems, workflows, scientific application pipelines… At the end of the day, I’m interested in replicating and duplicating complex systems.”
Cloud computing “really isn’t rocket science,” he says. Unlike Grid computing, Dagdigian says “it’s actually very easy to get your head around. You’ll know intuitively what parts of your research workflow make sense” to try in the cloud. The AWS tutorial takes less than half a day.
But Dagdigian’s assessment of ‘private clouds’ is blunt: “Absolute rubbish,” he said, arguing the term is being slapped onto virtualization services in another marketing makeover.
“There’s a lot of people trying to sell you the vision of, ‘Ooh, look at what you can do inside Amazon. You can do that inside your own datacenter.’ That’s probably a really cool future a couple of years down the road, but right now it is people trying to get IPO/venture funding… Nothing’s been extensively tested. To build a private cloud in ’09, you take your VMware stuff, all your virtualization stuff… wave the magic marketing pixie dust, and excrete a press release. Give it a year.”
That said, Dagdigian admits there are some interesting academic pilot projects in private cloud computing going on, and it’s “a great topic for a thesis.” But one of the “inconvenient truths” of private clouds is that the ability to move systems internally to meet business and scientific needs requires re-architecting the network, and in his view it’s not ready for prime time.
Dagdigian ticked off four primary areas in which his company is already using the cloud:
- Part of BioTeam’s own business, such as software development, is done on the cloud.
- Training sessions for resources such as Grid Engine. The cloud enables everybody in the class to essentially get their own cluster, turning an expense of potentially thousands of dollars using a conventional cluster into a few dollars using AWS. “The most expensive bill for training was under $300,” says Dagdigian, and that was because he forgot to shut down the machines overnight.
- Proof-of-concept projects for companies such as Sun and Univa.
- Directed efforts for ISVs porting applications, and some pharma clients doing real science such as molecular modeling (see, “Antibody Docking on the Amazon Cloud,” Bio•IT World, May 2009).
Amazon’s cloud is much more than just EC2—it is a collection of discrete services available on demand that can be combined in interesting ways to build workflows. Dagdigian spoke enthusiastically about Amazon’s rapid product development cycle and steady stream of new features, such as auto-scaling, SimpleDB (for SQL-like queries), two storage products (one of which is Amazon’s S3, or Simple Storage Service), and the Simple Queue Service (SQS) to build pipelines and pass messages.
Coping with the data tsunami today is more challenging than acquiring compute power to solve problems. AWS now has a physical data import/export service: Dagdigian described how one can mail Amazon a SATA or USB disk, delivering data much faster than over the Internet. That should ensure there isn’t a repeat of a “very nasty problem” Dagdigian faced earlier this year, when tasked with quickly moving 20 TB of data out of the Amazon cloud to be delivered to a client. “Amazon had no solution for bulk export,” he said. It would be virtually impossible to lease very fast Internet connections for short periods. Fortunately, friends at 2NPlus1, a local datacenter, and an Ivy League university provided the bandwidth (ultimately a Gigabit Ethernet connection) to solve the problem.
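The scale of that export problem is easy to quantify with a back-of-the-envelope transfer-time calculation. The 20 TB figure and the Gigabit Ethernet link come from the text; the assumption that the link runs at full line rate is optimistic, since real links rarely sustain it:

```python
def transfer_hours(terabytes, link_gbps):
    """Hours to move `terabytes` of data over a `link_gbps` gigabit-per-
    second link, assuming the link runs at full rate the whole time
    (optimistic -- protocol overhead and contention slow real links).
    """
    bits = terabytes * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600

# 20 TB over Gigabit Ethernet: nearly two days even at line rate
print(round(transfer_hours(20, 1.0), 1))
```

Even under ideal conditions the move takes about 44 hours, which is why mailing a disk can beat the network for bulk transfers.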
Overall, Dagdigian is enthusiastic about the cloud, but he freely admits there has been a “ridiculous amount of hype.” He points potential users to a white paper by McKinsey that outlines the downsides to utility computing, which he calls a “fantastic read if you’re very excited about cloud computing.” It is also imperative for would-be users to have “a solid understanding of their own internal operating costs, ranging from servers, to electricity to buildings to administration. In many ways, utility computing is about economics and saving money. You must have a solid way of modeling your IT operations… Unless you do that, you’re probably going to lose a lot of money in the cloud.”
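One way to follow that advice is to reduce in-house operations to a fully loaded cost per CPU-hour and compare it against the cloud’s hourly rate. The toy model below illustrates the shape of such a calculation; every number in it is a hypothetical placeholder, not data from the article or from McKinsey:

```python
def in_house_cost_per_cpu_hour(hardware_capex, years_amortized,
                               annual_power, annual_admin,
                               cpus, utilization):
    """Fully loaded cost per CPU-hour for an in-house cluster.

    All inputs are hypothetical placeholders; the point is the shape
    of the model: amortized capital plus recurring costs, divided by
    the CPU-hours actually consumed (idle capacity still costs money).
    """
    annual_cost = hardware_capex / years_amortized + annual_power + annual_admin
    used_cpu_hours = cpus * 24 * 365 * utilization
    return annual_cost / used_cpu_hours

# A modestly utilized 100-CPU cluster can easily cost more per
# CPU-hour than a 10-40 cent cloud rate (all figures invented).
rate = in_house_cost_per_cpu_hour(
    hardware_capex=200_000, years_amortized=3,
    annual_power=30_000, annual_admin=80_000,
    cpus=100, utilization=0.30)
print(f"${rate:.2f}/CPU-hour")
```

The utilization term is the one Dagdigian keeps returning to: hardware kept idle so you can burst still shows up in the denominator.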
The question of cloud security is both “interesting and overblown,” says Dagdigian. “There’s absolutely a whiff of hypocrisy in the air,” which he suggests comes from IT staff trying to protect their jobs. “It’s very funny to see people demanding security practices on the cloud that they’re unable to run in-house.”
Dagdigian argues that Amazon, Google, and Microsoft have better internal controls than any biotech, pharma, or academic IT manager. “They’re not stupid. They’ve hired incredibly smart people… They will answer every question you can throw at them.” Moreover, Amazon allows users to implement their own security practices. “There’s no technical barrier to having you encrypt every single packet that goes to and from the cloud.” Amazon has published white papers on cloud security and HIPAA compliance.
Dagdigian’s colleague Adam Kraut worked with Pfizer to accelerate a molecular modeling workflow that typically required 48 hours on 72 CPUs. Shifting that work to the cloud, marshalling some 500 CPUs, could produce an answer within 3.5 hours (see p. 32). Dagdigian said that, “in many cases, protein docking is an ideal application for cloud computing.”
Cloud computing is not nirvana. One bottleneck for some is “the ingest problem,” but if that can be solved, Dagdigian predicts “there’s a possibility of petabyte-scale volumes of data moving into the cloud. They can post it and store it cheaper than I can do it safely and securely at home.” Another interesting model is that of user downloads, where researchers funded by public grants could share multi-TB datasets and meet sharing requirements by asking the downloader to pay.
“This is a very hyped and trendy area, and not a solution for everybody,” said Dagdigian. Users need to find the middle ground between Luddites and evangelists. The bottom line: “Start small, stay targeted, and go for the easy stuff first.”
Amazon’s Big Cloud
Perhaps because experts such as Dagdigian say Amazon is virtually synonymous with the cloud, the firm declined to make its web services experts available for this story. But Amazon spokesperson Kay Kinton offered Bio•IT World an overview of Amazon’s cloud capabilities. Amazon Web Services (AWS) provides scalable compute infrastructure that enables organizations to requisition compute power, storage, and other application services on demand. “A customer doesn’t need to think about controlling them, maintaining them or even where they are located,” says Kinton. The AWS cloud “enables companies of all sizes to focus on the differentiating factors of their business as opposed to the infrastructure required to run it.”
The AWS cloud is based on Amazon’s own back-end technology infrastructure, which has been honed over the past decade into an ultra-reliable, scalable, and cost-efficient web infrastructure. “AWS gives any software developer the keys to this infrastructure, which they can use to build and grow any business,” says Kinton. Amazon has hundreds of thousands of registered developers, including major organizations such as Eli Lilly, Pfizer (see p. 32), the New York Times, NASDAQ, and ESPN. Partners include Red Hat, Oracle, Sun, MySQL, IBM, SalesForce, and Capgemini. Besides Amazon’s Elastic Compute Cloud (EC2), AWS includes:
- Amazon Simple Storage Service (Amazon S3)
- Amazon SimpleDB
- Amazon Simple Queue Service (Amazon SQS)
- Amazon Flexible Payments Service (Amazon FPS)
- Amazon CloudFront
- Amazon Elastic MapReduce
Among satisfied users of EC2 and S3 is Peter Tonellato, from Harvard Medical School’s Laboratory for Personalized Medicine. He runs simulations to assess the clinical value of new genetic tests by creating patient avatars—“virtual” patients—for different genetic tests. He hopes to “dramatically reduce the time it usually takes to identify the tests, protocols, and trials that are worth pursuing aggressively for both FDA approval and clinical use,” he says. “The combination of Oracle and AWS allowed us to focus our time and energy on simulation development, rather than technology, to get results quickly.” Pathwork Diagnostics, a molecular diagnostics company, uses Univa UD’s UniCloud to build HPC clusters in EC2. Pathwork estimates it saved hundreds of thousands of dollars by avoiding having to purchase and manage its own HPC hardware. Illumina has also bought in. It has deposited the full genome sequences of a Yoruban trio (parents and child) in the cloud.
The recently reduced costs for EC2 range from 8 to 68 cents per hour for Linux instances, and from 12 cents to $1.16 per hour for Windows. (Costs are about 10% higher in Europe.)
This article also appeared in the November-December 2009 issue of Bio-IT World Magazine.