February 8, 2012


The ‘C’ Word



By Kevin Davies

November 18, 2009
By his own admission, BioTeam co-founder and technology director Chris Dagdigian is a cynical sort, dedicated to the productive use of IT resources, focused on results and cost, and pathologically averse to marketing spin. That makes his positive reception of cloud computing all the more compelling. Just don’t call it “cloud computing.”

Delivering a masterful overview of the cloud at the inaugural Bio-IT World Europe conference in October, Dagdigian confessed: “‘Cloud computing’ is to me exactly the word I hate, just like I hated ‘Grid computing’ in the ’90s. The word itself has been perverted by way too many marketing organizations.” Multi-site Grid computing was “98% hype and 2% usefulness,” Dagdigian recalled. So why did Dagdigian “drink the Kool-Aid” over cloud computing?

The tipping point came in late 2007. Dagdigian and his BioTeam colleagues realized that, without any managerial mandate, the whole group of consultants was independently experimenting with Amazon Web Services (AWS) to solve a customer problem. EC2 is ridiculously cheap, with almost infinite ways of controlling it.

“It works because the pricing is smart,” said Dagdigian. “Amazon blew people out of the water. Others were charging $1-2/hr. They disrupted the market by starting at 10 cents/hr.” Experimenting was addictive and affordable. After a week of messing around, Dagdigian’s first bill from Amazon was less than $10.

While the pace of AWS development impresses even a cynic such as Dagdigian, the primary virtue of the cloud, he explains, boils down to this: “If you have a 100 CPU-hour research problem, you can fire up 10 servers and solve the problem in 10 hours for $40. Or fire up 100 servers and solve the problem in 1 hour for the same money. That’s the ‘Aha’ moment.”
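The arithmetic behind that “Aha” moment can be sketched in a few lines. This is an illustration only: the $0.40 per server-hour rate is simply what the quoted figures imply ($40 for 100 CPU-hours), and the function name `job_cost` is hypothetical. It assumes one CPU per server, a perfectly parallel workload, and no rounding up of partial billing hours.

```python
def job_cost(cpu_hours, servers, rate_per_server_hour):
    """Return (wall_clock_hours, total_cost) for a perfectly parallel job.

    Assumes one CPU per server and that the work divides evenly
    across servers with no startup or coordination overhead.
    """
    wall_clock_hours = cpu_hours / servers
    total_cost = servers * wall_clock_hours * rate_per_server_hour
    return wall_clock_hours, total_cost


# Assumed rate: $40 / 100 CPU-hours, as implied by the quote above.
RATE = 0.40

for servers in (10, 100):
    hours, cost = job_cost(100, servers, RATE)
    print(f"{servers:>3} servers: {hours:g} h wall-clock, ${cost:.2f}")
```

Under these assumptions the server count cancels out of the cost, which is the point of the quote: the bill depends on total CPU-hours, while the wait depends on how many servers are started.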

Dagdigian asked rhetorically: “Do you have that type of virtual scaling power in your facility? And if you do, how much of your infrastructure are you willing to keep idle and running so you can actually burst onto that scale?”

Dagdigian tries hard not to use the term ‘cloud,’ preferring instead utility computing or simply “The C word.”

“Amazon Web Services is the cloud,” said Dagdigian (see, “Amazon’s Big Cloud”). “Anybody who tries to claim otherwise is fooling themselves or believes their own marketing. Amazon has a multiple year head start on everybody else. Google is probably not going to catch up. Microsoft is probably not going to catch up. They probably have six months to catch up, and if they don’t do it in six months, [AWS] will rule the world.”

That could change, of course. The Googles and Microsofts of the world have the money, smart people, infrastructure, and there could be disruptive changes ahead. “But Amazon has been doing this long enough and they’re improving their product rapidly enough, we’re talking a multiyear head start.”

Not Rocket Science
When Dagdigian talks about cloud computing, he says he’s not talking about SaaS, or SalesForce.com, or magical cloud terms. “I’m talking about utility computing as it resonates with an infrastructure person—servers, systems, workflows, scientific application pipelines… At the end of the day, I’m interested in replicating and duplicating complex systems.”

Cloud computing “really isn’t rocket science,” he says. Unlike Grid computing, Dagdigian says “it’s actually very easy to get your head around. You’ll know intuitively what parts of your research workflow make sense” to try in the cloud. The AWS tutorial takes less than half a day.

But Dagdigian’s assessment of ‘private clouds’ is blunt: “Absolute rubbish,” he said, arguing the term is being slapped onto virtualization services in another marketing makeover.

“There’s a lot of people trying to sell you the vision of, ‘Ooh, look at what you can do inside Amazon. You can do that inside your own datacenter.’ That’s probably a really cool future a couple of years down the road, but right now it is people trying to get IPO/venture funding… Nothing’s been extensively tested. To build a private cloud in ’09, you take your VMware stuff, all your virtualization stuff… wave the magic marketing pixie dust, and excrete a press release. Give it a year.”

That said, Dagdigian admits there are some interesting academic pilot projects in private cloud computing going on, and it’s “a great topic for a thesis.” But one of the “inconvenient truths” of private clouds is that the ability to move systems internally to meet business and scientific needs requires re-architecting the network, and in his view it’s not ready for prime time.

Dagdigian ticked off four primary areas in which his company is already using the cloud:

  • Part of BioTeam’s own business, such as development software engineering, is done on the cloud.
  • Training sessions for resources such as Grid Engine. The cloud enables everybody in the class to essentially get their own cluster, turning an expense of potentially thousands of dollars using a conventional cluster into a few dollars using AWS. “The most expensive bill for training was under $300,” says Dagdigian, and that was because he forgot to shut down the machines overnight.
  • Proof-of-concept projects for companies such as Sun and Univa.
  • Directed efforts for ISVs porting applications, and some pharma clients doing real science such as molecular modeling (see, “Antibody Docking on the Amazon Cloud,” Bio•IT World, May 2009).

Amazon’s cloud is much more than just EC2: it is a collection of discrete services available on demand that can be combined in interesting ways to build workflows. Dagdigian spoke enthusiastically about Amazon’s rapid product development cycle and steady stream of new features, such as auto-scaling, SimpleDB (for SQL-like queries), two storage products (one of which is Amazon’s S3, or Simple Storage Service), and the Simple Queue Service (SQS) to build pipelines and pass messages.

Coping with the data tsunami today is more challenging than acquiring compute power to solve problems. AWS now has a physical ingest/outgest service. Dagdigian described how one can mail Amazon a SATA or USB disk, delivering data much faster than over the Internet. That should ensure there isn’t a repeat of a “very nasty problem” Dagdigian faced earlier this year, when tasked with quickly moving 20 TB of data out of the Amazon cloud to be delivered to a client. “Amazon had no solution for bulk export,” he said. It would be virtually impossible to lease very fast Internet connections for short periods. Fortunately, friends at 2NPlus1, a local datacenter, and an Ivy League university provided the bandwidth (ultimately a Gigabit Ethernet connection) to solve the problem.
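A back-of-the-envelope estimate shows why link speed mattered so much for that 20 TB export. The sketch below is illustrative: the function name `transfer_hours` is hypothetical, it uses decimal terabytes, and it assumes the link is saturated for the whole transfer unless an `efficiency` factor below 1.0 is supplied. Only the 20 TB volume and the Gigabit Ethernet link come from the account above; the slower link speeds are included purely for comparison.

```python
def transfer_hours(terabytes, link_mbps, efficiency=1.0):
    """Estimate hours needed to move `terabytes` over a link of `link_mbps`.

    Uses decimal units (1 TB = 1e12 bytes, 1 Mbit/s = 1e6 bits/s).
    `efficiency` is the assumed fraction of line rate actually sustained.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


# 1000 Mbit/s is the Gigabit Ethernet link mentioned in the article;
# the other speeds are assumptions chosen for comparison.
for mbps in (45, 100, 1000):
    hours = transfer_hours(20, mbps)
    print(f"{mbps:>5} Mbit/s: {hours:8.1f} h ({hours / 24:.1f} days)")
```

Even at full gigabit line rate the transfer takes close to two days, and real-world throughput is usually lower, which is why mailing physical disks can beat the network.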

Overall, Dagdigian is enthusiastic about the cloud, but he freely admits there has been a “ridiculous amount of hype.” He points potential users to a white paper by McKinsey that outlines the downsides to utility computing, which he calls a “fantastic read if you’re very excited about cloud computing.” It is also imperative for would-be users to have “a solid understanding of their own internal operating costs, ranging from servers, to electricity to buildings to administration. In many ways, utility computing is about economics and saving money. You must have a solid way of modeling your IT operations… Unless you do that, you’re probably going to lose a lot of money in the cloud.”
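One simple way to start modeling those internal costs is to compute what an in-house CPU-hour actually costs at a given utilization, then compare it with a cloud hourly rate. The sketch below is a deliberately crude illustration, not a method described by Dagdigian or McKinsey: the function names are hypothetical, and the cluster size and annual cost are placeholder figures. The 10 cents/hr rate is the EC2 starting price quoted earlier in the article.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def in_house_cost_per_cpu_hour(annual_cost, cpus, utilization):
    """Cost of each CPU-hour that is actually used in-house.

    `annual_cost` should fold in servers, power, space, and staff;
    `utilization` is the fraction of available CPU-hours doing real work.
    """
    used_cpu_hours = cpus * HOURS_PER_YEAR * utilization
    return annual_cost / used_cpu_hours


def break_even_utilization(annual_cost, cpus, cloud_rate):
    """Utilization above which owning is cheaper than renting at `cloud_rate`."""
    return annual_cost / (cpus * HOURS_PER_YEAR * cloud_rate)


# Placeholder figures for illustration only (not from the article).
ANNUAL_COST = 50_000.0   # hypothetical all-in yearly cost of the cluster
CPUS = 64                # hypothetical cluster size
CLOUD_RATE = 0.10        # dollars per CPU-hour, the quoted EC2 starting price

for utilization in (0.25, 0.50, 0.90):
    cost = in_house_cost_per_cpu_hour(ANNUAL_COST, CPUS, utilization)
    print(f"utilization {utilization:.0%}: ${cost:.3f} per used CPU-hour")

print(f"break-even utilization: "
      f"{break_even_utilization(ANNUAL_COST, CPUS, CLOUD_RATE):.0%}")
```

A break-even figure above 100% would mean the in-house cluster can never match the cloud rate under these inputs; a low figure would mean a busy cluster is cheaper to own. A real model would also need data transfer, storage, and migration costs, which this sketch ignores.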

The question of cloud security is both “interesting and overblown,” says Dagdigian. “There’s absolutely a whiff of hypocrisy in the air,” which he suggests comes from IT staff trying to protect their jobs. “It’s very funny to see people demanding security practices on the cloud that they’re unable to run in-house.”

Dagdigian argues that Amazon, Google, and Microsoft have better internal controls than any biotech, pharma, or academic IT manager. “They’re not stupid. They’ve hired incredibly smart people… They will answer every question you can throw at them.” Moreover, Amazon allows users to implement their own security practices. “There’s no technical barrier to having you encrypt every single packet that goes to and from the cloud.” Amazon has published white papers on cloud security and HIPAA compliance.

Dagdigian’s colleague Adam Kraut worked with Pfizer to solve a molecular modeling workflow that typically required 48 hours on 72 CPUs. Shifting that work to the cloud, marshalling some 500 CPUs, could produce an answer within 3.5 hours (see p. 32). Dagdigian said that, “in many cases, protein docking is an ideal application for cloud computing.”

Cloud computing is not nirvana. One bottleneck for some is “the ingest problem,” but if that can be solved, Dagdigian predicts “there’s a possibility of petabyte-scale volumes of data moving into the cloud. They can post it and store it cheaper than I can do it safely and securely at home.” Another interesting model is that of user downloads, where researchers funded by public grants could share multi-TB datasets and meet sharing requirements by asking the downloader to pay.

“This is a very hyped and trendy area, and not a solution for everybody,” said Dagdigian. Users need to find the middle ground between Luddites and evangelists. The bottom line: “Start small, stay targeted, and go for the easy stuff first.” 

Amazon’s Big Clouds

Perhaps because experts such as Dagdigian say Amazon is virtually synonymous with the cloud, the firm declined to make its web services experts available for this story. But Amazon spokesperson Kay Kinton offered Bio•IT World an overview of Amazon’s cloud capabilities. Amazon Web Services (AWS) provides scalable compute infrastructure that enables organizations to requisition compute power, storage, and other application services on demand. “A customer doesn’t need to think about controlling them, maintaining them or even where they are located,” says Kinton. The AWS cloud “enables companies of all sizes to focus on the differentiating factors of their business as opposed to the infrastructure required to run it.”

The AWS cloud is based on Amazon’s own back-end technology infrastructure, which has been honed over the past decade into an ultra-reliable, scalable, and cost-efficient web infrastructure. “AWS gives any software developer the keys to this infrastructure, which they can use to build and grow any business,” says Kinton. Amazon has hundreds of thousands of registered developers, including major organizations such as Eli Lilly, Pfizer (see p. 32), the New York Times, NASDAQ, and ESPN. Partners include Red Hat, Oracle, Sun, MySQL, IBM, SalesForce, and CapGemini. Besides Amazon’s Elastic Compute Cloud (EC2), AWS includes:

  • Amazon Simple Storage Service (Amazon S3)
  • Amazon SimpleDB
  • Amazon Simple Queue Service (Amazon SQS)
  • Amazon Flexible Payments Service (Amazon FPS)
  • Amazon CloudFront
  • Amazon Elastic MapReduce

Among satisfied users of EC2 and S3 is Peter Tonellato, from Harvard Medical School’s Laboratory for Personalized Medicine. He runs simulations to assess the clinical value of new genetic tests by creating patient avatars (“virtual” patients) for different genetic tests. He hopes to “dramatically reduce the time it usually takes to identify the tests, protocols, and trials that are worth pursuing aggressively for both FDA approval and clinical use,” he says. “The combination of Oracle and AWS allowed us to focus our time and energy on simulation development, rather than technology, to get results quickly.” Pathwork Diagnostics, a molecular diagnostics company, is using Univa UD’s UniCloud to build HPC clusters in EC2. Pathwork thinks it saved hundreds of thousands of dollars by avoiding having to purchase and manage its own HPC hardware. Illumina has also bought in. It has deposited the full genome sequences of a Yoruban trio (parents and child) in the cloud.

The recently reduced costs for EC2 range from 8 to 68 cents per hour for Linux instances, and from 12 cents to $1.16 per hour for Windows. (Costs are about 10% higher in Europe.)


This article also appeared in the November-December 2009 issue of Bio-IT World Magazine.