By Adam Kraut
January 20, 2010 | Inside the Box | Over the 2009 holiday shopping season, Amazon.com processed an estimated 73 items per second and 6.3 million items on a typical day. That may not come as much of a surprise. But what you may not know is that it’s all powered by cloud computing from Amazon Web Services (AWS). And Amazon isn’t even the biggest AWS customer, according to CTO Werner Vogels. Amazon’s Infrastructure as a Service is so far ahead in the cloud that competitors are struggling to keep pace. Still, AWS is not about to rest on its laurels. Fortunately for the rapidly growing number of AWS users, it is aggressively innovating and improving its services. One might even say that during the 2009 holiday season, it left us some presents under the tree.
One of the major barriers to cloud adoption in enterprise IT departments, as with running data through any hosted service, is security. Much of the resistance boils down to IT staff protecting their jobs or downright paranoia. As my colleague Chris Dagdigian said at Bio-IT World Europe (see “The C Word,” Bio•IT World, Nov 2009), “It’s very funny to see people demanding security practices on the cloud that they’re unable to run in-house.”
To address those customers’ concerns, Amazon recently announced its Virtual Private Cloud (VPC) service. VPC offers a secure bridge between existing IT infrastructure and cloud resources through a Virtual Private Network (VPN) connection. With VPC enabled, your company’s security services and firewalls can extend to cover AWS compute resources running on Elastic Compute Cloud (EC2). While this should ease concerns at pharmaceutical companies and other security-conscious IT departments, it does come with the caveat of VPN overhead. If your internal IT bandwidth is poor, it will become the bottleneck when pushing terabytes of data through VPC.
The Data Flow Problem
Another major announcement from AWS is the Import/Export service for Amazon’s Simple Storage Service (S3). Let’s say you have 200 TB of data you want to load into S3 for analysis on EC2. No matter how fat your link to Amazon, it’s going to take a long time to move that data. Dagdigian alluded to this service last April at the Bio-IT World Expo, when he said: “If the ingest problem can be solved… I see petabyte-scale datasets that would flock to utility storage services.” Amazon has answered the call with S3 Import/Export. You put your data on USB or SATA hard drives and send them to Amazon through standard mail. Amazon takes those disks and loads them directly into S3 at its datacenter. After a couple of days, your data are available for processing on EC2 or distributing to customers and colleagues.
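To put the ingest problem in numbers, here is a back-of-the-envelope sketch in Python. The link speeds and the 80 percent efficiency factor are illustrative assumptions, not measurements, and the function name is made up for this example.

```python
def transfer_days(size_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the days needed to push size_tb terabytes over a network link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed you actually sustain.
    """
    size_bits = size_tb * 1e12 * 8                 # decimal terabytes to bits
    effective_bps = link_mbps * 1e6 * efficiency   # sustained bits per second
    seconds = size_bits / effective_bps
    return seconds / 86_400                        # seconds in a day


if __name__ == "__main__":
    # Hypothetical link speeds for the 200 TB example in the text
    for mbps in (100, 1_000, 10_000):
        print(f"{mbps:>6} Mbit/s: {transfer_days(200, mbps):.1f} days")
```

Under these assumptions, 200 TB takes roughly 23 days over a dedicated 1 Gbit/s link, and most labs share far less than that. A box of disks in the mail starts to look fast.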
When it comes to raw data, S3 is a fantastic solution, but what about all the data living in relational databases? Amazon knows that an infinitely scalable relational database is impossible to engineer. As an alternative, AWS has offered SimpleDB, a non-relational distributed database service. In addition, there’s Elastic Block Store (EBS), which provides elastic disk storage that many customers use underneath their own managed MySQL, Postgres, and Oracle instances. Amazon realized its customers were spending too much time managing MySQL on top of EC2 and EBS. Relational Database Service (RDS) gives AWS users an API for a self-contained MySQL database instance without having to launch new EC2 servers or deal with EBS volumes and snapshots. Currently in beta, RDS supports up to 20 databases per customer, each allowing up to 1 TB of storage. There’s nothing to install, configure, or tune. Simply issue a few commands to launch a fully functional database server with the same on-demand pricing we’ve come to appreciate from AWS.
Perhaps the most interesting announcement from AWS is a new pricing model called Spot Instances. While the 10 cents/hour on-demand pricing of EC2 is what initially made the service so popular, James Hamilton, vice president of AWS, calls Spot Instances “a fundamental innovation in how computation is sold.” Spot pricing allows customers to bid on instances, effectively balancing the peak and off-peak capacity of EC2. Under this model, the spot market drives EC2 pricing. If demand is low, you pay less; if demand is high, you pay more. Workloads with soft time constraints, such as compression, encryption, and exhaustive sampling, can be processed at a potentially lower cost than standard EC2 rates.
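The bidding idea is easier to see with a toy simulation. The sketch below is a simplification under stated assumptions: the hourly spot prices are invented, the function name is hypothetical, and the model assumes an instance runs in any hour when the spot price is at or below your bid and is billed at the spot price for that hour. It ignores the cost of interrupted work.

```python
ON_DEMAND_CENTS = 10  # the 10 cents/hour on-demand rate cited in the text


def simulate_spot(prices_cents, bid_cents):
    """Return (hours_run, total_cost_cents) for a series of hourly spot prices.

    Assumed model: the instance runs only in hours where the spot price
    does not exceed the bid, and each such hour is billed at the spot price.
    """
    hours_run = 0
    total_cost = 0
    for price in prices_cents:
        if price <= bid_cents:
            hours_run += 1
            total_cost += price
    return hours_run, total_cost


if __name__ == "__main__":
    # Invented hourly spot prices in cents, for illustration only
    prices = [3, 4, 12, 5, 3, 11]
    hours, cost = simulate_spot(prices, bid_cents=5)
    print(f"ran {hours} of {len(prices)} hours for {cost} cents")
    print(f"same hours on demand: {hours * ON_DEMAND_CENTS} cents")
```

With these made-up prices, a 5-cent bid gets four of the six hours for 15 cents, against 40 cents at the on-demand rate. The trade-off is that the job sits idle during the two expensive hours, which is why the model suits workloads with soft deadlines.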
Amazon is actively engaged in making life easier in the AWS world. According to Vogels, it’s not just about enterprise cost savings but agility in the cloud. “This is not a standard product that is finished. It’s been a continuous improvement process since Day 1.”
Adam Kraut is a scientific consultant at the BioTeam. He can be reached at firstname.lastname@example.org