Uber-Cloud Experiment Provides Insight into Computing-as-a-Service

December 6, 2012

By Wolfgang Gentzsch and Burak Yenier  

Guest Commentary | After a fast-paced three months, Round 1 of the Uber-Cloud Experiment concluded last month, revealing roadblocks in the computing-as-a-service landscape, pointing to solutions, and setting the stage for Round 2, which launched December 1.  

End users can achieve many benefits by gaining access to additional compute resources beyond their current internal resources. Arguably the most important are agility gained by speeding up product design cycles through shorter simulation run times, and superior quality achieved by simulating more sophisticated geometries or physics, or by running many more iterations to look for the best product design. 

Tangible benefits like these make remote Computing-as-a-Service very attractive. But how far are we from an ideal Computing-as-a-Service or HPC-in-the-Cloud model?  

To address this question, we organized the Uber-Cloud Experiment to explore the end-to-end process of accessing remote resources in computer centers and in HPC Clouds, and to study and overcome the potential roadblocks. The experiment kicked off in July 2012 and brought together more than 160 organizations representing industry end-users with their applications, software providers, computing and storage resource providers, and experts, organized into 25 international teams.   

Tackling Real World Problems  

Teams were built around ongoing projects chosen by the end-users. End-users and software providers were paired, assigned an expert, and matched with a suitable resource provider. Each team’s goal was to complete the project while charting its way around the hurdles it identified. Teams communicated with the organizers at certain points in the process, although generally the teams were autonomous.  

Intentionally, we performed the first round of this experiment manually—that is, not via an automated service—because we believe the technology is not the challenge anymore; rather it’s the people and their processes, and that’s what we wanted to explore.  

Reports from across the teams revealed common roadblocks and proposed solutions. The most important are the following. To set the right level of expectations, define goals that are incrementally better than the current capabilities of your organization, technology infrastructure, and processes. Selecting a reliable resource provider with adequate available resources is paramount.  

Start-up concerns included security and privacy of raw data, models, and results. The teams’ experiences showed success when end-users clearly documented security and privacy requirements at the beginning of the project. Unpredictable costs were often a major problem in securing a budget for a given project. Therefore, automated, policy-driven monitoring of usage and billing was essential to keep costs under control. Limited availability of resources sometimes led to long delays. To speed up the resource allocation process, we recommend that resource providers set up queues specific to the end-user's needs and assign the queue during the registration process. Also, resource providers could develop self-service knowledge base tools, which would increase the efficiency of their support processes.  
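
The idea of policy-driven usage monitoring can be illustrated with a minimal Python sketch. It is not drawn from any team's actual tooling: the `BudgetPolicy` class, the flat per-core-hour rate, and the 80 percent warning threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BudgetPolicy:
    """Hypothetical policy: a spending cap plus a warning threshold."""
    budget_usd: float           # total budget approved for the project
    warn_fraction: float = 0.8  # warn once this share of the budget is spent


def usage_cost(core_hours: float, rate_usd_per_core_hour: float) -> float:
    """Cost of the usage recorded so far, assuming a flat hourly rate."""
    return core_hours * rate_usd_per_core_hour


def check_policy(policy: BudgetPolicy, spent_usd: float) -> str:
    """Return 'ok', 'warn', or 'stop' depending on spend against the policy."""
    if spent_usd >= policy.budget_usd:
        return "stop"
    if spent_usd >= policy.warn_fraction * policy.budget_usd:
        return "warn"
    return "ok"
```

A monitoring job could call `check_policy` after each billing update and hold new submissions once it returns `"stop"`. A real provider's pricing would likely be more complex than a single flat rate.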

Finally, in some cases, incompatible software licensing models hindered adoption of Computing-as-a-Service. However, some forward-looking independent software vendors (ISVs) already offer successful on-demand licensing models, from which we believe the others can learn.  

Next Steps  

Round 2 of the experiment launched December 1, with significant improvements over Round 1. A professional project management tool, including more automation, will serve more participants and more teams. Application areas will now include the life sciences, and teams will be given better guidance through an end-to-end process broken down into 20 discrete, well-defined steps.  

Over 250 participants have already registered for Round 2, and we are still seeking computational biologists and any others who would like to be involved. To participate in Round 2 or just monitor it closely, and to receive the final Round 1 report with all findings, register at http://www.compbioexperiment.com  


Use Case: Simulating new probe design for a medical device  

By Chris Dagdigian, The BioTeam and Felix Wolfheimer, CST    

Figure: Front-end showing a simulation model and two solvers in action.

Our end-user’s corporation is one of the world's leading analytical instrumentation companies. They use computer-aided engineering for virtual prototyping and design optimization on sensors and antenna systems used in medical imaging devices.  

Periodically, the end-user needs large compute capacity in order to simulate and refine potential product changes and improvements. The periodic nature of the computing requirements makes it difficult to justify capital expenditure for complex assets that may end up sitting idle for long periods of time. To date, the company has invested in a modest amount of internal computing capacity sufficient to meet base requirements. Additional computing resources would allow the end-user to greatly expand the sensitivity of current simulations and may enable new product and design initiatives previously written off as "untestable". 

Our software is CST STUDIO SUITE, a popular commercial application for 3D electromagnetic simulations. We are currently operating in the Amazon cloud and have successfully completed a series of architecture refinements and scaling benchmarks. Our hybrid cloud-bursting architecture allows local computing resources residing at the end-user site to be utilized along with Amazon cloud-based resources. At this point in the project, we are still exploring the scaling limits of the Amazon EC2 instance types and are beginning new tests and scaling runs designed to test computing task distribution via MPI (the Message Passing Interface). The use of MPI will allow us to leverage different EC2 instance type configurations. We are now routinely running simulations that would not be technically possible using only the local resources of our end-user. We have also begun testing the Amazon EC2 Spot Market, in which cloud-based assets can be obtained from an auction-like marketplace offering significant cost savings over traditional on-demand hourly prices.
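
The cloud-bursting idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the architecture the team actually built: the `split_jobs` function and the greedy first-fit rule are hypothetical, and each job is reduced to a single core count.

```python
def split_jobs(job_cores, local_cores):
    """Greedy placement: keep a job local while capacity remains, else burst it.

    job_cores   -- list of core counts, one per simulation job (assumed input)
    local_cores -- cores available at the end-user site
    Returns two lists of job indices: (run_locally, run_in_cloud).
    """
    local, cloud = [], []
    free = local_cores
    for index, cores in enumerate(job_cores):
        if cores <= free:
            # The job fits in the remaining local capacity.
            local.append(index)
            free -= cores
        else:
            # Not enough local cores left, so send this job to the cloud.
            cloud.append(index)
    return local, cloud
```

For example, with 24 local cores and jobs needing 8, 16, 4, and 32 cores, the first two jobs fill the local machines and the remaining two are sent to cloud instances. A production scheduler would also have to weigh data transfer, licensing, and instance pricing.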