By Wolfgang Gentzsch and Burak Yenier
March 12, 2013 | Guest Commentary | The UberCloud CompBio Experiment explores the end-to-end process of accessing and using remote computing resources as a service, and learning how to resolve the many roadblocks. The experiment launches its third round on April 1, and the call is open now for participants.
Teams are built around ongoing projects chosen by the end-users. End-users and software providers are paired, assigned a computing expert, and matched with a suitable resource provider. Each team’s goal is to complete the project while charting a way around the hurdles it identifies.
The open experiment started last July with 160 organizations and 25 teams, helping industry end-users explore access to and use of remote computing resources available from computing centers and from the cloud. The second round of the experiment will conclude this month, and currently has more than 350 organizations and 35 teams. (See "Uber-Cloud Experiment Provides Insight into Computing-as-a-Service.")
For round 3 beginning April 1, we again invite industry end-users, software providers, computational biology experts, and resource providers from computing centers and from the cloud to join the experiment and collaboratively explore the end-to-end process of remote Computing-as-a-Service.
We are expecting about 400 organizations collaborating on 50 new teams built around industry end-users’ applications on remote computing resources. In addition to our current focus areas of life sciences and CAE, we now also invite big data end-users and software, services, and consulting providers to join this experiment.
This round will be conducted professionally, with more automation, and will be more user-friendly than the previous two rounds. We will also provide three levels of support: front line (within each team), second level (UberCloud mentors), and third level (software and hardware providers). Finally, we will further grow the interactive UberCloud Exhibit services directory.
The Benefits of Participating in the Experiment
The UberCloud Experiment has been designed to drastically reduce many of the barriers to adopting cloud computing. By participating in this experiment and moving their engineering or big data application onto a remote computing resource, end-users can expect a long list of real benefits, such as professional match-making with suitable service providers; free, on-demand access to hardware, software, and expertise during the experiment; crowdsourcing by building relationships with community members, helping each other, and providing valuable feedback to optimize the platform of the experiment; and free access to the services directory (the interactive UberCloud Exhibit) with a growing number of engineering cloud services.
The list of benefits for service providers (software, resources, and expertise) is similarly rich. To name a few: getting immediate, constructive feedback from the experiment end-users on how to fine-tune your services; gaining deeper, practical insight into a new market and service-oriented business model; experimenting risk-free, with no penalty for failure, so you can improve your services on the fly during the experiment; getting in touch with potential customers; and gaining public attention by becoming part of widely published success stories.
Lessons Learned, Recommendations
Result of one of the CompBio Experiment teams: Development of stents for a narrowed artery after balloon angioplasty to widen the artery and improve blood flow.
During the first round, our 25 teams reported the roadblocks they experienced during the course of their projects. They also provided information on how they resolved them (or not), and offered lessons learned and recommendations. We believe these experiences are useful in informing other work in the Cloud.
Information security, privacy
Guarding the raw data, processing models and the resulting information is paramount to successful remote HPC. Although the HPC experiment clearly declared information security as being out of scope, we’ve discovered that it is a requirement even for an experiment like this.
Any project in industry has a certain level of security, high or low. End-users should start by clearly documenting security and privacy requirements at the beginning of each project. These requirements strongly influence the selection of the resource provider and the HPC experts who will be working on the project. We also recommend signing NDAs with outside providers.
Businesses operate around budgets, and HPC projects are no different. Unpredictable costs can be a major problem in securing a budget for a given project. During the first round of the Experiment we ran into multiple projects that exhausted their experiment budget and had to terminate before reaching the desired goals, confirming that this roadblock is real.
If not monitored closely, pay-per-use billing can result in unpredictable cost structures. Automated, policy-driven monitoring of usage and billing is essential to keep costs under control. Select a monitoring system early on and do not scale up without configuring it to match your needs. Also, in the HPC market, software licensing costs may be tied to usage, such as the amount of data processed or the CPU core hours used. In such scenarios the monitoring and billing solutions must be flexible enough to account for the cost of software licenses, so that they provide a complete view of the cost structure.
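As a rough illustration of what policy-driven cost monitoring could look like, here is a minimal Python sketch. It is not taken from any tool used in the Experiment: the names `UsageRecord`, `Rates`, `total_cost`, and `check_budget`, the rate structure, and the 80% warning threshold are all assumptions made for the example. It combines compute charges with usage-based license charges (per core hour and per GB processed) and compares the total against a project budget.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UsageRecord:
    """One metered job (hypothetical structure)."""
    core_hours: float
    data_gb: float


@dataclass
class Rates:
    """Assumed pay-per-use prices; real providers and ISVs will differ."""
    compute_per_core_hour: float
    license_per_core_hour: float
    license_per_gb: float


def total_cost(records: Iterable[UsageRecord], rates: Rates) -> float:
    """Sum compute charges and usage-based license charges."""
    cost = 0.0
    for r in records:
        cost += r.core_hours * rates.compute_per_core_hour
        cost += r.core_hours * rates.license_per_core_hour
        cost += r.data_gb * rates.license_per_gb
    return cost


def check_budget(records: Iterable[UsageRecord], rates: Rates,
                 budget: float, warn_fraction: float = 0.8) -> str:
    """Return 'ok', 'warn', or 'stop' according to a simple budget policy."""
    spent = total_cost(records, rates)
    if spent >= budget:
        return "stop"
    if spent >= warn_fraction * budget:
        return "warn"
    return "ok"


if __name__ == "__main__":
    rates = Rates(compute_per_core_hour=0.10,
                  license_per_core_hour=0.05,
                  license_per_gb=0.02)
    jobs = [UsageRecord(core_hours=1000, data_gb=500)]
    print(check_budget(jobs, rates, budget=180.0))
```

In practice a check like this would run automatically after each job or on a schedule, so that a team is warned before the budget is exhausted rather than after.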
Lack of easy, intuitive self-service registration and administration
A successful cloud computing model depends heavily on on-demand service availability through highly automated, intuitive, self-service registration and administration processes designed for mass public usage. Traditionally, HPC systems are designed for repeated use by a small number of power users, for whom sophisticated self-service capabilities may not be such a strong need. Some resource providers have bureaucratic registration and administration processes, where the end-user’s application for access or resources has to be reviewed and approved manually. We’ve noticed that such processes can be rather slow and present a roadblock. Some of the HPC Experiment project teams ran into severe registration and administration delays resulting from the lack of self-service systems.
Automated, rules-based, instant decision-making capabilities should be utilized. To speed up the resource allocation process, we recommend that resource providers consider setting up queues specific to end-user needs and assigning the queue during the registration process. From this point on, relatively simple mechanisms such as tokens, coupon codes, and credit-card-based automated billing can be used to track usage per end-user without necessarily having to approve each resource allocation request manually.
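The recommendation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Allocator` class, its `register` and `request` methods, and the queue name are hypothetical and do not describe any resource provider's actual system. A queue and a prepaid core-hour token are assigned once at registration, and every later request is approved or rejected by rule, with no manual review.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Token:
    """Prepaid allowance issued at registration (hypothetical)."""
    queue: str
    core_hours_left: float


class Allocator:
    """Rules-based allocation: no manual approval per request."""

    def __init__(self) -> None:
        self._tokens: Dict[str, Token] = {}

    def register(self, user: str, queue: str, core_hours: float) -> Token:
        """Assign the end-user a queue and an initial core-hour balance."""
        token = Token(queue=queue, core_hours_left=core_hours)
        self._tokens[user] = token
        return token

    def request(self, user: str, core_hours: float) -> Optional[str]:
        """Approve instantly if the balance covers the request.

        Returns the queue name on approval, or None on rejection.
        """
        token = self._tokens.get(user)
        if token is None or core_hours <= 0:
            return None
        if core_hours > token.core_hours_left:
            return None
        token.core_hours_left -= core_hours
        return token.queue


if __name__ == "__main__":
    alloc = Allocator()
    alloc.register("alice", queue="cae-small", core_hours=100)
    print(alloc.request("alice", 60))  # approved: balance covers it
    print(alloc.request("alice", 60))  # rejected: only 40 core hours left
```

The remaining balance on each token doubles as the per-user usage record, which is what makes a separate manual approval step unnecessary.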
Incompatible software licensing models
It is clear that without the participation of the software vendors in developing compatible software licensing models, the adoption of HPC in the Cloud will be significantly slower. We looked closely at how much friction was caused by software licensing models of the providers which were willing to participate in the Experiment. Although many software providers are diligently working on making on-demand licenses available, the landscape is difficult to navigate. It’s hard to predict which software provider has on-demand licensing models fully developed and it’s even more difficult to know how to work with their significantly different models. Many of our teams ran into license management challenges and required extra help to move forward.
Many software providers are already working on compatible licensing models. We recommend that end-users contact their software providers early on and include them in their HPC initiatives. Alternatively, the HPC Experiment is a great way to work hand-in-hand with a resource provider on testing a licensing model compatible with HPC in the Cloud. Some forward-looking ISVs (also part of our experiment) already have successful on-demand licensing models, which we believe the others can benefit from.
High expectations, disappointing results
As consumers, we want it all and we want it now! High expectations are great as long as they are reasonable, but they quickly become a roadblock if they are unattainable. A couple of HPC Experiment teams defined projects where the goals were simply set too high and the team was unable to get satisfactory results.
Set goals that are incrementally better than the current capabilities of your organization, technology infrastructure, and processes. The HPC Experiment is a great way to explore how on-demand remote capacity can be used to supplement your current infrastructure. Improving the meshing of your models, on the other hand, can be a separate project. Similarly, jumping from no experience with remote HPC to running production jobs on remote HPC is not advised; consider starting with limited-scope test jobs.
Reliability and availability of resource providers
Resource providers get peaks and valleys in load. They have to deal with technology upgrade cycles, new software roll-outs, and high-profile projects that consume just about every CPU core they have. Some of our teams had to wait for weeks before capacity could be allocated by the resource provider. Their projects were severely delayed, by several weeks on average. As the number of projects assigned to a single resource provider increased, the teams ran into more schedule conflicts.
Selecting a reliable resource provider with adequate available resources is paramount. However, this is easier said than done, since it is a moving target. We recommend seeking information on the reliability and availability of each vendor before partnering with them, and following a multi-vendor strategy.
The need for a professional HPC cloud provider
In Round 1, one end-user worked with an HPC center. After Round 1, this end-user asked to work with a professional HPC Cloud services provider, for several reasons. A primary motivation for participating in the Experiment is to be able to continue using the resource provider after the experiment is done, through a commercial arrangement. The end-user was not sure how this would work with the HPC center that was the team's partner, and assumed that it would not be possible in the near term. In addition, the team didn't have permission to install new packages or (re)configure existing ones at the HPC center, because this basically requires root/administrative access.
If you are interested in participating in Round 3 of the experiment, starting April 1, find more details on our website: http://www.compbioexperiment.com/.