The UberCloud Experiment - Exploring Life Sciences in the Cloud

June 25, 2014

By Wolfgang Gentzsch and Burak Yenier 
Editor’s Note: The UberCloud Compendium with Life Sciences Case Studies is an invaluable resource for engineers, scientists, managers and executives who believe in the strategic importance of technical computing as a service in the cloud. The Compendium is a collection of selected case studies from the participants of the UberCloud Experiment. You will benefit from the candid descriptions of problems encountered, problems solved, and lessons learned. This second UberCloud Compendium is available for free download.
June 25, 2014 | Guest Commentary | Cost savings, shorter time to market, better quality, fewer product failures: the benefits that engineers and scientists can expect from using technical computing in their research, design, and development processes can be huge. Yet relatively few scientists and manufacturers use servers when designing and developing their products on computers. The vast majority still perform virtual prototyping or large-scale data modeling on workstations or laptops. Not surprisingly, many of them run into problems because their machines lack the necessary performance. More accurate geometry or physics, for instance, requires much more memory than a desktop can be fitted with.
There are two realistic options today to acquire additional computing: buy a server or use a Cloud solution. Many system vendors have developed a complete set of products, solutions and services for high performance computing (HPC), but buying an HPC server for a small to medium enterprise may still be more of an investment than many wish to make.  
A Cloud solution allows engineers and scientists to keep using their workstation for daily design and development work and to "burst" the larger, more complex jobs into the Cloud when needed. Thus, users have access to quasi-infinite computing resources that offer higher-quality results. A Cloud solution helps reduce capital expenditure, offers greater business agility by dynamically scaling resources up and down as needed, and is only paid for when used. 
The UberCloud Experiment Accelerates Computing in the Cloud  
The UberCloud Experiment provides a platform for scientists and engineers to explore, learn and understand the end-to-end process of accessing and using Cloud resources, to identify concerns and resolve roadblocks. The principle is that end-users, software providers, resource providers, and computing experts collaborate in teams to jointly solve the end-user’s application in the Cloud. 
Started in July 2012, the UberCloud Experiment has attracted 1500 organizations from 72 countries. Since then, the organizers have built 152 teams in fluid dynamics, material analysis, computational biology and other technical computing domains, and have published many case studies reporting on the different applications, experiences, and lessons learned. The UberCloud TechTalk and a virtual exhibition have been added, and the first Compendium, sponsored by Intel and published in June 2013, includes 25 case studies from digital manufacturing in the Cloud.
The second Compendium of UberCloud case studies has just been published, sponsored by Intel and Bio-IT World, and can be downloaded for free from the UberCloud Website. 
Cloud Case Studies from the UberCloud Experiment 
For insight into the wealth of practical use cases, here are four of the 152 UberCloud Experiment teams demonstrating the wide spectrum of applications in the Cloud; teams 61, 62, 70, and 89 use open source software like Gromacs and Nimbus’ SeqWare. All of these case studies are available for free download in the Compendium. 
Team 61: Molecular dynamics of the mutant PI3Kα protein 
The goal of this project was to gain insights into the oncogenic mechanism of two commonly expressed PI3Kα mutants by studying their conformational changes with molecular dynamics (MD) simulations, in comparison with the PI3Kα wild-type (normal, non-cancerous) protein. This case study examined the use of cloud computing for MD simulations of mutant PI3Kα with GROMACS.
The protein PI3Kα is depicted in ribbons and is placed in a water box, shown as red dots.
Team 62: Cardiovascular medical device simulations in the Cloud 
The project investigated flow through a patient-specific blood vessel and represents a typical use case of computational fluid dynamics for cardiovascular flow.  The patient-specific geometry is extracted from CT image data obtained during a normal medical imaging exam. The geometry, which is a triangulated surface mesh, contains the inferior vena cava (IVC), the right and left iliac veins, and the right and left renal veins. 
Computational domain including the iliac veins (inflow), renal veins (inflow), and the inferior vena cava (IVC).  The pressure on the surface of the vessels is shown; a slice, colored by velocity, down the center of the IVC is also shown. 
Team 70: Next Generation Sequencing Data Analysis  
In this experiment, the team explored using cloud computing for next-generation sequencing data analysis. They used SeqWare, an open source framework, running on AWS to perform variant calling, which is the reporting of differences between input sample DNA and a reference genome. The input was targeted exome data generated in-house on Ion Torrent's Personal Genome Machine.
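To make the idea of variant calling concrete, here is a deliberately minimal sketch in Python. It is not SeqWare's method or Team 70's pipeline: it assumes the sample has already been aligned to the reference as an equal-length string, and it reports only single-base differences. The names `Variant` and `call_snvs` are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    position: int  # 1-based position in the reference
    ref: str       # base in the reference genome
    alt: str       # base observed in the sample


def call_snvs(reference: str, sample: str) -> List[Variant]:
    """Report single-base differences between two pre-aligned sequences.

    Assumption: both strings have the same length and are already aligned.
    Real variant callers work from sequencing reads, model base quality and
    coverage, and also handle insertions and deletions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    variants = []
    for index, (ref_base, alt_base) in enumerate(
        zip(reference.upper(), sample.upper())
    ):
        # Skip ambiguous bases such as 'N' and report only clear mismatches.
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            variants.append(Variant(index + 1, ref_base, alt_base))
    return variants
```

Calling `call_snvs(reference, sample)` returns a list of `Variant` records, one per mismatching position. A production workflow would replace this comparison with an aligner and a statistical caller.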
Team 89: Performance Analysis of GROMACS Molecular Dynamics for Simulating Enzyme Substrate in the Cloud 
This team focused on evaluating the performance of double precision MPI-enabled GROMACS 4.6.3 on 25 Bullx 510 blades (each 16-core, total 400 cores) for scaling molecular dynamics simulations. The activities were organized into three tasks: install GROMACS on the Bull extreme factory cluster and optimize its performance; install accessory tools to analyze simulation data; and test different multi-scale molecular systems involving enzyme substrate complexes on the cluster. The starting point of eEF (enzyme engineering framework), Polyclones’ framework for enzyme engineering, is to conduct molecular dynamics (MD) studies and calculate different parameters using the MD trajectories.
Protein enclosed in a box of water and ions. We can study proteins in atomic detail, down to the movements of individual water molecules (red/white ball-and-stick model), ions (purple), and the protein itself, shown in the surface model (blue and green represent the macromolecular dimer form of the protein).
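For readers who want to quantify a scaling study like Team 89's, the sketch below shows the standard speedup and parallel-efficiency formulas in Python. Only the blade and core counts come from the text above; the function names are hypothetical, and no timings from the case study are assumed or reproduced here.

```python
# Cluster size as stated in the Team 89 description.
BLADES = 25
CORES_PER_BLADE = 16
TOTAL_CORES = BLADES * CORES_PER_BLADE  # 400 cores


def speedup(baseline_seconds: float, scaled_seconds: float) -> float:
    """How many times faster the scaled run is than the baseline run."""
    return baseline_seconds / scaled_seconds


def parallel_efficiency(
    baseline_seconds: float,
    baseline_cores: int,
    scaled_seconds: float,
    scaled_cores: int,
) -> float:
    """Fraction of ideal scaling achieved (1.0 means perfect scaling).

    Efficiency is the measured speedup divided by the increase in core
    count, which is equivalent to comparing core-seconds of the two runs.
    """
    core_ratio = scaled_cores / baseline_cores
    return speedup(baseline_seconds, scaled_seconds) / core_ratio
```

With wall-clock times measured for the same simulation on one blade and on all blades, `parallel_efficiency(t_one_blade, CORES_PER_BLADE, t_all_blades, TOTAL_CORES)` indicates how much of the added hardware translates into faster results.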