Toward Ubiquitous Computational Life Sciences

By Wolfgang Gentzsch

February 23, 2016 | In April 1956, a refitted oil tanker carried fifty-eight shipping containers from Newark to Houston and changed global trade.

Today’s emerging software application containers and their growing importance for computational science and engineering are poised to launch a similar revolution. We have never been so close to ubiquitous computational life science (CLS) for every researcher and engineer, accessible everywhere.

High-performance computing (HPC) and life sciences software tools continue to progress, but the next big step toward ubiquitous CLS will come from novel software container technology, which will dramatically simplify software packaging and porting, access and use, and maintenance and support. Finally, CLS will be in the hands of every engineer.

Ubiquitous Computing and Xerox PARC’s Mark Weiser

In 1998, Mark Weiser from Xerox PARC said:

“Ubiquitous computing names the third wave in computing, just now beginning. First were mainframes, each shared by lots of people. Now we are in the personal computing era, person and machine staring uneasily at each other across the desktop. Next comes ubiquitous computing, or the age of calm technology, when technology recedes into the background of our lives.”

Weiser looks at ‘ubiquitous computing’ with the eyes of an engineer and scientist. According to Weiser, these users shouldn’t care about the ‘engine’ under the hood; all they care about is ‘driving’ safely, reliably, and easily: getting in the car, starting the engine, and pulling out into traffic.

In a simplified high-level view, CLS technology is split into two parts: software and hardware. Both are immensely complex in themselves, and their mutual interaction is highly sophisticated. For the computing part to be ubiquitous from an end user’s point of view, Weiser suggests making it disappear into the background of our (business) lives.

Indeed, in the last decade, we were able to make big steps toward reaching this goal: we removed some complexity from accessing and using CLS by developing user-friendly interfaces, with a trend toward what some people call ‘appification’. We abstracted the application layer more and more from the complex physical architecture underneath, through server virtualization with Virtual Machines (VMs). This achievement came with great benefits for both the IT folks and the end users. Servers provision faster; security is enhanced; hardware vendor lock-in is reduced; uptime increases; disaster recovery is improved; the life of older applications is extended; and moving to the cloud gets easier. With server virtualization we came closer to ubiquitous computing.

Ubiquitous Computing in the Life Sciences—with Software Containers

But server virtualization did not gain a foothold in HPC, especially for highly parallel applications requiring low-latency, high-bandwidth inter-process communication. And multi-tenant servers, with VMs competing with each other for hardware resources such as I/O, memory, and network, often slow down compute-intensive parallel applications.

Because VMs failed to gain a presence in high-performance life sciences computing, the challenges of software distribution, administration, and maintenance kept CLS systems locked up in closets, available to only a relatively small community.

That changed in 2013 with the introduction of Docker Linux Containers. The key practical difference between Docker and VMs is that Docker is a Linux-based system that makes use of a userspace interface for the Linux kernel containment features. Rather than being a self-contained system in its own right, a Docker container shares the Linux kernel with the operating system running on the host machine. It also shares the kernel with other containers that are running on the same host. That makes Docker containers extremely lightweight and well suited for computational life sciences.
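The shared-kernel point can be checked directly. The following is a minimal Python sketch, not taken from UberCloud's tooling: run once on a Linux host and once inside a container on that host, it should report the same kernel release in both places. The function name `describe_environment` is hypothetical, and the `/.dockerenv` check is only a heuristic that holds for typical Docker setups.

```python
import os
import platform


def describe_environment() -> dict:
    """Report the kernel release and a best-guess container flag.

    Assumption: Docker usually creates /.dockerenv inside a container.
    This is a heuristic, not a guaranteed detection method.
    """
    return {
        "kernel_release": platform.release(),
        "in_docker_guess": os.path.exists("/.dockerenv"),
    }


if __name__ == "__main__":
    info = describe_environment()
    print(f"kernel release: {info['kernel_release']}")
    print(f"looks like a Docker container: {info['in_docker_guess']}")
```

A VM running the same script would instead report the release of its own guest kernel, which is the practical difference described above.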

Still, it took us at UberCloud about one year to develop the macro-service production-ready container for HPC. We based it on micro-service Docker container technology, enhanced with many useful HPC features, and tested it with a dozen applications and life sciences workflows, on about a dozen different single- and multi-node HPC cloud resources. These high-performance interactive software containers, whether they are on-premises or on public or private clouds, bring a number of core benefits to the otherwise traditional HPC environments:

Packageability: Bundle applications together with libraries and configuration files:

A container image bundles the needed libraries and tools as well as the application code and the necessary configuration for these components to work together seamlessly. There is no need to install software or tools on the host compute environment, since the ready-to-run container image has all the required components. The challenges of library dependencies, version conflicts, and configuration disappear, as do the huge replication and duplication efforts in our community when it comes to deploying life sciences software.
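As an illustration of this bundling, here is a minimal Python sketch that renders the text of a Dockerfile combining a base image, system libraries, a configuration file, and the application's start command. It is a sketch under stated assumptions, not UberCloud's actual build recipe: the function name `render_dockerfile`, the image tag, the package name, the `/opt/app` path, and the `my_solver` command are all hypothetical, and the `apt-get` line assumes a Debian- or Ubuntu-based base image.

```python
import json


def render_dockerfile(base_image, system_packages, config_file, app_command):
    """Return Dockerfile text that bundles libraries, config, and the app.

    Assumptions (hypothetical): a Debian/Ubuntu base image, so packages are
    installed with apt-get, and /opt/app as the application directory.
    """
    lines = [f"FROM {base_image}"]
    if system_packages:
        packages = " ".join(system_packages)
        lines.append(
            "RUN apt-get update && apt-get install -y "
            + packages
            + " && rm -rf /var/lib/apt/lists/*"
        )
    # Ship the configuration inside the image so the host needs no setup.
    lines.append(f"COPY {config_file} /opt/app/")
    lines.append("WORKDIR /opt/app")
    # Exec-form CMD is a JSON array, so json.dumps produces valid syntax.
    lines.append("CMD " + json.dumps(list(app_command)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_dockerfile(
            base_image="ubuntu:22.04",
            system_packages=["libopenmpi-dev"],
            config_file="solver.conf",
            app_command=["my_solver", "--config", "solver.conf"],
        )
    )
```

Keeping the recipe in one generated file means the dependency list lives with the image rather than on each host.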

Portability: Build container images once, deploy them rapidly in various infrastructures:

Having a single container image makes it easy for the workload to be rapidly deployed and moved from host to host, between development and production environments, and to other computing facilities. The container allows the end user to select the appropriate environment such as public or private cloud, or an on-premises server. There is no need to install new components or perform setup steps when using another host.
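One way to picture moving a single image between hosts is sketched below in Python. It only builds the command lines and does not execute them. The helper names, the image tag `cls-app:1.0`, the archive name, and the `my_solver` command are hypothetical; the sketch assumes the standard Docker CLI subcommands `save`, `load`, and `run` are available on both hosts.

```python
import shlex


def export_command(image, archive):
    """Command that writes an image to a tar archive on the source host."""
    return ["docker", "save", "--output", archive, image]


def import_command(archive):
    """Command that loads the archived image on the destination host."""
    return ["docker", "load", "--input", archive]


def run_command(image, command):
    """Command that starts the application from the same, unchanged image."""
    return ["docker", "run", "--rm", image, *command]


if __name__ == "__main__":
    steps = [
        export_command("cls-app:1.0", "cls-app.tar"),
        import_command("cls-app.tar"),
        run_command("cls-app:1.0", ["my_solver", "--config", "solver.conf"]),
    ]
    for step in steps:
        # Print only; pass a step to subprocess.run to execute it.
        print(shlex.join(step))
```

In practice an image registry often replaces the archive step, but the idea is the same: the image that was tested is the image that runs.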

Accessibility: Bundle tools such as SSH into the container for easy access:

The container is set up to provide easy access via tools such as VNC for remote desktop sharing. In addition, containers running on computing nodes enable both end users and administrators to have a consistent implementation regardless of the underlying compute environment.

Usability: Provide familiar user interfaces and user tools with the application:

The container has only the required components to run the application. By eliminating other tools and middleware, the work environment is simplified and the usability is improved. The ability to provide a full featured desktop increases usability (especially for pre- and post-processing steps) and reduces training needs. Further, the CLS containers can be used together with a resource manager such as Slurm or Grid Engine, increasing the usability even further by eliminating many administration tasks.
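To make the resource-manager pairing concrete, the following Python sketch renders a small Slurm batch script that launches one container. It is illustrative only: the function name `render_slurm_script`, the job name, the image tag, and the `my_solver` command are hypothetical, and it assumes compute nodes are allowed to call `docker run` directly, which depends on the site.

```python
import shlex


def render_slurm_script(job_name, image, command, ntasks=1, time_limit="01:00:00"):
    """Return the text of a Slurm batch script that runs one container.

    Assumption (site-dependent): compute nodes may invoke `docker run`.
    """
    container_cmd = shlex.join(["docker", "run", "--rm", image, *command])
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        f"#SBATCH --ntasks-per-node={ntasks}",
        f"#SBATCH --time={time_limit}",
        "",
        container_cmd,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_slurm_script(
            job_name="cls-demo",
            image="cls-app:1.0",
            command=["my_solver", "--config", "solver.conf"],
            ntasks=4,
        )
    )
```

The generated text would be saved to a file and submitted with `sbatch`; the scheduler handles queuing and allocation while the container supplies the software environment.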

Test Cases

In addition, the lightweight nature of software containers suggests low performance overhead. Our own tests with real applications on several multi-host multi-container systems demonstrate that there is no significant overhead for running high performance workloads as an application software container.
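One simple way to quantify such overhead is the relative slowdown of the containerized run against the native run. The Python sketch below assumes repeated wall-clock timings of the same workload in both settings and compares their medians. The function name `relative_overhead` is hypothetical, and the numbers in the usage example are placeholders chosen only to exercise the function, not measurements from the tests described above.

```python
from statistics import median


def relative_overhead(native_seconds, container_seconds):
    """Return (container - native) / native using median wall-clock times.

    A value of 0.02 means the containerized run took 2% longer.
    """
    native = median(native_seconds)
    container = median(container_seconds)
    if native <= 0:
        raise ValueError("native timings must be positive")
    return (container - native) / native


if __name__ == "__main__":
    # Placeholder timings in seconds, NOT real measurements.
    native_runs = [100.0, 101.0, 99.0]
    container_runs = [101.0, 102.0, 100.0]
    overhead = relative_overhead(native_runs, container_runs)
    print(f"relative overhead: {overhead:.1%}")
```

Using the median rather than the mean keeps a single noisy run from dominating the comparison.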

[Figure: artery tree]

We’ve tested the concept, using Docker containers for molecular dynamics, open source clinical cancer genomics pipelines in the cloud, and more. In one case, the Department of Electrical & Computer Engineering at Carnegie Mellon University was able to model vascular hemodynamics using UberCloud’s OpenFOAM/ParaView software container on AWS. Modeling is useful for identifying patient-specific biomechanical traits of vascular disease, and begins with 3D segmentation and reconstruction of a smooth surface model of the vascular region of interest from patient-specific medical image volumes obtained as a series of tomographic slices using either magnetic resonance imaging (MRI) or computed tomography (CT) imaging. The resulting 3D surface model is used to generate a discretized volume mesh for the purpose of analysis of flow using computational fluid dynamics solution techniques.

In this study, blood flow inside a patient-specific right coronary artery tree was studied under unsteady flow conditions after segmentation from tomographic MRI slice images obtained across the whole human body of a male volunteer, at 2 mm intervals. The figure visualizes the hemodynamics at the instant of peak flow. (More details about this and other case studies can be found in the UberCloud Compendium.)

During the past two years, UberCloud has successfully built software application containers for GROMACS, ANSYS, LS-Dyna, CD-adapco, COMSOL, NICE, Numeca FINE/Marine and FINE/Turbo, OpenFOAM, PSPP, Red Cedar’s HEEDS, Scilab, and more. These application containers are now running on cloud resources from Advania, Amazon AWS, CPU 24/7, Microsoft Azure, Nephoscale, OzenCloud, and others.

Together with recent advances and trends in application software and in high-performance hardware technologies, the advent of lightweight, pervasive, packageable, portable, scalable, and interactive software containers that are easy to access and use, and that run seamlessly on workstations, servers, and any cloud, is bringing us ever closer to what Intel calls the democratization of high-performance technical computing. We have arrived at the age of Ubiquitous Computing where computing “technology recedes into the background of our lives.”

More information about portable application software containers can be found HERE. Container case studies sponsored by Intel and Bio-IT World with life sciences applications in the cloud are available for Download. And, quite useful for Software Providers is the site Building Your Own ‘Software as a Service’ Business in the Cloud.

Wolfgang Gentzsch is president and co-founder of the UberCloud Online Marketplace for engineers and scientists to discover, try, and buy computing on demand, in the cloud. Recently he was co-chairman of the International ISC Cloud & Big Data Conference series and advisor to the EU projects EUDAT and DEISA. He also directed the German D-Grid Initiative and was a member of the Board of Directors of the Open Grid Forum and of the US President’s Council of Advisors for Science and Technology, PCAST.