HPC In Life Sciences: Why Cloud Computing Is Now Indispensable And How Organizations Can Prepare

May 26, 2020

Contributed Commentary by Rob Lalonde

Few disciplines have seen more rapid adoption of cloud computing than life sciences research. Datasets are easily anonymized, pipelines are generally built from cloud-friendly open-source tools, and data are often shared, making the cloud a convenient meeting point. Overshadowing all of these reasons, however, is an insatiable demand for computing power.

The recent COVID-19 pandemic has shone a bright light on this challenge. In February of 2020, teams from the University of Tennessee and Oak Ridge National Laboratory (ORNL) published a paper on repurposing therapeutics for COVID-19. Because SARS-CoV-2 had recently been sequenced, researchers had a working model of the virus. Although complicated to execute, their idea was simple: why not run computer simulations to find compounds that inhibit the virus from binding the human ACE2 receptor, blocking its route into human cells? This would yield a path to identifying candidate therapeutics.

This is precisely what the researchers did, modeling roughly 8,000 known compounds from SWEETLEAD, a database used for computer-aided drug discovery. The effort yielded a shortlist of 77 promising compounds for further study.

The amount of computing power required for this kind of molecular dynamics simulation is staggering. Engineers at NVIDIA recently benchmarked GROMACS 2020, the molecular modeling tool used in the COVID-19 simulations above, and the results illustrate the scale of the problem. Modeling just 43.1 nanoseconds of molecular interactions with a single compound can tie up a state-of-the-art server with four NVIDIA V100 GPUs for 24 hours. The ORNL Summit supercomputer modeled roughly 8,000 compounds in a matter of days.
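To put those figures in perspective, here is a rough back-of-envelope calculation in Python. It assumes, purely for illustration, that each compound needs about one 24-hour run on a four-GPU server, as in the benchmark above.

    # Rough back-of-envelope using the figures cited above.
    # Assumption (illustrative only): each compound needs roughly one
    # 24-hour run on a 4x V100 server to reach a useful simulation length.
    compounds = 8000
    gpus_per_server = 4
    hours_per_compound = 24

    server_days = compounds * hours_per_compound / 24            # ~8,000 server-days
    gpu_hours = compounds * hours_per_compound * gpus_per_server  # ~768,000 GPU-hours
    print(f"{server_days:,.0f} server-days, {gpu_hours:,.0f} GPU-hours")
    # On a single 4-GPU server that is on the order of 22 years of wall-clock
    # time; spread across thousands of multi-GPU nodes it collapses into days.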

Unlike the team at the University of Tennessee, most pharmaceutical companies don't have a friend down the street with roughly 4,600 compute nodes and some 27,000 GPUs available for loan. Renting capacity as needed in public clouds is the primary way most organizations gain that kind of scale.

If not done correctly, scaling to the public cloud can lead to a host of security, cost, and data consistency problems. Over the past few years, we have worked with several leading pharmaceutical firms, helping them extend on-premises HPC clusters to the cloud and deploy dedicated but elastic cloud environments. Here is a list of design best practices that will help your organization make the most of the public cloud while avoiding the risks.

Design for portability. Bioinformatics clusters involve lots of software with complex interdependencies: operating environments, MPI versions, workload managers, container runtimes, CUDA libraries, and more. A good practice is to build custom machine images that mirror your on-premises environment and can be deployed in the cloud. Ready-to-run images let you get up and running much faster, and because the cloud environment closely resembles your on-prem environment, you are less likely to run into compatibility issues.
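As an illustration, here is a minimal sketch of capturing such an image on AWS with boto3; the instance ID, image name, and region are placeholders, and other clouds offer equivalent image-building APIs.

    # Minimal sketch (AWS/boto3 assumed; IDs and names are placeholders).
    # The instance would already be provisioned with the same OS, MPI,
    # workload manager, container runtime, and CUDA stack used on-premises.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.create_image(
        InstanceId="i-0123456789abcdef0",      # provisioned "golden" instance
        Name="hpc-compute-node-2020-05",       # versioned image name
        Description="Mirror of on-prem HPC compute node stack",
    )
    print("Custom machine image:", response["ImageId"])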

Leverage automation. Even with ready-to-run images, assembling clusters in the cloud can be hard. Users need to worry about details like filesystems, VPCs, security groups, DNS, VPNs, and more. A variety of solutions can build clusters automatically; look for ones that operate consistently across multiple clouds. Future research may involve collaboration or accessing datasets that reside in different clouds, so you will want the flexibility to run anywhere. Cloud instances start costing money from the moment they are deployed, so fast and accurate provisioning is essential. Also, make sure your chosen automation solution supports the custom images described above.
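The following is a simplified sketch of what such automation does under the hood, again assuming AWS and boto3; all IDs, the instance type, and the node count are placeholders, and real tooling also handles filesystems, DNS, VPNs, and teardown.

    # Minimal sketch (AWS/boto3 assumed; IDs are placeholders).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    nodes = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",       # custom image from the step above
        InstanceType="p3.8xlarge",             # 4x V100 GPU instance
        MinCount=16,
        MaxCount=16,
        SubnetId="subnet-0123456789abcdef0",
        SecurityGroupIds=["sg-0123456789abcdef0"],
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "cluster", "Value": "gromacs-burst"}],
        }],
    )
    instance_ids = [i["InstanceId"] for i in nodes["Instances"]]
    print(f"Launched {len(instance_ids)} compute nodes")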

Containers are your friend. Just as custom images minimize differences between on-prem and cloud infrastructure, containers encapsulate applications and make them portable. A container runtime (typically Docker or Singularity) should be deployed in the VM image, along with workflow and pipeline management tools. Whether container images are stored in a public or private registry, containers help ensure that applications run consistently on-premises and across multiple clouds. Using the same workload manager across on-premises and cloud resources further simplifies adoption for researchers and other end users.
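A pipeline step wrapped this way might look like the sketch below; the registry, image name, and paths are hypothetical, and the same command runs unchanged on-premises or in the cloud.

    # Minimal sketch: one containerized pipeline step, runnable on any host
    # with Singularity and a GPU. Registry, image name, and paths are placeholders.
    import subprocess

    IMAGE = "docker://registry.example.com/bio/gromacs:2020"   # hypothetical registry

    def run_md_step(input_tpr: str, workdir: str) -> None:
        """Run one GROMACS mdrun step inside a Singularity container."""
        subprocess.run(
            ["singularity", "exec", "--nv", IMAGE,
             "gmx", "mdrun", "-s", input_tpr, "-deffnm", "md"],
            cwd=workdir,
            check=True,
        )

    run_md_step("compound_0001.tpr", "/scratch/md/compound_0001")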

Anticipate issues at scale. We have learned through experience that there is a big difference between a 1,000-vCPU cluster and a 100,000-vCPU cluster. Scalability is an essential consideration because getting to scale is often the main reason organizations look to the cloud in the first place. Infrastructure services such as DNS, cloud APIs, and file systems tend to struggle at large scale, and different approaches may be needed.
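One common adaptation is wrapping every cloud API call in retries with exponential backoff and jitter, since throttling becomes routine at six-figure vCPU counts. A minimal, generic sketch:

    # Minimal sketch: retry a throttled cloud API call with exponential
    # backoff and jitter. In practice, catch the SDK's specific throttling error.
    import random
    import time

    def call_with_backoff(fn, max_attempts=8, base_delay=1.0):
        for attempt in range(max_attempts):
            try:
                return fn()
            except Exception:
                if attempt == max_attempts - 1:
                    raise
                delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
                time.sleep(delay)

    # e.g. call_with_backoff(lambda: ec2.describe_instances())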

Keep an eye on the bottom line. While cloud computing is convenient, a potential downside of having so much compute capacity on demand is cost. As clusters scale in the cloud, it is all too easy to overshoot budgets dramatically. You will need tools that not only monitor cloud spending but also automate decisions about which workloads can run where. Policy-based automation, along with the ability to attribute cloud spend to specific teams and workloads, is essential to ensure that enthusiastic research teams don't accidentally exceed spending limits.
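A policy check can be as simple as the following sketch; the budget figure, function name, and cost estimates are purely illustrative.

    # Minimal sketch of a budget policy: send a job to the cloud only while
    # the team's month-to-date spend stays under budget, otherwise queue it
    # on-premises. All numbers below are illustrative.
    MONTHLY_BUDGET_USD = 50_000

    def choose_venue(month_to_date_spend: float, estimated_job_cost: float) -> str:
        """Return 'cloud' or 'on-prem' based on a simple budget policy."""
        if month_to_date_spend + estimated_job_cost <= MONTHLY_BUDGET_USD:
            return "cloud"
        return "on-prem"

    print(choose_venue(month_to_date_spend=47_500.0, estimated_job_cost=1_800.0))  # cloud
    print(choose_venue(month_to_date_spend=49_500.0, estimated_job_cost=1_800.0))  # on-prem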

Whether firms are devising new diagnostic techniques, developing therapeutics, or working on vaccines, HPC is an essential tool. Genomics and molecular dynamics simulations demand vast amounts of computing power and often specialized GPUs. The level of capital investment required to maintain large, state-of-the-art HPC environments is a challenge, especially with technology evolving so quickly. Tapping cloud resources for at least some workloads seems all but inevitable for most organizations.

Robert Lalonde is Vice President and General Manager, Cloud at Univa. He brings over 25 years of executive management experience to lead Univa's accelerating growth and entry into new markets. Rob has held executive positions at multiple successful high-tech companies and startups, with a multidisciplinary background spanning sales, marketing, and business development, as well as CEO and board roles. He can be reached at rob@univa.com.