Optimizing Storage to Address Workflow Challenges in Biotech and Life Sciences Applications

September 9, 2019

Contributed Commentary By Dale Brantly

September 9, 2019 | Digital transformation is no longer just a buzzword: it is an inescapable truth for many industries, including life sciences, and data storage is at the heart of this digital revolution. The volume of life sciences data is growing at a phenomenal rate, driven by disciplines such as genomics, predictive biology, drug discovery, bioinformatics, and computational chemistry, where data volumes are increasing due to larger population studies, deeper sequencing coverage, and new instrumentation systems. Indeed, sequencing the first human genome, completed in 2003, took 10 years and cost $3 billion. Today, the same task can be completed in less than a day for under $1,000. As a result, many more sequences must be processed, in less time and at lower cost.

Meeting Storage Expectations And The Battle For Talent Are Growing Challenges

Data in life sciences is expected to grow exponentially, outstripping traditional methods of data management. Moreover, only 20% of biopharma and life sciences organizations are digitally maturing quickly enough to keep up with changing markets. Legacy storage systems can become a bottleneck that starves high-performance processors of the data they need to run life sciences applications, which in turn limits the productivity of the life scientists whose breakthroughs are vital to the organization's success. Life science organizations want to increase their research capabilities, yet they often feel overwhelmed by the complex workflows and enormous amounts of data they encounter.

In many organizations, the workflows between scientific instrumentation and IT resources are organized haphazardly. Researchers and IT staff are often caught off guard by upgrades to existing instruments or by new devices with higher output, and these rapid changes frequently leave organizations with poorly designed infrastructure. Data management remains a significant challenge for organizations of all sizes.

To keep up with these organizations' growing processing demands, storage systems need to deliver data faster than ever, at petabyte capacities, and at lower and lower costs, a combination of requirements that most storage system vendors find conflicting. Add to that the complexity of many high-performance storage systems. To date, many organizations have leaned on open source systems to provide the performance and scale needed to support their genomics processing while keeping costs in check. But many now report that, amid the battle for technical talent, recruiting and training competent storage experts to manage these complex systems has become their biggest challenge.

Stable, high-performance storage systems are needed to resolve the currently conflicting requirements of speed, economics, and simplicity facing life science organizations, and to let them embrace expanded research capabilities and accelerated discoveries without being overwhelmed by data that must be processed, analyzed, shared, reanalyzed, and stored. Such systems not only leverage affordable, industry-standard, commercial off-the-shelf technology for low acquisition costs, but are also simple to set up and operate with minimal technical staff. Optimal storage solutions should also deftly blend the latest NVMe and flash SSD technology, which speeds up the small-file processing required by emerging AI and ML workloads, with tried-and-true hard disk drives, which economically support large genomic file processing at high capacities.
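To make the idea of such a hybrid layout concrete, the sketch below shows a minimal, size-based placement policy in Python. It is purely illustrative: the tier labels, the 1 MiB cutoff, and the choose_tier helper are assumptions made for this example, not features of any particular storage product.

```python
import os
import sys

# Hypothetical tier labels and cutoff; purely illustrative, not any product's API.
FLASH_TIER = "nvme-flash"               # small files, metadata, AI/ML training records
HDD_TIER = "high-capacity-hdd"          # large sequential genomic files (FASTQ, BAM, ...)
SMALL_FILE_THRESHOLD = 1 * 1024 * 1024  # 1 MiB cutoff, chosen only for the example


def choose_tier(path: str) -> str:
    """Pick a storage tier for a file based on its size on disk."""
    size = os.path.getsize(path)
    return FLASH_TIER if size < SMALL_FILE_THRESHOLD else HDD_TIER


if __name__ == "__main__":
    # Usage: python tiering_sketch.py file1 file2 ...
    for path in sys.argv[1:]:
        print(f"{path}\t{os.path.getsize(path)} bytes\t-> {choose_tier(path)}")
```

In a real hybrid system this routing happens inside the file system rather than in user code, but the size-based split above captures the reasoning: flash absorbs the small-file and metadata load, while hard disk drives carry the large sequential genomic reads economically.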

How High-Performance Data Storage Addresses Workflow Challenges 

In an environment that thrives on the collection, analysis, and distribution of data, it's essential to have a data management system in place that reliably supports streamlined workflows with immense computational power. Having the right information is necessary to properly conduct research within the life sciences sector, but it's not enough: researchers must be able to perform sophisticated, processing-intensive analyses to extract the scientific insights hidden within the data. Information accessibility must be instant and intuitive to facilitate highly productive collaboration.

Legacy storage infrastructures can create a significant bottleneck that chokes the performance of compute-intensive life science applications and limits the productivity of your most valuable resource: your top scientists' time. Life scientists increasingly rely on data shared by institutional peers, as well as publicly available information, to supplement their organization's intellectual property. When competitors use the same data, the edge goes to the research groups that can make more productive and creative use of standard information. Unlike legacy storage infrastructures, modern high-performance data storage delivers a fast, reliable, and manageable experience, and maintains that performance for all users as they conduct their valuable research concurrently. An infrastructure that performs at a high level is key to success in this competitive landscape.

High-performance storage allows life science teams to enjoy consistent access to vast amounts of data, while also providing reassurance that a consistent repository of data can be backed up, protected, and kept secure. This can all happen at a lower price point than, for example, cloud-based or pure flash-enabled storage solutions. To stay competitive in the long run, life science organizations need the right information, as well as the ability to perform sophisticated, processing-intensive analyses to extract the scientific insights hidden within the data. By investing in easy-to-manage, high-performance data storage infrastructures, they can reduce the time and cost of discovery, limit the risk of losing significant revenue and profits, and ensure that diagnoses and treatment plans are delivered on time.

Dale Brantly is the Director of Systems Engineering at Panasas, where he leads the team responsible for architecting and delivering ActiveStor storage solutions to customers worldwide. He has more than 30 years of experience in defining engineering requirements and achieving architectural and quality excellence for complex storage solutions deployed in large high-performance computing and enterprise environments. Prior to joining Panasas, Dale managed global solutions development at Silicon Graphics (SGI) and ran SGI's Information Lifecycle Management (ILM) product line. Other appointments include field and customer-site technical positions at Cray Research, Control Data Corporation, and ETA Systems. He can be reached at dbrantly@panasas.com.