NVMe: Powering Genomics And Life Sciences

November 7, 2018

Contributed Commentary By Ron Herrmann

Most organizations require high-speed computing capabilities, whether to run critical applications, gain real-time analysis, support web-based services or remain competitive. NVMe is a storage protocol created to accelerate the transfer of data between enterprise and client systems and SSDs and to remove bottlenecks, providing extremely high speed and low latency. It delivers this capability to businesses by tapping into the power and speed of flash technology, increasing data processing speeds by up to 100 times compared with spinning disk media.

NVMe has opened up new ways of working for many parts of the economy. For the financial and banking sector this means critical analysis of emerging market and fund trends. For retail, it has opened up the ability to crunch big data sets in order to understand and predict consumer shopping behavior. Perhaps most interestingly, it is also the catalyst for some revolutionary scientific and medical advances. These include personalized medicine and advanced cancer treatments, which are becoming possible thanks to the speed at which NVMe enables researchers to explore and analyze the human genome.

NVMe is the latest in a long line of technology advances that have improved our understanding of the human genome ever since the first research commenced in the early ’90s. Back then, when the science was in its infancy, sequencing and analyzing just one person’s genome took over a decade and cost several hundred million dollars.

Breakthroughs and Bottlenecks

When you consider that an initial analysis of one person’s genome produces approximately 300GB-1TB of data, you can begin to see why it’s important for researchers to be able to process the data quickly and cost-effectively. NVMe, however, really comes into play for genomic research during secondary genomic analysis. A single round of secondary analysis on just one person’s genome can require upwards of 500TB of storage capacity. Without NVMe, this can take five days or more to complete, especially if the data is stored on spinning disk or even on all-flash arrays. By comparison, NVMe makes it possible to complete a single secondary genome analysis in just one day.

However, to make the medical breakthroughs that genome research and life sciences companies are working towards, they need to process, compare and analyze the genomes of between 1,000 and 5,000 people per study to find ‘variants.’ That means many genome and life science research organizations are looking for a storage architecture with a minimum of 2PB of NVMe SSD. It’s an important step to take, as analysis of these ‘variants’ can contribute to some of the most prominent and high-profile research on diseases in the western world.
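To give a sense of scale, the per-genome and per-study ranges quoted above can be multiplied out. The following is only a back-of-the-envelope sketch: the function name is hypothetical, decimal units (1TB = 1,000GB) are an assumption, and it covers raw initial-analysis data only, not the working space needed for secondary analysis.

```python
# Rough study-scale data volume, using the ranges quoted in the article:
# 300GB-1TB of initial-analysis data per genome, 1,000-5,000 genomes per study.
# Assumption: decimal units (1 TB = 1000 GB, 1 PB = 1000 TB).
GB_PER_TB = 1000
TB_PER_PB = 1000


def study_volume_tb(genomes: int, gb_per_genome: float) -> float:
    """Total raw data for a study, in terabytes (hypothetical helper)."""
    return genomes * gb_per_genome / GB_PER_TB


low_tb = study_volume_tb(1000, 300)     # smallest study, smallest genome output
high_tb = study_volume_tb(5000, 1000)   # largest study, largest genome output

print(f"{low_tb:.0f} TB to {high_tb / TB_PER_PB:.0f} PB")
# prints: 300 TB to 5 PB
```

Under these assumptions a single study spans roughly 300TB to 5PB of raw data, which helps explain why multi-petabyte architectures come up in the discussion.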

Pushing Genomics to the Next Level

The amount of data storage required to push genomic research to the next level is a major challenge, and it is why the sector has started to experience network bottlenecks and latency issues. In the past, genomic processing was done on HDD arrays, which created a huge bottleneck.

Fortunately, the cost of SSDs has fallen to the point where a 0.5PB SSD footprint, enough to accommodate the processing of 500-1500 concurrent genomes, is now affordable and viable for many businesses. Scaling up that architecture for larger life sciences corporations is also feasible and cost-effective.
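As a quick sanity check on these figures, the 0.5PB footprint can be divided by the number of concurrent genomes to see how much flash each one gets. This is a minimal sketch, assuming decimal units and an even split of capacity across genomes; the helper name is hypothetical.

```python
# Per-genome share of a 0.5PB SSD footprint at 500 and 1,500 concurrent genomes.
# Assumptions: decimal units (1 PB = 1,000,000 GB) and capacity divided evenly.
GB_PER_PB = 1_000_000


def gb_per_genome(footprint_pb: float, concurrent_genomes: int) -> float:
    """Flash capacity available to each genome, in gigabytes (hypothetical helper)."""
    return footprint_pb * GB_PER_PB / concurrent_genomes


for n in (500, 1500):
    print(f"{n} genomes -> {gb_per_genome(0.5, n):.0f} GB each")
# prints:
# 500 genomes -> 1000 GB each
# 1500 genomes -> 333 GB each
```

The result, roughly 333GB to 1TB per genome, lines up with the 300GB-1TB of initial-analysis data per genome cited earlier in the article.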

With the correct supporting storage in place, delivering maximum performance through NVMe, genomic research can continue to grow and expand. Technology has advanced to the point where genomics processing can be accelerated by up to 100 times. In addition, NVMe can maximize the processing power of both the genomics applications and the file system and storage processing.

NVMe technology, when applied to genome research, has life-altering outcomes. It’s the catalyst for enabling DNA sequencing at a much lower price point, and the only architecture on the market that can currently enable fast processing of the large data sets needed to further explore and understand the human genome. The technology should also have a democratizing effect on genome sequencing research, making it possible for nimble start-ups to work in specialist research areas where IT and technology costs would previously have been prohibitive, and driving the availability of the research techniques to anyone, anywhere in the world.

Ron Herrmann is a veteran in the storage and networking industry. Prior to joining E8 Storage, Herrmann worked at IBM (after the Diligent Technologies acquisition). He also held key positions at several start-ups including Prominet and Agile (both acquired by Lucent) and Cereva (acquired by EMC). Herrmann also served as the Systems Group Leader for the Michigan Supreme Court and worked at Chipcom (acquired by 3Com) and Timeplex (acquired by Unisys). He is currently Director of Sales Engineering and can be reached at rherrmannE8@gmail.com.