MemVerge’s Vision for Big Memory Computing
By Allison Proffitt
November 11, 2021 | The vision, Charles Fan explains, is big memory. Fan and two other co-founders launched MemVerge, an all-memory storage start-up that Fan calls quite disruptive and high risk—but also the storage architecture of the future.
Fan and I spoke at the recent Bio-IT World Conference and Expo, and he fit right in, dropping one of the top buzzwords of the event. “With digital transformation, more and more applications are data-centric and driven by data that is bigger and bigger as well as faster and faster,” he said. “It requires a compute infrastructure that can deal with both volume and velocity at the same time.”
It’s a problem that others have highlighted as well. Most notably, Chris Dagdigian has flagged the challenge of dealing with data that are both big and generated very rapidly in his past few Trends from the Trenches talks.
Fan summarized the same point Dagdigian has made. “When you only have data that’s bigger, no problem. There are plenty of storage solutions to deal with bigger data. When your data just needs to be faster, also not too much of a problem. If that’s the only requirement, there are memory-centric systems that are very fast that can deal with it. But when it’s bigger and faster at the same time,” Fan said, “that’s where we think it’s an open and unsolved problem.”
That’s the problem MemVerge hopes to solve.
Not-Storage Storage System
Typically, memory is faster while storage is slower but bigger; memory is volatile while storage persists. Memory Machine, MemVerge’s first product, delivers the capacity and data services you’d typically expect of storage, but the software runs entirely in memory.
“Our solution is creating a much bigger memory solution, so you combine the best of both worlds,” Fan said. The speed gains that Memory Machine offers come from avoiding the round trip of data from memory to storage and back when needed—storage IO, Fan explained. “By converging the two buckets of data into one—just in memory without the storage—we eliminate that storage IO.”
Memory Machine is built on Intel Optane Persistent Memory (PMem) hardware as well as DRAM, dynamic random-access memory. “The overall memory capacity is multiplied. Plus, through our software, we deliver some of the persistence functionalities,” Fan added. “The combined big memory is both bigger and cheaper and faster and it is persistent. It really combines the benefits of memory and storage while eliminating the storage IO.”
MemVerge doesn’t sell the hardware, instead offering a compatibility list of hardware and recommended configurations. “It’s pretty broad,” Fan said. “It’s actually any servers you have, and any of the three major cloud service providers you use.”
Reaching the Life Sciences
Fan comes from a deep storage background. His first startup—storage virtualization software—was bought by EMC. After his tenure there, he moved to VMware and created vSAN software storage, he said. Now CEO of MemVerge, Fan started the company with Shuki Bruck, Chairman of MemVerge and the Gordon and Betty Moore professor of Computation and Neural Systems and Electrical Engineering at the California Institute of Technology, and Yue Li, Chief Technology Officer of MemVerge and previously a senior post-doctoral scholar in memory systems at Caltech.
Four years ago, when the three envisioned all-memory storage, the idea was too early for bigger companies, Fan said. The company has since raised $43.5 million according to Crunchbase through four rounds of funding with investments from Intel Capital, NetApp, Cisco, SK Hynix, and more.
The Memory Machine product first shipped about a year ago, and Fan lists several life sciences organizations among early users including Analytical Biosciences, Sention, TGen, and Penn State.
He highlighted secondary and tertiary genomic analysis and single-cell analytics as ideal use cases within the life sciences but emphasized that he’s still learning and discovering more potential use cases.
And the company is also pursuing opportunities in the financial sector, cloud service providers, and other verticals. “Anywhere where either the memory size is a bottleneck or the IO speed is a bottleneck, our solution could be a potential fit,” he said.
In single-cell analytics, for example, he said Memory Machine can reduce the time to complete a job by more than 60%. “We can also enable a much faster and easier experience when people need to roll back and do ‘what if’ analysis,” he said. “These jobs are typically multi-stage, and we can allow them to roll back to previous stages very quickly and very easily, change some parameters and run the test again.”
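The rollback workflow Fan describes can be illustrated with a plain-Python sketch: checkpoint each stage’s output, then restart from an earlier stage with a new parameter instead of rerunning the whole pipeline. The stage names and logic below are invented for illustration; Memory Machine does this transparently with in-memory snapshots rather than pickle files on disk.

```python
import os
import pickle
import tempfile

# Hypothetical two-stage pipeline; the stages are stand-ins for real analysis steps.
def stage1(raw):
    return [x * 2 for x in raw]                        # e.g., normalization

def stage2(normalized, threshold):
    return [x for x in normalized if x > threshold]    # e.g., filtering

ckpt_dir = tempfile.mkdtemp()

def save_ckpt(name, obj):
    with open(os.path.join(ckpt_dir, name), "wb") as f:
        pickle.dump(obj, f)

def load_ckpt(name):
    with open(os.path.join(ckpt_dir, name), "rb") as f:
        return pickle.load(f)

raw = [1, 2, 3, 4, 5]
normalized = stage1(raw)
save_ckpt("stage1", normalized)                        # snapshot after stage 1

result_a = stage2(normalized, threshold=5)

# "What if" analysis: roll back to the stage-1 snapshot and rerun
# stage 2 with a different parameter -- stage 1 is never repeated.
result_b = stage2(load_ckpt("stage1"), threshold=3)
print(result_a)  # [6, 8, 10]
print(result_b)  # [4, 6, 8, 10]
```

The point of an in-memory snapshot, as Fan describes it, is that the save and restore steps above happen at memory speed rather than through storage IO.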
Building Big Memory Computing for Single-Cell Analytics
The single-cell analytics example that Fan gives comes from Analytical Biosciences, an early user. The company was recently recognized by IDC for innovation in cloud-centric computing to enable digital infrastructure resiliency for its configuration of DRAM, Intel Optane Persistent Memory, and MemVerge Memory Machine software in a Big Memory Computing environment. The company reports that the environment loaded data up to 800x faster, eliminated 97% of IO-to-storage, and slashed overall pipeline time by 61%.
Analytical Biosciences was founded in 2018 and is primarily focused on single-cell technologies as an engine for discovery of new drug targets and therapeutics, Chris Kang, Head of Bioinformatics, explained.
The founder of Analytical Biosciences, Zemin Zhang, and Fan are good friends, Kang explained. “When they were talking one day about how to make use of the next generation computation technology developed by MemVerge, they thought, ‘Ok, we are using a lot of new technologies. Let’s see if we can get some synergistic effect.’”
Single cell sequencing has dramatically changed the magnitude of data that researchers are generating and using, Kang said. “The number of cells contained in one study has grown exponentially throughout the years. With that growth in size there comes the growth in computation stress and demands,” he said. Current studies include one million cells, but Kang predicted studies of two, three, or even 10 million cells in a few years. “The last generation computation techniques are insufficient for mounting data these days.”
And more people want to use it. Once the data are generated, different groups within Analytical Biosciences—biology and immunology, data science and machine learning, and computer science and software development—all needed access to the single-cell data.
Biology is now a data discipline, Kang said. “In last generation bioinformatics, we were still analyzing sequences; the bioinformatics was still niche. But now we’re dealing with a lot of numerical data, and we’re applying machine learning and neural networks and lots of the new fancy technologies from data science domains to bioinformatics. We’re now dealing with ultra-big datasets and we’re analyzing them using very fundamental data science techniques, for example principal component analysis, and dimensionality reduction.”
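The kind of workload Kang mentions can be sketched in a few lines of NumPy: a toy cells-by-genes expression matrix reduced to two principal components via SVD. The matrix here is random and tiny for illustration; real single-cell pipelines run this on hundreds of thousands to millions of cells, which is where memory capacity becomes the bottleneck.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy expression matrix: 100 "cells" x 50 "genes" (random values, illustration only).
X = rng.normal(size=(100, 50))

# PCA via SVD: center the columns, decompose, project onto the top components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
n_components = 2
X_reduced = Xc @ Vt[:n_components].T   # each cell is now a point in 2-D

print(X_reduced.shape)  # (100, 2)
```

At a million cells and tens of thousands of genes, the centered matrix alone can run to hundreds of gigabytes, which is the scale at which a bigger memory tier starts to matter.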
These are the types of workflows at which Kang says Memory Machine excels. For example, he said, long workloads can be easily paused via a snapshot and relaunched without losing time, offering a great deal of flexibility and cost savings without tying up computing resources.
“And down the line,” he added, “when there are new use cases, it may be more powerful and maybe can change the way we analyze data and utilize the committed resources.”