VA Data And DOE Supercomputers Converge

January 24, 2018

By Paul Nicolaus

To envision the size of the largest supercomputing systems overseen by the Department of Energy (DOE), picture a basketball court, explained Morgan Luttrell, the DOE’s senior advisor for veterans relations, and then imagine that space filled with 40 to 50 computer drives standing as tall as players.

And these systems are loud. They hum, said Dimitri Kusnezov, chief scientist with the DOE’s National Nuclear Security Administration. "These are formidable assets."

The biggest of the bunch are Sequoia at Lawrence Livermore National Lab in California and Titan at Oak Ridge National Lab in Tennessee, which are roughly the same in size and power. Both are being replaced, however, by Sierra (at Livermore) and Summit (at Oak Ridge), the latter of which is expected to be the world's fastest and most powerful supercomputer.

While their physical appearance and the sounds they emit can be described and visualized, their computing power has in many ways moved beyond human comprehension. (Titan and Sequoia can process tens of millions of billions of calculations per second, according to Kusnezov, and Summit and Sierra will handle more than 100 million billion calculations per second.)

Improving human health, however, is something we can understand, and thanks to a joint initiative between DOE and the U.S. Department of Veterans Affairs (VA), these supercomputers will be put to work toward that goal. The VA brings its wealth of health data to the table, and DOE brings its technological expertise in analytics, artificial intelligence, and high-performance computing.

The centerpiece of the VA-DOE partnership, which is based in the DOE’s National Laboratory system, is the Million Veteran Program (MVP), a national research program funded by the VA's Office of Research and Development. To enable high-quality research and large-scale computing using MVP and other VA data, the partnership established the MVP Computational Health Analytics for Medical Precision to Improve Outcomes (MVP-CHAMPION) initiative.

The goal of MVP is to collect genetic, military exposure, lifestyle, and health information about veterans that can be used to understand how genes affect health and illness. To date, more than 620,000 veterans have provided DNA samples, completed surveys about their health, lifestyle, and military experiences, and granted access to their electronic health records.

All samples are genotyped to provide basic, surface-level genetic information, and the program has expanded into deeper sequencing as the budget allows. For example, exomes from a subset of about 30,000 participants go deeper into the DNA, and roughly 45,000 whole genomes will be sequenced over the next couple of years to provide the deepest genetic information possible on a second subset.

With the eventual assistance of one million volunteers in all, MVP is in the process of building a massive medical database for research on diseases like diabetes and cancer as well as military-related illnesses such as post-traumatic stress disorder.

The VA-DOE partnership will involve approved research studies that examine data from the electronic health records of roughly 24 million veterans who have used VA care over the past two decades, as well as data from the Defense Department, the Centers for Medicare and Medicaid Services, and the National Death Index maintained by the Centers for Disease Control and Prevention.

"The Veterans Administration has a remarkable richness of data that goes back decades in various formats from rich genomic information to health records and things that have really not been analyzed at the scale that one could imagine doing with the DOE's supercomputers," said Kusnezov.

Planned Studies

The partnership will initially focus on three demonstration projects that are soon to launch. One aims to build algorithms that generate personalized suicide risk scores to help stem the loss of more than 20 veterans to suicide each day, noted MVP Program Director Sumitra Muralidhar. The suicide rate among veterans is far higher than that of the general population.

VA clinicians and researchers could use these risk scores to help predict which patients are at the highest risk for suicide and to evaluate prevention strategies. The researchers will work with the VA's Office of Suicide Prevention to improve algorithms already in use and potentially add genomic data to those calculations to see whether doing so improves their ability to predict risk and intervene.

Another project will focus on differentiating between lethal and nonlethal forms of prostate cancer. Because the VA population is 92 percent male, she explained, this is a significant problem and one of great relevance to veterans. The hope is that a primary care physician would be able to use the analytics to determine who needs a prostatectomy and who could avoid the procedure.

A third will explore which sets of risk factors best predict heart disease. The intent is to develop a unified predictive algorithm to determine who is likely to develop a specific type of cardiovascular disease, she explained, so that clinicians can intervene sooner and tailor treatment to patients' individual genetic profiles.

Eight VA-funded research projects kicked off in 2017, leading up to these three larger projects, and some early scientific results have begun to emerge. "This year we had a splash at The American Society of Human Genetics meeting," Muralidhar said of the October meeting, which included the presentation of 23 abstracts that came out of MVP.

"The good thing is we have a diverse population," she said. "About 17% of our MVP enrollees are African American, and 7% are Hispanic," which has allowed for population-specific discoveries. There's one project on chronic kidney disease, for example, that has led to new insights pertaining to the African American population.

Findings have also been presented at other meetings, such as those of the American Heart Association and the Society of Biological Psychiatry, and the related research is expected to be published soon. Beyond that, seven additional projects have recently been funded and rolled into the larger initiative.

Opening the Spigot

It's hard to believe this much progress has been made, Muralidhar said, considering that MVP began about eight years ago as nothing more than a small group of people sitting in a room, thinking about building a program of this sort. She credits the altruism of veterans, who tend to view participation as a second opportunity to serve their country.

The collaboration between VA and DOE began during the Obama administration, she explained, when the two secretaries came together and signed principles of agreement. More recently, in May, the departments announced the VA-DOE Big Data Science Initiative (BDSI) and the intent to combine the health and genomic information collected as part of MVP.

Analyzing such a sheer volume of data will require high-performance computational methods and resources, and that is where DOE's computing infrastructure and expertise come into play.

Secure high-speed network connections are being set up between the VA data warehouse in Austin and Oak Ridge National Lab so that data can move between the two more easily. The goal is to refresh the electronic health record data nightly to keep the information up to date. The bigger vision is to enable remote access so that VA researchers and DOE scientists at national labs across the country can analyze the data.

"We are slowly opening the spigot to let more and more people come in," she said. The first set of researchers were all involved in consortium projects, and the VA scientists were all allowed to bring in researchers from their academic affiliates. "Now we're going to expand it to scientists from DOE, and we're looking at potential collaborations with scientists from NIH and DOD."

Even when the datasets are divided up, they cannot be sent out to individual scientists for analysis on their own computers. Instead, researchers are brought to the data through a central computing environment. "They are not allowed to pull any data out," Muralidhar added, noting the importance of data security and privacy. "They can only take the results out."

Scaling Up

As computing capacity grows, the partnership with DOE will extend MVP's capabilities so that more academic and federal researchers can access the data, and a private cloud will help broaden that access even further.

Part of this work has fallen on the shoulders of Seven Bridges. In April 2016, the biomedical data analysis company announced that it had signed a Cooperative Research and Development Agreement (CRADA) with the VA to support two key initiatives for MVP: a hybrid cloud and a genotype-phenotype graph analysis engine.

The big challenge with genomic data is its size, and the challenge is particularly acute at the VA because of the sheer scale and scope of MVP, said Dennis Dean, scientific site lead and senior scientist at Seven Bridges. Solutions are needed to store such large volumes of data.

Because the VA works in an internet-isolated environment that physically separates secure computer networks from unsecured ones, the department wants to use its own infrastructure whenever possible, he explained, so the first element of the hybrid cloud is enabling the VA to handle a range of genomic computations on its own. That introduces a second challenge, however.

Although the VA would prefer, from a security standpoint, to analyze all of the whole genomes on its own infrastructure, it is clear that at some point the size of the data will exceed its capacity to handle the computations. "Hybrid cloud is the ability to run a workflow locally at the VA but to have the capacity to extend or bring in other resources," he said.

Cloud computing is much less expensive than maintaining a large computational infrastructure, he added, so the ability to expand into the cloud will eventually help keep computational costs down.

The genotype-phenotype engine solves a completely different problem, but it is still related to the size of the data. As the program moves toward a million genomes, the analysis tools need to be able to grow along with that expanding dataset.

"The challenge with current tools is that they're linear, and so as you scale up you need proportionately the same amount of memory and resources to do the analyses," Dean explained. "What the graph genotype-phenotype engine aims to solve is to create a scalable data analysis infrastructure for the VA."

"Seven Bridges brings the ability to think about the data right out of collection all the way to analysis, and we can work both through the hardware and computational concerns and the data analysis," he said.

Mutually Beneficial

The hope is that leveraging all this health data will help identify trends that support the development of new treatments and preventive strategies. But this massive undertaking, and the promise it holds, does not come without sizable hurdles.

"Starting this connection between the VA and DOE has required us to push at the edge of existing policy," Kusnezov said. There has been a need to rethink everything from IRBs to data use agreements to methods of curating and cleaning the data. "There are so many parts."

As these two entities work through the challenges, however, both sides see big potential in the big data. "I think we're really at kind of a watershed moment in terms of being able to impact veterans issues at scale with the kind of computing that DOE has," he added.

Luttrell also believes that the collaboration, though still in its infancy, will help get at the root problems that veterans face. Looking further ahead, he envisions the ability to refine healthcare in ways that will transcend the military community and affect the American population at large.

And while healthcare problems are being solved, DOE stands to benefit from the partnership in ways that extend beyond the medical field. The relationship with the VA will help advance DOE's methodologies, technologies, and next-generation supercomputing designs as it plans for the world envisioned in the decade ahead.

When she peers into the future, Muralidhar sees physicians who can look at patients' signs and symptoms while considering other crucial pieces of the puzzle, such as lifestyle, environment, and genomic and other molecular markers, to tailor treatment in real time. "That is the future of medicine," she said. "That is where everybody is heading."

This collaboration enhances the ability to make discoveries, and as new findings come out of MVP and the BDSI, the VA is working to establish a pipeline that leads right back to the clinic. Ultimately, she said, the result of these two federal agencies joining forces will be better healthcare, better science, and better government.

Paul Nicolaus is a freelance writer specializing in science, technology, and health. Learn more at