Data Privacy Is Key to Enabling the Medical Community to Leverage Artificial Intelligence to Its Full Potential

Contributed Commentary by Mona G. Flores, MD

June 24, 2021 | If there’s anything the global pandemic has taught healthcare providers, it is the importance of timely and accurate data analysis and being ready to act on it. Yet these same organizations must move within the bounds of patient rights regulations, both existing and emerging, making it harder to access the data needed for building relevant artificial intelligence (AI) models.

One way to get around this constraint is to de-identify the data before curating it into one centralized location where it can be used for AI model training.

An alternative is to keep the data where it originated and learn from it in a distributed fashion, without the need for de-identification. New companies are being created to do this, such as US startup Rhino Health, which recently raised $5 million (US) to connect hospitals with large databases from diverse patient populations to train and validate AI models using federated learning while ensuring privacy.

Other companies are following suit. This is hardly surprising considering that the global market for big data analytics in healthcare was valued at $16.87 billion in 2017 and is projected to reach $67.82 billion by 2025, according to a report from Allied Market Research.

Federated Learning Entering the Mainstream

AI already has led to disruptive innovations in radiology, pathology, genomics, and other fields. To expand upon these innovations and meet the challenge of providing robust AI models while ensuring patient privacy, more healthcare organizations are turning to federated learning.

With federated learning, institutions keep their data hidden and seek only the knowledge it holds. Federated learning brings the AI model to the local data, trains the model in a distributed fashion, and aggregates what is learned along the way. No patient data is exchanged; the only thing exchanged is model gradients.

Federated learning comes in many flavors. In the client-server model employed by Clara FL today, the server aggregates the model gradients it receives from all of the participating local training sites (client sites) after each iteration of training. The aggregation methodology can vary from a simple weighted average to more complex methods chosen by the administrator of the federated learning (FL) training.
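To make the aggregation step concrete, here is a minimal sketch of a simple weighted average performed on the server. It is an illustration only, not Clara FL's API: the function name, the flat-list representation of gradients, and the choice to weight each site by its number of local training samples are all assumptions made for this example.

```python
from typing import List, Sequence


def aggregate_updates(
    site_updates: Sequence[Sequence[float]],
    site_sample_counts: Sequence[int],
) -> List[float]:
    """Combine per-site model updates into one weighted-average update.

    Hypothetical helper: each site sends a flat list of gradient values,
    and its weight is its share of the total training samples (an
    assumption; other weighting schemes are possible).
    """
    if not site_updates or len(site_updates) != len(site_sample_counts):
        raise ValueError("need exactly one sample count per site update")
    total_samples = sum(site_sample_counts)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(site_updates[0])
    if any(len(update) != n_params for update in site_updates):
        raise ValueError("all site updates must have the same length")

    aggregated = [0.0] * n_params
    for update, count in zip(site_updates, site_sample_counts):
        weight = count / total_samples
        for i, value in enumerate(update):
            aggregated[i] += weight * value
    return aggregated


# Three hypothetical client sites send gradients for a two-parameter model.
# Only these numbers leave each site; the patient records never do.
updates = [[0.10, -0.20], [0.30, 0.00], [-0.10, 0.40]]
sample_counts = [100, 300, 600]
global_update = aggregate_updates(updates, sample_counts)
```

In this sketch the site with 600 samples contributes 60 percent of the aggregated update, so larger sites pull the global model harder. An administrator could swap in a different weighting rule without changing what the client sites send.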

The end result is a more generalizable AI model trained on all the data from each one of the participating institutions while maintaining data privacy and sovereignty.

Early Federated Learning Work Shows Promise

New York-based Mount Sinai Health System recently used federated learning to analyze electronic health records from five separate hospitals and better predict how COVID-19 patients will progress. The federated learning process allowed the model to learn from multiple sources without exposing patient data.

The federated model outperformed local models built using data from each hospital separately, showing better predictive capabilities.

In a larger collaboration among NVIDIA and 20 hospitals, including Mass General Brigham, the National Institutes of Health in Bethesda, and others in Asia and Europe, the work focused on creating a triage model for COVID-19. The federated learning model predicted, on initial presentation, whether a patient with symptoms suspicious for COVID-19 would end up needing supplemental oxygen within a certain time window.

Considerations and Coordination

While federated learning addresses the issues of data privacy and data access, it is not without its challenges. The client sites need to coordinate to ensure that the data used for training is consistent in terms of format, preprocessing steps, labels, and other factors that can affect training. Data that is not identically distributed across the client sites can also pose problems for training, and this is an area of active research. There is also the question of how the US Food and Drug Administration, the European Union, and other regulatory bodies around the world will certify models trained using federated learning. Will they require some way of examining the data that went into training to be able to reproduce the results of federated learning, or will they certify a model based on its performance on external data sets?

In January, the U.S. Food and Drug Administration updated its action plan for AI and machine learning in software as a medical device, underscoring the importance of inclusivity across dimensions like sex, gender, age, race, and ethnicity when compiling datasets for training and testing. The European Union also includes a “right to explanation” from AI systems in GDPR.

It remains to be seen how these regulators will rule on federated learning.

AI in the Medical Mainstream

As federated learning approaches enter the mainstream, hospital groups are banking on artificial intelligence to improve patient care and the patient experience, increase access to care, and lower healthcare costs. But AI needs data, and data is money. Those who own these AI models can license them around the world or can share in commercial rollouts. Healthcare organizations are sitting on a gold mine of data. Leveraging this data securely for AI applications is a golden goose, and those organizations that learn to do this will emerge the victors.

Dr. Mona Flores is NVIDIA’s Global Head of Medical AI. She brings a unique perspective with her varied experience in clinical medicine, medical applications, and business. She is a board-certified cardiac surgeon and the former Chief Medical Officer of a digital health company. She holds an MBA in Management Information Systems and has worked on Wall Street. Her ultimate goal is the betterment of medicine through AI. She can be reached at mflores@nvidia.com.