How The New European Health Data Space Will Drive The Development Of AI-Based Tools

September 12, 2025

Contributed Commentary by Timo Kanninen, BC Platforms 

The European Health Data Space (EHDS) is an exciting new initiative that aims to advance digital health across all European Union (EU) Member States, and it is the largest European healthcare initiative in history. EHDS has two distinct goals: facilitating the movement of health data across borders for healthcare purposes (EHDS1) and using healthcare data for scientific research and innovation (EHDS2). By integrating the electronic health data of half a billion EU citizens into a unified network, the EHDS will become the world’s largest virtual healthcare database.

EHDS2 readiness is mandatory for all EU healthcare data controllers, with the implementation deadline set for the end of 2028*. Once fully operational, the EHDS will generate an unprecedented surge in access to and sharing of patient healthcare data across the EU, including real-world data (RWD). Globally, RWD use is expected to grow by 15–20%, further driven by a predicted growth of almost 40% in the use of AI in healthcare in general.

By standardizing cross-border data access procedures and defining secure server environments for processing pseudonymized patient data across the EU, the EHDS makes it easier to run research projects that draw on data from multiple countries. To control data access and use, the EHDS data access process defines three main steps for researchers:

  1. Submission of a data access application that defines the purpose of data access, as well as what data is needed from which patients, according to the GDPR data minimization principle;
  2. Approval of the data access application by the relevant governmental health data access body;
  3. Release of the pseudonymized data to the secure processing environment for analysis, once approval has been given. Note that the accessed data cannot be exported.
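The three steps above can be read as a simple state machine. The Python sketch below is illustrative only: the class, states, and methods (`DataAccessApplication`, `decide`, `release`, `export`) are hypothetical and not part of any official EHDS specification or software. It encodes just the ordering rules described here: nothing is released before the health data access body approves, release goes only to the secure processing environment, and there is no export path.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    SUBMITTED = auto()
    APPROVED = auto()
    REJECTED = auto()
    RELEASED = auto()


@dataclass
class DataAccessApplication:
    purpose: str
    requested_variables: set[str]  # step 1: only the data actually needed
    status: Status = Status.SUBMITTED
    log: list[str] = field(default_factory=list)

    def decide(self, approved: bool) -> None:
        # Step 2: the health data access body decides on a submitted application.
        if self.status is not Status.SUBMITTED:
            raise ValueError(f"cannot decide on an application in state {self.status.name}")
        self.status = Status.APPROVED if approved else Status.REJECTED
        self.log.append(f"decision: {self.status.name}")

    def release(self) -> str:
        # Step 3: data is released only after approval, and only into the
        # secure processing environment.
        if self.status is not Status.APPROVED:
            raise PermissionError("data can only be released after approval")
        self.status = Status.RELEASED
        self.log.append("released to secure processing environment")
        return "secure-processing-environment"

    def export(self) -> None:
        # The process offers no way to export the accessed data.
        raise PermissionError("exporting accessed data is not permitted")
```

For example, `app = DataAccessApplication("Study of outcome X", {"year_of_birth", "diagnosis"})`, then `app.decide(True)` and `app.release()`; calling `release()` before `decide()` raises `PermissionError`.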

At present, providing cross-border data access can take months or even years. Because the EHDS defines standard processes, standard access application content, and specifications for EU-approved secure servers, many of these steps can be improved or fully automated with AI-enhanced software, delivering significant time and cost savings. AI-based tools can be used for the fundamental task of harmonizing healthcare data, which in turn unlocks downstream federated data analysis and AI applications such as federated AI model training. AI will also help realize the full potential of the EHDS by making the submission and approval of data access applications more efficient, and by facilitating collaboration among researchers working to address unmet medical needs with breakthrough medicines enabled by predictive analytics.

Integrating Healthcare Institutions Into EHDS: An Overview 

At present, most university hospitals have some processes in place, often manual, for handling data access applications under local laws and regulations; these applications mostly come from their own researchers. To fulfil the EHDS requirements within the required timeframe, hospitals will need to handle and respond to a growing number of incoming cross-border data access applications. For university hospitals to meet this requirement while also improving their overall clinical research productivity, significant IT projects will need to be established to address the following:

  1. Data integration and harmonization
  2. A data request, access application, and data release processing system
  3. An EHDS-compatible trusted research environment (TRE) for internal and external research projects

Hospitals without a data access application processing function (i.e., ethics committees) must still provide information about their data assets, but they do not necessarily need their own TRE.

TREs are critical for EHDS compliance. For example, EHDS-compliant TREs must support workspaces, with integrated pseudonymization, in which researchers can see only the data defined in the approved data access application and released through the data release function. To provide access for external researchers, TREs must run in EU-approved secure processing environments.
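One way to picture such a workspace is a function that keeps only the approved variables and replaces the direct identifier with a keyed pseudonym. This is a minimal sketch under assumptions: the field names, the `DIRECT_IDENTIFIERS` set, and the choice of HMAC-SHA256 for pseudonymization are illustrative, not requirements taken from the EHDS.

```python
import hashlib
import hmac

# Illustrative set of fields that must never reach the workspace.
DIRECT_IDENTIFIERS = {"patient_id", "name", "national_id", "address"}


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Map a patient ID to a stable pseudonym using HMAC-SHA256.

    The same ID and key always give the same pseudonym; without the key,
    the pseudonym cannot be linked back to the ID by recomputation.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def build_workspace(records: list[dict], approved_variables: set[str], secret_key: bytes) -> list[dict]:
    """Keep only approved, non-identifying variables plus a pseudonym."""
    # Even if an application lists a direct identifier, it is dropped here.
    allowed = approved_variables - DIRECT_IDENTIFIERS
    workspace = []
    for record in records:
        row = {"pseudo_id": pseudonymize(record["patient_id"], secret_key)}
        for name in sorted(allowed):
            if name in record:
                row[name] = record[name]
        workspace.append(row)
    return workspace
```

Because the key stays with the data holder, the same patient receives the same pseudonym across releases, while outsiders cannot recompute the mapping. Under the GDPR, pseudonymized data of this kind is still personal data, which is one reason it stays inside the secure processing environment.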

To maintain citizens’ trust in the research use of health data without individual consent, EHDS2 defines additional control layers for data access, including governmental and national health data access bodies that are responsible for approving data access applications. However, these layers will inevitably make the data access process more complex, slower, and more expensive, especially when combined with the anticipated sharp increase in the volume of cross-border data access applications. This is where AI and automation could deliver much-needed solutions.

Unlocking The Full Potential Of EHDS With Novel AI-Based Tools 

The anticipated surge in health data access applications risks overwhelming current university hospital processes and could also hold back internal research. These risks can be mitigated by improving and automating the data discovery, access application, and data release processes. The key to process automation and federated analysis is data harmonization.

AI-enhanced software can be used in EHDS-ready TREs to streamline various tasks for hospitals and researchers, such as:

  • Capturing and harmonizing healthcare data through natural language processing
  • Performing preliminary assessments of data access applications, prepared using a standard set of terms, to identify red flags to raise during the review process
  • Performing data release processes, including data pseudonymization and anonymization
  • Helping researchers compile and submit complex data access application forms
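The preliminary assessment task can be illustrated with a rule-based pre-screen. This sketch is a deliberately simple stand-in for an AI-based tool: the purpose vocabulary and identifier list are hypothetical placeholders rather than the legal text of the EHDS, and the function only raises flags for human reviewers; it does not approve or reject anything.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary; not the legal text of the EHDS.
ALLOWED_PURPOSES = {"scientific_research", "public_health", "ai_training"}
DIRECT_IDENTIFIERS = {"name", "national_id", "address"}


@dataclass
class Application:
    purpose: str
    variables: dict[str, str]  # requested variable -> justification text


def red_flags(app: Application) -> list[str]:
    """Return issues for a human reviewer; an empty list means no flags."""
    flags = []
    if app.purpose not in ALLOWED_PURPOSES:
        flags.append(f"purpose '{app.purpose}' is not in the standard term list")
    for name, justification in sorted(app.variables.items()):
        if name in DIRECT_IDENTIFIERS:
            flags.append(f"direct identifier requested: {name}")
        # Data minimization: every requested variable needs a stated reason.
        if not justification.strip():
            flags.append(f"no justification given for: {name}")
    return flags
```

An AI model could extend checks like these, for instance by assessing whether a free-text justification actually supports the requested variable.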

In general, AI-based tools can alleviate bottlenecks that may arise as the EHDS is implemented, keeping the cost and time required to access EU healthcare data for research to a minimum.

Harmonized Patient-Level Data: The Foundation For Privacy-Preserving Data Analysis And Data Process Automation 

Data is fuel for AI: both training and validating AI models require continuous data access, and with it a commensurate supply of high-quality, individual-level data. Standard data formats, such as the OMOP Common Data Model, make data useful for research. Data harmonization further improves patient privacy by allowing researchers to perform data analysis in a federated way, without requiring direct access to individual-level data.
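To make the federated idea concrete, the sketch below computes a pooled mean without pooling the data. It assumes each site holds the same harmonized variable (for instance, an age value derived from an OMOP-formatted table), so the same local function can run everywhere; the names `local_summary` and `federated_mean` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    n: int
    total: float


def local_summary(values: list[float]) -> SiteSummary:
    # Runs inside each data holder's secure environment.
    # Only the count and the sum leave the site, never individual values.
    return SiteSummary(n=len(values), total=sum(values))


def federated_mean(summaries: list[SiteSummary]) -> float:
    # The coordinator combines per-site aggregates into a pooled mean.
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations at any site")
    return sum(s.total for s in summaries) / n


site_a = local_summary([50.0, 60.0, 70.0])
site_b = local_summary([40.0, 80.0])
print(federated_mean([site_a, site_b]))  # 60.0
```

The pooled mean equals the mean over all individual values, yet the coordinator only ever receives a count and a sum per site. Real deployments typically add safeguards such as minimum cell sizes before releasing even aggregates.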

Currently, much patient data is still stored as unstructured text and must be converted to a structured format for analysis. Even where data is stored in a structured format in electronic health record (EHR) systems, it needs to be cleaned, formatted, and integrated with other registries. Existing large language models (LLMs) can significantly improve data harmonization efficiency, and future AI tools are expected to provide even more automation.
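A small part of harmonization can be shown as mapping heterogeneous source codes onto standard OMOP concepts. In the OMOP vocabulary, concept IDs 8507 and 8532 denote male and female, and 0 means that no matching concept was found; the local codes (including the Finnish terms `mies` and `nainen`) and the source record layout are hypothetical. A lookup table like this only covers known values; an LLM-based tool could propose mappings for the long tail of free-text variants, ideally subject to human review.

```python
# Standard OMOP concept IDs for gender; 0 is "No matching concept".
MALE, FEMALE, NO_MATCH = 8507, 8532, 0

# Hypothetical local codes from different source systems and languages.
LOCAL_GENDER_CODES = {
    "m": MALE, "male": MALE, "mies": MALE,
    "f": FEMALE, "female": FEMALE, "nainen": FEMALE,
}


def to_person_row(source: dict) -> dict:
    """Convert one hypothetical source record into OMOP person-style fields."""
    raw = str(source.get("sex", "")).strip().lower()
    return {
        "person_source_value": source["local_id"],
        "gender_concept_id": LOCAL_GENDER_CODES.get(raw, NO_MATCH),
        "gender_source_value": source.get("sex"),  # original value kept for traceability
        "year_of_birth": int(source["birth_date"][:4]),  # assumes ISO dates, YYYY-MM-DD
    }
```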

AI-Based Tools To Facilitate Data Analysis 

Once data access has been approved, the pseudonymized data are made available for federated analysis within TREs.

Downstream data analysis processes where AI tools can make a significant difference include patient cohort extraction using text prompts. Data analysis can also be accelerated with a prompt-based interface, in which AI automatically writes scripts in R (a popular programming language for statistical computing and data visualization) or performs the analysis itself.
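In a prompt-based workflow, one plausible design is for the AI to translate a text prompt into a structured cohort definition that a researcher can inspect before it runs. The sketch below skips the AI step and shows only the deterministic part. The `CohortDefinition` schema is hypothetical, the column names follow the OMOP `person` and `condition_occurrence` tables, the concept IDs 1001 and 2002 are placeholders rather than real vocabulary entries, and age is approximated as reference year minus year of birth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortDefinition:
    # Structured form that a prompt such as "patients aged 40-65 with
    # condition X" might be translated into (hypothetical schema).
    min_age: int
    max_age: int
    condition_concept_ids: frozenset[int]


def extract_cohort(persons: list[dict], conditions: list[dict],
                   definition: CohortDefinition, reference_year: int) -> list[int]:
    """Return the sorted person IDs that match the cohort definition."""
    with_condition = {
        c["person_id"] for c in conditions
        if c["condition_concept_id"] in definition.condition_concept_ids
    }
    return sorted(
        p["person_id"] for p in persons
        if definition.min_age <= reference_year - p["year_of_birth"] <= definition.max_age
        and p["person_id"] in with_condition
    )
```

Keeping the AI output as a reviewable definition, rather than letting the model query the data directly, makes each cohort easier to audit against the approved data access application.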

AI-based tools can also be used to train AI models without moving the original data, for example by applying federated learning (FL). The FL concept works well within the EHDS framework, in which harmonized data from different sources is analyzed, making data analysis more efficient while improving patient privacy.
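As an illustration of training without moving data, the sketch below implements federated averaging (FedAvg), one widely used FL scheme, on a toy one-variable linear model. Each in-memory list stands in for a hospital's data, which only `local_train` reads; the coordinating loop in `fed_avg` sees nothing but model parameters and site sizes. The model, learning rate, and round counts are arbitrary choices for the example.

```python
def local_train(w: float, b: float, data: list[tuple[float, float]],
                lr: float = 0.1, epochs: int = 20) -> tuple[float, float]:
    # Plain gradient descent on mean squared error, using only this site's data.
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(sites: list[list[tuple[float, float]]], rounds: int = 50) -> tuple[float, float]:
    w, b = 0.0, 0.0  # global model held by the coordinating server
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [(local_train(w, b, d), len(d)) for d in sites]
        # Only parameters travel back; the server averages them, weighted by site size.
        w = sum(uw * n for (uw, _), n in updates) / total
        b = sum(ub * n for (_, ub), n in updates) / total
    return w, b


sites = [
    [(x, 3 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)],
    [(x, 3 * x + 1) for x in (0.2, 0.8, 1.4)],
    [(x, 3 * x + 1) for x in (0.1, 0.9, 1.7, 1.9)],
]
print(fed_avg(sites))
```

Because the toy data lie exactly on y = 3x + 1 at every site, the averaged model converges to w ≈ 3 and b ≈ 1. With real, heterogeneous clinical data, the sites' local optima differ, and production systems typically add measures such as secure aggregation.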

FL is one method of decentralized machine learning (DML), in which a model is trained collaboratively across numerous decentralized data sources while protecting data privacy and security. Three prominent DML methods are FL, split learning (SpL), and swarm learning (SwL); in each, the primary model training is performed locally at the participating universities and hospitals. SwL is currently used by companies such as Waymo and Tesla in their autonomous vehicles to leverage real-time, collaborative data processing, and it can also be applied in a health data setting.

In the case of FL, each data holder trains the model on its own server and sends only the learning results (model updates), never the data, to a central server, which combines them into a shared model; this cycle repeats until the model is sufficiently accurate. With SwL, there is no central server: each data holder trains the model independently and shares its results directly with the other participants, which merge them into the model. In both approaches the raw data never leaves the data holders, and, depending on how access to the merged model is governed, they can also support the protection of intellectual property rights.

It truly is an exciting time: in the long term, the integration of AI and health data has the potential to realize the promise of transformational healthcare approaches such as personalized medicine.

* The initial EHDS requirements will need to be in place across all EU Member States from this date, three years from now (2028), with additional milestones in 2029 and 2031.

 

Timo Kanninen is Chief Scientific Officer (CSO) and Founder of BC Platforms. He is the visionary behind BC Platforms’ data management systems and has over three decades of experience in software development, genetic epidemiology, statistical genetics, and clinical statistics. This includes work with hospital and occupational health IT systems across the UK and Europe. Previously, Timo was the founder and CEO/CTO of Statwell Oy, which provided statistical consultation and developed software for data collection and statistical analyses of personnel and occupational health questionnaires. The company merged with BC Platforms in 2002. Mr. Kanninen studied Information Technology in Production at Aalto University and Statistics at the University of Helsinki in Finland. He is a co-author of more than 15 peer-reviewed scientific articles. He can be reached at timo.kanninen@bcplatforms.com