Susan Gregurick Previews Potential Elements of NIH’s Next Strategic Plan for Data Science

June 28, 2022

By Allison Proffitt 

In the latest episode of Bio-IT World’s Trends from the Trenches podcast, Susan Gregurick, Associate Director for Data Science and Director of the Office of Data Science Strategy (ODSS) at the National Institutes of Health (NIH), gave a peek into what’s coming in the next iteration of the NIH Strategic Plan for Data Science.

NIH’s first Strategic Plan for Data Science grew out of lessons the agency learned from the Big Data to Knowledge (BD2K) program and covered 2018 to 2022, a particularly challenging span.

“What we learned from COVID, really, was we thought we were just doing great at data science. We really patted ourselves on the back; everything was going well,” Gregurick told Stan Gloss, host of the podcast and founder of BioTeam. “Then COVID just told us how much we weren’t doing, and how challenging it is to really get FAIR data into the hands of researchers in a timely fashion so they can find it, access it, and use it. We are nowhere close to actually accomplishing that goal.”  

That’s not to say the past five years have not been fruitful. Gregurick highlighted some real wins that have come out of the COVID era, in particular the Researcher Auth Service (RAS). “RAS provides a standardized method for accessing data and for logging and auditing these accesses,” Gregurick explained. “It works by linking your researcher identities with access privileges and layers of security, primarily through eRA Commons.” (Rebecca Rosen, project owner for this work, recently spoke about RAS at the Bio-IT World Conference.)

Gregurick also reported that NIH has piloted ideas about record linkage across multiple data platforms in the hope of being able to indicate, for example, participants for an All of Us clinical trial. “That was a successful pilot,” she said, “but it also told us that we are nowhere close to having in place all of the right policies for allowing that kind of data aggregation and linkage to happen.”

Looking Ahead 

The next strategic plan will focus on 2023 to 2028, a five-year span that Gregurick believes is a “reasonable amount of time for a strategy for data science, though the field moves so quickly.”

Gregurick forecast many areas of possible focus for the next plan, all of which would build on work already done but “hopefully go far beyond.” She highlighted deeper thinking about data infrastructure and services, particularly a research data mesh: a decentralized, distributed system that would harmonize data services. Common APIs and common metadata standards may play a role in the new plan, working with digital object identifiers and indexing approaches, particularly within the CRDC and BioData Catalyst data ecosystems.

“Of course, we’re going to enhance Research Auth Services,” she added. “That’s really showing to be a very valuable activity. We want to use smart tokens to aid in communication… allowing for accountability and auditing.”  

Hybrid data and compute infrastructures that increase data availability are also probably on the table, Gregurick said.

She predicts new language around artificial intelligence and machine learning, something not included in the prior strategic plan, particularly around data infrastructures, computation, and modeling. Tools for ethical AI will be an area of focus, she suspects, and both synthetic data and model training will be priorities. “We want to think about social determinants of health and including those in models and training,” she added.

We may see additional language around FAIR Data Trust Repositories in data management policies, she said, as well as more work on open data and schema, common data elements, and tools to support structured metadata and annotation. Minimum standards and schemas for APIs could also be on the table.  

Gregurick also predicted new language around research-inspired data science, bringing real-world data and social determinants of health into clinical data and enhancing FHIR for cohort discovery. She also foresees work to build a software ecosystem that fosters a culture of software sustainability, including metrics, best practices, new technologies, and new tools.

Personally, Gregurick has a passion for enhancing training in under-resourced communities, and she highlighted likely parts of the new strategic plan focused on strengthening the broader data science community. “We really do want to instantiate a multi-disciplinary and diverse data workforce that can accelerate our research, so you’ll probably hear more about opportunities for training and developing training modules. We’d like to partner with our ICs to provide additional training resources to their platforms like AnVIL or BioData Catalyst. Enhancing data management and training of FAIR is going to be a big element we’ll tackle pretty early on.”

Gregurick hopes the RFI for the new strategic plan will be out in the fall of 2022, and she expects the plan to be finalized very early in 2023.

Trends from the Trenches Podcast  

Bio-IT World’s Trends from the Trenches podcast delivers your insider’s look at the science, technology, and executive trends driving the life sciences through conversations with industry leaders. As host, BioTeam co-founder Stan Gloss brings years of industry experience in science, data, and technology to conversations exploring what is driving data and discovery, and what’s coming next.  

Catch up on earlier episodes on building AI/ML models for drug discovery, the evolution of supercomputing, digitization vs. digital transformation at Alnylam, AWS’s advice on digital transformation, and NCI’s Commons of Commons approach to data management.