All of Us Update: Nearing Half a Million Participants, Whole Genomes Available This Winter

September 22, 2021

By Allison Proffitt

Josh Denny, CEO of the All of Us Research Program, gave an update on the program during the Bio-IT World Conference & Expo, held this week in Boston and online.

The All of Us program is the National Institutes of Health’s effort at building one of the most diverse health information databases in history. Genome-wide association studies (GWAS) have thus far had a surprising lack of diversity, Denny said, citing data from several papers giving overviews of GWAS makeup. The authors found that about 78% of participants in public GWAS are of European ancestry. In fact, only 4% of GWAS data come from donors of non-European, non-Asian ancestry, a group that represents one-third of the US population.

Such nondiverse genetic data affect both polygenic risk scores and the interpretation and weighting of genetic variants. All of Us hopes to address that lack of diversity by nurturing long-term relationships with diverse participants across the country, catalyzing a robust ecosystem of data for researchers and funders, and delivering one of the largest, richest biomedical datasets.

All of Us began enrolling participants in May 2018, and Denny reported that the program now counts 410,000 participants, with biosamples from about 312,000 of them and more than 253,000 donated electronic health records. In addition to sharing biosamples and their electronic health records, participants also take surveys (including frequent COVID-19 health and quality of life surveys), give physical measurements, and sync their own wearable devices to share data. Over 80% of All of Us participants are from groups underrepresented in biomedical research.

Participation in All of Us is meant to be ongoing and collaborative, Denny explained. The program returns findings to participants in the form of genetic ancestry reports and four trait reports giving genetic predictions for bitter taste perception, cilantro preference, earwax type and lactose intolerance. All of Us follows up with participants, Denny said, to confirm whether participants found that the genetic predictions matched their experiences.

These findings have been gathered through array-based genotyping, Denny said, but the program also plans to return genetic health data built on whole genome sequencing. Denny set a target date of 2022 for those reports, and said the All of Us program is sequencing about 5,000 whole genomes each week (up from the 4,500 he reported at an event in March).

Denny said the program is already working with regulatory agencies and genetic counselors to develop an “end-to-end genetics experience” including health reports that will be returned to participants. He expects the first available health reports to include hereditary disease risk results based on the ACMG59 list of actionable variants and pharmacogenomics findings based on the CPIC guidelines on gene-drug interactions.

All hereditary disease risk results will be returned through a genetic counselor, Denny said, and participants will receive materials for themselves as well as materials to pass on to their physician. All of Us will also connect individuals with clinical testing options to confirm the findings. Two to three percent of people carry a hereditary disease risk gene, Denny said.

Value for Researchers

All of Us does not want to return value only to participants, though. The program also seeks to be a preferred platform for researchers. All of Us has taken a cloud-centric approach designed to encourage shared research, decrease the cost of storage, and keep security centralized.

The raw data are ingested and harmonized to one data model, and then fed into different tiers for different uses. The public tier of data is available now with no login via the Data Browser and includes summary statistics and aggregate counts from EHRs and the participant surveys.

The registered tier is also available now through the All of Us Researcher Workbench, which launched in late May 2020. Researchers gain access through a passport model. The Researcher Workbench is currently in beta, and access is limited to US nonprofit researchers with eRA Commons IDs. Within a few minutes, researchers can create a workspace and start a project, Denny said.

This winter, Denny expects the third tier of data access to go live: a controlled tier including more than 90,000 whole genome sequences and 130,000 arrays. This tier will also include COVID-19 diagnoses and surveys as well as expanded Fitbit data and more detailed demographic data. There will be no obvious personally identifiable information.

The final tier, individual biospecimen and participant data, will be available at some point in the future.

Currently, Denny reported, the Researcher Workbench has more than 1,000 registered users and 240 registered institutions. Nearly a quarter of those institutions are Historically Black Colleges and Universities, Hispanic-serving institutions, or non-profits.

Denny highlighted that researchers are already using the data; there have been more than 15 publications so far citing All of Us data. He mentioned the COVID-19 Serology Study that found early COVID-19 serology results from January to March 2020 in five states (DOI: 10.1093/cid/ciab519) as well as a predictive analysis study for glaucoma that used All of Us data to train and refine the algorithm (DOI: 10.1016/j.ajo.2021.01.008).