To Advance Polygenic Risk Scores, Large Studies Need Better Genotyping Tools
Contributed Commentary by Keith Brown
December 5, 2019 | A decade ago, the term "polygenic risk score" had scarcely appeared in a genomics publication. Today, it's almost impossible to pick up a human genetics journal or attend a conference without hearing about these new biomarkers. Polygenic risk scores—built by bundling many variants with modest genetic effects and using their weighted sum to predict disease risk—have in some cases proven as predictive as rare monogenic variants. No wonder, then, that enthusiasm is high for applying these scores to predict susceptibility to common, complex diseases for which we do not yet have useful biomarkers.
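In its simplest form, a polygenic risk score is just a weighted sum of an individual's risk-allele counts, with each variant's weight taken from GWAS effect-size estimates. A minimal sketch (the weights and genotypes below are fabricated purely for illustration):

```python
import numpy as np

# Per-variant effect sizes (e.g., log-odds weights) from a GWAS — illustrative values only
weights = np.array([0.12, -0.05, 0.30])

# Risk-allele dosages (0, 1, or 2 copies) for three individuals at the same three variants
dosages = np.array([
    [0, 1, 2],  # individual A
    [2, 0, 1],  # individual B
    [1, 1, 0],  # individual C
])

# Each score is the dot product of an individual's dosages with the variant weights
scores = dosages @ weights
print(scores)  # one score per individual: [0.55, 0.54, 0.07]
```

Real scores sum over hundreds of thousands to millions of variants, and the weights must be estimated and validated in each target population—which is exactly why study size and marker density matter so much.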
But developing polygenic risk scores is difficult—much more difficult than identifying and validating single-variant biomarkers. The study sizes and marker densities needed to discover variants for inclusion in these multi-component scores are far larger than those needed to discover single variants with large genetic effects.
Scaling up to the required study sizes will require significant reductions in genotyping cost and other improvements to genomic analysis technologies. Even today’s most comprehensive microarrays, for example, do not deliver the resolution required to capture all of the genetic diversity across all human populations. Scientists might need to interrogate tens of millions of variants from millions of samples in each population before finding the correct mix that will accurately predict disease risk. Whole-genome sequencing, on the other hand, generates the needed volume of data but at a prohibitive cost for the number of people who must be included in these studies.
To pave the way for development of effective polygenic risk scores for a broad range of diseases, the community needs a robust, sequencing-based approach to high-throughput genotyping. Such a method would lower costs while broadening the data generated, and it would remove the research biases associated with fixed-content microarrays.
Some of the best-validated polygenic risk scores to date have come from large-scale, longitudinal studies. Analyses of UK Biobank data, for instance, led to the calculation of polygenic risk scores for coronary artery disease (CAD), inflammatory bowel disease, type 2 diabetes, and other conditions. In a 2018 Nature Genetics article reporting these findings, scientists wrote: “For CAD, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care.” (DOI: 10.1038/s41588-018-0183-z)
Fairly early in the era of genome-wide association studies, we realized that these projects were most effective at finding highly penetrant variants with large genetic effects. That has been invaluable for elucidating the causes of some diseases, but it has also become clear that common diseases are often too complex to be explained by an association with any single variant. Polygenic risk scores offer an unprecedented chance to make inroads into these common diseases by letting us take advantage of many variants with more modest effects.
These risk scores will be most useful when there is enough information to accurately capture genetic variation within each population. Markers chosen in one population, for instance, may not provide the same high-resolution information about someone from another population (DOI: 10.1038/s41588-019-0512-x).
People of African descent, for example, carry variation that reflects the greater number of generations this population has accumulated compared with groups that branched off more recently. More generations mean more recombination events—more opportunities to reshuffle genomic information—resulting in smaller haplotype blocks that require much higher resolution to detect. A genetic variant that reliably tags a large haplotype block in a European genome might tag a much smaller block in an African genome, so conclusions drawn from that variant about the surrounding genomic region would likely hold for an individual of European ancestry but could be wrong for one of African ancestry.
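The tagging problem described above is usually quantified as linkage disequilibrium (r²) between a genotyped tag variant and its ungenotyped neighbors. A toy sketch with fabricated dosage data, assuming r² is computed as the squared Pearson correlation of allele dosages:

```python
import numpy as np

def ld_r2(a, b):
    """Squared Pearson correlation between two variants' allele dosages."""
    return np.corrcoef(a, b)[0, 1] ** 2

# Fabricated dosages (0/1/2) at a tag SNP for ten individuals
tag = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 1])

# A neighbor inherited on the same intact haplotype block: perfectly tagged
linked = tag.copy()

# The same neighbor after more historical recombination has shuffled haplotypes
recombined = np.array([2, 0, 1, 2, 0, 1, 1, 2, 0, 1])

print(ld_r2(tag, linked))      # ~1.0: the tag fully represents the neighbor
print(ld_r2(tag, recombined))  # much lower: the tag is a poor proxy
```

A fixed-content array that relies on tag variants tuned to one population's haplotype blocks will therefore miss or misrepresent variation in populations with smaller blocks—the bias a sequencing-based approach avoids.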
Because of these differences across populations, studies aimed at establishing new polygenic risk scores can achieve sufficient statistical power only by including very large numbers of individuals. Fortunately, we do not need to rediscover the many variants already identified by efforts such as the 1000 Genomes Project or the HapMap Project—we simply need to build on their foundation. We do, however, have to improve on large-scale studies built on microarrays, a hybridization-based technology that can have difficulty calling some known variants and cannot detect novel variants.
To help realize the promise of polygenic risk scores, the human genetics research community will need cost-effective tools for high-throughput genotyping. Low-pass sequencing covering tens of millions of variants per person already offers the necessary resolution at a feasible cost, and positions users to benefit from continuing technical innovation. By improving the workflow upstream and downstream of modern sequencers, it should be possible to enable population-scale studies and generate the information needed to establish clinically actionable risk scores for many common, complex diseases.
Keith Brown is CEO and Founder of iGenomX, a company spun out of The Scripps Research Institute to commercialize next-generation genotyping technologies. He can be reached at firstname.lastname@example.org.