From Basic Science to Clinical Application of Polygenic Risk Scores: A Primer

“Genome-wide association studies have shown that common diseases are polygenic, ie, thousands of DNA variants contribute to risk, and most of these have very small effect. In spite of this complexity, it is now possible to estimate the degree to which an individual is at risk of common illnesses owing to their genetic makeup. The so-called polygenic risk scores (PRS) are generated from DNA taken from a saliva or blood sample with DNA variants measured using genotyping technologies that are inexpensive (< US $100 per person). From these data, PRS can be calculated for a wide range of diseases (by multiplying count of DNA variants with trait-specific, predetermined effect sizes). The DNA collection is only needed once, but the PRS can be recalculated from the genetic data if new information to improve PRS for a given disease becomes available. As with many risk factors used in health care (eg, cholesterol levels), these risk scores have limited predictive accuracy (ie, they cannot confidently predict the clinical outcome of interest with precision at the individual patient level). Even if the risk DNA variants were identified with perfect accuracy, imperfect prediction by PRS is expected for 2 key reasons. First, genetic factors are not the only risk factors for common disorders. Second, the risk scores currently only provide data about part of the genetic contribution (that associated with common DNA variants, which typically each have small effect). Moreover, in real applications, other factors contribute to the accuracy with which risk variants and their weights are estimated.
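
As an illustration of the calculation described above (not part of the quoted article), a PRS can be sketched as a weighted sum: each variant's risk-allele count (0, 1, or 2) is multiplied by its predetermined effect size, and the products are added up. The function name, genotypes, and effect sizes below are hypothetical values chosen only to show the arithmetic; real scores use many thousands of variants and weights estimated from GWAS.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Weighted sum of risk-allele counts for one person.

    allele_counts: risk-allele count per variant (0, 1, or 2).
    effect_sizes:  trait-specific weight per variant, in the same order.
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is needed per variant")
    return sum(count * weight for count, weight in zip(allele_counts, effect_sizes))


# Hypothetical effect sizes for four variants of one trait (assumed values).
effect_sizes = [0.02, -0.01, 0.05, 0.03]

# Hypothetical genotypes for three people, stored once and reusable.
genotypes = {
    "person_a": [0, 1, 2, 1],
    "person_b": [2, 0, 0, 1],
    "person_c": [1, 1, 1, 0],
}

scores = {
    person: polygenic_risk_score(counts, effect_sizes)
    for person, counts in genotypes.items()
}

for person, score in scores.items():
    print(f"{person}: {score:.2f}")
```

Because the genotypes are stored separately from the weights, recalculating a score when improved effect sizes become available only requires calling the function again with the new weights; no new DNA sample is needed.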

[..] While the costs of generating PRS are low, we do not consider downstream associated costs in a health system nor implications for health insurance. This is outside of our expertise, but evaluation of these topics needs to be informed by an understanding of what PRS can and cannot deliver. In particular, it is important to dispel the dogma that equates a genetic test with high levels of accuracy of current/future diagnosis.

[..] we consider the application in cardiovascular medicine because it was highlighted that PRS provide a level of predictive information that can be considered similar to the risk of specific single rare variants that are currently clinically actionable. In standard practice, the detection of such rare variants (often investigated in families that have multiple affected individuals) can lead to changes in clinical management (eg, surveillance or prophylactic measures). In this retrospective study, it was shown that those in the top 1% of cardiovascular PRS had lifetime risk of greater than 10%, which is equivalent to the risk faced by those carrying single rare genetic variants that, when detected, can inform changes in clinical management. On the flip side, approximately 90% of people in this top 1% would not go on to have heart disease, but encouraging this subgroup of the population to consider prevention strategies could be worthwhile in reducing risk. Use of risk information in this way is sometimes referred to as precision prevention genomics, where the precision focus is a population stratum.

[..] Another study using prospective, longitudinal data from the UK Biobank showed that while coronary artery disease PRS were a less accurate predictor of a subsequent coronary artery disease event than the other clinical risk predictors when they were combined, they were more accurate than any of the other clinical risk factors taken individually. Additionally, when PRS were added to the existing combination of clinical risk predictors, the accuracy increased. Extrapolating the UK Biobank results to 13 million UK residents aged 40 to 55 years, it is estimated that incorporating PRS into the QRISK algorithm could lead to many hundreds of thousands of people changing risk category: more than 500,000 could move from below the risk threshold for statin prescription to above it, while more than 200,000 people could move from above the threshold to below it. Although application of PRS in prediction of cardiovascular risk is an ongoing topic of discussion, incorporating genetic data into such risk algorithms used routinely in primary care could have significant public health implications.
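
To put the reclassification figures above in proportion (an illustration, not part of the quoted article), the short calculation below adds the two quoted counts and divides by the extrapolated population. Since the article gives both counts as "more than", the results are lower bounds; the variable names are ours.

```python
# Figures quoted in the paragraph above; both counts are stated as "more than".
population = 13_000_000          # UK residents aged 40 to 55 years
moved_above_threshold = 500_000  # newly above the statin prescription threshold
moved_below_threshold = 200_000  # newly below the statin prescription threshold

# Lower bound on the number of people who would change risk category.
reclassified = moved_above_threshold + moved_below_threshold

# Lower bound on the share of the extrapolated population that is reclassified.
reclassified_share = reclassified / population

print(f"At least {reclassified:,} people reclassified "
      f"({reclassified_share:.1%} of the population)")
```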

[..] Polygenic risk scores can only explain part of the genetic aspect of a condition. Because nongenetic factors also contribute to risk, the maximum accuracy of a genetic predictor is limited by the heritability of the disorder, where heritability is the proportion of the variance between people in their liability to a disease that is attributed to genetic factors. However, construction of PRS is, to date, limited to DNA risk variants that have a frequency of at least 1% in the population (and in some applications, variants are only included if they have a frequency of more than 10% owing to greater instability in PRS using low frequency variants [currently]). Hence, PRS are not designed to capture all genetic variation, only that tagged by common single nucleotide variants (SNVs). Therefore, the so-called SNV-based heritability gives the upper limit of the variance between people in their liability to a disease that can be explained by PRS and represents the variance explained by common DNA variants. As GWAS sample sizes increase, the variance explained by PRS will also increase and approach the SNV-based heritability. The SNV-based heritability estimates vary across diseases, but an approximate upper limit is 30%. Although in principle, use of whole-genome sequence data could increase the variance explained by PRS (because more variance would be tagged by measured markers, ie, the SNV-based heritability approaches the heritability), it is unlikely (at least in the short term) to improve PRS.

[..] Polygenic risk scores could be used at 3 key stages. First, PRS could be applied in healthy populations. In principle, PRS could be available for an individual for all common diseases from birth. The genetic data would be held as part of the health record, with the latest score accessed for a specific disease at a point relevant to that disease. As described previously, PRS could easily be integrated into health systems for diseases where population screening programs and preventive health management strategies are already available. It is notable that if PRS were available for 20 different (and uncorrelated) diseases/disorders, while only 1% of the population is at high risk (defined as in top 1%) for any one of them, up to 20% of the population is expected to be in the high-risk category for at least one of them. It is the ability of the same genetic data to provide multidisease results that is important for health economic evaluations.
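
The "up to 20%" figure above can be checked with a short calculation (an illustration, not part of the quoted article). Assuming, as the text does, that the 20 scores are uncorrelated, and treating them as fully independent, the chance of landing in the top 1% for at least one disease is one minus the chance of missing the top 1% for all of them. The function name is hypothetical.

```python
def share_high_risk_for_any(n_diseases, top_fraction):
    """Probability of being in the top stratum for at least one disease,
    assuming the scores for different diseases are independent."""
    return 1.0 - (1.0 - top_fraction) ** n_diseases


n_diseases = 20
top_fraction = 0.01

# Share of people in the top 1% for at least one of the 20 diseases.
share_any = share_high_risk_for_any(n_diseases, top_fraction)

# Simple upper bound: add up the 20 separate 1% strata, ignoring overlap.
upper_bound = n_diseases * top_fraction

print(f"In the top stratum for at least one disease: {share_any:.1%}")
print(f"Upper bound ignoring overlap: {upper_bound:.0%}")
```

Under independence the share comes out at roughly 18%, slightly under the 20% ceiling, because some people fall into the top stratum for more than one disease.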

Second, PRS could be used in the early phase of illness, when patients present with very general and nonspecific symptoms that do not fit a specific diagnosis. For many diseases/disorders, presentation with clinical symptoms is sufficient together with biomarker testing (such as electrocardiogram for heart arrhythmia) to confirm diagnosis. For diabetes, although a blood glucose test confirms diagnosis, 15% of adults presenting with type 1 diabetes are misdiagnosed as the more common type 2 diabetes, an impactful misdiagnosis given differences in treatment and care. Within those with type 1 diabetes, high PRS for type 1 diabetes could be used to trigger more frequent monitoring of insulin levels because type 1 diabetes PRS were found to predict progression to the critical phenotype of insulin deficiency. In some circumstances, PRS could be used to predict time to event. For example, prediction of age at onset of breast cancer for those carrying causal variants in BRCA1 could contribute to advice on timing of mastectomy. We also propose that PRS could help with the triage and clinical staging of young adults when they first present to services with very general and nonspecific symptoms (eg, anxiety, depression, or suicidal thoughts or behaviors), contributing to clinical decision-making.

Third, it is possible that in the future, PRS could contribute to treatment choices, because responses to treatment, including development of adverse health outcomes (such as weight gain), are likely complex genetic traits. However, investigating the utility of PRS in the context of choice of drug treatments requires larger data sets than are currently available. Compared with a decade ago, we now have the tools to develop models to predict treatment response, but are limited by data to develop and validate predictors. Large cohorts of patients treated with different medications must be followed up and responders contrasted with nonresponders to generate genetic predictors of response or recovery. To date, inflammatory bowel disease research has been the flagship for translation of genetic associations into new treatments, and identification of treatment-responding subtypes is an active area of research.

We conclude that the PRS available currently may have clinical utility for some diseases for which investigation in clinical settings is already justified. The breadth of applications will increase as genetic data become increasingly available as part of routine health records. Key to making this happen is to extinguish the dogma that equates a genetic test with a result of very high predictive value for current or future diagnosis and accept that PRS have an inherently limited accuracy, as do many other tests routinely used in health care.”

Full article, Wray NR, Lin T and Austin J et al. JAMA Psychiatry 2020.9.30