Trans“ethnic” Polygenic Scores

Trans“ethnic” polygenic score research is research in human genetics that estimates how applicable polygenic scores developed in one population are to another population: whether, and how well, they perform there (Li & Keating 2014).

Recent discussion of the applicability of polygenic scores (hereafter PGS) has sparked some controversy on Twitter. Some have alleged that recent work on trans“ethnic” genetic correlations provides evidence that PGS have good external validity (e.g. Lam et. al 2019), while others caution that polygenic scores need to be specifically tailored to different populations (Grinde et. al 2018; Gurdasani et. al 2019).

However, there are several issues bearing on the applicability of polygenic scores outside of the group they were designed in (Coop 2019). The first is whether effect sizes are biased. The vast majority of GWAS research is done on “Europeans” (Bien et. al 2019; Bustamante et. al 2011; Duncan et. al 2019; Fullerton et. al 2010; Martin et. al 2017; Martin et. al 2018; Martin et. al 2019; Mills & Rahal 2019; Mogil et. al 2018; Need & Goldstein 2009; Park et. al 2018; Peterson et. al 2019; Petrovski & Goldstein 2016). If the SNPs discovered in “European” cohorts are not true causal SNPs, but only tag SNPs captured because they are in linkage disequilibrium with the true causal SNPs, then predictive accuracy will be reduced in other populations, where recombination has broken up these patterns of linkage disequilibrium. Additionally, allele frequency differences can induce different percentages of variance explained per population (Lam et. al 2019; Zanetti & Weale 2018), meaning that scores that are predictive in one population are not nearly as predictive in another (Duncan et. al 2019).
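The allele frequency point can be made concrete with a minimal sketch. It assumes a standard additive model with Hardy-Weinberg proportions and a phenotype standardized to unit variance, under which a single SNP with effect size beta and allele frequency p contributes 2p(1-p)beta^2 to the variance. The effect size and frequencies below are hypothetical illustration values, not taken from any of the cited studies.

```python
def variance_explained(beta, freq):
    """Variance contributed by one biallelic SNP under an additive model
    with Hardy-Weinberg proportions: 2 * p * (1 - p) * beta^2.
    With a unit-variance phenotype this is also the fraction of variance explained."""
    return 2.0 * freq * (1.0 - freq) * beta ** 2

# Hypothetical values: the same causal effect in both populations,
# but the allele is common in population A and rare in population B.
beta = 0.1
freq_a = 0.50
freq_b = 0.05

var_a = variance_explained(beta, freq_a)
var_b = variance_explained(beta, freq_b)
ratio = var_b / var_a

print(f"Population A: {100 * var_a:.3f}% of variance")
print(f"Population B: {100 * var_b:.3f}% of variance")
print(f"B relative to A: {ratio:.2f}")
```

Even with an identical effect size, the SNP explains far less variance where the allele is rare, so a drop in variance explained does not by itself show that effect sizes differ.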

Reduction in Predictive Power

Theoretical research has argued that the predictive power of a polygenic score will decrease approximately linearly with the $F_{st}$ between the GWAS population and the application population (Scutari et. al 2016). Empirical comparisons across populations include:

  • Bigdeli et. al (2017) – major depressive disorder – CONVERGE PGS explains 0.09% of variance in PGC, PGC PGS explains 0.2% of variance in CONVERGE.
  • Chang et. al (2011) – HDL – $\Delta R^2$ of 2.3% for “Europeans”, 3.4% for “African-Americans”, and 5.0% for “Mexican-Americans”
  • Chang et. al (2011) – LDL – $\Delta R^2$ of 9.9% for “Europeans”, 10.5% for “African-Americans”, and 5.3% for “Mexican-Americans”
  • Chang et. al (2011) – TC – $\Delta R^2$ of 5.0% for “Europeans”, 5.0% for “African-Americans”, and 3.7% for “Mexican-Americans”
  • Chang et. al (2011) – TG – $\Delta R^2$ of 3.6% for “Europeans”, 1.1% for “African-Americans”, and 7.2% for “Mexican-Americans”
  • Ikeda et. al (2017) – bipolar ‘disorder’ – 86.5% attenuation (from 2% in Japan-Japan to 0.27% in European-Japan)
  • Lee et. al (2018) – EA / IQ – 65.2% attenuation ($\Delta R^2$ from 4.6% to 1.6%)
  • Lam et. al (2019) – schizophrenia – 33% attenuation (3% to 2%)
  • Monda et. al (2013) – BMI – 22% attenuation (1.67% to 1.3%)
  • Nievergelt et. al (2019) – PTSD – 50% attenuation ($h^2_{SNP}$ from 4% to 2%)
  • Nyholt et. al (2012) – endometriosis – BBJ SNPs explained 0.54% of variance in the QIMRHCS+OX cohort, and QIMRHCS+OX SNPs explained 1.06% of variance in the BBJ cohort.
  • Rabinowitz et. al (2019) – EA – 71% attenuation ($R^2$ from 4.6% to 1.3%)
  • Vassos et. al (2017) – psychosis – 88.3% attenuation (from 9.4% to 1.1%)
  • Ware et. al (2017) – BMI – 72.7% attenuation (from 5.5% to 1.5%)
  • Ware et. al (2017) – height – 89.2% attenuation (from 7% to .75%)
  • Ware et. al (2017) – EA 2013 – ~91% attenuation (from 3% to .25%)
  • Ware et. al (2017) – EA 2016 – 81% attenuation (from 5.5% to 1%)
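The attenuation percentages in the list above follow from simple arithmetic: one minus the ratio of variance explained across populations to variance explained within the discovery population. A minimal sketch, using four pairs of figures copied from the list; the function name `attenuation` is just an illustrative label, not terminology from the cited papers.

```python
def attenuation(r2_within, r2_across):
    """Relative loss of variance explained when a score is applied
    across populations: 1 - (R^2 across / R^2 within)."""
    if r2_within <= 0:
        raise ValueError("within-population R^2 must be positive")
    return 1.0 - r2_across / r2_within

# (within-population %, cross-population %) as reported in the list above.
examples = {
    "Ikeda et. al (2017), bipolar disorder": (2.0, 0.27),
    "Lee et. al (2018), EA / IQ": (4.6, 1.6),
    "Vassos et. al (2017), psychosis": (9.4, 1.1),
    "Ware et. al (2017), BMI": (5.5, 1.5),
}

for label, (within, across) in examples.items():
    print(f"{label}: {100 * attenuation(within, across):.1f}% attenuation")
```

Because attenuation is a ratio, small absolute differences between low $R^2$ values can translate into large percentages, which is worth keeping in mind when comparing entries.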

Common Causal Variants?

Theory suggests that, for traits under local adaptation between regions, effect sizes will not be equal across those regions (Shi et. al 2019), though there are other reasons SNPs could have differential effect sizes by region.

There has been some suggestion that most GWAS variants are in fact common to all populations (i.e. ‘universal’), with approximately identical effect sizes (Akiyama et. al 2017; de Candia et. al 2013; Guo et. al 2019; Gurdasani et. al 2019; He et. al 2015; Jorgensen et. al 2017; Lau et. al 2017; Marigorta & Navarro 2013; Monda et. al 2013; Nyholt et. al 2012; Waters et. al 2010; Xing et. al 2014), though not all work shows similar results (Bigdeli et. al 2017; Carlson et. al 2013; Chang et. al 2011; Diagram Consortium et. al 2014; Fesinmeyer et. al 2013; Locke et. al 2015; Ware et. al 2017; Wray et. al 2017).

$r_{g}$ and effect sizes?

  • Akiyama et. al (2017) found an $r_{g}$ of .94 for BMI loci.
  • Bigdeli et. al (2017) found $r_{g}$s ranging from 0.33 to 0.41 for major depressive disorder (MDD) between Chinese and “European” populations.
  • Brown et. al (2016) found $r_{g}$s of 0.32 for gene expression between Yorubans and “Europeans”, and of 0.46 and 0.62 for rheumatoid arthritis and type 2 diabetes respectively between “Europeans” and “East Asians”.
  • Carlson et. al (2013) found that up to 25% of SNPs tagged in a “European” cohort had significantly different effect sizes in a “non-European” cohort.
  • de Candia et. al (2013) found $r_{g}$s of 0.66 and 0.61 for schizophrenia in two datasets, but the trans“ethnic” genetic correlations depended on minor allele frequencies.
  • Fesinmeyer et. al (2013) found evidence for effect size heterogeneity in 5/13 SNPs.
  • Guo et. al (2019) found that the $r_{g}$ between “Europeans” & “Africans” across all SNPs was .75 for height and .68 for BMI, while genome-wide significant SNPs had slightly higher $r_{g}$s, at .82 and .87 for height and BMI respectively. They found that the departures from an $r_{g}$ of 1 could not be attributed to allele frequency differences or linkage disequilibrium, but they could not rule out issues of power.
  • Ikeda et. al (2017) reported a $\rho_{g}$ of 0.724 for bipolar disorder.
  • Jorgensen et. al (2017)’s meta-analysis of alcohol consumption SNPs found $r_{g}$s ranging from 0.4 to 0.6.
  • Marigorta & Navarro (2013) found that $\rho_{g}$ for $\log(OR)$ was 0.82 between “European” and “East Asian” populations, but effect sizes were slightly larger ($\beta>1$). The $\rho_{g}$ differed by whether the variant replicated between populations. They also found that differences in linkage disequilibrium could likely explain some of the failed replications.
  • Ntzani et. al (2011) found average $r_{g}$s of about .20 (“Asian”-“African”), .27 (“European”-“African”) and 0.33 (“European”-“Asian”) for asthma, atrial fibrillation, BMI, breast cancer, colorectal cancer, eosinophil count, gout, height, Parkinson’s disease, prostate cancer, schizophrenia, SLE, stroke, systemic sclerosis, type 2 diabetes, and uric acid.
  • Wray et. al (2017) found an $r_{g}$ of 0.33 for major depressive disorder, of 0.34 for schizophrenia and 0.45 for bipolar disorder between “Chinese” and “European” populations.
  • Yang et. al (2013) found an $r_{g}$ of 0.39 for ADHD between “European” and “Chinese” populations.
  • Zhou et. al (2018) found $r_{g}$s of about .4 to .6 for height and BMI.

Directional Consistency and Replication Rates

  • Bigdeli et. al (2017) found that ~50.5-51.1% of SNPs are directionally consistent/replicate between the CONVERGE and PGC cohorts for major depressive disorder.
  • Carlson et. al (2013) found directional consistency rates ranging from 68-88% for BMI, type 2 diabetes, and lipid levels.
  • Chang et. al (2011) found replication rates between ancestral groups for lipid variants that ranged from 44-67%.
  • Diagram Consortium et. al (2014) found that the effect sizes between “ethnic” groups were concordant in ~50-57% of SNPs, with concordance rates increasing as the p-value decreased.
  • Fesinmeyer et. al (2013) found replication rates of 69% for within-“European” analyses, 61% for “European”-“East Asian” analyses, 46% for “European”-“African” analyses, 46% for “European”-“Hispanic” analyses, 62.5% for “European”-“Pacific Islander” analyses, and 55.5% for “European”-“American Indian” analyses.
  • Locke et. al (2015) found 79% directional consistency for “Africans” and 91% directional consistency for “East Asians” for BMI.
  • Marigorta & Navarro (2013) found replication rates of 45.8% between “East Asians” and “Europeans”, which increased to 76.5% following accounting for statistical power. For “Africans” and “Europeans”, the respective figures were 9.6% and 59.2%.
  • Monda et. al (2013) found that 88.8% of BMI loci were directionally concordant between “African” and “European” populations.
  • Waters et. al (2010) found that all 19 loci in their study were directionally consistent between a number of populations.
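Directional consistency figures like those above are only informative relative to the 50% expected by chance if effects were unrelated across populations. A minimal sketch of how a concordance rate and a one-sided binomial sign test against 50% could be computed follows. The effect sizes are made up for illustration, the function names are hypothetical, and the test assumes independent SNPs; the cited studies may well have used different procedures.

```python
from math import comb

def directional_consistency(betas_a, betas_b):
    """Count SNPs whose effect estimates share a sign in both populations.
    SNPs with an estimate of exactly zero in either population are skipped."""
    if len(betas_a) != len(betas_b):
        raise ValueError("effect size lists must have the same length")
    pairs = [(a, b) for a, b in zip(betas_a, betas_b) if a != 0 and b != 0]
    concordant = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return concordant, len(pairs)

def sign_test_p(concordant, total):
    """One-sided binomial probability of seeing at least `concordant`
    same-sign SNPs out of `total` if each sign matched with probability 0.5."""
    return sum(comb(total, k) for k in range(concordant, total + 1)) / 2 ** total

# Hypothetical per-SNP effect estimates in two populations.
betas_a = [0.12, -0.05, 0.08, 0.03, -0.10, 0.07, 0.02, -0.04]
betas_b = [0.09, -0.02, 0.05, -0.01, -0.06, 0.04, 0.01, -0.03]

concordant, total = directional_consistency(betas_a, betas_b)
print(f"{concordant}/{total} concordant, p = {sign_test_p(concordant, total):.4f}")

# The Waters et. al (2010) pattern: all 19 of 19 loci directionally consistent.
print(f"19/19 concordant, p = {sign_test_p(19, 19):.2e}")
```

Under these assumptions, rates near 50% (as in Bigdeli et. al 2017) are indistinguishable from chance unless the number of SNPs is very large, whereas 19 of 19 is very unlikely under the null.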

Comparing PGS

Some individuals have advocated that polygenic scores finally provide an opportunity to test whether group differences in particular phenotypic traits are the result of genes, environments, or some combination thereof. They argue that we can compute the polygenic score for each population, get some sort of representative sample and compare the means, thus concluding that some proportion of the phenotypic gap can be explained by the genotypic gap. However, there is little reason to believe that this is a valid method of inference, and the proposal indeed displays the naivete of these activists. As an example, Kerminen et. al (2019) document that subtle population stratification in Finland induces spurious polygenic differences in propensities for various complex traits like BMI, height and coronary artery disease. Martin et. al (2017) showed that the use of polygenic scores for height in “African” populations predicts that “Africans” would be less than 5 feet tall [1], indicating that the actual levels/intercepts of phenotypes can be severely misestimated (Kim et. al 2018). The reasons for this vary by trait, study and cohort, but include Eurocentric biases in GWAS, meaning that variants at high frequency in Africa but low frequency in Europe are not captured (Durvasula & Lohmueller 2019); differences in population frequencies of derived and ancestral alleles (Kim et. al 2018); and gene-gene and gene-environment interactions (Coop 2019).
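The ascertainment problem can be illustrated with a toy calculation. The sketch below assumes a purely additive trait, so that a population's expected score is the sum over SNPs of 2 × allele frequency × effect size, and it uses entirely hypothetical frequencies and effect sizes chosen so that the two populations have identical true genetic means. The 5% minor allele frequency cutoff is a stand-in for a GWAS's ability to detect only variants that are common in the discovery population; none of this reproduces the analysis of any cited paper.

```python
def expected_score(betas, freqs, included):
    """Expected additive score in a population: sum of 2 * p_j * beta_j
    over the SNPs whose indices are listed in `included`."""
    return sum(2.0 * freqs[j] * betas[j] for j in included)

# Hypothetical causal SNPs with identical effects in both populations.
betas = [0.1, 0.1, 0.1, 0.1]
freqs_a = [0.50, 0.40, 0.02, 0.01]   # discovery population
freqs_b = [0.30, 0.20, 0.22, 0.21]   # target population

all_snps = range(len(betas))
full_a = expected_score(betas, freqs_a, all_snps)
full_b = expected_score(betas, freqs_b, all_snps)

# Assume only variants with minor allele frequency >= 5% in the
# discovery population are found and enter the score.
ascertained = [j for j, p in enumerate(freqs_a) if min(p, 1.0 - p) >= 0.05]
pgs_a = expected_score(betas, freqs_a, ascertained)
pgs_b = expected_score(betas, freqs_b, ascertained)
gap = pgs_a - pgs_b

print(f"True genetic means: A = {full_a:.3f}, B = {full_b:.3f}")
print(f"Ascertained score means: A = {pgs_a:.3f}, B = {pgs_b:.3f}")
print(f"Spurious gap in mean score: {gap:.3f}")
```

In this toy setup the score built from discovery-common variants shows a mean difference between populations even though the full set of causal variants implies none, which is the kind of artifact that makes mean comparisons unreliable.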


The applicability of polygenic scores developed in one population is likely to be limited in other populations, but the magnitude of this limitation varies by the discovery and target populations, the trait and the methodology used to assess heterogeneity. What is clear, however, is that inferences based on mean PGS values for different populations are severely limited at this point in time.


[1] See also here and Berg et. al (2019).