Spearman’s Nothingburger

A little game that hereditarians like to play is to invent a new observational methodology that supposedly lets us infer whether racial gaps are genetic. This started with “regression to the mean”, moved on to inferences from a confused combination of heritability figures and environmental disparities, and has continued to the present day. After resurrecting (almost literally) the so-called Spearman’s g (Schonemann 1981; 1987; 2005), Jensen also created a theory of the origin of racial differences on IQ tests and retroactively attributed it to Spearman, calling it “Spearman’s hypothesis” (Jensen 1985). Essentially, Spearman’s hypothesis posits that black-white differences on IQ tests result from a difference in the latent factor g. g is identified by the positive manifold, that is, the all-positive correlation matrices found among IQ tests (Schonemann 1987). The theories of why a positive manifold arises are contentious and still under debate (Barbey 2018; Demetriou et al. 2016; Heene 2008; Kovacs & Conway 2016, 2019; Richardson 2002, 2017; Schonemann 1987; Schubert et al. 2017; Stankov 2007; Turkheimer 2007; van der Maas et al. 2006; van der Maas et al. 2017), but “Spearman”’s hypothesis is unique in positing that all black-white gaps on IQ tests can be explained by differences in underlying g. The theory also posits [1] that the greater a test’s g-loading, the larger the black-white difference on that test will be.
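In practice, the prediction that gaps grow with g-loading is usually checked with what Jensen called the method of correlated vectors: correlate the vector of tests’ g-loadings with the vector of standardized group gaps on those tests. A minimal sketch of that computation follows. The function names and the five loadings and gaps are hypothetical values chosen for illustration, not data from any study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def correlated_vectors(g_loadings, gaps):
    """Return (Pearson r, Spearman rho) between g-loadings and group gaps."""
    r = pearson(g_loadings, gaps)
    rho = pearson(ranks(g_loadings), ranks(gaps))
    return r, rho


# Hypothetical battery of five tests (illustrative numbers only).
g_loadings = [0.45, 0.60, 0.72, 0.55, 0.80]  # assumed g-loading of each test
gaps = [0.50, 0.70, 0.90, 0.40, 1.00]        # assumed standardized group gap

r, rho = correlated_vectors(g_loadings, gaps)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

Note that the unit of analysis here is the test, not the person, so a battery of a dozen subtests gives a correlation based on a dozen data points.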

There are almost innumerable issues with the usual way of ‘testing’ “Spearman”’s hypothesis, the method of correlated vectors, which correlates tests’ g-loadings with the size of the group gaps on them. First is the possibility that the correlations are merely an artifact of the way the factor or principal component is extracted [2] (Guttman 1992; Schonemann 1989, 1992, 1998a, 1998b). If so, this would end the use of the method of correlated vectors as a means of investigating the source of group differences entirely, and render an entire research programme (te Nijenhuis & Van den Hoek 2016; te Nijenhuis et al. 2014; te Nijenhuis et al. 2015a; te Nijenhuis et al. 2015b; te Nijenhuis et al. 2016) useless.

Secondly, an abundance of simulation studies show that positive correlations between g-loadings (and item heritabilities) and group differences can appear in the absence of a group difference on g (Ashton & Lee 2005; Dolan & Hamaker 2001; Dolan et al. 2004; Lubke et al. 2001; Wicherts & Johnson 2009; Wicherts 2017), and that, conversely, the correlation can be absent even when there is a group difference on g (Ashton & Lee 2005).
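The first of these results can be illustrated with a toy model. This is a sketch under stated assumptions, not a reproduction of any of the cited simulations: eight hypothetical tests all have the same true g-loading (0.5), six of them also load on a single group factor, “g-loadings” are estimated as loadings on the first principal component of the within-group correlation matrix, and the two groups differ only on the group factor, with zero difference on g.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def first_pc_loadings(R, iters=500):
    """Loadings on the first principal component of R, via power iteration."""
    n = len(R)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    # Rayleigh quotient of the unit vector v gives the leading eigenvalue.
    eig = sum(v[i] * sum(R[i][j] * v[j] for j in range(n)) for i in range(n))
    return [sqrt(eig) * a for a in v]


# Assumed toy model: every test has the same true g-loading, and the first
# six tests also load on one group factor that is orthogonal to g.
n_tests = 8
true_g_loading = [0.5] * n_tests
group_loading = [0.6] * 6 + [0.0] * 2

# Model-implied within-group correlation matrix (unit variances).
R = [
    [
        1.0 if i == j
        else true_g_loading[i] * true_g_loading[j]
        + group_loading[i] * group_loading[j]
        for j in range(n_tests)
    ]
    for i in range(n_tests)
]

# The groups differ on the group factor only; the gap on g is zero.
delta_g = 0.0
delta_group = 1.0
gaps = [
    true_g_loading[i] * delta_g + group_loading[i] * delta_group
    for i in range(n_tests)
]

pc1 = first_pc_loadings(R)
r = pearson(pc1, gaps)
print("PC1 loadings:", [round(a, 2) for a in pc1])
print("Correlation of PC1 loadings with gaps:", round(r, 2))
```

Because the six tests sharing a group factor correlate more strongly with one another, the first principal component weights them more heavily, so the estimated “g-loadings” track the gaps even though the gap on g is zero by construction.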

Finally, there is the issue of inference. Typically, “Spearman” correlations are used to infer from the observed correlation that a group difference must be genetic. However, it is entirely possible that the pattern of gaps and g-loadings can be explained on an environmental hypothesis as well (Flynn 2010; Flynn 2019). Moreover, the typically adduced “evidence” that adoption gains, nutrition gains, etc. are not “on g”, or whatnot, is also not inconsistent with an environmentalism that posits nonadditivity and/or nonlinearities.


It is also unclear whether appeals to a mythical g are relevant for the race & IQ debate. g itself, as noted above, is a controversial phenomenon whose existence as anything but a statistical construct is still contested [3] (Gould 1996). Moreover, g does not seem to explain differences between people with and without brain trauma (Flynn et al. 2014), which is almost a reductio ad absurdum of using g for group differences questions. Flynn (2008) notes that when IQ gaps closed in the German adoption study, so did g gaps, implying that the IQ-g distinction is not relevant when it comes to the IQ “debate”. Finally, there is evidence that g-gaps are malleable (Dickens and Flynn 2006a), even though gains correlated negatively with g-loading and gaps positively. This is a strong indictment of the relevance of IQ-g distinctions in explaining racial gaps.

[1] Whether or not the theory actually posits this is quite complicated, but we’ll get into this later in the post.

[2] Dolan & Lubke (2001) argue that Schonemann’s results follow trivially from the conditions of his theorem, specifically from the mechanism by which the black and white subgroups are constructed. However, this criticism doesn’t seem to fully understand Schonemann’s theorems or the mathematical consequences of violating their assumptions (Schonemann 2002). Schonemann notes that the geometry of the question implies that the mechanism by which the pooled group is decomposed into subgroups does not have a substantial impact, and that mechanism is the target of most of Dolan & Lubke’s criticism. There is also a clearer statement of the very few assumptions necessary for the Level II artefacts to appear: “(a) approximate multinormality for the pooled group, (b) positive mean differences, and (c) positive within group covariance matrices”.

[3] For instance, the issues of factor indeterminacy (Schonemann & Wang 1972; Steiger & Schonemann 1975; Schonemann & Haagen 1987) and the fact that gs extracted from different batteries are not identical (Mackintosh 2011).