The relationship between brain volume and intelligence has long been the subject of a contentious empirical debate (Gould 1980; Kamin & Omari 1998), and a recent paper aims to shed light on it. Nave et al. (2018) use a large dataset of cognitive ability tests and brain volume measurements from the UK Biobank to estimate the relationship between the two. However, several issues in the analysis prevent us from drawing many conclusions.
Deviation from Normality
Though we don’t have access to the restricted UK Biobank data, the authors did include a number of figures in their supplementary materials. These figures indicate anywhere from moderate to severe deviations from normality, which violate the assumptions of OLS regression.
Because we don’t have access to the data, we are unable to test whether the homoscedasticity assumption is violated, but it likely is, given known facts about IQ scores (Erdodi 2006). There are also likely to be multicollinearity issues, given the known covariance between IQ scores, brain volumes and socioeconomic indices (Brito & Noble 2014; Chan et al. 2018; Croizet & Dutrevis 2004), which could make the regression parameter estimates unstable.
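For readers who do have the data, the two checks mentioned above could be sketched roughly as follows. This is a minimal illustration on synthetic numbers, not the authors' pipeline: the variable names (`ses`, `brain`, `height`, `score`) and every coefficient are assumptions chosen only to produce collinearity and non-constant error variance, and the helper functions are hand-rolled with NumPy rather than taken from any particular statistics package.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid.var() / y.var()

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2 on the others)."""
    return np.array([
        1.0 / (1.0 - r_squared(np.delete(X, j, axis=1), X[:, j]))
        for j in range(X.shape[1])
    ])

def koenker_bp(X, y):
    """Studentized Breusch-Pagan statistic: n * R^2 of squared residuals on X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return len(y) * r_squared(X, resid ** 2)

# Synthetic stand-in data: every coefficient below is an arbitrary assumption.
rng = np.random.default_rng(0)
n = 5000
ses = rng.normal(size=n)
brain = 0.9 * ses + rng.normal(scale=0.3, size=n)   # built to be collinear with ses
height = rng.normal(size=n)                          # built to be independent
score = 0.5 * ses + np.exp(0.5 * ses) * rng.normal(size=n)  # noise grows with ses

X = np.column_stack([brain, ses, height])
vifs = vif(X)
bp_stat = koenker_bp(X, score)
print("VIF (brain, ses, height):", np.round(vifs, 2))
print("Koenker BP statistic:", round(float(bp_stat), 1))
```

A variance inflation factor far above 1 flags a predictor that is largely explained by the others, and the Breusch-Pagan statistic would be compared against a chi-square distribution with three degrees of freedom here (5% critical value of about 7.81).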
Omitted Variable Bias
While the authors did include a number of covariates in an attempt to control for possible confounds, their list is unimpressive. Specifically, they included only sex, age at brain scan, age at IQ test, height, the different methods of ascertaining the IQ scores, the Townsend Deprivation Index, genetic principal components, and some interactions. The reasoning behind the main ones is as follows:
- Sex: males and females differ in brain size, and possibly in IQ score, which could induce some sort of Simpson’s paradox.
- Age: IQ scores and brain volumes are both known to covary with age.
- Height: IQ scores are known to covary with height (Hartwig et al. 2018), and height with brain volume (Lynn 1989), so height could induce a spurious correlation between IQ scores and brain volume.
- Genetic principal components: Individuals with different ancestries in the British Isles have different developmental environments, access to resources, etc., which can induce spurious correlations.
- Townsend Deprivation Index: Socioeconomic status is known to covary with both IQ and brain volume (Capron & Duyme 1989; Noble et al. 2015), which could also induce a spurious relationship.
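The logic shared by these covariates can be illustrated with a small simulation in which a grouping variable shifts both measurements while the two are unrelated within each group. This is only a sketch: `group`, `volume` and `score` are hypothetical names, and the shift sizes are arbitrary assumptions rather than estimates of any real difference.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical binary covariate (it could stand for any grouping variable).
group = rng.integers(0, 2, size=n)

# Both measurements shift with group membership, but `score` is generated
# without any dependence on `volume`. The shift sizes are arbitrary.
volume = 1.0 * group + rng.normal(size=n)
score = 0.5 * group + rng.normal(size=n)

def slope(x, y):
    """Simple-regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

pooled = slope(volume, score)
within = [slope(volume[group == g], score[group == g]) for g in (0, 1)]

print("pooled slope:", round(float(pooled), 3))
print("within-group slopes:", [round(float(s), 3) for s in within])
```

The pooled slope comes out clearly positive while both within-group slopes hover around zero, which is why a covariate of this kind has to be included before the pooled association can be interpreted.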
However, the specific covariates they used are not ideal or full representations of the constructs they are intended to regress out. In particular:
- Body size: the rationale is that people with larger bodies typically have larger brains, and body size is sometimes positively associated with intelligence scores. Other measures of stature (sitting height, standing height, etc.) should also be considered, as well as other anthropometric factors.
- Socioeconomic status: This was perhaps the most egregious issue in the paper. Their only metric of socioeconomic status was the Townsend Deprivation Index, rather than any of the complex and nuanced metrics of socioeconomic status that sociologists, economists and other social scientists have developed over the years (Aber et al. 1997; Allin et al. 2009; Boardman & Robert 2000; Greenwald et al. 1996; Hodge et al. 1964; Richardson & Jones 2019; Rothstein & Wozny 2011; Sandefur et al. 2006; Wilkinson 1997). Given that the Townsend Deprivation Index will only partially correlate with many of these indices, it will not capture all of the relevant confounding.
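The point about partial proxies can be made concrete with a toy simulation. Everything here is an assumption for illustration: `ses` stands for a hypothetical latent confounder, `proxy` for a noisy index that correlates with it at roughly 0.7, and the outcome is generated with no direct effect of `volume` at all.

```python
import numpy as np

def ols(y, *cols):
    """OLS coefficients of y on the given columns; index 0 is the intercept."""
    A = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

rng = np.random.default_rng(2)
n = 50000

ses = rng.normal(size=n)                  # latent confounder (hypothetical)
proxy = ses + rng.normal(size=n)          # noisy index of it, correlation ~0.71
volume = 0.6 * ses + rng.normal(size=n)   # confounder affects the predictor
score = 0.6 * ses + rng.normal(size=n)    # and the outcome; volume has no effect

b_none = ols(score, volume)[1]            # no adjustment
b_proxy = ols(score, volume, proxy)[1]    # adjusting for the noisy proxy
b_true = ols(score, volume, ses)[1]       # adjusting for the confounder itself

print("no control:   ", round(float(b_none), 3))
print("proxy control:", round(float(b_proxy), 3))
print("true control: ", round(float(b_true), 3))
```

Under these assumptions, adjusting for the proxy shrinks the spurious coefficient but leaves a good share of it in place; only adjusting for the latent variable itself brings it to roughly zero.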
There are also other relevant covariates to consider here:
- Air pollution is known to affect IQ and brain development (Brockmeyer & D’Angiulli 2016; Zhang et al. 2018).
- Lead poisoning is also known to covary with both IQ and brain development (Kuhlmann et al. 1997; Needleman et al. 1990).
- Various diseases can impair brain development and reduce IQ scores, such as HIV (Doyle et al. 2013; Epstein & Gelbard 1999; Wood et al. 2009), parasitic infections (Cordeiro et al. 2015; El-Nofely & Shaalan 1999; Eppig et al. 2010; Nokes et al. 1992) and other diseases (Bale 2009; Daniele & Ostuni 2013).
- Various other factors could be added to this list.
Excluding these from the model can inflate parameter estimates or even create the appearance of a relationship where none exists.
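The standard textbook expression for this bias makes the mechanism explicit in the simplest case of one included regressor and one omitted one. The symbols are ours, not the paper's: $y$ is the test score, $x$ is brain volume, $z$ is an omitted confounder, and $\varepsilon$ is assumed uncorrelated with both $x$ and $z$.

```latex
y = \alpha + \beta x + \gamma z + \varepsilon
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\beta}_{\text{short}}
  = \beta + \gamma\,\frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
```

Here $\hat{\beta}_{\text{short}}$ is the coefficient from regressing $y$ on $x$ alone. Even if the true $\beta$ is zero, the estimate converges to a nonzero value whenever the omitted variable affects the outcome ($\gamma \neq 0$) and covaries with brain volume.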
If you look carefully at the graphs reported in the paper, you’ll also come across another problematic figure: the cloud of data from which their regression estimates are calculated.
The assumptions underlying basic econometric techniques (the authors are economists) are known to be violated often (Swann 2012), and it is dubious to infer anything from such regressions given that the signal-to-noise ratio is often very small; large samples can make even a tiny signal statistically significant. T-statistics and p-values are therefore not always informative about whether a relationship actually exists. Moreover, improperly specified functional forms and covariates can give you regression coefficients of the wrong sign (Achen 2006).
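A quick simulation shows how a large sample can turn a nearly negligible signal into an impressive test statistic. The slope of 0.01 and the sample size of one million are arbitrary assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

x = rng.normal(size=n)
y = 0.01 * x + rng.normal(size=n)   # hypothetical, nearly negligible slope

# Simple-regression slope, its standard error, the t-statistic and R^2.
xc = x - x.mean()
yc = y - y.mean()
slope = (xc @ yc) / (xc @ xc)
resid = yc - slope * xc
se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
t_stat = slope / se
r2 = 1.0 - (resid @ resid) / (yc @ yc)

print("slope:", round(float(slope), 4))
print("t-statistic:", round(float(t_stat), 1))
print("R^2:", round(float(r2), 6))
```

The t-statistic lands far beyond conventional significance thresholds even though the predictor explains only about a hundredth of a percent of the variance, so statistical significance by itself says little about how much signal sits in the cloud.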
The loess plot above is also vulnerable to outliers, local variable selection (Miller & Hall 2010) and error dependencies (Hall & Hart 1990), not to mention well-known boundary biases (Gasser & Muller 1979). Given the lack of external validation for the functional form, we should also be wary of how flexibly loess fits the data.
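To see how much shape a flexible smoother can produce on its own, consider a stripped-down local linear smoother with tricube weights, in the spirit of loess but without its robustness iterations, applied to pure noise. The function `loess`, the two span values and the sample size are all assumptions for illustration; we do not know what settings the authors used.

```python
import numpy as np

def loess(x, y, span):
    """Local linear fit with tricube weights, evaluated at each x.

    `span` is the fraction of points used in each local fit.
    """
    n = len(x)
    k = max(3, int(np.ceil(span * n)))       # neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[k - 1]                # distance to the k-th nearest point
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3   # tricube weights
        sw = np.sqrt(w)
        # Weighted least squares on a line centred at x[i]; the intercept
        # of that line is the fitted value at x[i].
        A = np.column_stack([np.ones(n), x - x[i]]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        fitted[i] = beta[0]
    return fitted

rng = np.random.default_rng(4)
n = 300
x = np.sort(rng.uniform(0.0, 1.0, size=n))
y = rng.normal(size=n)                       # pure noise: no relationship with x

wiggly = loess(x, y, span=0.08)              # small span, very flexible
smooth = loess(x, y, span=0.9)               # large span, close to a global line

print("spread of fitted curve, span=0.08:", round(float(wiggly.std()), 3))
print("spread of fitted curve, span=0.9: ", round(float(smooth.std()), 3))
```

With the small span, the fitted curve bends noticeably even though the data contain no relationship at all, which is the sense in which an unvalidated loess curve should not be read as evidence about functional form.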