Asian Intelligence Superiority, or Is It?

Because the ideology of hereditarianism has been, both historically and today, deeply embedded in white supremacist communities, networks, and platforms, it is often claimed that hereditarianism itself is a white supremacist belief. I will not take on that question today; instead, I am looking at a related question: the usual response to this accusation. The typical response to the claim that hereditarianism is white supremacist is to posit that hereditarians don’t, in fact, believe that whites are superior in intelligence, but rather that East Asians are superior in intelligence, as a result of “natural selection”.

It is undoubted that Richard Lynn, Philippe Rushton and others claim that East Asians achieve higher scores on IQ tests than other ‘races’, but whether their claim represents reality is a very different question.

Asian-Americans

The most often-cited example of East Asian IQs comes from the United States, where it has long been observed that East Asian ethnic groupings (Chinese, Japanese, Korean, etc.) score higher on IQ tests than other groups. However, this is an entirely different question from whether East Asians as a (global) group score higher than other groups on IQ tests, because East Asians in the United States are not necessarily representative of the global East Asian population.

The Data

Ancient Data

Yeung (1921) reported a mean of 93.5 for Chinese boys and 99.9 for Chinese girls. Fukuda (1923) reports a mean of 97. Symonds (1924) reports means not different from the white population. Bere (1924) also reports means for Japanese individuals ranging from 86 (rural) to 99 (urban). Murdoch (1925) reports that all Asian immigrant groups have lower means than whites. Graham (1926) reported a Chinese mean of 79.7 on one test, and a mean of 87.83 on another, compared to white means of 127.55 and 144.11, respectively. Porteus & Babcock (1926) report lower means for Japanese, Chinese, Hawaiian and Filipino children (cited in Vernon & Lykken 1982 [1]), and also report that Hoag (1926) found a mean of ~87 for Chinese and 81 for Japanese. Darsie (1926) reports mean IQs for American-born Japanese children ranging from 89 to 99. Hsiao (1929) also reports that Chinese and white IQs were approximately equal in Lee (1921)’s sample, while Yokiosha (1929) reports that the mean IQ of American-born Japanese children is ~86. Smith (1942) reports lower means for Japanese, Chinese and Filipino groups than whites, while Livesay (1942) reported that Caucasian means were higher than other ethnic group means. Bitner (1954) found that Caucasians generally received higher scores on intelligence and academic tests than Asians. Werner et al. (1968) report no consistent group differences between Japanese, Hawaiians and whites. Leiter (1969) reports much lower means for Hawaiian Chinese and Japanese than whites. Morton et al. (1976) report no racial differences in academic performance in Hawaii using highly aggregated data (census tracts). [2]

More Recent Data

Jensen and Inouye (1980) report mean “Oriental” IQs slightly lower than white IQs. Flynn (1991a) reports, using data from Project Talent, that the average IQ of East Asians was only 98.5, compared to a mean of 100 for whites. More recent samples are typically unrepresentative or use inappropriate proxies for IQ, such as student achievement tests.

Selection Effects

One persistent issue in comparing Asian scores to those of other ethnic groups is that Asians in the United States (and other countries) are migrants, and as such are not representative of their home populations, if only by virtue of having traveled across the ocean. There is substantial evidence that the Asians who immigrate to the United States (and other countries) are selected on the basis of their socioeconomic and class background, as well as other variables (Zhou & Lee 2014; Lee & Zhou 2014; Lee & Zhou 2015; Lee & Zhou 2017), that they are much more selected than other immigrant groups (Tran et al. 2018), and that they are in fact doubly selected (Feliciano & Lanuza 2017).

It would be extremely difficult to explain the fact that Chinese immigrants in Spain have the lowest educational attainment of all ethnic groups without invoking cultural and environmental variables (Yiu 2013).
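The logic of the selection argument in this section can be sketched with a small simulation. Everything in it is an assumption made for illustration: the correlation between a socioeconomic index and a test score (`corr`), the share of the home population that migrates (`top_fraction`), and the rule that migrants come strictly from the top of the socioeconomic index. None of these values are taken from the studies cited above; the point is only that a selected migrant group can have a different mean from the population it left.

```python
import random
import statistics


def simulate_selection(n=20_000, corr=0.5, top_fraction=0.2, seed=0):
    """Return (home_mean, migrant_mean) for a simulated home population.

    All parameters are hypothetical: `corr` is an assumed correlation between
    a socioeconomic index and a test score, and `top_fraction` is an assumed
    share of the population that migrates, taken from the top of the index.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        ses = rng.gauss(0, 1)    # standardized socioeconomic index
        noise = rng.gauss(0, 1)  # everything unrelated to the index
        z = corr * ses + (1 - corr ** 2) ** 0.5 * noise  # unit-variance score
        people.append((ses, 100 + 15 * z))               # IQ-style scale

    # Migrants are the top `top_fraction` of the socioeconomic index.
    people.sort(key=lambda p: p[0], reverse=True)
    migrants = people[: int(n * top_fraction)]

    home_mean = statistics.mean([score for _, score in people])
    migrant_mean = statistics.mean([score for _, score in migrants])
    return home_mean, migrant_mean


if __name__ == "__main__":
    home, migrant = simulate_selection()
    print(f"home population mean: {home:.1f}")
    print(f"selected migrant mean: {migrant:.1f}")
```

Under these assumed settings the migrant mean comes out well above the home mean, even though every simulated person is drawn from the same population. Changing `corr` or `top_fraction` changes the size of the gap, which is why the degree of selection matters when reading the Asian-American samples.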

International Samples

Older Data

Li (1964) reports that Hong Kong children scored approximately equal to British norms on Raven’s matrices. Goodnow and Bethon (1966) reported approximately equal scores on Piaget tests. Douglas and Wong (1977) report similar results in 13- and 15-year-old cohorts, with Chinese children having lower scores than Americans.

Stevenson & Azuma (1983) report no group differences in IQ between Japanese and American samples, arguing that Lynn’s results are artifacts of unrepresentative sampling [3].

Flynn (1991b) reanalyzes Lynn’s data on Hong Kong children and finds peculiar properties in the reaction time and IQ test data across the samples, including the paradoxical finding that Hong Kong children and British children each have faster reaction times than the other [4].

Lynn’s “National IQs”

Most of the citations for high East Asian IQs come from Richard Lynn’s estimates for IQs by nation (Lynn 2008, 2019; Lynn & Meisenberg 2010; Lynn & Vanhanen 2002, 2006).

China

Lynn & Vanhanen (2002) cite Raven, Court & Raven (1996), “Li (sic) et al. (1990)”, and Li et al. (1996). The first had averages of 100, 99 and 91.5 for differing age groups and standardization comparisons. I am not able to access the book Raven, Court & Raven (1996), as it does not appear anywhere online (archive.org, libgen or Google Books) or in any local libraries (the closest is 3 hours away), so I am not able to analyze their estimates. Lynn & Vanhanen reported a mean of 112.4 for the second sample, Dan et al. (1990), but adjusted it downwards to 103.4 for several reasons (unrepresentativeness, norm comparison groups and the Flynn effect). I am unable to find any American norms for the WISC-R online, so I will not be able to recalculate their numbers. I remain very skeptical about their adjustment for representativeness, as I suspect it is not large enough given the reported differences between Chinese regions as well as urban-rural differences (Lynn, Cheng and Wang 2016; Taji et al. 2019). However, Dan et al. (1990) did provide an explanation for their observed difference:

Analyzing the bases for these differences, we think they may be as follows. (1) The children in our study are from a very large city with high economic and cultural levels, but those in the USA sample (WISC-R) are from all over America, so this is an asymmetric comparison. (2) Different economics, cultures, and living habits and customs may produce such differences as these. For instance, the scores for Digit span and Arithmetic in our sample are obviously higher than those in the USA sample (WISC-R). Has such a result something to do with the pronunciation of numbers in the two different languages? This is worthy of further study. (3) For some subtests (such as Vocabulary) in the Verbal Scale a few items were changed according to the conditions of China and one may ask did that decrease the difficulty of these subtests so that the means of the raw scores are higher than those in the USA sample (WISC-R)? This is also considered a relevant factor. (4) With regard to the comparison between our sample and the Chinese trial sample, the higher scores on most of the Performance Scale subtests for the Chinese children may reflect paying more attention to training students on various kinds of skills, especially performance abilities. This is also worthy of further study.

It is worth noting that other researchers have also posited that the structure of number words in Chinese may play a role in the higher observed scores on math-related tests (Arithmetic, Coding and Digit Span) (Zhou & Boehm 2001).

Li et al. (1996) also reported possible explanations for the observed differences:

The different levels of performance (in terms of the mean test scores) by youth from the differing nations in this study indicated that some cognitive abilities may develop at differing rates in the three cultures. Among the cognitive factors used in this study, numerical reasoning ability differed most across the three cultures. Japanese students scored higher than American students with Chinese students scoring higher than both Japanese and American students. Also, on two nonverbal ability tests (Hidden Pattern and Figure Classification), the Chinese and Japanese students showed relatively higher performance than American students. American students showed generally higher scores on a verbal reasoning test (Pedigree). Large differences between American youth and Chinese and Japanese youth on numerical ability might be explained by cultural differences. In many Asian cultures, including China and Japan, science and mathematics are emphasized more than they are in the United States (Geary et al., 1992; Stevenson & Lee, 1990; Stevenson et al., 1990). Chinese and Japanese youths’ better performance on nonverbal tests also might be related to cultural influences. Previous studies have found that people from Asia are more capable in nonverbal tasks than in verbal ones (e.g., Wing, 1980). In certain Asian countries, particularly Japan, nonverbal performance is emphasized in school (Vernon, 1982). Japanese children’s folding paper art is an example. Asian youths’ high performance on nonverbal tasks, however, is an area in which further research is needed. Likewise, Chinese and Japanese youths’ lower scores on the Pedigree reasoning test might be culture related. One possible explanation is that Chinese and Japanese youths, because of the complexity of the terminology and concepts in their cultures, are not as familiar with the terminology and concept of genealogical system (family tree) as youths in the United States. 
For example, corresponding to the English word “aunt” in the United States, there are four possible terms in Chinese based on whether an “aunt” comes from the paternal or maternal side and whether the woman is a sibling or an in-law of the parent. Corresponding to “cousin,” there are at least 12 (in some regions, up to 16) expressions in Chinese depending on the different combinations of parental side, gender of parental siblings, and gender and age of the “cousin.” As a result, genealogical reasoning of Chinese youth may be more difficult than that of American youth. Another possible reason for the lower test scores is that verbal reasoning ability among Chinese and Japanese youth is lower than that of American youth because of some cultural traditions. In both Chinese and Japanese family communications, the emotional aspect is often more important than the logical one. It is possible that Japanese verbal reasoning ability seems somewhat depressed in comparison with other cultures because the Japanese culture does not encourage verbal reasoning (Vernon, 1982).

It is also worth noting that the Chinese students were taken from unnamed schools in Beijing, China’s capital and one of its largest cities, and rural-urban differences in IQ have long been noted for China (Taji et al. 2019).

I do not have access to Lynn & Vanhanen (2006); Lynn & Meisenberg (2010) doesn’t report the use of any new IQ estimates; and Lynn (2008) doesn’t seem to list the studies he used to construct his national IQ estimate for China. I will therefore move on to Lynn & Becker (2019).

The samples used are broadly similar, with two additions. The first is Liu et al. (2016), which is (surprisingly) a random sample of students, for which Lynn & Becker report a mean of 94. The other is Lynn, Cheng & Wang (2016), which uses a Chinese sample of 37,238 students who took the Combined Raven’s Test for Children. However, the norms for the sample are not mentioned, nor can they be found in any of the discussions of the paper, so it is entirely unclear how they converted these figures to British IQs. I have searched for the paper they cited, but have only been able to find the abstract. Moreover, the “Combined Raven’s Test for Children” seems to be a test developed specifically by the Chinese researchers in that study (Dong et al. 2007).

Finally, when we compare the means they report for Raven, Raven and Court (1998) (Advanced Progressive Matrices) in the 2019 book with the means they report for Raven, Raven and Court (1996) (Standard Progressive Matrices) in the 2002 book, we see, surprisingly, substantially different answers. The 2019 book drops the Raven et al. (1996) data in favor of the Raven et al. (1998) data. Correspondingly, the IQs jump from the mid-90s to over 110. It is unclear why they included the 1996 SPM data in the 2002 book but dropped it from the 2019 book in favor of the 1998 APM and 1999 SPM data. This inconsistent reporting (along with the unavailability of the data) makes it difficult to put much trust in the reported values.

Japan

I’ll only review a small part of the literature here. Misawa, Motegi, Fujita, and Hattori (1984) reported decreasing IQ gaps between Japanese and American samples with time, a peculiar result. Perhaps it reflects an advantage from superior Japanese home environments that is partially erased by the schooling environment. Flynn (1983, 1984) reports issues with Lynn’s various papers on Japanese IQ, and both Kaufman et al. (1989) and Das et al. (2013) report that scores on a subset of subtests can explain the spurious overestimation of Japanese IQ, a finding confirmed by Ishikuma et al. (1988). A more comprehensive review of the literature reported a mean IQ for Japan of about 93-94 (Brouwers et al. 2009). Geary et al. (1996) reported faster reaction times for US undergraduates than for Japanese undergraduates.

South Korea

The literature on Japan is small, but the literature on South Korea is even smaller. I have not been able to find a single study, beyond those already reported by Lynn and Becker in their 2019 book, that reports mean IQs for Korean individuals together with the norms used, or that compares Korean IQs to European IQs. There is a burgeoning literature (partially in Korean) that reports Korean translations of the WAIS-R, but the standardization samples and relevant norm comparisons are unavailable. However, Song and Jinyu (2017) report that Chinese infants have slightly higher IQs, though the difference is not significant. One of the samples in Lynn and Becker’s estimate is Lynn and Song (1994), which sampled Koreans from Busan, South Korea’s second-largest city, and so may be unrepresentative of the country as a whole. Lynn and Becker themselves report a mean of 97.37 adjusted for sample size and quality, indicating that their own numbers contradict the East Asian superiority thesis. They adjust this figure upwards after adding in achievement tests, which are considered below.

Southeast Asia

Claims that Chinese individuals have the same level of IQ in all nations are manifestly contradicted by the above results and others. For example, it has been inferred from PISA results that Chinese individuals in Southeast Asia have maximum possible scores of around 97: Malaysia 95, Thailand 97.

Other Recent Data

There are a number of other studies showing varying values of IQ for Chinese samples, but the tests used are often Chinese adaptations with unknown standardizations and unrepresentative sampling. Some of the studies are as follows: Guo et al. (1991), 76-81; An et al. (1992), 76-84; Yang et al. (1994), 71-81; Li et al. (1995), 79-89; Hong et al. (2001), 65-82; Wang et al. (2001), 76-81; Xiang et al. (2003), ~97; Xiang et al. (2005), ~96.

A very recent study, Lonneman et al. (2017), suggests that there are no differences in the general cognitive ability of Japanese and German adults, but that they differ specifically in arithmetic ability, perhaps as a result of the language difference mentioned above.

PISA/Academic Achievement

Another common citation for the alleged high level of intelligence of East Asians is their scores on PISA and other academic achievement tests. It is well-known that East Asians have fairly high reported scores on tests like PISA, TIMSS, and PIRLS (Soh 2017), but the question is why. It may be easy to attribute this to a higher level of ‘g’ for East Asians, but the evidence suggests this is too quick. The first issue is that tests like PISA have been documented to suffer from differing levels of effort between countries (Borgonovi & Biecek 2016; Gneezy et al. 2017; Zamarro et al. 2016). The second is that PISA sampling is not representative of schools as a whole, as the sampling scheme is quite strange (Fernandez-Cano 2016). PISA does not weight individual schools’ results by the size of the school, and differential levels of participation can also influence the results. For example, China only reported PISA scores for Shanghai until 2015 (Loveless 2013a; 2013b). Other strange factors, like the choice of schools represented, are also known to affect reported scores. Another factor noted by Flynn (1991a) is that Asian-Americans often do better in schooling assessments than they do on IQ tests, indicating that there could be a cultural factor inflating the results. It has also been shown that tests like TIMSS lack measurement invariance (Wu et al. 2007), so the comparisons don’t have much meaning.

Moreover, recent research has shown that even these national academic achievement tests do not support hereditarian conclusions. For example, China’s scores are lower than Gabon’s and equal to Kenya’s in the recent World Bank harmonized learning outcomes metrics (Patrinos & Angrist 2019).
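The school-size weighting concern raised earlier in this section can be made concrete with a toy calculation. This is not PISA’s actual estimation procedure, and the two schools and their scores are hypothetical. It only shows how averaging school means without regard to enrollment can differ from an enrollment-weighted average when small and large schools score differently.

```python
def unweighted_mean(schools):
    """Average of school means, treating every school as equally important."""
    return sum(score for score, _ in schools) / len(schools)


def enrollment_weighted_mean(schools):
    """Average of school means, weighting each school by its enrollment."""
    total_students = sum(enrollment for _, enrollment in schools)
    weighted_sum = sum(score * enrollment for score, enrollment in schools)
    return weighted_sum / total_students


if __name__ == "__main__":
    # Hypothetical (mean score, enrollment) pairs: a small high-scoring
    # school and a large lower-scoring school.
    schools = [(520.0, 200), (480.0, 1800)]
    print(unweighted_mean(schools))           # 500.0
    print(enrollment_weighted_mean(schools))  # 484.0
```

In this made-up case the unweighted figure overstates the average student’s score by 16 points, simply because the small school counts as much as the large one.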

The Flynn Effect

It is also worth noting that the Flynn effect seems to be much stronger in East Asian countries: Wang & Lynn (2018) report increases of 7-8 IQ points per decade, in contrast to increases of only 2-3 points in European countries (Trahan et al. 2014). This is consistent with data showing that Asian IQs were previously lower than European IQs but have since risen above them through some process of educational reform (Hsieh 2013; Mok 2014; Nisbett 2009; Palmer et al. 2011; Postiglione & Tan 2007).
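The arithmetic behind this point is simple enough to sketch. The per-decade rates below are the midpoints of the ranges quoted above (7.5 and 2.5 points); the starting gap of -10 points is purely hypothetical, chosen only to show how differing rates of gain could turn a deficit into a lead over a few decades. The linear model is itself an assumption, since real gains need not be constant over time.

```python
def projected_gap(initial_gap, rate_a, rate_b, decades):
    """Gap (group A minus group B) after `decades`, assuming constant
    per-decade gains `rate_a` and `rate_b` in IQ points."""
    return initial_gap + (rate_a - rate_b) * decades


if __name__ == "__main__":
    # Midpoints of the per-decade ranges quoted in the text.
    east_asian_rate = 7.5
    european_rate = 2.5
    # Hypothetical starting gap, not taken from any study.
    start = -10.0
    for decades in range(0, 4):
        gap = projected_gap(start, east_asian_rate, european_rate, decades)
        print(decades, gap)
```

With these inputs the gap shrinks by 5 points per decade, reaching zero after two decades and +5 after three. A cross-sectional comparison therefore depends heavily on when each sample was tested and which norms were used.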

Asian IQs: Lower, Higher, Equal or NOTA?

Our review of the literature on Asian IQs doesn’t leave us at a happy conclusion. There is an absolute paucity of data as to whether the IQs of Asians within the United States are lower or higher than those of whites, and a paucity for Asians outside of the United States as well. Sampling bias, unrepresentativeness, systematic error, Flynn effects, test standardization and normalization issues, and the disparate use of tests make it nearly impossible to compare IQs between the groups. Moreover, what does “Asian IQ” even mean? There is no one IQ, but many, and we can clearly see that different tests yield different gaps. Whether or not these gaps are the result of the aforementioned issues is currently unknowable, but it has become clear that any claims of superior East Asian IQ should be met with significant skepticism given the data we currently have.

Footnotes

[1] Though see Flynn (1991a).

[2] There are numerous issues with using these old samples. For one, the tests used in these samples are known to have strange properties: scores can correlate negatively with age (Kamin 1974), and the tests were often improperly standardized and extremely culturally biased. Second, information about these tests is often hard to track down and may no longer exist, making it dubious to rely on these samples for much. Moreover, so little is known about the sampling schemes and the representativeness of the samples that it is difficult to compare means between groups.

[3] See also Flynn (1987).

[4] See also Thomas (2010).