National Mean IQ Estimates: Validity, Data Quality, and Recommendations. Russell T. Warne. Evolutionary Psychological Science, Dec 19 2022. https://link.springer.com/article/10.1007/s40806-022-00351-y
Abstract: Estimates of mean IQ scores for different nations have engendered controversy since their first publication in 2002. While some researchers have used these mean scores to identify relationships between the scores and other national-level variables (e.g., economic and health variables) or test theories, others have argued that the scores are without merit and that any study using them is inherently and irredeemably flawed. The purpose of this article is to evaluate the quality of estimates of mean national IQs, discuss the validity of different interpretations and uses of the scores, point out shortcomings of the dataset, and suggest solutions that can compensate for the deficiencies in the data underpinning the estimated mean national IQ scores. My hope is that the scientific community can chart a middle course and reject the false dichotomy of either accepting the scores without reservation or rejecting the entire dataset out of hand.
Notes
This Flynn effect adjustment is often misunderstood. It does not increase or decrease the score of the country to reflect the age of the test. Rather, it adjusts the international IQ standard (where 100 = the mean in the UK) to the year of the test administration in a country so that the country’s measured IQ is compared to the estimated standard for the same year.
Only one sample had an overall quality rating of .18; it was collected in the United States. Four samples achieved an overall quality rating of .90. The data for these samples were collected in Tajikistan, the UK, the USA, and Yemen.
The width of a confidence interval is equal to ±z(σ/√n), where z is the critical value for the chosen confidence level (e.g., 1.96 for a 95% interval), σ/√n is equal to the standard error of the mean, σ = 15 (the default SD of a population on the IQ metric), and n is the combined sample size of all samples that contribute to a country’s mean IQ estimate (Warne, 2021, pp. 199–201).
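As a rough illustration of how the interval narrows as the combined sample size grows, the following sketch computes the standard error σ/√n and the resulting margin. It assumes a 95% interval (z = 1.96), which this note does not state explicitly; the function name `ci_half_width` and the sample sizes are hypothetical and chosen only for illustration.

```python
import math


def ci_half_width(n, sigma=15.0, z=1.96):
    """Return the margin (half-width) of a confidence interval for a mean IQ.

    sigma=15 is the default SD on the IQ metric; z=1.96 is an ASSUMED
    95% critical value, not a figure taken from the article.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    standard_error = sigma / math.sqrt(n)  # standard error of the mean
    return z * standard_error


# Hypothetical combined sample sizes, for illustration only.
for n in (100, 1000, 10000):
    print(f"n = {n:>5}: mean IQ +/- {ci_half_width(n):.2f}")
```

Because n sits under a square root, quadrupling the combined sample size only halves the margin, so countries whose estimates rest on a few small samples have much wider intervals.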
These statistics are calculated using the absolute value of the differences between the QNW + SAS + GEO IQ in the Lynn and Becker (2019b) dataset and the IQ + GEO IQ for the previous version.
Listed in descending order of the magnitude of IQ change: Nicaragua (− 23.78 IQ points); Haiti (21.60 IQ points); Honduras (− 18.84 IQ points); Nepal (− 18.00 IQ points); Guatemala (− 17.71 IQ points); Saint Helena, Ascension, and Tristan da Cunha (− 17.01 IQ points); Belize (− 16.25 IQ points); Cabo Verde (− 16.00 IQ points); Morocco (− 15.39 IQ points); Yemen (− 14.39 IQ points); Mauritania (− 14.00 IQ points); Chad (11.83 IQ points); Saint Lucia (11.71 IQ points); Barbados (11.69 IQ points); Senegal (− 10.50 IQ points); Republic of the Congo (− 10.03 IQ points); Côte d’Ivoire (− 10.02 IQ points); and Vanuatu (10.02 IQ points). Positive values in this list indicate that the new IQ estimates from Lynn and Becker (2019b) are higher than the earlier estimate. Negative values indicate the new value is lower.
The PERCE 1997 and SERCE 2006 data are taken from official publications reporting country means for each grade level and subject (Oficina Regional de Educación para América Latina y el Caribe/UNESCO, 2001, p. 176; 2008, Tables A.3.1, A.3.5, A.4.1, A.4.5, and A.5.1). TERCE 2013 and ERCE 2019 data can be downloaded at https://raw.githubusercontent.com/llece/comparativo/main/datos_grafico_1-1.csv
Cuba did not participate in TERCE 2013. Its ERCE 2019 data are much more similar to data from other countries in Latin America.
The 1995 SACMEQ test only produced reading scores. The 2000 and 2007 SACMEQ tests each produced a reading score and a mathematics score. The 2000 and 2007 scores were combined as an unweighted mean for each country when calculating correlations with the estimated national-level IQs.
The correlations between PASEC scores and the GEO IQ scores from the Lynn and Becker (2019b) dataset—i.e., with Benin and Burkina Faso removed—are r = .142 (for PASEC grade 2 language), r = .153 (for PASEC grade 2 mathematics), r = − .662 (for PASEC grade 6 language), and r = −.262 (for PASEC grade 6 mathematics). This does not change the conclusion that geographically imputed scores have a poor correspondence with data drawn from a country.
The national PIRLS/TIMSS scores and the chart for converting LLECE and PASEC scores to PIRLS/TIMSS scores are available at https://www.cgdev.org/sites/default/files/patel-sandefur-human-capital-final-results.xlsx
Two regions, England and Northern Ireland, were part of the same country. When calculating correlations with QNW + SAS IQs, the Northern Ireland data were dropped, and the data for England were compared to QNW + SAS IQs for the entire UK.
Sear’s (2022) criticism of using IQ data from children to estimate IQs for an entire population shows that she does not understand that IQ scores are calculated by comparing examinees to their age peers. This functionally controls for age and allows scores from different age groups to have the same meaning. For an accessible explanation of how IQ scores are calculated, see Warne (2020), pp. 5–9.
Readers may be aware of Lim et al.’s (2018) study that measures human capital in 195 countries. These scores are not included in the discussion in this article because the underlying data are not solely cognitive/educational scores. Lim et al. (2018) also used health data and longevity/life expectancy data in the calculation of their human capital scores. Therefore, the Lim et al. (2018) data cannot be interpreted as a cognitive measure, which makes them inadequate for convergent validity purposes when studying the Lynn and Becker (2019b) dataset.
Available at https://datacatalog.worldbank.org/search/dataset/0038001.
This statistical truism is why the Flynn effect (a purely environmental effect) can coexist with high heritability (a variance statistic measuring the strength of genetic influence on a phenotype in a population) of IQ. The same secular mean increase occurred in height (a phenotype with high heritability) in many countries during the twentieth century. Changes in the mean do not automatically result in changes in the variance, and vice versa.
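The arithmetic behind this point can be checked with a toy example. The sketch below uses made-up scores and a made-up uniform shift of 10 points; it is not a model of the Flynn effect itself, which need not be uniform across individuals, only a demonstration that a mean can move while the variance stays fixed.

```python
import statistics

# Hypothetical scores for five people; not data from any study.
scores_before = [85, 92, 100, 108, 115]

# Hypothetical uniform environmental gain applied to everyone.
shift = 10
scores_after = [s + shift for s in scores_before]

mean_before = statistics.mean(scores_before)
mean_after = statistics.mean(scores_after)
var_before = statistics.pvariance(scores_before)
var_after = statistics.pvariance(scores_after)

print("mean:", mean_before, "->", mean_after)     # the mean moves by `shift`
print("variance:", var_before, "->", var_after)   # the variance is unchanged
```

Since heritability is a ratio of variances, a shift of this kind leaves it untouched even though every individual score has changed.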
It is important to recognize that mean QNW + SAS IQs below 70 are also found in some Central American nations (Belize, El Salvador, Guatemala, Honduras, Nicaragua), the Caribbean (Dominica and Saint Vincent and the Grenadines), and Morocco, Nepal, and Yemen.
For the 2018 PISA, the SD for the UK data was 93 for math scores and 99 for science scores (Schleicher, 2019, pp. 7–8). In these calculations, I used the standard deviation of 99 to be more conservative. My choice of standard deviation will not affect any correlations, but it will change differences between these IQs and others and make outlier national mean IQs slightly less extreme.
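A minimal sketch of why the choice of SD behaves this way, assuming the usual linear conversion to the IQ metric (IQ = 100 + 15z, with z computed from the UK mean and SD). The UK mean of 500 and the three country scores are hypothetical placeholders; only the SDs of 93 and 99 come from this note. The helper names `to_iq` and `pearson` are likewise introduced here for illustration.

```python
import math


def to_iq(score, uk_mean, uk_sd):
    """Convert a test score to the IQ metric (UK mean = 100, SD = 15)."""
    z = (score - uk_mean) / uk_sd
    return 100 + 15 * z


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


UK_MEAN = 500.0                  # hypothetical placeholder, not from the article
scores = [380.0, 450.0, 520.0]   # hypothetical national mean scores

iq_sd93 = [to_iq(s, UK_MEAN, 93) for s in scores]
iq_sd99 = [to_iq(s, UK_MEAN, 99) for s in scores]

for s, a, b in zip(scores, iq_sd93, iq_sd99):
    print(f"score {s:.0f}: IQ {a:.1f} (SD 93) vs {b:.1f} (SD 99)")

# A change of SD is a linear rescaling, so the two IQ sets correlate perfectly.
print("correlation:", round(pearson(iq_sd93, iq_sd99), 6))
```

Dividing by the larger SD shrinks every z-score, which pulls each converted IQ toward 100 without changing the rank order or any correlation.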
This is not an artifact of the extrapolation based on nearby countries’ data that Gust et al. (2022) used. The correlation between scores for the 12 countries that had imputed data in both datasets was r = .608; for the 13 countries that had geographically imputed scores in the Lynn and Becker (2019b) dataset and scores based on educational achievement testing data in the Gust et al. (2022) dataset, the correlation was r = .511. The average difference between the two sets of scores is also similar.
Gust et al. (2022, p. A1) noted that Angrist et al.’s (2021) method overestimates academic achievement HLOs, compared to the Gust et al. (2022) method. The average scores in Table 2 are much more similar than would be expected because of the different means for the UK that were used to calculate z-scores and IQs. The HLO mean for the UK is 527.8 in the Angrist et al. (2021) data, compared to the Gust et al. (2022) mean of 503.2. The higher HLO mean for the UK provides a correction to the HLO scores, when converted to IQs, and makes the weighted mean IQs for both datasets in Table 2 much more similar.
The QNW + SAS IQs for these countries are 69.45 (Botswana), 60.98 (Ghana), and 69.80 (South Africa). However, note that these are not independent of the PIRLS and TIMSS data because Lynn and Becker (2019b) used the educational achievement data to calculate SAS IQs, which contributed data to the QNW + SAS IQs.
The largest discrepancies were for the Dominican Republic (+ 20.11 IQ points), Yemen (+ 19.34 IQ points), Tunisia (+ 12.39 IQ points), Argentina (+ 11.45 IQ points), Kuwait (+ 10.59 IQ points), and Honduras (− 10.47 IQ points). In this list, positive numbers indicate a higher QNW + SAS score in the Lynn and Becker (2019b) dataset, and negative numbers indicate a higher IQ derived from the Patel and Sandefur (2020) study.
The four countries with geographically imputed IQs in Lynn and Becker’s (2019b) dataset that have discrepancies of at least 10 IQ points are Paraguay (+ 17.26 IQ points), Senegal (− 15.76 IQ points), Chad (+ 13.92 IQ points), and Niger (+ 10.10 IQ points). In this list, positive numbers indicate a higher QNW + SAS + GEO score in the Lynn and Becker (2019b) dataset, and negative numbers indicate a higher IQ derived from the Patel and Sandefur (2020) study.
The largest discrepancies were for Cambodia (+ 26.4 IQ points), Venezuela (− 23.1 IQ points), Cuba (− 20.6 IQ points), Pakistan (+ 18.4 IQ points), Nicaragua (− 15.9 IQ points), Sri Lanka (+ 15.9 IQ points), Guatemala (− 15.4 IQ points), the Dominican Republic (+ 15.3 IQ points), the Philippines (+ 14.8 IQ points), Kyrgyzstan (+ 13.1 IQ points), Argentina (+ 12.4 IQ points), Haiti (+ 12.2 IQ points), Morocco (− 11.4 IQ points), Mongolia (+ 10.8 IQ points), and the United Arab Emirates (− 10.1 IQ points). In this list, positive numbers indicate a higher QNW + SAS score in the Lynn and Becker (2019b) dataset, and negative numbers indicate a higher IQ derived from the Gust et al. (2022) study. The inclusion of Cuba on this list is due to the use of SERCE 2006 data in the Gust et al. (2022) paper. As I stated earlier in this article, the Cuban data for this test are an outlier and likely fraudulent. This shows that when national IQ discrepancies arise in different datasets, it does not always indicate that Lynn and Becker’s (2019b) data are wrong.
In descending order of the magnitude of the discrepancy, these countries were Honduras (22.62 IQ points lower), Botswana (18.52 IQ points lower), South Africa (13.80 IQ points lower), and Egypt (11.65 IQ points lower).
Testing students one grade higher than is typical is standard practice for South Africa when administering PIRLS and TIMSS tests.
The Burundi data are clearly an outlier. Patel and Sandefur (2020) reported that 43% of examinees in Burundi met or exceeded the TIMSS low international benchmark in reading, which is typical of PASEC countries (PASEC, 2015, p. 50). The discrepancy between Burundi’s math and reading performance originates in the PASEC data and is not an error in Patel and Sandefur’s conversion of PASEC scores to TIMSS scores.
Pupil age is another factor to consider in making these comparisons. Repeating a grade is much more common in sub-Saharan Africa than it is in Western countries. However, these older pupils score worse on the PASEC than their classmates who have never repeated a grade (PASEC, 2015, pp. 78–81). Unlike testing students in a higher grade, the inclusion of these older students does not increase the countries’ percentages of students who meet the TIMSS low international benchmark.
I only compared mathematics scores here because language differences (e.g., one language being easier to learn to read than another) make comparing reading scores and competency less straightforward than comparing proficiency in mathematics (Gust et al., 2022). Additionally, many children in Africa learn to read in a non-native language (i.e., Swahili, or a colonial language instead of their local African language), which would penalize them when their reading scores are compared with those of children in economically developed nations, where most children are tested in their native language.
There are three versions of the Raven’s: the Colored Progressive Matrices, Progressive Matrices, and Advanced Matrices (listed in ascending order of difficulty).
Countries with a low QNW + SAS IQ (≤ 75) based solely on matrix test data are Benin, the Republic of the Congo, Djibouti, Dominica, Eritrea, Ethiopia, The Gambia, Guatemala, Malawi, Mali, Morocco, Namibia, Nepal, Saint Vincent and the Grenadines, Sierra Leone, Somalia, South Sudan, Syria, Tanzania, Yemen, and Zimbabwe.
This is why I have preferred to use the QNW + SAS IQs whenever possible in this article. QNW + SAS IQs are based on the most data and do not include countries with geographically imputed mean IQs.
That is, unless one does not believe that educational performance, life outcomes, health and disease, economic prosperity, and strong civic institutions are important.