Abstract: Spearman’s g is the name for the shared variance across a set of intercorrelating cognitive tasks. For some—but not all—theorists, g is defined as general intelligence. While g is robustly observed in Western populations, it is questionable whether g is manifested in cognitive data from other cultural groups. To test whether g is a cross-cultural phenomenon, we searched for correlation matrices or data files containing cognitive variables collected from individuals in non-Western, nonindustrialized nations. We subjected these data to exploratory factor analysis (EFA) using promax rotation and 2 modern methods of selecting the number of factors. Samples that produced more than 1 factor were then subjected to a second-order EFA using the same procedures and a Schmid-Leiman solution. Across 97 samples from 31 countries totaling 52,340 individuals, we found that a single factor emerged unambiguously from 71 samples (73.2%) and that 23 of the remaining 26 samples (88.5%) produced a single second-order factor. The first factor in the initial EFA explained an average of 45.9% of observed variable variance (SD = 12.9%), which is similar to what is seen in Western samples. One sample that produced multiple second-order factors only did so with 1 method of selecting the number of factors in the initial EFA; the alternate method of selecting the number of factors produced a single higher-order factor. Factor extraction in a higher-order EFA was not possible in 2 samples. These results show that g appears in many cultures and is likely a universal phenomenon in humans.
Public Significance Statement: This study shows that one conceptualization of intelligence—called Spearman’s g—is present in over 90 samples from 31 non-Western, nonindustrialized nations. This means that intelligence is likely a universal trait in humans. Therefore, it is theoretically possible to conduct cross-cultural research on intelligence, though culturally appropriate tests are necessary for any such research.
KEYWORDS: Spearman’s g, cross-cultural psychology, general cognitive ability, human intelligence, factor analysis
General Discussion of Results
We conducted this study to create a strong test of the theory that general cognitive ability is a cross-cultural trait by searching for g in human populations where g would be least likely to be present or would be weakest. The results are remarkably similar to those of EFA studies of Western samples, which show that g accounts for approximately half of the variance among a set of cognitive variables (e.g., Canivez & Watkins, 2010). In our study, the first extracted factor in 97 EFAs of data sets from non-Western, nonindustrialized countries explained an average of 45.9% of observed variable variance.
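In a factor solution, the "variance explained by the first factor" is the sum of that factor's squared loadings divided by the number of observed variables. The following sketch assumes iterated principal axis factoring as the extraction method (this section does not say which extraction the authors used) and a hypothetical 10-test battery in which every test loads .7 on one factor, so the expected answer is .7² = .49.

```python
import numpy as np


def first_factor_variance(R, n_iter=500, tol=1e-10):
    """Share of total variance explained by the first principal-axis factor.

    Iterated principal axis factoring: communalities are placed on the
    diagonal of R, the largest eigenpair is extracted, and the communalities
    are re-estimated from the resulting loadings until they stop changing.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    # Starting communalities: squared multiple correlations (SMCs).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    # Sum of squared loadings divided by the number of observed variables.
    return float(np.sum(loadings ** 2) / p)


# Hypothetical battery: 10 tests, each loading .7 on a single factor.
lam = np.full(10, 0.7)
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
print(round(first_factor_variance(R), 3))  # 0.49
```

A principal components extraction of the same matrix would report a larger share (its first eigenvalue, 5.41, divided by 10), because components absorb unique variance that a common factor model leaves out.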
Moreover, 73.2% of the data sets unambiguously produced a single factor, regardless of the method used to select the number of factors in the EFA. Of the remaining data sets, almost every one in which a second-order EFA was possible produced a single general factor. The only exceptions came from Grigorenko, Ngorosho, Jukes, and Bundy (2006) and Gurven et al. (2017). The Grigorenko et al. (2006) dataset produced two general factors only if one regards the modified Guttman method of selecting the number of first-order factors as yielding a more realistic solution than MAP. Given the modified Guttman rule's penchant for overfactoring and the generally accurate results from MAP in simulation studies (Warne & Larsen, 2014), it seems more likely that even the Grigorenko et al. (2006) dataset has two first-order factors and one general factor that accounts for 49.9% of extracted variance. The two Gurven et al. (2017) samples each produced two factors in an initial EFA, but factor extraction failed in the second-order EFA for both samples. Because we could not test whether the two initial factors form a general factor, the Gurven et al. (2017) data are ambiguous with regard to the presence of g in their Bolivian samples.
Although we did not preregister any exact predictions for our study, we are astonished at the uniformity of these results. Before this study began, we expected that many samples would produce g, but also that enough samples would fail to produce g for us to conduct a post hoc exploratory analysis of why some samples were more likely to produce g than others. With only three samples that did not clearly produce g, we could not carry out these planned exploratory analyses because g appeared too consistently in the data.
Thus, Spearman’s g appeared in at least 94 of the 97 data sets (96.9%) from 31 countries that we investigated, and the remaining three samples produced ambiguous results. Because these data sets originated in cultures and countries where g would be least likely to appear if it were a cultural artifact, we conclude that general cognitive ability is likely a universal human trait. The characteristics of the original studies that reported these data support this conclusion. For example, some of these data sets were collected by individuals who are skeptical of the existence or primacy of g in general or in non-Western cultures (e.g., Hashmi et al., 2010; Hashmi, Tirmizi, Shah, & Khan, 2011; O’Donnell et al., 2012; Pitchford & Outhwaite, 2016; Stemler et al., 2009; Sternberg et al., 2001, 2002). One would think that these investigators would be the most likely to include variables in their data sets that would form an additional factor. Yet, with only three ambiguous exceptions (Grigorenko et al., 2006; Gurven et al., 2017), these researchers’ data still produced g. Additionally, many of these data sets were collected with no intention of searching for g (e.g., Bangirana et al., 2015; Berry et al., 1986; Engle, Klein, Kagan, & Yarbrough, 1977; Kagan et al., 1979; McCoy, Zuilkowski, Yoshikawa, & Fink, 2017; Mourgues et al., 2016; Ord, 1970; Rehna & Hanif, 2017; Reyes et al., 2010; Tan, Reich, Hart, Thuma, & Grigorenko, 2014). Nevertheless, a general factor emerged. It is important to recognize, though, that the g factor explained more observed variable variance in some samples than in others.
For those who wish to equate g with a Western view of “intelligence,” this study presents several problems for the argument that Western views of intelligence are too narrow. First, in our search, we discovered many examples of non-Western psychologists using Western intelligence tests with little adaptation and without expressing concern about the tests’ overly narrow measurement techniques. Theorists who argue that the Western perspective of intelligence is too culturally narrow must explain why these authors use Western (or Western-style) intelligence tests and why these tests have found widespread acceptance in the countries we investigated (Oakland, Douglas, & Kane, 2016). Another difficulty for the argument that Western views of intelligence are too narrow is the fact that tests developed in these nonindustrialized, non-Western cultures positively correlate with Western intelligence tests (Mahmood, 2013; van den Briel et al., 2000). This implies that these indigenous instruments are also g-loaded to some extent, which would support Spearman’s (1927) belief in the indifference of the indicator.
One final issue bears mention. Two peer reviewers raised the possibility that developmental differences across age groups could be a confounding variable because a g factor may be weaker in children than in adults. To investigate this possibility, we conducted two post hoc, nonpreregistered analyses. First, the correlation between the age of the sample (either its mean or the midpoint of its age range) and the variance explained by the first factor in the dataset was r = .127 (r² = .016, n = 84, p = .256). Because a more discrete developmental change in the presence or strength of a g factor was plausible, we also divided the data sets into five age groups: <7 years (10 samples), 7–12.99 years (34 samples), 13–17.99 years (12 samples), 18–40.99 years (21 samples), and ≥41 years (five samples). All of these age groups had a mean first factor of similar strength (between 41.79% and 49.63%), and the null hypothesis that all age groups had equal means could not be rejected (p = .654, η² = .031). These analyses indicate that there was no statistically detectable relationship between sample age and the strength of the g factor in a dataset.
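Both post hoc analyses rest on standard formulas. As a minimal sketch in plain Python (the function names are illustrative, not taken from the original study), the following converts a Pearson r into its t statistic, computes η² for a one-way ANOVA, and recovers F from η². The inputs in the last lines are the values reported in the paragraph above; turning t or F into a p value would additionally require a t or F distribution function, which is not shown.

```python
import math


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing H0: rho = 0 from a Pearson r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def eta_squared(groups):
    """One-way ANOVA effect size: between-group SS divided by total SS."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


def f_from_eta_squared(eta2, k, n):
    """Recover the one-way ANOVA F(k - 1, n - k) from eta squared."""
    return (eta2 / (k - 1)) / ((1 - eta2) / (n - k))


print(round(0.127 ** 2, 3))                        # 0.016, the reported r squared
print(round(r_to_t(0.127, 84), 2))                 # 1.16 on 82 df
print(round(f_from_eta_squared(0.031, 5, 82), 2))  # 0.62 on (4, 77) df
```

The five age groups listed sum to 82 samples, so the F statistic uses k = 5 and N = 82; an F below 1 is in line with the nonsignificant result reported.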
Methodological Discussion
A skeptic of g could postulate that our results are a statistical artifact of the decisions we used to conduct a factor analysis. Some data sets in our study had been subjected to EFA in the past, and the results often differed from ours (Attallah et al., 2014; Bulatao & Reyes-Juan, 1968; Church, Katigbak, & Almario-Velazco, 1985; Conant et al., 1999; Dasen, 1984; Dawson, 1967b; Elwan, 1996; Guthrie, 1963; Humble & Dixon, 2017; Irvine, 1964; Kearney, 1966; Lean & Clements, 1981; McFie, 1961; Miezah, 2015; Orbell, 1981; Rasheed et al., 2017; Ruffieux et al., 2010; Sen, Jensen, Sen, & Arora, 1983; Sukhatunga et al., 2002; van den Briel et al., 2000; Warburton, 1951). In response, we wish to emphasize that we chose procedures a priori that are modern methods accepted among experts in factor analysis (e.g., Fabrigar et al., 1999; Larsen & Warne, 2010; Thompson, 2004; Warne & Larsen, 2014). The use of promax rotation, for example, might be seen as an attempt to favor correlated first-order factors—which are mathematically much more likely to produce a second-order g than orthogonal factors. However, promax rotation does not force factors to be correlated, and indeed uncorrelated factors are possible after a promax rotation. Therefore, the use of promax rotation permitted a variety of potential factor solutions—including uncorrelated factors—and permitted the strong test of g theory that we desired.
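To make the point about promax concrete, here is a sketch of the usual varimax-then-target construction of promax (power m = 4, a common default), written with NumPy. It is an illustration under stated assumptions, not the authors' software. Two hypothetical loading matrices for six tests are rotated: one generated from uncorrelated factors and one from factors that correlate .5. Promax returns a factor correlation of essentially zero for the first and a clearly nonzero one for the second, so the rotation permits correlated factors without forcing them.

```python
import numpy as np


def varimax(L, eps=1e-5, max_iter=1000):
    """Kaiser-normalized varimax; returns rotated loadings and rotation matrix."""
    L = np.asarray(L, dtype=float)
    h = np.sqrt(np.sum(L ** 2, axis=1, keepdims=True))  # row norms
    X = L / h
    p, k = X.shape
    T = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Z = X @ T
        B = X.T @ (Z ** 3 - Z @ np.diag(np.sum(Z ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(B)
        T = u @ vt
        d_old, d = d, np.sum(s)
        if d < d_old * (1 + eps):  # criterion stopped improving
            break
    return (X @ T) * h, T


def promax(L, m=4):
    """Varimax, then a least-squares oblique fit to a powered target.

    Returns the pattern matrix and the factor correlation matrix (Phi).
    """
    A, _ = varimax(L)
    target = A * np.abs(A) ** (m - 1)  # shrink small loadings toward zero
    U, *_ = np.linalg.lstsq(A, target, rcond=None)
    # Rescale columns so the rotated factors have unit variance.
    U = U @ np.diag(np.sqrt(np.diag(np.linalg.inv(U.T @ U))))
    pattern = A @ U
    phi = np.linalg.inv(U.T @ U)
    return pattern, phi


# Hypothetical 6-test battery: two clusters of three tests each.
P = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])

# Case 1: the two underlying factors are uncorrelated.
_, phi_orth = promax(P)

# Case 2: factors correlate .5; L = P @ C reproduces P Phi P' because C C' = Phi.
C = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 1.0]]))
_, phi_obl = promax(P @ C)

print(round(abs(phi_orth[0, 1]), 2))  # 0.0: promax leaves these factors uncorrelated
print(round(abs(phi_obl[0, 1]), 2))   # clearly nonzero, close to .5
```

The absolute value is printed because the sign of a rotated factor is arbitrary.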
Another potential source of criticism is our method of selecting the number of factors to retain in a dataset. The original Guttman (1954) rule of retaining all factors with an eigenvalue of 1.0 or greater is the most common method used in the social sciences, probably because it is the default method in many popular statistical analysis packages (Fabrigar et al., 1999). However, the method can greatly overfactor, especially when a dataset has a large number of variables, when the sample size is large, and when factor loadings are weak (Warne & Larsen, 2014). These circumstances are common in cognitive data sets, which are frequently plagued by overfactoring (Frazier & Youngstrom, 2007). This is why we chose more conservative and accurate methods of selecting the number of factors (Warne & Larsen, 2014). The use of MAP is especially justified by its strong performance in simulation studies and because it rarely overfactors. MAP is insensitive to sample size, the correlation among observed variables, factor loading strength, and the number of observed variables (Warne & Larsen, 2014), all of which varied greatly among the 97 analyzable data sets.
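A minimal NumPy sketch of two of the rules discussed here: the original Guttman rule and Velicer's MAP test in its original squared-partial-correlation form (a later revision averages fourth powers instead). The modified Guttman rule is not shown, and the authors' exact implementation is not specified in this section. The data are simulated and hypothetical: 10 tests, two uncorrelated factors, 2,000 cases.

```python
import numpy as np


def guttman_count(R):
    """Original Guttman rule: count eigenvalues of R that are >= 1."""
    return int(np.sum(np.linalg.eigvalsh(R) >= 1.0))


def velicer_map(R):
    """Velicer's minimum average partial (MAP) test, squared-partial version.

    For m = 0, 1, ... principal components, partial the first m components
    out of R and average the squared off-diagonal partial correlations.
    The m with the smallest average is the suggested number of factors.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # largest component first
    loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))
    avg_sq = [(np.sum(R ** 2) - p) / (p * (p - 1))]  # m = 0: raw correlations
    # m = p - 1 is skipped: the residual is then rank one, so every partial
    # correlation is +/-1 and that step can never be the minimum.
    for m in range(1, p - 1):
        A = loadings[:, :m]
        partial_cov = R - A @ A.T
        d = 1.0 / np.sqrt(np.diag(partial_cov))
        partial_r = partial_cov * np.outer(d, d)
        avg_sq.append((np.sum(partial_r ** 2) - p) / (p * (p - 1)))
    return int(np.argmin(avg_sq))


# Hypothetical data: tests 1-5 load .8 on factor 1, tests 6-10 load .6 on factor 2.
rng = np.random.default_rng(2019)
n = 2000
P = np.zeros((10, 2))
P[:5, 0] = 0.8
P[5:, 1] = 0.6
F = rng.standard_normal((n, 2))
E = rng.standard_normal((n, 10))
X = F @ P.T + E * np.sqrt(1.0 - np.sum(P ** 2, axis=1))
R = np.corrcoef(X, rowvar=False)
print(guttman_count(R), velicer_map(R))  # 2 2
```

In this clean design both rules agree; the cases the paragraph above warns about (many variables, weak loadings) are where the eigenvalue-greater-than-1 count tends to climb while MAP does not.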
Indeed, it is because of our use of modern methods of factor selection and rotation that we believe prior researchers never noticed g as a ubiquitous property of cognitive data in non-Western groups. Many prior researchers used varimax rotation and the original Guttman rule, likely because these methods were mathematically and computationally easier in the days before inexpensive personal computers, or because both are the default methods in popular statistics packages today. (Additionally, the older data sets predate the invention of promax rotation and/or MAP.) But both of these methods obscure the presence of g: varimax forces factors to be uncorrelated, so the variance that g shares across tests is spread among the first-order factors instead of appearing as correlations among them, and the Guttman rule fragments that variance further into many small factors. As an extreme example, Guthrie’s (1963) data consist of 50 observed variables (the most of any dataset in our study) that produced 22 factors when he subjected them to these procedures. Some of Guthrie’s (1963) factors were weak, uninterpretable, or defined by just one or two variables. In our analyses, we found five (using MAP) or 10 (using the modified Guttman rule) first-order factors; when subjected to the second-order EFA, the data clearly produced a single factor with an obvious interpretation: g.
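The second-order step and the Schmid-Leiman solution mentioned in the abstract can be sketched as follows for the case of one second-order factor. Each test's loading on g is its first-order pattern loading multiplied by that factor's loading on g, and each first-order loading is residualized by √(1 − λ²). The numbers are hypothetical (nine tests, three first-order factors), not Guthrie's data, and the second-order loadings are simply assumed rather than estimated.

```python
import numpy as np


def schmid_leiman(first_order, second_order):
    """Schmid-Leiman transformation for a single second-order factor.

    first_order:  p x k pattern matrix of tests on first-order factors.
    second_order: length-k loadings of the first-order factors on g.
    Returns (g_loadings, residualized group-factor loadings).
    """
    P1 = np.asarray(first_order, dtype=float)
    lam = np.asarray(second_order, dtype=float)
    g = P1 @ lam                           # each test's direct loading on g
    groups = P1 * np.sqrt(1.0 - lam ** 2)  # scale column j by sqrt(1 - lam_j^2)
    return g, groups


# Hypothetical solution: 9 tests in simple structure on 3 first-order factors.
P1 = np.kron(np.eye(3), np.full((3, 1), 0.7))  # 9 x 3, loadings of .7
lam = np.array([0.9, 0.8, 0.7])                # assumed loadings on g
g, groups = schmid_leiman(P1, lam)
print(np.round(g[[0, 3, 6]], 2))                      # g loadings 0.63, 0.56, 0.49
print(round(float(np.sum(g ** 2) / P1.shape[0]), 3))  # 0.317 of total variance from g
```

With simple structure, each test's communality is preserved (for the first test, 0.63² + 0.305² ≈ 0.49 = 0.7²), which is what allows g's share of variance to be read directly off the transformed loadings.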
The results of this study are highly unlikely to be a measurement artifact because the original researchers used a wide variety of instruments to measure cognitive skills in examinees. While some of these instruments were adaptations of Western intelligence tests (e.g., Abdelhamid, Gómez-Benito, Abdeltawwab, Bakr, & Kazem, 2017), some samples included variables that were based on Piagetian tasks (e.g., Dasen, 1984; Kagan et al., 1979; Orbell, 1981). Other samples included variables that were created specifically for the examinees’ culture (e.g., Mahmood, 2013; Stemler et al., 2009; Sternberg et al., 2001; van den Briel et al., 2000) or tasks that did not resemble Western intelligence test subtests (Bangirana et al., 2015; Berry et al., 1986; Gauvain & Munroe, 2009). There were also several samples that included measures of academic achievement in their data sets (e.g., Bulatao & Reyes-Juan, 1968; Guthrie, 1963; Irvine, 1964). The fact that g emerged from such a diverse array of measurements supports Spearman’s (1927) belief in the “indifference of the indicator” and shows that any cognitive task will correlate with g to some degree.
Other readers may object to our use of EFA at all, arguing that a truly strong test of g theory would be a confirmatory factor analysis (CFA) model in which all scores load onto a general factor. However, we considered and rejected this approach because CFA only tests the model(s) at hand and cannot generate new models from a dataset (Thompson, 2004). In this study, EFA procedures did not “know” that we were adherents to g theory when producing the results. Rather, “EFA methods . . . are designed to ‘let the data speak for themselves,’ that is, to let the structure of the data suggest the most probable factor-analytic model” (Carroll, 1993, p. 82). Thus, if a multifactor model of cognitive abilities were more probable in a dataset than a single g factor, then EFA would be more likely to identify it than a CFA would. Because EFA was more likely than CFA to produce a model that disproved g, the consistency with which these EFAs produced g is stronger evidence for g than a set of CFAs would have provided. CFA is also problematic in requiring the analyst to generate a plausible statistical model, a fact that Carroll (1993, p. 82) recognized when he wrote:
It might be argued that I should have used CFA. . . . But in view of wide variability in the quality of the analyses applied in published studies, I could not be certain about what kind of hypotheses ought to be tested on this basis. (Carroll, 1993, p. 82)
We agree with Carroll on this point. CFA also requires exactly specifying the appropriate model(s) to be tested. While this is a positive aspect of CFA in most situations, it was a distinct disadvantage when we were merely trying to establish whether g was present in a dataset that may not have been collected for that purpose. This is because most authors did not report a plausible theoretical model for the structure of their observed variables, and there was often insufficient information for us to create our own plausible non-g models that could be compared with a theory of the existence of Spearman’s g in the data.³ Indeed, some researchers did not collect their data with any model of intelligence in mind at all (e.g., McCoy et al., 2017). By having EFA generate a model for us, we allowed plausible competing models to emerge from each dataset and examined them afterward to see whether they supported our theory of the existence of Spearman’s g in non-Western cultures. Another problem with CFA’s requirement of prespecified models is that some theories of cognitive abilities include g as part of a larger theoretical structure of human cognition (e.g., Canivez, 2016; Carroll, 1993). How the non-g parts of such a model might relate to g and to the observed variables is rarely clear.
Another advantage of EFA over CFA is that the former uses data to generate a new model atheoretically. The subjective decisions in an EFA (e.g., factor rotation method, second-order procedures, standards used to judge the number of factors) are easily preregistered, whereas the subjective decisions in a CFA (e.g., when to use modification indices, how to arrange variables into factors, the number of non-g factors to include in a model) often cannot be realistically preregistered, or even anticipated before knowing which variables were collected, in a secondary data analysis if the data were not collected in a theoretically coherent fashion (as was often the case for our data sets). By preregistering the subjective decisions in our EFAs, we ensured that those decisions could not bias our results toward our preferred view of cognitive abilities.
Finally, we want to remind readers that our dataset search and analysis procedures were preregistered and time stamped at the very beginning of the study before we engaged in any search procedures or analyses. This greatly reduces the chance for us to reverse engineer our methods to ensure that they would produce the results we wanted to obtain. Still, deviations from our preregistration occurred. When we deviated from the preregistration protocol, we stated so explicitly in this article, along with our justification for the deviation. Additionally, some unforeseen circumstances presented themselves as we conducted this study. When these circumstances required subjective decisions after we had found the data, we erred on the side of decisions that would maximize the chances that the study would be a strong test of g theory. Again, we have been transparent about all of these unforeseen circumstances and the decisions we made in response to them.
Full paper, maps, references, etc., at the link at the beginning.