Wednesday, August 10, 2022

No structural brain differences as a function of the Big Five personality traits from a systematic review and meta-analysis

"Nothing to see here": No structural brain differences as a function of the Big Five personality traits from a systematic review and meta-analysis. Yen-Wen Chen, Turhan Canli. Personality Neuroscience, Volume 5, Aug 9 2022. https://www.cambridge.org/core/journals/personality-neuroscience/article/nothing-to-see-here-no-structural-brain-differences-as-a-function-of-the-big-five-personality-traits-from-a-systematic-review-and-metaanalysis/BD74C86346A7C3B65E255FA9F1C6D797


Abstract: Personality reflects social, affective, and cognitive predispositions that emerge from genetic and environmental influences. Contemporary personality theories conceptualize a Big Five Model of personality based on the traits of neuroticism, extraversion, agreeableness, conscientiousness, and openness to experience. Starting around the turn of the millennium, neuroimaging studies began to investigate functional and structural brain features associated with these traits. Here, we present the first study to systematically evaluate the entire published literature of the association between the Big Five traits and three different measures of brain structure. Qualitative results were highly heterogeneous, and a quantitative meta-analysis did not produce any replicable results. The present study provides a comprehensive evaluation of the literature and its limitations, including sample heterogeneity, Big Five personality instruments, structural image data acquisition, processing, and analytic strategies, and the heterogeneous nature of personality and brain structures. We propose to rethink the biological basis of personality traits and identify ways in which the field of personality neuroscience can be strengthened in its methodological rigor and replicability.

 

3. Discussion

MRI studies have come under criticism for reporting under-powered and non-replicable findings (Button et al., 2013; Szucs & Ioannidis, 2017). Here, we used a systematic review and a meta-analysis approach to discern which reported associations between personality and brain measures, if any, would replicate. Surprisingly, we found no evidence for robust associations between any of the Big Five traits and brain structural indices (i.e., GMV, CT, and SA). Although we observed some consistent results in the qualitative systematic review, these findings failed confirmation when we used a quantitative meta-analytic approach.

3.1. Comparison with previous systematic review and/or meta-analysis studies

To our knowledge, only three studies used a systematic review and/or quantitative meta-analytic approach to evaluate the replicability of associations between personality traits and brain structural indices. One meta-analysis (Mincic, 2015) examined the association between GMV and a broad composite meta-trait named "negative emotionality" and included studies that measured one of the following traits: behavioral inhibition, harm avoidance, trait anxiety, or neuroticism.

Two other studies restricted their analyses to single Big Five traits. One review by Montag and colleagues (2013) focused on GMV and neuroticism. These investigators reported heterogeneous findings across studies but noted consistent negative associations between neuroticism and prefrontal regions, including the SFG, MFG, and OFC. This observation is consistent with our systematic review. However, Montag and colleagues did not subject their reviewed studies to a quantitative meta-analysis to determine the robustness of this observation, and our meta-analysis failed to confirm the association.

The second study, by Lai et al. (2019), examined the association between GMV and extraversion using both a systematic review and a meta-analytic approach. Based on their quantitative meta-analysis, these investigators reported positive associations in the medOF and PRC, and negative associations in the PHC, SMG, ANG, and MFG. These results contradict our null meta-analysis result for extraversion. The discrepancies might derive from two sources: differences in the studies included across the two meta-analyses, and differences in the meta-analytic software used. Regarding the included studies, first, Lai et al. (2019) only included VBM studies, whereas the present study included both VBM (n = 14) and SBM (n = 4) studies; thus, four SBM studies were not included in Lai et al. (2019). Second, Grodin and White (2015) was excluded from the present study because of the personality instrument it used: two subscales (Social Potency and Social Closeness) from the Multidimensional Personality Questionnaire Brief Form as proxies for extraversion. We determined that this instrument does not align with the conceptual structure of global extraversion in the Big Five and therefore excluded the study. Third, two studies that used different image processing, DeYoung et al. (2010) (which did not perform segmentation in preprocessing) and Yasuno et al. (2017) (which used a T1w/T2w ratio signal), were not included in Lai et al. (2019). In addition, Nostro et al. (2017), which was included in Lai et al. (2019), was excluded from the present study because its sample overlapped non-independently with another, larger study (Owens et al., 2019). Regarding software, Lai et al. (2019) used the previous version of SDM, Anisotropy Effect-Size Seed-based d Mapping (AES-SDM) (Radua et al., 2012, 2014), whereas the present study used the latest version, SDM-PSI. The major improvement of SDM-PSI is the implementation of multiple imputation of study images, which avoids the bias of a single imputation and yields a less biased estimate of the population effect size; it is considered more robust than AES-SDM (Albajes-Eizagirre, Solanes, Vieta, & Radua, 2019).

3.2. Possible explanations of heterogeneous findings

Several possible explanations may account for discrepancies across the studies, including, but not limited to, (1) sample heterogeneity, (2) Big Five personality instruments, (3) structural image data acquisition, processing, and analytic strategies, (4) statistical approach and statistical significance threshold, and (5) the heterogeneous nature of personality and brain structure. The following sections discuss these factors in greater detail.

3.2.1. Sample heterogeneity

Sample characteristics that potentially contribute to highly heterogeneous results across the literature include mean age and age range, sex, and the inclusion of patient cohorts and different levels of personality traits across samples.

3.2.1.1. Age

From the systematic review, age correlated negatively with neuroticism, extraversion, and openness, but positively with agreeableness (note that not all studies that examined associations between age and personality traits reported significant associations, as shown in Table S18). These qualitative observations are consistent with previous population-based cross-sectional and mean-level studies (Allemand et al., 2007; Donnellan & Lucas, 2008). We examined whether age could account for the heterogeneous meta-analysis results using meta-regression, but we did not observe a significant age effect on the meta-analysis for any of the five traits. However, the lack of a significant effect may be due, in part, to the narrow age range across the samples, which mainly consisted of adults aged 18–40 (Figure 4); this may also hinder the generalizability of the results. Only one study (Nickson et al., 2016) examined longitudinal changes in personality traits, and two studies (Nickson et al., 2016; Taki et al., 2013) examined longitudinal changes in GMV. Nickson et al. (2016) observed no association between changes in AMY GMV and changes in neuroticism and extraversion over an average two-year interval in a mixed sample of patients with major depressive disorder and healthy controls. However, considering the small-to-moderate sample size and heterogeneous composition of that sample, future research with longitudinal designs is required to explore the causal relationships among age, personality traits, and brain structure. Furthermore, none of the included studies considered non-linear associations between age and personality traits, despite evidence in support of such a relation (Donnellan & Lucas, 2008; Terracciano et al., 2005). For example, a curvilinear association has been reported between age and conscientiousness, such that the highest scores were observed in middle adulthood (Donnellan & Lucas, 2008). Future research that considers the non-linear relation between age and personality traits is required to delineate it.

Figure 4. Sample Mean Age and Age Range Distribution of Studies Included in the Systematic Review and Meta-analysis across Big Five Personality Traits and Three Brain Indices. Note. Studies (y-axis) are ordered by their mean age (dot). Studies are labeled "(hc)", "(pt)", or "(hc/pt)" to indicate whether results were reported for healthy groups, patient groups, or combined healthy and patient groups. Not all studies provided mean age or age range; data from those studies are presented incompletely or not at all. The two red dashed vertical lines indicate ages 18 and 65.
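To make the meta-regression step described in the preceding paragraph concrete, here is a minimal Python sketch of how a study-level moderator such as sample mean age can be tested in a random-effects framework. All values (effect sizes, sampling variances, ages, and the between-studies variance tau²) are invented for illustration; the paper itself used SDM-PSI, not this hand-rolled model.

```python
import numpy as np

# Hypothetical per-study inputs (illustrative only).
y = np.array([0.10, -0.05, 0.02, 0.15, -0.08, 0.04])      # study effect sizes
v = np.array([0.020, 0.015, 0.030, 0.010, 0.025, 0.018])  # sampling variances
age = np.array([22.0, 25.5, 31.0, 38.2, 45.0, 52.3])      # sample mean ages
tau2 = 0.005  # between-studies variance, assumed estimated beforehand

# Random-effects weights and a design matrix with a centered moderator.
W = np.diag(1.0 / (v + tau2))
X = np.column_stack([np.ones_like(age), age - age.mean()])

# Weighted least squares: beta = (X'WX)^-1 X'Wy; SEs from the same inverse.
XtWX_inv = np.linalg.inv(X.T @ W @ X)
beta = XtWX_inv @ X.T @ W @ y
se = np.sqrt(np.diag(XtWX_inv))
z = beta / se  # Wald z; |z| > 1.96 would suggest a significant age effect

print(f"age slope = {beta[1]:.4f}, SE = {se[1]:.4f}, z = {z[1]:.2f}")
```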

3.2.1.2. Sex

From our systematic review, females reported higher levels on all five traits, with the exception of extraversion in Omura et al. (2005) and openness in Gray et al. (2018) (note that not all studies that examined sex differences in personality traits reported significant differences, as shown in Table S19); this observation is consistent with a previous population-based cross-sectional study (Soto et al., 2011). Interestingly, the mean ages of the participants were younger in Omura et al. (2005) and Gray et al. (2018) than in the other studies, which reported higher levels of extraversion and openness in females. This observation is also partially consistent with Soto et al. (2011), which showed that, on average, sex differences in personality traits vary across ages. Among all studies included in the systematic review (across five traits and three brain indices), 14 examined sex-dependent associations between the Big Five and brain structure, using two main approaches: conducting trait–brain analyses separately for females and males, and testing sex-by-trait interactions. As summarized in the previous section, the results are inconsistent across those studies: some studies reported trait–brain associations only in females (Hu et al., 2011), only in males (Hu et al., 2011; Montag, Eichner, et al., 2013; Nostro et al., 2017), or in neither group (Knutson et al., 2001; Liu et al., 2013; Wright et al., 2006, 2007), and interaction analyses produced similarly conflicting results (Blankstein et al., 2009; Cremers et al., 2011; Sweeney et al., 2019 versus Bjørnebekk et al., 2013; J. C. Gray et al., 2018; Lewis et al., 2018; Wang et al., 2019). We then examined whether sex (the proportion of females) could account for the heterogeneous meta-analysis results using meta-regression, but again we did not observe significant effects.
Potential explanations for sex differences include sex-related hormonal variability (De Vries, 2004), early biological and social developmental trajectories (Blankstein et al., 2009; Goldstein et al., 2001), and differences in social processing in response to the environment (Wager et al., 2003). Given the mixed results in the literature, future research should take sex differences into account when examining the association between personality traits and brain structure.

3.2.1.3. Inclusion of patient cohorts and different levels of personality traits across samples

Among the systematic review studies (across five traits and three brain indices), 14 included patient cohorts, as summarized in Table S17. Most studies reported a higher mean level of neuroticism and lower mean levels of extraversion and conscientiousness in patient cohorts compared to healthy individuals (note that not all studies reported group differences in these three traits, as shown in Table S17), whereas no mean-level differences were reported for agreeableness and openness. Among those 14 patient cohort studies, three reported group differences in trait–brain associations, noting opposite associations between patients and healthy participants. It is possible that different levels of personality traits between patient and healthy groups contribute to the conflicting trait–brain associations. For example, a higher mean level of neuroticism was observed among patients with alcohol use disorder compared to healthy participants (Zhao et al., 2017); however, no group mean-level difference was observed in Nair et al. (2016) or Moayedi et al. (2011). Alternative explanations include symptoms associated with the given medical or psychiatric conditions and brain structural differences underlying those conditions. To remove the potential effect of patient cohorts from the meta-analysis, we conducted a sub-group meta-analysis excluding patient cohort studies, and the result for neuroticism (the only trait with a patient cohort study in the meta-analysis) remained unchanged.

Considering the potential influence of trait levels across non-patient studies, we compared the mean personality scores among systematic review studies that reported contradictory associations. For example, two included studies reported associations in opposite directions between openness and PCC GMV, with a higher mean level of openness reported in Yasuno et al. (2017) than in Kitamura et al. (2016). On the other hand, two studies that reported associations in opposite directions between extraversion and MFG GMV reported comparable mean-level extraversion scores (Blankstein et al., 2009; Coutinho et al., 2013). However, given differences in personality instruments and scoring methods (e.g., some studies reported raw scores, whereas others reported T scores) and the lack of reported personality scores in some studies, it is difficult to determine whether trait levels play a role in the conflicting results we observed across the included studies.

3.2.2. Heterogeneity in Big Five trait instruments

The use of different personality trait instruments might contribute to discrepancies across the included studies. The most commonly used instruments were the Neuroticism, Extraversion, Openness Personality Inventory – Revised (NEO-PI-R) and the NEO Five-Factor Inventory (NEO-FFI, the short version of the NEO-PI-R). Other instruments included the International Personality Item Pool (IPIP), Eysenck Personality Questionnaire (EPQ), Big Five Inventory (BFI), Big Five Structure Inventory (BFSI), Big Five Aspects Scale (BFAS), and 16 Personality Factor test (16 PF). Although different instruments generally show high correlations, some trait scales show only low-to-moderate correlations (Gow et al., 2005). We examined whether the use of different instruments (NEO versus non-NEO) could account for the heterogeneous meta-analysis results using a sub-group analysis restricted to studies using the NEO (either NEO-PI-R or NEO-FFI), but we did not observe significant results for any of the five traits. In addition, all the instruments listed are self-report questionnaires. Studies have suggested combining observation- and interview-based informant reports (Connolly et al., 2007; Hyatt et al., 2019) or using physiological responses (Taib et al., 2020) to better capture the complex construct of personality and avoid the biases of self-report.

3.2.3. Heterogeneity in structural image data acquisition, processing, and analytic strategies

Heterogeneity in structural image data acquisition, (pre)processing, and analytic strategies may also have contributed to discrepancies across studies.

3.2.3.1. Structural image data acquisition and processing

The use of different MRI scanners, scanner magnetic field strengths, voxel sizes, and smoothing kernels could result in differences in image spatial resolution and signal-to-noise ratio. In addition, the use of VBM versus SBM processing methods might lead to inconsistent results. Although Kunz et al. (2017) reported highly correlated total GMV results between VBM and SBM processing methods, none of the included studies directly compared VBM and SBM for regional structural results. Moreover, the small number of studies we could include in the meta-regression (18 VBM and 5 SBM studies across five traits) limits any strong conclusions.

3.2.3.2. Structural image data analytic approaches

Different levels of structural analysis, whole-brain versus region-of-interest (ROI), could also contribute to the heterogeneous results. Note that only studies using whole-brain voxel-/vertex-wise analysis with the same threshold across the whole brain were included in the meta-analysis, to avoid bias from regions with a more liberal threshold (i.e., a priori ROI analyses) (Albajes-Eizagirre, Solanes, Fullana, et al., 2019; Q. Li et al., 2020); this makes a direct comparison between whole-brain and ROI studies difficult. Li et al. (2017) directly compared whole-brain vertex-wise and whole-brain parcellation-based analyses and reported inconsistent results (i.e., different traits were associated with different brain indices in different regions) in the same group of participants. The authors suggested that the two approaches offer different advantages: the vertex-wise approach could give more accurate localization, whereas the parcellation-based approach could achieve higher test-retest reliability across populations and studies. Furthermore, the selection of the atlas used to label brain regions for given peak coordinates (for whole-brain voxel-/vertex-wise studies) or to extract mean values for pre-defined ROIs (for ROI studies) adds another layer of heterogeneity, as the same voxel/vertex coordinates may be labeled differently across atlases. The atlases used by the included studies can be found in Supplementary Tables S2–S16. Future research should apply complementary approaches, such as both voxel-/vertex-based and parcellation-based analyses, to evaluate the reliability of the results.

The present study was limited to studies examining brain structure using T1-weighted structural MRI. Alternatively, brain structure can be measured with diffusion MRI. By measuring the diffusivity of water molecules, diffusion imaging provides an indirect measure of white matter fiber structure (Mori & Zhang, 2006), and it has been used in personality neuroscience (e.g., Avinun et al., 2020; Bjørnebekk et al., 2013; Privado et al., 2017; Ueda et al., 2018; Xu & Potenza, 2012). Furthermore, beyond single voxels/vertices and single parcellated regions, connectome and network approaches may offer promising alternative ways to investigate patterns of brain structure and their associations with personality (Markett et al., 2018). Network approaches not only measure characteristics of nodes (brain regions) and edges (connections between brain regions) within and between brain networks but also measure the local and global organization of those networks (Sporns & Zwi, 2004). An optimal connectome approach may be achieved by combining high-resolution structural and diffusion MRI images (Gong et al., 2012; Sporns et al., 2005).
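As a concrete illustration of the network approach just described, the Python sketch below (using networkx) computes node-level measures alongside global organization measures. The toy "connectome" is made up: region names and edge weights are purely illustrative placeholders, not data from any reviewed study.

```python
import networkx as nx

# Hypothetical structural connectome for a handful of regions; edge weights
# stand in for streamline counts or structural covariance strength.
edges = [
    ("SFG", "MFG", 0.8), ("MFG", "OFC", 0.6), ("OFC", "AMY", 0.7),
    ("AMY", "PHC", 0.5), ("PHC", "PCC", 0.4), ("PCC", "SFG", 0.3),
    ("MFG", "PCC", 0.2),
]
G = nx.Graph()
G.add_weighted_edges_from(edges)

# Node-level (local) measures: how strongly connected and clustered a region is.
strength = dict(G.degree(weight="weight"))
clustering = nx.clustering(G, weight="weight")

# Network-level (global) organization: integration and segregation.
global_eff = nx.global_efficiency(G)       # unweighted integration measure
avg_clustering = nx.average_clustering(G)  # unweighted segregation measure

print("strength per region:", strength)
print(f"global efficiency = {global_eff:.3f}, "
      f"mean clustering = {avg_clustering:.3f}")
```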

3.2.4. Heterogeneity in statistical approach and statistical significance threshold

The use of different statistical approaches and statistical significance thresholds might contribute to discrepancies across studies.

3.2.4.1. Covariates in model specification

For model specification, commonly used covariates include age, sex, and global brain indices. Among the studies included in the systematic review (across five traits and three brain indices), covariates included age (n = 55 studies), sex (n = 47), TGMV/mean CT (n = 13), total brain volume (TBV) (n = 9), and intracranial volume (ICV) (n = 26). Other covariates included intelligence (n = 7) and education (n = 3). Studies that directly examined the influence of covariates suggested that the associations between personality traits and brain structure can change dramatically depending on the covariates included (Hu et al., 2011; Hyatt et al., 2020). For example, Hu et al. (2011) reported different trait–GMV associations when controlling for different combinations of age, sex, and TGMV, and Hyatt et al. (2020) reported marked changes in the statistical significance of relations between various psychological variables (e.g., personality, psychopathology, cognitive processing) and regional GMV when ICV was included as a covariate. In addition to demographic covariates, some studies also controlled for the other personality traits (n = 15 across five traits among the systematic review studies). Statistically, estimating the "unique association" of a given trait by including the other traits as covariates seems reasonable, but the practice is still under debate: some authors argue that the interpretation of such "partial associations" is not straightforward (Lynam et al., 2006; Sleep et al., 2017) and might fail to capture the inter-correlations between personality traits (e.g., Gray et al., 2018; Holmes et al., 2012; Liu et al., 2013). Profile- or cluster-based approaches have been proposed as an alternative way to capture the inter-dependency of personality traits (e.g., Gerlach et al., 2018; Y. Li et al., 2020; Mulders et al., 2018). To assess whether the inclusion of different covariates could account for the heterogeneous meta-analysis results, we conducted a series of meta-regression analyses on the inclusion of (1) ICV, (2) TGMV, (3) any global brain index (ICV, TGMV, or TBV), and (4) other personality traits as covariates, as well as (5) the total number of covariates. Our results did not change as a function of any of these variables, although this may also reflect the small number of studies that fulfilled the relevant selection criteria. We suggest that future research and future synthesis work take the inclusion of covariates into account.
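A small simulation can illustrate why covariate choice matters so much. The Python sketch below (using statsmodels; all variables are simulated, not drawn from any reviewed study) shows how a trait–GMV association that looks significant without ICV as a covariate can vanish once ICV is included, in the spirit of the Hyatt et al. (2020) findings.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative simulation: regional GMV scales with ICV, and the trait is
# also weakly tied to ICV, so the trait-GMV association is ICV-driven.
rng = np.random.default_rng(0)
n = 500
icv = rng.normal(1500, 120, n)             # intracranial volume, cm^3
trait = 0.002 * icv + rng.normal(0, 1, n)  # trait weakly tied to ICV
gmv = 0.4 * icv + rng.normal(0, 30, n)     # regional GMV scales with ICV

# Model 1: trait only; Model 2: trait plus ICV as a covariate.
m1 = sm.OLS(gmv, sm.add_constant(np.column_stack([trait]))).fit()
m2 = sm.OLS(gmv, sm.add_constant(np.column_stack([trait, icv]))).fit()

print(f"trait coefficient without ICV: {m1.params[1]:.2f} (p = {m1.pvalues[1]:.3g})")
print(f"trait coefficient with ICV:    {m2.params[1]:.2f} (p = {m2.pvalues[1]:.3g})")
```

In this simulation, the "significant" trait coefficient in the first model reflects shared variance with ICV and collapses toward zero once ICV enters the model, which is one mechanism by which covariate choices can flip conclusions across studies.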

3.2.4.2. Significance threshold and multiple comparison correction

The choice of statistical significance threshold for reporting results should also be considered. Various thresholds were used among the included studies, including uncorrected versus corrected for multiple comparisons, voxel-/vertex-level versus cluster-level correction, and different multiple comparison correction methods (e.g., family-wise error, Monte Carlo simulation, non-stationary correction, false discovery rate). The meta-analytic null results of this study may be due, in part, to positive results from studies that applied liberal statistical thresholds to data with small effect sizes, which were not robust enough to be replicable.
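To illustrate how much the correction method alone can matter, the toy Python example below (hypothetical p-values, not taken from any study) applies no correction, Benjamini-Hochberg FDR, and Bonferroni to the same ten tests and counts how many survive each.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Ten hypothetical p-values, e.g., from ten candidate clusters.
p = np.array([0.001, 0.004, 0.012, 0.020, 0.031,
              0.040, 0.048, 0.060, 0.200, 0.700])

uncorrected = p < 0.05
fdr = multipletests(p, alpha=0.05, method="fdr_bh")[0]        # FDR (BH)
bonferroni = multipletests(p, alpha=0.05, method="bonferroni")[0]

# Same data, three different "findings counts".
print(f"significant: uncorrected = {uncorrected.sum()}, "
      f"FDR = {fdr.sum()}, Bonferroni = {bonferroni.sum()}")
```

Here the same data yield seven "findings" uncorrected, four under FDR, and two under Bonferroni, which is exactly the kind of threshold-dependent variability the paragraph above describes.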

3.2.5. Heterogeneous nature of personality and brain structure

3.2.5.1. Replication challenges

Direct replication efforts in studies of personality and brain structure remain scarce. Replication was directly assessed by Owens et al. (2019), who compared their results with an earlier study (Riccelli et al., 2017) that used data from the same dataset (i.e., the HCP). Owens et al. (2019) found that not all results replicated in either the replication sample or the full sample. The sample characteristics, personality and image data acquisition, and processing were almost identical across the two studies, suggesting, according to Owens et al. (2019), that explanations other than differences in sample characteristics and methodology should be considered.

3.2.5.2. Heterogeneous nature and individual difference

Finally, we consider the complex and heterogeneous nature of both personality and brain structure. First, a large number of brain regions have been reported to be associated with one or more personality traits, which might suggest that personality is constructed from many small effects across different brain regions (M. Li et al., 2019; Montag, Reuter, et al., 2013; Owens et al., 2019). Second, most conclusions in the literature were drawn from group mean levels, which ignore individual differences. Studies have demonstrated the influence of individual differences on cross-sectional and longitudinal changes in personality traits (Allemand et al., 2007; Lüdtke et al., 2009). Both genetic and environmental factors have been suggested to contribute to these heterogeneities. For example, the heritability of personality and of regional brain structures has been suggested to contribute to heterogeneous associations between the two (Nostro et al., 2017; Valk et al., 2020). On the other hand, ample research has demonstrated that both personality and brain structure are susceptible to change through environment and experience (e.g., Montag, Reuter, et al., 2013; Roberts & Mroczek, 2008). Third, considering the highly heterogeneous nature of personality traits and the individual differences shaped by genes and environment, it is also possible that the global dimensions of the Big Five are too broad to have a universal representation in the brain. In the NEO-PI-R (Costa & McCrae, 1992), each of the five traits comprises six facets. Studies have demonstrated that some facets contribute more strongly than others to the association between a given global trait and brain structure (Bjørnebekk et al., 2013; M. Li et al., 2019). However, only a few studies included in the systematic review conducted additional facet analyses. Future research should examine the facets and variance of personality traits and brain structures, and studies with regular follow-up are needed to evaluate longitudinal changes.

3.2.5.3. Considerations of statistical approach and power

The highly heterogeneous nature of personality and brain structure also raises concerns about the statistical power of the previous literature to detect reliable associations between psychological phenotypes and brain structure (Masouleh et al., 2019). An important aspect of statistical power relates to the image data analytic approach used. Although a voxel- or vertex-based approach can provide more precise localization (T. Li et al., 2017), it also raises the concern of overestimating the statistical effect based on a peak voxel/vertex (Allen & DeYoung, 2017; DeYoung et al., 2010). On the other hand, given its advantages in improving signal-to-noise ratio, improving test-retest reliability, and reducing the number of variables, the use of whole-brain parcellation-based approaches has increased (e.g., Eickhoff et al., 2018; Hyatt et al., 2019; T. Li et al., 2017). Future research should carefully weigh the advantages and limitations of different image analytic approaches and, where possible, report the congruency of findings across multiple analysis methodologies.

3.3. Does a meaningful relation between the Big Five and brain structure really exist?

Having addressed several plausible factors contributing to heterogeneous findings and replication failure, we also consider the possibility that there is no meaningful relation between the Big Five personality traits and brain structure. Indeed, consistently null-to-very-small associations between the Big Five and brain structure have been reported by recent large-scale studies with over 1,100 participants (Avinun et al., 2020; Gray et al., 2018). For example, Avinun et al. (2020) investigated both global and facet levels of the Big Five with structural indices from whole-brain parcellation (cortical CT, cortical SA, and subcortical GMV) and reported that only conscientiousness (R² = .0044), along with its dutifulness facet (R² = .0062), showed a small association with regional SA in the superior temporal region. A recent study by Hyatt et al. (2021) of different levels of personality (meta-traits, global Big Five traits, facets, and individual NEO-FFI items) and different levels of structural measures (from global brain measures to regional cortical and subcortical parcellations) reported that even the largest association (between the Intellect facet of openness and global brain measures) yielded a mean effect size below .05 (estimated by R²). As discussed above, alternative ways to assess brain structural features, such as connectome and network approaches, exist, and these may be better suited than traditional approaches to mapping correspondences to complex traits.
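A back-of-the-envelope power calculation, shown below as a Python sketch of our own (using the standard Fisher r-to-z approximation, not a computation reported by the cited authors), conveys how small an R² of .0044 is: detecting the corresponding correlation with 80% power requires a sample of roughly 1,800 participants.

```python
import numpy as np
from scipy import stats

# Required N for 80% power, two-tailed alpha = .05, for a correlation of
# the size implied by Avinun et al.'s R^2 = .0044 (r = sqrt(R^2) ~ .066).
r = np.sqrt(0.0044)
z_alpha = stats.norm.ppf(1 - 0.05 / 2)  # ~1.96
z_beta = stats.norm.ppf(0.80)           # ~0.84
fisher_z = np.arctanh(r)                # Fisher r-to-z transform

n = ((z_alpha + z_beta) / fisher_z) ** 2 + 3
print(f"r = {r:.3f}, required N = {int(np.ceil(n))}")  # roughly 1,800
```

By this approximation, most of the single-site samples in the reviewed literature were far too small to detect effects of this magnitude reliably.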

Our review of the literature was limited to studies of the Big Five personality model, which emerged from lexical analyses. The model serves a descriptive purpose but is not necessarily explanatory (DeYoung & Gray, 2009; Montag & Davis, 2018). Thus, there is no a priori reason why these constructs should map neatly onto biological systems, although the Big Five traits are associated with biologically based constructs such as Gray's Reinforcement Sensitivity Theory (RST) and the Behavioral Inhibition and Approach Systems (BIS/BAS) (McNaughton & Corr, 2004), Panksepp's Affective Neuroscience Theory (ANT) (Montag et al., 2021; Montag & Davis, 2018), and the extraversion and neuroticism dimensions of Eysenck's model (Eysenck, 1967). For example, Vecchione and colleagues (2021) applied a latent-variable approach to a sample of 330 adults who completed both Carver and White's BIS/BAS scales and the Big Five Inventory. These authors found that BIS correlated with emotional stability (the inverse of neuroticism) and that BAS correlated with extraversion, after controlling for higher-order factors. Moreover, Montag and Panksepp (2017) demonstrated that the seven ANT primary emotional systems are each associated with at least one global dimension of the Big Five: FEAR, ANGER, and SADNESS with neuroticism; PLAY with extraversion; CARE and ANGER with agreeableness; and SEEKING with openness. Considering that the Big Five model maps closely onto biological motivational and emotional systems, future work should include side-by-side comparisons and integrate across conceptual models of personality structure (such as the Big Five, RST, and ANT) to provide a comprehensive picture of personality. This could be an iterative process in which future personality models are refined by neural data and guide the next generation of imaging and other biological (e.g., genetic) studies.

3.4. Limitations

Some limitations should be noted when interpreting the present systematic review and meta-analysis results. First, one of the major challenges of meta-analysis is the trade-off between meta-analytic power and the homogeneity of the included studies (Müller et al., 2018). Although no study, to our knowledge, has empirically evaluated the minimum number of studies required for a meta-analysis using SDM, Eickhoff et al. (2016) suggested that 17 to 20 studies are required to achieve adequate power using activation likelihood estimation (ALE) meta-analysis; whether this result transfers to SDM meta-analysis remains to be determined. In the present meta-analysis of the five personality traits and GMV, we maximized the number of studies by including heterogeneous studies (e.g., studies with patient cohorts, studies measuring GMD, studies using T1w/T2w ratio signals), and we demonstrated that the results remained unchanged in more homogeneous sub-group meta-analyses excluding those studies. Second, the present meta-analysis results were derived from reported peaks rather than raw data, which limits our evaluation of variability within each individual study. Lastly, the present review only included peer-reviewed articles. "Grey literature," which refers to studies not captured by traditional databases and/or commercial publishers, should also be considered to avoid bias when synthesizing the literature (Cooper et al., 2009). Previous reviews have found that peer-reviewed published works had, on average, larger effects and more significant results than unpublished works such as theses and dissertations (McLeod & Weisz, 2004; Webb & Sheeran, 2006). It is therefore unlikely that our general conclusion of a lack of association between the Big Five and structural brain measures would be altered by the inclusion of unpublished studies. Future researchers are encouraged to include studies from various sources and to carefully evaluate the quality of all works to provide a reliable synthesis.
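One standard way to probe the publication bias mentioned above is a funnel-asymmetry (Egger-type) test. The Python sketch below uses invented effect sizes and standard errors purely to show the mechanics; it is not an analysis from the present review.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical per-study effects and standard errors; small studies (large
# SEs) reporting larger effects is the classic small-study-effect pattern.
y = np.array([0.35, 0.28, 0.15, 0.10, 0.05])   # effect sizes
se = np.array([0.20, 0.15, 0.10, 0.07, 0.05])  # standard errors

# Egger regression: standardized effect on precision; a nonzero intercept
# suggests funnel-plot asymmetry consistent with publication bias.
X = sm.add_constant(1.0 / se)
res = sm.OLS(y / se, X).fit()
print(f"Egger intercept = {res.params[0]:.2f} (p = {res.pvalues[0]:.3f})")
```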

3.5. Implication and conclusion

To our knowledge, this is the first study to systematically evaluate the entire published literature on the association between the Big Five personality traits and three brain structural indices, using a combination of qualitative and quantitative approaches. The qualitative results were highly heterogeneous, and the quantitative meta-analysis found no replicable results across studies. Our discussion pointed out methodological limitations and the dearth of direct replications, as well as gaps in the extant literature, such as limited data on trait facets, on brain–personality associations across the life span, and on sex differences.

When it comes to the relation between the Big Five and structural brain measures, the field of personality neuroscience may have come to a crossroads. In fact, the challenge of finding meaningful and replicable brain–behavior relations is not unique to the Big Five personality traits. The same challenge has emerged for other psychological constructs, including, but not limited to, intelligence and cognition (e.g., attention, executive function), psychosocial processes (e.g., political orientation, morality), and psychopathology (e.g., anxiety, internalizing, externalizing) (Boekel et al., 2015; Genon et al., 2017; Marek et al., 2020; Masouleh et al., 2019). On the one hand, the lack of any significant associations discourages further efforts down this path, as resources may be better spent on other leads. On the other hand, we have suggested several ways to strengthen future work investigating personality–brain structure associations. Consilience may be attained by parallel processing: expanding next-generation structural imaging and analysis approaches while developing new models of personality informed by cutting-edge data prescribing biological constraints. This may be best accomplished by reaching a consensus, as a field, on how to strengthen methodological rigor and replicability, and by creating an incentive structure that rewards large-scale consortium building alongside smaller-scale creative innovations in methods and constructs.

It is relatively rare for women to commit suicide while nude, and female nudity at the time of death is strongly related to the death being a homicide

Homicide or Suicide: How Nudity Factors into This Determination. Sarah W. Craun et al. Homicide Studies, May 3, 2021. https://doi.org/10.1177/10887679211013071

Abstract: Anecdotal reports of deceased celebrities being found nude abound, yet research is lacking regarding the frequency of nudity at death. Moreover, it is unknown if nudity at the time of death is a useful investigative clue or a distracting non-factor in equivocal death cases. This study used data from 119,145 homicides and suicides reported to the Centers for Disease Control to explore victim nudity, prior life stressors, and demographics on the likelihood of a death being a homicide or a suicide. Logistic regression results indicate that a female victim being found nude is a strong indicator of homicide.

Keywords: equivocal deaths, investigation, policing, crime scene, nudity, suicides


Tuesday, August 9, 2022

Swearing is stereotypically associated with socially undesirable traits and behaviors, including limited verbal ability, but the authors observed positive associations between swear word fluency and verbal fluency & vocabulary

Swear Word Fluency, Verbal Fluency, Vocabulary, Personality, and Drug Involvement. Anna-Kaisa Reiman and Mitchell Earleywine. Journal of Individual Differences, Aug 8 2022. https://doi.org/10.1027/1614-0001/a000379

Abstract. Swearing is stereotypically associated with socially undesirable traits and behaviors, including limited verbal ability, disagreeable personality, and alcohol use. We sought to demonstrate that, contrary to such stereotypes, swear word fluency (i.e., ability to generate swear words) does not arise from a lack of verbal skills. We also explored whether swear word fluency might serve as an index of personality traits related to drug use. Accordingly, we conducted a preregistered study in which 266 undergraduates at a US university (Mage = 19.36; 66.9% self-identified as women and 49.6% as White) completed measures of swear word fluency, verbal fluency (i.e., overall ability to generate words), vocabulary, Big Five traits, sensation seeking, and drug use. We observed positive associations between swear word fluency and verbal fluency, vocabulary, Openness, and Extraversion, and a negative association with Agreeableness. Moreover, swear word fluency accounted for unique variance in self-reported drug use over and above that accounted for by personality and general verbal ability. Swear word fluency might serve as one of few tasks where higher scores predict more drug involvement, justifying further work linking this measure with other aspects of personality and drug use.


Belief change can be understood as an economic transaction in which the multidimensional utility of the old & new beliefs is compared; change will occur when potential outcomes alter across attributes

Why and When Beliefs Change. Tali Sharot et al. Perspectives on Psychological Science, August 8, 2022. https://journals.sagepub.com/doi/abs/10.1177/17456916221082967

Abstract: Why people do or do not change their beliefs has been a long-standing puzzle. Sometimes people hold onto false beliefs despite ample contradictory evidence; sometimes they change their beliefs without sufficient reason. Here, we propose that the utility of a belief is derived from the potential outcomes associated with holding it. Outcomes can be internal (e.g., positive/negative feelings) or external (e.g., material gain/loss), and only some are dependent on belief accuracy. Belief change can then be understood as an economic transaction in which the multidimensional utility of the old belief is compared against that of the new belief. Change will occur when potential outcomes alter across attributes, for example because of changing environments or when certain outcomes are made more or less salient.

Keywords: belief, decision-making, value, confidence, metacognition


Ungated version: https://psyarxiv.com/q75ej

Highlights

• The value of a belief is derived from the potential outcomes of holding it. Some of these are dependent on whether a belief is accurate, and some are not. Some are internal to the individual (e.g., positive/negative feelings) and some external (e.g., material gain/loss).

• Belief change can be understood as a process of comparing the multidimensional value of an old belief to that of a new belief and changing beliefs when the latter is greater.

• Changing environments can lead to changes in the potential outcomes of the belief, leading to significant changes in a belief’s utility, which can lead to belief change.

• The confidence people hold about different dimensions of a belief affect whether they seek new information about those dimensions, affecting the likelihood of belief change.


When instruction moved online during the COVID-19 epidemic, the grades of attractive female students deteriorated in non-quantitative subjects; however, the beauty premium persisted for males

Student beauty and grades under in-person and remote teaching. Adrian Mehic. Economics Letters, August 6 2022, 110782. https://doi.org/10.1016/j.econlet.2022.110782


Highlights

I examine the relationship between university students’ appearance and grades.

When education is in-person, attractive students receive higher grades.

The effect is only present in courses with significant teacher-student interaction.

Grades of attractive females declined when teaching was conducted remotely.

For males, there was a beauty premium even after the switch to online teaching.


Abstract: This paper examines the role of student facial attractiveness on academic outcomes under various forms of instruction, using data from engineering students in Sweden. When education is in-person, attractive students receive higher grades in non-quantitative subjects, in which teachers tend to interact more with students compared to quantitative courses. This finding holds both for males and females. When instruction moved online during the COVID-19 pandemic, the grades of attractive female students deteriorated in non-quantitative subjects. However, the beauty premium persisted for males, suggesting that discrimination is a salient factor in explaining the grade beauty premium for females only.


JEL: D91, I23, J16, Z13

Keywords: Attractiveness, Beauty, COVID-19, Discrimination


1. Introduction

It is well-known that physical appearance is an important predictor of success in life. Attractive people are more satisfied with their lives, earn higher wages and grades, and are less likely to engage in criminal activity (Mocan and Tekin, 2010; Hamermesh, 2011). However, the explanation for the beauty premium is subject to debate: the traditional viewpoint, according to which it is a consequence of taste-based discrimination (Hamermesh and Biddle, 1994; Scholz and Sicinski, 2015), is increasingly challenged by findings suggesting that beauty is a productive attribute (Cipriani and Zago, 2011; Stinebrickner et al., 2019). As an example of the latter, attractive individuals are likely to be more self-confident, which can positively affect human capital formation (Mobius and Rosenblat, 2006).

In this paper, I use data from mandatory courses within a Swedish engineering program to examine the role of student facial attractiveness in university grades. I first consider academic outcomes when education is in-person and the faces of students are readily visible to teachers. The results suggest that beauty is positively related to academic outcomes; however, the effect is only significant in non-quantitative courses, which rely to a greater extent on interactions between teachers and students. The beauty premium on grades in non-quantitative subjects holds for both male and female students. Then, using the COVID-19 pandemic as a natural experiment within a difference-in-differences framework, I show that the switch to fully online teaching resulted in deteriorated grades in non-quantitative courses for attractive females, whereas a significant beauty premium persisted for attractive males.

Taken together, these findings suggest that the return to facial beauty is likely to be primarily due to discrimination for females, and the result of a productive trait for males. The former result is in line with the findings of Hernández-Julián and Peters (2017), while the latter is new to the literature. An advantage of this paper's empirical strategy is that the switch to online teaching during the pandemic makes it possible to isolate the effect of appearance more credibly, because only the mode of instruction changed, not the structure of the courses. Additionally, my identification strategy removes the problem of self-selection into courses.

[...] 

Cognitive training is completely ineffective in advancing cognitive function and academic achievement, but the field has maintained an unrealistic optimism about its benefits

Cognitive Training: A Field in Search of a Phenomenon. Fernand Gobet, Giovanni Sala. Perspectives on Psychological Science, August 8, 2022. https://doi.org/10.1177/17456916221091830

Abstract: Considerable research has been carried out in the last two decades on the putative benefits of cognitive training on cognitive function and academic achievement. Recent meta-analyses summarizing the extant empirical evidence have resolved the apparent lack of consensus in the field and led to a crystal-clear conclusion: The overall effect of far transfer is null, and there is little to no true variability between the types of cognitive training. Despite these conclusions, the field has maintained an unrealistic optimism about the cognitive and academic benefits of cognitive training, as exemplified by a recent article (Green et al., 2019). We demonstrate that this optimism is due to the field neglecting the results of meta-analyses and largely ignoring the statistical explanation that apparent effects are due to a combination of sampling errors and other artifacts. We discuss recommendations for improving cognitive-training research, focusing on making results publicly available, using computer modeling, and understanding participants’ knowledge and strategies. Given that the available empirical evidence on cognitive training and other fields of research suggests that the likelihood of finding reliable and robust far-transfer effects is low, research efforts should be redirected to near transfer or other methods for improving cognition.

Keywords: cognitive training, meta-analysis, methodology, working memory training

As is clear from the empirical evidence reviewed in the previous sections, the likelihood that cognitive training provides broad cognitive and academic benefits is very low indeed; therefore, resources should be devoted to other scientific questions—it is not rational to invest considerable sums of money in a scientific question that has essentially been answered in the negative. In a recent article, Green et al. (2019) took the exact opposite of this decision—they strongly recommended that funding agencies increase funding for cognitive training. This obviously calls for comment.

The aim of Green et al.’s (2019) article was to provide methodological recommendations and a set of best practices for research on the effects of behavioral interventions aimed at cognitive improvement. Among other things, the issues addressed include the importance of distinguishing between different types of studies (feasibility, mechanistic, efficacy, and effectiveness studies), the type of control groups used, and expectation effects. Many of the points addressed in detail by Green et al. reflect sound and well-known research practices (e.g., the necessity of running studies with sufficient statistical power, the need to define the terminology used, and the importance of replications; see also Simons et al., 2016).

However, the authors made disputable decisions concerning central questions. These include whether superordinate terms such as “cognitive training” and “brain training” should be defined, whether a discussion of methods is legitimate while ignoring the empirical evidence for or against the existence of a phenomenon, the extent to which meta-analyses can compare studies obtained with different methodologies and cognitive-enhancement methods, and whether multiple measures should be used for a latent construct such as intelligence.

Lack of definitions

Although Green et al. (2019) emphasized that “imprecise terminology can easily lead to imprecise understanding and open the possibility for criticism of the field,” they opted to not provide an explicit definition of “cognitive training” (p. 4). Nor did they define the phrase “behavioral interventions for cognitive enhancement,” used throughout their article. Because they specifically excluded activities such as video-game playing and music (p. 3), we surmised that they used “cognitive training” to refer to computer tasks and games that aim to improve or maintain cognitive abilities such as WM. The term “brain training” is sometimes used to describe these activities, although it should be mentioned that Green et al. objected to the use of the term.

Note that researchers investigating the effects of activities implicitly or explicitly excluded by Green et al. (2019) have emphasized that the aim of those activities is to improve cognitive abilities and/or academic achievement, for example, chess (Jerrim et al., 2017; Sala et al., 2015), music (Gordon et al., 2015; Schellenberg, 2006), and video-game playing (Bediou et al., 2018; Feng et al., 2007). For example, Gordon et al.’s (2015) abstract concluded by stating that “results are discussed in the context of emerging findings that music training may enhance literacy development via changes in brain mechanisms that support both music and language cognition” (p. 1).

Green et al. (2019) provided a rationale for not providing a definition. Referring to “brain training,” they wrote:

We argue that such a superordinate category label is not a useful level of description or analysis. Each individual type of behavioral intervention for cognitive enhancement (by definition) differs from all others in some way, and thus will generate different patterns of effects on various cognitive outcome measures. (p. 4)

They also noted that even using subcategories such as “working-memory training” is questionable. They did note that “there is certainly room for debate” (p. 4) about whether to focus on each unique type of intervention or to group interventions into categories.

In line with common practice (e.g., De Groot, 1969; Elmes et al., 1992; Pedhazur & Schmelkin, 1991), we take the view that definitions are important in science. Therefore, in this article, we have proposed a definition of “cognitive training” (see “Defining Terms” section above), which we have used consistently in our research.

Current state of knowledge and meta-analyses

A sound discussion of methodology in a field depends on the current state of knowledge in that field. Whereas Green et al. (2019) used information gleaned from previous and current cognitive-training research to recommend best practices (e.g., the use of previous studies to estimate the sample size needed for well-powered experiments), they also explicitly stated that they would not discuss previous controversies. We believe that this is a mistake because, as just noted, the choice of methods is conditional on the current state of knowledge. In our case, a crucial ingredient of this state is whether cognitive-training interventions are successful—specifically, whether they lead to far transfer. One of the main “controversies” concerns precisely this question, and thus it is unwise to ignore it.

Green et al. (2019) were critical of meta-analyses and argued that studies cannot be compared:

For example, on the basic research side, the absence of clear methodological standards has made it difficult-to-impossible to easily and directly compare results across studies (either via side-by-side contrasts or in broader meta-analyses). This limits the field’s ability to determine what techniques or approaches have shown positive outcomes, as well as to delineate the exact nature of any positive effects – e.g., training effects, transfer effects, retention of learning, etc. (p. 3)

These comments wholly underestimate what can be concluded from meta-analyses. Like many other researchers in the field, Green et al. (2019) assumed that (a) the literature is mixed and, consequently, (b) the inconsistent results depend on differences in methodologies between researchers. However, assuming that there is some between-studies inconsistency and then speculating about its source is not scientifically apposite (see “The Importance of Sampling Error and Other Artifacts” section above). Rather, quantifying the between-studies true variance (τ²) should be the first step.
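To make this concrete, estimating τ² requires only a few lines of code. Below is a minimal sketch in Python of the DerSimonian–Laird method-of-moments estimator; the effect sizes and sampling variances are hypothetical and are not taken from any of the meta-analyses discussed here.

```python
import numpy as np

def dersimonian_laird_tau2(effects, variances):
    """Estimate between-studies true variance (tau^2) with the
    DerSimonian-Laird method-of-moments estimator."""
    w = 1.0 / np.asarray(variances)            # fixed-effect weights
    y = np.asarray(effects)
    y_bar = np.sum(w * y) / np.sum(w)          # weighted mean effect
    q = np.sum(w * (y - y_bar) ** 2)           # Cochran's Q statistic
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - df) / c)              # tau^2, truncated at zero

# Hypothetical standardized mean differences and their sampling variances
effects = [0.10, -0.05, 0.20, 0.02, 0.08]
variances = [0.04, 0.03, 0.05, 0.02, 0.04]
print(dersimonian_laird_tau2(effects, variances))
```

A τ² close to zero indicates that the apparent inconsistency between studies is largely attributable to sampling error rather than to genuine methodological moderators.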

Using latent factors

In the section “Future Issues to Consider With Regard to Assessments,” Green et al. (2019, pp. 16–17) raised several issues with using multiple measures for a given construct such as WM. This practice has been recommended by authors such as Engle et al. (1999) to reduce measurement error. Several of Green et al.’s arguments merit discussion.

A first argument is that using latent factors—as in confirmatory factor analysis—might hinder the analysis of more specific effects. This argument is incorrect, because the relevant information remains available to researchers (see Kline, 2016; Loehlin, 2004; Tabachnick & Fidell, 1996). By inspecting factor loadings, one can examine whether the preassessment/postassessment changes (if any) affect the latent factor or only specific tests (a longitudinal-measurement-invariance problem). Green et al. (2019) seemed to equate multi-indicator composites (e.g., summed z scores) with latent factors. Composite measures are the result of averaging or summing across a number of observed variables and cannot tell much about any task-specific effect. A latent factor, by contrast, is a mathematical construct derived from a covariance matrix within a structural model, with a set of parameters linking the latent factor to the observed variables. That being said, using multi-indicator composites would still be an improvement over current standards in the field.
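A minimal sketch in Python may make the distinction concrete. The data are simulated, and the choice of the semopy package and the indicator names (nback, ospan, sspan) are our illustrative assumptions, not part of Green et al.’s discussion; lavaan in R offers equivalent functionality.

```python
import numpy as np
import pandas as pd
import semopy  # assumed SEM package; lavaan in R is an equivalent choice

rng = np.random.default_rng(0)
n = 500
wm = rng.normal(size=n)  # simulated latent WM ability
df = pd.DataFrame({
    "nback": 0.7 * wm + rng.normal(scale=0.7, size=n),
    "ospan": 0.6 * wm + rng.normal(scale=0.8, size=n),
    "sspan": 0.5 * wm + rng.normal(scale=0.9, size=n),
})

# A composite collapses all tasks into one score and hides task-specific effects
composite = df.apply(lambda c: (c - c.mean()) / c.std()).mean(axis=1)
print(composite.describe())

# A latent factor keeps per-task loadings available for inspection,
# which is what longitudinal measurement-invariance checks rely on
model = semopy.Model("WM =~ nback + ospan + sspan")
model.fit(df)
print(model.inspect())
```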

A second argument is that large batteries of tests induce motivational and/or cognitive fatigue in participants, especially in particular populations. Although this may be true, for example with older participants, large batteries have been used in several cognitive-training studies, and participants were able to complete a wide variety of tests (e.g., Guye & von Bastian, 2017). Nevertheless, instead of assessing many different constructs, it may be preferable to focus on one or two constructs at a time (e.g., fluid intelligence and WM). Such a practice would reduce both the number of tasks and the amount of fatigue.

Another argument concerns carryover and learning effects. The standard solution is to randomize the presentation order of the tasks. This procedure, which drives bias toward zero as the number of participants increases, is generally efficient when there is no reason to expect an interaction between treatment and order (Elmes et al., 1992). When such an interaction is plausible, another approach can be used: counterbalancing the order of the tasks. However, complete counterbalancing is difficult with large numbers of tasks, and in this case one often has to be content with incomplete counterbalancing using a Latin square (for a detailed discussion, see Winer, 1962).
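For illustration, a cyclic Latin square, in which every task appears exactly once in every serial position, can be generated in a few lines. This is a sketch of the general technique, with hypothetical task names, not a reconstruction of any particular study’s design.

```python
def latin_square(tasks):
    """Cyclic Latin square: each task appears once in every position."""
    k = len(tasks)
    return [[tasks[(row + col) % k] for col in range(k)] for row in range(k)]

for order in latin_square(["n-back", "ospan", "Raven", "Stroop"]):
    print(order)
# Assign participants to rows in rotation so every order is used equally often.
```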

A final point made by Green et al. (2019) is that using large batteries of tasks increases the rate of Type I errors. Although this point is correct, it is not an argument against multi-indicator latent factors; rather, it is an argument in their favor, because latent factors do not suffer from this bias. In addition, latent factors aside, there are many methods for correcting α (i.e., the significance threshold) for multiple comparisons (e.g., the Bonferroni, Holm, and false-discovery-rate procedures). Increased Type I error rates are a concern only for researchers who ignore the problem and do not apply any correction.
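The corrections named above are available in standard software. A minimal sketch using statsmodels follows; the p values are hypothetical, standing in for the results of a large pre/post test battery.

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p values from a large pre/post test battery
pvals = [0.001, 0.012, 0.030, 0.040, 0.210, 0.550]

for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, p_adj.round(3), reject)
```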

One reasonable argument is that latent-factor analysis requires large numbers of participants. A solution is offered by multisite trials. The ACTIVE trial—the largest experiment carried out in the field of cognitive training—was, indeed, a multisite study (Rebok et al., 2014). Another multisite cognitive-training experiment is currently ongoing (Mathan, 2018).

To conclude this section, we emphasize two points. First, it is well known that single tests generally possess low reliability. Second, multiple measures are needed to understand whether improvements occur at the level of the test (e.g., n-back) or at the level of the construct (e.g., WM).
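The reliability gain from aggregating parallel measures follows directly from the classic Spearman–Brown prophecy formula. A minimal sketch, with hypothetical reliability values:

```python
def spearman_brown(r_single, k):
    """Reliability of a composite of k parallel tests,
    each with reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)

print(spearman_brown(0.60, 1))  # 0.60: a single, modestly reliable test
print(spearman_brown(0.60, 3))  # ~0.82: a composite of three parallel tests
```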

Some methodological recommendations

We are not so naive as to believe that our analysis will deter researchers in the field from carrying out much more research on the putative far-transfer benefits of cognitive training, despite the lack of any supporting empirical evidence. We thus provide some advice about the directions that should be taken so that not all resources are spent in search of a chimera.

Making methods and results accessible, piecemeal publication, and objective report of results

We broadly agree with the methodological recommendations made by Green et al. (2019), such as reporting not only p values but also effect sizes and confidence intervals, and the need for well-powered studies. We add a few important recommendations (for a summary of the recommendations made throughout this article, see Table 3). To begin with, it is imperative to put the data, analysis code, and other relevant information online. In addition to providing a supplementary backup, this allows other researchers to closely replicate the studies and to carry out additional analyses (including meta-analyses)—important requirements in scientific research. By the same token, and in the spirit of Open Science, researchers should reply to requests from meta-analysts asking for summary data and/or the original data. In our experience, the response rate is currently 20% to 30% at best (e.g., Sala et al., 2018). Although we understand that it may be difficult to comply with such requests when the data were collected 20 years or more ago, there is no excuse for data collected more recently.

Table 3. Key Recommendations for Researchers

Just like other questionable research practices, piecemeal publication should be avoided (Hilgard et al., 2019). If dividing the results of a study into several articles cannot be avoided, each article should clearly and unambiguously indicate that this has been done and should cite the other articles reporting results from the same study.

There is one point made by Green et al. (2019) with which we wholeheartedly agree: the necessity of reporting results correctly and objectively, without hyperbole and incorrect generalization. The field of cognitive training is littered with exaggerations and overinterpretations of results (see Simons et al., 2016). A fairly common practice is to focus on the odd statistically significant result even though most of the tests turn out to be nonsignificant. This is obviously capitalizing on chance and should be avoided at all costs.

In a similar vein, there is a tendency to overinterpret the results of studies using neuroscience methods. A striking example was recently offered by Schellenberg (2019), who showed that, in a sample of 114 journal articles on the effects of music training published in the last 20 years, causal inferences were often made although the data were only correlational; neuroscientists committed this logical fallacy more often than psychologists did. There was also a rigid focus on learning and the environment, and a concurrent neglect of alternative explanations such as innate differences. Another example consists of inferring far transfer when neuroimaging effects are found in the absence of behavioral effects; such an inference is illegitimate.

The need for detailed analyses and computational models

As a way forward, Green et al. (2019) recommended well-powered studies with large numbers of participants. In a similar vein, and focusing on the n-back-task training, Pergher et al. (2020) proposed large-scale studies isolating promising features. We believe that such an atheoretical approach is unlikely to succeed. There is an indefinite space of possible interventions (e.g., varying the type of training task, the cover story used in a game, the perceptual features of the material, the pace of presentation, ad infinitum), which means that searching this space blindly and nearly randomly would require a prohibitive amount of time. Strong theoretical constraints are needed to narrow down the search space.

There is thus an urgent need to understand which cognitive mechanisms might lead to cognitive transfer. As we showed above in the section on meta-analysis, the available evidence shows that the real effect size of cognitive training on far transfer is zero. Prima facie, this outcome indicates that theories based on general mechanisms, such as brain plasticity (Karbach & Schubert, 2013), primitive elements (Taatgen, 2013), and learning to learn (Bavelier et al., 2012), are incorrect when it comes to far transfer. We reach this conclusion by a simple application of modus tollens: (a) Theories based on general mechanisms such as brain plasticity, primitive elements, and learning to learn predict far transfer. (b) The empirical evidence shows that there is no far transfer. Therefore, (c) theories based on general mechanisms such as brain plasticity, primitive elements, and learning to learn are incorrect.
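Stated formally, the argument is plain modus tollens:

```latex
% T: a theory positing general mechanisms (brain plasticity, primitive
%    elements, learning to learn) is correct.
% F: far transfer occurs.
\[
(T \rightarrow F), \quad \neg F \;\;\therefore\;\; \neg T
\]
```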

Thus, if one believes that cognitive training leads to cognitive enhancement—most likely limited to near transfer—one has to come up with other theoretical mechanisms than those currently available in the field. We recommend two approaches to identify such mechanisms, which we believe should be implemented before large-scale randomized controlled trials are carried out.

Fine analyses of the processes in play

The first approach is to use experimental methods enabling the identification of cognitive mechanisms. Cognitive psychology has a long history of refining such methods, and we limit ourselves to just a few pointers. A useful source of information consists of fine-grained data, such as eye movements, response times, and even mouse location and mouse clicks. Together with hypotheses about the processes carried out by participants, these data make it possible to rule out some mechanisms while making others more plausible. Another method is to design experiments that specifically test some theoretical mechanisms. Note that this goes beyond establishing that a cognitive intervention leads to some benefit compared with a control group: The aim is to understand the specific mechanisms that produce this superiority.

It is highly likely that the strategies used by the participants play a role in the training, pretests, and posttests used in cognitive-training research (Sala & Gobet, 2019; Shipstead et al., 2012; von Bastian & Oberauer, 2014). It is essential to understand these strategies and the extent to which they differ between participants. Are they linked to a specific task or a family of tasks (near transfer), or are they general across many different tasks (far transfer)? If it turns out that such general strategies exist, can they be taught? What do they tell researchers about brain plasticity and changing basic cognitive abilities such as general intelligence?

Two studies that investigated the effects of strategies are worth mentioning here. Laine et al. (2018) found that instructing participants to employ a visualization strategy when performing n-back training improved performance. In a replication and extension of this study, Forsberg et al. (2020) found that the taught visualization strategy improved some of the performance measures in novel n-back tasks. However, older adults benefited less, and there was no improvement in WM tasks structurally different from n-back tasks. Among the uninstructed participants, n-back performance correlated with the type of spontaneous strategies used and their level of detail. The types of strategies also differed as a function of age.

A final useful approach is to carry out a detailed task analysis (e.g., Militello & Hutton, 1998) of the activities involved in a specific regimen of cognitive training and in the pretests and posttests used. What are the overlapping components? What are the critical components and those that are not likely to matter in understanding cognitive training? These components can be related to information about eye movements, response times, and strategies and can be used to inspire new experiments. The study carried out by Baniqued et al. (2013) provides a nice example of this approach. Using task analysis, they categorized 20 web-based casual video games into four groups (WM, reasoning, attention, and perceptual speed). They found that performance in the WM and reasoning games was strongly associated with memory and fluid-intelligence abilities, measured by a battery of cognitive tasks.

Cognitive modeling as a method

The second approach we propose consists of developing computational models of the postulated mechanisms, which of course should be consistent with what is known generally about human cognition (for a similar argument, see Smid et al., 2020). To enable an understanding of the underlying mechanisms and be useful in developing cognitive-training regimens, the models should be able to simulate not only the tasks used as pretests and posttests but also the training tasks. This is what Taatgen’s (2013) model does: It first simulates improvement in a complex verbal WM task over 20 training sessions and then simulates how WM training reduces interference in a Stroop task compared with a control group. (We would, of course, query whether this far-transfer effect is genuine.) By contrast, Green, Pouget, and Bavelier’s (2010) neural-network and diffusion-to-bound models simulate the transfer tasks (a visual-motion-direction discrimination task and an auditory-tone-location discrimination task) but do not simulate the training task with action video-game playing. Ideally, a model of the effect of an action video game should simulate the actual training (e.g., playing Call of Duty 2), processing the actual stimuli involved in the game. To our knowledge, no such model exists. Note that, given current developments in technology, modeling such a training task is not unrealistic.
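To give a flavor of what such models involve, a diffusion-to-bound decision process of the general kind used by Green, Pouget, and Bavelier (2010) can be simulated in a few lines. The parameter values below are arbitrary illustrations, not those of their model.

```python
import numpy as np

def diffusion_trial(drift, bound, dt=0.001, noise=1.0, rng=None):
    """Simulate one diffusion-to-bound trial; return (choice, reaction time)."""
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < bound:                      # accumulate evidence to a bound
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (x > 0, t)

rng = np.random.default_rng(1)
trials = [diffusion_trial(drift=0.8, bound=1.0, rng=rng) for _ in range(1000)]
accuracy = np.mean([choice for choice, _ in trials])
mean_rt = np.mean([t for _, t in trials])
print(f"accuracy = {accuracy:.2f}, mean RT = {mean_rt:.2f} s")
# A training-induced improvement could be modeled, e.g., as an increase in drift.
```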

The models should also be able to explain data at a micro level, including eye movements and verbal protocols (to capture strategies). There is also a need for the models to use exactly the same stimuli as those used in the human experiments. For example, the chunk hierarchy and retrieval structures model of chess expertise (De Groot et al., 1996; Gobet & Simon, 2000) receives as learning input the kind of board positions that players are likely to meet in their practice. When simulating experiments, the same stimuli are used as those employed with human players, and close comparison is made between predicted and actual behavior along a number of dimensions, including percentage of correct responses, number and type of errors, and eye movements. In the field of cognitive training, Taatgen’s (2013) model is a good example of the proper level of granularity for understanding far transfer. Note that, ideally, the models should be able to predict possible confounds and how modifications to the design of training would circumvent them. Indeed, we recommend that considerable resources be invested in this direction of research with the aim of testing interventions in silico before testing them in vivo (Gobet, 2005). Only those interventions that lead to benefits in simulations should be tested in trials with human participants. In addition to embodying sound principles of theory development and testing, such an approach would also lead to considerable savings of research money in the medium and long terms.

Searching for small effects

Green et al. (2019, p. 20) acknowledged that large effects are unlikely and that one may have to be content with small effects. They were also open to the possibility of exploiting nonspecific effects, such as expectation effects. It is known that many educational interventions produce modest effects (Hattie, 2009); the question thus arises whether cognitive-training interventions are more beneficial than the alternatives. We argue that many other interventions are cheaper and/or have specific benefits when they directly match educational goals. For example, games related to mathematics are more likely than n-back tasks to improve one’s mathematical knowledge and skills, and they can be cheaper and more fun.

If cognitive training leads only to small and nonspecific effects, one faces two implications, one practical and one theoretical. Practically, the search for effective training features has to operate blindly, which is very inefficient: As noted above, the current leading theories in the field are incorrect, and thus there is no theoretical guidance. Effectiveness studies are therefore unlikely to yield positive results. Theoretically, if the effectiveness of training depends on small details of the training and of the pre/post measures, then the prospects of generalization beyond specific tasks are slim to none. This is unsatisfactory scientifically, because science progresses by uncovering general laws and finding order in apparent chaos (e.g., the state of chemistry before and after Mendeleev’s discovery of the periodic table of elements).

A straightforward explanation can be proposed for the pattern of results found in our meta-analyses with respect to far transfer—small-to-zero effect sizes and low-to-null true between-studies variance. Positive effect sizes are just what can be expected from chance, design features (e.g., active vs. passive control groups), regression to the mean, and, sometimes, publication bias. (If you believe that explanations based on chance are not plausible, consider Galton’s board: It perfectly illustrates how a large number of small effects can lead to a normal distribution. Likewise, in cognitive training, multiple variables and mechanisms lead to some experiments having a positive effect and others a negative effect, with most experiments centered around the mean of the distribution.) Thus, the search for robust and replicable effects is unlikely to be successful.
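The Galton-board analogy is easy to make concrete: summing many small, independent positive and negative influences yields an approximately normal distribution of observed effects around the true mean. A minimal simulation, with all numbers hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
n_studies, n_influences = 10_000, 50

# Each simulated "study" sums many small positive and negative influences
# (design choices, sampling error, regression to the mean, ...).
effects = rng.choice([-0.02, 0.02], size=(n_studies, n_influences)).sum(axis=1)

print(f"mean effect = {effects.mean():.3f}")            # ~0: no true effect
print(f"share of 'positive' studies = {(effects > 0.2).mean():.2%}")
```

Even with a true mean of zero, a non-negligible share of simulated studies shows a seemingly noteworthy positive effect, purely by the accumulation of small influences.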

Note that the issue with cognitive training is not the lack of replications and the lack of reproducibility, which plague large swathes of psychology: The main results have been replicated often and form a highly coherent pattern when results are put together in (meta-)meta-analyses. Pace Pergher et al. (2020), we do not believe that variability of methods is an issue. On the contrary, the main outcomes are robust to experimental variations. Indeed, results obtained with many different training and evaluation methods converge (small-to-zero effect sizes and low true heterogeneity) and thus satisfy a fundamental principle in scientific research: the principle of triangulation (Mathison, 1988).

Funding agencies

Although Green et al.’s (2019) article is explicitly about methodology, it does make recommendations for funding agencies and lobbies for more funding: “We feel strongly that an increase in funding to accommodate best practice studies is of the utmost importance” (p. 17). On the one hand, this move is consistent with the aims of their article in that several of the suggested practices, such as using large samples and performing studies that would last for several years, would require substantial amounts of money to be carried out. On the other hand, lobbying for an increase in funding is made without any reference to results showing that cognitive training might not provide the hoped-for benefits. The authors only briefly discussed the inconsistent evidence for cognitive training, concluding that “our goal here is not to adjudicate between these various positions or to rehash prior debates” (p. 3). However, in general, rational decisions about funding require an objective evaluation of the state of the research. Obviously, if the research is about developing methods for cognitive enhancement, funders must take into consideration the extent to which the empirical evidence supports the hypothesis that the proposed methods provide domain-general cognitive benefits. As we showed in the “Meta-Analytical Evidence” section, there is little to null support for this hypothesis. Thus, our advice for funders is to base their decisions on the available empirical evidence and on the conclusions reached by meta-analyses.

As discussed earlier, our meta-analyses clearly show that cognitive training does not lead to any far transfer in any of the cognitive-training domains that have been studied. In addition, using second-order meta-analysis made it possible to show that the between-meta-analyses true variance is due to second-order sampling error and thus that the lack of far transfer generalizes to different populations and different tasks. Taking a broader view suggests that our conclusions are not surprising and are consistent with previous research. In fact, they were predictable. Over the years, it has been difficult to document far transfer in experiments (Singley & Anderson, 1989; Thorndike & Woodworth, 1901), industrial psychology (Baldwin & Ford, 1988), education (Gurtner et al., 1990), and research on analogy (Gick & Holyoak, 1983), intelligence (Detterman, 1993), and expertise (Bilalić et al., 2009). Indeed, theories of expertise emphasize that learning is domain-specific (Ericsson & Charness, 1994; Gobet & Simon, 1996; Simon & Chase, 1973). When putting this substantial set of empirical evidence together, we believe that it is possible to conclude that the lack of training-induced far transfer is an invariant of human cognition (Sala & Gobet, 2019).

Obviously, this conclusion conflicts with the optimism displayed in the field of cognitive training, as exemplified by Green et al.’s (2019) article discussed above. However, it is in line with skepticism recently expressed about cognitive training (Moreau, 2021; Moreau et al., 2019; Simons et al., 2016). It also raises the following critical epistemological question: Given that the overall evidence in the field of cognitive training strongly suggests that the postulated far-transfer effects do not exist, and thus the probability of finding such effects in future research is very low, should one conclude that the reasonable course of action is to stop performing cognitive-training research on far transfer?

We believe that the answer to this question is “yes.” Given the clear-cut empirical evidence, the discussion about methodological concerns is irrelevant, and the issue becomes searching for other cognitive-enhancement methods. However, although the hope of finding far-transfer effects is tenuous, the available evidence clearly supports the presence of near-transfer effects. In many cases, near-transfer effects are useful (e.g., with respect to older adults’ memory), and developing effective methods for improving near transfer is a valuable—and importantly, realistic—avenue for further research.