Abstract: Previous studies suggest that gyrification is associated with superior cognitive abilities in humans, but the strength of this relationship remains unclear. Here, in two samples of related individuals (total N = 2882), we calculated an index of local gyrification (LGI) at thousands of cortical surface points using structural brain images and an index of general cognitive ability (g) using performance on cognitive tests. Replicating previous studies, we found that phenotypic and genetic LGI–g correlations were positive and statistically significant in many cortical regions. However, all LGI–g correlations in both samples were extremely weak, regardless of whether they were significant or nonsignificant. For example, the median phenotypic LGI–g correlation was 0.05 in one sample and 0.10 in the other. These correlations were even weaker after adjusting for confounding neuroanatomical variables (intracranial volume and local cortical surface area). Furthermore, when all LGIs were considered together, at least 89% of the phenotypic variance of g remained unaccounted for. We conclude that the association between LGI and g is too weak to have profound implications for our understanding of the neurobiology of intelligence. This study highlights potential issues when focusing heavily on statistical significance rather than effect sizes in large-scale observational neuroimaging studies.
Discussion
In the present study, we analyzed data from two samples of related individuals to examine the association between gyrification and general cognitive ability. We used a popular automatic method to calculate LGI across the cortex from MRI images (Schaer et al. 2008) and calculated g from performance on batteries of cognitive tests. We estimated the heritability of height, ICV, and g, as well as the heritability of LGI, area, and thickness at all vertices. We estimated phenotypic, genetic, and environmental LGI–g correlations, as well as partial LGI–g correlations controlling for height, ICV, area (at the same vertex), and thickness (at the same vertex) as potential confounding variables. We estimated the amount of phenotypic variance of g explained by all LGIs together via ridge regression, and examined the across-sample consistency of the neuroanatomical patterns of heritability of LGI, area, and thickness, and of LGI–g correlations. Finally, we tested whether heritability estimates and LGI–g correlations were stronger in regions implicated by the P-FIT (parieto-frontal integration theory), a model of the neurological basis of human intelligence (Jung and Haier 2007).
A novel finding of the present study was that LGI was heritable across the cortex, extending a previous study that established the heritability of whole-brain GI (Docherty et al. 2015). This finding was not particularly surprising because many features of brain morphology are heritable. Nevertheless, it was necessary to establish the heritability of LGI before calculating genetic LGI–g correlations, which are only meaningful if both LGI and g are heritable traits. The previous study estimated the heritability of GI to be 0.71, which is much greater than most of the heritability estimates for LGI observed in GOBS or HCP. This difference is also not surprising, because whole-brain GI, which aggregates over the entire cortex, is likely to contain less measurement error than LGI at a single vertex. Heritabilities of all other traits were consistent with those published in previous studies.
The present study represents a replication of previous work and provides several important extensions to our understanding of the relationship between gyrification and cognition. First, we replicated previous work by finding positive and significant phenotypic LGI–g correlations (e.g., Gregory et al. 2016). Furthermore, we found that genetic LGI–g correlations were positive and significant (but only in HCP), suggesting that the relationship between gyrification and intelligence may be driven by pleiotropy, that is, by genetic influences shared by both traits. Because environmental LGI–g correlations were not significant, differed in net sign between GOBS and HCP, and showed no consistent spatial pattern across samples, it is reasonable to conclude that they mostly reflected measurement error rather than meaningful shared environmental contributions to LGI and g.
In our view, the most important finding from the present study is that all LGI–g correlations, even the significant ones, were weak. Phenotypically, LGI at a typical vertex was a poor predictor of g. Even when the predictive ability of all LGIs was considered together via ridge regression, at least 89% of the variance of g remained unaccounted for. Phenotypic and genetic LGI–g correlations were weaker than ICV–g correlations in the same participants, and about as strong as area–g correlations. Partialing out ICV or area further reduced LGI–g correlations.
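As a rough guide to what these effect sizes mean, a correlation r accounts for r² of the variance of g, so median phenotypic correlations of 0.05 and 0.10 correspond to 0.25% and 1% of the variance, respectively. The sketch below shows one way to estimate multivariate variance explained with cross-validated ridge regression. It is an illustration under stated assumptions, not the study's pipeline: the data are simulated (200 hypothetical vertex-wise features that all load on one latent factor, which in turn shares only 9% of its variance with g), the penalty `alpha` is fixed rather than tuned, and the helper names `ridge_fit` and `cv_r2` are hypothetical. Only NumPy is required.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def cv_r2(X, y, alpha, n_folds=10, seed=0):
    """Out-of-sample R^2 from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty_like(y, dtype=float)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        coef, b0 = ridge_fit(X[train], y[train], alpha)
        pred[test_idx] = X[test_idx] @ coef + b0  # predict held-out fold only
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Simulated data (hypothetical): intercorrelated features, weakly related to g.
rng = np.random.default_rng(42)
n, p = 1000, 200
latent = rng.standard_normal(n)
X = 0.7 * latent[:, None] + 0.7 * rng.standard_normal((n, p))
g = 0.3 * latent + np.sqrt(1 - 0.3**2) * rng.standard_normal(n)

r2 = cv_r2(X, g, alpha=1e4)
print(f"cross-validated R^2 = {r2:.3f}; unexplained variance = {1 - r2:.1%}")
```

Because the simulated features are strongly intercorrelated, the heavy penalty effectively pools them into one averaged predictor. The population ceiling in this simulation is about 0.09 (0.3²), so even an ideal combination of all features leaves over 90% of the variance of g unexplained.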
The volume of the cortical mantle is often computed as the product of its area and thickness. However, at the resolution of the meshes typically used to represent the cortex, the variability of area is higher than that of thickness, so surface area is the primary contributor to the variability of cortical volume (Winkler et al. 2010) and therefore to its relationship with other measurements; this holds even more strongly for parcellations of the cortex into large anatomical or functional regions. This means that the association between overall brain volume and cognitive abilities reported by previous studies (e.g., Pietschnig et al. 2015) is probably driven primarily by area–g correlations (Vuoksimaa et al. 2015). LGI is strongly correlated with area (Gautam et al. 2015; Hogstrom et al. 2013), which explains why partialing out either ICV or area reduced phenotypic and genetic LGI–g correlations in the present study. Thus, based on our results, we conclude that the association between gyrification and cognitive abilities largely reflects the already well-established relationship between surface area and cognitive abilities, and that the association between the unique portion of gyrification and cognitive abilities is extremely small.
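The effect of partialing out a covariate such as area can be illustrated with the textbook first-order partial correlation. The sketch below applies it to hypothetical values (not estimates from GOBS or HCP) to show how a strong LGI–area correlation can shrink an LGI–g correlation almost to zero. The study's partial correlations were estimated within its own models, which this simplified formula does not reproduce; `partial_corr` is a hypothetical helper name.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical correlations chosen for illustration only.
r_lgi_g = 0.10     # LGI with g
r_lgi_area = 0.60  # LGI with local surface area
r_area_g = 0.15    # local surface area with g

print(f"raw LGI-g r = {r_lgi_g:.3f}")
print(f"LGI-g r controlling for area = "
      f"{partial_corr(r_lgi_g, r_lgi_area, r_area_g):.3f}")
```

With these inputs, most of the raw correlation of 0.10 is accounted for by the shared association with area, leaving a partial correlation of roughly 0.013.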
The above conclusion is consistent with that of a previous twin study (Docherty et al. 2015), which examined genetic associations between overall cortical surface area, whole-brain GI, and cognitive abilities. The authors concluded that the genetic GI–g correlation could be almost entirely explained by the area–g correlation. It has been argued previously that focusing on whole-brain GI may miss important neuroanatomical specificity; however, our findings suggest that Docherty et al.’s conclusion holds for both local and global gyrification.
The P-FIT is a popular hypothesis concerning which brain regions matter most for human cognition (Jung and Haier 2007). The P-FIT was initially proposed to explain activation patterns observed during functional MRI experiments, but has since been extended to aspects of brain structure. Previous studies have suggested that the association between gyrification and cognitive abilities may be stronger in P-FIT regions than in the rest of the brain (Green et al. 2018; Gregory et al. 2016). However, when we tested this hypothesis, we found evidence to the contrary. Since the neuroanatomical patterns of phenotypic and genetic LGI–g correlations were consistent across GOBS and HCP, this unexpected finding was unlikely to reflect a lack of specificity, as would occur if LGI–g correlations were distributed randomly over the cortex. Instead, while LGI–g correlations exhibited a characteristic neuroanatomical pattern, this pattern did not match the P-FIT. A potential limitation of the present study in this regard is that there is no widely accepted method of matching Brodmann areas (used to define P-FIT regions) to surface-based ROIs (used to group vertices). Therefore, one could argue that our selection of P-FIT regions was incorrect. Although our selection was based on that of a previous study (Green et al. 2018), we nevertheless repeated our analysis with several different selections of P-FIT regions, and the results remained the same. Importantly, although we argue that the P-FIT is not a good model for the association between gyrification (a purely structural aspect of cortical organization) and cognitive abilities, our results should not be used to criticize the P-FIT as a hypothesis of the brain’s functional organization, because function does not necessarily follow structure.
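One simple way to ask whether LGI–g correlations are stronger inside P-FIT regions than elsewhere is a label-permutation test on vertex-wise correlations. This is a hedged sketch and not necessarily the test used in the study: the correlations and the P-FIT mask are simulated, `pfit_permutation_test` is a hypothetical name, and shuffling labels vertex by vertex ignores spatial autocorrelation, so a spatially constrained null (for example, a spin test) would be more appropriate for real cortical maps.

```python
import numpy as np


def pfit_permutation_test(r, in_pfit, n_perm=5000, seed=0):
    """One-sided test of whether mean r is larger inside P-FIT than outside.

    Returns (observed difference in means, permutation p-value)."""
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    in_pfit = np.asarray(in_pfit, dtype=bool)
    observed = r[in_pfit].mean() - r[~in_pfit].mean()
    count = 0
    for _ in range(n_perm):
        labels = rng.permutation(in_pfit)  # shuffle region membership
        diff = r[labels].mean() - r[~labels].mean()
        if diff >= observed:
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    p = (count + 1) / (n_perm + 1)
    return observed, p


# Simulated vertex-wise correlations in which P-FIT vertices are slightly
# *weaker*, mimicking a result contrary to the hypothesis.
rng = np.random.default_rng(1)
n_vertices = 2000
in_pfit = rng.random(n_vertices) < 0.3
r = rng.normal(0.06, 0.03, n_vertices) - 0.02 * in_pfit

diff, p = pfit_permutation_test(r, in_pfit, n_perm=2000)
print(f"mean r (P-FIT minus rest) = {diff:.4f}, one-sided p = {p:.3f}")
```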
Most of our results were consistent across samples. However, estimates of heritability and genetic correlations were generally weaker in GOBS than in HCP. Notably, some genetic LGI–g correlations were strong enough to surpass the FDR-corrected significance threshold in HCP but not in GOBS. Such differences could be related to study design. One limitation of all family studies is that polygenic effects are susceptible to inflation due to shared environmental factors, which would cause overestimation of both heritability and genetic correlations. It could be argued that extended-pedigree studies, such as GOBS, are less susceptible to this kind of inflation than twin studies, such as HCP, because distantly related individuals usually share fewer environmental factors than twins do (Almasy and Blangero 2010). However, this reduction in inflation comes at the expense of reduced power to detect polygenic effects, which could also explain the lack of significant genetic correlations in GOBS. It is unlikely that differences in results between samples were caused by differences in scanner or scanning protocol (Han et al. 2006). Furthermore, while GOBS and HCP participants completed different cognitive batteries, both batteries were comprehensive in terms of the cognitive abilities measured, ensuring that g indexed a similar construct in both samples.
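The inflation argument can be made concrete with a deliberately simplified model. Under a standard additive model, the expected trait correlation of a relative pair is 2 × kinship × h² plus any variance due to environment the pair shares. The sketch below applies a naive estimator that attributes all resemblance to genes; real pedigree and twin analyses fit all pairs jointly and can model shared environment, so this is only an illustration. The true values, the assumption that cousins share no household environment, and the helper names are all hypothetical.

```python
def expected_pair_corr(kinship: float, h2: float, c2_shared: float) -> float:
    """Expected trait correlation for a pair of relatives under a simple
    additive model: 2 * kinship * h2 from shared genes, plus the fraction
    of variance (c2_shared) due to environment the pair has in common."""
    return 2 * kinship * h2 + c2_shared


def naive_h2(pair_corr: float, kinship: float) -> float:
    """Heritability estimate that attributes all resemblance to genes."""
    return pair_corr / (2 * kinship)


TRUE_H2 = 0.50  # hypothetical true heritability
TRUE_C2 = 0.10  # hypothetical shared-environment variance for co-reared twins

# MZ twins raised together: kinship 0.5, and they share a household.
h2_twins = naive_h2(expected_pair_corr(0.5, TRUE_H2, TRUE_C2), 0.5)
# First cousins: kinship 1/16, assumed here to share no household environment.
h2_cousins = naive_h2(expected_pair_corr(0.0625, TRUE_H2, 0.0), 0.0625)

print(f"twins:   naive h2 = {h2_twins:.2f}")
print(f"cousins: naive h2 = {h2_cousins:.2f}")
```

In this toy setting the twin-based estimate absorbs the shared environment (0.60 instead of 0.50), whereas the cousin-based estimate does not. The expected cousin correlation, however, is only about 0.06, so many more pairs are needed to estimate it precisely, which mirrors the loss of power described above.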
With the recent emergence of large, open-access data sets and international consortia, neuroimaging and genetics studies have entered a new era characterized by samples comprising many thousands of participants. In such large studies, trivial effects may be labeled as statistically significant. This observation is not new (Berkson 1938), and numerous solutions have been proposed, such as adopting more stringent significance criteria (Benjamin et al. 2018), scaling criteria by sample size (Mudge et al. 2012), testing interval-null rather than point-null hypotheses (Morey and Rouder 2011), and, most radically, abandoning the notion of statistical significance altogether (McShane et al. 2019). One could argue that these solutions suffer from their own drawbacks and are unlikely to be adopted by the scientific mainstream in the near future. Therefore, in the meantime, we believe that it is imperative to judge, at least qualitatively, whether the sizes of statistically significant effects are large enough to justify one’s conclusions, particularly when these conclusions may have broad, overarching implications. This idea is not new either (Kelley and Preacher 2012), but it deserves to be restated. Based on the results of the present study, we are inclined to believe that gyrification explains only a small fraction of the variation in cognitive abilities and therefore has limited implications for our understanding of the neurobiology of human intelligence.
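The point about sample size can be made concrete with a short calculation. The sketch below uses the Fisher z approximation (rather than the exact t test) and hypothetical sample sizes to show the p-value of a fixed correlation of 0.10 shrinking as n grows, while the variance it explains stays at 1%. `corr_p_value` is a hypothetical helper name.

```python
import math


def corr_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, using the Fisher z transform
    (normal approximation with standard error 1 / sqrt(n - 3))."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


r = 0.10  # a weak correlation, similar in size to those discussed above
for n in (100, 1_000, 10_000):  # hypothetical sample sizes
    print(f"n = {n:>6}: p = {corr_p_value(r, n):.1e}, "
          f"variance explained = {r**2:.1%}")
```

The same effect moves from clearly nonsignificant to overwhelmingly significant without becoming any more informative about g, which is why effect sizes, not p-values alone, should carry the interpretation.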