Is the Psychopathic Brain an Artifact of Coding Bias? A Systematic Review. Jarkko Jalava et al. Front. Psychol., April 12 2021. https://doi.org/10.3389/fpsyg.2021.654336
Abstract: Questionable research practices are a well-recognized problem in psychology. Coding bias, or the tendency of review studies to disproportionately cite positive findings from original research, has received comparatively little attention. Coding bias is more likely to occur when original research, such as neuroimaging, includes large numbers of effects, and it is most concerning in applied contexts. We evaluated coding bias in reviews of structural magnetic resonance imaging (sMRI) studies of PCL-R psychopathy. We used PRISMA guidelines to locate all relevant original sMRI studies and reviews. The proportion of null-findings cited in reviews was significantly lower than the proportion reported in original research, indicating coding bias. Coding bias was not affected by publication date or review design. Reviews recommending forensic applications, such as treatment amenability or reduced criminal responsibility, were no more accurate than purely theoretical reviews. Coding bias may have contributed to a perception that structural brain abnormalities in psychopaths are more consistent than they actually are, and by extension that sMRI findings are suitable for forensic application. We discuss possible sources of the pervasive coding bias we observed and provide recommendations to counteract it in review studies. Until coding bias is addressed, we argue that this literature should not inform conclusions about psychopaths' neurobiology, especially in forensic contexts.
Discussion
Neurobiological reviews of PCL-R and PCL:SV psychopathy significantly under-report null-findings in sMRI research, indicating widespread coding bias. The majority (64.18%) of original sMRI findings were nulls, whereas nulls made up only a small minority (8.99%) of the effects cited in the review literature. Reviewers, in other words, preferentially reported data supporting neurobiological models of psychopathy. We found no evidence that the reporting imbalance was due to factors other than bias: systematic, narrative, and targeted reviews all reported disproportionately few nulls (though meta-analyses reported too few effects to evaluate), the pattern was stable across time, and it was not driven by exploratory research or outliers. Notably, reviews calling for forensic application of the data, such as treatment, criminal responsibility, punishment, and crime prediction, were no more accurate than purely theoretical reviews. Applied reviews were, however, more likely than theoretical reviews to conclude that the data supported neurobiological bases of psychopathy. These findings are surprising, as applied reviews in other fields, such as those examining drug safety and efficacy, typically face the highest burden of proof and are thus most likely to emphasize limitations in the data [see e.g., Köhler et al. (2015)].
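To make the size of this gap concrete, the minimal sketch below shows one way a difference in null-finding rates between original studies and reviews could be tested, assuming a simple two-proportion z-test. The counts are placeholders chosen only to reproduce the reported percentages (64.18% and 8.99%); they are not the study's actual tallies, and the test shown is not necessarily the one the authors used.

```python
# A minimal sketch (illustrative only) of testing whether reviews cite a
# smaller share of null-findings than the original sMRI literature reports.
from statsmodels.stats.proportion import proportions_ztest

null_counts = [643, 90]       # hypothetical nulls coded in: original studies, reviews
total_effects = [1002, 1001]  # hypothetical total effects coded in each literature

z_stat, p_value = proportions_ztest(count=null_counts, nobs=total_effects)
print(f"z = {z_stat:.2f}, p = {p_value:.3g}")
# A very small p-value indicates the two null-finding rates differ,
# consistent with the coding bias described above.
```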
Our study is the first to systematically examine coding bias in cognitive neuroscience. Although our findings are limited to structural imaging in psychopathy, they suggest that coding bias should be considered alongside more widely recognized Questionable Research Practices (QRPs) such as p-hacking, reporting bias, publication bias, citation bias, and the file drawer problem. QRPs in original research filter out null-findings at early stages of the research and publication process, while coding and citation bias further distort the state of scientific knowledge by eliminating null-findings from reviews. In addition to coding bias, we found evidence of reporting bias during our review of sMRI studies. Null-findings in the original literature were rarely reported in the study abstracts and were frequently not reported fully in results sections. Nulls often appeared only in data or supplemental tables, and in some cases they had to be inferred by examining ROIs mentioned in the introduction but not in the results section. This illustrates that QRPs are not mutually exclusive, and that the presence of one may signal the presence of another [see e.g., Agnoli et al. (2017)].
The coding bias we observed may have a number of explanations. First, reviewers may have been subject to confirmation bias. Confirmation bias refers to the tendency to weigh evidence that confirms a belief more heavily than evidence that does not (Nickerson, 1998). Reviewers in our study may have assumed neurobiological abnormalities in psychopaths, perhaps on the basis of previous reviews, and looked more carefully for data confirming that assumption. Confirmation bias has been cited as a possible explanation for under-reporting of null-findings in original research (Forstmeier et al., 2017). Our findings suggest that it may also play a role in the review literature, where null-findings would be especially difficult to square with theories presuming group differences [see e.g., Sterling et al. (1995) and Ferguson and Heene (2012)], and reporting bias would make disconfirming (null) findings very hard to locate. Second, reviewers may have been following convention. The earliest review studies did not generally include null-findings, and later reviews may have interpreted this as a precedent to follow. Third, explicit and tacit publication preferences may increase coding bias. Research tracking original studies from grant proposal to publication shows that most null-findings are never written up for publication, and that journals, particularly top-tier journals, show a marked preference for strong positive findings (Franco et al., 2014; Ioannidis et al., 2014). Similarly, review authors may have declined to submit reviews with inconclusive findings. Given the extent of publication bias, journal editors may also have been more likely to reject inconclusive reviews in favor of those summarizing consistent, positive findings.
The coding bias observed in our study has a number of potential effects. Aside from distorting the true state of knowledge about structural brain abnormalities in psychopaths, it may have led at least some researchers and courts to believe that the abnormalities are consistent enough for forensic application. This may have encouraged practitioners to de-emphasize or overlook more reliable, behavioral indicators of criminal responsibility, future dangerousness, and treatment amenability in favor of less reliable predictors, such as brain structure. Neuroprediction of crime has a number of empirical shortcomings, such as unknown measurement error and inadequate outcome variables (Poldrack et al., 2018). Using MRI data to predict crime can thus introduce substantial error into an already imperfect process (e.g., Douglas et al., 2017). Neurobiologically-informed assessments and treatments are even less likely to be effective if the population's neurobiology is fundamentally misunderstood. Given the extent of coding bias in the psychopathy literature, such interventions may in fact be harmful.
More broadly, coding bias may have contributed to reverse inference [see Scarpazza et al. (2018)], whereby reports of brain abnormalities are taken as proof that psychopathy is a legitimate diagnostic category [for an argument such as this, see e.g., Kiehl and Hoffman (2011)].5 Similarly, some researchers have suggested that psychopathy diagnoses could be enhanced by neuroimaging evidence (e.g., Hulbert and Adeli, 2015). Arguments of this sort can divert attention from problems in other aspects of the PCL-R, particularly its psychometric properties. Recently, these critiques have intensified, with authors raising concerns about the reliability of the PCL-R, its utility in forensic contexts (DeMatteo et al., 2020), its factor structure, and its predictive validity (Boduszek and Debowska, 2016). Using neurobiology to validate psychopathy as a diagnostic category is doubly problematic: not only are the presumed brain abnormalities in psychopathy broad and non-specific [for problems in reverse inference, see Poldrack (2011) and Scarpazza et al. (2018)], but, as we have shown here, their consistency appears to be largely overestimated as well.
In light of our findings, we recommend the following: First, the published review literature on sMRI studies of PCL-R and PCL:SV psychopathy should be approached with caution, especially when it is used to influence forensic decisions. Second, we recommend that guidelines for conducting reviews be revised to include explicit guidance on avoiding coding bias. Although the problem of un- and under-reported null-findings is recognized (e.g., Pocock et al., 1987; Hutton and Williamson, 2000), and guidelines for accurate reporting in review literature exist [see Petticrew and Roberts (2008), American Psychological Association (2008), and Moher et al. (2015)], the role of coding bias, by and large, is not. Third, we recommend that reviewers pay careful attention to the a priori likelihood of null-findings in the data they summarize. In our example, both the PCL-R (DeMatteo et al., 2020) and neuroimaging methods (Nugent et al., 2013) have relatively low reliability. The expectation that sMRI research on psychopathy should yield more than 91% positive findings is therefore not realistic (see the sketch following this paragraph) [for more extended discussions relating to fMRI, see Vul et al. (2009) and Vul and Pashler (2017)]. Fourth, we recommend that the production of new data be complemented by closer examination of data already published. Among the 45 reviews we evaluated, we found a single study (Plodowski et al., 2009) that comprehensively reported all nulls in the original literature. Unfortunately, it was also among the least cited reviews, suggesting that accuracy and scientific impact do not necessarily go together. Finally, we recommend that reviewers pay close attention to potential biases in the original literature, such as publication and reporting bias, p-hacking, and the file drawer problem, and take measures to compensate for them. Currently, it appears that reviews largely magnify these biases instead.
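As a rough illustration of the third recommendation, the sketch below combines Spearman's attenuation formula with an approximate power calculation for a correlation test. The true effect size, reliability values, and sample size are assumptions chosen for illustration only; they are not estimates reported in this study, but they show why imperfect measurement should produce a substantial share of honest null results.

```python
# A back-of-the-envelope sketch (our illustration, not the authors' calculation)
# of how low measurement reliability limits the plausible rate of positive findings.
import numpy as np
from scipy.stats import norm

def attenuated_r(r_true, rel_x, rel_y):
    """Spearman's attenuation: observed r shrinks with measurement error."""
    return r_true * np.sqrt(rel_x * rel_y)

def power_correlation(r, n, alpha=0.05):
    """Approximate power of a two-sided test of r != 0 via Fisher's z."""
    z_effect = np.arctanh(r) * np.sqrt(n - 3)
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - z_effect)

r_true = 0.30                    # assumed true brain-psychopathy correlation
rel_pclr, rel_mri = 0.70, 0.80   # assumed reliabilities of PCL-R and sMRI measures
n = 50                           # assumed typical sample size

r_obs = attenuated_r(r_true, rel_pclr, rel_mri)
print(f"attenuated r = {r_obs:.2f}, power = {power_correlation(r_obs, n):.2f}")
# Under these assumptions power is roughly 0.35, so most tests would be
# expected to return nulls rather than the >91% positive rate implied by reviews.
```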
Limitations
Our study has a number of important limitations. First, in order to focus on forensically relevant studies, we limited our analysis to PCL-R and PCL:SV psychopathy. We also excluded studies that reported on PCL-R Factor scores only (e.g., Bertsch et al., 2013), that did not use a case-control or correlational design (Sato et al., 2011; Kolla et al., 2014), or that included youth samples. It is possible that the excluded studies were reported more accurately in the review literature than those we included. Second, we excluded original and review studies not published in English. This may have introduced a selection bias of our own, as it is possible that non-English publications use different standards of reporting and reviewing than those published in English. Third, our findings may have underestimated the extent of the bias. For example, one whole-brain analysis reviewed here (Contreras-Rodríguez et al., 2015) reported only positive findings, which means that the remaining brain regions were unreported nulls. Had these unreported null-findings been included in our analysis, the true percentage of nulls in the original studies would have been greater than 64.18%. Further, we did not account for possible publication bias. Since null-findings are presumed to be less likely than null-rejections to be published, the percentage of true nulls in the field is essentially unknown, though it may be significantly higher than we estimated (the review literature examined here did not report any unpublished null-findings). Finally, we excluded fMRI and other imaging methods entirely. Future research could evaluate whether coding bias is present in reviews of those literatures as well.