Monday, December 7, 2020

Teachers with some factual knowledge of the brain more readily believed neuromyths; those who endorsed neuromyths were generally more confident in their answers than those who identified the myths

Why do teachers believe educational neuromyths? Brenda Hughes, Karen A. Sullivan, Linda Gilmore. Trends in Neuroscience and Education, Volume 21, December 2020, 100145. https://doi.org/10.1016/j.tine.2020.100145

Rolf Degen's take: https://twitter.com/DegenRolf/status/1335927859332984833

Abstract

Background: It is not well understood whether qualified teachers believe neuromyths, and whether this affects their practice and learner outcomes.

Method: A standardised survey was administered to practising teachers (N = 228) to determine whether or not they believe fictional (neuromyth) or factual statements about the brain, the confidence in those beliefs, and their application.

Results: Although factual knowledge was high, seven neuromyths were believed by >50% of the sample. Participants who endorsed neuromyths were generally more confident in their answers than those who identified the myths. Key neuromyths appear to be incorporated into classrooms.

Conclusion: Australian teachers, like their overseas counterparts, have some neuroscience awareness but are susceptible to neuromyths. A stronger partnership with neuroscientists would address the complex problem of disentangling brain facts from fictions, and provide better support for teachers. This study uncovered psychometric weaknesses in the commonly used neuromyth measure that future research should address.

Keywords: Neuroscience, Education, Teaching, Learning, Students, Brain, Neuromyths


Higher income is more consistently linked to how frequently individuals experience happiness than how intensely happy each episode is; in part because lower-income individuals spend more time engaged in passive leisure activities, reducing the frequency of positive affect

Income More Reliably Predicts Frequent Than Intense Happiness. Jon M. Jachimowicz et al. Social Psychological and Personality Science, December 7, 2020. https://doi.org/10.1177/1948550620972548

Abstract: There is widespread consensus that income and subjective well-being are linked, but when and why they are connected is subject to ongoing debate. We draw on prior research that distinguishes between the frequency and intensity of happiness to suggest that higher income is more consistently linked to how frequently individuals experience happiness than how intensely happy each episode is. This occurs in part because lower-income individuals spend more time engaged in passive leisure activities, reducing the frequency but not the intensity of positive affect. Notably, we demonstrate that only happiness frequency underlies the relationship between income and life satisfaction. Data from an experience sampling study (N = 394 participants, 34,958 daily responses), a preregistered cross-sectional study (N = 1,553), and a day reconstruction study (N = 13,437) provide empirical evidence for these ideas. Together, this research provides conceptual and empirical clarity into how income is related to happiness.

Keywords: money, income, happiness, life satisfaction, time use


Recent research involving birds, ‘enculturated’ chimpanzees, and humans suggests that the cognitive mechanisms that make imitation possible are constructed during development through social interaction

Heyes, Cecilia. 2020. “Imitation Primer.” PsyArXiv. December 7. doi:10.31234/osf.io/tn34f

Abstract: In this Primer, Cecilia Heyes explains why imitation is thought to be a mark of cognitive complexity and an inheritance mechanism for cumulative culture. Recent research involving birds, ‘enculturated’ chimpanzees, and humans suggests that the cognitive mechanisms that make imitation possible are constructed during development through social interaction.


Many conservatives reject both gender equality & evolution of sex differences, embracing instead “naturally occurring” gender differences; many liberals reject evolved gender differences & naturally occurring gender differences, while nonetheless strongly endorsing evolution

Lewandowsky, S., Woike, J. K., & Oberauer, K. (2020). Genesis or Evolution of Gender Differences? Worldview-Based Dilemmas in The Processing of Scientific Information. Journal of Cognition, 3(1), 9, Apr 30 2020. DOI: http://doi.org/10.5334/joc.99

Rolf Degen's take: https://twitter.com/DegenRolf/status/1335878463190790145

Abstract: Some issues that have been settled by the scientific community, such as evolution, the effectiveness of vaccinations, and the role of CO2 emissions in climate change, continue to be rejected by segments of the public. This rejection is typically driven by people’s worldviews, and to date most research has found that conservatives are uniformly more likely to reject scientific findings than liberals across a number of domains. We report a large (N > 1,000) preregistered study that addresses two questions: First, can we find science denial on the left? Endorsement of pseudoscientific complementary and alternative medicines (CAM) has been anecdotally cited as being more consonant with liberals than conservatives. Against this claim, we found more support for CAM among conservatives than liberals. Second, we asked how liberals and conservatives resolve dilemmas in which an issue triggers two opposing facets of their worldviews. We probed attitudes on gender equality and the evolution of sex differences—two constructs that may create conflicts for liberals (who endorse evolution but also equality) and conservatives (who endorse gender differences but are sceptical of evolution). We find that many conservatives reject both gender equality and evolution of sex differences, and instead embrace “naturally occurring” gender differences. Many liberals, by contrast, reject evolved gender differences, as well as naturally occurring gender differences, while nonetheless strongly endorsing evolution.

Keywords: Emotion and cognition, Social cognition, Reasoning


Discussion

Relationship to previous results

Our results coordinate well with multiple precedents in the literature, which we take up for each of the constructs examined. Considering first religiosity, we replicated the substantial association between stronger religious beliefs and conservatism in the American population (Malka et al., 2012; Schlenker, Chambers, & Le, 2012). In our study this association generalized across a broadly-defined socio-political conservatism construct as well as a specific construct targeting endorsement of laissez-faire free-market economics. We also replicated the long-standing strong negative association between religiosity and acceptance of evolution (e.g., Ecklund, Scheitle, Peifer, & Bolger, 2017; Tom, 2018) and the modest negative association between religiosity and analytic thinking (i.e., CRT performance) reported previously (Jack, Friedman, Boyatzis, & Taylor, 2016; Shenhav, Rand, & Greene, 2012; Stagnaro, Ross, Pennycook, & Rand, 2019). Likewise, the correlations between religiosity and the gender constructs (e.g., Table 5) are consistent with previous reports that religiosity predicts sexism (Van Assche et al., 2019). Our results go beyond previous findings because our scales did not probe discriminatory sexism but the origin of presumed gender differences. We find that religiosity makes it less likely that people believe that gender differences have evolved.

The negative association between religiosity and CAM rejection is also unsurprising in light of previous research that has shown acceptance of CAM to be driven by intuitive thinking, paranormal beliefs, and ontological confusions (Lindeman, 2011). At least one of those variables (intuitive thinking) is also known to be a predictor of religiosity (e.g., Shenhav et al., 2012). The positive correlation between CAM rejection and acceptance of vaccinations replicates much previous research (e.g., Attwell, Ward, Meyer, Rokkas, & Leask, 2018; Browne, Thomson, Rockloff, & Pennycook, 2015; Bryden, Browne, Rockloff, & Unsworth, 2018; Ernst, 2002).

However, our findings concerning religiosity also deviate from aspects of other recent research (Rutjens, Sutton, & van der Lee, 2018). Unlike Rutjens et al., we found no evidence of a link between religiosity and rejection of vaccinations. Given that Rutjens et al. observed this link only in some of their studies and only for some measures of religiosity (mainly measures of religious orthodoxy), we are not concerned about this apparent departure from previous results. Indeed, in another recent as-yet unpublished study involving identical constructs, we did observe a negative association between vaccination and religiosity, suggesting that this relationship may well be real but is only observable in certain circumstances.

Turning to the associations involving CRT performance, the observed modest but significant negative correlation with religiosity replicates previous results (Gervais & Norenzayan, 2012; Shenhav et al., 2012; Stagnaro et al., 2019). Jost (2017) reported a meta-analysis of 13 studies that related CRT performance to political views. The vast majority of those studies showed that liberals exhibited more cognitive reflection than conservatives. In the present data, this is echoed by the modest negative correlation with free market, although it was not reflected in the socio-political conservatism measure. The positive associations of the CRT with endorsement of all three scientific constructs, vaccination, CAM rejection, and evolution replicate similar previous findings (Shtulman & McCallum, 2014; Wagner-Egger et al., 2018). The association also coordinates well with recent findings that analytical thinking is associated with better differentiation between “fake news” and valid information (Pennycook & Rand, 2018).

Rejection of science on the political left?

Our findings provide little or no evidence that people on the political left reject vaccinations. On the contrary, to the extent that worldviews determined vaccination attitudes, it was free-market endorsement that predicted rejection. This result parallels a similar association observed by Hornsey, Harris, and Fielding (2018), albeit using a different instrument to measure Libertarian attitudes (hierarchical-individualism as opposed to free-market endorsement). The result is also consonant with the notion that libertarians object to the government intrusion arising from mandatory vaccination programs (Kahan et al., 2010). It also meshes well with the pattern observed by Lewandowsky, Gignac, and Oberauer (2013), who showed that when socio-political conservatism was removed from a model, free-market endorsement on its own predicted rejection of vaccinations (whereas the converse was not true). Overall, our results thus converge with other recent findings that have found an association between right-wing politics and rejection of vaccinations (Baumgaertner, Carlisle, & Justwan, 2018; Kahan et al., 2010; Rabinowitz et al., 2016). In a recent cross-sectional analysis of voting behavior and vaccination rates across European countries, Kennedy (2019) found a strong relationship between the vote share for populist parties and vaccine hesitancy.

Similarly, contrary to reports that CAM use and left-wing ideas have a natural affinity for each other (see, e.g., Keshet, 2009), we found that CAM rejection was negatively, but modestly, associated with the All conservatism factor that subsumed all three of our worldview constructs; namely, religiosity, free market endorsement, and socio-political conservatism. Moreover, in our data, none of the gender constructs were associated with CAM attitudes. This runs counter to the idea that CAM use is “feminist” (Scott, 1998). To our knowledge, our results constitute the first empirical examination of the links between political views and CAM attitudes. Our result that conservatives are more likely to embrace CAM is consonant with historical analyses that have found strong links between right-wing organizations, such as the John Birch Society in the U.S., and endorsement of “alternative” cancer treatments (Markle, Petersen, & Wagenfeld, 1978). The present result adds to the list of failed attempts to discover science denial on the political left (e.g., Hamilton, 2011, 2015; Hamilton, Hartter, & Saito, 2015; Hamilton, Hartter, Lemcke-Stampone, et al., 2015; Kahan et al., 2010; Lewandowsky, Gignac, & Oberauer, 2013; Tom, 2018).

Attitudes towards gender differences

We observed an intriguing interplay of the attitudes towards general Darwinian evolution, gender differences, and how those gender differences might have arisen. At a coarse level of analysis, we observed three unsurprising associations: The idea that men and women differ naturally was highly correlated with the idea that they evolved differently, but was negatively correlated with the construct that proclaimed gender equality. The equality construct was also negatively correlated with the idea that men and women evolved differently, although that correlation was smaller than for natural differences.

At a more detailed level of analysis, several intriguing associations emerged. First, acceptance of general Darwinian evolution was positively associated with two seemingly conflicting constructs; namely, that men and women evolved differently and that they are the same. Moreover, evolution was negatively correlated with the idea that men and women are naturally different, even though evolution is one way in which such “natural” differences might have emerged. A similarly nuanced pattern obtained when the worldview constructs were used to predict gender attitudes. Although the over-arching All conservatism factor functioned as expected, with negative weights for gender equality and positive weights for the two constructs insisting on gender differences, there was an additional selective effect of religiosity on the rejection of evolved gender differences.

Further analysis revealed that the involvement of evolution, either on its own or in explaining gender differences, served as a “wedge issue” that disrupted otherwise straightforward associations between right-wing politics and opposition to gender equality (and, vice versa, rejection of gender differences and left-wing politics) and—as foreshadowed in Figure 1—created dilemmas for participants of all political persuasions. As noted in connection with Figure 6, conservatives who strongly rejected Darwinian evolution resolved their dilemma by endorsing “natural” gender differences while rejecting evolved gender differences. Those participants were thus willing to forego endorsement of gender differences to maintain consistency with their opposition to evolution. Conversely, liberals who are strongly committed to gender equality tended to reject the idea of evolved gender differences, even though those participants were demonstrably committed to accepting evolution. Those participants were thus willing to forego endorsement of a specific manifestation of evolution to maintain consistency with their commitment to equality. Thus, partisans of either stripe can agree in their rejection of the idea that men and women evolved differently, but they do so for entirely different reasons. Conservatives do so when they are committed to reject evolution, and liberals do so when they are committed to gender equality. Both groups therefore resolve the dilemmas posed by our gender constructs by “sacrificing” endorsement of evolved gender differences.

Conclusion

Our results contribute to two seemingly conflicting streams of outcomes in the literature on how worldviews moderate people’s responses to scientific issues. On the one hand, there is much evidence for pervasive attitudinal asymmetry, at least in the United States, with conservatives being more likely to reject well-established scientific propositions than liberals. To date, little or no evidence for left-wing science denial has been reported. We add to this stream by showing that, contrary to previous largely anecdotal reports, liberals are more likely to reject complementary and alternative medicines, in line with the scientific evidence, than conservatives.

On the other hand, there is considerable evidence that liberals and conservatives process scientific data in a symmetrical fashion. That is, liberals and conservatives alike resort to the same cognitive shortcuts when data conform to their biases, giving rise to a symmetric set of errors (Kahan, Peters, Dawson, & Slovic, 2017; Washburn & Skitka, 2018). We also add to this stream of research by showing that, when confronted by worldview-triggered dilemmas, both liberals and conservatives resolve those dilemmas in an equally “rational” fashion, by selectively “sacrificing” endorsement of a specific construct about gender differences. Liberals, who generally endorse evolution, believe that for some reason it did not affect differences between the sexes; this could be rationalized perhaps by assuming that evolution causes differences only between but not within species. Conservatives, who frequently reject evolution, believe that men and women differ naturally without having evolved differently; this could be rationalized by assuming, for instance, that those natural differences were the result of divine intervention.

A final contribution of our study is that it points to the advantages of a more nuanced analysis of political worldviews, beyond a convenient but simplistic classification of people into left and right, or liberals and conservatives. While this classification is sufficient to explain some scientific attitudes—for example, it matters little how one measures political worldviews to explain rejection of climate science (e.g., Hornsey, Harris, Bain, & Fielding, 2016; Kahan, 2015)—there are other circumstances in which a more nuanced differentiation between different aspects of worldviews provides considerably greater explanatory power.

Sunday, December 6, 2020

No negative Flynn effect in France: Why variations of intelligence should not be assessed using tests based on cultural knowledge

No negative Flynn effect in France: Why variations of intelligence should not be assessed using tests based on cultural knowledge. Corentin Gonthier, Jacques Grégoire, Maud Besançon. Intelligence, Volume 84, January–February 2021, 101512. https://doi.org/10.1016/j.intell.2020.101512


Highlights

• We tested the claim that intelligence decreases in France (negative Flynn effect).

• We re-analyzed princeps data (Dutton & Lynn, 2015) and collected a new sample.

• Performance only decreases on tests involving declarative knowledge, not reasoning.

• This is attributable to measurement bias for older items, due to cultural changes.

• There is fluctuation of knowledge, but no overall negative Flynn effect in France.

Abstract: In 2015, Dutton and Lynn published an account of a decrease of intelligence in France (negative Flynn effect) which had considerable societal impact. This decline was argued to be biological. However, there is good reason to be skeptical of these conclusions. The claim of intelligence decline was based on the finding of lower scores on the WAIS-III (normed in 1999) for a recent sample, but careful examination of the data suggests that this decline was in fact limited to subtests with a strong influence of culture-dependent declarative knowledge. In Study 1, we re-analyzed the data used by Dutton and Lynn (2015) and showed that only subtests of the WAIS primarily assessing cultural knowledge (Gc) demonstrated a significant decline. Study 2 replicated this finding and confirmed that performance was constant on other subtests. An analysis of differential item functioning in the five subtests with a decline showed that about one fourth of all items were significantly more difficult for subjects in a recent sample than in the original normative sample, for an equal level of ability. Decline on a subtest correlated 0.95 with its cultural load. These results confirm that there is currently no evidence for a decrease of intelligence in France, with prior findings being attributable to a drift of item difficulty for older versions of the WAIS, due to cultural changes. This highlights the role of culture in Wechsler's intelligence tests and indicates that when interpreting (negative) Flynn effects, the past should really be treated as a different country.

Keywords: Flynn effect, Negative Flynn effect, Fluid intelligence, Crystallized intelligence, Differential item functioning (DIF)

5. General discussion

The results of both Study 1 and Study 2 unambiguously indicated that there was no negative Flynn effect in France, in the sense of a general decrease of intelligence or a decrease in the ability to perform logical reasoning: there were no reliable differences between WAIS-III and WAIS-IV for any of the subtests reflecting visuo-spatial reasoning (Gf and Gv), or working memory and processing speed (Gsm and Gs), and which were based on abstract materials. We did find lower total performance on the WAIS-III for a recent sample, but contrary to the classic Flynn effect, this difference between cohorts was exclusively driven by the five subtests involving Gc - acquired declarative knowledge tied to a specific cultural setting.

When considered under the angle of item content, it appeared that this decrease on subtests involving declarative knowledge largely reflected, not an actual decrease of ability, but measurement bias due to differences of item difficulty for samples collected at different dates. All in all, in the five subtests demonstrating a decline, about one fourth of items were comparatively more difficult for the 2019 sample than for the 1999 sample for an equal level of ability. These differences could be traced down to a few specific skills. All but one of the Information items that were biased against a recent sample related to the names of famous people, and biased Comprehension items were all related to civic education; interestingly, the test publisher decided to practically eliminate both topics from the WAIS-IV. All but one of the biased Arithmetic items required computing mental division or proportions. For Vocabulary, the negative net effect of bias was partly compensated by the fact that some words were easier in the recent sample, more consistent with a change in language frequency patterns than with an absolute decrease in vocabulary skills. In all cases, these increases in item difficulty for a recent sample could be attributed to environmental changes in school programs, topics covered by the media, and other societal evolutions.

The fact that the performance decrease on a subtest correlated at 0.95 with its cultural load confirms this conclusion and runs counter to the interpretation that the observed decline is caused by biological factors (Woodley of Menie & Dunkel, 2015). This does not completely rule out biological factors, as cultural loads are not pure indicators of cultural influences: a possible alternative interpretation, as suggested by Edward Dutton and Woodley of Menie, is that a genetic decrease in fluid reasoning could negatively affect the culture of a country, in turn reverberating on Gc subtests (see Dutton et al., 2017; this is a variant of investment theory and of explanations assuming genotype-environment covariance; e.g. Kan et al., 2013). However, this idea would be almost impossible to falsify, and it would be difficult to reconcile with the facts that the correlation with heritability was non-significant and that there was no decline at all for the Gf and Gv subtests, which tend to have high heritability (e.g. Kan et al., 2013; Rijsdijk, Vernon, & Boomsma, 2002; van Leeuwen, van den Berg, & Boomsma, 2008), and which would be expected to decrease before effects on Gc could be observable. There is also a lack of plausible biological mechanisms that could create such a large decline in the dataset in such a short timeframe. All this converges to clearly suggest a role of cultural changes as the most parsimonious interpretation of the data.

In short, the conclusion that can be drawn from a comparison of WAIS-III and WAIS-IV is that over the last two decades, there has been no decline of reasoning abilities in the French population, but there has been an average decrease in a limited range of cultural knowledge (essentially related to using infrequent vocabulary words, knowing the names of famous people, discussing civic education and performing mental division), which biases performance on older items. In other words, the data do indicate a lower average performance on the WAIS-III in the more recent sample, in line with Dutton and Lynn (2015) results, but a more fine-grained analysis contradicts their interpretation of a general decrease of intelligence in France. In the terms of a hierarchical model of intelligence (Wicherts, 2007), there appears to be no decrease in latent ability at the first level of g; there is a decrease at the second level of broad abilities, but only for Gc; and this decrease seems essentially due to cultural changes creating measurement bias at the fourth level composed of performance for specific items.

This pattern is entirely distinct from the Flynn effect, which represents an increase in general intelligence, and especially in Gf performance, accompanied by much smaller changes on Gc (Pietschnig & Voracek, 2015). Hence it is our conviction that this pattern reflects substantially different mechanisms and cannot reasonably be labeled a “negative Flynn effect”, without extending the definition of the Flynn effect to the point where any difference between cohorts could be called a “Flynn effect” and where it would no longer be useful as a heuristic concept. This point is compounded by the fact that the difference reflected item-related measurement bias, rather than an actual change of ability. To quote Flynn (2009a): “Are IQ gains ‘cultural bias’? We must distinguish between cultural trends that render neutral content more familiar and cultural trends that really raise the level of cognitive skills. If the spread of the scientific ethos has made people capable of using logic to attack a wider range of problems, that is a real gain in cognitive skills. If no one has taken the trouble to update the words on a vocabulary test to eliminate those that have gone out of everyday usage, then an apparent score loss is ersatz.” The current pattern is clearly ersatz: “ersatz effect” may be a better name than “negative Flynn effect”.

There are two possible interpretations to the ersatz difference observed here. On one hand, this decline could be restricted to areas covered by the WAIS-III, and could be compensated by increases in other areas: in other words, the 2019 sample may possess different knowledge, but not less knowledge than the 1999 sample. On the other hand, this might represent a real decline and a cause for concern: results of the large-scale PISA surveys (performed on about 7,000 pupils) routinely point to significant inequalities in the academic skills of French pupils, and their average level of mathematics performance has declined since the early 2000s (e.g. OECD, 2019). It is impossible to adjudicate between these two possibilities (which would require having the 1999 sample perform the WAIS-IV), but even if there were an actual decrease in average knowledge, this conclusion would be significantly less bleak than the picture of a biologically-driven intelligence decrease painted by Dutton and Lynn (2015), and would highlight possible shortfalls of the French educational system (see also Blair, Gamson, Thorne, & Baker, 2005) rather than the downward trajectory of a population becoming less and less intelligent.

This conclusion is in line with a tradition of studies attributing fluctuations of intelligence scores to methodological biases, especially as they relate to [cultural] item content (e.g. Beaujean & Osterlind, 2008; Beaujean & Sheng, 2010; Kaufman, 2010; Nugent, 2006; Pietschnig et al., 2013; Rodgers, 1998; Weiss et al., 2016). As an example, Flieller (1988) reached the same conclusion in a French dataset over three decades ago; Brand et al. (1989) also found a similar result of decreasing scores due to changes in item difficulty, which they illustrated with an understandable decline of the proportion of correct answers for the item “What is a belfry?” between 1961 and 1984. This conclusion is also in line with studies arguing for the role of cultural environment and culture-based knowledge in Flynn-like fluctuations of intelligence over time (e.g. Bratsberg & Rogeberg, 2018). Note that drifts of item difficulty are only one aspect of such cultural changes; changes in test-taking behavior, such as increased guessing, are another example (e.g. Must & Must, 2013; Pietschnig & Voracek, 2013).

Beyond the specific case of average intelligence in France, the current results constitute a reminder that intelligence scores are not pure reflections of intelligence and have multiple determinants, some of which can be affected by cultural factors that do not reflect intelligence itself. Put otherwise, this is an illustration of the principle that performance can differ between groups of subjects without representing a true difference of ability (Beaujean & Osterlind, 2008; Beaujean & Sheng, 2010). This is a well-known bias of cross-country comparisons, where test performance can be markedly lower in a culture for which the test was not designed (e.g. Cockcroft, Alloway, Copello, & Milligan, 2015; Greenfield, 1997; Van de Vijver, 2016). In other words, this principle generalizes to all comparisons between samples, not just intelligence fluctuations over time: investigators should be skeptical of the origin of between-group differences whenever cultural content is involved. This also applies to clinical psychologists using intelligence tests to compare patients from specific cultural groups to a (culturally different) normative sample.

Seven major recommendations for cross-sample comparisons can be derived from the current results:

1) comparisons based on validity samples collected by the publishers of Wechsler scales have to be avoided due to uncertainties about sample composition (as already stressed by Zhu & Tulsky, 1999; the distribution of ages in Study 1 as represented in Fig. 1 constitutes a stark reminder of this fact);

2) comparisons involving multiple subtests should carefully consider which subtests exactly demonstrate differences, and especially which dimension of intelligence they measure (Gf or Gc?);

3) comparisons between different samples should never be performed using different tests with substantial differences of item content, if there is a possibility that the items will be differentially affected by cultural variables extraneous to ability itself (Kaufman, 2010; Weiss et al., 2016);

4) even when the same version of a test involving cultural content is used, differences between samples collected at different dates in the same country should be treated as if the past sample were from a different country, due to the possibility of differential item functioning emerging over time;

5) as a consequence, comparisons between samples should primarily rely on tests that involve as little contribution of culture-based declarative knowledge as possible, such as Raven's matrices (e.g. Flynn, 2009b);

6) when only tests requiring culture-based declarative knowledge are available, differences should necessarily be interpreted taking into account possible measurement bias. The issue of measurement bias can be considered under the prism of IRT as a way to separate item parameters from ability estimates and test for DIF, and/or using multigroup confirmatory factor analyses as a way to more accurately specify at which level of a hierarchical model of intelligence samples actually differ (Wicherts et al., 2004);

7) lastly, and as exemplified by the pattern of correlations between performance decline, heritability and g-loadings, and cultural load, no conclusions about the biological origin of between-group differences in test scores can be drawn without also testing the role of cultural factors.
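
Recommendation 6 can be made concrete with a small sketch of differential item functioning (DIF) under a two-parameter logistic (2PL) IRT model. The 2PL form is a standard choice, but it is an assumption here and not necessarily the model the authors fitted; the function name and every parameter value below are hypothetical and chosen only for illustration.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function: probability of answering the item correctly.

    theta: latent ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Two test-takers with the SAME latent ability (hypothetical values).
theta = 0.0
a = 1.2            # same discrimination in both samples
b_reference = 0.0  # item difficulty in the reference (e.g. older) sample
b_focal = 1.0      # the same item is harder in the focal (e.g. recent) sample

p_reference = p_correct(theta, a, b_reference)
p_focal = p_correct(theta, a, b_focal)

# With ability held constant, any gap comes from the item, not the person.
dif_gap = p_reference - p_focal
print(f"P(correct), reference sample: {p_reference:.3f}")
print(f"P(correct), focal sample:     {p_focal:.3f}")
print(f"Gap attributable to the item: {dif_gap:.3f}")
```

Because ability is identical in both cases, the difference in success probability reflects a shift in the item's difficulty parameter, which is the kind of measurement bias a DIF test is meant to detect before raw score differences are interpreted as ability differences.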

Only 54pct of newspapers that published erroneous research findings published the retraction; the retraction stories were balanced, but shorter than those on the article’s publication and often lacking in context & detail

Dissemination of Erroneous Research Findings and Subsequent Retraction in High-Circulation Newspapers: A Case Study of Alleged MDMA-Induced Dopaminergic Neurotoxicity in Primates. Brian S. Barnett & Richard Doblin. Journal of Psychoactive Drugs, Nov 26 2020. https://doi.org/10.1080/02791072.2020.1847365

Rolf Degen's take: https://twitter.com/DegenRolf/status/1335283847613788162

Abstract: Ensuring the public is informed of retractions has proven difficult for the scientific community. While it is possible that newspapers focus differential attention on publication of scientific articles and their subsequent retractions, this topic has received minimal attention from researchers. To learn more, we analyzed newspaper coverage of the high-profile 2002 article Severe dopaminergic neurotoxicity in primates after a common recreational dose regimen of MDMA (“ecstasy”) and its retraction in a case study. We searched the 50 largest American newspapers with available online archives for stories about the article’s publication and retraction. Of the 50 newspapers, 26 (52%) covered the article’s publication and 20 (40%) its retraction. Six of the 50 newspapers (12%) published stories on the article’s retraction without covering its initial publication. Of the 26 newspapers covering the article’s publication, only 14 (54%) covered its retraction. Stories about the retraction were balanced, but shorter than those on the article’s publication and often lacking in context and detail. While the decrease in coverage of the article’s retraction was moderate among the entire sample, the much lower retraction coverage in newspapers that had already covered the article’s publication is concerning and emphasizes the need for increased media coverage of retractions.

KEYWORDS: MDMA, ecstasy, retraction, media, newspaper
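
The percentages in the abstract can be re-derived from the reported counts. The sketch below uses only the numbers given above; the variable names and the rounding helper are illustrative, not taken from the paper.

```python
# Counts reported in the abstract.
total_newspapers = 50
covered_publication = 26
covered_retraction = 20
retraction_only = 6  # covered the retraction but not the original article

# Newspapers that covered both the publication and the retraction.
covered_both = covered_retraction - retraction_only


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


rates = {
    "publication": pct(covered_publication, total_newspapers),
    "retraction": pct(covered_retraction, total_newspapers),
    "retraction_only": pct(retraction_only, total_newspapers),
    # Conditional rate: among papers that ran the original story.
    "retraction_given_publication": pct(covered_both, covered_publication),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

The headline 54% is the conditional rate (14 of the 26 newspapers that covered the publication), whereas the 40% figure is unconditional (20 of all 50 newspapers).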



Lottery winners that keep working vs. retiring: Across samples and nations, participants morally praise needless work

A creative destruction approach to replication: Implicit work and sex morality across cultures. Warren Tierney et al. Journal of Experimental Social Psychology, Volume 93, March 2021, 104060. https://doi.org/10.1016/j.jesp.2020.104060

Rolf Degen's take: https://twitter.com/DegenRolf/status/1335225348900982786

• This “creative destruction” replication initiative added new measures and populations to four original study designs.

• The theory of Implicit Puritanism was competed against seven alternative accounts of work morality.

• A number of original findings replicated across multiple cultures, whereas two were identified as likely false positives.

• The best-fitting model suggests work is intuitively moralized across cultures.

Abstract: How can we maximize what is learned from a replication study? In the creative destruction approach to replication, the original hypothesis is compared not only to the null hypothesis, but also to predictions derived from multiple alternative theoretical accounts of the phenomenon. To this end, new populations and measures are included in the design in addition to the original ones, to help determine which theory best accounts for the results across multiple key outcomes and contexts. The present pre-registered empirical project compared the Implicit Puritanism account of intuitive work and sex morality to theories positing regional, religious, and social class differences; explicit rather than implicit cultural differences in values; self-expression vs. survival values as a key cultural fault line; the general moralization of work; and false positive effects. Contradicting Implicit Puritanism's core theoretical claim of a distinct American work morality, a number of targeted findings replicated across multiple comparison cultures, whereas several failed to replicate in all samples and were identified as likely false positives. No support emerged for theories predicting regional variability and specific individual-differences moderators (religious affiliation, religiosity, and education level). Overall, the results provide evidence that work is intuitively moralized across cultures.


Keywords: Replication, Theory testing, Falsification, Implicit social cognition, Priming, Work values, Culture


9. General discussion

This large-scale creative destruction replication initiative, which involved over eight thousand participants from half a dozen nations, systematically competed theories of culture and work morality against one another. In addition to directly replicating a set of original experimental effects central to the theory of Implicit Puritanism (Poehlman, 2007; Uhlmann et al., 2009; Uhlmann et al., 2011), we included new measures and populations facilitating novel conceptual tests of the predictions of the Explicit American Exceptionalism, general moralization of work, self-expression values, social class, religious differences, and regional folkways accounts of work values.

The observed pattern of experimental and cross-national differences and similarities severely undermines the original theory of Implicit Puritanism. In every instance, the targeted effect either failed to replicate entirely, or unexpectedly replicated in multiple cultures when it had been predicted to emerge only among Americans. Two original effects (specifically, the moderating effect of target age on judgments of needless work, and the influence of implicit salvation primes on work behavior) failed to replicate in all populations examined and are identified as likely false positives (Poehlman, 2007; Uhlmann et al., 2011). In contrast, the main effect of moral praise for a lottery winner who continues to work, and false memories consistent with an implicit link between work and sex morality (Poehlman, 2007; Uhlmann et al., 2009), were robust across cultures (India, the United States, Australia, and United Kingdom). Finally, the effects of an intuitive mindset on moral judgments of needless work replicated across the USA, Australia, and UK samples, but not the India sample. The emergence of a number of key effects across a number of different nations sharply contradicts Implicit Puritanism's core theoretical claim of a unique American work morality.

Rather than leaving a theoretical void in the form of reduced confidence in the original findings and the underlying ideas, these results point in new theoretical directions. Specifically, they provide initial evidence that work behavior elicits strong moral intuitions across cultures, and that the gap between intuitive and deliberative feelings about work could be larger in wealthier societies. Personal religion (e.g., Protestant faith), degree of religiosity, socioeconomic status, and region of the United States (e.g., historically Puritan-Protestant New England) did not moderate any of the observed experimental effects, failing to support the associated accounts of work values. More investigations involving larger samples of countries, especially societies in which survival rather than self-expression values are widely endorsed (Inglehart, 1997; Inglehart & Welzel, 2005), and with varied historic backgrounds and diverse workways (Sanchez-Burks & Lee, 2007) are needed before drawing strong conclusions (Simons, Shoda, & Lindsay, 2017). At the same time, we believe the present investigation highlights the feasibility and generative nature of the creative destruction approach to replication, in identifying the most promising theories to guide further empirical research.

9.1. A Bayesian multiverse analysis

A pre-registered (https://osf.io/pgfm8) Bayesian multiverse analysis examined the consequences of different inclusion criteria, variable operationalizations, and statistical approaches for the replication results (see Haaf, Hoogeveen, Berkhout, Gronau, & Wagenmakers, 2020; Haaf & Rouder, 2017; Rouder, Haaf, Davis-Stober, & Hilgard, 2019). Overall, the results of the Bayesian multiverse are highly consistent with the frequentist analyses reported earlier (see Supplement 9 for a more detailed report). Strong evidence emerged that the tacit inference effect and overall valorization of needless work (regardless of target age or participant mindset) are true positives and, further, present across samples. Although less strongly, the data also support an overall intuitive mindset effect across all samples combined. Finally, strong evidence emerged against the target age and needless work effect, and the salvation prime effect. The latter remained unsupported even in those conditions pre-specified as most favorable for priming effects, specifically controlled laboratory studies and excluding participants suspicious of being influenced or who had failed to complete all the scrambled sentences. The Implicit Puritanism model performed worse than the winning model for all six original effects. The General Moralization of Work and False Positives accounts were the best fitting models overall, depending on the effect in question. The Protestant work ethic was found to positively predict the main effects of needless work (i.e., preference for worker over retiree regardless of target age or participant mindset), but such judgments did not vary across cultures as predicted by the Explicit American Exceptionalism account or any of the other competing theories (see Furnham et al., 1993, and Leong, Huang, & Mak, 2014, for evidence “Protestant” work ethic beliefs are broadly applicable). Empirical estimates converged across the different universes of potential analyses (see Fig. S9–1 in Supplement 9). Effects that were not replicated in the primary analyses were not supported under any specification in the Bayesian multiverse, and replicable effects found evidentiary support across many different specifications.

9.2. False inferences in cross-cultural experiments

The present replication results highlight potential broader challenges for producing robust and reliable cross-cultural experimental research (Milfont & Klein, 2018). We define an x-cultural experiment as a study containing a manipulation (e.g., random assignment to condition A or condition B) and sampling at least two distinct cultural populations (e.g., university students in China and the United States). More broadly than the typical concerns about false positive findings (Open Science Collaboration, 2015; Simmons et al., 2011), such cross-cultural investigations are open to false inferences about patterns of experimental results across different human populations. In addition to the expected condition differences failing to emerge (e.g., salvation prime effect, target age and needless work effect), cross-cultural findings may prove over-robust, in other words emerging in societies where they were theoretically expected not to (e.g., the tacit inferences effect and intuitive work morality effect replicating outside the United States). False inferences could also involve concluding a phenomenon is culturally bounded when it is in fact universal, and mis-estimating the direction or relative magnitude of an effect between two cultures, among other empirical patterns.

At least two major features of an x-cultural experiment increase the chances of drawing such false conclusions, relative to a simple two-condition experiment in a single population. First, x-cultural studies often rely on an interaction between membership in a cultural group and an experimental manipulation as the key statistical test of the hypothesized cultural difference. Between-subjects interaction tests are typically underpowered unless very large samples are recruited (Simonsohn, 2014; Smith, Levine, Lachlan, & Fediuk, 2002). The Open Science Collaboration's Reproducibility Project: Psychology replicated 23 of 49 targeted studies (47%) whose key test was a main or simple effect, and only 8 of 37 studies (22%) when the key test was an interaction. Second, x-cultural experiments typically rely on small convenience samples and attempt to generalize to broader cultures. For example, 100 participants per location might be recruited from universities in New Haven, USA, and Xiamen, China. Since societies are quite heterogeneous (Kitayama et al., 2006; Muthukrishna et al., 2020; Nisbett & Cohen, 1996; Talhelm et al., 2014), this approach may or may not capture central tendencies in the United States and China.
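
Why interaction tests tend to be underpowered can be illustrated with a rough calculation. The sketch below assumes a 2 (condition) x 2 (culture) between-subjects design with equal cell sizes, a known common standard deviation, and a two-sided z-test (normal approximation); the cell size, effect size, and function name are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(effect, se, alpha=0.05):
    """Approximate power of a two-sided z-test for a contrast with known SE."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


n_per_cell = 100  # hypothetical participants per cell
sigma = 1.0       # assumed common standard deviation
d = 0.4           # hypothetical effect present in culture A, absent in culture B

# Simple effect within culture A: difference between two cell means.
se_simple = sigma * sqrt(2 / n_per_cell)

# Interaction: difference of differences across four cells.
# The contrast is (d - 0) = d, but its variance is twice as large.
se_interaction = sigma * sqrt(4 / n_per_cell)

power_simple = power_two_sided(d, se_simple)
power_interaction = power_two_sided(d, se_interaction)

print(f"Power, simple effect in culture A:       {power_simple:.2f}")
print(f"Power, condition x culture interaction:  {power_interaction:.2f}")
```

Under these assumptions the simple effect is detected roughly four times out of five, while the interaction that carries the cultural claim is detected only about half the time, even though the total sample is twice as large.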

In the present replication initiative a number of the experimental condition differences emerged (i.e., tacit inferences effect, intuitive work morality effect, needless work main effect), yet none of the original condition x national culture interactions (Poehlman et al., 2007; Uhlmann et al., 2009; Uhlmann et al., 2011) were obtained again. The Many Labs 2 crowd initiative likewise failed to replicate previously reported interactions between experimental manipulations and cultural populations, even some considered well-established findings (Klein et al., 2018). To guard against such problems, future cross-cultural behavioral research should seek to collect larger and more varied samples. Researchers might form a network of laboratories and crowdsource data collections at multiple sites in each nation (Cuccolo, Irgens, Zlokovich, Grahe, & Edlund, in press; Moshontz et al., 2018), or partner with a survey firm to systematically sample respondents from different regions of the same country, ideally achieving representative sampling.

Different cultural theories predict distinct patterns of empirical results, and some may be more subject to false inferences than others. In a presence-absence pattern, an experimental effect is hypothesized to emerge in one culture, but not in the other. Most of the original Implicit Puritanism studies predicted and found such a pattern, for example an implicit link between work and sex morality among Americans, but not members of other cultures. In a reduced pattern, the effect is in the same direction for both cultures, but diminished in some cultures relative to others (e.g., varying degrees of loss aversion among members of different nations; Arkes, Hirshleifer, Jiang, & Lim, 2010). Finally, in a reversal pattern, the effects of an experimental manipulation are expected to fully reverse between a focal culture and comparison culture. For example, Gelfand et al. (2002) predicted and found that whereas American participants were significantly more disposed to accept positive than negative feedback, Japanese participants exhibited the opposite pattern, accepting more personal responsibility for negative than for positive feedback. We suggest that future theorizing on culture focus on developing such reversal predictions, which rely on better powered crossover interactions, and are less likely to be confounded by measurement challenges than presence-absence patterns or reduced patterns.
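
The claim that reversal predictions are better powered follows from the size of the interaction contrast each pattern implies. The sketch below assumes a 2 x 2 between-subjects design with equal cell sizes, a known common standard deviation, and a two-sided z-test; the per-cell sample size, the effect size, and the choice of a half-sized effect for the reduced pattern are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(effect, se, alpha=0.05):
    """Approximate power of a two-sided z-test for a contrast with known SE."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


n_per_cell = 100  # hypothetical participants per cell
sigma = 1.0       # assumed common standard deviation
d = 0.4           # hypothetical effect in the focal culture

# (effect in focal culture, effect in comparison culture)
patterns = {
    "presence-absence": (d, 0.0),
    "reduced": (d, d / 2),
    "reversal": (d, -d),
}

# The interaction is a difference of differences over four cells.
se_interaction = sigma * sqrt(4 / n_per_cell)

power = {}
for name, (focal, comparison) in patterns.items():
    contrast = focal - comparison
    power[name] = power_two_sided(contrast, se_interaction)
    print(f"{name:17s} contrast = {contrast:.2f}  power = {power[name]:.2f}")
```

With the same sample, a full reversal doubles the interaction contrast relative to a presence-absence pattern, whereas a reduced pattern shrinks it, so the three predictions differ sharply in how detectable they are.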

9.3. The broader utility of the creative destruction approach

The present culture and work morality project is the first of several recent initiatives applying the creative destruction approach to replication to previously published findings from our research group (see Tierney et al., in press, for a review). Adding to the recent deluge of failed replications of experimental behavioral findings (e.g., Klein et al., 2014; Klein et al., 2018; Open Science Collaboration, 2015), none of these replication studies succeeded in reproducing the original patterns of results. However, unlike prior replication initiatives, we were able to obtain positive evidence for alternative theoretical accounts (Supplement 13).

We believe this highlights the general utility of the creative destruction approach to replication, which seeks to combine theory pruning methods from the management literature (Leavitt et al., 2010), with best practices from the open science movement in psychology such as pre-registration (Van't Veer & Giner-Sorolla, 2016; Wagenmakers et al., 2012) to achieve critical tests (Mayo, 2018) of competing intellectual ideas. Unlike traditional replication approaches, in which the original finding is tested against the expectation of null effects, the creative destruction approach seeks to identify the strongest theory currently operating in a given intellectual space.

Of course, not all research topics and original findings are well suited for large-scale competitive theory testing. As discussed at greater length by Tierney et al. (in press), the creative destruction approach is best suited to mature research areas with substantial published evidence, common methodological approaches, and well-developed theories that make precise, bounded predictions distinct from those of other theories. In contrast, traditional replications simply repeating the original method are better suited to confirming or disconfirming potential new breakthrough findings. Scientists should carefully allocate scarce replication resources for maximum impact, leveraging the methods best suited to the situation. It is our hope the present line of research contributes to a Replication 2.0 movement, in which rather than solely probing the reliability of past findings, scientists also focus on replacing them with new and improved accounts of human behavior.