A creative destruction approach to replication: Implicit work and sex morality across cultures. Warren Tierney et al. Journal of Experimental Social Psychology, Volume 93, March 2021, 104060. https://doi.org/10.1016/j.jesp.2020.104060
Rolf Degen's take: https://twitter.com/DegenRolf/status/1335225348900982786
• This “creative destruction” replication initiative added new measures and populations to four original study designs.
• The theory of Implicit Puritanism was competed against seven alternative accounts of work morality.
• A number of original findings replicated across multiple cultures, whereas two were identified as likely false positives.
• The best-fitting model suggests work is intuitively moralized across cultures.
Abstract: How can we maximize what is learned from a replication study? In the creative destruction approach to replication, the original hypothesis is compared not only to the null hypothesis, but also to predictions derived from multiple alternative theoretical accounts of the phenomenon. To this end, new populations and measures are included in the design in addition to the original ones, to help determine which theory best accounts for the results across multiple key outcomes and contexts. The present pre-registered empirical project compared the Implicit Puritanism account of intuitive work and sex morality to theories positing regional, religious, and social class differences; explicit rather than implicit cultural differences in values; self-expression vs. survival values as a key cultural fault line; the general moralization of work; and false positive effects. Contradicting Implicit Puritanism's core theoretical claim of a distinct American work morality, a number of targeted findings replicated across multiple comparison cultures, whereas several failed to replicate in all samples and were identified as likely false positives. No support emerged for theories predicting regional variability and specific individual-differences moderators (religious affiliation, religiosity, and education level). Overall, the results provide evidence that work is intuitively moralized across cultures.
Keywords: Replication; Theory testing; Falsification; Implicit social cognition; Priming; Work values; Culture
9. General discussion
This large-scale creative destruction replication initiative, which involved over eight thousand participants from half a dozen nations, systematically competed theories of culture and work morality against one another. In addition to directly replicating a set of original experimental effects central to the theory of Implicit Puritanism (Poehlman, 2007; Uhlmann et al., 2009, Uhlmann et al., 2011), we included new measures and populations facilitating novel conceptual tests of the predictions of the Explicit American Exceptionalism, general moralization of work, self-expression values, social class, religious differences, and regional folkways accounts of work values.
The observed pattern of experimental and cross-national differences and similarities severely undermines the original theory of Implicit Puritanism. In every instance, the targeted effect either failed to replicate entirely, or unexpectedly replicated in multiple cultures when it had been predicted to emerge only among Americans. Two original effects (specifically, the moderating effect of target age on judgments of needless work, and the influence of implicit salvation primes on work behavior) failed to replicate in any of the populations examined and are identified as likely false positives (Poehlman, 2007; Uhlmann et al., 2011). In contrast, the main effect of moral praise for a lottery winner who continues to work, and false memories consistent with an implicit link between work and sex morality (Poehlman, 2007; Uhlmann et al., 2009), were robust across cultures (India, the United States, Australia, and the United Kingdom). Finally, the effects of an intuitive mindset on moral judgments of needless work replicated across the USA, Australia, and UK samples, but not the India sample. The emergence of a number of key effects across a number of different nations sharply contradicts Implicit Puritanism's core theoretical claim of a unique American work morality.
Rather than leaving a theoretical void in the form of reduced confidence in the original findings and the underlying ideas, these results point in new theoretical directions. Specifically, they provide initial evidence that work behavior elicits strong moral intuitions across cultures, and that the gap between intuitive and deliberative feelings about work could be larger in wealthier societies. Personal religion (e.g., Protestant faith), degree of religiosity, socioeconomic status, and region of the United States (e.g., historically Puritan-Protestant New England) did not moderate any of the observed experimental effects, failing to support the associated accounts of work values. More investigations involving larger samples of countries, especially societies in which survival rather than self-expression values are widely endorsed (Inglehart, 1997; Inglehart & Welzel, 2005), and with varied historic backgrounds and diverse workways (Sanchez-Burks & Lee, 2007) are needed before drawing strong conclusions (Simons, Shoda, & Lindsay, 2017). At the same time, we believe the present investigation highlights the feasibility and generative nature of the creative destruction approach to replication, in identifying the most promising theories to guide further empirical research.
9.1. A Bayesian multiverse analysis
A pre-registered (https://osf.io/pgfm8) Bayesian multiverse analysis examined the consequences of different inclusion criteria, variable operationalizations, and statistical approaches for the replication results (see Haaf, Hoogeveen, Berkhout, Gronau, & Wagenmakers, 2020; Haaf & Rouder, 2017; Rouder, Haaf, Davis-Stober, & Hilgard, 2019). Overall, the results of the Bayesian multiverse are highly consistent with the frequentist analyses reported earlier (see Supplement 9 for a more detailed report). Strong evidence emerged that the tacit inference effect and overall valorization of needless work (regardless of target age or participant mindset) are true positives and, further, present across samples. Although less strongly, the data also support an overall intuitive mindset effect across all samples combined. Finally, strong evidence emerged against the target age and needless work effect, and the salvation prime effect. The latter remained unsupported even in those conditions pre-specified as most favorable for priming effects, specifically controlled laboratory studies and excluding participants suspicious of being influenced or who had failed to complete all the scrambled sentences. The Implicit Puritanism model performed worse than the winning model for all six original effects. The General Moralization of Work and False Positives accounts were the best fitting models overall, depending on the effect in question. The Protestant work ethic was found to positively predict the main effects of needless work (i.e., preference for worker over retiree regardless of target age or participant mindset), but such judgments did not vary across cultures as predicted by the Explicit American Exceptionalism account or any of the other competing theories (see Furnham et al., 1993, and Leong, Huang, & Mak, 2014, for evidence that “Protestant” work ethic beliefs are broadly applicable). Empirical estimates converged across the different universes of potential analyses (see Fig. S9–1 in Supplement 9). Effects that were not replicated in the primary analyses were not supported under any specification in the Bayesian multiverse, and replicable effects found evidentiary support across many different specifications.
9.2. False inferences in cross-cultural experiments
The present replication results highlight potential broader challenges for producing robust and reliable cross-cultural experimental research (Milfont & Klein, 2018). We define an x-cultural experiment as a study containing a manipulation (e.g., random assignment to condition A or condition B) and sampling at least two distinct cultural populations (e.g., university students in China and the United States). More broadly than the typical concerns about false positive findings (Open Science Collaboration, 2015; Simmons et al., 2011), such cross-cultural investigations are open to false inferences about patterns of experimental results across different human populations. In addition to the expected condition differences failing to emerge (e.g., salvation prime effect, target age and needless work effect), cross-cultural findings may prove over-robust, in other words emerging in societies where they were theoretically expected not to (e.g., the tacit inferences effect and intuitive work morality effect replicating outside the United States). False inferences could also involve concluding a phenomenon is culturally bounded when it is in fact universal, and mis-estimating the direction or relative magnitude of an effect between two cultures, among other empirical patterns.
At least two major features of an x-cultural experiment increase the chances of drawing such false conclusions, relative to a simple two-condition experiment in a single population. First, x-cultural studies often rely on an interaction between membership in a cultural group and an experimental manipulation as the key statistical test of the hypothesized cultural difference. Between-subjects interaction tests are typically underpowered unless very large samples are recruited (Simonsohn, 2014; Smith, Levine, Lachlan, & Fediuk, 2002). The Open Science Collaboration's Reproducibility Project: Psychology replicated 23 of 49 targeted studies (47%) whose key test was a main or simple effect, and only 8 of 37 studies (22%) when the key test was an interaction. Second, x-cultural experiments typically rely on small convenience samples and attempt to generalize to broader cultures. For example, 100 participants per location might be recruited from universities in New Haven, USA, and Xiamen, China. Since societies are quite heterogeneous (Kitayama et al., 2006; Muthukrishna et al., 2020; Nisbett & Cohen, 1996; Talhelm et al., 2014), this approach may or may not capture central tendencies in the United States and China.
In the present replication initiative a number of the experimental condition differences emerged (i.e., tacit inferences effect, intuitive work morality effect, needless work main effect), yet none of the original condition x national culture interactions (Poehlman et al., 2007; Uhlmann et al., 2009, Uhlmann et al., 2011) were obtained again. The Many Labs 2 crowd initiative likewise failed to replicate previously reported interactions between experimental manipulations and cultural populations, even some considered well-established findings (Klein et al., 2018). To guard against such problems, future cross-cultural behavioral research should seek to collect larger and more varied samples. Researchers might form a network of laboratories and crowdsource data collections at multiple sites in each nation (Cuccolo, Irgens, Zlokovich, Grahe, & Edlund, in press; Moshontz et al., 2018), or partner with a survey firm to systematically sample respondents from different regions of the same country, ideally achieving representative sampling.
Different cultural theories predict distinct patterns of empirical results, and some may be more subject to false inferences than others. In a presence-absence pattern, an experimental effect is hypothesized to emerge in one culture, but not in the other. Most of the original Implicit Puritanism studies predicted and found such a pattern, for example an implicit link between work and sex morality among Americans, but not members of other cultures. In a reduced pattern, the effect is in the same direction for both cultures, but diminished in some cultures relative to others (e.g., varying degrees of loss aversion among members of different nations; Arkes, Hirshleifer, Jiang, & Lim, 2010). Finally, in a reversal pattern, the effects of an experimental manipulation are expected to fully reverse between a focal culture and comparison culture. For example, Gelfand et al. (2002) predicted and found that whereas American participants were significantly more disposed to accept positive than negative feedback, Japanese participants exhibited the opposite pattern, accepting more personal responsibility for negative than for positive feedback. We suggest that future theorizing on culture focus on developing such reversal predictions, which rely on better powered crossover interactions, and are less likely to be confounded by measurement challenges than presence-absence patterns or reduced patterns.
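The power argument behind these three patterns can be illustrated with a rough calculation. The sketch below is not taken from the paper: it assumes a 2 (condition) x 2 (culture) between-subjects design with equal cell sizes, unit variance in every cell, a two-sided z-test at alpha = .05, and a hypothetical standardized effect of d = 0.5 in the focal culture. The cell size of 50 follows from the "100 participants per location" example mentioned earlier; the function names are illustrative only.

```python
from statistics import NormalDist


def power_two_sided(contrast: float, se: float, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test for a contrast with a known standard error."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = contrast / se
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def simple_effect_power(effect: float, n_per_cell: int) -> float:
    """Power for a condition difference within one culture (two cells)."""
    se = (2 / n_per_cell) ** 0.5  # difference of two cell means, SD = 1
    return power_two_sided(effect, se)


def interaction_power(effect_a: float, effect_b: float, n_per_cell: int) -> float:
    """Power for the condition x culture interaction (four cells)."""
    se = (4 / n_per_cell) ** 0.5  # difference of two differences, SD = 1
    return power_two_sided(effect_a - effect_b, se)


N_PER_CELL = 50  # assumption: 100 participants per location, two conditions
D = 0.5          # assumption: hypothetical effect size in the focal culture

# (effect in focal culture, effect in comparison culture) for each pattern
patterns = {
    "presence-absence": (D, 0.0),
    "reduced": (D, D / 2),
    "reversal": (D, -D),
}

results = {
    name: interaction_power(a, b, N_PER_CELL) for name, (a, b) in patterns.items()
}
results["simple effect"] = simple_effect_power(D, N_PER_CELL)

for name, power in results.items():
    print(f"{name:18s} power = {power:.2f}")
```

Under these assumptions the simple effect within one culture has power of roughly .70, while the interaction test has power of roughly .42 for a presence-absence pattern, .14 for a reduced pattern, and .94 for a reversal. The reversal's advantage here comes entirely from its larger interaction contrast (2d rather than d or d/2), so the numbers would shift with different assumed effect sizes in the comparison culture.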
9.3. The broader utility of the creative destruction approach
The present culture and work morality project is the first of several recent initiatives applying the creative destruction approach to replication to previously published findings from our research group (see Tierney et al., in press, for a review). Adding to the recent deluge of failed replications of experimental behavioral findings (e.g., Klein et al., 2014, Klein et al., 2018; Open Science Collaboration, 2015), none of these replication studies succeeded in reproducing the original patterns of results. However, unlike prior replication initiatives, we were able to obtain positive evidence for alternative theoretical accounts (Supplement 13).
We believe this highlights the general utility of the creative destruction approach to replication, which seeks to combine theory pruning methods from the management literature (Leavitt et al., 2010), with best practices from the open science movement in psychology such as pre-registration (Van't Veer & Giner-Sorolla, 2016; Wagenmakers et al., 2012) to achieve critical tests (Mayo, 2018) of competing intellectual ideas. Unlike traditional replication approaches, in which the original finding is tested against the expectation of null effects, the creative destruction approach seeks to identify the strongest theory currently operating in a given intellectual space.
Of course, not all research topics and original findings are well suited for large-scale competitive theory testing. As discussed at greater length by Tierney et al. (in press), the creative destruction approach is best suited to mature research areas with substantial published evidence, common methodological approaches, and well-developed theories that make precise, bounded predictions distinct from those of other theories. In contrast, traditional replications simply repeating the original method are better suited to confirming or disconfirming potential new breakthrough findings. Scientists should carefully allocate scarce replication resources for maximum impact, leveraging the methods best suited to the situation. It is our hope the present line of research contributes to a Replication 2.0 movement, in which rather than solely probing the reliability of past findings, scientists also focus on replacing them with new and improved accounts of human behavior.