Saturday, December 7, 2019

Perceptual limits of eyewitness identifications: The distance threshold of reliable identification

Nyman, T. J., Lampinen, J. M., Antfolk, J., Korkman, J., & Santtila, P. (2019). The distance threshold of reliable eyewitness identification. Law and Human Behavior, 43(6), 527-541. Dec 2019. http://dx.doi.org/10.1037/lhb0000342

Abstract: Increased distance between an eyewitness and a culprit decreases the accuracy of eyewitness identifications, but the maximum distance at which reliable observations can still be made is unknown. Our aim was to identify this threshold. We hypothesized that increased distance would decrease identification, rejection accuracy, confidence and would increase response time. We expected an interaction effect, where increased distance would more negatively affect younger and older participants (vs. young adults), resulting in age-group specific distance thresholds where diagnosticity would be 1. We presented participants with 4 live targets at distances between 5 m and 110 m using an 8-person computerized line-up task. We used simultaneous and sequential target-absent or target-present line-ups and presented these to 1,588 participants (age range = 6–77; 61% female; 95% Finns), resulting in 6,233 responses. We found that at 40 m diagnosticity was 50% lower than at 5 m and with increased distance diagnosticity tapered off until it was 1 (±0.5) at 100 m for all age groups and line-up types. However, young children (age range = 6–11) and older adults (age range = 45–77) reached a diagnosticity of 1 at shorter distances compared with older children (age range = 12–17) and young adults (age range = 18–44). We found that confidence dropped with increased distance, response time remained stable, and high confidence and shorter response times were associated with identification accuracy up to 40 m. We conclude that age and line-up type moderate the effect distance has on eyewitness accuracy and that there are perceptual distance thresholds at which an eyewitness can no longer reliably encode and later identify a culprit.

Public Significance Statement
The present study advances earlier findings regarding the negative impact that increased distance has on eyewitness accuracy by providing evidence for an upper distance threshold at 100 m for correct identifications. Our findings highlight the perceptual limits of eyewitness identifications and are relevant for use in courts of law by providing evidence that objective distance can be used as an estimation of eyewitness reliability.

KEYWORDS: eyewitness, identification, distance, face recognition


Discussion

Considering that distance negatively impacts facial encoding and later identification (e.g., Lampinen et al., 2014, 2015), we investigated the effect of distance on eyewitness accuracy in different age groups and line-ups. To achieve this, we conducted an ecologically valid outdoor experiment in which we presented participants with four live targets at distances between 5 m and 110 m, followed by an immediate identification task.

The Effect of Distance and Age on Eyewitness Accuracy
Distance had a significant negative effect in all age groups on identification accuracy in TP line-ups and rejection accuracy in TA line-ups. This held true for both simultaneous and sequential line-ups. There were also significant differences between age groups, with young children (ages 6 to 11) being significantly worse at identifying targets in TP sequential line-ups compared with young adults (ages 18 to 44). Further, older adults (ages 45 to 77) were significantly worse at making correct rejections in both TA simultaneous and sequential line-ups compared with young adults. In our initial analyses (post hoc analyses in parentheses), the diagnosticity cut-offs in the simultaneous line-ups were 61 m (76 m) for young children, 98 m (110 m) for older children, 77 m (89 m) for young adults, and 69 m (89 m) for older adults. In the sequential line-ups the cut-offs were 47 m (60 m) for young children, 75 m (96 m) for older children, 63 m (79 m) for young adults, and 52 m (69 m) for older adults.
The initial results, which assumed unbiased line-ups, gave us diagnosticity cut-offs that were on average between 10 m and 20 m below the cut-offs found when using the most selected TA filler as the innocent suspect. Arguably, this means that the cut-offs from our initial analyses are perhaps too conservative (i.e., low). However, we would like to emphasize that the decline in diagnosticity with increased distance was similar using either approach and all diagnosticity levels had fallen to 1 (±0.5) at 100 m for all age groups and line-up types. Furthermore, by using the higher cut-offs that are based on the post hoc analyses, we can with adequate certainty define probable upper thresholds where there was no information gained from the line-ups (Wells & Lindsay, 1980; Wells & Olson, 2002). Our findings illustrate that distance has a dramatically negative effect on eyewitness accuracy so that even small variations in distance can play an important role in eyewitness accuracy. Collectively, these results indicated that when assessing eyewitness identifications, an objective measure of the distance between the eyewitness and the culprit is an important gauge of the odds of identification accuracy.
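To make the notion of a diagnosticity of 1 concrete, the following minimal sketch assumes the conventional definition of the diagnosticity ratio (correct identification rate in TP line-ups divided by the false identification rate of an innocent suspect in TA line-ups) and treats it as a likelihood ratio in a simple odds update. The function names, the 50% prior, and all counts are hypothetical illustrations and are not taken from the study's data or analysis code.

```python
def diagnosticity_ratio(tp_suspect_ids, tp_total, ta_innocent_ids, ta_total):
    """Correct ID rate (TP line-ups) divided by false ID rate (TA line-ups).

    ta_innocent_ids is the number of TA selections of whoever is treated as
    the innocent suspect, e.g. the most selected TA filler.
    """
    hit_rate = tp_suspect_ids / tp_total
    false_id_rate = ta_innocent_ids / ta_total
    return hit_rate / false_id_rate


def posterior_guilt(prior, ratio):
    """Update a prior probability of guilt after a suspect identification,
    using the diagnosticity ratio as a likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical counts (NOT the study's data): a short and a long distance.
near = diagnosticity_ratio(60, 100, 5, 100)   # identifications are informative
far = diagnosticity_ratio(10, 100, 10, 100)   # ratio of 1: nothing is gained

print(round(posterior_guilt(0.5, near), 3))
print(round(posterior_guilt(0.5, far), 3))
```

When the ratio is 1, the posterior equals the prior, which is the sense in which an identification beyond the threshold carries no information.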
Interestingly, correct rejection rates for young adults decreased rather than increased with increased distance. For other age groups, correct rejection rates remained relatively stable over distance. Instead of an increase in rejection rates, we found an increase in filler selections (see Appendix B in the online supplemental materials), which is also reflected in the shift toward a more liberal response bias as distance increased. Only older children appear to have shifted to a slightly more conservative response bias at 90 m and above. Nevertheless, the overall increase in choosing suggests that participants were not good at taking into account the difficulty of the task and this could be taken as support for the hypothesis that when memory strength is low, more of the photographs match the target equally well, so participants may tend to choose rather than reject. In real life scenarios, where an eyewitness is asked to take part in a police line-up, the identification task inflates choosing rates due to less pristine conditions or the witness wanting to help the police (e.g., Wells et al., 2000). The results, therefore, suggest that the ability of participants to metacognitively judge the difficulty of the task was not proportional to the actual degree of difficulty, mirroring earlier findings (Smith et al., 2018).

Distance Estimation Accuracy
The main finding regarding distance estimation was that as distance increased the level of accuracy decreased and that young children and older adults made more erroneous distance estimations compared with young adults. Moreover, in comparison with young adults, increased distance increased error rates more for young children but less for older adults. Although it is difficult to make any clear interpretations regarding the age-related differences, it may be that experience plays an important role in estimating distance. It is possible that body height is a confounding variable, as a taller person (i.e., adults) might have an advantage when estimating larger distances. This could explain some of the age-related differences, although it does not explain why older adults had more errors and were less affected by increased distance compared with young adults. Notably, the large variation and overall low accuracy of distance estimation indicate that subjective estimations of distance are highly unreliable.

Simultaneous and Sequential Line-ups
Simultaneous line-ups provided an advantage over sequential line-ups, with higher accuracy and a less steep decline in diagnosticity and d′ for all age groups with increased distance. However, young children (ages 6 to 11) were worse at identifying targets in TP sequential line-ups compared with young adults (ages 18 to 44) and older adults (ages 45 to 77) were worse at correctly rejecting line-ups in TA simultaneous and TA sequential line-ups, compared with young adults. The differences between age groups fall partly in line with earlier results (Fitzgerald & Price, 2015), because young children and older adults fared much worse compared with young adults. Interestingly, it was apparent from the simultaneous TA rejections that the older adults appear to have been almost equally good/bad at rejecting line-ups with increased distance. This suggests that older adults are prone to choose no matter the memory strength in the TA simultaneous line-ups (but not in the TA sequential line-ups), which could be seen as a dependency on familiarity rather than recollection (Healy et al., 2005; Shing et al., 2010, 2008). Generally, response bias was more liberal in sequential line-ups compared with simultaneous line-ups (see Appendix B in the online supplemental materials), indicating that sequential line-ups increased the likelihood of choosing compared with simultaneous line-ups. Nevertheless, response bias increased in both line-ups as distance increased, indicating that all age groups adopted a more liberal response criterion as the task became more difficult. We have interpreted this as reflecting a higher reliance on a familiarity-based rather than a recollection-based strategy.
Before placing too much emphasis on the differences between the simultaneous and sequential line-ups, it is important to note that the sequential line-ups differed from common U.S. police practice in that the task was absolute and no additional rounds were permitted (e.g., Steblay, Dietrich, Ryan, Raczynski, & James, 2011). Moreover, the number of images in the sequential line-ups was mentioned in the line-up instructions and this can decrease discriminability especially if the target image is presented late in the line-up (Horry, Palmer, & Brewer, 2012). It is thus possible that the differences between the line-up types are partly due to different degrees of pristine conditions.
These results are relevant to the ongoing debate over simultaneous and sequential line-ups. There have been findings showing that simultaneous line-ups have an advantage, due perhaps to an increased discriminability in the relative judgment task (Clark, 2012; Clark et al., 2015; Gronlund et al., 2015; Wixted et al., 2016). Others have shown that sequential line-ups have an advantage because they decrease mistaken identifications without impacting the number of correct identifications (Steblay et al., 2003, 2011; Wells et al., 2015). Some have even proposed that sequential line-ups do not improve discriminability but have an advantage because they encourage the use of a more conservative criterion (Palmer & Brewer, 2012). The present results indicate that there are important differences in age groups depending on memory encoding and line-up type. When considering increased distance as a representation of lower memory quality, it is clear that most of the age differences disappeared at higher distances due to floor effects, representing the limits of perception and encoding. More research is needed to gain a more in-depth understanding of how different age groups make judgments based on variations in memory strength.

Confidence
A CAC analysis (Mickes, 2015) confirmed that high confidence is associated with high accuracy at distances up to 40 m. This was true for all age groups. After 40 m there were too few high-confidence observations to reliably analyze the results. The average levels of confidence fell with increased distance, meaning that participants perhaps understood, to a certain degree, the difficulty of the task and downshifted their confidence as distance increased. These results are interesting in relation to the continuing debate regarding the degree to which high confidence is a postdictive indicator of accuracy. It has previously been suggested that less optimal estimator variables will negatively impact the relationship between confidence and accuracy (Deffenbacher, 2008). However, a counterargument is that in pristine conditions and when memory is examined immediately, as in an immediate identification task, then high confidence is associated with high accuracy (Brewer & Wells, 2006; Clark et al., 2015; Sporer et al., 1995). It is also suggested that under such conditions, estimator variables such as distance will not influence the confidence-accuracy relationship and that participants will adjust their confidence downward when the memory-match for photographs in the line-up is low (Semmler et al., 2018; Wixted & Wells, 2017). The current results appear to fit the latter hypothesis. Nevertheless, it is important to state that in the current sample, there were very few high-confidence responses after 40 m, of which very few were correct. This suggests that in real world situations, high-confidence identifications at longer distances most likely reflect either very unique encoding conditions, as for example in the case of a familiar face, or the impact of suggestive factors that inflate confidence, such as an investigator's positive reinforcement of the choice made (see, e.g., Wixted & Wells, 2017).
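As a rough illustration of what a confidence-accuracy characteristic (CAC) analysis computes, the sketch below estimates suspect identification accuracy per confidence bin. It assumes a common convention for line-ups without a designated innocent suspect, namely that false suspect identifications are estimated as TA filler selections divided by the line-up size; whether the study used exactly this estimate is an assumption. The function name, the `min_ids` cut-off, and all responses are hypothetical.

```python
from collections import Counter


def cac_curve(responses, lineup_size=8, min_ids=5):
    """Estimate suspect ID accuracy per confidence bin.

    responses: iterable of (confidence_bin, target_present, outcome), where
    outcome is "suspect", "filler", or "reject". Bins with fewer than
    min_ids relevant identifications are reported as None (too sparse).
    """
    correct = Counter()   # TP line-ups: suspect (target) chosen
    ta_picks = Counter()  # TA line-ups: any line-up member chosen
    for conf_bin, target_present, outcome in responses:
        if target_present and outcome == "suspect":
            correct[conf_bin] += 1
        elif not target_present and outcome == "filler":
            ta_picks[conf_bin] += 1

    curve = {}
    for conf_bin in sorted(set(correct) | set(ta_picks)):
        if correct[conf_bin] + ta_picks[conf_bin] < min_ids:
            curve[conf_bin] = None
            continue
        # Assumed estimate of false suspect IDs in a fair line-up.
        false_ids = ta_picks[conf_bin] / lineup_size
        curve[conf_bin] = correct[conf_bin] / (correct[conf_bin] + false_ids)
    return curve


# Hypothetical responses (NOT the study's data).
responses = (
    [("high", True, "suspect")] * 18 + [("high", False, "filler")] * 8
    + [("low", True, "suspect")] * 6 + [("low", False, "filler")] * 16
    + [("low", True, "filler")] * 5      # TP filler picks do not enter CAC
    + [("medium", True, "suspect")] * 2  # too few observations to estimate
)
curve = cac_curve(responses)
print(curve)
```

Returning None for sparse bins mirrors the point made above: beyond 40 m there were too few high-confidence observations for the estimate to be meaningful.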

Response Times
The results regarding the relationship between response time and identification accuracy showed that shorter response times were robustly associated with higher identification accuracy at distances below approximately 40 m in simultaneous TP line-ups. Earlier studies have suggested that there is a cut-off between 10 and 12 s, below which there is a higher degree of accuracy (Dunning & Perretta, 2002). However, more recent work has called this cut-off into question and has shown that there is great variability in response time and that the previously suggested cut-off point does not accurately distinguish between high and low accuracy (Sauer, Brewer, & Wells, 2008; Weber, Brewer, Wells, Semmler, & Keast, 2004). The present results suggest that shorter response times do have a postdictive value, at least below approximately 40 m, with decisions made under five seconds being the most accurate. The implications are that, as with confidence, more research is needed to understand the effect that increased distance has on the relationship between response time and accuracy.

Practical Applications
The main take-home message of the current study is that both objective distance and age are crucial factors to take into consideration when assessing the benefit of conducting a line-up and that there are upper distance limits to eyewitness reliability. For practitioners in the field it is also important to emphasize that at 40 m diagnosticity was 50% lower compared with diagnosticity at 5 m. Moreover, as distance increased, diagnosticity tapered toward 1 so that by 100 m, no age group, using either line-up type, produced diagnosticity values higher than 1 (±0.5). Nevertheless, there were substantial differences between age groups, showing that older children (ages 12 to 17) and young adults (ages 18 to 44) had upper distance cut-offs that were roughly 10–20 m higher compared with young children (ages 6 to 11) and older adults (ages 45 to 77).
Importantly, the current results were obtained in pristine conditions (i.e., best practice methods), with optimal viewing conditions (i.e., 20-s viewing time, natural and optimal lighting, no distractions), and using an immediate line-up task. Therefore, the distance thresholds reported in this article are likely to be overestimates of thresholds in real life settings, where flawed line-up procedures, less optimal viewing conditions, and delayed identifications are much more common. For example, Felson and Poulsen (2003) estimated that approximately 50% of crimes take place after 8 p.m. (i.e., when lighting and visibility are low).
The current perceptual distance thresholds should be interpreted as the maximum thresholds possible in the best possible conditions. When an actual crime takes place, there are often other factors present, such as stress (Deffenbacher, Bornstein, Penrod, & McGorty, 2004) and weapon focus (Erickson, Lampinen, & Leding, 2014; Fawcett, Russell, Peace, & Christie, 2011), that make it more likely that correct identifications are already improbable at shorter distances. Additionally, it is known that (facial) memory is imperfect, susceptible to distortion, and decays with time (Deffenbacher, Bornstein, McGorty, & Penrod, 2008; Lacy & Stark, 2013). It can, therefore, be assumed that delayed identifications will produce less accurate results compared with the present findings. In addition to this, Lindsay and colleagues (2008) found that delayed responses gave rise to a significantly higher number of “not sure” and incorrect rejections compared with an immediate identification task.

Limitations
The current data collection is not without its limitations. One limitation is that we used very similar targets and that more variation in appearance, ethnicity, or age would have been informative. The setting was in a science center, so although the results are highly generalizable, there might be some variation in choosing and rejection rates in comparison with an actual or mock police setup, where the consequences of choosing or not choosing are more critical. The design was a prospective task where participants knew beforehand that they would be witnessing four targets and conducting four identifications. On the one hand this increases the significance of our results because this is an additional optimal condition factor, but on the other hand it would be very informative to investigate the effect of distance on an uninformed and a retrospective line-up task. Instructing the participants as to how many images would be shown in the sequential line-ups also slightly hampered the interpretation of the results. Despite these limitations, the present research represents a substantial improvement on past research where less ecologically valid paradigms have been used.

Emotional support hinges on attributions of emotion control: People are more inclined to react supportively when they judge that the target individual cannot regulate their own emotions

Cusimano, Corey, "Attributions Of Mental State Control: Causes And Consequences" (2019). PhD Thesis, Publicly Accessible Penn Dissertations, 3524. https://repository.upenn.edu/edissertations/3524

Abstract: A popular thesis in psychology holds that ordinary people judge others’ mental states to be uncontrollable, unintentional, or otherwise involuntary. The present research challenges this thesis and documents how attributions of mental state control affect social decision making, predict policy preferences, and fuel conflict in close relationships. In Chapter 1, I show that lay people by-and-large attribute intentional control to others over their mental states. Additionally, I provide causal evidence that these attributions of control predict judgments of responsibility as well as decisions to confront and reprimand someone for having an objectionable attitude. By overturning a common misconception about how people evaluate mental states, these findings help resolve a long-standing debate about the lay concept of moral responsibility. In Chapter 2, I extend these findings to interpersonal emotion regulation in order to predict how observers react to close others who experience stress, anxiety, or distress. Across six studies, I show that people’s emotional support hinges on attributions of emotion control: People are more inclined to react supportively when they judge that the target individual cannot regulate their own emotions, but react unsupportively, sometimes evincing an intention to make others feel bad for their emotions, when they judge that those others can regulate their negative emotion away themselves. People evaluate others’ emotion control based on assessments of their own emotion regulation capacity, how readily reappraised the target’s emotion is, and how rational the target is. Finally, I show that judgments of emotion control predict self-reported supportive thoughts and behaviors in close relationships as well as preferences for university policies addressing microaggressions. Lastly, in Chapter 3, I show that people believe that others have more control over their beliefs than they themselves do. 
This discrepancy arises because, even though people conceptualize beliefs as controllable, they tend to experience the beliefs they hold as outside their control. When reasoning about others, people fail to generalize this experience to others and instead rely on their conceptualization of belief as controllable. In light of Chapters 1 and 2, I discuss how this discrepancy may explain why ideological disagreements are so difficult to resolve.


Limitations and Future Directions

Subjects in our studies were exclusively recruited through Amazon’s Mechanical
Turk. Although samples recruited from AMT are more representative of the U.S. than
typical university student samples, individuals on AMT tend to be less religious,
wealthier, and better educated than the average person in the United States (Paolacci &
Chandler, 2014). Additionally, our entire sample consisted of people living in the United
States who, like other so-called WEIRD populations, are wealthier and better educated
than most people in the world, and are predominately Christian (Henrich, Heine, &
Norenzayan, 2010). Cross-cultural work has revealed striking differences in how different
groups think about individuals’ agency. Of particular note, individuals in some non-U.S.
cultures appear to attribute less agency to individuals than do individuals in the United
States (e.g., Iyengar and Lepper, 1999; Kitayama et al., 2004; Miller, Das, &
Chakravarthy, 2011; Morris, Nisbett & Peng, 1995; Savani et al., 2010; Specktor et al.,
2004). For instance, compared to children in the United States, Nepalese children are
more inclined to view some behaviors as constrained by social rules and therefore outside
of their control, with this gap widening with age (Chernyak et al., 2013). In a similar
vein, Indian adults appear to be less likely than U.S. adults to construe everyday
behaviors as choices (Savani et al., 2010). Of clearest relevance to the present studies,
some work suggests that Christians tend to attribute more control to others over deviant
mental states (e.g., consciously entertaining thoughts of having an affair) than do Jews,
thus showing evidence for cultural moderation with respect to mental states in particular
(Cohen & Rozin, 2001). In light of this sort of evidence, we should not automatically
assume that the results from our studies will replicate across different cultural or religious
contexts.
Although we are uncertain as to whether our findings will generalize to all
cultures, our findings do suggest an important direction for cross-cultural work.
Specifically, future work measuring attributions of belief control should distinguish
between lay theories of belief control and the introspective-experience of belief control.
One virtue of measuring both is that we may expect different amounts of variation
between these two measures of control across cultures. For instance, assuming that
beliefs are indeed uncontrollable to a significant degree (see above), we should expect
that the felt-experience of low control will vary little from culture to culture. By contrast,
the lay theory of belief, which may be influenced by highly variable norms (e.g., religious
norms, Cohen & Rozin, 2001), or folk theories of agency (see paragraph above), may be
more likely to vary across cultures. For this reason, we speculate that self-other
differences in belief control are most likely to arise in cultures where the lay theory of
belief posits high control, as it is in these cultures where this lay theory will most likely
diverge from the felt-experience of belief.
Another limitation in our studies regards the limited range of beliefs that we
sampled. The beliefs in Studies 3.1-3.3 were highly abstract, complex, or value-laden
(e.g., belief in God, the correct policy for genetically modified foods, the wrongness of
not returning money to its rightful owner). We addressed this in Studies 3.4-3.5 by using
beliefs that subjects themselves provided – specifically, the first beliefs that came to
mind. This yielded a considerably wider sampling of belief contents (see Table 3.2 for a
list of examples). Yet, it still leaves open the question of how people reason about their
own control relative to that of others for very simple, concrete beliefs (e.g., “there is a
two thirds chance of pulling a marble out of the bucket,” “there is a quarter in my
pocket,” “it is raining”). We are ambivalent about whether to expect the same
discrepancy in cases such as these. It may be that the self-other difference is attenuated or
eliminated given that the relevant constraints on belief change are far more apparent for
beliefs of this sort. Continuing to delimit the bounds of the self-other discrepancy remains
a valuable goal for future research.
Finally, research should investigate whether, and when, self-other differences in
attributions of belief control extend to other mental states. Although the present paper
focuses only on the constraints on belief change, it may be that other mental states,
including desires, evaluative attitudes, and emotions, are subject to similar constraints. If
they are, then we might expect similar self-other discrepancies in perceived control –
particularly in light of past work showing that people generally attribute high control to
others over many mental states (Cusimano & Goodwin, in press). Indeed, there is
already one reason to expect the self-other discrepancy to extend to other mental states,
namely, that a person’s beliefs often play a pivotal role in determining his or her other
mental states. For instance, if someone is depressed because she believes she will not
recover from a severe illness, an observer may think she is more capable of cheering up
than she herself does, precisely because the observer judges her as more able to change
her belief about her prognosis than she does. However, whether such self-other
differences do in fact extend to other mental states awaits empirical testing.

Friday, December 6, 2019

Homo Politicus Was Born This Way: How Understanding the Biology of Political Belief Promotes Depolarization

Homo Politicus Was Born This Way: How Understanding the Biology of Political Belief Promotes Depolarization. Alexander Severson, Boise State University. https://static1.squarespace.com/static/5aaee6274eddec9e7a191db5/t/5db34cf7896fba3217a91b4b/1572031736718/Severson+%282019%29.pdf

Abstract: Most individuals perceive ideological beliefs as being freely chosen. Recent research in genopolitics and neuroscience, however, suggests that this conviction is partially unwarranted given that biological and genetic factors explain more variance in political attitudes than choice and environmental factors. Thus, it is worth exploring whether exposure to this research on the biological and genetic basis of political attitudes might influence levels of affective polarization because such exposure might reduce the perceived moral culpability of partisan outgroups for the endorsement of oppositional beliefs. In this paper, I employ an online survey experiment on Amazon Mechanical Turk (N = 487) to assess whether exposure to research on the genetic and biological etiology of political attitudes influences warmth toward partisan outgroups and preferences over political compromise. I present evidence that nontrivial numbers of participants in the treatment group reject the underlying science and do not update their genetic trait attributions for political attitudes. However, I also find that when the treatment is successful at increasing biological and genetic trait attributions, exposure to this research depolarizes strong-identifying partisans. Moreover, as partisans increasingly endorse biological and genetic trait attributions for political attitudes, they increasingly hold favorable attitudes toward political outgroups. These patterns suggest a potentially profitable inroad for political polarization interventions going forward.

Keywords: polarization; biopolitics; ideology; trait attributions; survey experiment

Excerpts from the introduction:

On June 1, 2019, Democratic primary candidate Andrew Yang tweeted, “According to
twins [sic] studies between one-third and one-half of political alignment is linked to genetics;
that is most of us are born somewhat wired to be liberal or conservative. If this is the case
we need to build bridges as much as possible. It’s not just info or culture” (Yang 2019).
Yang’s tweet is notable as it suggests that one potential strategy to reduce growing partisan
antipathy (Iyengar et al., 2012; Kalmoe and Mason 2019) is to raise public awareness of recent
research in political science which demonstrates that a sizable proportion of individual-level
variation in political attitudes can be explained by biological and genetic factors (Dawes and
Fowler 2008; Hatemi and McDermott 2012). The unstated assumption of this argument is
that it is difficult to hold members of political outgroups responsible for the endorsement
of oppositional political beliefs when variation in such beliefs is best predicted by ascriptive
factors over which individuals have no control. Thus, in this view, awareness of the biological
substrates of political attitudes and of the minimized role of personal choice in generating
those attitudes should increase political tolerance toward partisan outgroups as “born that
way”-style explanations partially absolve members of partisan outgroups of the perceived
evilness of their belief systems (Snead 2011; Schneider, Smith, and Hibbing 2018).

However, it is also imaginable that exposure to information on the biological and genetic sources of political attitudes could further animate partisan tensions. Instead of this
information being used to exculpate members of political outgroups of the perceived offense
of their beliefs, exposure to this information might cause individuals to view the partisan
gulf as elementally unbridgeable. In this perspective, exposure to the degree of determinism
implied by biological models of political attitudes could reduce perceptions that members of
political outgroups are capable of opinion-change. Thus, if the political attitudes of partisan outgroups are viewed as increasingly resistant to change given the biological forces which
underlie them, then it follows that partisans might increasingly disengage from meaningful
social interactions with those across the aisle and come to devalue having conversations with
their outpartisan counterparts. Moreover, belief in the relative fixity of the political attitudes
of partisan outgroups could potentially translate into the adoption of more exaggerated and
essentialist views of the other (Haslam and Whelan 2008).


Conclusion

To summarize, in this paper, I used an online survey experiment to assess whether
exposure to recent scientific findings on the neurobiology and heritability of political belief
influenced affective polarization and preferences over compromise. Theoretically, a priori, it
was unclear whether such a strategy would increase or decrease levels of affective partisan
polarization. On the one hand, a subset of researchers in philosophy and moral psychology
have found that individuals tend to be more forgiving when they perceive that individuals
have less control over their decisions (Young 2009; Baumeister and Brewer 2012; Shariff et al.
2012). Conversely, other researchers in social psychology have found that individuals become
more antisocial when their belief in free-will and choice is undermined (Vohs and Schooler
2008; Baumeister, Masicampo, and DeWall 2009; MacKenzie, Vohs, and Baumeister 2014).
Thus, one of the goals of the present research was to provide a preliminary test of these two
divergent theoretical predictions to assess which, if either, held in the context of the debate
about the degree of determinism of political belief.
In this paper, I present evidence, consistent with recent work by Schneider, Smith, and
Hibbing (2018) and Willoughby et al. (2019), that most people view ideological beliefs and
partisanship as being largely determined by personal choice and to a lesser degree by socialization. Individuals are either unaware of or are psychologically-resistant to the idea that political
beliefs are even partially the byproduct of biological and genetic processes. Further, although
the experimental manipulation increased beliefs that ideology is biologically-determined, the
manipulation was not uniformly effective. However, among those who responded to the manipulation, affective polarization decreased in a rather pronounced fashion, particularly among
strongly-identifying partisans. Moreover, across both conditions, increased endorsement of biological and trait attributions correlated positively with the endorsement of warmer attitudes
toward political outgroups. Finally, my study demonstrated that exposure to such a frame
does not appreciably shift attitudes toward political compromise or whether participants felt
it was important to have ideologically-diverse discussant partners.
However, it is worth noting a few limitations to the present paper which suggest promising avenues for future research. First, the current design cannot rule out the possibility that
any narrative which outsources responsibility for political beliefs to an external locus may
promote depolarization. To this end, future studies should contrast the strength of depolarization effects between frames which emphasize the underlying biological science of ideology
against frames which emphasize the role of socialization factors, frames which would similarly
imply that individuals are not fully-responsible for their own political beliefs. Secondly, in
the present study, I did not directly measure perceptions of the moral culpability or blameworthiness of political outgroups. Future work should investigate whether exposure to frames
which minimize the role of personal choice in the construction of political belief, in turn, alter
perceptions of the moral responsibility of endorsing specific political beliefs. Relatedly, future
work might also explore whether different components of political beliefs (e.g., support for
policies; support for candidates) are perceived as more intentional than others. Third, the
present study made use of a convenience sample conducted on Amazon Mechanical Turk.
While previous work suggests that the use of online convenience samples can recover valid
treatment effect estimates (Mullinix et al. 2015; Coppock 2019), future work could replicate the present findings using a more nationally-representative sample. Finally, although my
results suggest that depolarization interventions which exclusively emphasize the biological
science of ideological belief alone are not likely to engender sweeping depolarizing effects, they
do suggest, perhaps hopefully, that exposure to this type of research neither increases the
partisan affective gulf nor harms the likelihood of cross-party interactions. Thus, concerns
about the potential negative or antisocial effects of encountering such frames, at least in the
context of political belief, may be overstated.
Given that most of us reflexively think that we choose and are responsible for our own
political beliefs, it can be admittedly troubling to confront the possibility that we may not
exercise as much control over these beliefs as our intuition seems to suggest. We proudly
weaponize bumper stickers and traffic in taunt-infused comment-thread witticisms in the war
against the political “other”, all in part because we believe that the other side chooses to
believe what they believe freely and unencumbered. The root of our frustrations, of increased
political violence and partisan discrimination (Lelkes and Westwood 2017), seems to hinge on
this often unquestioned assumption that we exercise agency over our belief systems. However, the emergent neurobiological and genetic science of political belief suggests that this
assumption is misguided and, in light of accentuated partisan violence and taunting, potentially
dangerous. It seems odd, albeit perhaps quintessentially human, to believe that our political
beliefs are somehow completely separable from the biological and genetic programming which
circumscribes all of our cognitions. However, in disavowing this belief and accepting that our
own ideologies are partially the byproduct of biological and genetic processes over which we
have no control, we may end up promoting a more tolerant and kinder civil society.

Impact of spatial proximity to a concentration camp 1933-1945 in the 2013 & 2017 German federal elections: Such proximity is associated with a higher vote share of radical-right parties

The long-term impact of the location of concentration camps on radical-right voting in Germany. Julian M. Hoerner, Alexander Jaax, Toni Rodon. December 5, 2019. https://doi.org/10.1177/2053168019891376

Abstract: Of all atrocities committed by state actors in 20th century Europe, the systematic killings by Nazi Germany were arguably the most severe and best documented. While several studies have investigated the impact of the presence of concentration camps on surrounding communities in Germany and the occupied territories in terms of redistribution of wealth and property, the local-level impact on voting behaviour has not yet been explored. We investigated the impact of spatial proximity to a concentration camp between 1933 and 1945 on the likelihood of voting for far-right parties in the 2013 and 2017 federal elections. We find that proximity to a former concentration camp is associated with a higher vote share of such parties. A potential explanation for this finding could be a ‘memory satiation effect’, according to which voters who live in close proximity to former camps and are more frequently confronted with the past are more receptive to revisionist historical accounts questioning the centrality of the Holocaust in the German culture of remembrance.

Keywords: Voting behaviour, long-term effects, far right, Germany, mass violence, culture of memory


Of the salient political conflicts that reshaped political competition at the beginning of the 21st century, many are rooted in historical events that lie decades and sometimes centuries in the past. In many cases, these conflicts pit the right to remember past wrongs of territorial or ethnic communities that have been historically marginalized, discriminated against and persecuted, against the desire of members of the majority to maintain a particular narrative of a country’s history. However, often these conflicts about how to remember the past also divide society along partisan lines. A substantial body of literature demonstrates that historical events and institutions tend to cast a shadow long after they have ceased to exist, particularly if they involved conflict and violence (Acemoglu et al., 2011; Charnysh and Finkel, 2017).
In this context, we investigated the long-term political impact of the most extreme case of state mass violence – the Holocaust. While any intellectual engagement with the Holocaust should have the victims at its centre, it is also pertinent to analyse its impact on political outcomes in the country responsible for the crimes. We analysed the impact of one of the most visible and prominent symbols of the crimes conducted under the National Socialist dictatorship in Germany: former concentration camps. In particular, we were interested in the impact of living in spatial proximity to a former camp on voting for a far-right party (FRP). Our reasons for choosing this empirical design are twofold: first, physical monuments can be considered a particularly prominent and contentious object of memory, as their presence is visible to everyone in the area and permanent in time (Wüstenberg, 2017). Second, we believe that the impact of the Holocaust on electoral behaviour in Germany deserves particular attention. While there has long been a consensus on German responsibility and the centrality of the Holocaust for German history, this view is now challenged. We thus believe that the German case can tell us a lot about the dynamics of the long-term impact of mass violence and its interaction with political competition in shaping collective memory.
Perhaps surprisingly, we found that the vote share of far-right parties increased as we moved closer to a former concentration camp. Arguably, being repeatedly reminded of an in-group transgression led some voters to be receptive to a revisionist historical narrative that negates the centrality of German guilt. We thus found (indirect) evidence for a ‘political satiation’ effect, in which repeated exposure to cues of in-group responsibility led to higher receptiveness for a revisionist narrative rather than a ‘resilience effect’, in which being reminded of past crimes decreases the likelihood of voting for the far right.
Until now, the largest and most systematic act of state-induced mass violence, the Holocaust, has received rather limited attention by political scientists in terms of its long-term effect on political attitudes and behaviour. One of the few scholarly works focusing specifically on the long-term impact of mass killings in the context of the Holocaust is a recent article by Charnysh and Finkel (2017). The authors analysed the impact on the surrounding communities of the Nazi death camp Treblinka, in Poland, where Germans murdered nearly a million Jews. They show that communities located closer to the camp experienced a property boom, which eventually led these communities to show higher support for an anti-Semitic party, the League of Polish Families. We complement their paper by asking a related question, namely how the crimes of the Nazi dictatorship have impacted on voting behaviour in Germany, the country of the perpetrators.
In so doing, we also hope to contribute to the general literature on far-right voting. This now extensive literature has identified factors such as political opportunity structures (Arzheimer and Carter, 2006), economic grievances such as unemployment (Golder, 2003) and anti-immigrant sentiments (Van der Brug et al., 2005) as determinants of the electoral success of FRPs, even though the interaction between these different factors is complex and multidimensional (Golder, 2016). While there are some studies that focus on the historical antecedents of the success of FRPs, as mentioned above, we aim to provide an original contribution to the literature on far-right voting by focusing on the role of the spatial location of sites of mass violence and the politicization of a country’s culture of memory.
Remembering the Holocaust, the systematic killing of more than 6 million Jewish people and other minorities, has long been considered a defining feature of the raison d’état of the Federal Republic of Germany. The process of remembrance went through several phases. While the initial post-war period was characterized by denial and unwillingness to give a voice to the victims, the student-led revolts of the late 1960s and centre-left governments of the 1970s brought about the preconditions for an active questioning of the past and critical engagement with German guilt (Wüstenberg, 2017: 33). As Art claims, this contestation has given rise to two ‘frames’ of German history: a ‘contrition frame’, focusing on the victims and the responsibility resulting from German guilt, and a ‘normalization frame’, promoted by the right, arguing that discussions of German guilt had to end to allow the country to develop a ‘normal’ national identity (Art, 2005: 10).
Facilities previously serving as concentration camps can be considered one of the most prominent and powerful places of memory relating to the Holocaust. Memorials, places of remembrance or lieux de mémoire are arguably distinct from other forms of memory such as public debates or events in that they are permanent fixtures with which every resident or visitor of the area is confronted (Wüstenberg, 2017: 11). This high visibility makes memorials particularly prone to be subjects of societal mobilization and contestation (Wüstenberg, 2017: 11). We thus hypothesized that spatial proximity to such a lieu de mémoire would have a lasting impact on vote choice in the German context.
We had two distinct hypotheses about the direction of the relationship between living in spatial proximity to a former concentration camp and voting for an FRP. Our first hypothesis was that voters living in close proximity to a former concentration camp would be less likely to vote for such a party. We refer to this as the ‘resilience hypothesis’. In terms of a contemporaneous effect, being constantly reminded of the consequences and extent of German crimes might make voters resilient to any attempts of minimization of German crimes or a ‘normalization frame’. We also believed that there was an additional and related historical mechanism driving such an effect. After the liberation of concentration camps in 1945, the allied powers to varying degrees engaged in denazification measures, mostly carried out at the local level. This experience could have become a shared memory passed down through generations, leading to an aversion to far-right politics and any attempts to qualify or minimize the crimes.
However, revelations about in-group transgressions might also prompt defensive responses and minimization of in-group complicity (Branscombe et al., 2007). We term this the ‘satiation hypothesis’. Satiation as a psychological concept refers to the phenomenon that repeated exposure to a semantic stimulus – in this context embodied by former camps as places of memory – weakens the reaction and receptiveness of a subject to such assertions. Could reactions of defensiveness and minimization of in-group complicity be especially pronounced for those who have received a particularly strong ‘treatment’ of remembrance culture by living close to a former camp? In any case, we would expect both mechanisms to be especially pronounced in – or indeed even limited to – West Germany, as long-ranging debates on how the Holocaust should be remembered were restricted to the Federal Republic of Germany. The German Democratic Republic (GDR) considered itself anti-fascist and thus by definition not responsible for the crimes of the National Socialist dictatorship (Art, 2005: 43). In the next section, we describe our research design to test the resilience and satiation hypotheses empirically.

Happiness is negatively associated with Belief in Luck, but positively associated with Belief in Personal Luckiness

Do the happy-go-lucky? Edmund R. Thompson, Gerard P. Prendergast, Gerard H. Dericks. Current Psychology, December 6, 2019. https://link.springer.com/article/10.1007/s12144-019-00554-w

Abstract: While popular aphorisms and etymologies across diverse languages suggest an intrinsic association between happiness and luck beliefs, empirically testing the existence of any potential link has historically been constrained by varying and unclear conceptualizations of luck beliefs and by their sub-optimally valid measurement. Employing the Thompson and Prendergast (Personality and Individual Differences, 54(4), 501-506, 2013) bi-dimensional refinement of trait luck beliefs into, respectively, ‘Belief in Luck’ and ‘Belief in Personal Luckiness’, we explore the relationship between luck beliefs and a range of trait happiness measures. Our analyses (N = 844) find broadly that happiness is negatively associated with Belief in Luck, but positively associated with Belief in Personal Luckiness, although results differ somewhat depending on which measure of happiness is used. We further explore interrelationships between luck beliefs and the five-factor model of personality, finding this latter fully accounts for Belief in Luck’s negative association with happiness, with additional analyses indicating this is wholly attributable to Neuroticism alone: Neuroticism appears to be a possible mediator of Belief in Luck’s negative association with happiness. We additionally find that the five-factor model only partially attenuates Belief in Personal Luckiness’ positive association with happiness, suggesting that Belief in Personal Luckiness may be either a discrete facet of trait happiness or a personality trait in and of itself.

Keywords: Happiness, Belief in luck, Belief in personal luckiness, Five-factor personality model, Irrational beliefs

Belief in Luck and Happiness

The Belief in Luck dimension of Thompson and Prendergast’s bidimensional model distinguishes between, on one hand, luck believers who irrationally consider luck to be a deterministic and external phenomenon with agentic qualities capable of influencing outcomes and, on the other, luck disbelievers who consider luck to be merely the product of purely stochastic and uninfluenceable chance. Thompson and Prendergast found belief or disbelief in luck is not binary, but rather exists on a unidimensional continuum, substantiating Maltby et al.’s suspicion that the apparently discrete beliefs they found in, respectively, good and bad luck are the product of scoring artifacts rather than separate underlying constructs.
Research to date on Belief in Luck specifically has been scant and limited to inter-item correlations without controls for possible confounding variables. Nonetheless, such correlations hint that believing in luck may be negatively correlated with affect-related measures. For example, Maltby et al. find belief in luck correlates positively with a range of irrational beliefs and negative traits such as awfulizing and problem avoidance, and Thompson and Prendergast find it correlates negatively with well-being. Considerable research has demonstrated more generally that irrational beliefs are linked to negative affect (Bridges and Harnish; David and Cramer; David et al.; Kassinove and Eckhardt; Rohsenow and Smith; Smith). Maltby et al. also find that belief in luck correlates negatively with internal locus of control, while Thompson and Prendergast find it correlates positively with the powerful others dimension of Levenson’s (1981) locus of control measure. External locus of control, with which belief in luck is commensurate, has long been empirically associated with negative affect (Abramowitz; Buddelmeyer and Powdthavee 2016; Houston; Johnson and Sarason; Yu and Fan 2016). Taken together, these findings are consonant with Maltby et al.’s suggestion that belief in luck is a facet of irrationality linked to low personal agency, maladaptivity and the negative affect found to be linked with these. Hence it would seem reasonable to suggest that Belief in Luck may be negatively linked with positive dimensions of affect:
  • H1. Belief in Luck will be negatively associated with happiness.

Belief in Personal Luckiness and Happiness

Thompson and Prendergast find both luck believers and disbelievers alike make a subconscious semantic differentiation between luck conceived as a deterministic external phenomenon affecting future events, and luck as a descriptive metaphor for how fortunately past events and current circumstances are believed to have turned out for them personally. Like Maltby et al., Thompson and Prendergast find belief in being personally lucky is discrete from and uncorrelated with belief in luck as a deterministic phenomenon. Maltby et al. find belief in being personally lucky correlates negatively with discomfort-anxiety and with awfulizing, but positively with hope, self-acceptance, positive relations, environmental mastery, and other personality traits associated with positive affect. Similar positive associations between belief in being personally lucky and favorable affective outcomes are reported by Day and Maltby, André, and Jiang et al. Further mirroring some of Maltby et al.’s findings, Thompson and Prendergast’s efforts to establish the nomological validity of the Belief in Personal Luckiness construct find it correlates positively with some affect-related measures, and they speculate it might perhaps constitute a facet of overall well-being. Hence:
  • H2. Belief in Personal Luckiness will be positively associated with happiness.

Discussion


Luck Beliefs and Happiness

Our finding that Belief in Luck is broadly negatively associated with happiness is consonant with Maltby et al.’s suggestion that Belief in Luck is perhaps a maladaptive trait. Consequently, any notion of happy-go-lucky individuals cheerfully trusting to luck would seem to be inaccurate, at least if those individuals believe in luck as a non-random, deterministic and external phenomenon. Indeed, insofar as such individuals may irrationally trust to luck as a deterministic phenomenon, they would seem to do so unhappily, not happily.
However, our finding that Belief in Personal Luckiness is positively associated with happiness tends to suggest the happy may indeed go lucky, in the sense that happiness and believing oneself to be lucky are associated. Of course, the relatively large size of associations we find here suggests that Belief in Personal Luckiness might in fact be a facet of an overall happiness construct. A possible implication of this is that Belief in Personal Luckiness’ association with any particular happiness measure could, perhaps, be fully accounted for by controlling other happiness measures. To investigate this possibility, we separately regressed each of the four measures of happiness on Belief in Personal Luckiness while simultaneously controlling for the three remaining happiness measures in each respective case, to see if Belief in Personal Luckiness maintained a significant beta. Doing so we found Belief in Personal Luckiness is not associated with either Positive or Negative Affect. However, Belief in Personal Luckiness is still significantly associated with Happiness (β = .09, p < .01; ΔR² = .05, p < .01), and Optimism (β = .09, p < .01; ΔR² = .06, p < .01). This would seem to support, partly at least, that Belief in Personal Luckiness may represent either a facet of happiness or a discrete personality trait positively associated with happiness.

Luck Beliefs, Five-Factor Model and Happiness

Neither Belief in Luck nor Belief in Personal Luckiness appears from our findings to be a mediator of the association between the five-factor model of personality and happiness.
Indeed, our analyses, in part, suggest the contrary: that Neuroticism fully mediates Belief in Luck’s association with happiness. This does not imply that Belief in Luck necessarily ‘causes’ Neuroticism, but it is reasonable to speculate that the underlying irrationality and the lack of both agency and self-determination that would seem to underpin Belief in Luck also to some extent underpin or are facets of Neuroticism. This would be consonant with previous research demonstrating significant relationships between Neuroticism and locus of control (Judge et al.; Morelli et al.), self-determination (Elliot and Sheldon; Elliot et al.), and irrational beliefs (Davies; Sava).
We do not find evidence for any component of the five-factor personality model mediating Belief in Personal Luckiness’ association with happiness, nor do we find evidence of any pronounced confounding effects between Belief in Personal Luckiness and the five-factor model and their respective associations with happiness. Hence, considering Belief in Personal Luckiness to be a trait discrete from fundamental personality models would on the basis of our findings not seem unreasonable. Nor would it seem unreasonable to suggest that Belief in Personal Luckiness might potentially be either a facet of happiness or a personality trait discrete from but associated with not just the five-factor model but also happiness.
Our conclusions here certainly seem to apply with greatest saliency to the most direct measure of trait happiness we used, Lyubomirsky and Lepper’s Subjective Happiness Scale, and to a lesser extent to Optimism, a measure closely allied with happiness (Brebner et al.; Chaplin et al.; Furnham and Cheng; Salary and Shaieri). However, while the pattern of relationships is broadly similar for both Positive Affect and Negative Affect, the effect sizes are smaller and either less significant or insignificant. This would suggest that, while both Positive Affect and Negative Affect are often used as proxies for happiness, they might perhaps best be regarded as constructs related to, rather than directly synonymous with, happiness.

Limitations and Further Research

While our research sheds new empirical light on the relationships between luck beliefs, happiness and the five-factor personality model, a number of limitations need to be kept in mind. As with any findings based on cross-sectional data, interpreting our findings in terms of directions of causality would be imprudent and, of course, constrained by the assumption of our research that happiness, luck beliefs, and the five-factor model are all personality traits rather than individual difference states. Personality traits may, of course, be associated in systematic patterns, but the very notion of traits being essentially innate and non-manipulable, unlike individual difference states, intrinsically excludes the possibility that one might be ‘caused’ by another. To take the five-factor model as an example, its five personality traits have a well-established systematic pattern of associations, but it would be implausible to suggest any of the five in any mechanistic sense causes another: they exist together discretely, with none generally argued to be a facet or sub-component or effect of the other. This said, an area for further research might be to examine the effects of trait luck beliefs on state affect that varies temporally and is manipulable, and hence susceptible to theorization and testing using either longitudinal or experimental data.
A further limitation to our study relates to necessary caution in generalizing its findings in view of the deliberately homogeneous population we used. Further research to replicate our findings amongst heterogeneous populations in terms of nationality, occupation, and socio-economic status would be useful as it has been shown across multiple domains that psychological characteristics and their relationships may vary accordingly (Becker et al.; Boyce and Wood; John and Thomsen; Rawwas; Thompson and Phua 2005; Winkelmann and Winkelmann). Furthermore, although each of the happiness and luck measures we employ has been individually validated across internationally diverse samples including Hong Kong Chinese, underlying conceptions of both are known to exhibit nuanced cultural differences (Lu and Gilmour; Lu and Shih; Raphals; Sommer), which conceivably could modify measured associations between them.
We also note that our study, in common with most research, has limitations due to the limited selection of measures with which we operationalized our investigation. We selected just four measures commonly used in studies of trait happiness, but several others exist, although some, like the Satisfaction with Life Scale (Diener et al.), can arguably be regarded as assessing state rather than trait happiness. We also selected a five-factor model measure that, while not as potentially prone to poor measurement validity as extremely short measures, is sufficiently brief as to exclude examination of possible relationships of each of the big-five elements on a sub-component basis. Certainly given our findings in relation to Neuroticism, further research using multi-component measures of this dimension of the five-factor model might prove illuminating.
In addition, research examining possible mediation and moderation effects of cognate psychology constructs such as, for example, locus of control (Pannells and Claxton; Verme), illusion of control (Larson; Erez et al.), and gratitude (Sun and Kong; Toussaint and Friedman) might help further the understanding of relationships between luck beliefs, happiness, and the five-factor model.

Predicting the replicability of social science lab experiments

Altmejd A, Dreber A, Forsell E, Huber J, Imai T, Johannesson M, et al. (2019) Predicting the replicability of social science lab experiments. PLoS ONE 14(12): e0225826. Dec 5 2019. https://doi.org/10.1371/journal.pone.0225826

Abstract: We measure how accurately replication of experimental results can be predicted by black-box statistical models. With data from four large-scale replication projects in experimental psychology and economics, and techniques from machine learning, we train predictive models and study which variables drive predictable replication. The models predict binary replication with a cross-validated accuracy rate of 70% (AUC of 0.77) and estimates of relative effect sizes with a Spearman ρ of 0.38. The accuracy level is similar to market-aggregated beliefs of peer scientists [1, 2]. The predictive power is validated in a pre-registered out of sample test of the outcome of [3], where 71% (AUC of 0.73) of replications are predicted correctly and effect size correlations amount to ρ = 0.25. Basic features such as the sample and effect sizes in original papers, and whether reported effects are single-variable main effects or two-variable interactions, are predictive of successful replication. The models presented in this paper are simple tools to produce cheap, prognostic replicability metrics. These models could be useful in institutionalizing the process of evaluation of new findings and guiding resources to those direct replications that are likely to be most informative.

1 Introduction

Replication lies at the heart of the process by which science accumulates knowledge. The ability of other scientists to replicate an experiment or analysis demonstrates robustness, guards against false positives, puts an appropriate burden on scientists to make replication easy for others to do, and can expose the various “researcher degrees of freedom” like p-hacking or forking [4–20].
The most basic type of replication is “direct” replication, which strives to reproduce the creation or analysis of data using methods as close to those used in the original science as possible [21].
Direct replication is difficult and sometimes thankless. It requires the original scientists to be crystal clear about details of their scientific protocol, often demanding extra effort years later. Conducting a replication of other scientists’ work takes time and money, and often has less professional reward than original discovery.
Because direct replication requires scarce scientific resources, it is useful to have methods to evaluate which original findings are likely to replicate robustly or not. Moreover, implicit subjective judgments about replicability are made during many types of science evaluations. Replicability beliefs can be influential when giving advice to granting agencies and foundations on what research deserves funding, when reviewing articles which have been submitted to peer-reviewed journals, during hiring and promotion of colleagues, and in a wide range of informal “post-publication review” processes, whether at large international conferences or small kaffeeklatches.
The process of examining and possibly replicating research is long and complicated. For example, the publication of [22] resulted in a series of replications and subsequent replies [23–26]. The original findings were scrutinized in a thorough and long process that yielded a better understanding of the results and their limitations. Many more published findings would benefit from such examination. The community is in dire need of tools that can make this work more efficient. Statcheck [27] is one such framework that can automatically identify statistical errors in finished papers. In the same vein, we present here a new tool to automatically evaluate the replicability of laboratory experiments in the social sciences.
There are many potential ways to assess whether results will replicate. We propose a simple, black-box, statistical approach, which is deliberately automated in order to require little subjective peer judgment and to minimize costs. This approach leverages the hard work of several recent multi-investigator teams who performed direct replications of experiments in psychology and economics [272829]. Based on these actual replications, we fit statistical models to predict replication and analyze which objective features of studies are associated with replicability.
We have 131 direct replications in our dataset. Each can be judged categorically by whether it replicated or not, by a pre-announced binary statistical criterion. The degree of replication can also be judged on a continuous numerical scale, by the size of the effect estimated in the replication compared to the size of the effect in the original study. As the binary criterion, we call replications with significant (p ≤ 0.05) effects in the same direction as the original study successful. For the continuous measure, we study the ratio of effect sizes, standardized to correlation coefficients. Our method uses machine learning to predict outcomes and identify the characteristics of study-replication pairs that can best explain the observed replication results [30–33].
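The two outcome measures described above can be sketched in a few lines of Python. This is only an illustration of the definitions as stated in the text, not the authors' code: the `ReplicationPair` class, its field names, and the example values are all hypothetical, and the ratio is assumed to be the replication effect divided by the original effect.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPair:
    """Hypothetical record for one original study and its replication."""
    r_original: float      # original effect size, as a correlation coefficient
    r_replication: float   # replication effect size, as a correlation coefficient
    p_replication: float   # p-value of the replication effect


def replicated(pair: ReplicationPair, alpha: float = 0.05) -> bool:
    """Binary criterion: significant (p <= alpha) and same sign as the original."""
    same_direction = pair.r_original * pair.r_replication > 0
    return same_direction and pair.p_replication <= alpha


def relative_effect_size(pair: ReplicationPair) -> float:
    """Continuous measure: replication effect divided by original effect (assumed order)."""
    if pair.r_original == 0:
        raise ValueError("original effect size must be non-zero")
    return pair.r_replication / pair.r_original


# Made-up example values, not taken from the dataset.
examples = [
    ReplicationPair(r_original=0.4, r_replication=0.2, p_replication=0.01),
    ReplicationPair(r_original=0.3, r_replication=-0.1, p_replication=0.03),
    ReplicationPair(r_original=0.5, r_replication=0.1, p_replication=0.20),
]
for pair in examples:
    print(replicated(pair), round(relative_effect_size(pair), 2))
```

Under these definitions, a replication can be significant yet still count as unsuccessful if its effect points in the opposite direction, as in the second example, which also has a negative effect-size ratio.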
We divide the objective features of the original experiment into two classes. The first contains the statistical design properties and outcomes: among these features we have sample size, the effect size and p-value originally measured, and whether a finding is an effect of one variable or an interaction between multiple variables. The second class is the descriptive aspects of the original study which go beyond statistics: these features include how often a published paper has been cited and the number and past success of authors, but also how subjects were compensated. Furthermore, since our model is designed to predict the outcome of specific replication attempts, we also include similar properties of the replication that were known beforehand. We also include variables that characterize the difference between the original and replication experiments, such as whether they were conducted in the same country or used the same pool of subjects. See S1 Table for a complete list of variables, and S2 Table for summary statistics.
The statistical and descriptive features are objective. In addition, for a sample of 55 of the study-replication pairs we also have measures of subjective beliefs of peer scientists about how likely a replication attempt was to result in a categorical Yes/No replication, on a 0-100% scale, based on survey responses and prediction market prices [12]. Market participants in these studies predicted replication with an accuracy of 65.5% (assuming that market prices reflect replication probabilities [34] and using a decision threshold of 0.5).
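To make the accuracy figure concrete, the following minimal sketch shows how market prices can be turned into Yes/No predictions with a decision threshold of 0.5 and then scored against observed outcomes. The function name `market_accuracy` and the example numbers are hypothetical, and treating a price exactly equal to the threshold as a "No" prediction is an assumption the text does not settle.

```python
from typing import Sequence


def market_accuracy(prices: Sequence[float],
                    outcomes: Sequence[bool],
                    threshold: float = 0.5) -> float:
    """Share of studies where the thresholded price matches the replication outcome.

    prices   -- final market prices, read as replication probabilities in [0, 1]
    outcomes -- True if the study replicated under the binary criterion
    """
    if len(prices) != len(outcomes):
        raise ValueError("prices and outcomes must have the same length")
    if not prices:
        raise ValueError("at least one study is required")
    # Assumption: predict "replicates" only when the price is strictly above the threshold.
    predictions = [price > threshold for price in prices]
    correct = sum(pred == out for pred, out in zip(predictions, outcomes))
    return correct / len(outcomes)


# Made-up prices and outcomes for four studies, not data from the paper.
prices = [0.8, 0.3, 0.6, 0.2]
outcomes = [True, False, False, False]
print(market_accuracy(prices, outcomes))  # 3 of 4 predictions match
```

The same scoring rule can be applied to a statistical model's predicted probabilities, which is what makes the comparison between model and market a like-for-like one.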

Our proposed model should be seen as a proof-of-concept. It is fitted on an arguably too small data set with an indiscriminately selected feature set. Still, its performance is on par with the predictions of professionals, hinting at a promising future for the use of statistical tools in the evaluation of replicability.