Agency and self-other asymmetries in perceived bias and shortcomings: Replications of the Bias Blind Spot and extensions linking to free will beliefs. Prasad Chandrashekar et al. March 2019. DOI: 10.13140/RG.2.2.19878.16961
Description: Bias Blind Spot is the phenomenon that people tend to perceive themselves to be less susceptible to biases than others. In three pre-registered experiments with samples from Hong Kong and the United States (overall N = 969), we replicated two experiments (Study 1-Survey 2 and Study 2) from Pronin, Lin, and Ross (2002), the first published demonstration of the effect. Participants rated themselves lower than others in both susceptibility to biases (mini meta-analysis: dz = - 1.00) and personal shortcoming (mini meta-analysis: dz = - 0.34). The self-other asymmetry of susceptibility for biases was larger than that of personal shortcomings (dz = - 0.43). Thus, the replication findings provide strong empirical support for the bias blind spot phenomenon. Extending the replications, belief in free will was positively associated with the susceptibility to better than average effect, and also with a stronger self-other asymmetry in rating personal shortcomings.
---
The bias blind spot is the phenomenon that people tend to perceive themselves as less
biased than others in their judgements and behaviors (Pronin, 2007; Pronin et al., 2002) and
further tend to perceive their peers as more subject to the bias blind spot than they are. Broadly,
people seem to be able to detect or infer biases in others but fail to do so about themselves
(Pronin, Gilovich, & Ross, 2004).
We had two main goals for the present investigation: (1) to conduct replications of the
bias blind spot effect, and (2) to examine extensions about the link between agency beliefs and
the bias blind spot. The proposed extension was meant to answer further calls for future research
to investigate possible “causes of people’s unwarranted faith in their own introspections?”
(Pronin, 2007, p. 41). We begin by introducing the literature on the bias blind spot and the chosen
target article for replication - Pronin et al. (2002). We then introduce agency beliefs and
hypothesize about the relationship between free will beliefs and the bias blind spot effect.
Bias blind spot and free will beliefs 5
Bias blind spot
People have access to their own private mental lives but not to those of others, and so to
overcome this information asymmetry people continuously aim to detect and infer other people's
internal psyche. This information asymmetry often results in unintended and unaware self-other
attributional asymmetries (Nisbett & Ross, 1980).
Many such attribution asymmetries have been documented over the years, such as the
widely discussed actor-observer bias (Jones & Nisbett, 1972), in that divergent responses of
others are perceived to reflect others’ stable personality dispositions (Jones, 1990). Individuals
also tend to believe that their perceptions of the world are objective and accurate, a tendency termed
"naïve realism" (Ross & Ward, 1995), easily detecting flaws in this assumption in others but
failing to detect such flaws in themselves. They may further assume that they self-reflect more
than others do, since when people evaluate their own behaviors and judgments, they base their
evaluations on introspection, whereas when evaluating others their assessments must rely on
others' behavior (Pronin, 2007; Pronin et al., 2004; Ross & Ward, 1995; Wilson & Dunn, 2004).
Finally, people tend to view themselves in a positive light or construe reality in a way that would
make for more positive self-evaluations. Together, naïve realism, the introspection illusion, and
self-enhancement motives result in a bias blind spot, a self-other asymmetry in perceptions of bias.
People further reject the possibility of their own bias and persist in their biased perceptions even
when made aware of its existence (Ehrlinger, Gilovich, & Ross, 2005; Pronin, 2007; Pronin & Kugler, 2007).
The chosen bias blind spot experiments
The work of Pronin et al. (2002) was the first demonstration of the bias blind spot. The article
has been influential with over 765 citations according to Google Scholar at the time of writing,
with theoretical developments and applications across several domains, such as judgment and
decision making, behavioral economics, assessment, and interpersonal and intergroup conflict
(e.g., Pronin et al., 2004). The bias blind spot is argued to be a distinct meta-bias with clear
implications for judgment and behavior (Scopelliti et al., 2015). For example, subsequent
empirical research showed relevance of the effect to law in viewing and remembering criminal
events (Jones, Crozier, & Strange, 2018), with observations of the effect in children as young as
six years old (Hagá, Olson, & Garcia-Marques, 2018), and evidence for the persistence of bias
despite being shown how it affected previous decisions (Hansen, Gerbasi, Todorov, Kruse, &
Pronin, 2014). To the best of our knowledge, there have been no previous attempts at a direct
replication of the experiments reported in the paper.
The target article consisted of three experiments, and the current replications focused on
Survey 2 of Study 1 and Study 2. Findings in Pronin et al. (2002) are summarized in Table 1.
In their Study 1, participants separately rated their own and average Americans'
susceptibility to eight cognitive and motivational biases and three personal shortcomings.
Pronin et al. (2002) proposed that participants would demonstrate clear asymmetry in
susceptibility to biases but not to shortcoming items because people are more likely to be aware of
their visible personal shortcomings than of their invisible cognitive biases. In view of this, they proposed
and tested three hypotheses. The first hypothesis suggested that participants would rate themselves
as less susceptible than others on biases. The second hypothesis suggested that there should be no
difference between the ratings of self-susceptibility and others’ susceptibility towards personal
shortcomings (null hypothesis). The third hypothesis, combining the first two, suggested that
asymmetry effects would, therefore, be larger for biases than for personal shortcoming items.
In their Study 2, participants compared themselves to others on six personality
dimensions, three positive and three negative. Findings indicated that participants rated
themselves as higher on positive personality dimensions and lower on negative ones, compared to
others. Furthermore, a large majority (76%) of participants who claimed better-than-average
status insisted on this status even when alerted to the possible bias.
Adjustments to original design
We attempted a close replication of the original study by Pronin et al. (2002) yet made
several needed adjustments. First, we administered all surveys via an online Qualtrics survey.
Second, the two studies from Pronin et al. (2002) chosen for replication included Stanford
University undergraduate students who were not paid for completing the study. The current
replication effort of three studies included one undergraduate sample from a university in Hong Kong
and two paid samples recruited through Amazon Mechanical Turk (MTurk). Third, the Study 2 and Study 3
replications combined the original article's Studies 1 and 2 into an integrated design. Fourth, we
went beyond the replication and added extensions to the original design to examine the link
between agency beliefs and self-other bias asymmetries.
Extensions - belief in free will as a predictor of the bias blind spot
We aimed to extend the replication study by considering individuals’ perceived agency as
a predictor of the bias blind spot - whether perceivers’ beliefs in free will predict self-other
asymmetries regarding biases and personal shortcomings. Belief in free will is the general belief
that human behavior is free from internal and external constraints across situations (Feldman,
2017; Monroe & Malle, 2014). Studies on folk understanding of free will found that people
normally associate free will with having choice and understand free will as the absence of
internal and external constraints (Baumeister, 2008; Feldman, Baumeister, & Wong, 2014;
Monroe, Dillon, & Malle, 2014; Monroe & Malle, 2010; Vonasch, Baumeister, & Mele, 2018).
Belief in free will has been associated with a range of adaptive behavioral and
psychological outcomes such as academic and job performance (Feldman, Chandrashekar, &
Wong, 2016; Stillman et al., 2010), perseverance for long-term goals (Li, Zhao, Lin, Chen, &
Wang, 2018), self-control (Goto, Ishibashi, Kajimura, Oka, & Kusumi, 2018), expressions of
love (Boudesseul, Lantian, Cova, & Bègue, 2016), job satisfaction (Feldman, Farh, & Wong,
2018), cooperation (Protzko, Ouimette, & Schooler, 2016), and well-being and meaning in life
(Crescioni, Baumeister, Ainsworth, Ent, & Lambert, 2016; Moynihan, Igou, & van Tilburg,
2017).
Although the research outlined above largely emphasizes the effects of believing in free
will on individuals’ self-regulatory behavior, some recent work suggests that free-will beliefs
affect fundamental social-cognitive processes that are implicated in the judgements of self and
others. For example, Genschow, Rigoni, and Brass (2017) found a positive relationship between
the strength of the belief in free will and the correspondence bias, i.e., the tendency to endorse
dispositional rather than situational explanations. People with a strong belief in free will
tend to view the actions of wrongdoers as the result of choices that are freely made and therefore
endorse harsher punishments (Clark et al., 2014; Martin, Rigoni, & Vohs, 2017). Similarly, free
will beliefs influence judgements about the extent to which individuals’ choices determine their
economic outcomes (Mercier et al., 2018), and more broadly form the basis of a capacity for
change, not only for others but also for the self (Feldman, Wong, & Baumeister, 2016).
As discussed above, several findings are suggestive of the possible relationship between
free will beliefs and the bias blind spot. Personal shortcomings can be viewed as a form of
internal constraints of free will (e.g., fear of public speaking, planning fallacy, and
procrastination). Those with stronger free will beliefs are likely to perceive their own behaviors as the
outcome of their own choices (Feldman et al., 2014) and to have more control over decisions in life
(Rigoni, Kühn, Gaudino, Sartori, & Brass, 2012; Stillman, Baumeister, & Mele, 2011). In
summary, free will beliefs reflect a view of the self as an active agent with freedom to choose
actions and pursue goals, and therefore, should encourage the view of the self as devoid of
internal constraints that may impose limitations on the self’s ability to make choices. Supporting this
view, people with a history of addiction to alcohol, tobacco, and other drugs rate themselves
lower on belief in free will (Vonasch, Clark, Lau, Vohs, & Baumeister, 2017). This is suggestive
of a negative relationship between free will beliefs and perceived personal shortcomings.
Furthermore, the work on the association between free will beliefs and correspondence bias
suggests that free-will beliefs would be associated with perceiving others as more affected by their
personal shortcomings. Combined, the two are suggestive of a positive relationship between free
will beliefs and the bias blind spot regarding personal shortcomings. We initially made no
pre-registered predictions regarding perceived biases, although similar arguments can be made.
Exploratory Hypotheses
We did not make a priori predictions regarding associations between free will beliefs and
susceptibility to biases of the self and others. Belief in free will at its core is experienced as an
increased sense of agency, therefore believers perceive their own behaviors as generated by
themselves, rather than external forces (Rigoni et al., 2012). They therefore view their own
judgements and behaviors as lacking in biases, and by extension, may exhibit a larger self-other
asymmetry in perceived bias. Extending this argument to susceptibility to the better-than-average
effect, belief in free will is likely to be negatively associated with ratings of negative
personality dimensions of self in comparison to others, and positively associated with ratings of
positive personality dimensions of self in comparison to others.
Overview of empirical studies
There was a two-week gap between the two data collections of Studies 1 and 2. In each of
the replication studies, we first pre-registered the experiment on the Open Science Framework
(OSF) and data collection was launched later that week. Pre-registrations, power analyses, and all
materials used in these experiments are available in the supplementary materials. OSF
pre-registration review links: Study 1a -
https://osf.io/fwthk/?view_only=744526890b674a9fbec72acc37a79c86 ; Study 1b -
https://osf.io/qmcrn/?view_only=4820ad08078b4b5a860b08c0234c7229 ; Study 2 -
https://osf.io/fm48b/?view_only=60e6cf6df39147e0af1b28f4e7da0d4c.
In light of findings from the first two studies, Study 3 was designed to extend the findings.
Importantly, we wanted to replicate the proposed extensions in Study 2 with a larger sample
to be able to detect smaller effect sizes. We preregistered our hypotheses and analysis plan on the
OSF, review link: https://osf.io/u3vds/?view_only=42450fc3d6b74866a1c022e7bfd299a9.
Data and R/RMarkdown code for all studies are available on the OSF, review link:
https://osf.io/3df5s/?view_only=b29f8571eb874448907ce45c7379e371 . Full open-science
details and disclosures are provided in the supplementary. All measures, manipulations, and
exclusions conducted for this investigation are reported, all studies were pre-registered with
power analyses reported in the supplementary, and data collection was completed before
analyses.
Studies 1a and 1b
Studies 1a and 1b were meant as a pre-test of the effects in an undergraduate class.
Students worked in teams of 3-6 to design and run a series of replications, two of which were
Pronin et al.'s Study 1 Survey 2 and Study 2 corresponding to our Study 1a and 1b. The students
then served as the target sample for the experiments designed by their classmates, experiments
they were not involved in designing and had no prior knowledge of. The course materials covered
judgement and decision-making biases, which meant that the students were made aware of a wide
array of other biases, and the experiments are, therefore, very conservative tests of the effect in a
non-naive sample.
Students were randomly assigned into groups and to the study for replication. Student
groups designed the survey, calculated effect sizes and confidence intervals,
conducted power analyses, and wrote the pre-registrations for Studies 1a and 1b. The course
instructor completed the pre-registration on OSF and data collection. All the students registered
in the course were invited to take part as respondents in the study. To ensure anonymity, students
were only asked to indicate which replication group they belonged to and those were later
excluded from the data analysis of the study they designed.
Participants and procedures
A total of 49 undergraduate students took part in the online course survey, and of those we
excluded the four students who designed Study 1a and six students who designed Study 1b,
resulting in a sample of 45 for Study 1a (Mage = 20.20, SD = 0.99; 31 females) and 43 for Study
1b.
Study 1a
Measures
Biases and Personal shortcomings.
Participants were presented with descriptions of eight biases and three personal
shortcomings. The biases were self-serving attributions for success or failure, dissonance reduction
after free choice, positive halo effect, biased assimilation of new information, reactive devaluation of
proposals from one’s negotiation counterparts, perceptions of hostile media bias toward one’s group or
cause, fundamental attribution error (FAE) in “blaming the victim,” and judgments about the
“greater good” influenced by personal self-interest. The personal shortcomings were procrastination,
fear of public speaking, and planning fallacy. The supplementary includes detailed descriptions of
the biases and personal shortcomings.
For each of the descriptions, participants rated their own susceptibility and the
susceptibility of the average student at the university. Ratings were on a nine-point scale (1 = not
at all; 9 = strongly).
Results and discussion
Descriptive statistics of the ratings on the susceptibility to biases and personal
shortcomings are presented in Table 2 (see supplementary for the descriptive statistics and plots
for each of the biases and personal shortcomings). We conducted paired-sample t-tests to test
the hypotheses, summarized in Table 3.
Results of paired t-tests (one-tailed) indicated that participants, consistent with the original
study, reported themselves as less susceptible to biases (M = 5.60, SD = 0.86) than the average
student at the university (M = 6.35, SD = 0.91), Md = -0.75, t (44) = -4.54, p < .001, dz = -0.68,
95% CI [-1.01, -0.35] (plotted in Figure 1). Self-other asymmetry was found for all individual
biases except for cognitive dissonance (Table S3 in the supplementary).
In the original study, the authors made no prediction regarding self-other personal
shortcomings asymmetry. We conducted a two-tailed dependent t-test but failed to find support for
a difference, with only a weak effect in the direction of lower ratings for self (self: M = 6.20, SD = 1.78;
others: M = 6.49, SD = 1.23; Md = -0.29; t (44) = -1.13, p = .265; dz = -0.17, 95% CI [-0.47, 0.13]; see Figure
2 and Table S3 in the supplementary for details per each shortcoming). Quite possibly, as in the
original article, the sample was too small to detect a weak effect.
Finally, self-other bias asymmetry (M = -0.75, SD = 1.11) was stronger than self-other
personal shortcomings asymmetry (M = -0.29, SD = 1.72; Md = -0.46, t (44) = -1.97, p = .055, dz
= -0.29, 95% CI [-0.60, 0.01]; see Figure 3).
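As a rough check on how the paired effect sizes relate to the reported summary statistics, the sketch below recomputes dz and t for the Study 1a bias asymmetry from the mean and SD of the self-minus-other differences given in the text. The function name `paired_effect` is illustrative and not taken from the authors' R code; because the inputs are rounded to two decimals, small deviations from the reported t (44) = -4.54 are to be expected.

```python
import math

def paired_effect(mean_diff, sd_diff, n):
    """Return (dz, t, df) for a paired design from summary statistics.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    dz = mean_diff / sd_diff      # Cohen's dz: mean difference / SD of the differences
    t = dz * math.sqrt(n)         # paired-sample t statistic
    return dz, t, n - 1           # degrees of freedom = n - 1

# Study 1a bias asymmetry (self minus others), rounded values as reported in the text
dz, t, df = paired_effect(mean_diff=-0.75, sd_diff=1.11, n=45)
print(f"dz = {dz:.2f}, t({df}) = {t:.2f}")
```

With these rounded inputs the sketch gives dz of about -0.68 and t of about -4.53, in line with the reported values.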
Study 1b
Measures
Assessed Personality dimensions.
Participants were presented with three positive and three negative personality dimensions
in randomized order. The positive dimensions assessed were dependability, objectivity, and
consideration. The negative dimensions assessed were snobbery, deceptiveness, and selfishness.
The ratings were made on a 9-point scale (1 = much lower than the average student; 5 = same as the
average student; 9 = much higher than the average student).
Bias recognition.
After rating their personalities, participants were briefed about the better-than-average effect
and asked whether they were influenced by the bias when assessing their personalities (1 -
Objective measures would rate me lower on positive characteristics and higher on negative
characteristics than I rated myself; 2 - Objective measures would rate me neither more positively
nor more negatively than I rated myself; 3 - Objective measures would rate me higher on positive
characteristics and lower on negative characteristics than I rated myself).
Results and discussion
Table 4 details descriptive statistics and Table 5 summarizes statistical tests (Table S4 in
the supplementary details ratings for each personality dimension).
We conducted one-sample one-tailed t-tests and found that participants rated themselves as
having more positive personality dimensions (M = 5.74, t (42) = 5.09, p < .001, dz = 0.78, 95% CI
[0.43, 1.11]) and less negative personality dimensions than others (M = 4.16, t (42) = -4.55, p <
.001, dz = -0.69, 95% CI [-1.02, -0.36]).
We then conducted a chi-squared test to test the hypothesis that the majority of
participants deny exhibiting the better-than-average effect, compared to a 50%-50% random split.
Despite being made aware of the potential bias, only 9 of the 43 participants (21%)
acknowledged their potential bias, leaving 79% of participants still claiming to be better than their
average peers (χ2 (1, N = 43) = 14.53, p < .001, dz = 1.43, 95% CI [0.69, 2.16]).
Findings supported the better-than-average effect and participants' denial of their own bias. The effect
size (dz = 1.43) of the replication was about twice the effect size of the original
study (dz = 0.70), and the replication’s confidence interval ([0.69, 2.16]) includes the original
effect size point estimate. We conclude that the replications were successful.
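The chi-squared test against an even split can be illustrated with a short sketch using the reported counts (9 participants acknowledging the bias, 34 denying it). This is a minimal standard-library version, not the authors' R code; the function name `chi_square_even_split` is hypothetical, and the p-value uses the identity that the upper tail of a chi-squared distribution with 1 degree of freedom equals erfc(sqrt(x / 2)).

```python
import math

def chi_square_even_split(count_a, count_b):
    """Goodness-of-fit chi-squared test of two counts against a 50%-50% split.

    Hypothetical helper for illustration; returns (chi2, p) with df = 1.
    """
    n = count_a + count_b
    expected = n / 2                              # expected count per cell under 50%-50%
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p = math.erfc(math.sqrt(chi2 / 2))            # upper-tail probability for df = 1
    return chi2, p

# Study 1b: 9 of 43 participants acknowledged the bias, 34 did not
chi2, p = chi_square_even_split(9, 34)
print(f"chi2(1, N = 43) = {chi2:.2f}, p = {p:.5f}")
```

The statistic from this sketch rounds to 14.53, matching the value reported above.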
Study 2
Method
Participants and procedures
A total of 303 American Amazon Mechanical Turk (MTurk) participants completed the
study using TurkPrime.com (Mage = 38.45, SD = 11.58; 166 females). First, participants rated
their free will beliefs on two scales and then rated their own and others' susceptibility to the
descriptions of eight biases and three personal shortcomings. The design was a 2 (self and other
ratings) by 2 (biases and personal shortcomings) within-subject design and display of conditions
was counterbalanced (see supplementary for more details and full measures). Participants then
answered a funneling section and provided demographic information.
Measures
Belief in Free will.
Free will beliefs (BFW) were measured using two free-will belief subscales: a five-item
measure of general BFW (Nadelhoffer, Shepard, Nahmias, Sripada, & Ross, 2014) (1 = Strongly
disagree, 7 = Strongly agree; 𝛼 = 0.91) and BFW personal agency subscale (Rakos, Laurene,
Skala, & Slane, 2008) (4 items; 1 = Not true at all, 5 = Almost always true; 𝛼 = 0.92). Details of
all measures are provided in the supplementary.
Biases and Personal shortcomings.
As in Study 1a, participants rated their own and the average American's
susceptibility to biases and personal shortcomings (1 = not at all; 9 = strongly).
Results
Descriptive statistics are provided in Table 6 and statistical tests summary in Table 7 (see
Table S6 and Table S7 and Figures S5 to S8 in the supplementary for each of the biases and
personal shortcomings separately). We conducted a dependent sample t-test and found that
participants' perceived susceptibility to biases for self (N = 303; M = 4.64, SD = 1.35) was lower
than that of others (M = 5.78, SD = 1.16; Md = -1.15; t (302) = -16.16, p < .001; dz = -0.93, 95% CI
[-1.06, -0.79]; see Figure 4), and the self-other asymmetry effects were similar across all eight
biases (p < .001; see Table S8 in supplementary). In comparison, the original study found support
for only four of the eight biases and with weaker effects, possibly due to lacking power.
The original study found no support for self-other asymmetry in perceived personal
shortcomings, and the hypothesis was for a null (or weaker) effect. We conducted a dependent
sample t-test and found that perceived personal shortcomings of self (M = 5.35, SD = 1.88) were lower
than perceived personal shortcomings of others (M = 5.87, SD = 1.35; Md = -0.52; t
(302) = -5.22, p < .001, dz = -0.30, 95% CI [-0.42, -0.18]; see Figure 5). The original study
reported self as lower than others for all three perceived personal shortcomings, yet it was not
reported if any of the results reached statistical significance. Our dependent sample t-tests found
support for an asymmetry for two out of three personal shortcomings (procrastination and
planning fallacy; see Table S8 in supplementary for details on each of the personal
shortcomings). These findings deviate from the findings of the original study.
Based on the findings in the original study, we expected a significant difference between
the biases and personal shortcomings asymmetries. We conducted a dependent sample t-test and
indeed found that the self-other biases asymmetry (M = -1.15, SD = 1.24) was larger than the
self-other personal shortcomings asymmetry (M = -0.52, SD = 1.75; N = 303; Md = -0.62, t (302) =
-6.39, p < .001, dz = -0.37, 95% CI [-0.48, -0.25]; see Figure 6).
Finally, we examined the link between free will beliefs and perceived personal
shortcomings of self and others. Pearson correlations are detailed in Table 8. Both free will
beliefs scales were negatively correlated with perceived self personal shortcomings (general free
will: r = -0.22, p < .001, 95% CI [-0.32, -0.11]; personal agency: r = -0.17, p = .003, 95% CI
[-0.28, -0.06]). However, we found no support for a link between free will beliefs measures and
perceived shortcomings in others (general free will: r = 0.00, p = .941, 95% CI [-0.11, 0.12];
personal agency: r = 0.05, p = .357, 95% CI [-0.06, 0.16]). Free will beliefs negatively correlated
with personal shortcomings self-other asymmetry (general free will: r = -0.24, p < .001, 95% CI
[-0.34, -0.13]; personal agency: r = -0.22, p < .001, 95% CI [-0.33, -0.11]).
Probing the link between free will beliefs and susceptibility to biases, we only found
support for the personal agency subscale as negatively correlated with self-other asymmetry for
susceptibility to bias (r = -0.17, p = .003, 95% CI [-0.28, -0.06]). We did not find support for
correlations between free-will beliefs and any of the measures associated with susceptibility to biases:
self-bias (general free will: r = -0.01, p = .811, 95% CI [-0.13, 0.10]; personal agency: r = -0.11,
p = .055, 95% CI [-0.22, 0.00]), and others' bias (general free will: r = 0.02, p = .737, 95% CI
[-0.09, 0.13]; personal agency: r = 0.05, p = .366, 95% CI [-0.06, 0.16]).
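The confidence intervals reported for the correlations are consistent with the usual Fisher z approximation. The sketch below assumes that method (the source does not state which method was used) and applies it to the correlation between general free will beliefs and perceived self personal shortcomings (r = -0.22, N = 303). The function name `pearson_ci` is illustrative.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z transform.

    Assumed method for illustration; the source does not name its CI procedure.
    """
    z = math.atanh(r)                 # Fisher z transform of r
    se = 1 / math.sqrt(n - 3)         # standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

# General free will beliefs and self personal shortcomings in Study 2
lower, upper = pearson_ci(-0.22, 303)
print(f"r = -0.22, 95% CI [{lower:.2f}, {upper:.2f}]")
```

For these inputs the interval rounds to [-0.32, -0.11], the same as the reported interval.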
Study 3
Method
Participants and procedures
A total of 621 American MTurk participants completed the study using TurkPrime.com
(Mage = 39.15, SDage = 11.88; 346 females). The rationale of the present study was as follows:
First, to test the robustness and reliability of the replication results found in Study 1a, Study 1b,
and Study 2 with a larger sample. Second, and more importantly, to replicate the proposed extension
hypotheses between free will beliefs and perceived personal shortcomings of self and others.
Study 3 combined Study 1a and Study 1b into a single study. Procedures were modeled to
remain as close as possible to the original studies. The study included three parts. Participants
first completed the measures of free will beliefs. Then, in randomized order, participants rated
their own and others' susceptibility to given descriptions of eight biases and three personal
shortcomings, and compared themselves to others on three positive and three negative
personality dimensions with a test for recognition of their bias.
Measures
The measures for biases and personal shortcomings followed the design of Study 2. The
measures of personality dimensions and bias recognition were the same as in Study 1b.
Free will beliefs were measured by three of the most common scales: eight items of the free-will
and determinism personal will sub-scales (Rakos et al., 2008) (0 = Not true at all, 4 = Almost
always true; 𝛼 = 0.74), five items of the general free will beliefs scale (Nadelhoffer et al., 2014)
(1 = Strongly disagree, 7 = Strongly agree; 𝛼 = 0.89) and the seven items from free will and
determinism plus scale (Paulhus & Carey, 2011) (1 = Not at all true, 5 = Always true; 𝛼 = 0.85;
recoded from a scale of 0 to 4 to match the original scale range).
Results
Table 9 details descriptive statistics and Table 10 summarizes statistical tests (see Table
S11 and Table S12 and Figures S9 to S12 in supplementary for each of the biases and personal
shortcomings separately).
Perceived susceptibility to bias and personal shortcomings
We conducted a series of dependent sample t-tests mirroring Studies 1a and 2 (N = 621).
Perceived susceptibility to biases was lower for self (M = 4.69, SD = 1.30) than for others (M =
6.48, SD = 1.04; Md = -1.80; t (620) = -32.04, p <.001, dz = -1.29, 95% CI [-1.39, -1.18]; see
Figure 7), in all biases (p < .001; see Table S13 in supplementary). Perceived shortcomings were
lower for self (M = 5.52, SD = 1.71) than for others (M = 6.25, SD = 1.16; Md = -0.73; t (620) =
-10.54, p < .001, dz = -0.42, 95% CI [-0.51, -0.34]; see Figure 8), especially in procrastination and
planning fallacy (see Table S13 in supplementary). Self-other asymmetry was larger for biases
(M = -1.80, SD = 1.40) than for personal shortcomings (M = -0.73, SD = 1.73; Md = -1.06; t (620)
= -13.01, p <.001, dz = -0.52, 95% CI [-0.61, -0.44]; see Figure 9). The results align with the
findings in Study 2.
Denying Personal Susceptibility to the Better Than Average Effect
We conducted a series of one-sample t-tests mirroring Study 1b. Participants rated
themselves as possessing more positive personality dimensions (M = 6.42; t (620) = 31.74, p
<.001; dz = 1.27, 95% CI [1.17, 1.38]) and less negative personality dimensions (M = 3.21; t
(620) = -30.38, p <.001; dz = -1.22, 95% CI [-1.32, -1.11]), compared to others.
To assess denial of the bias, we conducted a chi-square test against a 50%-50% split.
Only 109 of the 621 participants (18%) admitted bias, leaving 82% denying the bias (χ2 (1, N =
621) = 261.53, p < .001; dz = 1.71, 95% CI [1.50, 1.91]).
Free-will beliefs and biases
Finally, we examined the link between free will beliefs and perceived personal
shortcomings of self and others. Pearson correlations are detailed in Table 11.
Belief in free will and personal shortcomings.
Personal shortcomings for self were negatively associated with free-will beliefs (general: r
= -0.16, p < .001, 95% CI [-0.23, -0.08]; personal agency: r = -0.15, p < .001, 95% CI [-0.22,
-0.07]; personal will: r = -0.09, p = .022, 95% CI [-0.17, -0.01]). However, we found no consistent
support for a link between free will beliefs measures and perceived shortcomings in others
(general: r = 0.02, p = .551, 95% CI [-0.05, 0.10]; personal agency: r = 0.05, p = .231, 95% CI
[-0.03, 0.13]; personal will: r = 0.11, p = .007, 95% CI [0.03, 0.18]).
Free will beliefs negatively correlated with personal shortcomings self-other asymmetry
(general free will: r = -0.17, p < .001, 95% CI [-0.25, -0.09]; personal agency: r = -0.18, p < .001,
95% CI [-0.25, -0.10]; personal will: r = -0.16, p < .001, 95% CI [-0.24, -0.09]).
Probing the link between free will beliefs and susceptibility to biases, we found support for
personal will as negatively correlated with susceptibility to bias of the self (r = -0.14, p < .001,
95% CI [-0.22, -0.06]) and positively correlated with the susceptibility to bias of others (r = 0.12,
p = .003, 95% CI [0.04, 0.20]). Overall personal will was negatively correlated with self-other
asymmetry for susceptibility to bias (r = -0.22, p < .001, 95% CI [-0.29, -0.14]). We found no
support for a correlation with the two other measures of free will beliefs (correlations ranged
between 0.01 CI [-0.07, 0.09] and -0.03 CI [-0.11, 0.05]).
Belief in free will and better than average effect
We found support for an exploratory negative relationship between free-will beliefs and
negative personality dimensions (general free-will: r = -0.09, p = .033, 95% CI [-0.16, -0.01];
personal agency: r = -0.16, p < .001, 95% CI [-0.23, -0.08]; personal will: r = -0.12, p = .003,
95% CI [-0.20, -0.04]). Positive personality dimensions were positively correlated with personal
will (r = 0.15, p < .001, 95% CI [0.07, 0.23]), but we found no support for a positive correlation with the
two other measures (general free will: r = 0.04, p = .341, 95% CI [-0.04, 0.12]; personal agency:
r = 0.05, p = .222, 95% CI [-0.03, 0.13]).
Denial of bias correlated with general free-will (r = 0.11, p = .007, 95% CI [0.03, 0.19])
and personal agency (r = 0.14, p < .001, 95% CI [0.06, 0.22]), with no support for personal will
(r = 0.03, p = .531, 95% CI [-0.05, 0.10]).
Overall, the free will findings are consistent with the results of Study 2.
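The correlations in this section are reported with 95% confidence intervals, but the text does not say how those intervals were computed. One common approach is Fisher's z-transformation, sketched below. The function name `fisher_ci` and the sample size n = 680 are illustrative assumptions, not values reported in this section.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation r
    from a sample of size n, using Fisher's z-transformation."""
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    lower = z - z_crit * se
    upper = z + z_crit * se
    # back-transform the interval limits to the r scale
    return math.tanh(lower), math.tanh(upper)


# n = 680 is a hypothetical sample size chosen for illustration only
lower, upper = fisher_ci(0.11, 680)
print(f"r = 0.11, 95% CI [{lower:.3f}, {upper:.3f}]")
```

With a sample of roughly this size, a correlation of 0.11 yields an interval that excludes zero, while correlations near 0.02 to 0.05 do not, which matches the pattern of supported and unsupported links described above.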
General Results: Mini Meta-Analysis
We summarized the findings of the three studies together with the findings from the original
article using a mini meta-analysis to assess the overall effect size (Goh, Hall, & Rosenthal, 2016;
Lakens & Etz, 2017). The overall effects for Study 1 Survey 2 of the original study were as
follows: bias asymmetry = -0.98 (95% CI = [-1.25, -0.72], p < .001) (see Figure 10), personal
shortcomings asymmetry = -0.19 (95% CI = [-0.47, 0.08], p = .158) (see Figure 11), bias versus
shortcomings difference = -0.44 (95% CI = [-0.56, -0.32], p < .001) (see Figure 12).
Similarly, overall effects for Study 2 of the original study were as follows: better than
average effect for positive personality dimensions = 1.22 (95% CI = [0.78, 1.66], p < .001) (see
Fig. 13), better than average effect for negative personality dimensions = -1.07 (95% CI = [-1.39,
-0.75], p < .001) (see Fig. 14), and denial of the better than average effect = 1.32 (95% CI = [0.72,
1.91], p < .001) (see Fig. 15).
General Discussion
Summary and evaluation of replications
We conducted three replication studies of two studies from Pronin et al. (2002), testing the
bias blind spot effect. We summarized the findings of the three replication studies in Table 12.
Overall, we found that: (1) participants perceived their susceptibility to biases as lower than that
of others, (2) participants perceived their own personal shortcomings as lower than those of others,
(3) bias asymmetry was larger than personal shortcomings asymmetry, (4) participants rated
themselves as higher on positive personality dimensions and lower on negative personality
dimensions, and (5) participants denied exhibiting the bias.
The first aim of the current replication effort is to evaluate, in a confirmatory manner,
the size of an effect observed in the original study. To interpret the replication results we
followed the framework by LeBel, McCarthy, Earp, Elson, and Vanpaemel (2018) that takes into
account three distinct statistical aspects of the results: (a) whether a signal was detected in the
replication (i.e., the confidence interval for the replication effect size (ES) excludes zero), (b)
consistency of the replication ES with the original study’s ES, and (c) precision of the
replication’s ES estimate. The replication ES for asymmetry in bias in the three individual studies
ranged between dz = -0.68 [-1.01, -0.35] and dz = -1.29 [-1.39, -1.18]. When pooled across all
studies with a mini meta-analysis, the overall estimate of the ES was: dz = -1.00 [-1.33, -0.67].
The results indicate that a signal was detected and that the replication ES is consistent with the original
study, i.e., the replication’s confidence interval includes the original ES point estimate of 0.86.
The replication results testing the asymmetry in personal shortcomings across the three studies
ranged between dz = -0.17 [-0.47, 0.13] and dz = -0.42 [-0.51, -0.34]. Comparing the meta-analytic
estimate (dz = -0.34 [-0.46, -0.23]) with the original study suggests that, although a signal
was detected, the replication ES is inconsistent with, and opposite in direction to, the original ES
point estimate of 0.28. This is a less favorable replication outcome, suggesting that the small sample
size in the original study may have contributed to the observed effect. Finally, for the hypothesis
testing the asymmetry between bias and personal shortcomings, the ES in the current replication studies
ranged between dz = -0.29 [-0.60, 0.01] and dz = -0.52 [-0.61, -0.44]. The meta-analytic estimate of
the ES (dz = -0.43 [-0.56, -0.29]) is inconsistent with the original ES point estimate of -0.61, i.e.,
similar in direction but smaller than the ES of the original study.
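The three criteria described above (signal, consistency, precision) can be expressed as simple interval checks. The following is a minimal sketch, not the procedure published by LeBel et al. (2018); the function name `evaluate_replication` and the returned field names are hypothetical, and the input values are the meta-analytic estimates quoted in this paragraph.

```python
def evaluate_replication(rep_es, rep_ci, orig_es):
    """Summarize a replication against the original effect size (ES).

    rep_es  -- replication ES point estimate
    rep_ci  -- (lower, upper) confidence interval of the replication ES
    orig_es -- original study's ES point estimate
    """
    lower, upper = rep_ci
    return {
        # (a) signal: the replication CI excludes zero
        "signal": lower > 0 or upper < 0,
        # (b) consistency: the replication CI contains the original estimate
        "consistent": lower <= orig_es <= upper,
        # direction: replication and original estimates share a sign
        "same_direction": rep_es * orig_es > 0,
        # (c) precision: a narrower CI means a more precise estimate
        "ci_width": upper - lower,
    }


# Meta-analytic estimates and original point estimates quoted in the text
shortcomings = evaluate_replication(-0.34, (-0.46, -0.23), 0.28)
bias_vs_shortcomings = evaluate_replication(-0.43, (-0.56, -0.29), -0.61)
print(shortcomings)
print(bias_vs_shortcomings)
```

Both cases show a signal and an inconsistent estimate; they differ in that the personal shortcomings effect is opposite in sign to the original, whereas the bias versus shortcomings difference has the same sign but a smaller magnitude.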
We followed a similar approach to summarize the replication of Study 2 of Pronin et al.
(2002). The replication ESs for the better than average effect for positive personality dimensions
were dz = 0.78 [0.43, 1.11] (Study 1b) and dz = 1.27 [1.17, 1.38] (Study 3). The meta-analytic
estimate of the ES (dz = 1.05 [0.57, 1.54]) is inconsistent with the original study’s ES point
estimate of 1.61, i.e., similar in direction but smaller than the ES of the original study. The
replication ESs for the better than average effect for negative personality dimensions were dz = -0.69
[-1.02, -0.36] (Study 1b) and dz = -1.22 [-1.32, -1.11] (Study 3). The meta-analytic estimate of the ES
(dz = -0.98 [-1.49, -0.47]) is consistent with the original study’s ES point estimate of -1.24.
Similarly, the replication ESs for denial of bias were dz = 1.43 [0.69, 2.16] (Study 1b) and dz =
1.71 [1.50, 1.91] (Study 3). The meta-analytic estimate of the ES (dz = 1.69 [1.49, 1.88]) is
inconsistent with the original study’s ES point estimate of 0.76, i.e., similar in direction but larger
than the ES of the original study.
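To illustrate how two study-level estimates can be pooled into a meta-analytic estimate, the sketch below applies a DerSimonian-Laird random-effects model to the positive personality dimension results from Study 1b and Study 3. The choice of estimator is an assumption, since the text does not state which model was used, and the standard errors are back-calculated from the rounded 95% intervals rather than taken from the data. The helper names `se_from_ci` and `random_effects` are hypothetical.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * z_crit)


def random_effects(estimates, ses, z_crit=1.96):
    """DerSimonian-Laird random-effects pooled estimate with its CI."""
    variances = [se ** 2 for se in ses]
    w = [1 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    w_star = [1 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, pooled - z_crit * se, pooled + z_crit * se


# Study 1b and Study 3, positive personality dimensions (values from the text)
dz = [0.78, 1.27]
cis = [(0.43, 1.11), (1.17, 1.38)]
ses = [se_from_ci(lo, hi) for lo, hi in cis]

pooled, ci_low, ci_high = random_effects(dz, ses)
print(f"pooled dz = {pooled:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

Under these assumptions the pooled value lands close to the reported dz = 1.05 [0.57, 1.54]; small differences in the interval limits are expected given the rounding of the inputs.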
In summary, the replication results show that the ESs are similar in direction to the original
study for all the hypotheses tested and indicate a signal (i.e., the ES confidence interval excludes zero), except for the
prediction of asymmetry in perceived shortcomings. As noted above, effect size estimates in some
cases were inconsistent with the original study. However, we note that the sample size employed
in the original study was small. Overall, the replication results provide reasonable support for the
findings of the original study.
Agency beliefs extension
In Studies 2 and 3 we ran extensions examining the link between free will beliefs and the
bias blind spot effects, and the findings are summarized in Table 13.
To this end, we pre-registered the predicted relationship between the strength of belief in
free will and individuals’ tendency toward bias blindness and the better than average effect. Overall,
our findings provide support for the hypothesis that belief in free will is linked to asymmetry in
perceived personal shortcomings of self and of others. This particular asymmetry is mainly
driven by the negative correlation between belief in free will and perceived personal shortcomings of the self. The
findings are in line with the recent findings that indicate that people’s view on the free will
question can affect fundamental cognitive processes. Most importantly, belief in free will is
associated with an increased sense of agency (Lynn, Muhle-Karbe, Aarts, & Brass, 2014) and
self-efficacy (Baumeister & Brewer, 2012). In a similar vein, the current findings support the view
that the more people believe in free will, the less they perceive personal shortcomings in themselves,
because of the agentic view that their own behavior is generated by themselves (e.g., desires,
goals) rather than by constraints. Across two studies, the results confirm the hypothesis.
The exploratory hypothesis that tested for the relationship between free will beliefs and
the magnitude of the blind spot related to biases did not receive conclusive support. However, we did
not find any effects in the opposite direction, but rather effects indistinguishable from zero. The
findings suggest that free will beliefs may not have a meaningful influence on the invidious distinctions
people make between their own and others’ susceptibility to bias. Previous work by Genschow et
al. (2017) found that free will beliefs are positively correlated with correspondence bias. The
current findings suggest that free will beliefs may not have the same kind of relationship with
individuals’ susceptibility to other kinds of biases.
The findings of Study 3 support the pre-registered exploratory hypothesis that belief
in free will is negatively associated with negative personality dimensions (snobbery, deceptiveness,
and selfishness). The findings are consistent with the theoretical view that belief in free will is
associated with moral responsibility. For example, Vohs and Schooler (2008) found that inducing
disbelief in free will increased participants’ cheating behavior. Similarly, Baumeister,
Masicampo, and DeWall (2009) found that an attenuated belief in free will reduced participants’
pro-social inclinations. Martin et al. (2017) found that free will beliefs were positively related to harsher
punishments of unethical behavior. The negative personality dimensions included in the current study
do correspond to moral responsibility in a person. However, we found no support for the
prediction that free will beliefs are positively correlated with positive personality dimensions.
Results suggest that belief in free will may not be associated with the better than average effect in
regards to positive personality dimensions. The lack of support for this hypothesis is consistent
with the theoretical argument that free will underlies laypersons’ sense-making for accountability
and choice more so under negative circumstances (Feldman, Wong, & Baumeister, 2016).
However, the correlation between free will beliefs and the extent of denial of bias was
positive and significant.
In summary, results from the pre-registered extension hypotheses indicated that the predicted direction
of correlation held in almost all cases, with a couple of exceptions (noted above). Where the
results departed from the pre-registered hypotheses, we did not find effects in the opposite
direction, but rather effects indistinguishable from zero.
Conclusion
We aimed to replicate and extend previous findings on the bias blind spot effect, which refers to
the tendency to see bias in others while being blind to it in ourselves. For the most part, we
replicated the results reported by Pronin et al. (2002). The study contributes to the recent call for
systematic, large-scale, and preregistered replication and validation studies. Additionally, the
present investigation explored the relationship between free will beliefs and the tendency to
impute bias more to others than to the self. We extended the literature on the bias blind spot
by exploring the sources of the bias blind spot (e.g., Pronin & Kugler, 2007).
Saturday, March 9, 2019
Friday, March 8, 2019
A Forensic Examination of China’s National Accounts 2008-2016: Growth overstated by more than 13pct
Chen, Wei, Xilu Chen, Chang-Tai Hsieh and Zheng (Michael) Song. 2019. “A Forensic Examination of China’s National Accounts” BPEA Conference Draft, Spring. https://www.brookings.edu/bpea-articles/a-forensic-examination-of-chinas-national-accounts/
ABSTRACT: China’s national accounts are based on data collected by local governments. However, since local governments are rewarded for meeting growth and investment targets, they have an incentive to skew local statistics. China’s National Bureau of Statistics (NBS) adjusts the data provided by local governments to calculate GDP at the national level. The adjustments made by the NBS average 5% of GDP since the mid-2000s. On the production side, the discrepancy between local and aggregate GDP is entirely driven by the gap between local and national estimates of industrial output. On the expenditure side, the gap is in investment. Local statistics increasingly misrepresent the true numbers after 2008, but there was no corresponding change in the adjustment made by the NBS. We provide revised estimates of local and national GDP by re-estimating output of industrial, wholesale, and retail firms using data on value-added taxes. We also use several local economic indicators that are less likely to be manipulated by local governments to estimate local and aggregate GDP. The estimates also suggest that the adjustments by the NBS were insufficient after 2008. Relative to the official numbers, we estimate that GDP growth from 2008-2016 is 1.7 percentage points lower and the investment and savings rate in 2016 is 7 percentage points lower.
Greater male variability is currently universal in internationally comparable assessments; some of this heterogeneity can be attributed to some species universal mechanism or some other social/cultural phenomenon
Sex differences in variability across nations in reading, mathematics and science: a meta-analytic extension of Baye and Monseur (2016). Helen Gray, Andrew Lyth, Catherine McKenna, Susan Stothard, Peter Tymms and Lee Copping. Large-scale Assessments in Education: An IEA-ETS Research Institute Journal, 2019, 7:2. https://doi.org/10.1186/s40536-019-0070-9
Abstract: A recent study by Baye and Monseur (Large Scale Assess Educ 4:1–16, 2016) using large, international educational data sets suggests that the “greater male variation hypothesis” is well supported. Males are often over-represented at the tails of the ability distribution despite similarity in measures of central tendency and the gradual closing of the attainment gap relative to females. In this study, we replicate and expand Baye and Monseur’s work, and explore greater male variability by country using meta-analysis and meta-regression. While we broadly confirm that variability is greater for males internationally, we find that there is significant heterogeneity between countries, and that much of this can be quantified using variables applicable across these assessments (such as test, year, male–female effect size, mean country score and Global Gender Gap Indicators). While it is still not possible to make any causal conclusions regarding why males are more varied than females in academic assessments, it is possible to show that some national level variables affect the magnitude of this variation. Results and suggestions for further work are discussed.
Introduction
Sex differences in cognitive abilities are a contentious issue, yet one that continues to draw the attention of the public and the research community alike. 21st Century society is motivated to ensure issues of equity between the sexes are adequately addressed, particularly within the sphere of educational opportunity (Marks 2008; UNESCO 2011). Despite best efforts, however, inequities still exist internationally, for example, with females underrepresented in our most prestigious educational institutions and males overrepresented in school underperformance, particularly in core areas such as reading and mathematics (Baye and Monseur 2016; Dubet 2010; Jacobs 1996; Morgan and Kett 2003; Quinn and Wagner 2013).
While the evidence shows that the gap between men and women is closing on average across many educational outcomes (Hyde et al. 1990; OECD 2015) and, in some cases, it now favours women (Lietz 2006; OECD 2015), this shift does not appear to have translated directly into ensuring parity across higher professions and positions, a phenomenon which appears somewhat paradoxical. Baye and Monseur (2016) suggested that this may be due to the way in which sex differences have been historically examined, focussing on mean results which assume homogeneity of variance across the achievement distribution. In a study using international assessment data, they demonstrated that the magnitude of the sex differences in achievement across literacy, mathematics and science varied across the range of results, and that the largest differences are seen at the extreme tails of the distribution. Girls tended to outperform boys at both tails of the distribution on reading measures, and in the lower percentiles of mathematics and science, while boys outperformed girls in the higher percentiles of mathematics and science. While the differences at the top of the distribution were of note, they called attention to the fact that inequities in the lower percentiles of the distribution were much more striking.
Baye and Monseur (2016) also examined the variance ratios of boys and girls on these assessments and found that in 93% of cases, variances for boys were higher. The finding of greater male variances in assessments here is not in and of itself original and has been noted in studies for many decades (although rarely as a core focus). The “greater male variability” hypothesis in fact has its roots in the 19th century (Ellis 1894). However, if we are to understand differences between the sexes at different points of the distribution, we must attempt to determine how their respective distributions differ. It is the issue of differences in variability, not average performance, that the rest of this paper attempts to address, building on earlier work.
Male and female variability
Differences in the spread of scores between males and females have been noted in educational assessments for a long time, although often with contrasting findings. Maccoby and Jacklin (1974) showed that males were more variable than females in mathematical and spatial abilities, whereas variances showed parity in verbal measures. Feingold (1992) found larger male variances in the domains of general reasoning, mechanical reasoning, abstract reasoning, quantitative and spatial abilities, perceptual speed, memory and on verbal test batteries. Strand et al. (2006) found similar patterns in the domains of verbal, quantitative and non-verbal reasoning on a representative sample of 11-year olds in the UK, with greater male variances ranging between 7 and 17%. Similar results on U.S. students were found by Lohman and Lakin (2009) and later, Lakin (2013). IQ scores have also been shown to reflect the same pattern (Johnson et al. 2008). Finally, assessments of non-cognitive and behavioural domains such as creativity (He et al. 2013; Karowski et al. 2016), sensation seeking (Cross et al. 2011), personality (Borkenau et al. 2013) and aggression (Archer and Mehdikhani 2003) appear subject to the effect. Combining these findings with the work reported earlier from Baye and Monseur, and given that the above represents only a fraction of reported findings, one can see why many consider greater male variability to be ubiquitous.
Yet despite the volume of work related to differences in variances between the sexes, there has been little systematic attempt to explain this phenomenon (either partially or in its entirety). This is likely in part due to the contention that studies on sex differences in abilities tend to bring with them. Feingold (1992) noted that the explanation for greater male variability has become a polarised nature versus nurture debate. As a result, many empirical papers avoid proposing an explanation. Johnson et al. (2008) point out that although results have often seemed clear, studies are often attacked on methodological grounds pertaining to sample size, representativeness, sample selectivity and age amongst other things. While it is not our intent to repeat the full history of the greater male variability hypothesis (see Johnson et al. for an in-depth review) we will briefly consider some of the proposed explanations for this effect.
Explanations for greater male variability
As Feingold claimed, arguments regarding biological innateness are often invoked for theories of sex differences in cognitive and behavioural domains. Early theories (Ounsted and Taylor 1972) focused on the Y chromosome, claiming that differences in gene expression resulted in slower development and expressed more harmful as well as more beneficial traits, which would presumably lead to more variability in males. Gualtieri and Hicks (1985) suggested such differences could emerge from differences in the uterine environment, making males more differentially susceptible to physical and psychological disorders over the lifespan.
Evolutionary theories suggest that ancient adaptive mechanisms produced greater male variability to enhance survival in ancestral environments and that they are still in operation today. Evolutionary theories are based on sexual selection theory and parental investment theories (see Archer and Mehdikhani 2003 for a comprehensive review) and they would ultimately result in males showing greater variation across a range of traits in order to ensure reproductive fitness. Hill (2017) proposed two mathematical models simulating how one sex could have become more variable over evolutionary time if one sex in our ancestral past (presumably females in the case of homo sapiens, although Hill makes no explicit assumption) is more selective of the other for the purposes of mating, and that this greater variability will be independent of other measures of central tendency. Hill also suggested that in such circumstances where the selective sex is no longer being as selective, greater variability in the selected sex may in fact decline over successive generations. No direct test of this latter hypothesis has been made however.
While many support the biological and evolutionary basis for greater male variability, there are some shortfalls in this interpretation, as well as additional potential explanations as to why males are perhaps more variable. Miller (2001) claimed that susceptibility to defects resulting from prenatal conditions would only explain why males are overrepresented in the lower, not the higher tail of a distribution. As early as 1922, Hollingworth argued for an explanation based on gender roles, claiming that male employment, compared to the more restricted home role of women, allowed them the opportunity for greater diversification in education and environmental experiences. Noddings (1992) highlighted the issue of conformity, claiming that while most girls worked hard enough to avoid being in the bottom of the distribution in class, brighter girls are often pressured into not demonstrating the full extent of their abilities. Ceci et al. (2009) argued that biological accounts of differences in quantitative fields between the sexes are largely inconsistent and suggested that female preferences were a better explanation of underrepresentation in some professions. Critics of the evolutionary perspective also argue that if this phenomenon resulted from innate, evolved mechanisms, invariance of this effect across cultures would be expected. Several previous studies indicate that some nations show greater male variation, others greater female variation and many show homogeneity of variance (Feingold 1994). Feingold went on to attribute heterogeneity in his data to social and cultural factors rather than any innate biological mechanism. Feingold (1992) also argued that national test norms alone may not be sufficiently generalizable to afford definitive proof of a biological origin of greater male variability.
However, more recent studies using international assessments such as PISA, PIRLS and TIMSS do seem to suggest that variability is greater for males in the domains of reading and mathematics across cultures (Baye and Monseur 2016; Machin and Pekkarinen 2008).
There has been some suggestion that elements of test design may also play a role in magnifying sex differences in terms of measures of central tendency and variances. Spelke (2005) claimed that supposed differences in ability, particularly in mathematics and science, resulted largely from item and test biases favouring males, and that research generally fails to support the greater male variability hypothesis in these domains. Lakin (2013) supports this to an extent, suggesting that changes to Cognitive Ability Tests (specifically, the introduction of new quantitative reasoning items with a lesser verbal load) may have been responsible for shifting more males into the upper echelons of the distribution compared with earlier versions of the assessment. Strand et al. (2006), however, found few substantive sex differences related to item difficulties in non-verbal and verbal batteries and suggested that test construction was unlikely the root cause of differences in variability. They made a tentative suggestion that a speed-accuracy trade off favouring boys may account for some of the variability differences in quantitative domains, but cautiously note that previous research has mirrored these effects in untimed assessments (such as Feingold 1992). Lakin also noted that the consistent trend of increasing variance ratios between cognitive ability tests at grades 4 and 7 is likely to be something more systematic than simple test design and potentially reflects changes to society in terms of educational opportunity and personal educational preferences. Arguments focussing purely on test construction and procedure are thus hard to substantiate in the current literature.
Machin and Pekkarinen (2008) highlighted a compositional effect of sex differences in central tendency and distribution of scores. In their analysis of TIMSS and PIRLS data in 15-year olds, they noted that greater male variance in maths was attributable to overrepresentation of males in the higher part of the test distribution, with males outperforming females on average. In reading, male overrepresentation was largely at the bottom of the distribution, with females outperforming males on average. Indeed, Nowell and Hedges (1998) found a correlation of 0.74 between variance ratios and male–female effect sizes. Baye and Monseur found a smaller overall correlation of 0.42. However, they noted that the strength of the relationship varied by the point in the distribution. At the 5th percentile, the relationship was 0.50. At the 95th percentile, this had declined to 0.31. These results seem to suggest that variability for males increases in line with superior female performance, particularly at the lower end of the distribution.
The current study
While Feingold’s work (1994) failed to show a consistent greater male variance in international test scores, this could be attributable to the methodology. He conducted a meta-analysis by searching the literature for reading, mathematics and spatial measures, which carries many issues with it including many different tests, test administrations, issues of representation etc. Baye and Monseur (2016), using more recently available international assessments (PISA, PIRLS and TIMSS) found different results, suggesting that greater male variability was effectively universal. They found that variances (on average) were 15% greater for males in reading, 12% greater in maths and 14% greater in science. Even using Feingold’s (1994) conservative estimate of any ratio falling between 0.90 and 1.10 as not representing evidence of greater variance, Baye and Monseur’s work is suggestive of greater male variability. The advantage of using these international assessments is that they are designed to be internationally comparable, with representative samples of children selected in each country and administered in a standardised fashion. This helps remove potentially confounding factors that may impact on assessment results.
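As a small illustration of the variance ratio and Feingold's (1994) 0.90 to 1.10 band described above, the sketch below computes a male-to-female variance ratio from two standard deviations and labels it. The function names and the standard deviations are hypothetical and are not taken from any of the assessments.

```python
def variance_ratio(sd_male, sd_female):
    """Male-to-female variance ratio; values above 1 mean males vary more."""
    return sd_male ** 2 / sd_female ** 2


def classify(vr, band=(0.90, 1.10)):
    """Label a variance ratio using a conservative no-difference band."""
    low, high = band
    if vr > high:
        return "greater male variability"
    if vr < low:
        return "greater female variability"
    return "no evidence of a difference"


# Hypothetical standard deviations for one country and one test
vr = variance_ratio(100.0, 93.0)
print(f"variance ratio = {vr:.2f}: {classify(vr)}")
```

A ratio of 1.15, for example, corresponds to male variance being 15% greater than female variance and falls outside the conservative band.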
However, Baye and Monseur’s work leaves many questions unanswered. How similar are countries to each other in terms of variance ratios, and are there some that are much more male biased than others? If countries vary in terms of male and female variances, are there any recorded factors that may account for this? Baye and Monseur did make some attempt to look at differences between primary and secondary school measures, as well as by IEA and OECD membership, but beyond this, no systematic heterogeneity analysis was conducted. Yet analysing heterogeneity is important and can be revealing. Furthermore, this international data could be linked to cross-country metrics that may elucidate meaningful patterns of variation. For example, Borkenau et al. (2013) showed that differences across countries in variances in personality were significantly linked to national measures of gender inequality and human development. Given earlier suggestions by Hollingworth (1922) that variances favouring males are largely due to gender roles, and later works (Ceci et al. 2009; Lakin 2013) suggesting that societal practices and female choice are likely to have a major impact on variance ratios, international indices of societal development, particularly forms of gender inequality, are potential sources that could be used to explain any cross-national heterogeneity. To our knowledge, this has not been examined in the context of large-scale international assessments.
In this study, we attempt to answer these questions and extend our knowledge surrounding the nature of greater male variability. We examined the same data sets used by Baye and Monseur, with the addition of more recent test administrations from years 2015 and 2016, to (1) replicate their findings using meta-analysis, (2) determine if greater male variability is homogenous both within and between countries and (3) quantify any meaningful sources of heterogeneity. For the purposes of the third aim, we link these data to international metrics on human progress (Human Development Index) and male–female participation in education, labour forces and politics (Global Gender Gap Index) as well as examining test specific factors such as grade, test, OECD membership, the size of the male–female difference at the mean and national means.
Method
Data sources
Data from three major international assessments were selected to allow an examination of variance ratios across countries: OECD PISA (Programme for International Student Assessment; 2000, 2003, 2006, 2009, 2012, 2015), IEA PIRLS (Progress in International Reading Literacy Study; 2001, 2006, 2011) and IEA TIMSS (Trends in International Mathematics and Science Study; 1995, 1998, 1999, 2003, 2007, 2008, 2011, 2015). These were selected due to having multiple testing points over time and having a wide coverage of countries across the globe. All data is freely available from the OECD website (http://www.pisa.oecd.org) and IEA Study Data Repository (http://rms.iea-dpc.org). Methodological information is available in the technical reports on each survey (Adams and Wu 2002; Martin et al. 2000, 2003, 2004, 2007, 2016; Martin and Kelly 1996, 1997; Martin and Mullis 1996, 2012; OECD 2005, 2009a, 2014, 2016; Olson et al. 2008).
International data on Human Development was also collected where available for each country. The Human Development Index (HDI) is made up of four sub-factors: expected years of schooling for children of school entry age, mean years of schooling for adults aged 25 and above, life expectancy and gross national income per capita (GNI). This data is freely available from the United Nations Development program website (http://hdr.undp.org/en/data).
International data on gender inequality was also gathered from the Global Gender Gap project. The Global Gender Gap Index (GGGI) is made of four sub-factors: economic participation, educational attainment, health and survival and political empowerment. Each factor represents an outcome and is measured on a scale of 0 to 1, where a score of 1 would represent parity between males and females. Data is freely available from the World Economic Forum’s website (http://reports.weforum.org).
Sample
Data from each country surveyed within each of the assessments was included in this analysis. For the purposes of this study, we used measures from three content areas: reading literacy, mathematics literacy and science literacy. In total, we included 564 cases for reading literacy, 1054 cases for mathematics literacy and 991 cases for science literacy gathered from over 100 nations worldwide (where each case represents a national test occurrence within a given year and within a specific content area). In terms of population size across all cases, the mathematics literacy sample consists of 2,507,046 males and 2,512,273 females, the reading sample of 1,471,698 males and 1,486,578 females and the science literacy sample of 2,512,559 males and 2,515,645 females. It should be noted that for science literacy, we did not use data from TIMSS Advanced as these measures focussed on concepts from Physics only.
Data calculations
Statistics were calculated by generating means and standard deviations for males and females within each country for each measure within each assessment. These were calculated using each of the five plausible values within each database and aggregated according to the methodologies supplied by the OECD and IEA in their analyses manuals (OECD 2009b; Martin et al. 2016). Standard errors for these statistics were calculated using replicate weights within each database (80 Fay weights in PISA and 75 JK2 replicates in PIRLS and TIMSS). SPSS (V22; IBM Corp 2013) was used to calculate these statistics (see OECD 2009b; Martin et al. 2016 for technical details regarding the SPSS macros used to compute these statistics).
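The aggregation over plausible values can be sketched as follows. This is a minimal Python illustration, not the SPSS macros used in the study; the function name `combine_plausible_values` and the example numbers are hypothetical. It assumes the combination rules described in the OECD manual: the point estimate is the mean of the per-plausible-value estimates, and the total error variance is the average replicate-weight sampling variance plus the between-plausible-value variance inflated by (1 + 1/M). IEA procedures may differ in detail.

```python
from statistics import mean, variance

def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results into one estimate and standard error.

    estimates: one statistic (e.g. a male SD) per plausible value.
    sampling_variances: replicate-weight sampling variance of each estimate.
    """
    m = len(estimates)
    point = mean(estimates)               # final point estimate
    within = mean(sampling_variances)     # average sampling variance
    between = variance(estimates)         # imputation variance (M - 1 denominator)
    total = within + (1 + 1 / m) * between
    return point, total ** 0.5

# Hypothetical values for five plausible values (assumption, not study data)
est, se = combine_plausible_values(
    [98.0, 99.0, 100.0, 101.0, 102.0],
    [4.0, 4.0, 4.0, 4.0, 4.0],
)
```

The sampling variances passed in are assumed to have been computed beforehand from the replicate weights of the relevant survey.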
Variances were calculated from the standard deviations. The ratio of male to female variances was taken by dividing the male variance by the female variance. A variance ratio greater than one would indicate that the male variance is higher than the female variance. Variance ratios are a common method of examining variability between the sexes (see Hedges and Friedman 1993; Baye and Monseur 2016). In keeping with previous authors (Hedges and Friedman 1993; Katzman and Alliger 1992), but not Baye and Monseur (2016), ratios were logarithmically transformed. The transformation brings the distribution of the ratios closer to normal, which increases the precision of the estimates and avoids overestimation. Assuming that the log of the variance ratio follows a normal distribution, the sampling variance of each log ratio was then calculated as:
$$v = \frac{2}{n_f - 1} + \frac{2}{n_m - 1}$$
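As a worked illustration of the calculation above, the following Python sketch computes a variance ratio, its logarithm and the sampling variance of the log ratio, taking that variance to be 2/(n_f − 1) + 2/(n_m − 1). The function name and the input values are hypothetical and are not taken from the study data.

```python
import math

def log_variance_ratio(sd_male, sd_female, n_male, n_female):
    """Return (variance ratio, log variance ratio, sampling variance of the log ratio)."""
    vr = sd_male ** 2 / sd_female ** 2           # male variance / female variance
    log_vr = math.log(vr)                        # natural-log transformation
    v = 2 / (n_female - 1) + 2 / (n_male - 1)    # variance of the log ratio
    return vr, log_vr, v

# Hypothetical case: male SD 100, female SD 95, 2,501 pupils of each sex
vr, log_vr, v = log_variance_ratio(100.0, 95.0, 2501, 2501)
```

In this hypothetical case the ratio is about 1.11, that is, male variance roughly 11% greater than female variance.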
As we are examining variance ratios by country, some of the data points were combined for the purposes of the analysis. Countries such as Italy, Spain, Canada and the United States often report data for sub-regions but not consistently over assessments. These were collapsed for the purposes of this study. Where a nation has national and regional data within a given test administration, the subnational data points were used. China and the United Kingdom also report at the level of autonomous states (England, Scotland, Northern Ireland, Taipei, Macao, Shanghai and Hong Kong). Countries falling into these states are denoted in the table but are not considered separately for aggregation. Assessments were considered together regardless of whether they were done in the primary or secondary years.
Meta-analysis
To examine the overall size of the variance ratio and to meaningfully quantify heterogeneity, meta-analyses were conducted using Comprehensive Meta-Analysis Version 3 (Borenstein et al. 2013). Many traditional analyses assume that effect size parameters are fixed and relatively homogenous. In this study, we are not assuming homogeneity of these parameters and are thus implementing a random effects model, assuming that effect size parameters are randomly sampled. The use of a random effects model is appropriate where heterogeneity is expected. In this study, we examined heterogeneity by country, whether the countries were OECD member states, test and grade.
Heterogeneity is examined by calculating Q statistics, which can be used to test for equality of effect sizes within and between analysis categories and follow the formulae below:
$$Q = \sum_{i=1}^{k} w \left(d_i - \bar{d}\right)^2,$$
where $w = 1/v$, $v = (N_{male} + N_{female})/N_{total} + d^2/(2N_{total})$, and $k$ is the number of effect sizes.
Q statistics follow a chi-square distribution with k − 1 degrees of freedom (Hedges and Olkin 1985). While significant Q statistics can detect the presence of heterogeneity, they are not indicative of its magnitude. They are also sensitive to sample size (Hardy and Thompson 1998; Higgins and Thompson 2002), and heterogeneity is generally expected when analysing large numbers of studies (Higgins 2008).
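The Q statistic and a random effects pooled estimate can be sketched in Python as below. The study used Comprehensive Meta-Analysis and the text does not state which between-case variance estimator was applied, so the DerSimonian-Laird method-of-moments estimator here is an assumption, as is the I² index, which is added only to illustrate how the magnitude of heterogeneity might be expressed. Function names and input values are hypothetical.

```python
def q_statistic(effects, variances):
    """Q: weighted squared deviations of effect sizes from their weighted mean."""
    weights = [1 / v for v in variances]
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, effects))
    return q, len(effects) - 1, mean

def random_effects_mean(effects, variances):
    """Pooled estimate under an assumed DerSimonian-Laird random effects model."""
    q, df, _ = q_statistic(effects, variances)
    weights = [1 / v for v in variances]
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)                 # between-case variance
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * d for w, d in zip(re_weights, effects)) / sum(re_weights)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, tau2, i2

# Hypothetical log variance ratios for three test administrations
effects = [0.05, 0.10, 0.15]
variances = [0.001, 0.001, 0.001]
q, df, fixed_mean = q_statistic(effects, variances)
pooled, tau2, i2 = random_effects_mean(effects, variances)
```

A p-value for Q could then be read from the chi-square distribution with `df` degrees of freedom, for instance with `scipy.stats.chi2.sf(q, df)` if SciPy is available.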
The mean of the log variance ratios, standard errors and confidence intervals for each country were then calculated (and presented in their un-transformed format for ease of understanding). For each country, we also tabulated the proportion of studies where: (1) the variances were significantly larger for males, (2) the variances were larger for males but not significantly so, (3) the variances were greater for females but not significantly so and (4) the variances were significantly greater for females.
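The back-transformation and the four-way tabulation can be illustrated with the short Python sketch below. It assumes that significance is judged by whether a 95% confidence interval on the log scale excludes zero; the text does not state the exact criterion, and the function names and example values are hypothetical.

```python
import math

def back_transform(log_mean, se, z=1.96):
    """Return the variance ratio and its 95% CI on the original (ratio) scale."""
    return (math.exp(log_mean),
            math.exp(log_mean - z * se),
            math.exp(log_mean + z * se))

def classify_case(log_vr, v, z=1.96):
    """Place one case in one of the four categories tabulated per country."""
    half_width = z * math.sqrt(v)
    if log_vr - half_width > 0:
        return "males significantly greater"
    if log_vr + half_width < 0:
        return "females significantly greater"
    # A log ratio of exactly zero is counted with the female category here.
    return "males greater (n.s.)" if log_vr > 0 else "females greater (n.s.)"

def category_proportions(cases):
    """cases: list of (log variance ratio, sampling variance) pairs for one country."""
    labels = [classify_case(log_vr, v) for log_vr, v in cases]
    return {label: labels.count(label) / len(labels) for label in set(labels)}

# Hypothetical country with four test administrations
cases = [(0.10, 0.0016), (0.02, 0.0016), (-0.01, 0.0016), (-0.12, 0.0016)]
props = category_proportions(cases)
ratio, low, high = back_transform(0.0, 0.1)
```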
Meta-regression
Meta-regression was used to explore and quantify potential sources of heterogeneity. We recorded the mean test score for each country in each year and calculated a weighted effect size of the gender difference between male and female means, as previous work has suggested that this effect size is related to the variance ratio (Baye and Monseur 2016). This was taken as the female mean subtracted from the male mean (a negative score therefore suggests higher scores for females). Using SPSS, this was converted into a standardised effect size (Hedges g) calculated from the effect size d multiplied by the correction factor J (correcting for small sample sizes):
$$d = \frac{\mu_1 - \mu_2}{SD_{pooled}}$$
$$J = 1 - \frac{3}{4\,df - 1}$$
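A minimal Python sketch of this conversion follows. The text does not define the pooled standard deviation or the degrees of freedom, so the usual definitions (sample-size-weighted pooling of the two variances and df = n_male + n_female − 2) are assumptions, and the function name and inputs are hypothetical.

```python
import math

def hedges_g(mean_male, mean_female, sd_male, sd_female, n_male, n_female):
    """Standardised male minus female mean difference with small-sample correction."""
    df = n_male + n_female - 2                        # assumed degrees of freedom
    sd_pooled = math.sqrt(
        ((n_male - 1) * sd_male ** 2 + (n_female - 1) * sd_female ** 2) / df
    )
    d = (mean_male - mean_female) / sd_pooled         # negative when females score higher
    j = 1 - 3 / (4 * df - 1)                          # correction factor J
    return d * j

# Hypothetical case: females score 10 points higher, SD 100, 51 pupils per sex
g = hedges_g(495.0, 505.0, 100.0, 100.0, 51, 51)
```

With the very large samples in these assessments J is close to 1, so g and d are nearly identical.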
Other additional moderators were derived from test administrations. Previous researchers (discussed earlier) have suggested that some differences may result from test design. As such, the test type, year, test grade and OECD membership were included as moderators to determine if these had a substantial impact on heterogeneity. Baye and Monseur (2016) found small differences in variance ratios between these variables and thus they may be contributing to some of the heterogeneity. Alongside these, the subfactors of the HDI and the GGGI were included to see if other country level contributing factors could account for variation across countries. As consistent data for both these indices is only available from 2006, meta-regression was performed only on cases from test administrations from 2006 onwards.
Results
Analysis of each content domain is presented separately. Countries with only one or two data points are included in the analysis although conclusions about the stability of their variance ratios must be treated cautiously. Variance ratios and their confidence intervals are presented in their un-transformed form for ease of interpretation. The percentage of cases that have a variance ratio below (significantly and non-significantly) and above (significantly and non-significantly) 1, with ratios above 1 representing greater male variance, are also presented. Q statistics and their significance are also reported for each nation.
Mathematics literacy
Table 1 shows the results for this analysis on international mathematics literacy data sources. Each of the 102 individual participating nations is listed in alphabetical order.
Table 1
Variance ratios, confidence intervals and heterogeneity statistics for countries in the domain of mathematics literacy
[tables]
These covariates predicted 31% of the heterogeneity in mathematics literacy, 46% of the heterogeneity in science literacy and 54% of the heterogeneity in reading. Many of the factors included in the model explain significant amounts of variance in effect sizes; however, this varies by domain. By far the most significant predictor is the size of the gender difference in scores (across all three domains). As the gap becomes larger in favour of females, the variance for males increases. The mean score of the country is statistically significant for reading and science literacy but has a very small, positive impact. The same can be said for the test year in mathematics literacy. There are small and significant effects for the tests (with TIMSS and PIRLS showing slightly less male variance) but this is harder to interpret, as it is confounded by age. HDI indicators seem to have little impact on variance ratios, although GNI has a very small positive but statistically significant effect on mathematics literacy and science literacy. GGGI indicators, however, have a stronger, negative impact on national variance ratios. Countries with higher Economic Participation for women have ratios favouring females across all domains. In mathematics literacy and science literacy, however, better Educational Attainment for women significantly increases the ratios in favour of males. Increased political empowerment for women also seems to increase variances for females in reading.
Discussion
Results broadly confirm the previous works of Baye and Monseur (2016) and suggest that male variances are greater than female variances internationally. This was largely expected as, although the methodology differed, most of the data used in this study was the same. Baye and Monseur showed variances for males were greater by 15% in reading, 12% in maths and 14% in science. Our results indicated that these ratios are 16%, 12% and 13% respectively, and suggest that the inclusion of more recent international surveys has not altered them substantively. Similarly, the correlation between male–female effect sizes and variance ratios was in line with those found by previous authors, with superior female performance increasing the gap in variance between the sexes. As such, we can broadly support the findings of past research and conclude that over the studied period, male variances in the domains of reading, mathematics literacy and science literacy are almost universally greater.
However, these results suggest that we can take this conclusion a step further. Feingold (1994) suggested that a difference of about 10% in variance ratios should be considered a substantive difference. Tables 1, 2 and 3 clearly show that for most countries engaging with PISA, PIRLS and TIMSS assessments, male variances are often greater by more than this threshold in all three domains. There are no geographical areas in this study that show significantly greater female variances. It would seem therefore that the question should no longer be whether male and female variances differ, but how much more varied males are than females.
While in over 95% of cases, males show greater amounts of variance, there is a significant heterogeneity in these results, both within and between countries. While we can say with confidence that males are certainly more varied and generate a fairly precise estimate of a global average, we cannot come to an absolute value for each country individually and must contend with a large amount of dispersion. This dispersion is telling however and shows that not only do countries differ (significantly in some cases, as is evident in Figs. 1, 2 and 3) but that they vary internally as well. There is a significant amount of heterogeneity across these data in most countries examined in this study which requires explaining.
Our meta-regression within each domain has gone some way in explaining close to half of the heterogeneity observed in the dataset for reading and science literacy and about a third for mathematics literacy. Some of the findings are harder to interpret than others. The variable with the largest impact is the male–female effect size. This is the most substantive factor across all three domains and suggests that as girls outperform boys, the variability of boys increases. This seems to support earlier works that demonstrated a correlation between effect sizes and variance ratios (Baye and Monseur 2016; Nowell and Hedges 1998). The mean score for the country also has a significant albeit smaller impact in the same direction for science literacy and reading. Countries that perform better on average are therefore more likely to have greater variability for boys.
PISA tests appear to result in slightly more variance for males than TIMSS and PIRLS. Baye and Monseur (2016) found slightly smaller ratios in the primary years across all three domains. As TIMSS and PIRLS assess younger children, it may be that this simply reflects an age or maturity effect. However, we cannot rule out that the tests themselves are causing some of the heterogeneity, or that there may be a compositional effect between the two.
Interestingly, most of the HDI indicators were not significantly predictive of variance ratios across domains. The exception to this appears to be the GNI indicator (an adjusted form of GDP per capita) for mathematics literacy and science literacy but not reading. Reading is a specific skill that requires mastery and is often contingent on home environments for reinforcement. While this is to an extent true of basic mathematical concepts, later mathematics and science are likely tied more strongly to whatever specific curriculum is delivered, and this is largely coordinated at a national level. This may explain why national wealth may impact more upon maths and science as opposed to reading. However, it should be noted that, despite its statistical significance, it has only a minute impact on increasing male variance.
Measures from the Global Gender Gap Index however seem to have a larger impact on variance ratios. Increasing female economic participation appears to increase levels of female variance across all three domains. This suggests that countries actively incorporating more women into the labour force has an impact on educational outputs. Increased political empowerment for women also increases female variances in reading. Increased educational attainment for women has mixed impacts however. It has a significant effect of increasing male variances in mathematics and science but a non-significant effect of increasing variance for women in reading. Taken together, it suggests that cultural practices tied to increasing female participation generally appear to increase variances for females and suggests that greater male variance in educational outcomes may be practically reduced on national levels. While this study cannot isolate what specific national level practices are responsible for this, it does lead to interesting further questions regarding the processes underlying male/female variability.
The year of the test also had a very small but statistically significant effect on variance ratios in mathematics literacy. As with the test variable itself, why precisely this should be the case is difficult to rationalise. As mentioned earlier, there could be specific test administrations which have differences that create a small, positive effect. Alternatively, it could be that national educational systems have been adapting educational practices in order to improve their position in international rankings, and that these new practices are impacting upon the spread of scores. From this data alone, we can only speculate on the specifics as to why this may be the case.
Limitations and future work
There are several limitations to the data and the procedure we have used to explore it. First, a meta-analysis of international assessments such as PISA, PIRLS and TIMSS, while it controls for many extraneous variables not possible to account for in a meta-analysis via a literature search, does limit generalizability to alternative educational assessments. There could be something specific to these assessments that creates this effect. A limitation perhaps related to this applies to the assessments themselves. In PISA, the content being assessed is heavily based in literacy abilities. Even mathematics and science components are rooted in the ability to read and poor readers are unlikely to achieve if they cannot interpret the questions posed. As is evident from Table 2, the domain with the greatest amount of male variability is reading. As such, it is possible that mathematics and science show comparable overall ratios simply because they are rooted heavily in the ability to read. It is interesting to note that previous works using different assessments have shown greater variabilities in quantitative domains compared to verbal ones (Lakin 2013; Lohman and Lakin 2009). Thus, what this data may perhaps be showing is the greater variability in reading generally. This is still important and would pose the question ‘why are males more variable at reading’ but we must therefore be cautious regarding the conclusions we draw from the mathematics and science domains.
This study tentatively suggests (as do Baye and Monseur 2016) that age may be a factor, and that variability for males increases as candidates get older. To our knowledge, no study specifically examines this, either longitudinally or cross-sectionally (with perhaps the exception of Lakin 2013). Alternatively, attempting to quantify nation-specific factors that could be included in additional regression analyses may be a future avenue worth exploring (particularly considering the impact of GGGI variables on ratios), potentially allowing us to quantify greater levels of heterogeneity in these results.
A final avenue of exploration would be to examine this effect over additional academic assessments. Research historically focuses on core domains of reasoning (Baye and Monseur 2016; Lohman and Lakin 2009; Strand et al. 2006). While this is important, do we get similar patterns across curricular subject examinations (anything from art to zoology), or different modes of assessment (pencil and paper tests compared to practical performance assessments)? These are often studied less, in part due to reasons of sample representation, or the fact that specific subjects are often self-selecting. As it stands from the data and the literature reviewed here, we would expect to see similar patterns across assessments generally. It would be telling if this was not the case. If there are exceptions, what are they and why do they differ?
Implications for theory and policy
From a theoretical perspective, we cannot contribute causal explanations for why males are more variable. Data suggests the effect is almost universal, which, while supportive of biological and evolutionary theories, does not rule out specific cultural, educational, political, social or religious practices. Indeed, the fact that we can quantify substantial variation as dependent on increased female participation in society suggests that, at least in educational outcomes, it is not necessarily the case that males should vary more.
However, without a clear understanding of why males vary more and how this difference is maintained, we acknowledge that a meaningful discussion regarding what can be done to ensure parity is difficult. Increased female participation in the economy, education and political empowerment significantly reduce the size of the discrepancy in variances between males and females across the three educational domains studied here. If these increase, we might expect the variance gap to decrease. Which specific practices within countries are enabling this, however, is not discernible from the existing data, and more comparative, in-depth work within nations (with closer attention to specific educational practices) would be required before specific policy recommendations could be formulated to ensure parity between males and females across the ability distribution.
Differences in the spread of abilities are important for society. If, for example, we want to increase the representation of women in top positions and educational institutions, so that parity between the sexes exists at this level, it is important that males and females are equally represented in the higher percentiles of whatever qualifications or ability metrics constitute the selection processes. Similarly, the large gap in reading ability between boys and girls in the lower percentiles (Baye and Monseur 2016) suggests that some boys are likely to be at a serious disadvantage in later education (and potentially later life outcomes). Whilst implementing measures that strive for parity in the right tail of the distribution is important, we must also be mindful not to neglect the left.
Conclusions
Our analysis seems to suggest that greater male variability is currently universal in internationally comparable assessments implemented over the past decade. However, this effect is far from homogenous, and there are quantifiable differences that exist over nations. Furthermore, some of this heterogeneity can be attributed to some yet unspecified practices or policies targeted at increasing male–female equality, general male–female performance as well as potentially the age of candidates and the type of test. Further work however is required to examine these factors in more detail, and analyses within nations may be informative to examine more specific practices that can explain national patterns. Comparative work examining high and low scoring GGGI countries may be informative in this endeavour. In doing so, it may be possible to determine if the root cause of these differences in distributions is attributable to some species-universal mechanism or some other social or cultural phenomenon.
Abstract: A recent study by Baye and Monseur (Large Scale Assess Educ 4:1–16, 2016) using large, international educational data sets suggests that the “greater male variation hypothesis” is well supported. Males are often over-represented at the tails of the ability distribution despite similarity in measures of central tendency and the gradual closing of the attainment gap relative to females. In this study, we replicate and expand Baye and Monseur’s work, and explore greater male variability by country using meta-analysis and meta-regression. While we broadly confirm that variability is greater for males internationally, we find that there is significant heterogeneity between countries, and that much of this can be quantified using variables applicable across these assessments (such as test, year, male–female effect size, mean country score and Global Gender Gap Indicators). While it is still not possible to make any causal conclusions regarding why males are more varied than females in academic assessments, it is possible to show that some national level variables affect the magnitude of this variation. Results and suggestions for further work are discussed.
Introduction
Sex differences in cognitive abilities are a contentious issue, yet one that continues to draw the attention of the public and the research community alike. 21st Century society is motivated to ensure issues of equity between the sexes are adequately addressed, particularly within the sphere of educational opportunity (Marks 2008; UNESCO 2011). Despite best efforts however, inequities still exist internationally, for example, with females underrepresented in our most prestigious educational institutions and males overrepresented in school underperformance, particularly in core areas such as reading and mathematics (Baye and Monseur 2016; Dubet 2010; Jacobs 1996; Morgan and Kett 2003; Quinn and Wagner 2013).
While the evidence shows that the gap between men and women is closing on average across many educational outcomes (Hyde et al. 1990; OECD 2015) and, in some cases, it now favours women (Lietz 2006; OECD 2015), this shift does not appear to have translated directly into ensuring parity across higher professions and positions, a phenomenon which appears somewhat paradoxical. Baye and Monseur (2016) suggested that this may be due to the way in which sex differences have been historically examined, focussing on mean results which assume homogeneity of variance across the achievement distribution. In a study using international assessment data, they demonstrated that the magnitude of the sex differences in achievement across literacy, mathematics and science varied across the range of results, and that the largest differences are seen at the extreme tails of the distribution. Girls tended to outperform boys at both tails of the distribution on reading measures, and in the lower percentiles of mathematics and science, while boys outperformed girls in the higher percentiles of mathematics and science. While the differences at the top of the distribution were of note, they called attention to the fact that inequities in the lower percentiles of the distribution were much more striking.
Baye and Monseur (2016) also examined the variance ratios of boys and girls on these assessments and found that in 93% of cases, variances for boys were higher. The finding of greater male variances in assessments here is not in and of itself original and has been noted in studies for many decades (although rarely as a core focus). The “greater male variability” hypothesis in fact has its roots in the 19th century (Ellis 1894). However, if we are to understand differences between the sexes at different points of the distribution, we must attempt to determine how their respective distributions differ. It is the issue of differences in variability, not average performance, that the rest of this paper attempts to address, building on earlier work.
Male and female variability
Differences in the spread of scores between males and females have been noted in educational assessments for a long time, although often with contrasting findings. Maccoby and Jacklin (1974) showed that males were more variable than females in mathematical and spatial abilities, whereas variances showed parity in verbal measures. Feingold (1992) found larger male variances in the domains of general reasoning, mechanical reasoning, abstract reasoning, quantitative and spatial abilities, perceptual speed, memory and on verbal test batteries. Strand et al. (2006) found similar patterns in the domains of verbal, quantitative and non-verbal reasoning on a representative sample of 11-year-olds in the UK, with greater male variances ranging between 7 and 17%. Similar results on U.S. students were found by Lohman and Lakin (2009) and later, Lakin (2013). IQ scores have also been shown to reflect the same pattern (Johnson et al. 2008). Finally, assessments of non-cognitive and behavioural domains such as creativity (He et al. 2013; Karowski et al. 2016), sensation seeking (Cross et al. 2011), personality (Borkenau et al. 2013) and aggression (Archer and Mehdikhani 2003) appear subject to the effect. Combining these findings with the work reported earlier from Baye and Monseur, and given that the above represents only a fraction of reported findings, one can see why many consider greater male variability to be ubiquitous.
Yet despite the volume of work related to differences in variances between the sexes, there has been little systematic attempt to explain this phenomenon (either partially or in its entirety). This is likely in part due to the contention that studies on sex differences in abilities tend to bring with them. Feingold (1992) noted that the explanation for greater male variability has become a polarised nature versus nurture debate. As a result, many empirical papers avoid proposing an explanation. Johnson et al. (2008) point out that although results have often seemed clear, studies are often attacked on methodological grounds pertaining to sample size, representativeness, sample selectivity and age amongst other things. While it is not our intent to repeat the full history of the greater male variability hypothesis (see Johnson et al. for an in-depth review) we will briefly consider some of the proposed explanations for this effect.
Explanations for greater male variability
As Feingold claimed, arguments regarding biological innateness are often invoked for theories of sex differences in cognitive and behavioural domains. Early theories (Ounsted and Taylor 1972) focused on the Y chromosome, claiming that differences in gene expression resulted in slower development and expressed more harmful as well as more beneficial traits, which would presumably lead to more variability in males. Gualtieri and Hicks (1985) suggested such differences could emerge from differences in the uterine environment, making males more differentially susceptible to physical and psychological disorders over the lifespan.
Evolutionary theories suggest that ancient adaptive mechanisms produced greater male variability to enhance survival in ancestral environments and that they are still in operation today. Evolutionary theories are based on sexual selection theory and parental investment theories (see Archer and Mehdikhani 2003 for a comprehensive review) and they would ultimately result in males showing greater variation across a range of traits in order to ensure reproductive fitness. Hill (2017) proposed two mathematical models simulating how one sex could have become more variable over evolutionary time if one sex in our ancestral past (presumably females in the case of homo sapiens, although Hill makes no explicit assumption) is more selective of the other for the purposes of mating, and that this greater variability will be independent of other measures of central tendency. Hill also suggested that in such circumstances where the selective sex is no longer being as selective, greater variability in the selected sex may in fact decline over successive generations. No direct test of this latter hypothesis has been made however.
While many support the biological and evolutionary basis for greater male variability, there are some shortfalls in this interpretation, as well as additional potential explanations as to why males are perhaps more variable. Miller (2001) claimed that susceptibility to defects resulting from prenatal conditions would only explain why males are overrepresented in the lower, not the higher tail of a distribution. As early as 1922, Hollingworth argued for an explanation based on gender roles, claiming that male employment, compared to the more restricted home role of women, allowed them the opportunity for greater diversification in education and environmental experiences. Noddings (1992) highlighted the issue of conformity, claiming that while most girls worked hard enough to avoid being in the bottom of the distribution in class, brighter girls are often pressured into not demonstrating the full extent of their abilities. Ceci et al. (2009) argued that biological accounts of differences in quantitative fields between the sexes are largely inconsistent and suggested that female preferences were a better explanation of underrepresentation in some professions. Critics of the evolutionary perspective also argue that if this phenomenon resulted from innate, evolved mechanisms, invariance of this effect across cultures would be expected. Several previous studies indicate that some nations show greater male variation, others greater female variation and many show homogeneity of variance (Feingold 1994). Feingold went on to attribute heterogeneity in his data to social and cultural factors rather than any innate biological mechanism. Feingold (1992) also argued that national test norms alone may not be sufficiently generalizable to afford definitive proof of a biological origin of greater male variability.
However, more recent studies using international assessments such as PISA, PIRLS and TIMSS do seem to suggest that variability is greater for males in the domains of reading and mathematics across cultures (Baye and Monseur 2016; Machin and Pekkarinen 2008).
There has been some suggestion that elements of test design may also play a role in magnifying sex differences in terms of measures of central tendency and variances. Spelke (2005) claimed that supposed differences in ability, particularly in mathematics and science, resulted largely from item and test biases favouring males, and that research generally fails to support the greater male variability hypothesis in these domains. Lakin (2013) supports this to an extent, suggesting that changes to Cognitive Ability Tests (specifically, the introduction of new quantitative reasoning items with a lesser verbal load) may have been responsible for shifting more males into the upper echelons of the distribution compared with earlier versions of the assessment. Strand et al. (2006), however, found few substantive sex differences related to item difficulties in non-verbal and verbal batteries and suggested that test construction was unlikely the root cause of differences in variability. They made a tentative suggestion that a speed-accuracy trade-off favouring boys may account for some of the variability differences in quantitative domains, but cautiously note that previous research has mirrored these effects in untimed assessments (such as Feingold 1992). Lakin also noted that the consistent trend of increasing variance ratios between cognitive ability tests at grades 4 and 7 is likely to be something more systematic than simple test design and potentially reflects changes to society in terms of educational opportunity and personal educational preferences. Arguments focussing purely on test construction and procedure are thus hard to substantiate in the current literature.
Machin and Pekkarinen (2008) highlighted a compositional effect of sex differences in central tendency and distribution of scores. In their analysis of TIMSS and PIRLS data in 15-year olds, they noted that greater male variance in maths was attributable to overrepresentation of males in the higher part of the test distribution, with males outperforming females on average. In reading, male overrepresentation was largely at the bottom of the distribution, with females outperforming males on average. Indeed, Nowell and Hedges (1998) found a correlation of 0.74 between variance ratios and male–female effect sizes. Baye and Monseur found a smaller overall correlation of 0.42. However, they noted that the strength of the relationship varied by the point in the distribution. At the 5th percentile, the relationship was 0.50. At the 95th percentile, this had declined to 0.31. These results seem to suggest that variability for males increases in line with superior female performance, particularly at the lower end of the distribution.
The current study
While Feingold’s work (1994) failed to show a consistent greater male variance in international test scores, this could be attributable to the methodology. He conducted a meta-analysis by searching the literature for reading, mathematics and spatial measures, which carries many issues with it including many different tests, test administrations, issues of representation etc. Baye and Monseur (2016), using more recently available international assessments (PISA, PIRLS and TIMSS) found different results, suggesting that greater male variability was effectively universal. They found that variances (on average) were 15% greater for males in reading, 12% greater in maths and 14% greater in science. Even using Feingold’s (1994) conservative estimate of any ratio falling between 0.90 and 1.10 as not representing evidence of greater variance, Baye and Monseur’s work is suggestive of greater male variability. The advantage of using these international assessments is that they are designed to be internationally comparable, with representative samples of children selected in each country and administered in a standardised fashion. This helps remove potentially confounding factors that may impact on assessment results.
However, Baye and Monseur’s work leaves many questions unanswered. How similar are countries to each other in terms of variance ratios, and are there some that are much more male biased than others? If countries vary in terms of male and female variances, are there any recorded factors that may account for this? Baye and Monseur did make some attempt to look at differences between primary and secondary school measures, as well as by IEA and OECD membership, but beyond this, no systematic heterogeneity analysis was conducted. Yet analysing heterogeneity is important and can be revealing. Furthermore, this international data could be linked to cross-country metrics that may elucidate meaningful patterns of variation. For example, Borkenau et al. (2013) showed that differences across countries in variances in personality were significantly linked to national measures of gender inequality and human development. Given earlier suggestions by Hollingworth (1922) that variances favouring males are largely due to gender roles, and later works (Ceci et al. 2009; Lakin 2013) suggesting that societal practices and female choice are likely to have a major impact on variance ratios, international indices of societal development, particularly forms of gender inequality, are potential sources that could be used to explain any cross-national heterogeneity. To our knowledge, this has not been examined in the context of large-scale international assessments.
In this study, we attempt to answer these questions and extend our knowledge surrounding the nature of greater male variability. We examined the same data sets used by Baye and Monseur, with the addition of more recent test administrations from years 2015 and 2016, to (1) replicate their findings using meta-analysis, (2) determine if greater male variability is homogenous both within and between countries and (3) quantify any meaningful sources of heterogeneity. For the purposes of the third aim, we link these data to international metrics on human progress (Human Development Index) and male–female participation in education, labour forces and politics (Global Gender Gap Index) as well as examining test specific factors such as grade, test, OECD membership, the size of the male–female difference at the mean and national means.
Method
Data sources
Data from three major international assessments were selected to allow an examination of variance ratios across countries: OECD PISA (Programme for International Student Assessment; 2000, 2003, 2006, 2009, 2012, 2015), IEA PIRLS (Progress in International Reading Literacy Study; 2001, 2006, 2011) and IEA TIMSS (Trends in International Mathematics and Science Study; 1995, 1998, 1999, 2003, 2007, 2008, 2011, 2015). These were selected due to having multiple testing points over time and having a wide coverage of countries across the globe. All data is freely available from the OECD website (http://www.pisa.oecd.org) and IEA Study Data Repository (http://rms.iea-dpc.org). Methodological information is available in the technical reports on each survey (Adams and Wu 2002; Martin et al. 2000, 2003, 2004, 2007, 2016; Martin and Kelly 1996, 1997; Martin and Mullis 1996, 2012; OECD 2005, 2009a, 2014, 2016; Olson et al. 2008).
International data on Human Development was also collected where available for each country. The Human Development Index (HDI) is made up of four sub-factors: expected years of schooling for children of school entry age, mean years of schooling for adults aged 25 and above, life expectancy and gross national income per capita (GNI). This data is freely available from the United Nations Development program website (http://hdr.undp.org/en/data).
International data on gender inequality was also gathered from the Global Gender Gap project. The Global Gender Gap Index (GGGI) is made of four sub-factors: economic participation, educational attainment, health and survival and political empowerment. Each factor represents an outcome and is measured on a scale of 0 to 1, where a score of 1 would represent parity between males and females. Data is freely available from the World Economic Forum’s website (http://reports.weforum.org).
Sample
Data from each country surveyed within each of the assessments was included in this analysis. For the purposes of this study, we used measures from three content areas: literacy, maths literacy and science literacy. In total, we included 564 cases for literacy, 1054 cases for mathematics literacy and 991 cases for science literacy gathered from over 100 nations worldwide (where each case represents a national test occurrence within a given year and within a specific content area). In terms of population size across all cases, in mathematics literacy it consists of 2,507,046 males and 2,512,273 females, for reading 1,471,698 males and 1,486,578 females and for science literacy 2,512,559 males and 2,515,645 females. It should be noted that for science literacy, we did not use data from TIMSS Advanced as these measures focussed on concepts from Physics only.
Data calculations
Statistics were calculated by generating means and standard deviations for males and females within each country for each measure within each assessment. These were calculated using each of the five plausible values within each database and aggregated according to the methodologies supplied by the OECD and IEA in their analyses manuals (OECD 2009b; Martin et al. 2016). Standard errors for these statistics were calculated using replicate weights within each database (80 Fay weights in PISA and 75 JK2 replicates in PIRLS and TIMSS). SPSS (V22; IBM Corp 2013) was used to calculate these statistics (see OECD 2009b; Martin et al. 2016 for technical details regarding the SPSS macros used to compute these statistics).
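As a rough illustration of the aggregation step, the sketch below pools one statistic that has been computed separately on each plausible value, using the usual multiple-imputation combining rule (average the estimates; add the average sampling variance to the inflated between-value variance). The function name and the numbers are hypothetical, and the OECD and IEA manuals cited above remain the authority on the exact procedure used.

```python
from statistics import mean, variance

def combine_plausible_values(estimates, sampling_variances):
    """Pool one statistic computed once per plausible value.

    estimates:          the statistic from each plausible value
    sampling_variances: its sampling variance for each plausible value
                        (e.g. obtained from replicate weights)
    Returns (pooled estimate, standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values (m - 1 denominator)
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance ** 0.5

# Hypothetical male mean score estimated on five plausible values.
pv_means = [500.0, 502.0, 498.0, 501.0, 499.0]
pv_sampling_vars = [4.0, 4.0, 4.0, 4.0, 4.0]
pooled_mean, pooled_se = combine_plausible_values(pv_means, pv_sampling_vars)
```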
Variances were calculated from the standard deviations. The ratio of male to female variances was taken by dividing the male variance by the female variance. A variance ratio greater than one would indicate that the male variance is higher than the female variance. Variance ratios are a common method of examining variability between the sexes (see Hedges and Friedman 1993; Baye and Monseur 2016). In keeping with previous authors (Hedges and Friedman 1993; Katzman and Alliger 1992), but not Baye and Monseur (2016), ratios were logarithmically transformed to increase precision of the estimates and to avoid overestimation, as it ensures a normal distribution. Assuming that the log of the variances follows a normal distribution, the variances of these ratios were then calculated as:
$$v = \frac{2}{n_f - 1} + \frac{2}{n_m - 1}$$
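A minimal sketch of this calculation for a single case, taking the sampling variance of the log ratio as 2/(n_f − 1) + 2/(n_m − 1). The function name and the input values are illustrative assumptions, not figures from the study.

```python
import math

def log_variance_ratio(sd_male, sd_female, n_male, n_female):
    """Return the log male/female variance ratio and its sampling variance."""
    variance_ratio = sd_male ** 2 / sd_female ** 2   # > 1 means greater male variance
    ln_vr = math.log(variance_ratio)
    v = 2 / (n_female - 1) + 2 / (n_male - 1)        # variance of the log ratio
    return ln_vr, v

# Hypothetical case: male SD of 100, female SD of 95, 2,501 pupils of each sex.
ln_vr, v = log_variance_ratio(100.0, 95.0, 2501, 2501)
se = math.sqrt(v)

# Back-transform to the ratio scale for reporting, with a 95% interval.
ratio = math.exp(ln_vr)
ci_low, ci_high = math.exp(ln_vr - 1.96 * se), math.exp(ln_vr + 1.96 * se)
```

Working on the log scale makes the interval symmetric around the log ratio; exponentiating afterwards gives an asymmetric interval on the ratio scale.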
As we are examining variance ratios by country, some of the data points were combined for the purposes of the analysis. Countries such as Italy, Spain, Canada and the United States often report data for sub-regions but not consistently over assessments. These were collapsed for the purposes of this study. Where a nation has national and regional data within a given test administration, the subnational data points were used. China and the United Kingdom also report at the level of autonomous states (England, Scotland, Northern Ireland, Taipei, Macao, Shanghai and Hong Kong). Countries falling into these states are denoted in the table but are not considered separately for aggregation. Assessments were considered together regardless of whether they were done in the primary or secondary years.
Meta-analysis
To examine the overall size of the variance ratio and to meaningfully quantify heterogeneity, meta-analyses were conducted using Comprehensive Meta-Analysis Version 3 (Borenstein et al. 2013). Many traditional analyses assume that effect size parameters are fixed and relatively homogenous. In this study, we are not assuming homogeneity of these parameters and are thus implementing a random effects model, assuming that effect size parameters are randomly sampled. The use of a random effects model is appropriate where heterogeneity is expected. In this study, we examined heterogeneity by country, whether the countries were OECD member states, test and grade.
Heterogeneity is examined by calculating Q statistics, which can be used to test for equality of effect sizes within and between analysis categories and follow the formulae below:
$$Q = \sum_{i=1}^{k} w_i \left(d_i - \bar{d}\right)^2,$$
where $w_i = 1/v_i$, $v_i = (N_{male} + N_{female})/N_{total} + d_i^2/(2N_{total})$, and $k$ is the number of effect sizes.
Q statistics follow a chi-square distribution with k − 1 degrees of freedom (Hedges and Olkin 1985). While significant Q statistics can detect the presence of heterogeneity, they are not indicative of its magnitude. They are also sensitive to sample size (Hardy and Thompson 1998; Higgins and Thompson 2002), and heterogeneity is generally expected when analysing large numbers of studies (Higgins 2008).
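The Q statistic can be sketched in a few lines. The example assumes that each effect size already comes with its sampling variance and that the mean effect is the inverse-variance weighted mean; the function name and the numbers are hypothetical.

```python
def q_statistic(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the weighted mean.

    Returns (Q, degrees of freedom, weighted mean effect).
    """
    weights = [1 / v for v in variances]                   # w = 1 / v
    d_bar = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - d_bar) ** 2 for w, d in zip(weights, effects))
    return q, len(effects) - 1, d_bar

# Three hypothetical log variance ratios with equal sampling variances.
q, df, d_bar = q_statistic([0.10, 0.20, 0.30], [0.01, 0.01, 0.01])
```

Q would then be compared with a chi-square distribution on `df` degrees of freedom.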
The mean of the log variance ratios, standard errors and confidence intervals for each country were then calculated (and presented in their un-transformed format for ease of understanding). For each country, we also tabulated the proportion of studies where; (1) the variances were significantly larger for males, (2) the variances were larger for males but not significantly so, (3) the variances were greater for females but not significantly so and finally (4) the variances were significantly greater for females.
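To illustrate how a country-level mean and its interval might be obtained and back-transformed, the sketch below pools log variance ratios under a random effects model. The between-case variance is estimated with the DerSimonian-Laird method of moments, which is an assumption made for illustration: the text does not state which estimator the software applied. All names and numbers are hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """Pool effects with a DerSimonian-Laird random effects model.

    Returns (pooled effect, standard error, tau squared).
    """
    k = len(effects)
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * di for wi, di in zip(w, effects)) / sum_w
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                 # between-case variance
    w_star = [1 / (v + tau2) for v in variances]       # random effects weights
    pooled = sum(wi * di for wi, di in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2

# Hypothetical log variance ratios for one country across three administrations.
pooled, se, tau2 = random_effects_pool([0.10, 0.20, 0.30], [0.001, 0.001, 0.001])

# Report on the un-transformed ratio scale.
ratio = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```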
Meta-regression
Meta-regression was used to explore and quantify potential sources of heterogeneity. We recorded the mean test score for each country in each year and calculated a weighted effect size of the gender difference between male and female means, as previous work has suggested that this effect size is related to the variance ratio (Baye and Monseur 2016). This was taken as the female mean subtracted from the male mean (a negative score therefore suggests higher scores for females). Using SPSS, this was converted into a standardised effect size (Hedges g) calculated from the effect size d multiplied by the correction factor J (correcting for small sample sizes):
$$d = \frac{\mu_1 - \mu_2}{SD_{pooled}}$$
$$J = 1 - \frac{3}{4\,df - 1}$$
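A short worked sketch of the effect size, with d = (male mean − female mean) / pooled SD and J = 1 − 3/(4df − 1). The pooled standard deviation formula and df = n_m + n_f − 2 are the conventional choices and are assumed here, since the text does not spell them out; the input values are invented.

```python
import math

def hedges_g(mean_male, mean_female, sd_male, sd_female, n_male, n_female):
    """Standardised male minus female mean difference with small-sample correction."""
    df = n_male + n_female - 2
    # Assumed: conventional pooled standard deviation.
    sd_pooled = math.sqrt(
        ((n_male - 1) * sd_male ** 2 + (n_female - 1) * sd_female ** 2) / df
    )
    d = (mean_male - mean_female) / sd_pooled   # negative when females score higher
    j = 1 - 3 / (4 * df - 1)                    # correction factor J
    return d * j

# Hypothetical case: females outscore males by 10 points, SD of 100 in both groups.
g = hedges_g(500.0, 510.0, 100.0, 100.0, 51, 51)
```

With samples as large as those in international assessments, J is very close to 1, so g and d are nearly identical.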
Other additional moderators were derived from test administrations. Previous researchers (discussed earlier) have suggested that some differences may result from test design. As such, the test type, year, test grade and OECD membership were included as moderators to determine if these had a substantial impact on heterogeneity. Baye and Monseur (2016) found small differences in variance ratios between these variables and thus they may be contributing to some of the heterogeneity. Alongside these, the subfactors of the HDI and the GGGI were included to see if other country level contributing factors could account for variation across countries. As consistent data for both these indices is only available from 2006, meta-regression was performed only on cases from test administrations from 2006 onwards.
Results
Analysis of each content domain is presented separately. Countries with only one or two data points are included in the analysis although conclusions about the stability of their variance ratios must be treated cautiously. Variance ratios and their confidence intervals are presented in their un-transformed form for ease of interpretation. The percentage of cases that have a variance ratio below (significantly and non-significantly) and above (significantly and non-significantly) 1, with ratios above 1 representing greater male variance, are also presented. Q statistics and their significance are also reported for each nation.
Mathematics literacy
Table 1 shows the results for this analysis on international mathematics literacy data sources. Each of the 102 individual participating nations is listed in alphabetical order.
Table 1
Variance ratios, confidence intervals and heterogeneity statistics for countries in the domain of mathematics literacy
[tables]
These covariates predicted 31% of the heterogeneity in Mathematics Literacy, 46% of the heterogeneity in Science Literacy and 54% of the heterogeneity in Reading. Many of the factors included in the model explain significant amounts of variance in effect sizes; however, this varies by domain. By far the most significant predictor is the size of the gender difference in scores (across all three domains). As the gap becomes larger in favour of females, the variance for males increases. The mean score of the country is statistically significant for reading and science literacy but has a very small, positive impact. The same can be said for the test year in mathematics literacy. There are small and significant effects for the tests (with TIMSS and PIRLS showing slightly less male variance) but this is harder to interpret, as it is confounded by age. HDI indicators seem to have little impact on variance ratios, although GNI has a very small positive but statistically significant effect on mathematics literacy and science literacy. GGGI indicators, however, have a stronger, negative impact on national variance ratios. Countries with higher Economic Participation for women have ratios favouring females across all domains. In mathematics literacy and science literacy, however, better Educational Attainment for women significantly increases the ratios in favour of males. Increased political empowerment for women also seems to increase variances for females in literacy.
Discussion
Results broadly confirm the previous works of Baye and Monseur (2016) and suggest that male variances are greater than female variances internationally. This was largely expected as, although the methodology differed, most of the data used in this study was the same. Baye and Monseur showed variances for males were greater by 15% in reading, 12% in maths and 14% in science. Our results indicated that these ratios are 16%, 12% and 13% respectively, and suggest that the inclusion of more recent international surveys has not altered them substantively. Similarly, the correlation between male–female effect sizes and variance ratios was in line with those found by previous authors, with superior female performance increasing the gap in variance between the sexes. As such, we can broadly support the findings of past research and conclude that over the studied period, male variances in the domains of reading, mathematics literacy and science literacy are almost universally greater.
However, these results suggest that we can take this conclusion a step further. Feingold (1994) suggested that a difference of about 10% in variance ratios should be considered a substantive difference. Tables 1, 2 and 3 clearly show that for most countries engaging with PISA, PIRLS and TIMSS assessments, male variances are greater by often more than this threshold in all three domains. There are no geographical areas in this study that show significantly greater female variances. It would seem therefore that the question currently should no longer be, do male and female variances differ, but by how much more varied are males compared with females?
While in over 95% of cases, males show greater amounts of variance, there is a significant heterogeneity in these results, both within and between countries. While we can say with confidence that males are certainly more varied and generate a fairly precise estimate of a global average, we cannot come to an absolute value for each country individually and must contend with a large amount of dispersion. This dispersion is telling however and shows that not only do countries differ (significantly in some cases, as is evident in Figs. 1, 2 and 3) but that they vary internally as well. There is a significant amount of heterogeneity across these data in most countries examined in this study which requires explaining.
Our meta-regression within each domain has gone some way in explaining close to half of the heterogeneity observed in the dataset for reading and science literacy and about a third for mathematics literacy. Some of the findings are harder to interpret than others. The variable with the largest impact is the male–female effect size. This is the most substantive factor across all three domains and suggests that as girls outperform boys, the variability of boys increases. This seems to support earlier works that demonstrated a correlation between effect sizes and variance ratios (Baye and Monseur 2016; Nowell and Hedges 1998). The mean score for the country also has a significant albeit smaller impact in the same direction for science literacy and reading. Countries that perform better on average are therefore more likely to have greater variability for boys.
PISA tests appear to result in slightly more variance for males than TIMSS and PIRLS. Baye and Monseur (2016) found slightly smaller ratios in the primary years across all three domains. As TIMSS and PIRLS assess younger children, it may be that this simply reflects an age or maturity effect. However, we cannot rule out that the tests themselves are causing some of the heterogeneity, or that there may be a compositional effect between the two.
Interestingly, most of the HDI indicators were not significantly predictive of variance ratios across domains. The exception to this appears to be the GNI indicator (an adjusted form of GDP per capita) for mathematics literacy and science literacy but not reading. Reading is a specific skill that requires mastery and is often contingent on home environments for reinforcement. While this is to an extent true of basic mathematical concepts, later mathematics and science are likely tied more strongly to whatever specific curriculum is delivered, and this is largely coordinated at a national level. This may explain why national wealth may impact more upon maths and science as opposed to reading. However, it should be noted that, despite its statistical significance, it has only a minute impact on increasing male variance.
Measures from the Global Gender Gap Index however seem to have a larger impact on variance ratios. Increasing female economic participation appears to increase levels of female variance across all three domains. This suggests that countries actively incorporating more women into the labour force has an impact on educational outputs. Increased political empowerment for women also increases female variances in reading. Increased educational attainment for women has mixed impacts however. It has a significant effect of increasing male variances in mathematics and science but a non-significant effect of increasing variance for women in reading. Taken together, it suggests that cultural practices tied to increasing female participation generally appear to increase variances for females and suggests that greater male variance in educational outcomes may be practically reduced on national levels. While this study cannot isolate what specific national level practices are responsible for this, it does lead to interesting further questions regarding the processes underlying male/female variability.
The year of the test also had a very small but statistically significant effect on variance ratios in mathematics literacy. As with the test variable itself, why precisely this should be the case is difficult to rationalise. As mentioned earlier, there could be specific test administrations which have differences that create a small, positive effect. Alternatively, it could be that national educational systems have been adapting educational practices in order to improve their position in international rankings, and that these new practices are impacting upon the spread of scores. From this data alone, we can only speculate on the specifics as to why this may be the case.
Limitations and future work
There are several limitations to the data and the procedure we have used to explore it. First, a meta-analysis of international assessments such as PISA, PIRLS and TIMSS, while it controls for many extraneous variables not possible to account for in a meta-analysis via a literature search, does limit generalizability to alternative educational assessments. There could be something specific to these assessments that creates this effect. A limitation perhaps related to this applies to the assessments themselves. In PISA, the content being assessed is heavily based in literacy abilities. Even mathematics and science components are rooted in the ability to read and poor readers are unlikely to achieve if they cannot interpret the questions posed. As is evident from Table 2, the domain with the greatest amount of male variability is reading. As such, it is possible that mathematics and science show comparable overall ratios simply because they are rooted heavily in the ability to read. It is interesting to note that previous works using different assessments have shown greater variabilities in quantitative domains compared to verbal ones (Lakin 2013; Lohman and Lakin 2009). Thus, what this data may perhaps be showing is the greater variability in reading generally. This is still important and would pose the question ‘why are males more variable at reading’ but we must therefore be cautious regarding the conclusions we draw from the mathematics and science domains.
This study tentatively suggests (as do Baye and Monseur 2016) that age may be a factor, and that variability for males increases as candidates get older. To our knowledge, no study specifically examines this, either longitudinally or cross-sectionally (with perhaps the exception of Lakin 2013). Alternatively, attempting to quantify nation-specific factors that could be included in additional regression analyses may be a future avenue worth exploring (particularly considering the impact of GGGI variables on ratios), potentially allowing us to account for more of the heterogeneity in these results.
A final avenue of exploration would be to examine this effect over additional academic assessments. Research historically focuses on core domains of reasoning (Baye and Monseur 2016; Lohman and Lakin 2009; Strand et al. 2006). While this is important, do we get similar patterns across curricular subject examinations (anything from art to zoology), or different modes of assessment (pencil and paper tests compared to practical performance assessments)? These are often studied less, in part due to reasons of sample representation, or the fact that specific subjects are often self-selecting. As it stands from the data and the literature reviewed here, we would expect to see similar patterns across assessments generally. It would be telling if this was not the case. If there are exceptions, what are they and why do they differ?
Implications for theory and policy
From a theoretical perspective, we cannot contribute causal explanations for why males are more variable. Data suggests the effect is almost universal, which, while supportive of biological and evolutionary theories, doesn’t rule out specific cultural, educational, political, social or religious practices. Indeed, the fact that we can quantify substantial variation as dependent on increased female participation in society suggests that, at least in educational outcomes, it is not necessarily the case that males should vary more.
However, without a clear understanding of why males vary more and how this difference is maintained, we acknowledge that a meaningful discussion regarding what can be done to ensure parity is difficult. Increased female participation in the economy, education and political empowerment significantly reduce the size of the discrepancy in variances between males and females across the three educational domains studied here. If these increase, we might expect the variance gap to decrease. Which specific practices within countries are enabling this, however, is not discernible from the existing data, and more comparative, in-depth work within nations (with closer attention to specific educational practices) would be required before specific policy recommendations could be formulated to ensure parity between males and females across the ability distribution.
Differences in the spread of abilities are important for society. If, for example, we want to increase the representation of women in top positions and educational institutions, so that parity between the sexes exists at this level, it is important that males and females are equally represented in the higher percentiles of whatever qualifications or ability metrics that constitute the selection processes. Similarly, the large gap in reading ability between boys and girls in the lower percentiles (Baye and Monseur 2016) suggests that some boys are likely to be at a serious disadvantage in later education (and potentially later life outcomes). Whilst implementing measures that strive for parity in the right tail of the distribution are important, we must also be mindful to not neglect the left.
Conclusions
Our analysis seems to suggest that greater male variability is currently universal in internationally comparable assessments implemented over the past decade. However, this effect is far from homogenous, and there are quantifiable differences that exist over nations. Furthermore, some of this heterogeneity can be attributed to some yet unspecified practices or policies targeted at increasing male–female equality, general male–female performance as well as potentially the age of candidates and the type of test. Further work however is required to examine these factors in more detail, and analyses within nations may be informative to examine more specific practices that can explain national patterns. Comparative work examining high and low scoring GGGI countries may be informative in this endeavour. In doing so, it may be possible to determine if the root cause of these differences in distributions is attributable to some species-universal mechanism or some other social or cultural phenomenon.
Exposure to female estrous, a natural rewarding experience, alleviates anxiety & depression, & is beneficial to recovery following transient ischemic stroke in male mice
Exposure to female estrous is beneficial for male mice against transient ischemic stroke. Yuan Qiao, Qing Ma, Haifeng Zhai, Ya Li & Minke Tang. Neurological Research, Feb 27 2019, https://doi.org/10.1080/01616412.2019.1580461
Objective: Exposure to female estrous, a natural rewarding experience, alleviates anxiety and depression, and the contribution of this behavior to stroke outcome is unknown. The aim of this study was to evaluate whether exposure to female estrous is beneficial to recovery following transient ischemic stroke in male mice.
Methods: Cerebral ischemia was induced in male ICR mice with thread occlusion of the middle cerebral artery (MCAO) for 30 min followed by reperfusion. MCAO mice were randomly divided into MCAO group and Estrous Female Exposure (EFE) group. The mice in the EFE group were subjected to estrous female mouse interaction from day 1 until the end of the experiment. Mortality was recorded during the investigation. Behavioral functions were assessed by a beam-walking test and corner test from day 1 to day 10 after MCAO. Serum testosterone levels were analyzed with ELISA, and the expression levels of growth-associated protein-43 (GAP-43) and synaptophysin in the cortex of the ischemic hemisphere were determined by western blot on day 7 after MCAO.
Results: Exposure to female estrous reduced the mortality induced by cerebral ischemic lesions. The beam-walking test demonstrated that exposure to female estrous significantly improved motor function recovery. The serum testosterone levels and ischemic cortex GAP-43 expression were significantly higher in MCAO male mice exposed to female estrous.
Conclusion: Exposure to female estrous reduces mortality and improves functional recovery in MCAO male mice. The study provides the first evidence to support the importance of female interaction to male stroke rehabilitation.
Abbreviations: GAP-43: growth-associated protein-43; SYP: Synaptophysin; MCAO: middle cerebral artery occlusion; OVXs: ovariectomies; CCA: common carotid artery; ECA: external carotid artery; EFE: estrous female exposure; TTC: 2,3,5-triphenyltetrazolium chloride; PAGE: polyacrylamide gel electrophoresis; PVDF: polyvinylidene difluoride; ANOVA: analysis of variance; LSD: least significant difference
KEYWORDS: Exposure to female estrous, transient ischemic stroke, GAP-43, SYP
Objective: Exposure to female estrous, a natural rewarding experience, alleviates anxiety and depression, and the contribution of this behavior to stroke outcome is unknown. The aim of this study was to evaluate whether exposure to female estrous is beneficial to recovery following transient ischemic stroke in male mice.
Methods: Cerebral ischemia was induced in male ICR mice with thread occlusion of the middle cerebral artery (MCAO) for 30 min followed by reperfusion. MCAO mice were randomly divided into MCAO group and Estrous Female Exposure (EFE) group. The mice in the EFE group were subjected to estrous female mouse interaction from day 1 until the end of the experiment. Mortality was recorded during the investigation. Behavioral functions were assessed by a beam-walking test and corner test from day 1 to day 10 after MCAO. Serum testosterone levels were analyzed with ELISA, and the expression levels of growth-associated protein-43 (GAP-43) and synaptophysin in the cortex of the ischemic hemisphere were determined by western blot on day 7 after MCAO.
Results: Exposure to female estrous reduced the mortality induced by cerebral ischemic lesions. The beam-walking test demonstrated that exposure to female estrous significantly improved motor function recovery. The serum testosterone levels and ischemic cortex GAP-43 expression were significantly higher in MCAO male mice exposed to female estrous.
Females having a younger brother: Earnings are about 7 pct lower in adulthood; parents have lower academic expectations for the girls, & the girls develop more traditionally feminine roles
The Brother Earnings Penalty. Angela Cools, Eleonora Patacchini. Labour Economics, https://doi.org/10.1016/j.labeco.2019.02.009
Highlights
• We examine the impact of having a younger brother on females’ adolescent environment and adult earnings
• We use unique longitudinal data on a recent cohort of U.S. women
• Girls with a younger brother earn about 7 percent less in their adulthood
• Parents have lower academic expectations for girls with a younger brother
• Girls with a younger brother develop more traditional gender roles and behaviors
Abstract: This paper examines the impact of sibling gender on adolescent experiences and adult labor market outcomes for a recent cohort of U.S. women. We document an earnings penalty from the presence of a younger brother (relative to a younger sister), finding that a next-youngest brother reduces adult earnings by about 7 percent. Using rich data on parent-child interactions, parents’ expectations, disruptive behaviors, and adult outcomes, we provide a first step at examining the mechanisms behind this result. We find that brothers reduce parents’ expectations and school monitoring of female children while also increasing females’ propensity to engage in more traditionally feminine tasks. These factors help explain a portion of the labor market penalty from brothers.
Null Effects of Game Violence, Game Difficulty, and 2D:4D digit ratio, thought to index prenatal testosterone exposure, on Aggressive Behavior
Null Effects of Game Violence, Game Difficulty, and 2D:4D Digit Ratio on Aggressive Behavior. Joseph Hilgard et al. Psychological Science, March 7, 2019. https://doi.org/10.1177/0956797619829688
Abstract: Researchers have suggested that acute exposure to violent video games is a cause of aggressive behavior. We tested this hypothesis by using violent and nonviolent games that were closely matched, collecting a large sample, and using a single outcome. We randomly assigned 275 male undergraduates to play a first-person-shooter game modified to be either violent or less violent and hard or easy. After completing the game-play session, participants were provoked by a confederate and given an opportunity to behave aggressively. Neither game violence nor game difficulty predicted aggressive behavior. Incidentally, we found that 2D:4D digit ratio, thought to index prenatal testosterone exposure, did not predict aggressive behavior. Results do not support acute violent-game exposure and low 2D:4D ratio as causes of aggressive behavior.
Keywords: violent video games, aggressive behavior, digit ratio, Bayesian analysis, open data, open materials
Thursday, March 7, 2019
A Replication Study: Machine Learning Models Are Capable of Predicting Sexual Orientation From Facial Images
A Replication Study: Machine Learning Models Are Capable of Predicting Sexual Orientation From Facial Images. John Leuner. Master's thesis. Feb 2017, https://arxiv.org/pdf/1902.10739.pdf
Abstract: Recent research used machine learning methods to predict a person's sexual orientation from their photograph (Wang and Kosinski, 2017). To verify this result, two of these models are replicated, one based on a deep neural network (DNN) and one on facial morphology (FM). Using a new dataset of 20,910 photographs from dating websites, the ability to predict sexual orientation is confirmed (DNN accuracy male 68%, female 77%, FM male 62%, female 72%). To investigate whether facial features such as brightness or predominant colours are predictive of sexual orientation, a new model trained on highly blurred facial images was created. This model was also able to predict sexual orientation (male 63%, female 72%). The tested models are invariant to intentional changes to a subject's makeup, eyewear, facial hair and head pose (angle that the photograph is taken at). It is shown that the head pose is not correlated with sexual orientation. While demonstrating that dating profile images carry rich information about sexual orientation, these results leave open the question of how much is determined by facial morphology and how much by differences in grooming, presentation and lifestyle. The advent of new technology that is able to detect sexual orientation in this way may have serious implications for the privacy and safety of gay men and women.
“Neuromyths” (misconceptions about the brain) show a high prevalence among teachers in different countries; the teachers say they are strongly influenced by their intuitions, by what seems logical to them
Neuromyths and Their Origin Among Teachers in Quebec. Jérémie Blanchette Sarrasin, Martin Riopel, Steve Masson. Mind, Brain, and Education, Mar 7 2019. https://doi.org/10.1111/mbe.12193
Abstract: Previous studies have revealed that “neuromyths,” which are misconceptions about the brain, show a high prevalence among teachers in different countries. However, little is known about the origin of these ideas; that is to say, the sources that may influence their presence among teachers. This research aims to identify the prevalence of five frequent neuromyths among teachers in Quebec (belief in neuromyths and reported practices) and the reported sources of these beliefs (e.g., reading popular science texts). A total of 972 teachers from Quebec responded to an online questionnaire. Results show a lower prevalence than previous studies (although it remains high), and that the main sources cited by participants are related to cognitive biases and university training. To our knowledge, this study is the first to report data supporting the idea that cognitive biases are related to the prevalence of neuromyths.
After experiencing a threat to their abilities, individuals who misrepresent their performance as better than it is boost their feelings of competence, restoring positive self-evaluations
A counterfeit competence: After threat, cheating boosts one's self-image. S. Wiley Wakeman, Celia Moore, Francesca Gino. Journal of Experimental Social Psychology, Volume 82, May 2019, Pages 253-265. https://doi.org/10.1016/j.jesp.2019.01.009
Abstract: In six studies, we show that after experiencing a threat to their abilities, individuals who misrepresent their performance as better than it actually is boost their feelings of competence. We situate these findings in the literature on self-protection. We show that this “counterfeit competence” effect holds when threat is measured (Study 1), manipulated (Study 2), and when the opportunity to cheat is randomly assigned (Study 3). We extend our findings to a workplace context, and show that threatened individuals who lie on a job application feel more capable than those who report them honestly (Study 4). Finally, consistent with the argument that counterfeit competence is driven by self-protection, we find individuals do not predict they would experience such a boost (Study 5), and that cheating after threat offers benefits similar to those provided by other established methods of self-protection (Study 6). Together, our findings suggest that, after threat, misrepresenting one's performance can function as a mechanism that helps to restore positive self-evaluations about one's capabilities.
Keywords: Competence, Unethical behavior, Ego threat, Self-protection, Self-deception
Although we are undoubtedly omnivores, we evolved quite early to become highly carnivorous and we continue to retain a biologic adaptation to carnivory
Ben-Dor, Miki (2019) "How carnivorous are we? The implication for protein consumption," Journal of Evolution and Health: Vol. 3: Iss. 1, Article 10. https://doi.org/10.15310/2334-3591.1096
Full text, references, etc., in the link above.
Introduction
The Paleo Diet evolutionary mismatch principle suggests that the closer we stay to the diet that we evolved to consume, the better our chances of staying healthy. There is little doubt that meat was a significant component of the Paleolithic diet and that it was acquired largely by hunting [1], and thus Paleolithic humans can be defined as carnivores. The definition of carnivory, however, is vague as a dietary pattern. There are 'carnivores' belonging to the order Carnivora that don't eat meat (panda bears). There are 'obligate carnivores' that rely on very high protein consumption (cats). There are hypercarnivores that by definition consume more than 70% of their calories from animal sources, and there are even 'epic carnivores' at the very top of the food chain (lions). The purpose of the present investigation is not to assign humans to any of these categories but to find out whether during our evolution we became adapted to consume large quantities of meat at the expense of a previous adaptation to consume large quantities of plants. If so, we can assume that consuming a relatively large quantity of meat will be safer than consuming a relatively large quantity of plant foods. Another question that comes up is to what level of protein consumption we became adapted. Since in diet every item that we consume replaces an item that we could consume, if we are adapted to consume animal sourced protein, we can consider it to be a safer food than other foods, like domesticated plants. In this context, the question of the evolutionary level of protein consumption during the Paleolithic has never received adequate attention. Since there is relatively little protein in plants, the answer is derived from the relative amount of animal food in the human diet. If animal food consumption was relatively high during the Paleolithic, then relative protein consumption would also have been high. Quite a few authors have tried to estimate the caloric Plant:Animal ratio (DPA) in the human Paleolithic diet [2-8].
A wide variation of DPAs was predicted, with averages ranging between 66% plants and 33% animal [4] to 35% plants and 65% animal [2]. Alas, because plants preserve poorly or not at all in the archaeological record, all of the estimates relied to a great extent on the ethnographic record of diets of recent hunter-gatherer (HG) groups, with a tacit or expressed claim for the analogy between the periods. However, I claim that the HG ethnographic record should not be used to predict Paleolithic diets, or indeed even variability in the diet, as the ecologies of the two periods are so different as to deny any scientific validity to such prediction. Here I outline a short review of the relevant ecological conditions in support of my claim. A full paper is in preparation.
Recent hunter-gatherers ethnography is a misleading source of Paleolithic diet reconstruction
In discussing the use of ethnographic sourced analogies in archaeology, Ascher (9) summarized the opinions of his contemporaries Clark, Willey, and Childe thus: “...the canon is: seek analogies in cultures which manipulate similar environments in similar ways.” In other words, the degree of similarity between the ecological and technological conditions of the known and unknown periods is the key criterion in judging the validity of ethnographic sourced analogies. A review of the recent ecological conditions reveals that especially in one crucial aspect, the availability and size of faunal and floral resources, there is a drastic and unbridgeable gap between the Paleolithic and the recent modern HG period. In a recent paper, Smith et al. [10] calculated the mean body weight of non-volant (not flying) terrestrial mammals during the last 2.5 million years. A drastic decline in the mean body weight of terrestrial mammals took place, from approximately 500 kg at the beginning of the Pleistocene 2.5 million years ago to about 10 kg today.
In the same vein, Bibi et al. [11] compared the faunal assemblages of Olduvai Middle Bed II at 1.7-1.4 million years ago (Mya) to faunal communities in the present day Serengeti. They concluded that “The sheer diversity of species, including many large-bodied species, at Neogene and Pleistocene African sites like Olduvai, is perplexing and makes extant African faunas look depauperate in comparison.” Indeed, they present a hypothesis, supported by reduced carnivore richness in the Early Pleistocene [12], that human predation may have been the cause of the loss of large herbivores during the Pleistocene. A significant part of the reduction occurred in the Late Pleistocene and is a global phenomenon. During the Late Quaternary Megafauna Extinction, about 90 genera of animals weighing >44 kg became extinct beginning some 50 Kya [13]. The rate of extinction by body size follows a typical pattern in which the largest size genera became more completely extinct. In all the continents, apart from Africa and the Indian sub-continent, all genera exceeding 1000 kg became completely extinct, and those in the 1000-320 kg category became 50-100% extinct. In Africa, some 25% of what was left of the Late Quaternary's megafauna (>45 kg) became extinct [14].
In Africa, however, even the few large animals that remained were hardly available for hunting by the HG groups that form the basis for many analogies with the Paleolithic, the Hadza and the San. Elephants were hunted by Europeans with guns in the Hadza and San's territories for over a hundred years. Evidence abounds for a drastic decline in the availability of animals as a result of encroachment by herders and farmers [15, 16]. The result is that the Hadza no longer hunt the three largest animals in Africa: elephants, rhinos, and hippos. Moreover, the disappearance of large animals, and especially elephants, caused a substantial increase in the availability of plant food sources. Elephants are known to be a formidable predator of baobab trees [17]. Baobab is the single largest contributor of calories to the Hadza as well as a home for their most popular species of honey bees. A similar phenomenon occurs in the San (!Kung) territory, where the mongongo tree, their staple food source, was subject to partial destruction and growth retardation when elephants were present in its vicinity [18:312]. In summary, the differences in the relative availability of plants and animals, and especially big animals, between the Paleolithic and the modern HG period are so critical that they prevent any inference from the recent HG DPA to the Paleolithic DPA, including any conclusion regarding the degree of DPA variability during the Paleolithic. So, if ethnography and archaeology are poor sources for DPA estimates, are there other fields of knowledge we can explore? As it turns out, physiology can be a trove of information for evolutionary DPA, as adaptations to one DPA or another are stored in our body in the forms of genetics, morphology, metabolism, and sensitivity to pathogens.
Reconstruction of the Paleolithic diet based on human physiology
A more detailed reconstruction was performed as a part of my Ph.D. thesis and is in preparation for publication.
What follows is a short review of some of the physiological adaptations, or lack thereof, that provide evidence for the nature of our past diet. The first three adaptations are unique in that the authors themselves point out (maybe to their surprise) that, according to their findings, we have various physiological processes that align with those of carnivores.
Weaning like a carnivore
Life history, the age at which animals reach certain stages in life like gestation, weaning, mating, and death, is strongly defined in a species. Psouni et al. [19] found that adult brain mass, limb biometrics, and dietary profile can explain 89.2% of the total variance in time to weaning. Comparing 67 species, they found humans to be in the carnivores' group, while chimpanzees and other primates were in the non-carnivores' group. They conclude: "Our findings highlight the emergence of carnivory as a process fundamentally determining human evolution."
Many smaller fat cells like all carnivores
Pond and Mattacks (20) compared the structure of fat cells in various types of animals. Carnivores were found to have a higher number of smaller fat cells and omnivores a smaller number of larger fat cells. Humans were found to be at the top of the carnivorous pattern. Pond and Mattacks conclude: “These figures suggest that the energy metabolism of humans is adapted to a diet in which lipids and proteins rather than carbohydrates, make a major contribution to the energy supply.”
Stomach acidity of a unique carnivore
Beasley, Koltz (21) emphasize the role of stomach acidity in protection against pathogens. They found that carnivores' stomachs, at a pH of 2.2, are more acidic than omnivores' stomachs, at a pH of 2.9, but less acidic than those of obligate scavengers, at a pH of 1.3. According to Beasley, Koltz (21), humans have a high level of acidity, at a pH of 1.5, that lies between that of obligate and facultative scavengers. Producing acidity, and retaining the stomach walls to contain that acidity, is energetically expensive, so it would presumably only evolve if the level of pathogens in the human diet was high. The authors surmise that humans were more of a scavenger than we thought. However, there is a more likely conclusion if we take into account that humans were a particular kind of carnivore. Unlike other carnivores, they consumed the meat over several days, either in a central place (home base) [22] or, for very large animals, where it was acquired [23]. Big animals, like elephants and bison, and even smaller animals like zebra, provide enough calories to last a 25-member HG group for days and weeks [24]. During this time the pathogen load is bound to build up to a higher level than even a regular scavenger encounters under normal circumstances, and hence the presumed need for high acidity.
Reduced energy extraction capacity from plants.
Most plant eaters extract a large part of their energy from the fermentation of fiber by gut bacteria [25]. In primates, the fermentation takes place in the large intestine. For example, a gorilla extracts some 60% of its energy from fiber [26]. The fruits that chimps consume are also very fibrous [27]. Their large intestines form 52% of the volume of the gut, similar to the 53% in the gorilla [28], indicating that, like a gorilla, they also derive a similarly high portion of their energy from fiber. An adaptation that prevents humans from efficient exploitation of fiber for energy may point to a shift in the dietary emphasis away from plants towards specialization in animal sourced food [See 29 considering criteria for specialization]. Our gut is 40% smaller [30], and one can therefore calculate that our large intestine, where fiber is processed to energy, is 77% smaller by volume than that of a chimpanzee our size [28]. The size of our small intestine, where macronutrients are absorbed, is 62% larger than that of a chimpanzee our size. Since the chimpanzee was able to absorb a large amount of sugar with a shorter small intestine, the 66% extension could represent an adaptation to consuming more fat and protein in humans. Since the mastication system prepares the food for the gut, a reduced mastication system already 1.7 million years ago (Mya) in H. erectus suggests that the gut size of H. erectus was already reduced [31]. We can thus propose that H. erectus specialized in non-plant food items. The omnivorous pigs are sometimes mentioned as a good model for human nutrition [32]; however, the volume of their large intestine is higher than the volume of their small intestine [32], the reverse of the ratio in humans [28], pointing to the adaptation of pigs to highly fibrous food. The changed gut composition meets the criteria for specialization proposed by Wood and Strait (29).
They propose that adaptation towards specialization is marked by a change that enables the acquisition of one resource while interfering with the acquisition of another resource. In our case, the gut morphology adaptations both improved animal food exploitation and at the same time hindered the full exploitation of fibrous plant foods.
Endurance running
Bramble and Lieberman [33] list 22 specific adaptations to endurance running and claim they represent an adaptation to 'persistence hunting'. There is some disagreement as to the significance of the 'persistence hunting' technique [34], but as it represents an adaptation to better mobility, it may also indicate adaptation to operating in a larger home-range. Carnivores with a large proportion of flesh in their diets, such as Canids and Felids, have particularly large home-ranges, whereas omnivorous carnivores like Ursidae have a narrower home-range [35].
Adaptation to spear throwing
Roach et al. [36] claim that the structure of our shoulder represents an adaptation to carnivory. They describe how our shoulder is perfectly adapted to throwing, which must be useful, in their opinion, mainly in hunting and protection from predators. They show that, in contrast, the chimpanzee's shoulder is adapted to climbing trees.
This may serve as further evidence for specialization in carnivory: like the smaller gut, the improved ability to obtain animal food comes at the expense of a reduced ability to obtain plant-sourced food, fruits in this case.
High fat reserves
Humans have much higher fat reserves than chimps, our closest relatives [37]. Carrying a high amount of fat costs energy and reduces the speed of chasing or fleeing [38]. Most carnivores and fleeing herbivores do not pack much fat as, unlike humans, they rely on speed for predation or evasion. Recent HG were found to have enough fat reserves to fast for three weeks for men and six weeks for women [39]. This ability may represent an adaptation that is unique to carnivory of large animals by a predator who does not rely on speed. The large fat reserves may have allowed humans to bridge longer periods between less frequent hunts of larger animals due to their relatively lower abundance.
The AMY1 gene - Incomplete adaptation to metabolize starch?
Humans have a varying number of AMY1 gene copies (2-12 copies [40]), which synthesize salivary amylase, whereas chimpanzees have only two copies. The higher copy number may represent different degrees of adaptation to consuming starch [40], although the results of studies associating actual health markers with the number of copies are equivocal [41-47]. Herbivores and carnivores do not seem to have salivary amylase (although the data are limited), whereas omnivores usually produce high quantities of the enzyme [48]. This variance in the number of copies in humans can in itself be (but doesn't have to be) a testimony that the adaptation is relatively recent and has not been fixed yet. However, until a better grasp is obtained on the timing of the change in copy number, little can be said about its significance to the question of DPA in humans.
Recent genetic adaptation to tuber consumption
Tubers, which are available year-round and are as energy dense as wild fruits, are mentioned as a good candidate for a Paleolithic plant-based diet [49]. Populations that presently depend on tubers are enriched in genes that are associated with starch metabolism, folic acid synthesis, and glycosides neutralization, but other populations are not [50]. These adaptations presumably compensate for these tubers' poor folic acid content and relatively high content of glycosides. The very limited geographic distribution of these genes [50] may mean that their presence in humans is quite recent, so that tubers did not form an important part of the human Paleolithic diet.
The earliest evidence for caries - 15,000 years ago
High consumption of starch and sugars is associated with the development of dental caries (cavities) [51]. Frequencies of carious lesions in archaeological populations range from 2.2–48.1% of teeth for agricultural populations, but only 0–14.3% for hunter-gatherers [52]. A high prevalence of caries first appeared some 15.0 Kya in a site in Morocco, together with evidence for exploitation of starchy foods [53]. This may mean that high carbohydrate (plant) consumption is a relatively recent, end-of-Pleistocene phenomenon. It should be pointed out that in some more recent traditional societies high starch consumption was not associated with a high prevalence of caries [54].
Paleolithic dietary reconstruction based on human physiology – conclusion
Although physiology is only one of the sources for Paleolithic dietary reconstruction, looking into the information that is stored in our body provides interesting and sometimes new evidence that we underwent substantial adaptation towards carnivory and that it started quite early in our evolution as the genus Homo. It also supports the notion that we remain adapted to carnivory despite over 10,000 years of agricultural subsistence. Consequently, it seems, in reply to the question at the heart of this paper, that we are adapted to consume high quantities of protein. How high? The answer lies in reconstructing our behavior during prehistory regarding fat [24, 55].
What was the protein consumption level during human evolution?
The question of the desirable level of dietary protein consumption comes up in the literature and among professionals and lay people who are interested in nutrition. This section tries to answer that question by discerning the Paleolithic level of consumption, assuming that it is a safe level, following the evolutionary mismatch theory of chronic disease [56]. Protein processing for energy in humans is estimated to be physiologically limited to 35-45% of the daily calories [57, 58]. If humans were at the protein limit during the Paleolithic era, the remaining 55-65% of the calories should have come either from fat or from carbohydrates, namely plants. There is ample ethnographic evidence for human dependence on and preference for animal fat as a food source. Kelly [59] writes in his authoritative book on HG: “...although ethnographic accounts abound with references to the importance of meat they equally convey the importance of fat...”. He adds: “It, therefore, may be fat rather than protein that drives the desire for meat in many foraging societies”.
Lee [16] writes about the !Kung of the K alahari: “Fat animals are keenly desired, and all !Kungexpressa constant craving for animal fat”. The essentiality of fat is best demonstrated in Tindale’s account of the Pitjandjara of Australia[60]. He writes: "Whenkilling the animal they immediately feel the body for evidence of the presence of caul fat. If the animal is 'njuka',fatless, it is usually left unless they are themselves starving”. Coote and Shelton [61]report a similar behavioramong the Yolngu of Arnhem, Australia, saying that "Animals without fat may indeed be rejected as food".The importance of fat is also evident initsuse as a symboloffertility, sacredness, wealth, health and even life itself in recent traditional societies' rituals, linguistics and mythology [55]The archaeological record similarly showsthat many of humans’ particular acquisition and food exploitation behaviors can be interpreted as stemming from the need to obtain fat. Behaviors like the hunting of fatter animal or processing of fat from body parts at greater energetic expenditure thanwould have otherwise been needed indicate a concentration on fat as the primarycriterion in prey selectionand butchering. The preference of hunting larger animals and prime adultanimals within prey species[24, 62, 63], the preferenceto bring fatty parts to a central place and the extraction of bone grease [64],at great energetic costs,all point to a strategy of fat maximization. Thisenergetically expensive set of behaviors also supports the conclusion that plants could not provide a sufficient contribution to complement the protein at the limit of its consumption.This energetically expensive behavior is difficult to explain unless we assume that humans were at the limit of their protein consumption. Therefore, the implication for protein consumption from this reconstruction is that throughout our evolution as humans we obtaineda high portion of our calories from protein. 
Although no clear official statement of the upper limit on the consumption of protein has ever been published, there are reports of consumption of over 40% of the daily calories,or about 4 grams per kg body weight per day (g/kg/d) by circumpolar groups [65]. Rudman, Difulco (66)found the limit on urea removal to be 3.8 g/kg/dof protein to which the demand of structural protein at a minimum of 0.8grams per kg per day should be added [57]to a total of 4.6 g/kg/d. The present level of protein intake in the U.S.is some15.7% [67]of the daily calories. Based on consumption of 2000 calories for a 60 kgs person the currentconsumption is 314 calories whereas the Paleolithic level of consumption, according to this analysis was in the vicinity of 800 calories (40% of 2000) and possibly even higher at 1100 calories (4.6 g/kg/d X60 kgs X 4 cal/g).
Conclusion
As mentioned, this paper is just a part of a wider review, in preparation, of scientific evidence for the human evolutionary diet. Although we are undoubtedly omnivores, the biologic evidence that was presentedhere claims to show that we evolved, quite early in our evolution as the genus Homo, to becomehighly carnivorousand that we continue toretain abiologic adaptation to carnivory. This high level of carnivorymeans that during a largepart of our evolution our diet was high in protein besides being high in fat. If we look at the Paleo nutrition templateas a safety templet,this paper concludesthat it seems to be safe to consume a highportion of the diet from animal protein, possiblyto the tune of 30-40% of the daily calories. Since every calorie of protein that we do not consume is a calorie that will be consumed from another food source, the Paleo template guides us to consider the relative safety of alternatives to proteinwhen deciding on the actual level of protein consumption. Not many alternatives foods can claim to have nearly two million years of safe consumption.
Introduction
The Paleo Diet evolutionary mismatch principle suggests that the closer we stay to the diet that we evolved to consume, the better our chances of staying healthy. There is little doubt that meat was a significant component of the Paleolithic diet and that it was acquired largely by hunting [1], and thus Paleolithic humans can be defined as carnivores. The definition of carnivory as a dietary pattern, however, is vague. There are 'carnivores' belonging to the order Carnivora that do not eat meat (panda bears). There are 'obligate carnivores' that rely on very high protein consumption (cats). There are hypercarnivores that, by definition, consume more than 70% of their calories from animal sources, and there are even 'epic carnivores' at the very top of the food chain (lions). The purpose of the present investigation is not to assign humans to any of these categories but to find out whether, during our evolution, we became adapted to consume large quantities of meat at the expense of a previous adaptation to consume large quantities of plants. If so, we can assume that consuming a relatively large quantity of meat will be safer than consuming a relatively large quantity of plant foods.
Another question that comes up is to what level of protein consumption we became adapted. Since, in diet, every item that we consume replaces an item that we could have consumed, if we are adapted to consume animal-sourced protein, we can consider it to be a safer food than other foods, like domesticated plants. In this context, the question of the evolutionary level of protein consumption during the Paleolithic has never received adequate attention. Since there is relatively little protein in plants, the answer is derived from the relative amount of animal food in the human diet. If animal food consumption was relatively high during the Paleolithic, then relative protein consumption would also have been high. Quite a few authors have tried to estimate the caloric Plant:Animal ratio (DPA) in the human Paleolithic diet [2-8].
A wide variation of DPAs was predicted, with averages ranging from 66% plants and 33% animal [4] to 35% plants and 65% animal [2]. Alas, because plants preserve poorly or not at all in the archaeological record, all of the estimates relied to a great extent on the ethnographic record of the diets of recent hunter-gatherer (HG) groups, with a tacit or expressed claim for an analogy between the periods. However, I claim that the HG ethnographic record should not be used to predict Paleolithic diets, or indeed even variability in the diet, as the ecologies of the two periods are so different as to deny any scientific validity to such a prediction. Here I outline a short review of the relevant ecological conditions in support of my claim. A full paper is in preparation.
Recent hunter-gatherer ethnography is a misleading source for Paleolithic diet reconstruction
In discussing the use of ethnographically sourced analogies in archaeology, Ascher [9] summarized the opinions of his contemporaries Clark, Willey, and Childe thus: “...the canon is: seek analogies in cultures which manipulate similar environments in similar ways.” In other words, the degree of similarity between the ecological and technological conditions of the known and unknown periods is the key criterion in judging the validity of ethnographically sourced analogies. A review of recent ecological conditions reveals that in one crucial aspect especially, the availability and size of faunal and floral resources, there is a drastic and unbridgeable gap between the Paleolithic and the recent HG period. In a recent paper, Smith et al. [10] calculated the mean body weight of non-volant (non-flying) terrestrial mammals during the last 2.5 million years. A drastic decline in the mean body weight of terrestrial mammals took place, from approximately 500 kg at the beginning of the Pleistocene 2.5 million years ago to about 10 kg today.
In the same vein, Bibi et al. [11] compared the faunal assemblages of Olduvai Middle Bed II at 1.7-1.4 million years ago (Mya) to faunal communities in the present-day Serengeti. They concluded that “The sheer diversity of species, including many large-bodied species, at Neogene and Pleistocene African sites like Olduvai, is perplexing and makes extant African faunas look depauperate in comparison.” Indeed, they present a hypothesis, supported by reduced carnivore richness in the Early Pleistocene [12], that human predation may have been the cause of the loss of large herbivores during the Pleistocene. A significant part of the reduction occurred in the Late Pleistocene and is a global phenomenon. During the Late Quaternary Megafauna Extinction, about 90 genera of animals weighing >44 kg became extinct, beginning some 50 Kya [13]. The rate of extinction by body size follows a typical pattern in which the largest genera became more completely extinct. On all the continents apart from Africa and the Indian subcontinent, all genera exceeding 1000 kg became completely extinct, and those in the 320-1000 kg category became 50-100% extinct. In Africa, some 25% of what was left of the Late Quaternary megafauna (>45 kg) became extinct [14].
In Africa, however, even the few large animals that remained were hardly available for hunting by the HG groups that form the basis for many analogies with the Paleolithic, the Hadza and the San. Elephants were hunted by Europeans with guns in the Hadza's and San's territories for over a hundred years. Evidence abounds for a drastic decline in the availability of animals as a result of encroachment by herders and farmers [15, 16]. The result is that the Hadza no longer hunt the three largest animals in Africa: elephants, rhinos, and hippos. Moreover, the disappearance of large animals, and especially elephants, caused a substantial increase in the availability of plant food sources. Elephants are known to be formidable predators of baobab trees [17]. Baobab is the single largest contributor of calories to the Hadza as well as a home for their most popular species of honey bees. A similar phenomenon occurs in the San (!Kung) territory, where the mongongo tree, their staple food source, was subject to partial destruction and growth retardation when elephants were present in its vicinity [18:312].
In summary, the differences in the relative availability of plants and animals, and especially big animals, between the Paleolithic and the recent HG period are so critical that they prevent any inference from the recent HG DPA to the Paleolithic DPA, including any conclusion regarding the degree of DPA variability during the Paleolithic. So, if ethnography and archaeology are poor sources for DPA estimates, are there other fields of knowledge we can explore? As it turns out, physiology can be a trove of information on evolutionary DPA, as adaptations to one DPA or another are stored in our body in the forms of genetics, morphology, metabolism, and sensitivity to pathogens.
Reconstruction of the Paleolithic diet based on human physiology
A more detailed reconstruction was performed as a part of my Ph.D. thesis and is in preparation for publication.
What follows is a short review of some of the physiological adaptations, or lack thereof, that provide evidence for the nature of our past diet. The first three adaptations are unique in that the authors themselves point out (maybe to their surprise) that, according to their findings, we have various physiological processes that align with those of carnivores.
Weaning like a carnivore
Life history, the age at which an animal reaches certain stages in life such as gestation, weaning, mating, and death, is strongly defined in a species. Psouni et al. [19] found that adult brain mass, limb biometrics, and dietary profile can explain 89.2% of the total variance in time to weaning. Comparing 67 species, they found humans to be in the carnivores' group, while chimpanzees and other primates were in the non-carnivores' group. They conclude: “Our findings highlight the emergence of carnivory as a process fundamentally determining human evolution.”
Many smaller fat cells, like all carnivores
Pond and Mattacks [20] compared the structure of fat cells in various types of animals. Carnivores were found to have a higher number of smaller fat cells and omnivores a smaller number of larger fat cells. Humans were found to be at the top of the carnivorous pattern. Pond and Mattacks conclude: “These figures suggest that the energy metabolism of humans is adapted to a diet in which lipids and proteins rather than carbohydrates, make a major contribution to the energy supply.”
Stomach acidity of a unique carnivore
Beasley et al. [21] emphasize the role of stomach acidity in protection against pathogens. They found that carnivores' stomachs, at a pH of 2.2, are more acidic than omnivores' stomachs, at a pH of 2.9, but less acidic than those of obligate scavengers, at a pH of 1.3. According to Beasley et al. [21], humans have a high level of acidity, at a pH of 1.5, which lies between that of obligate and facultative scavengers. Producing acidity, and maintaining stomach walls that can contain that acidity, is energetically expensive, so it would presumably only evolve if the level of pathogens in the human diet was high. The authors surmise that humans were more of a scavenger than we thought. However, there is a more likely conclusion if we take into account that humans were a particular kind of carnivore. Unlike other carnivores, they consumed the meat over several days, either in a central place (home base) [22] or, for very large animals, where it was acquired [23]. Big animals, like elephants and bison, and even smaller animals like zebra, provide enough calories to last a 25-member HG group for days and weeks [24]. During this time the pathogen load is bound to build up to a higher level than even a regular scavenger encounters under normal circumstances, hence the presumed need for high acidity.
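Because pH is a logarithmic scale, the seemingly small differences between the quoted values correspond to large differences in acidity. As a rough illustration (not part of the cited authors' analysis), the following Python sketch converts the pH values quoted above into hydrogen-ion concentrations relative to the omnivore value, assuming only the standard definition pH = -log10[H+]. The function name and group labels are illustrative assumptions.

```python
# Illustrative sketch: express the stomach pH values quoted in the text
# on a linear scale. Assumes only the standard definition
# pH = -log10([H+]); all names here are hypothetical.

STOMACH_PH = {
    "omnivores": 2.9,
    "carnivores": 2.2,
    "humans": 1.5,
    "obligate scavengers": 1.3,
}


def relative_acidity(ph: float, reference_ph: float) -> float:
    """Return how many times higher [H+] is at `ph` than at `reference_ph`."""
    return 10 ** (reference_ph - ph)


if __name__ == "__main__":
    reference = STOMACH_PH["omnivores"]
    for group, ph in STOMACH_PH.items():
        ratio = relative_acidity(ph, reference)
        print(f"{group}: pH {ph}, about {ratio:.0f}x the omnivore [H+]")
```

On these figures, the human value corresponds to roughly 25 times the hydrogen-ion concentration of the omnivore value, compared with about 5 times for carnivores.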
Reduced energy extraction capacity from plants.
Most plant eaters extract a large part of their energy from the fermentation of fiber by gut bacteria [25]. In primates, the fermentation takes place in the large intestine. For example, a gorilla extracts some 60% of its energy from fiber [26]. The fruits that chimps consume are also very fibrous [27]. Their large intestines form 52% of the volume of the gut, similar to the 53% in the gorilla [28], indicating that, like gorillas, they also derive a similarly high portion of their energy from fiber. An adaptation that prevents humans from efficiently exploiting fiber for energy may point to a shift in dietary emphasis away from plants towards specialization in animal-sourced food [see 29 for criteria for specialization]. Our gut is 40% smaller [30], and one can therefore calculate that our large intestine, where fiber is processed into energy, is 77% smaller by volume than that of a chimpanzee our size [28]. Our small intestine, where macronutrients are absorbed, is 62% larger than that of a chimpanzee our size. Since the chimpanzee is able to absorb a large amount of sugar with a shorter small intestine, the 66% extension could represent an adaptation to consuming more fat and protein in humans. Since the mastication system prepares the food for the gut, a reduced mastication system already at 1.7 million years ago (Mya) in H. erectus suggests that the gut size of H. erectus was already reduced [31]. We can thus propose that H. erectus specialized in non-plant food items. The omnivorous pig is sometimes mentioned as a good model for human nutrition [32]; however, the volume of the pig's large intestine is higher than the volume of its small intestine [32], the reverse of the ratio in humans [28], pointing to the adaptation of pigs to highly fibrous food. The changed gut composition meets the criteria for specialization proposed by Wood and Strait [29].
They propose that adaptation towards specialization is marked by a change that enables the acquisition of one resource while interfering with the acquisition of another resource. In our case, the gut morphology adaptations both improved animal food exploitation and at the same time hindered the full exploitation of fibrous plant foods.
Endurance running
Bramble and Lieberman [33] list 22 specific adaptations to endurance running and claim they represent an adaptation to 'persistence hunting'. There is some disagreement as to the significance of the 'persistence hunting' technique [34], but as it represents an adaptation to better mobility, it may also indicate adaptation to operating in a larger home range. Carnivores with a large proportion of flesh in their diets, such as canids and felids, have particularly large home ranges, whereas omnivorous carnivores like Ursidae have narrower home ranges [35].
Adaptation to spear throwing
Roach et al. [36] claim that the structure of our shoulder represents an adaptation to carnivory. They describe how our shoulder is perfectly adapted to throwing, which must be useful, in their opinion, mainly in hunting and protection from predators. They show that, in contrast, the chimpanzee's shoulder is adapted to climbing trees.
This may serve as further evidence for specialization in carnivory: as with the smaller gut, the improved ability to obtain animal food comes at the expense of a reduced ability to obtain plant-sourced food, fruits in this case.
High fat reserves
Humans have much higher fat reserves than chimps, our closest relatives [37]. Carrying a large amount of fat costs energy and reduces the speed of chasing or fleeing [38]. Most carnivores and fleeing herbivores do not pack much fat as, unlike humans, they rely on speed for predation or evasion. Recent HG were found to have enough fat reserves to fast for three weeks for men and six weeks for women [39]. This ability may represent an adaptation that is unique to carnivory of large animals by a predator who does not rely on speed. The large fat reserves may have allowed humans to bridge longer periods between less frequent hunts of larger animals, given their relatively lower abundance.
The AMY1 gene - Incomplete adaptation to metabolize starch?
Humans have a varying number of AMY1 gene copies (2-12 copies [40]), which synthesize salivary amylase, whereas chimpanzees have only two copies. The higher copy number may represent different degrees of adaptation to consuming starch [40], although the results of actual associations between health markers and the number of copies are equivocal [41-47]. Herbivores and carnivores do not seem to have salivary amylase (although the data are limited), whereas omnivores usually produce high quantities of the enzyme [48]. This variance in the number of copies in humans can in itself be (but doesn't have to be) a testimony that the adaptation is relatively recent and has not been fixed yet. However, until a better grasp is obtained of the timing of the change in copy number, little can be said about its significance to the question of DPA in humans.
Recent genetic adaptation to tuber consumption
Tubers, which are available year-round and are as energy dense as wild fruits, are mentioned as a good candidate for a Paleolithic plant-based diet [49]. Populations that presently depend on tubers are enriched in genes that are associated with starch metabolism, folic acid synthesis, and glycoside neutralization, but other populations are not [50]. These adaptations presumably compensate for these tubers' poor folic acid content and relatively high content of glycosides. The very limited geographic distribution of these genes [50] may mean that their presence in humans is quite recent, so that tubers did not form an important part of the human Paleolithic diet.
The earliest evidence for caries - 15,000 years ago
High consumption of starch and sugars is associated with the development of oral caries (cavities) [51]. Frequencies of carious lesions in archaeological populations range from 2.2–48.1% of teeth for agricultural populations but only 0–14.3% for hunter-gatherers [52]. A high prevalence of caries first appeared some 15.0 Kya at a site in Morocco, together with evidence for the exploitation of starchy foods [53]. This may mean that high consumption of carbohydrates (plants) is a relatively recent, end-of-Pleistocene phenomenon. It should be pointed out that in some more recent traditional societies, high starch consumption was not associated with a high prevalence of caries [54].
Paleolithic dietary reconstruction based on human physiology - conclusion
Although physiology is only one of the sources for Paleolithic dietary reconstruction, looking into the information that is stored in our body provides interesting and sometimes new evidence that we underwent substantial adaptation towards carnivory and that it started quite early in our evolution as the genus Homo. It also supports the notion that we remain adapted to carnivory despite over 10,000 years of agricultural subsistence. Consequently, it seems, in reply to the question at the heart of this paper, that we are adapted to consume high quantities of protein. How high? The answer lies in reconstructing our behavior during prehistory regarding fat [24, 55].
What was the protein consumption level during human evolution?
The question of the desirable level of dietary protein consumption comes up in the literature and among professionals and laypeople who are interested in nutrition. This section tries to answer that question by discerning the Paleolithic level of consumption, assuming that it is a safe level, following the evolutionary mismatch theory of chronic disease [56]. Protein processing for energy in humans is estimated to be physiologically limited to 35-45% of daily calories [57, 58]. If humans were at the protein limit during the Paleolithic era, the remaining 55-65% of the calories should have come from either fat or carbohydrates, namely plants.
There is ample ethnographic evidence for human dependence on and preference for animal fat as a food source. Kelly [59] writes in his authoritative book on HG: “...although ethnographic accounts abound with references to the importance of meat they equally convey the importance of fat...”. He adds: “It, therefore, may be fat rather than protein that drives the desire for meat in many foraging societies”.
Lee [16] writes about the !Kung of the Kalahari: “Fat animals are keenly desired, and all !Kung express a constant craving for animal fat”. The essentiality of fat is best demonstrated in Tindale's account of the Pitjandjara of Australia [60]. He writes: “When killing the animal they immediately feel the body for evidence of the presence of caul fat. If the animal is 'njuka', fatless, it is usually left unless they are themselves starving”. Coote and Shelton [61] report a similar behavior among the Yolngu of Arnhem Land, Australia, saying that “Animals without fat may indeed be rejected as food”. The importance of fat is also evident in its use as a symbol of fertility, sacredness, wealth, health, and even life itself in the rituals, linguistics, and mythology of recent traditional societies [55].
The archaeological record similarly shows that many of humans' particular food acquisition and exploitation behaviors can be interpreted as stemming from the need to obtain fat. Behaviors like the hunting of fatter animals, or the processing of fat from body parts at greater energetic expenditure than would otherwise have been needed, indicate a concentration on fat as the primary criterion in prey selection and butchering. The preference for hunting larger animals and prime adult animals within prey species [24, 62, 63], the preference for bringing fatty parts to a central place, and the extraction of bone grease [64], at great energetic cost, all point to a strategy of fat maximization. This energetically expensive set of behaviors also supports the conclusion that plants could not provide a sufficient contribution to complement protein at the limit of its consumption. The behavior is difficult to explain unless we assume that humans were at the limit of their protein consumption. Therefore, the implication of this reconstruction for protein consumption is that throughout our evolution as humans we obtained a high portion of our calories from protein.
Although no clear official statement of the upper limit on the consumption of protein has ever been published, there are reports of consumption of over 40% of daily calories, or about 4 grams per kg of body weight per day (g/kg/d), by circumpolar groups [65]. Rudman et al. [66] found the limit on urea removal to be 3.8 g/kg/d of protein, to which the demand for structural protein, at a minimum of 0.8 g/kg/d, should be added [57], for a total of 4.6 g/kg/d. The present level of protein intake in the U.S. is some 15.7% [67] of daily calories. Based on a consumption of 2000 calories for a 60 kg person, the current consumption is 314 calories, whereas the Paleolithic level of consumption, according to this analysis, was in the vicinity of 800 calories (40% of 2000) and possibly even higher, at 1100 calories (4.6 g/kg/d × 60 kg × 4 cal/g).
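The arithmetic in the preceding paragraph can be checked with a few lines of Python. This is a minimal sketch that only reproduces the figures quoted in the text (a 2000-calorie diet, a 60 kg person, 4 calories per gram of protein); the constant and function names are illustrative assumptions, not part of any cited method.

```python
# Minimal sketch reproducing the protein arithmetic quoted in the text.
# All inputs come from the paragraph above; all names are hypothetical.

DAILY_CALORIES = 2000      # reference daily energy intake (calories)
BODY_WEIGHT_KG = 60        # reference body weight (kg)
CALORIES_PER_GRAM = 4      # energy content of protein (cal/g)

UREA_LIMIT_G_PER_KG = 3.8  # limit on urea removal, g/kg/d [66]
STRUCTURAL_G_PER_KG = 0.8  # minimum structural protein demand, g/kg/d [57]


def calories_from_share(share: float) -> float:
    """Protein calories for a given share of the daily energy intake."""
    return share * DAILY_CALORIES


def calories_from_grams_per_kg(grams_per_kg: float) -> float:
    """Protein calories for an intake expressed in g per kg body weight per day."""
    return grams_per_kg * BODY_WEIGHT_KG * CALORIES_PER_GRAM


current_us = calories_from_share(0.157)       # present U.S. intake [67]
paleolithic = calories_from_share(0.40)       # 40% of daily calories
upper_bound = calories_from_grams_per_kg(
    UREA_LIMIT_G_PER_KG + STRUCTURAL_G_PER_KG  # 4.6 g/kg/d in total
)

if __name__ == "__main__":
    print(f"Current U.S. intake: {current_us:.0f} calories")
    print(f"Paleolithic estimate: {paleolithic:.0f} calories")
    print(f"Upper bound: {upper_bound:.0f} calories")
```

The upper figure works out to 1104 calories, which the text rounds to 1100.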
Conclusion
As mentioned, this paper is just a part of a wider review, in preparation, of scientific evidence for the human evolutionary diet. Although we are undoubtedly omnivores, the biologic evidence presented here suggests that we evolved, quite early in our evolution as the genus Homo, to become highly carnivorous and that we continue to retain a biologic adaptation to carnivory. This high level of carnivory means that during a large part of our evolution our diet was high in protein besides being high in fat. If we look at the Paleo nutrition template as a safety template, this paper concludes that it seems to be safe to consume a high portion of the diet from animal protein, possibly to the tune of 30-40% of the daily calories. Since every calorie of protein that we do not consume is a calorie that will be consumed from another food source, the Paleo template guides us to consider the relative safety of alternatives to protein when deciding on the actual level of protein consumption. Not many alternative foods can claim to have nearly two million years of safe consumption.
References:
1. Domínguez-Rodrigo M, Pickering TR. The meat of the matter: an evolutionary perspective on human carnivory. Azania: Archaeological Research in Africa. 2017;52(1):4-32.
2. Cordain L, Miller JB, Eaton SB, Mann N, Holt SHA, Speth JD. Plant-animal subsistence ratios and macronutrient energy estimations in worldwide hunter-gatherer diets. The American Journal of Clinical Nutrition. 2000;71(3):682-92.
3. Marlowe FW. Hunter‐gatherers and human evolution. Evolutionary Anthropology: Issues, News, and Reviews. 2005;14(2):54-67.
4. Lee RB. What hunters do for a living, or, how to make out on scarce resources. Man the Hunter. Chicago: Aldine Publishing Company; 1968.
5. Eaton SB, Konner M. Paleolithic nutrition - a consideration of its nature and current implications. New Engl J Med. 1985;312(5):283-9. doi: 10.1056/nejm198501313120505. PubMed PMID: WOS:A1985AAQ2000005.
6. Ströhle A, Hahn A. Diets of modern hunter-gatherers vary substantially in their carbohydrate content depending on ecoenvironments: results from an ethnographic analysis. Nutr Res. 2011;31(6):429-35.
7. Konner M, Eaton SB. Paleolithic nutrition twenty-five years later. Nutr Clin Pract. 2010;25(6):594-602.
8. Kuipers RS, Joordens JC, Muskiet FA. A multidisciplinary reconstruction of Palaeolithic nutrition that holds promise for the prevention and treatment of diseases of civilisation. Nutr Res Rev. 2012;25(01):96-129.
9. Ascher R. Analogy in archaeological interpretation. Southwestern Journal of Anthropology. 1961;17(4):317-25.
10. Smith FA, Smith REE, Lyons SK, Payne JL. Body size downgrading of mammals over the late Quaternary. Science. 2018;360(6386):310-3.
11. Bibi F, Pante M, Souron A, Stewart K, Varela S, Werdelin L, et al. Paleoecology of the Serengeti during the Oldowan-Acheulean transition at Olduvai Gorge, Tanzania: The mammal and fish evidence. J Hum Evol. 2017.
12. Werdelin L, Lewis ME. Temporal change in functional richness and evenness in the eastern African Plio-Pleistocene carnivoran guild. PLoS ONE. 2013;8(3):e57944.
13. Koch PL, Barnosky AD. Late Quaternary Extinctions: State of the Debate. Annu Rev Ecol, Evol Syst. 2006;37:215-52. doi: 10.1146/annurev.ecolsys.34.011802.132415.
14. Faith JT. Late Pleistocene and Holocene mammal extinctions on continental Africa. Earth-Sci Rev. 2014;128:105-21.
15. Marlowe F. The Hadza: Hunter-gatherers of Tanzania: University of California Press; 2010. 325 p.
16. Lee RB. The Kung San: men, women, and work in a foraging society. Cambridge: Cambridge University Press; 1979. 526 p.
17. Barnes R. The decline of the baobab tree in Ruaha National Park, Tanzania. Afr J Ecol. 1980;18(4):243-52.
18. Lee RB. Mongongo: the ethnography of a major wild food resource. Ecol Food Nutr. 1973;2(4):307-21.
19. Psouni E, Janke A, Garwicz M. Impact of carnivory on human development and evolution revealed by a new unifying model of weaning in mammals. PLoS ONE. 2012;7(4):e32452.
20. Pond CM, Mattacks CA. Body mass and natural diet as determinants of the number and volume of adipocytes in eutherian mammals. J Morphol. 1985;185(2):183-93.
21. Beasley DE, Koltz AM, Lambert JE, Fierer N, Dunn RR. The evolution of stomach acidity and its relevance to the human microbiome. PLoS ONE. 2015;10(7):e0134116.
22. Isaac GL. The Harvey Lecture series, 1977-1978. Food sharing and human evolution: archaeological evidence from the Plio-Pleistocene of east Africa. J Anthrop Res. 1978;34(3):311-25.
23. Yravedra J, Rubio-Jara S, Panera J, Martos JA. Hominins and Proboscideans in the Lower and Middle Palaeolithic in the central Iberian Peninsula. Quat Int. 2017.
24. Ben-Dor M, Gopher A, Hershkovitz I, Barkai R. Man the fat hunter: the demise of Homo erectus and the emergence of a new hominin lineage in the Middle Pleistocene (ca. 400 kyr) Levant. PLoS ONE. 2011;6(12):e28689. doi: 10.1371/journal.pone.0028689.
25. McNeil N. The contribution of the large intestine to energy supplies in man. The American Journal of Clinical Nutrition. 1984;39(2):338-42.
26. Popovich DG, Jenkins DJ, Kendall CW, Dierenfeld ES, Carroll RW, Tariq N, et al. The western lowland gorilla diet has implications for the health of humans and other hominoids. The Journal of Nutrition. 1997;127(10):2000-5.
27. Wrangham RW, Conklin-Brittain NL, Hunt KD. Dietary response of chimpanzees and cercopithecines to seasonal variation in fruit abundance. I. Antifeedants. Int J Primatol. 1998;19(6):949-70.
28. Milton K. Primate diets and gut morphology: implications for hominid evolution. In: Harris M, Ross E, editors. Food and evolution: toward a theory of human food habits. Philadelphia: Temple University Press; 1987. p. 93-115.
29. Wood B, Strait D. Patterns of resource use in early Homo and Paranthropus. J Hum Evol. 2004;46(2):119-62.
30. Aiello LC, Wheeler P. The expensive-tissue hypothesis: the brain and the digestive system in human and primate evolution. Curr Anthr. 1995;36(2):199-221.
31. Lucas PW, Sui Z, Ang KY, Tan HTW, King SH, Sadler B, et al. Meals versus snacks and the human dentition and diet during the Paleolithic. The Evolution of Hominin Diets: Springer; 2009. p. 31-41.
32. Miller E, Ullrey D. The pig as a model for human nutrition. Annu Rev Nutr. 1987;7(1):361-82.
33. Bramble DM, Lieberman DE. Endurance running and the evolution of Homo. Nature. 2004;432(7015):345-52.
34. Pickering TR, Bunn HT. The endurance running hypothesis and hunting and scavenging in savanna-woodlands. J Hum Evol. 2007;53(4):434-8.
35. Gittleman JL, Harvey PH. Carnivore home-range size, metabolic needs and ecology. Behav Ecol Sociobiol. 1982;10(1):57-63.
36. Roach NT, Venkadesan M, Rainbow MJ, Lieberman DE. Elastic energy storage in the shoulder and the evolution of high-speed throwing in Homo. Nature. 2013;498(7455):483-6.
37. Zihlman AL, Bolter DR. Body composition in Pan paniscus compared with Homo sapiens has implications for changes during human evolution. Proceedings of the National Academy of Sciences. 2015:201505071.
38. Pond CM. Morphological aspects and the ecological and mechanical consequences of fat deposition in wild vertebrates. Annu Rev Ecol Syst. 1978;9(1):519-70.
39. Pontzer H, Raichlen DA, Wood BM, Emery Thompson M, Racette SB, Mabulla AZ, et al. Energy expenditure and activity among Hadza hunter‐gatherers. Amer J Hum Biol. 2015;27(5):628-37.
40. Perry G, Dominy N, Claw K, Lee A. Diet and the evolution of human amylase gene copy number variation. Nature. 2007;39(10):1256.
41. Falchi M, Moustafa JSE-S, Takousis P, Pesce F, Bonnefond A, Andersson-Assarsson JC, et al. Low copy number of the salivary amylase gene predisposes to obesity. Nat Genet. 2014;46(5):492-7.
42. Des Gachons CP, Breslin PA. Salivary amylase: digestion and metabolic syndrome. Curr Diab Rep. 2016;16(10):102.
43. Fernández CI, Wiley AS. Rethinking the starch digestion hypothesis for AMY1 copy number variation in humans. Amer J Phys Anthrop. 2017;163(4):645-57.
44. Atkinson FS, Hancock D, Petocz P, Brand-Miller JC. The physiologic and phenotypic significance of variation in human amylase gene copy number. The American Journal of Clinical Nutrition. 2018;108(4):737-48.