The future of sperm: a biovariability framework for understanding global sperm count trends. Marion Boulicault et al. Human Fertility, May 10 2021. https://doi.org/10.1080/14647273.2021.1917778
Abstract: The past 50 years have seen heated debate in the reproductive sciences about global trends in human sperm count. In 2017, Levine and colleagues published the largest and most methodologically rigorous meta-regression analysis to date and reported that average total sperm concentration among men from ‘Western’ countries has decreased by 59.3% since 1973, with no sign of halting. These results reverberated in the scientific community and in public discussions about men and masculinity in the modern world, in part because of scientists’ public-facing claims about the societal implications of the decline of male fertility. We find that existing research follows a set of implicit and explicit assumptions about how to measure and interpret sperm counts, which collectively form what we term the Sperm Count Decline hypothesis (SCD). Using the study by Levine and colleagues, we identify weaknesses and inconsistencies in the SCD, and propose an alternative framework to guide research on sperm count trends: the Sperm Count Biovariability hypothesis (SCB). SCB asserts that sperm count varies within a wide range, much of which can be considered non-pathological and species-typical. Knowledge about the relationship between individual and population sperm count and life-historical and ecological factors is critical to interpreting trends in average sperm counts and their relationships to health and fertility.
Keywords: Sperm count decline; male reproductive health; male fertility; semen analysis; andrology; human reproduction
Contrasting the sperm count decline and sperm count biovariability hypotheses
In this paper we contrast the Sperm Count Decline and Sperm Count Biovariability hypotheses. We understand a hypothesis as not only a set of propositions open to empirical testing, but also as a set of implicit and explicit model-theoretic assumptions about the world that provides a framework for collecting and interpreting new and existing data and setting research agendas.
The SCD hypothesis interprets data on sperm count over time as a metric of men’s potential fertility, a proxy for men’s health, and an assay of environmental quality. According to SCD, a decline in ‘Western’ sperm counts from 1970s levels indicates a decline in male fertility and health and is a sign of a degrading environment. By contrast, the SCB hypothesis allows for the possibility of both pathological and non-pathological variation in sperm counts across populations and time. SCB begins with the premise that, above the threshold necessary for fertility, there is no basis to assume that high average population sperm counts are optimal. Nor is there any reason to believe that sperm counts in the 1970s are a species-typical baseline. SCB posits that sperm count varies within and across bodies in ways that are compatible with health such that a decline in an individual or population may not necessarily signal danger to fertility or well-being. We emphasise that while SCB invites a wider explanation and interpretation of sperm count trends, it does not exclude the possibility of sperm count decline or that decline may carry implications for men’s health and fertility. The SCB hypothesis provides a framework for exploring the trends identified by Levine et al. (2017) that considers the possibility that these trends can be explained by benign or adaptive variation in sperm counts in relation to diverse contexts and factors. Rather than treat nations or regions of global wealth as proxies for stable populations or biologically meaningful environments, SCB calls for testing links between specific developmental and proximate stimuli and sperm count outcomes, recognising human biological variation as local and situated (Lock, 2017; Niewohner & Lock, 2018).
From an SCB perspective, the data points that make up the 2017 meta-analysis simply demonstrate that sperm count varies across bodies, ecologies, and time periods. Examining the same data and background literature with a different set of assumptions, SCB argues that the interpretation that population sperm counts vary within a wide optimum with little consequence for fertility is at least as plausible as the interpretation that steady decline occurs.
We argue in favour of the SCB as a framework for interpreting population trends in human sperm counts. It identifies testable hypotheses that include both pathological and non-pathological explanations for and outcomes of observed variation in sperm counts. Table 2 contrasts the propositions of the SCD and SCB hypotheses. In the following sections, we analyse each proposition pair in turn.
1. Sperm count and men’s fertility
The SCD hypothesis contends that lower average population sperm counts portend higher rates of male infertility, positioning sperm count decline as a marker or cause of reproductive crisis for the human species. Levine et al. (2017), for example, infer that ‘declining mean [sperm count] implies that an increasing proportion of men have sperm counts below any given threshold for sub-fertility or infertility’. Levine et al. (2017) link this to claims of increasing ‘economic and societal burden of male infertility’ (p. 649).
There is little evidence that this is true. Levine et al. (2017) contend that the high circa 1973 numbers represent normal, healthy, and natural levels, while today's numbers represent a crisis and decline from a prior optimum. But current Western average sperm counts reported by Levine et al. (2017) for men unselected for fertility are well within the ‘normal’ range, defined by the World Health Organisation (WHO) as 15–259 million per mL for individuals (World Health Organization, 2010, p. 224). That is, the Levine et al. (2017) study reports a population average decline from ‘normal’ (99 million sperm per mL) to ‘normal’ (47 million sperm per mL). Furthermore, in absolute numbers, the 2009–2011 Unselected Western sperm counts (47.1 million/mL), which are ostensibly cause for alarm, are in fact relatively close to absolute sperm counts in ‘Other’ countries back in the 1978–1983 period (66.4 million/mL for Fertile Other, 72.7 million/mL for Unselected Other) and in the 2010–2011 period (75.7 million/mL for Fertile Other, 62.6 million/mL for Unselected Other).
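The arithmetic behind this comparison can be checked with a minimal sketch. It uses only the figures quoted in the text (the WHO range of 15–259 million/mL and the Unselected Western averages of 99 and 47.1 million/mL); the function and variable names are illustrative assumptions. Note that the WHO range is defined for individuals, so applying it to population averages follows the comparison made in the paragraph above and is not a clinical rule.

```python
# Figures quoted in the text (million sperm per mL).
WHO_REFERENCE_RANGE = (15.0, 259.0)   # WHO (2010) range as cited above
MEAN_1973 = 99.0                      # Unselected Western, start of study period
MEAN_2011 = 47.1                      # Unselected Western, end of study period


def within_reference_range(value, bounds=WHO_REFERENCE_RANGE):
    """Return True if value lies inside the inclusive reference range."""
    low, high = bounds
    return low <= value <= high


def percent_decline(start, end):
    """Relative decline from start to end, as a percentage of start."""
    return (start - end) / start * 100.0


both_normal = within_reference_range(MEAN_1973) and within_reference_range(MEAN_2011)
decline = percent_decline(MEAN_1973, MEAN_2011)

print(f"Both averages inside WHO range: {both_normal}")
print(f"Relative decline: {decline:.1f}%")
```

The sketch separates two questions that are easily conflated: how large the relative decline is, and whether either endpoint falls outside the range the WHO treats as normal.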
Male infertility is a complex biological and social phenomenon that cannot be understood in terms of the single metric of sperm count (Guzick et al., 2001). Though azoospermia (sperm count of zero) guarantees infertility, researchers have found that some men with low sperm counts can conceive, while others with higher counts cannot (Patel et al., 2018; Wang & Swerdloff, 2014). Guzick et al. (2001) demonstrate that even sperm concentrations in the so-called sub-fertile range of less than 13.5 million/mL ‘do not exclude the possibility of normal fertility’ (p. 1392). Of note, the 2010 WHO reference values for semen parameters do not predict infertility, as the values were determined by studying fertile men; therefore, while the top 95% of sperm concentrations in the sample were taken to be the reference range, all of the men with sperm concentrations below the 5th centile were also fertile (Chiles & Schlegel, 2015; Cooper et al., 2010). Other studies from across the world have similarly confirmed the fertility of men below the WHO reference values (Haugen et al., 2006; Tang et al., 2015; Zedan et al., 2018).
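The point about the reference values can be illustrated with a small sketch. When a cut-off is set at the 5th centile of a sample made up entirely of fertile men, about 5% of those fertile men fall below it by construction. The sample below is hypothetical: the log-normal shape, its parameters, and the sample size are assumptions chosen only for illustration, not WHO data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL sample: 1,000 fertile men with right-skewed concentrations
# (million/mL). The log-normal parameters are illustrative assumptions only.
fertile_sample = [random.lognormvariate(4.2, 0.8) for _ in range(1000)]

# statistics.quantiles with n=20 returns 19 cut points; the first is the 5th centile.
fifth_centile = statistics.quantiles(fertile_sample, n=20)[0]

# Every man in the sample is fertile, including those below the cut-off.
below = [c for c in fertile_sample if c < fifth_centile]
share_below = len(below) / len(fertile_sample)

print(f"Share of fertile men below the cut-off: {share_below:.1%}")
```

Whatever distribution is assumed, the share below the cut-off stays near 5%, which is why a value under the reference limit cannot by itself identify an infertile man.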
Clinicians do not report proportionate increases in infertile men presenting for clinical consultation over Levine et al.’s study period (Inhorn & Patrizio, 2015). As urologist Peter Schlegel remarked in The New York Times in reference to the Levine et al. (2017) meta-analysis, ‘If you had a decrease in sperm count in the 50 to 60 percent range, we would expect the proportion of men with severe male infertility to be going up astronomically. And we don’t see that’ (Bowles, 2018). There is insufficient evidence to support claims of increasing rates of male subfertility in recent decades (Inhorn & Patrizio, 2015).
We note that there exists no species optimum in many other measures of reproductive function in men and women. As a concrete example, the gonadal steroid hormones testosterone, oestradiol, and progesterone are necessary to fertility (Dohle et al., 2003; Laufer et al., 1982; Welt et al., 2003). Researchers have documented significant variation in these hormones within and across populations and within individuals over time (Bjørnerem et al., 2006; Stanton et al., 2011; Vitzthum et al., 2004). As the WHO does with sperm count, researchers validate and publish non-pathological ranges for gonadal steroid hormones for use in clinical evaluation (Bhasin et al., 2011; Elmlinger et al., 2002). Within those ranges, higher levels are not considered absolute signals of better fertility or health (Bribiescas, 2016).
2. Sperm count as an assay of men’s health
SCD interprets sperm count decline as a biomarker of declining overall health status among men. Citing studies associating reduced sperm count and ‘increased all-cause mortality and morbidity’ (Levine et al., 2017, pp. 647, 649, 654), Levine et al. (2017) hypothesise that average population declines in sperm counts represent ‘a ‘canary in the coal mine’ for male health across the lifespan’ (p. 654; see also Skakkebaek et al., 2001). This metaphor suggests that low sperm counts are not only a barometer of men’s current health, but also a warning sign of future risks.
While there is evidence of a relationship between abnormal semen parameters and poor health status (Eisenberg et al., 2014), there is little evidence that average sperm count by itself is a valid summary measure of health status of men within a population. Recent work in a population in Córdoba, Argentina, suggests that, while semen parameters decline with age, lifestyle and health factors such as obesity, alcohol, and smoking have only modest associations with decline (Veron et al., 2018). Similar findings exist with respect to other semen parameters: a 1999 study of 939 UK men found no relation between sperm motility and common lifestyle factors such as consumption of alcohol, use of tobacco or recreational drugs, or high body mass index (Povey et al., 2012).
Specific relationships between sperm parameters and developmental and current conditions, including health status, remain to be established. Sperm variability can reflect endogenous and exogenous stimuli on both short and longer time scales. Spermatogenesis is a 42–76 day process (Misell et al., 2006). Interventions can occur at any point from the first division of the spermatogonia to the mature sperm’s journey through the epididymis (Chenoweth & Lorton, 2014). Research on livestock indicates that, depending on the developmental stage of their influence, effects can be permanent or may resolve. For example, enhanced nutrition in early life increased adult sperm production in bulls, but later-life nutrition could not compensate for early-life nutritional deficits (Kastelic, 2013). Seasonal climate variation, however, had only a transient effect on sperm parameters in bulls (Valeanu et al., 2015). Further research is needed to establish whether the same range of developmental and transient effects can be found in humans.
Prospective study models that use repeat individual measures in combination with a wide variety of social and biological measures are needed to identify potential confounders and causal variables in sperm biovariation. Such variables include transient exposures such as heat or tight clothing; the stimulus conditions under which the sample was collected, including available arousal material and duration of arousal pre-ejaculation; lifestyle factors including activity and diet; and developmental or environmental exposures like maternal smoking, pollutants, and endocrine disruptors (for examples of existing cross-sectional studies along these lines, see Gaskins et al., 2015; Inhorn et al., 2008; Priskorn et al., 2018). Without longitudinal individual and population data with sufficient ecological granularity, causal claims about the relationship between average population sperm counts and environments or lifestyles cannot be empirically substantiated.
In any case, the connection between sperm count and health is mediated by the individual’s recent experience and prior life history. For example, increased exercise does not have a stable relationship to sperm production, in part because the effect of exercise is mediated by current fitness level (Ibañez-Perez et al., 2019; Jóźków & Rossato, 2017; Rosety et al., 2017). Factors such as seasonal temperature and illness do not have uniform effects on the sperm production process for similar reasons. Given the range of relationships between stimulus and effect in spermatogenesis, sperm count is not an independent metric of human well-being.
3. Sperm count and environmental pollutants
In line with the TDS hypothesis (Bay et al., 2006; Skakkebaek, 2016), SCD asserts that the likely causes for sperm count declines among ‘Western’ populations are endocrine disruptors and other environmental pollutants introduced by industrialisation, as well as changes in men’s lifestyles. Levine et al. (2017) write that, ‘sperm count and other semen parameters have been plausibly associated with multiple environmental influences, including endocrine disrupting chemicals, pesticides, heat and lifestyle factors, including diet, stress, smoking and BMI’ (Levine et al., 2017, p. 649). In a Guardian article titled, ‘Sperm counts are on the decline - could plastics be to blame?,’ Levine identifies endocrine-disrupting chemicals (EDCs) such as plastics as a major cause of dropping sperm counts (Carr, 2019).
While environmental context undoubtedly affects men’s health, empirical research to date does not support a stable causal relationship between EDCs – exogenous chemicals that interfere with hormone action, typically through mimicking endogenous hormones and binding to protein receptors – and any indices of sperm health, including sperm count, sperm motility, and fertility (Bonde et al., 2016; Zamkowska et al., 2018). Scientists have approached questions about the impact of EDCs on reproductive function through animal models and human studies. In animal studies, male rodents are exposed to specific quantities of EDCs in a controlled environment, and systematically examined for effects on their reproductive health. In contrast, human clinical and epidemiological studies are primarily observational, studying the sperm of human males in the general population who were accidentally exposed to unspecified levels of EDCs.
The strongest evidence for the impact of EDCs on human populations lies in their action as somatic carcinogens (Soto & Sonnenschein, 2010). Although reproductive cancers could plausibly lower sperm count, this pathway cannot explain the patterns reported by Levine et al. (2017) as they exclude cancer patients from their study. EDC exposure is also associated with risk for a wide range of health conditions outside of its effects on reproductive health, including non-reproductive cancers, diabetes, thyroid disorders, and neurological conditions (Gore et al., 2015). Some scientific research suggests that EDCs can have reproductive, neurological, and immunological effects on developing human foetuses (of both males and females), but more research is required to establish the exact relationship (Abaci et al., 2009; Bonde et al., 2016).
Even if EDCs cause a decline in sperm count, higher levels of industrial pollutant exposure in the West cannot explain the divergent trends in Levine et al.’s categories of ‘West’ versus ‘Other.’ Scientists have used the global distribution of plastics as a geological indicator of the extent of human altered landscapes (Zalasiewicz et al., 2016). It is widely established that the inequities of global capitalism disproportionately burden the global poor and indigenous peoples with the consequences of toxic pollution (Martinez-Alier et al., 2016). Substantial evidence suggests that pesticide poisoning is as great a problem in low- and middle-income countries as in high-income countries, or a greater one (Jørs et al., 2018); the World Health Organization (2016) reports that 98 percent of people in urban low- and middle-income countries are exposed to unhealthy levels of toxic pollution.
The study period of 1973 to 2011 included in Levine et al. (2017) accompanied increasing global levels of industrial pollution (He et al., 2002; Karan & Bladen, 1976; Ramakrishnan, 2018). As a detailed example, consider two studies from Hyderabad, Andhra Pradesh, both included in Levine et al. (2017): one from a lead-acid battery manufacturing facility in Patancheru District and another at an anonymous lead welding facility (Danadevi et al., 2003; Vani et al., 2012). The 1970s initiated a rapid intensification of environmental pollution, in the form of waste disposal, air pollution, wastewater effluents, and occupational exposures in this region (Danadevi et al., 2003; Vani et al., 2012). Half of India’s rivers today are polluted by industrial wastewater effluents introduced in the 1970s; in Hyderabad, water sources were contaminated with industrial metals as early as 1983 (Karan & Bladen, 1976; Prahalad & Seenayya, 1988). Vegetables grown and consumed in urban Hyderabadi areas are packed with lead, cadmium and chromium, and bodies of water sampled are saturated with EDCs (Kiran Kumar et al., 2016; Ramakrishnan, 2018; Srikanth & Papi Reddy, 1991). In summary, evidence does not support the claim that increased exposure to environmental pollutants in the nations categorised by Levine et al. (2017) as ‘Western’ could be a plausible driver of distinctions between average population sperm count in ‘Western’ compared to ‘Other’ nations.
4. ‘West,’ ‘Other,’ and nations as units of population
Levine et al. (2017) extracted 244 estimates of average sperm counts from human sperm samples collected over the period 1973–2011 and reported in English-language publications, representing 42,935 individual men across the globe. They report that the average sperm concentration across Unselected ‘Western’ populations was 99 million/mL in 1973, declining to 47.1 million/mL in 2011, or 52.4% overall (Table 3). Sperm declines were only statistically significant in studies in ‘Western’ countries (Levine et al., 2017, p. 654), while ‘[n]o significant trends in SC [sperm concentration] or TSC [total sperm count] were seen in ‘Other’ countries overall, or for Unselected or Fertile men separately’ (Levine et al., 2017, p. 652).
Table 3. Sperm concentration in the first and last years of the Levine et al. (2017) meta-regression analysis, for all men and by fertility and geographic groups ‘Western’ and ‘Other.’
The study design of Levine et al. (2017) separated men along axes of fertility and geography. First, they categorised men as ‘Unselected,’ meaning that it was not known whether or not they had conceived a pregnancy, or as ‘Fertile,’ meaning that they had conceived a pregnancy. They next disaggregated men by the nation of the study in which they participated: Europe/Australia; North America; and ‘Other,’ which included South America, Asia, and Africa.
Constituting ‘Western’ and ‘Other’
Notably, the model used by Levine et al. (2017) generated statistically significant declines in sperm concentration over time for both Unselected and Fertile Europe/Australia cohorts, and for the Unselected North America cohort, but not for the Fertile North America cohort (p = 0.29) or either of the Other cohorts (p = 0.30 for Unselected Other, 0.41 for Fertile Other) (Levine et al., 2017, Table S3). In the final published study, Levine et al. (2017) aggregated North American and Europe/Australia data to create a ‘Western’ cohort. Their justification was the similarity of effect magnitude in the data (despite one subgroup – Fertile North America – being statistically insignificant) and that North American data comprised only 16% of the estimates. In the final model, both Unselected and Fertile Western had statistically significant negative effects. In other words, sperm count declines in North America among Fertile men, which were not previously significant (p = 0.29), gained manufactured significance (p = 0.033) by being weighted with the European/Australian data in the final model.
It is justifiable to explore multiple aggregations of data along hypothesis-driven inquiries. However, the reframing of a statistically insignificant decline in sperm concentration among Fertile North American men implies a level of certainty that the data do not support. When this certainty is adopted by public-facing reporting, it not only contributes to unfounded panic over ‘Western’ fertility, but also may influence the course of future research programs.
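The aggregation effect described above can be sketched with a toy calculation. This is not the mixed-effects meta-regression used by Levine et al. (2017); it swaps in simple inverse-variance (fixed-effect) pooling with a normal approximation, and every number in it is hypothetical. It shows only the general mechanism: an imprecise subgroup estimate that is not significant on its own is absorbed into a pooled estimate dominated by a larger, more precise subgroup.

```python
import math


def two_sided_p(estimate, std_error):
    """Two-sided p-value for estimate/std_error under a normal approximation."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))


def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of (estimate, std_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# HYPOTHETICAL slopes (million/mL per year) and standard errors;
# these values are NOT taken from Levine et al. (2017).
large_subgroup = (-1.0, 0.30)   # many estimates, precise
small_subgroup = (-0.8, 0.75)   # few estimates, imprecise

p_small_alone = two_sided_p(*small_subgroup)
pooled_slope, pooled_se = pool_fixed_effect([large_subgroup, small_subgroup])
p_pooled = two_sided_p(pooled_slope, pooled_se)

print(f"Small subgroup alone: p = {p_small_alone:.3f}")
print(f"Pooled estimate:      p = {p_pooled:.4f}")
```

Under these assumed inputs the pooled p-value falls below 0.05 although the small subgroup's own p-value does not, so a significant pooled trend says little about whether the trend holds within each subgroup.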
The data included in the meta-analysis are sparse by any measure. Global coverage is mottled and asymmetric. Levine et al. (2017) recognise that data on sperm count in ‘Other’ countries is much sparser than in ‘Western’ countries, as illustrated by Figure 1 and Table 3. Less evident yet still important is the quantitative and qualitative variation in the data points at the level of the nation and the region. For example, the preponderance of sperm count studies over a range of time periods from several major Danish cities included in the meta-analysis might allow researchers to describe general population trends in sperm count within Denmark. However, the same number of studies, or fewer, conducted at disparate times in large and heterogeneous countries such as India or China cannot hope to capture the same granularity of data by averaging sperm counts.
Studies of men unselected by fertility (i.e. men assumed to be representative of the general population in a given geographic area) included in the meta-analysis vary in study design and sample composition across geographic location. For example, control groups in studies of the impact of an environmental exposure on sperm quality were extracted for inclusion in Levine et al. (2017). Studies conducted in this way contribute more samples to some national data pools than others. For example, of 10 separate studies conducted in Denmark, four (40%) were interested in the impact of a specific exposure (e.g. pesticides or maternal folic acid) on sperm quality. By contrast, 13 of the 16 studies in the United States (81%) are exposure studies, looking at the effects of a chemical exposure, smoking, stress, or a medical condition such as cryptorchidism on sperm quality. And 100% of the five studies used for Unselected samples in all of Central and South America were designed to study sperm quality in the context of a specific exposure, whether pesticides, contaminants, or a medical condition. This is important because the controls in these exposure studies are often convenience samples relative to study subjects. For example, a study in Mexico City on rubber factory workers exposed to hydrocarbons used a control group consisting of employees working in the factory’s administrative offices (De Celis et al., 2000), and a study in San Francisco on sperm quality in anaesthesiologists had anaesthesia residents serve as controls (Wyrobek et al., 1981). As Fleiss and Gross (1991) explain, factors other than exposure may affect whether a sample is a case or a control, and these confounding variables can obscure the associations of interest.
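The proportions quoted in this paragraph can be reproduced with a short sketch. The counts are taken directly from the text; the dictionary layout and names are illustrative assumptions.

```python
# (exposure-design studies, total studies) as reported in the paragraph above
study_counts = {
    "Denmark": (4, 10),
    "United States": (13, 16),
    "Central and South America (Unselected)": (5, 5),
}

# Share of each pool that comes from exposure studies, rounded to whole percent.
exposure_share = {
    region: round(100 * exposure / total)
    for region, (exposure, total) in study_counts.items()
}

for region, share in exposure_share.items():
    print(f"{region}: {share}% exposure studies")
```

Laying the counts side by side makes the asymmetry explicit: the share of ‘Unselected’ samples drawn from exposure-study control groups differs widely between national and regional data pools.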
Interpreting average sperm count in a nation
The SCD treats nations and continents as bounded populations, with men unselected by fertility described as ‘more likely to be representative of the general population’ in that nation or continent (Levine et al., 2017, p. 655). That is, continents or nations are conceptualised as population samples that can be used to compare, for example, average 1973 sperm counts to average 2011 sperm counts.
Within this paradigm, the categories ‘West’ and ‘Other’ rely on a particular vision of a static national population that obfuscates the role of several types of migration in continually redefining how these populations are constructed. Since the 1970s, repeated, large-scale, highly varied movements of populations have occurred across national borders, and in particular from nations categorised by Levine et al. (2017) as ‘Other’ to ‘Western’ countries. Yet these movements, from East and South to North and West, from rural to urban, and across many kinds of differently polluted and polluting ecologies, are lost entirely in the racial/national geopolitical categories of difference uncritically embraced by current instantiations of the SCD hypothesis. If biological variables such as sperm count are to be understood as ecologically dependent at the population level, patterns of migration since the 1970s that have fundamentally reshaped nations categorised as ‘West’ and ‘Other’ must be taken into consideration.
During the time period covered by the Levine et al. (2017) meta-analysis, patterns of migration have redistributed formerly concentrated populations into a contingent of increasingly heterogeneous cities and states, predominantly in the Global North and West. In Western Europe, decolonisation as well as the proliferation of guest worker programmes to meet the needs of a broadly booming post-war economy brought individuals from former colonies distributed over three continents, combined with workers from North Africa and Southern Europe, into Northern and Western Europe (Moch, 2003; Van Mol & de Valk, 2016). Sweden, for instance, which historically had higher emigration than immigration, saw a rapid population change after World War II. Immigration rates peaked in 2013. As a result, 24% of the population is now foreign-born, and its ethnic composition has also shifted. Migration from Africa again rose in the 1990s and migration from across Asia and Latin America into Western Europe rose significantly from the turn of the 21st century (Pellegrino, 2004; Van Mol & de Valk, 2016). Meanwhile in North America, the United States has also seen significant demographic shifts precipitated by evolving migration patterns. Due to alterations to US migration law after 1965, most immigrants after 1970 were of Latin American or Asian origin, whereas previously they had been predominantly of European origin. Simultaneously, total immigration in the US increased dramatically – from approximately 5% to nearly 15% of the total population (Martin, 2014).
Within the nations broadly categorised by Levine et al. (2017) as ‘Other,’ which comprise the large majority of the world by aggregate population, migration has also played a formative role in shaping demographic distribution since the 1970s. Large-scale internal migration in large and populous nations such as China and India, predominantly from rural to urban settings, was generated by rapid industrialisation and entry into global markets by many states in the 1970s that created the need for more robust industrial workforces (Liang & Ma, 2004; Lusome & Bhagat, 2006). In the same states during the same period, international migration played a similarly important role in structuring demographics; the Opening of China in the late 1980s saw a renewed wave of emigration of diverse individuals, and the oil boom in the Gulf States in the 1970s saw the efflux of labourers from India into new ‘Other’ nations (Ecevit, 1981; Ganeshan, 2011; Khadria, 2006; Xiang, 2016). In summary, particularly if the SCD assumes that the influence of interest is the individual’s developmental rather than current environment, country of residence is a poor proxy for a sample population, because populations have not stayed within their borders during the study period.
Attending to the globalising processes of migration, development, and pollution reveals that the differences assumed between so-called ‘West’ and ‘Other’ countries do not apply to the study period covered in Levine et al. (2017). South India can act as an exemplar of a region that has undergone momentous shifts in ecology and demography since the 1970s, brought about by ongoing internal and international migration and environmental pollution from globalising industrialism. The 1970s Gulf oil boom contributed to an increase in Indian emigration (particularly among males), as did India’s growing global economic presence, which led to a ‘brain drain’ migration among educated and skilled Indians to the global West (Chacko, 2007; Ganeshan, 2011). Now, across India, rural-to-urban migrants account for more than half of the population of cities (Ganeshan, 2011; Irudaya Rajan & Sumeetha, 2020). Hyderabad too has seen decades of cyclical immigration from rural areas towards urban industrialisation, emigration abroad both to ‘West’ and ‘Other’ nations, and a recent reversal of this process wherein Western-trained Indians return as Hyderabad grows in its transnational, globally connected contemporary networks (Chacko, 2007). The social and environmental experience of growing up and living in South India in the 1970s, for example, is not comparable to that of South India in the 2010s. It is unclear how Levine et al. (2017) locate ‘India’ in their analysis, whether as a place with a set of defined ecological conditions, as a group of people, or as a place that might have changed over time ecologically but where the population has remained constant enough to allow for disambiguation of the effects of place on sperm production from any other effects. We suggest that it is not obvious that the genetic composition of a population in a given place remains the same over time. Nor is it certain that the people in a given place have experienced the developmental environment of that place, or that the place has remained ecologically stable (or not) in predictable or documented ways.
A biovariability framework emphasises that the appropriate unit of analysis to understand relationships between ecology and average population sperm count is a spatiotemporally continuous population, in which bio-environmental and socio-cultural exposures and their outcomes can be tracked over time within and among individuals at multiple time points. Levine et al. (2017) assume that geographic regions such as nations and continents over the period from 1973 to the present day represent such populations. Although it may sometimes be the case that geopolitical boundaries can meet these criteria, the highly dynamic history of migration and environmental change since 1973, wrought by increasingly globalised processes, indicates that the nation-level sperm count averages utilised by Levine et al. (2017) are inappropriate categories for understanding sperm count epidemiology.