Thursday, November 28, 2019

Great interest exists in identifying methods to predict neuropsychiatric disease states and treatment outcomes from high-dimensional data, including neuroimaging and genomics data; best practices are discussed

Establishment of Best Practices for Evidence for Prediction: A Review. Russell A. Poldrack, Grace Huckins, Gael Varoquaux. JAMA Psychiatry, November 27, 2019. doi:https://doi.org/10.1001/jamapsychiatry.2019.3671

Abstract
Importance  Great interest exists in identifying methods to predict neuropsychiatric disease states and treatment outcomes from high-dimensional data, including neuroimaging and genomics data. The goal of this review is to highlight several potential problems that can arise in studies that aim to establish prediction.

Observations  A number of neuroimaging studies have claimed to establish prediction while establishing only correlation, which is an inappropriate use of the statistical meaning of prediction. Statistical associations do not necessarily imply the ability to make predictions in a generalized manner; establishing evidence for prediction thus requires testing of the model on data separate from those used to estimate the model’s parameters. This article discusses various measures of predictive performance and the limitations of some commonly used measures, with a focus on the importance of using multiple measures when assessing performance. For classification, the area under the receiver operating characteristic curve is an appropriate measure; for regression analysis, correlation should be avoided, and median absolute error is preferred.

Conclusions and Relevance  To ensure accurate estimates of predictive validity, the recommended best practices for predictive modeling include the following: (1) in-sample model fit indices should not be reported as evidence for predictive accuracy, (2) the cross-validation procedure should encompass all operations applied to the data, (3) prediction analyses should not be performed with samples smaller than several hundred observations, (4) multiple measures of prediction accuracy should be examined and reported, (5) the coefficient of determination should be computed using the sums of squares formulation and not the correlation coefficient, and (6) k-fold cross-validation rather than leave-one-out cross-validation should be used.

---
Excerpts (full paper, references, etc., at the DOI above):

Introduction

The development of biomarkers for disease is attracting increasing interest in many domains of biomedicine. Interest is particularly high in neuropsychiatry owing to the current lack of biologically validated diagnostic or therapeutic measures.1 An essential aspect of biomarker development is demonstration that a putative marker is predictive of relevant behavioral outcomes,2 disease prognosis,3 or therapeutic outcomes.4 As the size and complexity of data sets have increased (as in neuroimaging and genomics studies), it has become increasingly common that predictive analyses have been performed using methods from the field of machine learning, with techniques that are purpose-built for generating accurate predictions on new data sets.

Despite the potential utility of prediction-based research, its successful application in neuropsychiatry—and medicine more generally—remains challenging. In this article, we review a number of challenges in establishing evidence for prediction, with the goal of providing simple recommendations to avoid common errors. Although most of these challenges are well known within the machine learning and statistics communities, awareness is less widespread among research practitioners.

We begin by outlining the meaning of the concept of prediction from the standpoint of machine learning. We highlight the fact that predictive accuracy cannot be established by using the same data both to fit and test the model, which our literature review found to be a common error in published claims of prediction. We then turn to the question of how accuracy should be quantified for categorical and continuous outcome measures. We outline the ways in which naive use of particular predictive accuracy measures and cross-validation methods can lead to biased estimates of predictive accuracy. We conclude with a set of best practices to establish valid claims of successful prediction.

Code to reproduce all simulations and figures is available at https://github.com/poldrack/PredictionCV.

Association vs Prediction

A claim of prediction is ultimately judged by its ability to generalize to new situations; the term implies that it is possible to successfully predict outcomes in data sets other than the one used to generate the claim. When a statistical model is applied to data, the goodness of fit of that model to those data will in part reflect the underlying data-generating mechanism, which should generalize to new data sets sampled from the same population, but it will also include a contribution from noise (ie, unexplained variation or randomness) that is specific to the particular sample.5 For this reason, a model will usually fit better to the sample used to estimate it than it will to a new sample, a phenomenon known in machine learning as overfitting and in statistics as shrinkage.

Because of overfitting, it is not possible to draw useful estimates of predictive accuracy simply from a model’s goodness of fit to a data set; such estimates will necessarily be inflated, and their degree of optimism will depend on many factors, including the complexity of the statistical model and the size of the data set. The fit of a model to a specific data set can be improved by increasing the number of parameters in the model; any data set can be fit with 0 error if the model has as many parameters as data points. However, as the model becomes more complex than the process that generates the data, the fit of the model starts to reflect the specific noise values in the data set. A sign of overfitting is that the model fits well to the specific data set used to estimate the model but fits poorly to new data sets sampled from the same population. Figure 1 presents a simulated example, in which increasing model complexity results in decreased error for the data used to fit the model, but the fit to new data becomes increasingly poor as the model grows more complex than the true data-generating process.
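The zero-error claim above can be checked with a small sketch. The five data points and the new observation are made up for illustration (they are not from the paper or its repository), and the helper names are hypothetical. A polynomial with as many parameters as data points is compared with a two-parameter line.

```python
# Illustrative, hand-made data (not from the paper): the true process is
# roughly y = x, and the small deviations play the role of sample noise.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.3, 0.8, 2.4, 2.7, 4.2]
x_new, y_new = 5.0, 5.1  # a hypothetical new observation from the same process


def lagrange_predict(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through all n points."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        weight = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                weight *= (x - xm) / (xj - xm)
        total += yj * weight
    return total


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(x_train, y_train)

# In-sample error: the 5-parameter polynomial reproduces every training point.
poly_train_err = max(abs(lagrange_predict(x_train, y_train, x) - y)
                     for x, y in zip(x_train, y_train))
line_train_err = max(abs(intercept + slope * x - y)
                     for x, y in zip(x_train, y_train))

# Out-of-sample error on the new observation.
poly_new_err = abs(lagrange_predict(x_train, y_train, x_new) - y_new)
line_new_err = abs(intercept + slope * x_new - y_new)

print(f"training error: polynomial={poly_train_err:.3f}, line={line_train_err:.3f}")
print(f"new-data error: polynomial={poly_new_err:.3f}, line={line_new_err:.3f}")
```

With these made-up values the polynomial predicts about 14.3 at x = 5, while the line predicts about 4.99, so the model with the perfect in-sample fit is the one that generalizes badly.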

Because we do not generally have a separate test data set to assess generalization performance, the standard approach in machine learning to address overfitting is to assess model fit via cross-validation, a process that uses subsets of the data to iteratively train and test the predictive performance of the model. The simplest form of cross-validation is known as leave-one-out, in which the model is successively fit on every data point but 1 and is then tested on that left-out point. A more general cross-validation approach is known as k-fold cross-validation, in which the data are split into k different subsets, or folds. The model is successively trained on every subset but 1 and is then tested on the held-out subset. Cross-validation can also help discover the model that will provide the best predictive performance on a new sample (Figure 1).
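The splitting step described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the seed parameter are assumptions, and leave-one-out is obtained as the special case in which k equals the number of observations.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle so folds are not ordered blocks
    # Fold f takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train_idx, test_idx


# k-fold with k=5 on 20 samples; leave-one-out is the special case k == n_samples.
splits = list(k_fold_indices(20, k=5))
loo_splits = list(k_fold_indices(20, k=20))
print(len(splits), len(splits[0][1]), len(loo_splits), len(loo_splits[0][1]))
```

A model would be fit on each `train_idx` and scored on the matching `test_idx`, with the scores then combined across folds.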

One might ask how badly inflated the in-sample association is as an estimate of out-of-sample prediction; if the inflation is small, or only occurs with complex models, then perhaps it can be ignored for practical purposes. Figure 2 shows an example of how the optimism of in-sample fits depends on the complexity of the statistical model; in this case, we use a simple linear model but vary the number of irrelevant independent variables in the model. As the number of variables increases, the fit of the model to the sample increases owing to overfitting. However, even for a single predictor in the model, the fit of the model is inflated compared with new data or cross-validation. The optimism of in-sample fits is also a function of sample size (Figure 2). This example demonstrates the utility of using cross-validation to estimate predictive accuracy on a new sample.


Statistical Significance vs Useful Prediction

A second reason that significant statistical association does not imply practically useful prediction is exemplified by the psychiatric genetic literature. Large genome-wide association studies have now identified significant associations between genetic variants and mental illness diagnoses. For example, Ripke et al6 compared more than 21 000 patients with schizophrenia against more than 38 000 individuals without schizophrenia and found 22 genetic variants significant at a genome-wide level (P = 5 × 10^−8), the strongest of which (rs9268895) had a combined P value of 9.14 × 10^−14. However, this strongest association would be useless on its own as a predictor of schizophrenia. The combined odds ratio for this risk variant was 1.167; assuming a population prevalence of schizophrenia of 1 in 196 individuals as the baseline risk,7 possessing the risk allele for this strongest variant would raise an individual’s risk to about 1 in 168. Such an effect is far from clinically actionable. In fact, the increased availability of large samples has made clear the point that Meehl8 raised more than 50 years ago, which stated that in the context of null hypothesis testing, as samples become larger, even trivial associations become statistically significant.
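The risk arithmetic above can be reproduced in a few lines. This sketch follows the text in treating the population prevalence as the baseline risk, which is a simplification, and the function name is hypothetical. It converts the baseline probability to odds, applies the odds ratio, and converts back.

```python
def risk_after_odds_ratio(baseline_risk, odds_ratio):
    """Convert a baseline probability and an odds ratio into a new probability."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


baseline = 1 / 196  # population prevalence quoted in the text
carrier_risk = risk_after_odds_ratio(baseline, odds_ratio=1.167)
print(f"baseline: 1 in {1 / baseline:.0f}; with risk allele: 1 in {1 / carrier_risk:.0f}")
```

The result is roughly 1 in 168, a change in absolute risk of less than one tenth of a percentage point.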

A more general challenge exists regarding the prediction of uncommon outcomes, such as a diagnosis of schizophrenia. Consider the case in which a researcher has developed a test for schizophrenia that has 99% sensitivity (ie, a 99% likelihood that the test will return a positive result for someone with the disease) and 99% specificity (ie, a 99% likelihood that the test will return a negative result for someone without the disease). These are performance levels that any test developer would be thrilled to obtain; in comparison, mammography has a sensitivity of 87.8% and a specificity of 90.5% for the detection of breast cancer.9 If this test for schizophrenia were used to screen 1 million people, it would detect 99% of those with schizophrenia (5049 individuals) but would also incorrectly detect 9949 individuals without schizophrenia; thus, even with exceedingly high sensitivity and specificity, the predictive value of a positive test result remains well below 50%. As we can straightforwardly deduce from the Bayes theorem, false alarm rates will usually be high when testing for events with low baseline rates of occurrence.
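The screening example can be worked through as a count-based application of the Bayes theorem. The prevalence of 0.51% used here is an assumption chosen because it reproduces the counts quoted above; the function name is hypothetical.

```python
def screening_outcomes(n_screened, prevalence, sensitivity, specificity):
    """Return (true positives, false positives, positive predictive value)."""
    n_cases = n_screened * prevalence
    n_noncases = n_screened - n_cases
    true_pos = sensitivity * n_cases
    false_pos = (1.0 - specificity) * n_noncases
    ppv = true_pos / (true_pos + false_pos)  # Bayes theorem in count form
    return true_pos, false_pos, ppv


# Prevalence of 0.51% is an assumption chosen to match the counts in the text.
tp, fp, ppv = screening_outcomes(1_000_000, 0.0051, 0.99, 0.99)
print(f"true positives={tp:.0f}, false positives={fp:.0f}, PPV={ppv:.3f}")
```

Under these assumptions only about one third of positive results belong to people who actually have the condition.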


Misinterpretation of Association as Prediction

A significant statistical association is insufficient to establish a claim of prediction. However, in our experience, it is common for investigators in the functional neuroimaging literature to use the term prediction when describing a significant in-sample statistical association. To quantify the prevalence of this practice, we identified 100 published studies between December 24, 2017, and October 30, 2018, in PubMed by using the search terms fMRI prediction and fMRI predict. For each study, we identified whether the purported prediction was based on a statistical association, such as a significant correlation or regression effect, or whether the researchers used a statistical procedure specifically designed to measure prediction, such as cross-validation or out-of-sample validation. We only included studies that purported to predict an individual-level outcome based on fMRI data and excluded other uses of the term prediction, such as studies examining reward prediction error. A detailed description of these studies is presented in the eTable in the Supplement.

Of the 100 studies assessed, 45 reported an in-sample statistical association as the sole support for the claims of prediction, suggesting that the conflation of statistical association and predictive accuracy is common.10 The remaining studies used a mixture of cross-validation strategies, as shown in Figure 3.


Factors That Can Bias Assessment of Prediction

Although performing some type of assessment of an out-of-sample prediction is essential, it is also clear that cross-validation still leaves room for errors when establishing predictive validity. We now turn to issues that can affect the estimation of predictive accuracy even when using appropriate predictive modeling methods.

- Small Samples

The use of cross-validation with small samples can lead to highly variable estimates of predictive accuracy. Varoquaux11 noted that a general decrease in the level of reported prediction accuracy can be observed as sample sizes increase. Given the flexibility of analysis methods12 and publication bias for positive results, such that only the top tail of accuracy measures is reported, the high variability of estimates with small samples can lead to a body of literature with inflated estimates of predictive accuracy.

Our literature review found a high prevalence of small samples, with more than half of the samples comprising fewer than 50 people and 15% of the studies with samples comprising fewer than 20 people (Figure 3). Most studies that use small samples are likely to exhibit highly variable estimates. This finding suggests that many of the claims of predictive accuracy in the neuroimaging literature may be exaggerated and/or not valid.

- Leakage of Test Data

To give a valid measure of predictive accuracy, cross-validation needs to build on a clean isolation of the test data during the fitting of models to the training data. If information leaks from the testing set into the model-fitting procedure, then estimates of predictive accuracy will be inflated, sometimes wildly. For example, any variable selection that is applied to the data before application of cross-validation will bias the results if the selection involves knowledge of the variable being predicted. Of the 57 studies in our review that used cross-validation procedures, 10 may have applied dimensionality reduction methods that involved the outcome measure (eg, thresholding based on correlation) to the entire data set. This lack of clarity raises concerns regarding the level of methodological reporting in these studies.13

In addition, any search across analytic methods, such as selecting the best model or the model parameters, must be performed using nested cross-validation, in which a second cross-validation loop is used within the training data to determine the optimal method or parameters. The best practice is to include all processing operations within the cross-validation loop to prevent any potential for leakage. This practice is increasingly possible using cross-validation pipeline tools, such as those available within the scikit-learn software package (scikit-learn Developers).14
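The idea of keeping an outcome-dependent step inside the cross-validation loop can be sketched in plain Python. The data, fold assignments, threshold, and function names below are all made up for illustration; the selection rule (thresholding on correlation with the outcome) mirrors the example mentioned above, and pipeline tools such as those in scikit-learn are meant to automate the same pattern.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def select_features(rows, outcomes, threshold):
    """Keep columns whose absolute correlation with the outcome meets the threshold."""
    return [j for j in range(len(rows[0]))
            if abs(pearson([row[j] for row in rows], outcomes)) >= threshold]


def select_within_folds(rows, outcomes, folds, threshold):
    """Repeat the outcome-dependent selection inside every training fold."""
    selected = []
    for train_idx, _test_idx in folds:
        train_rows = [rows[i] for i in train_idx]
        train_y = [outcomes[i] for i in train_idx]  # test outcomes are never read
        selected.append(select_features(train_rows, train_y, threshold))
    return selected


# Made-up data: 6 observations, 2 candidate features, 3 folds of 2 test points.
rows = [[1, 5], [2, 3], [3, 6], [4, 2], [5, 7], [6, 1]]
outcomes = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
folds = [([2, 3, 4, 5], [0, 1]), ([0, 1, 4, 5], [2, 3]), ([0, 1, 2, 3], [4, 5])]

per_fold = select_within_folds(rows, outcomes, folds, threshold=0.5)
print(per_fold)
```

Because each selection sees only its own training rows, changing the outcome values of the held-out observations cannot change which features are chosen for that fold, which is the isolation property the text asks for.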

- Model Selection Outside of Cross-validation

Selecting a predictive method based on the data creates an opportunity for bias that could involve the potential use of a number of different classifiers, hyperparameters for those classifiers, or various preprocessing methods. As in standard data analysis, there is a potential garden of forking paths,15 such that data-driven modeling decisions can bias the resulting outcomes even if there is no explicit search for methods providing the best results. The outcomes are substantially more biased if an explicit search for the best methods is performed without a held-out validation set.

As reported in studies by Skocik et al16 using simulations and Varoquaux11 using fMRI data, it is possible to obtain substantial apparent predictive accuracy from data without any true association if a researcher capitalizes on random fluctuations in classifier performance and searches across a large parameter space. A true held-out validation sample is a good solution to this problem. A more general solution to the problem of analytic flexibility is the preregistration of analysis plans before any analysis, as is increasingly common in other areas of science.17

- Nonindependence Between Training and Testing Sets

Like any statistical technique, the use of cross-validation to estimate predictive accuracy involves assumptions, the failure of which can undermine the validity of the results. An important assumption of cross-validation is that observations in the training and testing sets are independent. While this assumption is often valid, it can break down when there are systematic relationships between observations. For example, the Human Connectome Project data set includes data from families, and it is reasonable to expect that family members will be closer to each other in brain structure and function than will individuals who are not biologically related.

Similarly, data collected as a time series will often exhibit autocorrelation, such that observations closer in time are more similar. In these cases, there are special cross-validation strategies that must be used to address this structure. For example, in the presence of family structure, such as the sample used in the Human Connectome Project, a researcher might cross-validate across families (ie, leave-k-families-out) rather than individuals to address the nonindependence potentially induced by family structure.18
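A leave-k-families-out split can be sketched by assigning whole families, rather than individuals, to folds. The family labels and the function name here are hypothetical; scikit-learn offers a comparable grouped splitter (GroupKFold) for real analyses.

```python
def leave_families_out(family_ids, n_folds):
    """Yield (train_idx, test_idx) so that no family is split across the two sets."""
    families = sorted(set(family_ids))
    for f in range(n_folds):
        held_out = set(families[f::n_folds])  # families tested in this fold
        test_idx = [i for i, fam in enumerate(family_ids) if fam in held_out]
        train_idx = [i for i, fam in enumerate(family_ids) if fam not in held_out]
        yield train_idx, test_idx


# Hypothetical family labels for 8 participants (relatives share a label).
family_ids = ["A", "A", "B", "C", "C", "C", "D", "E"]
family_splits = list(leave_families_out(family_ids, n_folds=3))
for train_idx, test_idx in family_splits:
    print("test families:", sorted({family_ids[i] for i in test_idx}), "test rows:", test_idx)
```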

- Quantification of Predictive Accuracy

Two main categories of problems occur in predictive modeling. The first, classification accuracy, involves the prediction of discrete class membership, such as the presence or absence of a disease diagnosis; the second, regression accuracy, involves the prediction of a continuous outcome variable, such as a test score or disease severity measure. In our literature review, we found that 37 studies performed classification while 64 performed regression to determine predictive accuracy. These strategies generally involve different methods for quantification of accuracy, but in each case, potential problems can arise through the naive use of common methods.

- Quantifying Classification Accuracy

In a classification problem, we aim to quantify our ability to accurately predict class membership, such as the presence of a disease or a cognitive state. When the number of members in each class is equal, then average accuracy (ie, the proportion of correct classifications, as used in the examples in Figure 2) is a reasonable measure of predictive accuracy. However, if any imbalance exists between the frequencies of the different classes, then average accuracy is a misleading measure. Consider the example of a predictive model for schizophrenia, which has a prevalence of 0.5% in the population; the classifier can achieve average accuracy of 99.5% across all cases by predicting that no one has the disease, simply owing to the low frequency of the disease.
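The imbalance problem is easy to verify numerically. The sketch below uses a hypothetical sample of 1000 people at the 0.5% prevalence quoted above and a "classifier" that never predicts the disease.

```python
# 1000 hypothetical people at 0.5% prevalence: 5 cases (1) and 995 non-cases (0).
labels = [1] * 5 + [0] * 995
predictions = [0] * len(labels)  # a "classifier" that says nobody has the disease

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
sensitivity = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / labels.count(1)
specificity = sum(p == 0 and y == 0 for p, y in zip(predictions, labels)) / labels.count(0)

print(f"accuracy={accuracy:.3f}, sensitivity={sensitivity:.1f}, specificity={specificity:.1f}")
```

Accuracy is 99.5% even though not a single case is detected, which is why per-class measures are needed alongside it.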

A standard method to address the class imbalance problem is to use the receiver operating characteristic curve from signal detection theory.19 A receiver operating characteristic curve can be constructed given any continuous measure of evidence, as provided by most classification models. A threshold is then applied to this measure of evidence, systematically ranging from low (in which most cases will be assigned to the positive class, and the number of false positives will be high) to high (in which most cases will be assigned to the negative class, and the number of false positives will be low). The area under the curve can then be used as an integrated measure of classification accuracy. A perfect prediction leads to an area under the curve of 1.0, while a fully random prediction leads to an area under the curve of 0.5. Importantly, the area under the curve value of 0.5 expected by chance is not biased by imbalanced frequencies of positive and negative cases in the way that simple measures of accuracy would be. It is also useful to separately present the sensitivity (ie, the proportion of positive cases correctly identified as positive) and specificity (ie, the proportion of negative cases correctly identified as negative) of the classifier, to allow assessment of the relative balance of false positives and false negatives.
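The area under the curve can be computed without drawing the curve, because it equals the probability that a randomly chosen positive case receives a higher evidence score than a randomly chosen negative case. The sketch below uses that pairwise formulation; the scores are made up, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores.

    Equivalent to the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: 2 cases, 8 non-cases.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
perfect = [0.9, 0.8, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2]
uninformative = [0.5] * 10  # same evidence for everyone
partial = [0.9, 0.25, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2]

print(roc_auc(labels, perfect), roc_auc(labels, uninformative), roc_auc(labels, partial))
```

Note that the uninformative scores give 0.5 despite the 2-to-8 class imbalance, whereas always predicting the majority class would give 80% accuracy on the same sample.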

- Quantifying Regression Accuracy

It is increasingly common to apply predictive modeling in cases in which the outcome variable is continuous rather than discrete—that is, in regression rather than classification problems. For example, a number of studies in cognitive neuroscience have attempted to predict phenotypic measures, such as age,20 personality,21 or behavioral outcomes.22 For continuous predictions, accuracy can be quantified either by the relation between the predicted and actual values, relative to perfect prediction, or by a measure of the absolute difference between predicted and actual values (ie, the error). A relative measure is useful because its value can easily be related to the success of the prediction. For this purpose, a useful measure is the fraction of explained variance, often called the coefficient of determination or R2. If a model makes perfect predictions, its associated R2 value will be 1.0, whereas a model making random predictions should have an R2 value of approximately 0. If a model is particularly poor, to the point that its predictions are less accurate than they would be if the model simply returned the mean value for the data set, the R2 value can be negative, despite the fact that it is called R2. The disadvantage of this measure is that it does not support comparisons of the quality of predictions across different data sets because the variance of the outcome variable may differ between one data set and another. For this purpose, absolute error measurements, such as the mean absolute error, which has the benefit of quantifying error in the units of the original measure (such as IQ points), are useful.

It is common in the literature to use the correlation between predicted and actual values as a measure of predictive performance; of the 64 studies in our literature review that performed prediction analyses on continuous outcomes, 30 reported such correlations as a measure of predictive performance. This reporting is problematic for several reasons. First, correlation is not sensitive to scaling of the data; thus, a high correlation can exist even when predicted values are discrepant from actual values. Second, correlation can sometimes be biased, particularly in the case of leave-one-out cross-validation. As demonstrated in Figure 4, the correlation between predicted and actual values can be strongly negative when no predictive information is present in the model. A further problem arises when the variance explained (R2) is incorrectly computed by squaring the correlation coefficient. Although this computation is appropriate when the model is obtained using the same data, it is not appropriate for out-of-sample testing23; instead, the amount of variance explained should be computed using the sum-of-squares formulation (as implemented in software packages such as scikit-learn).
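The difference between the squared correlation and the sum-of-squares coefficient of determination shows up clearly when predictions are ordered correctly but badly scaled. The values below are made up for illustration, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def r2_sum_of_squares(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average unsigned error, in the units of the outcome variable."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


# Made-up out-of-sample values: predictions are perfectly ordered but 10x too large.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [10.0, 20.0, 30.0, 40.0, 50.0]

r = pearson(actual, predicted)
r2 = r2_sum_of_squares(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"correlation={r:.3f}, squared correlation={r * r:.3f}")
print(f"sum-of-squares R2={r2:.1f}, mean absolute error={mae:.1f}")
```

Squaring the correlation would report a perfect score of 1, while the sum-of-squares R2 is about -444.5 and the mean absolute error is 27, which reflects how far the predictions actually are from the observed values.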

As discussed previously in this section, leave-one-out cross-validation is problematic because it allows for the possibility of negative R2 values. For classification settings, the effect is the same; in a perfectly balanced data set, leave-one-out cross-validation creates a testing set comprising a single observation that is in the minority class of the training set. A simple prediction rule, such as majority vote, would thus lead to predictions that would be incorrect.24 Rather, the preferred method of performing cross-validation is to leave out 10% to 20% of the data, using k-fold or shuffle-split techniques that repeatedly split the data randomly. Larger testing sets enable a good computation of measurements, such as the coefficient of determination or area under the receiver operating characteristic curve.
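Both leave-one-out artifacts can be reproduced with models that contain no predictive information at all. In this sketch (made-up data, hypothetical function names) a continuous outcome is predicted by the mean of the remaining observations, and a balanced binary label is predicted by majority vote over the remaining observations.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def loo_mean_predictions(values):
    """Predict each left-out value with the mean of the remaining values."""
    total, n = sum(values), len(values)
    return [(total - v) / (n - 1) for v in values]


def loo_majority_vote(labels):
    """Predict each left-out label with the most common label among the rest."""
    predictions = []
    for i in range(len(labels)):
        rest = labels[:i] + labels[i + 1:]
        predictions.append(max(set(rest), key=rest.count))
    return predictions


# Made-up outcomes with no predictors at all, so there is nothing to learn.
outcomes = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0]
r = pearson(outcomes, loo_mean_predictions(outcomes))

# Perfectly balanced binary labels.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
votes = loo_majority_vote(labels)
accuracy = sum(v == y for v, y in zip(votes, labels)) / len(labels)

print(f"leave-one-out correlation={r:.3f}, majority-vote accuracy={accuracy:.2f}")
```

Removing an observation shifts the training mean (or the class majority) away from that observation, so the correlation comes out at -1 and the accuracy at 0 rather than at their chance levels. With folds that hold out 10% to 20% of the data, each test set has far less influence on the training set.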


Best Practices for Predictive Modeling

We have several suggestions for researchers engaged in predictive modeling to ensure accurate estimates of predictive validity:

(1) In-sample model fit indices should not be reported as evidence for predictive accuracy because they can greatly overstate evidence for prediction and take on positive values even in the absence of true generalizable predictive ability.

(2) The cross-validation procedure should encompass all operations applied to the data. In particular, predictive analyses should not be performed on data after variable selection if the variable selection was informed to any degree by the data themselves (ie, post hoc cross-validation). Otherwise, estimated predictive accuracy will be inflated owing to circularity.25

(3) Prediction analyses should not be performed with samples smaller than several hundred observations, based on the finding that predictive accuracy estimates with small samples are inflated and highly variable.26

(4) Multiple measures of prediction accuracy should be examined and reported. For regression analyses, measures of variance, such as R2, should be accompanied by measures of unsigned error, such as mean squared error or mean absolute error. For classification analyses, accuracy should be reported separately for each class, and a measure of accuracy that is insensitive to relative class frequencies, such as area under the receiver operating characteristic curve, should be reported.

(5) The coefficient of determination should be computed by using the sums-of-squares formulation rather than by squaring the correlation coefficient.

(6) k-fold cross-validation, with k in the range of 5 to 10,27 should be used rather than leave-one-out cross-validation because the testing set in leave-one-out cross-validation is not representative of the whole data and is often anticorrelated with the training set.


Author Contributions: Dr Poldrack and Ms Huckins had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Poldrack, Varoquaux.

Acquisition, analysis, or interpretation of data: Poldrack, Huckins.

Drafting of the manuscript: Poldrack.

Critical revision of the manuscript for important intellectual content: Huckins, Varoquaux.

Statistical analysis: All authors.

Administrative, technical, or material support: Poldrack.

Among well-nourished populations of Westerners, men's high testosterone levels represent an outlier of cross‐cultural variation; probably due to intrasexual competition in reproductive contexts; it increases prostate cancer risk


From 2012... Do evolutionary life‐history trade‐offs influence prostate cancer risk? A review of population variation in testosterone levels and prostate cancer disparities. Louis Calistro Alvarado. Evolutionary Applications, December 11 2012. https://doi.org/10.1111/eva.12036

Abstract: An accumulation of evidence suggests that increased exposure to androgens is associated with prostate cancer risk. The unrestricted energy budget that is typical of Western diets represents a novel departure from the conditions in which men's steroid physiology evolved and is capable of supporting distinctly elevated testosterone levels. Although nutritional constraints likely underlie divergent patterns of testosterone secretion between Westernized and non‐Western men, considerable variability exists in men's testosterone levels and prostate cancer rates within Westernized populations. Here, I use evolutionary life history theory as a framework to examine prostate cancer risk. Life history theory posits trade-offs between investment in early reproduction and long-term survival. One corollary of life history theory is the ‘challenge hypothesis’, which predicts that males augment testosterone levels in response to intrasexual competition occurring within reproductive contexts. Understanding men's evolved steroid physiology may contribute toward understanding susceptibility to prostate cancer. Among well-nourished populations of Westerners, men's testosterone levels already represent an outlier of cross‐cultural variation. I hypothesize that Westernized men in aggressive social environments, characterized by intense male–male competition, will further augment testosterone production aggravating prostate cancer risk.



Discussion

Modern Westernized environments represent a clear deviation from the environment in which male reproductive physiology evolved. Largely removed from energetic constraint and pathogen burden, Westernized men are capable of supporting distinctly elevated testosterone at the upper limit of human variability and amplifying the incidence of hormone‐sensitive cancer. Variation in nutritional status can largely account for observed disparities in men's testosterone levels and prostate cancer between Westernized and non‐Western populations, but not within Westernized populations, which are the populations at highest risk of prostate cancer. By incorporating a challenge hypothesis framework, I proposed another source of lifetime variation in testosterone exposure: Aggressive social environments affect prostate cancer incidence through the responsiveness of male androgen physiology to challenges, specifically among Westerners who are able to support the energetic costs of high testosterone levels. I reviewed literature which showed that ancestry, a widely recognized risk factor for prostate cancer, is in and of itself biologically unimportant when accounting for lifestyle factors. For instance, population disparities in testosterone levels of black‐ and white‐American men become attenuated and nonsignificant when comparing among college‐educated men from similar backgrounds (Mazur 1995, 2006). And in a nationally representative sample, there was no significant difference in testosterone levels of black‐ and white‐American men after accounting for differences in anthropometry (age and body fat percentage) and lifestyle factors (drug use and physical activity) (Rohrmann et al. 2007). To reiterate, there is surprisingly little evidence to suggest that testosterone levels are a direct consequence of ancestry. And as discussed earlier, men of lower SES, regardless of ethnicity, demonstrate higher rates of male–male violence, higher testosterone levels, and higher prostate cancer incidence. Using ancestry as a putative biomarker of prostate cancer risk is effective only to the extent to which it tracks environmental circumstances and living conditions that influence cancer risk.

Additionally, I argued that poverty and compromised male investment lead to prioritized mating effort and increased male–male competition, culminating in chronically elevated testosterone and higher rates of prostate cancer. This general trend would be expected only if inequity in wealth distribution translated into more agonistic interactions between males at the population level. In other words, if the relationship between poverty and aggressive social environments is moderated, then there would be little expectation for lower SES to contribute to prostate cancer risk. Norwegian men, for example, deviate from the normally observed correlation between low SES and increased prostate cancer risk. This is particularly interesting because of the sizeable welfare program that is characteristic of Nordic social policy (Sachs 2006), which is associated with some of the lowest crime rates, violent or otherwise (Barclay et al. 2001). As such, Norway invests heavily in poverty reduction, boasts the lowest homicide rate within the developed world, and does not exhibit a concentration of prostate cancer among men of lower SES. Taken together, it would appear that comprehensive social programs might decouple socioeconomic differentials from male–male violence and prostate cancer risk, and may provide a surprising example of how improved social policies and poverty alleviation strategies are fundamental to the interest of public health.

And finally, the challenge hypothesis framework developed in this review may have occupational health implications, considering that men's testosterone levels vary according to occupational status (Dabbs 1992), and that some professions carry a disproportionate risk of prostate cancer (Demers et al. 1994; Zeegers et al. 2004). Dabbs (1992) and colleagues (1998) found that blue‐collar workers have higher salivary and serum testosterone than white‐collar workers. However, distinct social contexts within a profession can also give rise to differences in testosterone levels. Although lawyers as a group are white‐collar workers, trial lawyers have significantly higher salivary testosterone than nontrial lawyers, which has been attributed to the polemical nature of face‐to‐face litigation (Dabbs et al. 1998). If this pattern of elevated testosterone from agonistic interactions persists across occupations, it seems reasonable to expect that men in professions with a higher intensity of competitive interaction would exhibit a greater incidence of prostate cancer. Findings from an extensive cohort study of 58,279 Western European men (ages 55–69 years) from 20 separate occupations are consistent with this reasoning (Zeegers et al. 2004). After accounting for individual characteristics and lifestyle factors (age, diet, drug and alcohol use, education, family disease history, and physical activity), it was police officers who showed the highest relative risk for prostate cancer. Indeed, prostate cancer risk increased 67% for each 10 years of occupational duty as a policeman. The framework proposed here can explain these seemingly peculiar associations between career choice and prostate cancer risk.

Liberals & conservatives are similarly obedient to their own authorities & condemn perceived abuses of their ideology’s sacralized objects & heroes; liberals & conservatives seem made up of the same psychological stuff

Do liberals and conservatives use different moral languages? Two replications and six extensions of Graham, Haidt, and Nosek’s (2009) moral text analysis. Jeremy A. Frimer. Journal of Research in Personality, November 28 2019, 103906.  https://doi.org/10.1016/j.jrp.2019.103906

Abstract: Do liberals and conservatives tend to use different moral languages? The Moral Foundations Hypothesis states that liberals rely more on foundations of care/harm and fairness/cheating whereas conservatives rely more on loyalty/betrayal, authority/subversion, and purity/degradation in their moral functioning. In support, Graham, Haidt, and Nosek (2009; Study 4) showed that sermons delivered by liberal and conservative pastors differed as predicted in their moral word usage, except for the loyalty foundation. I present two high-powered replication studies in religious contexts and six extension studies in politics, the media, and organizations to test ideological differences in moral language usage. On average, replication success rate was 30% and effect sizes were 38 times smaller than those in the original study. A meta-analysis (N=303,680) found that compared to liberals, conservatives used more authority r=.05, 95% confidence interval=[.02,.09] and purity words, r=.14 [.09,.19], fewer loyalty words, r=-.08 [-.10,-.05], and no more or less harm, r=.00 [-.02,.02], or fairness words, r=-.03 [-.06,.01].

Keywords: morality, language, ideology, conservatism, replication, moral foundations theory

General Discussion

Two replications and six extensions found limited support for the MFH in terms of language usage. Whereas a close replication of sermons from the same two U.S. Christian denominations as those in the original was successful (Study 1), a conceptual replication with 12 other U.S. Christian denominations was largely unsuccessful (Study 2), meaning that the two denominations studied in Graham et al. (2009) may not be representative of Christian denominations in general. This suggests that even within the context of religious sermons by U.S. Christian pastors, liberals and conservatives may not use different moral languages as much as previously thought. Although Graham et al. (2009) suggested that political speeches may not be the ideal context for detecting the different moral languages of liberals and conservatives, conceptual replications with four political samples were successful in aggregate for four of the five foundations (Study 3). A moderation analysis found that the differences in the moral languages of liberals and conservatives changed when moving from a religious to a political context for two of the five foundations only, meaning that the distinction between religion and politics may not be as important as Graham et al. (2009) suggested.

Samples drawn from the media and organizations, contexts not ruled out by Graham et al. (2009), allowed for a novel assessment of whether liberal and conservative commoners (broadly defined) use different moral languages (Studies 4-5). Tests of the MFH in these contexts were predominantly unsuccessful. Across all samples, metrics, foundations, and dictionaries, replication success rate was just 30%, meaning that 70% of replications failed. A meta-analysis (Study 6) of all the available data found support for the MFH for the authority and purity foundations, no evidence to support the MFH for harm and fairness, and evidence that is counter to the MFH for the loyalty foundation. Effect sizes were 38 times smaller on average. The most generous viable conclusion is that these results offer limited support for the MFH in the language of liberals and conservatives.

Analytical Considerations

The present analyses revealed that most distributions generated by the moral foundations dictionaries have a large number of identically-zero entries and are skewed. Correcting for this skew had relatively little effect on replication success and the resulting effect sizes. Thus, this analytic issue ended up being relatively inconsequential vis-à-vis replication considerations. Another analytical question concerned the dictionaries themselves. I used both the original MFD1 and the more recent and more valid MFD2. While results were not always the same, they tended to be largely similar. Analyses of non-skewed distributions stemming from the MFD2 are probably the most valid due to the enhanced normality and predictive validity of this analytical setup.

Both GHN and the present studies relied on a simple word counting program to operationalize the usage of moral languages (GHN also coded the speakers’ attitudes towards those words). For more than a century, psychologists have drawn inferences about topics of conversation and speakers’ internal states and traits through methods like these. And word counting procedures have generally been shown to be valid. However, topics are not fully reducible to the presence of certain words. Future work might use other linguistic techniques to assess whether liberals and conservatives have similar or different attitudes toward moral languages and use them in similar or different ways.

Theoretical Considerations

Graham et al. (2009) found that liberals used more loyalty words than conservatives, a finding that is at variance with the MFH. The present analyses suggested that although this effect is weak, it is robust. Why liberals talk more about a topic upon which their morality is not based remains an important and pressing question for MFT.

The present and recent empirical findings motivate the revisiting of a fundamental question: what is a moral foundation, psychologically speaking? Proponents of the theory have advocated for construct pluralism in the sense that foundations are general mental modules that manifest in multiple psychological forms, including values, perceptions, behavioral orientations, language, and so on. The present findings, along with other work, raise questions about this tenet of Moral Foundations Theory. Results from the present studies suggest that differences in the moral language usage of liberals and conservatives are generally small. Moreover, for three foundations, the MFH was unsupported. It would probably be more accurate to conclude that liberals and conservatives use similar moral languages than that they use different languages.

Along with their similar languages, liberals and conservatives may not be as different as previously thought in terms of their general action orientations: liberals and conservatives are similarly obedient to their own authorities (Frimer, Gaucher, & Schaefer, 2014) and condemn perceived abuses of their ideology’s sacralized objects (Frimer et al. 2015, 2016) and heroes (Frimer, Biesanz, Walker, & MacKinley, 2013). This growing body of evidence is in line with the idea that liberals and conservatives are made up of the same psychological stuff, but each ideology has its own set of cherished values and symbols. Whereas conservatism tends to cherish religion and the military, liberalism champions social justice and the environment (Frimer et al. 2015, 2016). Psychologically speaking, liberals and conservatives may be cut from the same cloth.

They meta-analyze whether race or ethnicity moderate the heritability of intelligence in the US; find moderate to high heritabilities that do not substantially differ by race or ethnicity

Racial and ethnic group differences in the heritability of intelligence: A systematic review and meta-analysis. Bryan J. Pesta et al. Intelligence, Volume 78, January–February 2020, 101408. https://doi.org/10.1016/j.intell.2019.101408

Highlights
•    We meta-analyze whether race or ethnicity moderate the heritability of intelligence.
•    The main sample (k = 16) comprised Whites, Blacks, and Hispanics from the USA.
•    We found moderate to high heritabilities for both groups.
•    Heritabilities, however, did not substantially differ by race or ethnicity.
•    Results are largely inconsistent with predictions from the Scarr-Rowe hypothesis.

Abstract: Via meta-analysis, we examined whether the heritability of intelligence varies across racial or ethnic groups. Specifically, we tested a hypothesis predicting an interaction whereby those racial and ethnic groups living in relatively disadvantaged environments display lower heritability and higher environmentality. The reasoning behind this prediction is that people (or groups of people) raised in poor environments may not be able to realize their full genetic potentials. Our sample (k = 16) comprised 84,897 Whites, 37,160 Blacks, and 17,678 Hispanics residing in the United States. We found that White, Black, and Hispanic heritabilities were consistently moderate to high, and that these heritabilities did not differ across groups. At least in the United States, Race/Ethnicity × Heritability interactions likely do not exist.


1. Introduction

In behavioral genetic research, individual variance in cognitive ability is commonly partitioned into three components. The first is the additive genetic component (a², also known as h²), which refers to genetic effects on a trait that act additively. This component is called (narrow) “heritability.” The second component is the common or shared environment (c²), which denotes environmental effects that make family members more similar. The third component is the unshared environment (e²), which consists of non-genetic effects (plus measurement error) that are not shared between family members, but which instead differentiate them from each other. Collectively, the last two components are known as “environmentality” (Plomin, DeFries, Knopik, & Neiderhiser, 2014).

These three components together comprise the “ACE” model of behavioral genetics. The model represents one basic, biometric framework behavioral geneticists may use when studying the heritability of human traits, including intelligence. The ACE model assumes that environmental and genetic influences are additive, but allows that interactions (e.g., A × E) may also exist between components; these can be estimated as well (Plomin et al., 2014; Vinkhuyzen, van der Sluis, Maes, & Posthuma, 2012). Moreover, the model is useful in intelligence research because the behavioral genetic architecture of the trait is “surprisingly simple” (Plomin et al., 2014, p. 200). Finally, the ACE model nicely fits IQ data, and ACE estimates do not require the use of cumbersome kinship designs.
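
As a rough illustration of how the three ACE components relate to one another, the sketch below uses Falconer's classical approximations from twin correlations. This is a swapped-in textbook shortcut, not the estimation method of the studies in the meta-analysis, which generally fit full biometric models. The function name and the twin correlations are hypothetical values chosen only for illustration.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Approximate ACE variance components from twin correlations.

    r_mz: phenotypic correlation between identical (MZ) twins
    r_dz: phenotypic correlation between fraternal (DZ) twins
    Assumes purely additive genetic effects and equal shared
    environments for both twin types (Falconer's approximations).
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic component (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared (common) environment
    e2 = 1.0 - r_mz            # unshared environment plus measurement error
    return {"a2": a2, "c2": c2, "e2": e2}


# Hypothetical twin correlations, not taken from any study cited here.
estimates = falconer_ace(r_mz=0.80, r_dz=0.50)
for name, value in estimates.items():
    print(f"{name} = {value:.2f}")
```

By construction the three components sum to 1, which is why the unshared environment "explains the rest" once heritability and shared environment are known.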

The relative importance of genetic and environmental sources of individual differences in cognitive ability has been extensively studied. Results for the general population show that the proportion of variance in IQ explained by genes increases with age (Plomin et al., 2014). Specifically, in early childhood, genetic effects explain less than 50% of IQ variance, and the effect of the shared environment is relatively strong. As children age, though, genetic effects become increasingly prominent, and the environmental variance due to factors common to siblings decreases. In adults, the heritability of intelligence is 60–80%, while the effect of common environment is small, if not zero (Plomin et al., 2014). The unshared environment explains the rest.

The degree to which one can generalize heritability estimates to other populations has been debated (see, e.g., Sesardic, 2005). It is clear, though, that some variables (e.g., age; Plomin et al., 2014) moderate the heritability of cognitive ability. One putative moderator is the quality of one’s environment. Poorer (richer) environments supposedly correspond to lower (higher) heritability, to a presumably measurable degree. Said differently, “natural potentials for adaptive functioning are more fully expressed in the context of more nourishing environmental experiences” (Tucker-Drob & Bates, 2016, p. 1). This prediction is known as the Scarr-Rowe hypothesis (Scarr-Salapatek, 1971; Turkheimer, Harden, D’onofrio, & Gottesman, 2011).

The Scarr-Rowe hypothesis predicts lower heritabilities for lower performing social classes and racial/ethnic groups (Scarr-Salapatek, 1971, p. 1286). Scarr-Salapatek’s (1971) original hypothesis and related ones – examples include the “Threshold Hypothesis” (Jensen, 1968), the “Bio-ecological Model” (Bronfenbrenner & Ceci, 1994), and the “Gene–Gini Hypothesis” (Selita & Kovas, 2019) – predict that Scarr-Rowe interactions will result when there are environmental differences. Assuming that social class and racial/ethnic differences are largely environmental in origin, Scarr-Salapatek (1971) and others have predicted lower heritabilities for the lower scoring groups.

Does the heritability of human intelligence differ by either social class or race/ethnicity? The answer is complicated because variables like age and the country sampled can moderate the effects. For example, a meta-analysis by Tucker-Drob and Bates (2016) found greater heritability with higher socioeconomic status, but these effects existed only with participants from the United States. Regarding age, recent data from Germany suggest the existence of a Scarr-Rowe interaction, but one which declines with increasing age (Gottschling et al., 2019).

While Scarr-Rowe interactions for social class are relatively well-studied, interactions for race or ethnicity are less so. Hence, whether Scarr-Rowe interactions for race or ethnicity exist is unclear. Some reviews suggest that the heritability of intelligence is similar across cultures (Plomin et al., 2014) and ethnic groups (Jensen, 1998; Rushton & Jensen, 2005). Others suggest differently (Turkheimer, Harden, & Nisbett, 2017).

The issue is relevant for several reasons, including evaluating the trans-ethnic validity of polygenic scores. Recently, Lee et al. (2018) developed polygenic scores for both intelligence and educational levels. These scores were derived from European samples and they showed lower predictive accuracy in non-European groups such as African Americans. The typical explanation offered for attenuated predictive accuracy is decay of linkage disequilibrium (LD) which results in differences in the correlations between SNPs across different ancestry groups (Zanetti & Weale, 2018). Another hypothesis appeals to lower within-group heritability in non-White groups (see, e.g., Rabinowitz et al., 2019). Both explanations are plausible since the predictive accuracy of polygenic scores is a joint function of (1) the validity of the scores as predictors of the traits, and (2) the within-group heritability of the traits in question (i.e., the association between the genotype and the phenotype; Daetwyler, Villanueva, & Woolliams, 2008). While LD decay might be a theoretically adequate explanation for attenuated predictive accuracy of PGS (Zanetti & Weale, 2018), whether it is the actual explanation can only be properly evaluated when the heritabilities of the trait within the different subgroups are known.
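
The "joint function" described above can be sketched under a deliberately simplified additive model, which is an assumption of this example rather than the formulation used by the authors or by Daetwyler et al. (2008). If a polygenic score is related to the phenotype only through the true genetic value, the variance it explains is the squared correlation between score and genetic value multiplied by the within-group heritability. The function name and all input values are hypothetical.

```python
def expected_pgs_r2(r_score_genetic: float, h2: float) -> float:
    """Expected phenotypic variance explained by a polygenic score.

    r_score_genetic: correlation between the score and the true
        additive genetic value (how well the score tags causal variation)
    h2: within-group narrow-sense heritability of the trait
    Simplifying assumption: the score predicts the phenotype only
    through the genetic value, so R^2 = r_score_genetic^2 * h2.
    """
    if not (0.0 <= h2 <= 1.0) or not (-1.0 <= r_score_genetic <= 1.0):
        raise ValueError("inputs must be a valid correlation and heritability")
    return (r_score_genetic ** 2) * h2


# Hypothetical scenarios: the same drop in R^2 can come from either factor.
baseline = expected_pgs_r2(r_score_genetic=0.5, h2=0.6)
weaker_tagging = expected_pgs_r2(r_score_genetic=0.35, h2=0.6)  # e.g., LD decay
lower_h2 = expected_pgs_r2(r_score_genetic=0.5, h2=0.3)         # lower heritability
print(f"baseline={baseline:.3f}, weaker tagging={weaker_tagging:.3f}, lower h2={lower_h2:.3f}")
```

Because both factors enter the product, an attenuated R² alone cannot distinguish between the two explanations, which is why the within-group heritabilities need to be known.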

Our aim is to shed light on these matters by conducting a systematic review and meta-analysis. The goal is to test for the presence of Scarr-Rowe interactions with respect to race/ethnicity. Our specific research question is whether the heritability of intelligence differs across racial/ethnic groups residing in the United States (we searched for studies worldwide but found only samples from this country).

Wednesday, November 27, 2019

From 2017... Opiate of the Masses? Inequality, Religion, and Political Ideology in the U.S.

Schnabel, Landon. 2017. “Opiate of the Masses? Inequality, Religion, and Political Ideology in the United States.” SocArXiv. July 18. doi:10.31235/osf.io/dnz2w

Abstract: This study considers the assertion that religion is the opiate of the masses. Using a special module of the General Social Survey, I first demonstrate that religion functions as a compensatory resource for structurally-disadvantaged groups—women, racial minorities, those with lower incomes, and, to a lesser extent, sexual minorities. I then demonstrate that religion—operating as both palliative resource and values-shaping schema—suppresses what would otherwise be larger group differences in political ideology. This study provides empirical support for the general “opiate” claim that religion is the “sigh of the oppressed creature” and suppressor of emancipatory political values. I expand and refine the theory, however, showing religion provides (1) compensatory resources for lack of social, and not just economic, status, and (2) traditional-values-oriented schemas that impact social attitudes more than economic attitudes.


Religious suffering is, at one and the same time, the expression of real suffering and a protest against real suffering. Religion is the sigh of the oppressed creature, the heart of a heartless world, and the soul of soulless conditions. It is the opium of the people.
                     -Karl Marx (1970 [1843])


Whenever a candidate or policy that advantages the few while disadvantaging the many wins an election, pundits assume people voted against their own self-interests and then wonder why. For example, after the 2016 U.S. presidential election many wondered why women did not vote more consistently for the first woman nominated by a major party. Status and positionality theories of politics excel at predicting why structurally-disadvantaged groups often support and vote for progressive candidates and policies, but these theories break down in the not infrequent cases when disadvantaged groups are not liberal. For example, as I will show, men are more supportive of a woman’s right to choose abortion than are women. Are disadvantaged groups simply irrational, or is there a missing piece or overlapping identity that, when added to positionality theories of politics, explains otherwise unexpected attitudes and voting behavior?

Marx, Du Bois, Weber, and other classical social theorists said religion appeals to the disenfranchised and helps them through suffering. But, according to these theorists, negatives accompany the positives, with religion legitimating subordination and/or distracting people from the root causes of their suffering. Marx’s “opiate of the masses” argument would predict that religion constrains revolution by suppressing political engagement. Yet, in the contemporary United States and many other countries, the most intensely religious people are often the most politically engaged, having an outsized impact on politics (Bolzendahl, Schnabel, and Sagi 2019). Although religion does not seem to make people apolitical, it is still possible that religion legitimates the status quo. Applying and synthesizing several theoretical traditions—including structuration (Giddens 1984; Sewell 1992), system justification (Jost and Hunyady 2002), compensatory control (Kay et al. 2009), and related cultural and social psychological approaches to the study of religion (Edgell 2012; Hoffmann and Bartkowski 2008; Willer 2009)—I explore, expand upon, and refine the classic “opiate” argument.

In the process of exploring the “opiate” argument, this study answers, at least in part, two broader social scientific questions: (1) Why are some groups consistently more religious than others? (2) Why do attitudes toward certain social issues, such as abortion and same-sex relationships, seem to contradict the positionality principle of disadvantage promoting progressive values? I conclude that, as Marx and others have argued, religion can legitimate inequality. But I propose a new mechanism: Rather than suggesting that religions make people less political, less agentic, or more irrational, I argue that religions shape political ideology in accordance with the deeply-held identities, interests, and values of agentic people with multiple overlapping identities seeking meaning and wellbeing in the face of uncertainty and injustice. By acting as a compensatory resource that disproportionately provides comfort and strength to the disadvantaged and a schema that disproportionately shapes their political ideology according to traditional religious values, contemporary American religion—and Christianity in particular—suppresses what would otherwise be larger group differences in political ideology.


Religious affiliation and marital satisfaction: commonalities among Christians, Muslims, and atheists

Religious affiliation and marital satisfaction: commonalities among Christians, Muslims, and atheists. Piotr Sorokowski, Marta Kowal and Agnieszka Sorokowska. Front. Psychol. | doi: 10.3389/fpsyg.2019.02798.

Original research article, provisionally accepted. The full text will be published soon.

Abstract: Scientists have long been interested in the relationship between religion and numerous aspects of people’s lives, such as marriage. This is because religion may differently influence one’s level of happiness. Some studies have suggested that Christians have greater marital satisfaction, while others have found evidence that Muslims are more satisfied. Additionally, less-religious people have shown the least marital satisfaction. In the present study, we examined marital satisfaction among both sexes, and among Muslims, Christians, and atheists, using a large, cross-cultural sample from the dataset in Sorokowski et al. (2017). Our results show that men have higher marital satisfaction ratings than women, and that levels of satisfaction do not differ notably among Muslims, Christians, and atheists. We discuss our findings in the context of previous research on the association between marriage and religion.

Keywords: Religious affiliation, marital satisfaction, Christians, Muslims, Atheists

Discussion

The present study’s primary goal was to examine the association between religious affiliation and marital satisfaction, and the results showed that there was no relationship between the former and level of the latter—Christians and Muslims were found to be similarly satisfied with their marriages, as were atheists. Nevertheless, the present analysis provided support for a link between marital satisfaction and age (younger people showed higher marital happiness), material status (higher material status, higher marital satisfaction), or sex (men were happier in their marriages than women).
Previous findings have indicated Abrahamic religions (e.g., Christianity, Islam) share many similarities (Agius and Chircop, 1998; Zarean and Barzegar, 2016) and promote formation of traditional family ties, such as marriage rather than cohabitation, and marital rather than non-marital births (Dollahite and Lambert, 2007; Zarean and Barzegar, 2016). However, these religions have some substantive differences in beliefs and practices. For example, polygyny is not accepted in Christianity, whereas it is widely accepted in Islam, and such a family model may negatively influence marital life (Al-Krenawi and Graham, 2006). Despite the discrepancies between those two religions, the present study found no differences between them as far as marital satisfaction, and this included people from different parts of the world.
Moreover, since the New York City terrorist attacks on September 11, 2001, Islam has been central in many debates, discussions, and publications (Alghafli et al., 2014). Discussion on Islam frequently concerns familial issues, perceived by the Western media mostly in a negative light. Problematic issues include, for instance, gender roles and the treatment of women (McDonald, 2006; Ridouani, 2011; Ennaji, 2016). Studies, however, do not support this unfavorable view of females’ situations: religious Muslims show increased marital satisfaction (Abdel-Khalek, 2006, 2010; Asamarai et al., 2008; Ahmadi and Hossein-Abadi, 2009; Zaheri et al., 2016, but see also Abu-Rayya, 2007).
The present study’s results provide evidence that Christians and Muslims do not differ in their level of marital satisfaction. People from various countries identifying themselves as belonging to one of these two religions had similar levels of marital happiness, which is consistent with previous findings. For instance, Dabone (2012) compared marital satisfaction among Muslim and Christian spouses and found relative dissatisfaction, while religious affiliation did not affect satisfaction.
As scarce data exist on marital satisfaction among atheists, the present study’s second aim was to investigate whether atheists are as satisfied with their marriages as religious adherents are. Considering positive correlations found between religiosity and marital satisfaction (Marks, 2005), atheists may be expected to have significantly lower levels of both variables. A major drawback of previous related research is its predominant focus on comparisons between more-religious and less-religious people (Fincham et al., 2011), excluding the relatively large group that atheists represent. Additionally, most studies have been conducted in the United States, where atheists are often negatively stereotyped (Zuckerman, 2009). The present study’s results provide evidence that atheists are neither more nor less satisfied with their marriages than religious adherents, which suggests religion may not influence marital satisfaction.
There are a few possible explanations for the similar marital satisfaction ratings observed across people of different religions. Overall, married couples constitute a lower percentage of people in a relationship (Nock, 1995). Those who decide to get married may be particularly committed or well-suited to partnership, regardless of their religious affiliation. Entering a serious relationship, such as marriage, requires strong enthusiasm toward the partner (Wang and Chang, 2002) and, thus, results in higher ratings of subjectively perceived relationship satisfaction. Another possible explanation may be that people generally consider marriage a long-lasting relationship (Silliman and Schumm, 2004; Willoughby and Dworkin, 2009), and when they decide to get married, they rationalize and “cognitively close” their choice (Webster and Kruglanski, 1994). Participants in the study population may have felt they had to be satisfied with their relationship, as they had invested so much energy into its development. Had they reported being unsatisfied, an internal conflict might have surfaced (e.g., “Why am I even with him/her if it makes me unhappy?”). The need to explain the dissonance of staying in an unsuccessful relationship would be negatively perceived, and could yield unpleasant emotions, especially in Western, individualistic cultures, which value the pursuit of personal happiness at all costs (Gilovich et al., 2015). Such emotions could also occur in Eastern, collectivistic cultures, which emphasize the importance of being unselfish, grateful, and appreciative of one’s partner (Kagawa-Fox, 2010).
In general, participants were relatively satisfied with their marriages. Nonetheless, men’s marital satisfaction differed from women’s (independent of religious affiliation). Over 40 years ago, Bernard (1975) presented a provocative and controversial thesis asserting that marriage is better for men than for women, and this statement has prompted heated discussion. Most of the research has provided evidence to support Bernard’s (1975) thesis (Fowers, 1991; Schumm et al., 1998), and this is also true in non-Western cultures (Shek and Tsang, 1993; Asamarai et al., 2008). However, one study yielded unclear findings (McNulty et al., 2008). Results of the present study, which is based on the analysis of a large, cross-cultural sample, confirm the difference between men’s and women’s marital satisfaction: husbands did indeed have higher marital satisfaction than wives. Nevertheless, the effect size of this sex difference was extremely small (Eta < 0.01).
In conclusion, despite a large body of research on marital satisfaction (Bradbury et al., 2000; Twenge et al., 2003; Hilpert et al., 2016), studies have rarely controlled for participants’ religion. Even when they have done so, they have not explored the differences between people of various religious affiliations (Sullivan, 2001; Williams and Lawler, 2003; Olson et al., 2016). Future research should therefore focus on people of different (1) religions (especially less-prevalent ones) and (2) cultures (as most studies to date have been conducted on Western, educated, industrialized, rich, and democratic populations; Henrich et al., 2010). It should also take into consideration other factors that may influence marital satisfaction among people of different religious affiliations (e.g., number of children, education, country’s development), as this would provide further understanding of the interaction between religion and marital happiness, as well as culture.

Perceptions of married life among single never‐married, single ever‐married, & married adults: Conceptualizations of marriage may be changing to be less positive or less discrepant from conceptualizations of single life

Perceptions of married life among single never‐married, single ever‐married, and married adults. Amanda N. Gesselman et al. Personal Relationships, November 26 2019. https://doi.org/10.1111/pere.12295

Abstract: With the increasing prevalence of single adults in the United States, perceptions of marriage as the relationship “gold standard” may be diminishing. In this study (N = 6,576), we explored perceptions of married life in three subgroups of participants: Those who have never married, ever married, and currently married. Across subgroups, most did not perceive married life more positively than single life in external/tangible domains (e.g., more friends), but did in emotional experiences and frame of mind (e.g., contentment). These findings suggest conceptualizations of marriage may be changing to be less positive or less discrepant from conceptualizations of single life. However, these findings also suggest that people continue to view marital relationships as a positive source of emotional experience and support.


4 | DISCUSSION

In the present study, we explored how American adults of varying relationship statuses perceive married life compared to single life across eight social domains. In our analyses, we examined perceptions of single life versus married life in the sample overall, by specific relationship status (i.e., never married, currently married, separated/divorced, widowed), and conducted two targeted comparisons of (a) never married versus ever married participants and (b) currently married versus previously married participants. Across our analyses, single life was perceived to positively exceed married life in terms of friendships and social life, sexual behavior, working hard to stay in shape, and career-mindedness. Conversely, married life was perceived to include more feelings of contentment, confidence, and security, which are factors globally important to happiness and satisfaction with interpersonal relationships (e.g., Feeney & Collins, 2015; Mikulincer & Shaver, 2013; Murray, Holmes, & Griffin, 2000). These findings demonstrate in a large national sample that in some domains single life is perceived more positively, and in other domains married life is perceived more positively, highlighting the reality of multiple determinations in people's romantic and sexual lives. These findings also demonstrate the utility of assessing multiple social life domains when examining differences and similarities across relationship statuses (e.g., Ta et al., 2017), pointing to unique aspects of social and interpersonal lives that people perceive and experience more positively or negatively.

In testing perceptual differences between never married and ever married participant subgroups,
we found a “knowing from experience” effect: Compared to those singles who had never been married,
participants who had ever been married perceived single life (vs. married life) to include more
sex, more interesting social lives, and being in better shape to a greater extent. Similarly, compared
to those who had ever been married, participants who had never been married perceived married life
to include more feelings of contentment than in single life to a greater extent. Last, while previously
married and currently married participants held the same perceptions, those singles who had previously
been married had more emphasized differences in their perceptions of single versus married
life just as in the prior comparison tests. When compared to currently married participants, those who
had previously been married felt to a greater extent that single life included having more interesting
social lives, being in better shape, and being more career-minded, but also felt to a greater extent that
married life includes more feelings of contentment, confidence, and security than single life. These
findings demonstrate an important nuance largely missing in previous research examining married
and single life, that in addition to current relationship status, previous experiences of having ever
been married may uniquely characterize perceptions and attitudes toward married life, potentially
dampening the effects found in prior large-scale studies that have either not examined prior marriages
as a separate subgroup of participants or have incidentally biased studies of married and single life
by removing previously married individuals from their samples.

Although marriage is often seen as the optimal arrangement, our findings did not show more positive
perceptions of married life across all domains. In our overall pattern of results, we found that in
several more concrete or observable aspects, participants across all relationship statuses perceived
single life more favorably than married life. It was only when considering emotional experiences and
frame of mind that married life was consistently rated more positively across groups. This finding of
marital relationships as a positive source of emotional experience and support is consistent with
social perception findings demonstrating that people believe those who are married are generally
happier and more fulfilled than singles (DePaulo & Morris, 2006), although research does not support
such differences when measured directly (e.g., Greitemeyer, 2009). These findings may also be
indicative of a shift in what Americans hope to garner from marriage, as proposed by Finkel et al.
(2014), with particular emphasis now on higher level needs that contribute to one's own psychological
well-being. Finkel et al. (2014) proposed that this more recent emphasis on contemporary marriage
fulfilling both lower- and higher-level needs may also explain why research has demonstrated
that links between marital quality and psychological well-being have become stronger over time
(Proulx, Helms, & Buehler, 2007). These patterns may be further compounded by the more tangible
advantages—including resources and financial benefits with government recognition—that come
with marriage in the United States. Such resources would likely increase one's feelings of contentment
and security, as well as confidence in one's ability to be successful, because of the relative
increases with combined family resources and social capital as well as partner social support.

Many of our findings showing positive perceptions of single life mirror observed differences
found in prior research. For instance, while there were no differences by subgroup, participants perceived
singles to have more friends and a more interesting social life. This is supported by multiple
studies. In a study of 25,000 American adults, researchers found single people to have more and
higher quality friendships than married people, regardless of gender or of parental status (Gillespie,
Lever, Frederick, & Royce, 2015). Similarly, in a large nationally representative study using data
from both the General Social Survey and the National Survey of Families and Households, Sarkisian
and Gerstel (2016) found that single people had more frequent contact with their friends, family, and
neighbors, and were more likely to both provide and receive help and support from these people in
their social networks. Singles have also been found to engage in more long-term caretaking of loved
ones and friends (Henz, 2006), and to be more socially integrated into their communities
(Klinenberg, 2012). Conversely, as proposed by the dyadic withdrawal hypothesis (Johnson &
Leslie, 1982), married and partnered people tend to engage in social withdrawal with those beyond
the partnership, linked with insularity and decay of one's social network, further magnifying these
patterns across relationship statuses.

Our participants perceived singles to work harder to stay in shape than married individuals. This
was indeed documented in a study of exercise frequency. While controlling for the effects of age, single
men and women were found to engage in more physically active hobbies, activities, and sports
than married men and women over a 2-week period—with single women exercising over an hour
more than married women, and single men exercising over 3 hr more than married men in the measurement
period (Nomaguchi & Bianchi, 2004). Considering these differences on a yearly basis, this
means that single men and women may be physically active for approximately 30–80 hr more than
married men and women, potentially leading to greater heart health, lowered anxiety and depression,
and longevity.

Additionally, our participants perceived single life to be characterized by more sexual behavior
than married life. Although recent media reports have purported that Americans are now having
less sex than ever before (e.g., Julian, 2018), some research has shown that this varies by relationship
status. In an examination of nationally representative data from the General Social Survey
gathered between 1989 and 2014, researchers showed that American adults had sex approximately
nine times less per year in the early 2010s when compared to the late 1990s (Twenge, Sherman, &
Wells, 2015). These declines in sexual frequency were consistent across gender, race, geographical
region, education level, and employment, and were also present in married/partnered individuals.
Unpartnered individuals, however, remained steady in their sexual frequency. This is
especially interesting given that unpartnered people have typically been shown to have sex less
often than those in relationships (Laumann, Gagnon, Michael, & Michaels, 1994). Thus, while on
average singles engage in less regular sexual activity than partnered people—likely impacted by
the investments needed to find and court each new sexual partner—the category of singles seems
to not be experiencing the same demographic declines in sexual frequency that have been
observed in samples of partnered people. This further suggests that any larger national patterns of
declining sexual frequency are not being driven by the rising proportion of singles in the adult
population.

Women who are more satisfied with their bodies & appearance are more comfortable undressing in front of a partner, having sex with the lights on, trying new sexual activities; initiate sex more often, report more orgasms

A review of research linking body image and sexual well-being. Meghan M. Gillen, Charlotte H. Markey. Body Image, Volume 31, December 2019, Pages 294-301. https://doi.org/10.1016/j.bodyim.2018.12.004

Highlights
•    We reviewed research on body image and sexual well-being.
•    The review focused on Dr. Thomas Cash’s contributions to this area.
•    Most research suggests a positive link between body image and sexual well-being.
•    We suggest research on new populations using new methods and on positive body image.

Abstract: The link between body image and sexual well-being is intuitive and increasingly supported by psychological research: individuals, particularly women, with greater body satisfaction and body appreciation tend to report more positive sexual experiences. Although both perceptions of one’s body and one’s sexual life are central to most adults’ experiences, this area of research has remained somewhat understudied. In this review, we discuss the findings that are available and suggest directions for future research and applied implications of this work. We highlight Thomas Cash’s contributions to this area of study, given his significant contributions to moving our understanding of body image and sexual well-being forward.



4.1. Body image and sexual experience

Sexual experience has been measured in a number of ways, such as relationship status (i.e., in a romantic relationship or not), ever engaging in sexual intercourse and oral sex, and frequency of sexual activities. Most of this research has been conducted among young adults, given that they are just beginning to navigate sexual experiences and romantic relationships. Studies suggest that individuals who are in romantic relationships have less body image self-consciousness during sexual intimacy (Sanchez & Kiefer, 2007; Steer & Tiggemann, 2008; Wiederman, 2000) and less difficulty achieving orgasm (Sanchez & Kiefer, 2007) as compared to those who are not in romantic relationships. Among college students, ever engaging in sexual intercourse is associated with higher body satisfaction, higher appearance evaluation, lower body dissatisfaction, lower body image self-consciousness, and higher orientation toward appearance (Gillen, Lefkowitz, & Shearer, 2006; Merianos, King, & Vidourek, 2013; Wiederman, 2000). Interestingly, however, in one study (Wiederman & Hurst, 1998), college women who had ever had sexual intercourse reported similar body image as those who had never had sexual intercourse, yet experimenters rated women with no sexual intercourse experience as less attractive. Body image and oral sex experience have been found to be associated with each other. In one study, only receiving (rather than giving) oral sex was associated with higher self-perceptions of bodily attractiveness among college women (Wiederman & Hurst, 1998). Also among college women, ever engaging in oral sex is associated with lower body image self-consciousness (Wiederman, 2000). Body image is also associated with frequency of sexual experiences.
Women who have higher body satisfaction report greater frequency of sex (Ackard, Kearney-Cooke, & Peterson, 2000), and women with higher body image self-consciousness during sexual intimacy have less variable and frequent heterosexual sexual experience (Wiederman, 2000). In sum, individuals who are in a romantic relationship, have ever had sexual intercourse and oral sex, and who have more frequent and variable sexual experiences tend to have more positive body attitudes and less self-consciousness during sexual intimacy.

Because the studies reviewed here are correlational, the directionality of these associations is not clear. For example, being in a romantic relationship with a supportive romantic partner who offers frequent compliments about one’s body can enhance body image (Markey & Markey, 2006). It is also feasible that individuals who have more positive body image have more confidence to seek out more romantic and sexual experiences. These relations may be cyclical; the more confident individuals feel about their bodies, the more likely they are to seek out sexual experiences. Then, the more sexual experiences they have, the better they feel about their bodies. Although it is likely that the direction of effect runs both ways, longitudinal and experimental research is needed to help determine directionality.

4.2. Body image and sexual functioning
Sexual functioning encompasses factors such as desire, arousal, orgasm, satisfaction, and pain (Rosen et al., 2000). Much of this literature has focused on women, perhaps because they are more likely than men to engage in appearance-based spectatoring, or being distracted during sex with thoughts of how one’s body appears to a partner (Wiederman, 2012). Some research shows no significant associations between various measures of body image and sexual functioning among women, perhaps because women’s body image concerns have become so widespread that they do not meaningfully relate to women’s sexual experiences (Davison & McCabe, 2005; Milhausen, Buchholz, Opperman, & Benson, 2015). It may also be that context-specific measures of body image in sexual situations are better predictors of sexual functioning than more general measures of body image (Wiederman, 2012; Yamamiya et al., 2006). Yet, most research shows that body image is related to various domains of sexual functioning (for a review, see Woertman & Brink, 2012). In general, women with higher body and appearance satisfaction also appear to be more comfortable and satisfied in sexual contexts. Specifically, women who are more satisfied with their bodies and appearance are more comfortable undressing in front of a partner, having sex with the lights on, and trying new sexual activities; they also initiate sex more often, report more orgasms during sex, and have higher solitary and partnered sexual desire (Ackard et al., 2000; Dosch, Ghisletta, & Van der Linden, 2015).

Similar associations have been found for other body image constructs. For example, among women, higher body esteem and fewer distracting appearance-based thoughts during sexual activity are associated with higher sexual satisfaction (Pujols et al., 2010), and higher situational body image dysphoria is associated with lower sexual assertiveness, lower sexual esteem, higher sexual anxiety, and more sexual problems (Weaver & Byers, 2006). Consistent with objectification theory, some work in this area has focused on objectification-related constructs and their links with sexual well-being. For instance, body surveillance is significantly associated with lower sexual self-esteem, lower sexual self-competence, and lower sexual satisfaction among college women (Calogero & Thompson, 2009a, 2009b). Similarly, body shame is associated with lower sexual self-esteem, lower sexual satisfaction, and more self-consciousness during partnered sexual activity among college women (Calogero & Thompson, 2009a, 2009b; Steer & Tiggemann, 2008). Among adults, body shame is associated with lower sexual pleasure and more sexual problems (associations were also mediated by self-consciousness during sexual activity with a partner; Sanchez & Kiefer, 2007).
Although less often studied, appearance anxiety is also significantly associated with higher self-consciousness during sexual activity with a partner and decreased sexual functioning among college women (Steer & Tiggemann, 2008; Tiggemann & Williams, 2011). In one study, appearance anxiety in sexual situations also significantly mediated relations between body surveillance and sexual well-being (Vencill et al., 2015). That is, increased body surveillance related to increased appearance anxiety in sexual situations, which in turn related to decreased sexual well-being. Recent research has focused on associations between positive body image and sexual functioning, in line with the call for more work on positive body image (Gillen et al., 2018; Smolak & Cash, 2011). This research has focused on body appreciation, a widely studied facet of positive body image. In samples of women, body appreciation was significantly associated with higher arousal (Brink, Smeets, Hessen, & Woertman, 2016; Satinsky et al., 2012), higher sexual desire (Brink et al., 2016), more frequent orgasms, and higher sexual satisfaction (Satinsky et al., 2012). Body appreciation also appears to be related to attitudes toward sexual practices. Among women and men, body appreciation is associated with higher sexual liberalism and more positive attitudes toward unconventional sexual practices (Swami, Weis, Barron, & Furnham, 2017). In sum, body image tends to be significantly associated with various dimensions of sexual functioning, although there are some exceptions. Most of this research focuses on women, with the limited research on men supporting similar conclusions (e.g., Sanchez & Kiefer, 2007). The reported associations are especially strong for contextual measures of body image (measures that capture body image in certain situations), and have been found for both negative and positive aspects of body image.

4.3. Body image and risky sexual behavior and attitudes
Researchers have also investigated associations between body image and risky sexual behavior and attitudes. Risky sexual behavior includes behaviors such as having casual sex, sex without protection, and having multiple partners. Risky sexual attitudes include low condom use self-efficacy (i.e., low confidence in using condoms), perceiving more barriers to condom use, and endorsing the sexual double standard (i.e., that it is acceptable for men to have more sexual freedom than women). In terms of risky sexual behaviors, one study of college students reported no significant differences between students with low and high body satisfaction on a range of sexual risk behaviors (Merianos et al., 2013). However, several other studies indicate significant associations between body image and risky sexual behavior, particularly among young women. For young women, poor body image appears to be related to increased risk for engaging in risky sexual behavior. For example, among sexually active women, those who have higher body shame, higher body dissatisfaction, and perceive themselves as overweight report more unprotected sex (Hollander, 2010; Littleton, Breitkopf, & Berenson, 2005; Wingood, DiClemente, Harrington, & Davies, 2002). Women with higher body shame also report multiple sex partners in the past year, and women with higher body surveillance report being more likely to mix substance use and sex (Littleton et al., 2005). So, it can be surmised that women who are less satisfied with their physical selves are less confident in approaching sexual encounters and less willing to demand condom use or other contraceptive use. Just as negative body image appears to make women vulnerable to engaging in risky sexual behavior, positive body image seems to be a protective factor.
Among sexually active women, more positive body image (i.e., body appreciation) is associated with greater use of barrier and non-barrier contraceptive methods (Gillen et al., 2006; Ramseyer Winter, Ruhr, Pevehouse, & Pilgrim, 2018) and more engagement in a variety of preventive sexual health behaviors (Ramseyer Winter, 2017). Body satisfaction may even predict protective sex behavior at a later time point. For instance, among adolescent girls, Schooler (2013) found that body satisfaction in 8th grade predicted consistent condom use in 12th grade (with the exception of girls who had sex before 10th grade). For men, there are less data, yet findings point to a pattern of associations among body image and risky sexual behavior. Two studies to our knowledge have been conducted on this topic among college men. Schooler and Ward (2006) found no significant association between body comfort and risky sexual behavior. Yet, in another study (Gillen et al., 2006), men who evaluated their appearance in a more positive way had more lifetime sex partners, more unprotected sex, and believed that condoms were less efficacious than their peers who evaluated their appearance more negatively. Also, men who were more oriented toward their appearance had more lifetime sex partners. This may be indicative of a constellation of personality qualities consistent with superficiality. It could also be that positive body image gives men a boost of confidence in sexual situations where they may already feel power through embodiment of the male sexual role (Gillen et al., 2006). Data on sexual attitudes generally suggest that poorer body image is related to risky sexual attitudes.
Regarding attitudes toward condoms, a meta-analysis of studies on men and women demonstrated that individuals with higher body dissatisfaction have less condom use self-efficacy (Blashill & Safren, 2015); further, male and female college students with less positive views of their appearance perceive more barriers to condom use. It may be that individuals who do not feel particularly positive about their bodies also feel a low sense of efficacy in their intimate lives. Demanding condom use of partners may be inconsistent with what they believe they deserve from a sexual partner. Body image is also related to attitudes toward men’s and women’s roles in sexual situations. College men and women who are more oriented toward their appearance more strongly endorse the sexual double standard, the idea that men should have more sexual freedom than women (Crawford & Popp, 2003; Gillen et al., 2006). Individuals who are more concerned with their appearance may invest more in achieving cultural standards of beauty. Because they endorse these gendered appearance standards, they may also believe more in the sexual double standard, which argues for gender-specific attitudes and behavior with regard to sex (Gillen et al., 2006). In sum, for sexually active women, there is a clear association between poor body image and risky sexual behavior. Recent research also suggests that measures of positive body image are linked to less sexual risk for women. For men, there are less data on these associations and therefore a strong conclusion cannot be drawn. One study suggests that these associations may work in the opposite direction, in that favorable appearance evaluations might actually increase sexual risk for men (Gillen et al., 2006). More studies are needed, however, to support this idea. There is less research on body image and sexual attitudes, but studies generally suggest that poor body image is associated with more risky sexual attitudes.

4.4. Body image and communication about sex
The literature on body image and communication about sex suggests that individuals with a more favorable body image are more comfortable communicating with a partner about sexual issues. These associations have been found for both men and women. For example, among women, those with higher body esteem and body appreciation communicate more easily with a partner about sex (Pujols et al., 2010; Ramseyer Winter, Gillen et al., 2018). Similarly, research examining adolescent girls demonstrates that those with higher body dissatisfaction were more likely to fear their partners leaving them if they brought up condom use and perceive less control in their relationships (Wingood et al., 2002). Research on boys and young men is consistent with this work on girls and women. College men who reported greater comfort with their bodies (e.g., facial hair) were more sexually assertive and had higher safer sex self-efficacy (Schooler & Ward, 2006). Similar associations were found among adolescent boys. Across findings from both qualitative and quantitative work, boys with higher body satisfaction had greater clarity about their personal sexual needs and values, and felt more comfortable communicating these ideas to a partner (Schooler, Impett, Hirschman, & Bonem, 2008). Body image may even be protective for communication about the sensitive topic of HIV. College students with more positive views of their appearance were more likely to have ever asked a partner’s HIV status and to have asked a partner to get tested for HIV (Gillen & Markey, 2014). In sum, both boys/men and girls/women who have more positive and less negative body image tend to be more comfortable discussing sexual topics with a partner. This comfort includes discussing HIV status, a sensitive topic that may be difficult to approach with a partner.

4.5. Perceptions of breasts and genitals and sexual well-being
Given that breasts and genitals are likely to be visible in sexual situations, it is important to consider how perceptions of these parts of the body relate to sexual well-being. Few studies, however, have considered this, especially individuals’ perceptions of their breasts. Increased breast size is associated with increased perceptions of sexual attractiveness, although medium and large breast sizes (versus small and very small breasts) do not differ significantly in perceptions of sexual attractiveness (Dixson, Duncan, & Dixson, 2015). Little is known, however, about how women’s breast satisfaction is related to their sexual well-being. In one study (Didie & Sarwer, 2003), women who were pursuing breast augmentation were compared to similar women who were not candidates for this procedure. Women who were candidates for breast augmentation reported higher dissatisfaction with their breasts, but also higher sexual functioning, including greater sexual drive and arousal, as compared to women who were not candidates. This may indicate that women who are interested in increasing their breast size are more interested in sex and more interested in being sexually appealing to partners or potential partners. There is more research on genital self-image, which suggests that these perceptions are important for sexual well-being (Wiederman, 2012), including feelings of sexual attractiveness (Amos & McCabe, 2016). Women’s genital dissatisfaction (e.g., with the appearance of the vulva) is associated with lower sexual esteem, lower sexual satisfaction, lower sexual functioning, more pain during sexual intercourse, and higher sexual distress (Amos & McCabe, 2016; Pazmany, Bergeron, Van Oudenhove, Verhaeghe, & Enzlin, 2013; Schick, Calabrese, Rima, & Zucker, 2010). Further, women with higher genital self-consciousness, a related construct, have lower sexual esteem and lower sexual satisfaction (Amos & McCabe, 2016; Schick et al., 2010). Among men, results are similar.
Men with higher genital satisfaction (e.g., length of penis) and lower genital self-consciousness have higher sexual esteem (Amos & McCabe, 2016). In another study of young men, higher genital satisfaction was related to less sexual anxiety, which was in turn related to less erectile dysfunction (Wilcox, Redmond, & Davis, 2015). Some research has also focused on men’s attitudes toward their circumcision status. Men who are happier with their circumcision status (i.e., circumcised or not) reported better global body image, better sexual context-specific body image, greater satisfaction with their genitals, and higher sexual functioning (Bossio & Pukall, 2018).

In sum, there is still more work to be done on the associations between breast and genital perceptions and sexual well-being. There is too little research on breast perceptions and sexual well-being to draw conclusions. Research on genital self-image suggests that it is significantly related to sexual well-being for both men and women. Individuals who have more positive perceptions of their genitals tend to have higher sexual well-being.