Wednesday, December 9, 2020

Rolf Degen summarizing... There was no connection between the attractiveness of the face and the voice, again casting doubt on the "good genes" theory of physical attractiveness

Attractiveness and distinctiveness between speakers' voices in naturalistic speech and their faces are uncorrelated. Romi Zäske, Verena Gabriele Skuk and Stefan R. Schweinberger. Royal Society Open Science, December 9 2020. https://doi.org/10.1098/rsos.201244

Rolf Degen's take: https://twitter.com/DegenRolf/status/1336606172384735233

Abstract: Facial attractiveness has been linked to the averageness (or typicality) of a face and, more tentatively, to a speaker's vocal attractiveness, via the ‘honest signal’ hypothesis, holding that attractiveness signals good genes. In four experiments, we assessed ratings for attractiveness and two common measures of distinctiveness (‘distinctiveness-in-the-crowd’, DITC, and ‘deviation-based distinctiveness’, DEV) for faces and voices (simple vowels, or more naturalistic sentences) from 64 young adult speakers (32 female). Consistent and substantial negative correlations between attractiveness and DEV generally supported the averageness account of attractiveness, for both voices and faces. By contrast, and indicating that both measures of distinctiveness reflect different constructs, correlations between attractiveness and DITC were numerically positive for faces (though small and non-significant), and significant for voices in sentence stimuli. Between faces and voices, distinctiveness ratings were uncorrelated. Remarkably, and at variance with the honest signal hypothesis, vocal and facial attractiveness were also uncorrelated in all analyses involving naturalistic, i.e. sentence-based, speech. This result pattern was confirmed using a new set of stimuli and raters (experiment 5). Overall, while our findings strongly support an averageness account of attractiveness for both domains, they provide no evidence for an honest signal account of facial and vocal attractiveness in complex naturalistic speech.

4. General discussion

4.1. Relationships between attractiveness and distinctiveness

The present study is, to our knowledge, the first to demonstrate a systematic relationship between perceived attractiveness and distinctiveness in human voices. Here, we found strong negative correlations between attractiveness and deviation-based distinctiveness (DEV) for voices when based both on vowels (ρ = −0.85) and on sentences (ρ = −0.87). This pattern was analogous to and, if anything, even stronger than the previously described negative correlation between attractiveness and DEV for faces (experiment 2: ρ = −0.64; experiment 4: ρ = −0.74). Overall, this pattern of findings provides strong support for an averageness account of attractiveness for both faces and voices [3,7] when distinctiveness is assessed in a deviation-based manner. Note that the negative relationship between attractiveness and DEV was not merely an artefact of imitating a model speaker during voice recordings: in experiment 5 using new speakers, we replicated our results for voices recorded with the model, but also found significant, though smaller, negative correlations for voices recorded without a model speaker (vowels: ρ = −0.46, and sentences: ρ = −0.59). While this indicates that the presence of a model partially preserves idiosyncratic variability in voices which drives the relationship between attractiveness and DEV in terms of stable ‘voice traits’, speaking after a model enhances the strength of this relationship, perhaps owing to a change of natural voice variation affecting either attractiveness, DEV or both. Based on correlations across recording modes, which were substantial and positive for attractiveness, but tended to be relatively smaller for DEV ratings (at least for vowels), we tentatively suggest that the presence of a model speaker may change natural variation of DEV more than variation of attractiveness. Note, however, that mean F0 was remarkably similar across the recording modes.
The notion that the relationship between attractiveness and DEV is systematic and substantial, independent of recording mode, is further supported by strong and positive cross-sentence correlations throughout for attractiveness ratings (0.64 ≤ ρ ≤ 0.77) as well as for DEV ratings (0.63 ≤ ρ ≤ 0.79), with no significant difference between recording modes.
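As a rough illustration of how a rank correlation such as the ρ values quoted above can be computed, here is a minimal sketch. It assumes that ρ denotes Spearman's rank correlation, which this excerpt does not state explicitly. The function names and the rating values are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical mean ratings per speaker (NOT data from the paper).
    attractiveness = [4.1, 3.2, 5.0, 2.7, 3.9, 4.4]
    dev_distinctiveness = [2.0, 3.5, 1.8, 4.2, 2.9, 2.4]
    print(spearman_rho(attractiveness, dev_distinctiveness))
```

A negative value from such a computation would correspond to the pattern described here, where speakers rated as more deviant from the typical voice are rated as less attractive.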

The present findings are important because they were obtained in the context of ratings for real stimuli, rather than for averaged ‘composite’ stimuli. Note that averaging towards composites causes artefacts per se, such as smooth and symmetric visual patterns for faces, or increased harmonics-to-noise ratios for voices. (It should also be noted, however, that voice morphing is a comparatively new and elaborate technique which only a few laboratories master, and this may also explain why there are relatively few studies investigating the averageness account of vocal attractiveness.) We propose that, because such ‘non-average’ features of digitally created composites have been shown to consistently contribute to perceived attractiveness [7,14], studies with natural individual stimuli that vary in perceived prototypicality or averageness are important to cross-validate findings obtained with composite stimuli.

Compared to these consistent findings of negative correlations between attractiveness and deviation-based distinctiveness, the relationship with rated attractiveness was much less consistent for ‘in-the-crowd’-based distinctiveness ratings. Specifically, while there was also a moderate negative correlation between attractiveness and ‘in-the-crowd’-based distinctiveness (VITC) for voices when based on vowels (experiment 1), this correlation was positive when based on sentences (experiment 3). For faces, a marginally non-significant positive correlation between attractiveness and FITC was found in experiment 1 (numerically similar to the significant positive correlation with more stimuli as reported in [34]), and while this pattern was not seen in experiment 3 using the same stimuli and task, the relationships between rated attractiveness and DITC ratings were inconsistent when compared to DEV ratings. Taken together, our findings confirm that common deviation-based and ‘in-the-crowd’-based measures of distinctiveness (VITC and FITC) measure at least partially different constructs [34], and extend those findings by showing that this is the case both for faces and for voices. For voices, the specific relationship between attractiveness and distinctiveness appears to depend on the type of utterance. Specifically, while simple vowel stimuli were rated as less attractive with increasing VITC distinctiveness (experiment 1), in line with an averageness account, sentence stimuli (experiment 3) were rated as more attractive with increasing VITC distinctiveness.

These differences between different types of utterances may generally be related to differences in duration and/or number of different phonemes [6,59]. Whereas sentences carry much richer cues to attractiveness and distinctiveness, vowels are simple periodic utterances which are mainly influenced by ‘static’ biophysical characteristics of individual speakers. VITC ratings for vowels could also differ from those for sentences owing to a certain oddity of imagining someone saying a prolonged vowel sound in a crowd. Possibly related to this notion, an earlier study reported that perceived voice attractiveness and acoustic distance to mean (in terms of F0, F1) were correlated for a vowel (/a/), but not for a word or sentence [6]. Overall, ratings of attractiveness and VITC distinctiveness are probably based on partially different sets of acoustic cues, depending on their salience in a given utterance. The positive correlation between voice attractiveness and VITC distinctiveness is reminiscent of analogous findings for faces in previous research [34] where it has been argued that DITC measures of distinctiveness may be distorted by cognitive heuristics. Accordingly, raters might be biased to think that they would surely spot a highly attractive person in the crowd, even when this might not be the case. Such an effect seems to generalize to voices in the present study, at least when ratings are based on sentence stimuli. Note that such a putative heuristic, as suggested here, does not imply that attractive voices would, in fact, stand out of a crowd if put to the test. (In fact, other more salient bottom-up acoustic characteristics such as intensity [60] probably play a more prominent role here, which we had controlled in our stimuli by RMS normalization.)
While we are at present unaware of studies addressing the specific issue of whether attractive voices stick out from a noisy environment, a recent study on the ‘cocktail-party effect’ [61] could provide tentative and indirect evidence in favour of this assumption. Specifically, interference from a non-target speaker can be reduced both when the target is familiar and the interfering voice is unfamiliar and, critically, also when the target is unfamiliar and the interfering voice is familiar [62]. Although the link to attractiveness is indirect, voice familiarity, just like voice averaging (and, by implication, attractiveness), could promote positive evaluation via a fluency mechanism as seen in the mere exposure effect.
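For readers unfamiliar with the RMS normalization mentioned above as a control for stimulus intensity, the following is a minimal sketch of the general idea. It is not the authors' processing pipeline; the function names, the target level and the sample values are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_rms=0.1):
    """Scale samples by a single gain so their RMS equals target_rms.

    The target of 0.1 is an arbitrary illustrative value.
    """
    current = rms(samples)
    if current == 0:
        # Silent input: no gain can reach the target, so return it unchanged.
        return list(samples)
    gain = target_rms / current
    return [s * gain for s in samples]


if __name__ == "__main__":
    # Two hypothetical recordings with different loudness.
    quiet = [0.01, -0.02, 0.015, -0.01]
    loud = [0.5, -0.6, 0.55, -0.4]
    for clip in (quiet, loud):
        print(rms(normalize_rms(clip)))
```

After such scaling, all stimuli share the same RMS level, so differences in overall intensity cannot drive differences in ratings.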

In contrast to ‘in-the-crowd-based’ measures of distinctiveness, correlations for deviation-based distinctiveness and attractiveness were highly consistent, and consistently negative across modalities and utterance types in the present study. This supports the notion that both faces and voices become increasingly attractive the more typical, i.e. the more average, they are perceived relative to prior personal experience [3,7]. At variance with this experience-based account of typicality, it has been argued recently that typicality ratings rather reflect stereotypes of what constitutes attractive and typical voices [9]. In our view, this may be the case for tasks that do not further specify what typicality/distinctiveness is. However, it should be noted that our task explicitly invoked a memory component by asking participants to judge distinctiveness relative to the faces and voices they know. Given the different patterns of results for two types of distinctiveness measures, we believe that it is extremely important for future studies to specify exactly how typicality/distinctiveness was assessed.

Overall, while deviation-based measures gave rise to a highly consistent pattern of negative correlations with attractiveness across stimulus modalities and domains, inconsistent correlations were seen for attractiveness and DITC measures which may be distorted by subjective heuristics. This may indicate that DEV ratings are preferable over ‘in-the-crowd’-based measures to assess distinctiveness in an unbiased manner.

4.2. Relationships between ratings for faces and voices of the same speakers

The second aim of the present study was to provide a systematic assessment of relationships between ratings of attractiveness and distinctiveness for faces and voices from the same speakers. Positive correlations between independent ratings of faces and voices might be expected to the extent that (i) facial and vocal features are determined by the same underlying basis (e.g. genetic or hormonal), and (ii) those features systematically influence perceptions under investigation (e.g. of attractiveness or distinctiveness). A common basis of vocal and facial attractiveness has been postulated by several studies (e.g. [35,41,43]). While we are unaware of research directly linking attractiveness and hormonal status via distinctiveness, there is evidence that certain vocal parameters (e.g. F0, vocal tract length estimates, shimmer, jitter, harmonics-to-noise ratio, as determined from sustained vowel recordings only) are linked to speakers' body size measurements (e.g. height, weight and waist-to-hip ratios), as probably mediated by hormonal mechanisms [63]. The present findings, however, consistently indicate that correlations between vocal and facial attractiveness are remarkably absent in the majority of the studied conditions, and small at best in one exception which we discuss below (figure 2). It could be argued that the standardization of the present stimuli in terms of neutral emotional expression and speaking style according to a model speaker may have compromised to some degree the natural variation between voices relevant for perceived attractiveness, such as vocal pitch.

However, experiment 5 addresses this concern, and its results are clear in showing that correlations between facial and vocal attractiveness in sentences were also absent for voices recorded naturally and without a model speaker, as predicted [55] based on our findings from experiment 4. Although we had no predictions regarding the small positive correlation (ρ = 0.30) between facial and vocal attractiveness for simple vowels we had observed in experiment 1, it may be noted that this was not replicated in experiment 5. Rather, we found a numerically negative, though non-significant correlation (ρ = −0.30) in the condition with model speaker, and a numerically negative non-significant correlation (ρ = −0.18) in the new condition without a model speaker. Overall, the pattern of results across five experiments would seem to indicate that, for a range of conditions tested in this series of experiments, any correlation between facial and vocal attractiveness is small at best, and is potentially non-existent.

Together, the present findings challenge the ‘honest signal account’ of facial and vocal attractiveness [14]. On one hand, we appreciate that the only exception to this pattern, a small but significant positive correlation between facial and vocal attractiveness in experiment 1, when simple vowels were used as voice samples, could potentially resolve discrepancies between our data and previous findings in which evidence for a correlation between facial and vocal attractiveness was reported using similarly simple vocalizations [41,43]. On the other hand, our failure to replicate this finding with a new set of speakers emphasizes the importance for researchers both to critically consider the nature of the stimuli used to assess these relationships, and to assess the replicability of critical findings across a range of conditions and situations.

In that respect, prerequisites to find evidence for or against the honest signal hypothesis include that face and voice stimuli should be honest and undistorted representations of their owners' genetic quality. We selected our face stimuli to be devoid of attractiveness-enhancing features such as make-up or jewellery. However, it may be more difficult to remove or standardize socio-cultural norms of attractiveness that are reflected in acquired speech patterns in the voice [9]. In that sense, the present voice ratings to more naturalistic sentence stimuli may in part reflect cultural norms, rather than purely biophysically determined voice qualities. Accordingly, one explanation for the results found with simple vowels could be that these are relatively devoid of such socio-cultural cues, and thus may reflect genetic factors more ‘honestly’ compared to more naturalistic and complex vocalizations. While this interesting possibility should be addressed in more detail in future research, we can conclude that correlations between vocal and facial attractiveness appear to be remarkably absent, at least when voices are presented in the more naturalistic context of sentences (as opposed to vowels) akin to everyday communication.

Although acoustic analyses of vowel pitch and sentence duration did not indicate different degrees of acoustic variability for samples recorded with, versus without, the model speaker, electronic supplementary material, table S19 suggests approximately 10% longer average durations of the same sentences when produced with compared to without a model speaker. We tentatively attribute this difference to the larger effort to imitate neutral emotion and speaking style of a model.

With respect to distinctiveness, facial and vocal ratings were uncorrelated for both types of distinctiveness ratings, suggesting no common basis for perceived distinctiveness. An interesting question for future research is how various measures of vocal distinctiveness could be related to one another. For instance, it would be instructive to assess in more detail how DITC and DEV are related with the actual recognizability of voices (for relevant research on faces, see [34,64,65]), and to determine the acoustic stimulus parameters which underlie different aspects of perceived vocal distinctiveness (for relevant methods, see [66]).

4.3. Limitations

As a possible limitation for both the present study and earlier research in this field [35,40,41,43], we assessed attractiveness and distinctiveness for static faces, and thus did not consider a possible role of dynamic facial information. To the extent that static and dynamic faces may be judged by different standards [67], it remains possible that cross-domain correlations between facial and vocal attractiveness could be found for dynamic facial stimuli. In fact, one previous study emphasized the role of dynamic information for correlations between vocal and visual attractiveness, although this was not found consistently across different conditions [68]. Recent theoretical accounts of person perception increasingly address the role of dynamic information [69], and this issue warrants further investigation.

As a second step towards understanding impression formation in everyday social interaction, it may be of interest how faces and voices combine to shape our evaluation of a person's attractiveness. Clearly, simultaneous presentation of face-voice stimuli would be unsuited to study the honest-signal account of attractiveness, which requires independent ratings of (unimodal) voices and faces, owing to possible multimodal interactions. Interestingly, such interactions present a promising research field in their own right, as they can reveal important insights into the relative contribution of facial and vocal information to social evaluations beyond attractiveness (see [70,71]).
