Bone conduction facilitates self-other voice discrimination. Pavo Orepic, Oliver Alan Kannape, Nathan Faivre and Olaf Blanke. Royal Society Open Science, February 15 2023. https://doi.org/10.1098/rsos.221561
Abstract: One's own voice is one of the most important and most frequently heard voices. Although it is the sound we associate most with ourselves, it is perceived as strange when played back in a recording. One of the main reasons is the lack of bone conduction that is inevitably present when hearing one's own voice while speaking. The resulting discrepancy between experimental and natural self-voice stimuli has significantly impeded self-voice research, rendering it one of the least investigated aspects of self-consciousness. Accordingly, factors that contribute to self-voice perception remain largely unknown. In a series of three studies, we rectified this ecological discrepancy by augmenting experimental self-voice stimuli with bone-conducted vibrotactile stimulation that is present during natural self-voice perception. Combining voice morphing with psychophysics, we demonstrate that specifically self-other but not familiar-other voice discrimination improved for stimuli presented using bone as compared with air conduction. Furthermore, our data outline independent contributions of familiarity and acoustic processing to separating the own from another's voice: although vocal differences increased general voice discrimination, self-voices were more confused with familiar than unfamiliar voices, regardless of their acoustic similarity. Collectively, our findings show that concomitant vibrotactile stimulation improves auditory self-identification, thereby portraying self-voice as a fundamentally multi-modal construct.
1. Introduction
We are all familiar with the strange sensation that occurs when we hear our voice in video or voice recordings [1–5]. Considering the fundamental role our voice plays in our everyday communication, this should be quite surprising. We have a lifelong daily exposure to our voice, higher than exposure even to the most familiar voices. Our own voice is the sound most intimately linked to our self. Although there is ample evidence showing that self-related stimuli are perceived differently and activate distinct cortical regions compared with other, non-self-associated stimuli [6–14], the specific mechanisms of self-voice perception have been surprisingly under-investigated, both in behavioural and neuroimaging studies [15–17]. For instance, the extent to which self-voice perception differs from that of other familiar voices remains poorly understood; as does the extent to which acoustic properties that enable discriminating voices of other people [18] are involved in self-other voice discrimination (VD). A better understanding of self-voice perception is of immediate clinical relevance, as deficits in self-other VD have been related to auditory-verbal hallucinations (AVHs) [19–22] (i.e. ‘hearing voices’), one of the most common [23,24] and most distressing [25,26] hallucinations in a major psychiatric disorder, schizophrenia. Investigating different perceptual factors underlying self-other VD, we here hypothesized that one key contribution would stem from bone conduction and, based on our findings, propose a new experimental paradigm that improves the ecological validity for studying self-voice perception.
A crucial contribution to the perception of our own voice, and our own voice only, comes from bone conduction resulting from speech production and articulation. Under natural conditions, one's spoken voice is transmitted not only through the air but also, unfailingly, through the skull [27,28], which alters self-voice perception in two ways. First, due to the different sound propagation, bone conduction transforms the sound of our voice: specifically, it is assumed to act as a low-pass filter [29,30]. Because of this low-frequency emphasis, we hear our voice as lower [29] than it sounds to others. Second, in addition to transforming the sound of our voice, bone conduction conveys further sensory information: not only auditory but also vibrotactile [31] and somatosensory [32,33] signals are involved, resulting from the vibrations of the skull and skin deformation. Thus, self-voice, when heard under natural conditions, is not only an auditory but a multi-modal percept.
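To make the low-pass assumption concrete, the following minimal sketch applies a first-order (one-pole) low-pass filter to two pure tones and compares how strongly each is attenuated. It is an illustration only: the filter type, the 1000 Hz cutoff, the 16 kHz sample rate and all function names are assumptions chosen for the example, not measured properties of skull transmission and not part of the studies reported here.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    filtered, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        filtered.append(y)
    return filtered

def sine(freq_hz, sample_rate_hz, duration_s=0.5):
    """Unit-amplitude sine tone."""
    n = int(sample_rate_hz * duration_s)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def rms(samples):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Hypothetical values, chosen only to illustrate the principle.
SAMPLE_RATE_HZ = 16000
CUTOFF_HZ = 1000

def gain(freq_hz):
    """Ratio of output to input RMS amplitude for a tone at freq_hz."""
    tone = sine(freq_hz, SAMPLE_RATE_HZ)
    return rms(one_pole_lowpass(tone, CUTOFF_HZ, SAMPLE_RATE_HZ)) / rms(tone)

low_gain = gain(200)    # tone well below the cutoff
high_gain = gain(4000)  # tone well above the cutoff
print(f"gain at 200 Hz: {low_gain:.2f}, gain at 4000 Hz: {high_gain:.2f}")
```

The tone below the cutoff passes almost unchanged, whereas the tone above it is strongly attenuated, which is the sense in which a low-pass filter shifts the spectral balance towards low frequencies.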
One reason for the scarcity of self-voice studies probably lies in methodological obstacles faced when creating appropriate experimental stimuli. Without bone conduction, prior self-voice studies inevitably contain a perceptual mismatch between the experimental self-voice stimuli (e.g. presented through air-conducting loudspeakers) and the actual self-voice. In fact, the majority of studies that compared recognition of self-voice versus other voices reported lower accuracy rates and longer response times for self-voice compared with other voices [16,34–48]. Early self-voice studies suggested that this discrepancy between self- and other voices might result from a lower previous exposure to self-voice in voice recordings [34,35,37]. However, similar behavioural differences persist [16,36–41,45] despite the higher exposure to recorded self-voice afforded by contemporary technology (e.g. voice messages and video recordings). Moreover, more recent self-voice paradigms often demonstrate ceiling effects [37,39–41,46–49], i.e. high accuracy rates in all experimental conditions, reflecting a need for more sensitive experimental paradigms. To account for the aforementioned ecological discrepancy, several studies investigated whether acoustic transformations (e.g. low-pass or other types of filters) of air-conducted self-voice stimuli would render the self-voice more natural to the listeners. These attempts, however, yielded contradictory results [50–54], as they indicated preferences for different acoustic transformations. Crucially, these studies manipulated only one aspect related to bone conduction effects on self-voice (i.e. acoustic transformations) and neglected the additional vibrotactile stimulation. In order to better approximate natural self-voice, experimental self-voice stimuli should be accompanied by the concomitant vibrotactile stimulation resulting from the vibrations of the skull.
Here, we address this by providing vibrational input through a bone conduction headset and investigate whether it improves self-voice perception compared with auditory input alone.
In a series of three behavioural studies in independent cohorts, and using a new self-voice perception paradigm, we investigated the following three main perceptual factors of self-other VD: (i) sound conduction type (air versus bone), (ii) other-voice familiarity (familiar versus unfamiliar), and (iii) acoustic voice parameters. Using voice-morphing technology [55] and bone conduction headphones, we designed a psychophysical self-other VD task to investigate the nature of perceptual differences in self-other VD, while trying to avoid ceiling effects. Participants heard short voice morphs of their own and other people's vocalizations (phoneme /a/) and indicated whether the morphs more closely resembled their own or someone else's voice. In Study 1 (N = 16), we investigated differences in self-other VD as a function of sound conduction (air, bone) and how this is modulated by previous exposure to self-voice [34,35,37]; in Study 2 (N = 16), we extended this to familiar-other VD in order to investigate whether the bone conduction effects are specific to self-voice or generalize to other familiar voices [56,57]. In Study 3, we set out to replicate Studies 1 and 2 within a single, larger cohort (N = 52). Furthermore, we included an additional self-familiar VD task and a control self-voice recognition task (without voice morphing) and investigated the acoustic parameters of all tested voices [18]. We hypothesized that bone conduction would facilitate self-voice perception in the self-other VD task, expressed either as a bias or as increased sensitivity (Study 1), but would not affect the familiar-other VD task (Study 2). We further hypothesized that bone conduction effects would be more prominent without exposure to the self-voice used in our experiment prior to the task, i.e. when the task difficulty is increased (Studies 1 and 2), and that they would occur regardless of other-voice familiarity [56,57] (Study 3).
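The distinction between bias and sensitivity in a morph-based discrimination task can be illustrated with a generic psychometric function. The sketch below is a textbook-style example and not necessarily the analysis used in the studies: it models the probability of a 'self' response as a logistic function of the percentage of self-voice in a morph, where the point of subjective equality (PSE) captures bias and the slope captures sensitivity. The morph levels, trial counts, parameter values and function names are all hypothetical, and the response counts are generated from the model itself rather than taken from any participant.

```python
import math

def p_self(morph_self_pct, pse, slope):
    """Logistic psychometric function: probability of a 'self' response."""
    return 1.0 / (1.0 + math.exp(-slope * (morph_self_pct - pse)))

def neg_log_likelihood(levels, n_self, n_trials, pse, slope):
    """Binomial negative log-likelihood of the observed 'self' counts."""
    nll = 0.0
    for x, k, n in zip(levels, n_self, n_trials):
        p = min(max(p_self(x, pse, slope), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_grid(levels, n_self, n_trials):
    """Coarse maximum-likelihood fit by exhaustive grid search."""
    best = (float("inf"), None, None)
    for pse in range(20, 81):          # candidate PSEs in % self-voice
        for step in range(1, 41):      # candidate slopes 0.01 .. 0.40
            slope = step * 0.01
            nll = neg_log_likelihood(levels, n_self, n_trials, pse, slope)
            if nll < best[0]:
                best = (nll, pse, slope)
    return best[1], best[2]

# Hypothetical design: six morph levels (% self-voice), 100 trials each.
levels = [15, 30, 45, 55, 70, 85]
n_trials = [100] * len(levels)

# Synthetic counts generated from known parameters (not real data).
TRUE_PSE, TRUE_SLOPE = 50, 0.10
n_self = [round(n * p_self(x, TRUE_PSE, TRUE_SLOPE)) for x, n in zip(levels, n_trials)]

fitted_pse, fitted_slope = fit_grid(levels, n_self, n_trials)
print(f"fitted PSE: {fitted_pse}% self-voice, fitted slope: {fitted_slope:.2f}")
```

In this framing, a shift of the PSE between conditions would indicate a change in response bias, whereas a steeper slope would indicate that smaller changes in morph ratio suffice to change the response, i.e. higher sensitivity.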