Unmasking the Difficulty of Listening to Talkers With Masks: lessons from the COVID-19 pandemic. Elena Giovanelli et al. i-Perception, March 10, 2021. https://doi.org/10.1177/2041669521998393
Abstract: Interactions with talkers wearing face masks have become part of our daily routine since the beginning of the COVID-19 pandemic. Using an on-line experiment resembling a video conference, we examined the impact of face masks on speech comprehension. Typical-hearing listeners performed a speech-in-noise task while seeing talkers with visible lips, talkers wearing a surgical mask, or just the name of the talker displayed on screen. The target voice was masked by concurrent distracting talkers. We measured performance, confidence and listening effort scores, as well as meta-cognitive monitoring (the ability to adapt self-judgments to actual performance). Hiding the talkers behind a screen or concealing their lips via a face mask led to lower performance, lower confidence scores, and increased perceived effort. Moreover, meta-cognitive monitoring was worse when listening in these conditions compared with listening to an unmasked talker. These findings have implications for everyday communication for typical-hearing individuals and for hearing-impaired populations.
Keywords: speech processing, multisensory, speech in noise, facial masks, COVID-19
In the present work, we mimicked a real multitalker video call to measure the impact of different visual conditions on speech comprehension in typical-hearing participants. Results showed that hiding the talkers behind a black screen or concealing their lips via a face mask led to lower performance and lower listening confidence scores, as well as increased listening effort. These differences between listening conditions suggest that the audio-visual benefit provided by vision relies on lip reading, and they demonstrate the impact of face masks on speech comprehension. In our study, understanding a talker wearing a face mask in noise was comparable to not seeing him or her at all. Importantly, these findings emerged in a context in which we disentangled the impact of the visual information related to wearing a mask from the voice distortions generated by the mask. Our results can therefore be interpreted as the consequences, for speech processing, of altering or removing the visual information carried by lip movements.
Our visual manipulation also affected the ability to successfully judge one’s own cognitive processes while engaged in a task, namely, meta-cognitive monitoring. Face masks reduced meta-cognitive monitoring abilities. In this condition, participants’ listening confidence was less consistent with their objective performance (e.g., they could be confident about their performance when in fact their speech comprehension was poor, or vice versa). This result is in line with previous work on the effect of face masks on confidence in reading emotions (Carbon, 2020), which found lower confidence and accuracy scores in recognizing expressions displayed by faces wearing surgical masks. It supports the idea that hiding the lower part of a face undermines the efficacy of a conversation not only linguistically but also from a nonverbal point of view. While this result merits further investigation, it may suggest that when interacting with people wearing a mask, we not only feel less confident about our listening experience overall, but are also less capable of monitoring whether we understood the message correctly. In addition, the confusion that masks generate when reading emotional facial expressions could further lower the efficacy of our everyday communication, preventing us from reconstructing the emotional tone of a conversation, which could itself partially contribute to better speech comprehension. This novel result is particularly interesting because compensatory strategies (e.g., asking our conversational partner to speak more slowly or in a louder voice) are typically triggered by adequate meta-cognitive monitoring of the success of the communication exchange (Boldt & Gilbert, 2019).
In June 2020, the World Health Organization warned about the potential risks and harms of face masks for daily communication. As evidenced by this study, when a talker wears a face mask, listening effort increases, while performance and confidence in what we hear decrease (see also Coniam, 2005; Llamas et al., 2009; Saunders et al., 2020). This could result in stress and misunderstandings during communication, and even lead to risky behaviors, such as pulling down face masks or reducing social distancing in an attempt to understand each other better. In this study, we intentionally focused on a population of young adults, native speakers of Italian (the language used in the experiment), who reported no hearing difficulties. We reasoned that any effect observed in this sample could only be exacerbated in populations that experience difficulties with language and communication. These populations include hearing children developing their L1, for whom observing adults’ mouths can play a key role in an educational context (Spitzer, 2020); hearing children and adults learning a new language (L2); adults and older people with normal hearing who are sensitive to noisy contexts (Tremblay et al., 2015); and, obviously, all populations with hearing loss or profound deafness. We believe it is a social priority to extend research on the effects of face masks on communication, as well as on other aspects of interpersonal perception (such as emotional processing or person identification: Carbon, 2020), to all these populations.
The question then arises of how we can combine safe behavior and effective communication. One approach is to consider introducing transparent masks on a large scale. At the moment, they are used only in a few medical settings (e.g., in the United Kingdom; Action on Hearing Loss, 2020), but they are gaining increasing attention within the hearing-impaired community (Taylor-Coleman, 2020). Even though this solution may seem the best way to reinstate lip reading in verbal communication, the current generation of transparent masks has several limitations. First, their materials strongly affect the high frequencies of the human voice (Corey et al., 2020), impairing consonant perception (Divenyi et al., 2005; Roth et al., 2011). Second, transparent masks are difficult to find because there are only a limited number of producers (Chodosh et al., 2020). Finally, in many countries, these devices are not approved by health authorities.
To conclude, our findings provide a clear example of the audio-visual nature of speech processing, and they emphasize the perceptual and meta-cognitive limitations that result from occluding the face of our conversational partner. From a methodological point of view, our study represents a successful attempt to investigate audio-visual communication using an on-line task that simulates an ordinary listening context, such as a video call with a limited number of talkers. Clearly, when conducting hearing research online, a number of criteria need to be relaxed. It would be important to replicate and extend these observations by running similar experimental protocols in a more controlled laboratory context in which individual hearing thresholds are also measured (which was not done here). It would also be important to increase the number of trials per participant (that said, our linear mixed-effects modeling approach means that we analyzed a dataset of 1,728 measures overall). Future experiments should also consider using audio tracks recorded both with and without masks, in order to objectively estimate the actual transmission loss produced by the masks and directly compare the effects of those distortions on speech comprehension. Such a comparison would necessarily require professional audio tools and accurate measurements, obtainable only in a laboratory context. Nonetheless, our results agree with a vast literature on the multisensory contributions to speech perception, and they already provide support for recent petitions that pressured the main video conferencing platforms to offer real-time speech-to-text captioning (Chodosh et al., 2020). Most importantly, our findings indicate that audio-visual communication should be pursued even under the health constraints imposed by a global pandemic. This is necessary for everyone, but especially for those individuals for whom face masks could become a severe obstacle to social inclusion.