If your device could smile: People trust happy-sounding artificial agents more. Ilaria Torre, Jeremy Goslin, Laurence White. Computers in Human Behavior, December 9 2019. https://doi.org/10.1016/j.chb.2019.106215
Highlights
• Smiling can be heard in the voice without any visual cue.
• This ‘smiling voice’ elicits higher trusting behaviors than a neutral one.
• The higher trust persists even when the speaker is untrustworthy.
• This has implications for the design of voice-based artificial agents.
Abstract: While it is clear that artificial agents that are able to express emotions increase trust in Human-Machine Interaction, most studies looking at this effect concentrated on the expression of emotions through the visual channel, e.g. facial expressions. However, emotions can be expressed in the vocal channel too, yet the relationship between trust and vocally expressive agents has not yet been investigated. We use a game theory paradigm to examine the influence of smiling in the voice on trusting behavior towards a virtual agent, who responds either trustworthily or untrustworthily in an investment game. We found that a smiling voice increases trust, and that this effect persists over time, despite the accumulation of clear evidence regarding the agent’s level of trustworthiness in a negotiated interaction. Smiling voices maintain this benefit even in the face of behavioral evidence of untrustworthiness.
Keywords: Trust; Smiling voice; Virtual agents
5. Discussion
Using an investment game paradigm, we found that positive vocal emotional expression – smiling voice – increases
participants’ implicit trust attributions to virtual agents, compared with when agents speak with an emotionally neutral
voice. As previously observed, the monetary returns of the
agent also affected implicit trust, so that participants invested
more money in the agent that was behaving generously. Critically, however, there was no interaction between behavior
and vocal emotional expression: smiling voice enhanced trust
regardless of the explicit behavioral cues that the virtual agent
provided to its trustworthiness.
The effect of smiling voice in the game, supported by
our questionnaire findings, adds to previous studies on emotional expression, showing that the display of a positive emotion increases trust and likeability, even in the vocal channel
(Scharlemann et al., 2001; Krumhuber et al., 2007; Penton-Voak et al., 2006).
Smiling was a consistent predictor of investments overall.
That is to say, while participants’ investments were primarily
driven by the virtual player’s generosity or meanness, they
also overall invested more money in the smiling agents. This
contrasts with the predictions of the EASI model (Van Kleef
et al., 2010), according to which the display of a positive
emotion in an incongruent context (such as the mean behavior condition) should elicit uncooperative behaviors. While
Van Kleef et al. (2010) listed social dilemma tasks based
on Prisoner’s Dilemma among possible competitive situations, it is possible that participants in an iterated investment
game view it as an essentially cooperative task. Specifically,
while typical Prisoner’s Dilemma tasks involve a dichotomous choice (cooperate/defect), in our experiment, even in
the mean condition, the agent was still returning a (small)
amount of money, which might have been seen as a partially
cooperative signal by participants.
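To make this structural difference concrete, the sketch below illustrates one round of such an iterated investment game; the multiplier and return fractions are placeholder assumptions in the spirit of the standard trust game, not the exact values used in our experiment.

```python
# Illustrative sketch of one round of an iterated investment game.
# The multiplier and return fractions are placeholder assumptions in the
# spirit of the standard trust game, not the values used in the experiment.

MULTIPLIER = 3  # assumed factor by which the invested amount grows


def agent_return(investment: float, return_fraction: float) -> float:
    """Amount the virtual agent sends back to the participant."""
    return investment * MULTIPLIER * return_fraction


# A 'generous' agent returns a large share of the grown investment,
# a 'mean' agent a small one. Unlike the dichotomous cooperate/defect
# choice of a Prisoner's Dilemma, even the mean agent returns something,
# which participants may read as a partially cooperative signal.
print(agent_return(investment=5.0, return_fraction=0.6))  # generous: 9.0
print(agent_return(investment=5.0, return_fraction=0.1))  # mean: 1.5
```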
If participants are reluctant to give up on cooperation —
as shown by the fact that investments increase in the second
half of the game in the mean condition (Fig. 3) — they might
be even more reluctant to give up on partners who seem to
encourage them to cooperate, with their positive emotional
expression. In Krumhuber et al. (2007), people explicitly and implicitly trusted smiling faces more than neutral faces, regardless of the sincerity of the smile, although genuine smiles were trusted more than fake ones. Similarly, Reed et al. (2012) found that people displaying either Duchenne or non-Duchenne smiles were more likely to cooperate in a one-shot investment game. Thus, displaying an emotion, even a feigned one, might be preferred to not displaying any emotion at all, hence the increased investments in the mean smiling agents.
Additionally, participants might have felt more positive
emotions themselves upon hearing a smiling agent. In fact,
emotional expressions can evoke affective reactions in observers (Geday et al., 2003), which may subsequently influence their behavior (Hatfield et al., 1994), and this ‘emotional
contagion’ might be transmitted through the auditory channel
as well. If this is the case, participants might have trusted the smiling agents more because the positive emotion they themselves felt may have prompted them to behave cooperatively (Schug et al., 2010; Mieth et al., 2016). These results show similarities with Tsankova et al. (2015), who found that people rated trustworthy faces and voices as happier. Although they addressed the
issue from the opposite direction – "Are trustworthy stimuli
perceived as happier?" rather than "Are happy stimuli perceived as trustworthy?" – taken together, the studies suggest
a bidirectionality in the perception of trustworthiness and
cues to positive emotion, congruent with a ’halo effect’ of
positive traits (Lau, 1982).
The smiling-voice effect suggests that, in the absence of
visual information, the audio equivalent of a Duchenne smile
might act as a relatively ‘honest’ signal of cooperation. As mentioned before, Duchenne smiles are smiles expressing genuine happiness or amusement (Ekman & Friesen, 1982).
Traditionally, in the visual domain they can be distinguished
from other types of smiles because they involve the contraction of the ‘Orbicularis Oculi’ muscle, which is a movement
that is notoriously more difficult to fake (Ekman & Friesen,
1982; Schug et al., 2010). Obviously, in the auditory channel
it is not possible to detect a genuine smiling voice from this
muscular movement. However, it is possible that a smiling
voice which sounds happy might be the auditory equivalent of
a Duchenne smile. As participants indicated that the smiling
voices used in this study did sound happy, it is possible that
the expression of happiness and amusement in the speech
signal led listeners to believe that the agent could be trusted.
A limitation of this study is that no video recordings were
taken during the audio recordings of the speakers used in
this experiment. This means that, while every effort was made to ensure consistency in the smile production, it
is possible that our speakers might have produced different
kinds of smiles. As is well known in emotion theory, smiles
can convey many different meanings, and several different
facial expressions of smiles are known (e.g. Rychlowska et
al., 2017; Keltner, 1995). However, much of the research on
the effect of different types of smiles on person perception
and decision making has concentrated on the difference between polite (non-Duchenne) and genuine (Duchenne) smiles
(e.g. Chu et al., 2019; Krumhuber et al., 2007; Reed et al.,
2012). Traditionally, these two are characterised by different
muscle activation, with non-Duchenne smiles only activating the Zygomaticus Major muscle, and Duchenne smiles
also activating the Orbicularis Oculi muscle (Frank et al.,
1993). However, recent studies have suggested that Orbicularis Oculi activation in Duchenne smiles might actually be a
by-product of the Zygomaticus Major activation (Girard et al.,
2019; Krumhuber & Manstead, 2009). Also, the acoustics of
smiling are only affected by activation of the Zygomaticus
Major muscle, which contributes to vocal tract shape, but
not of Orbicularis Oculi. Following past research that Orbicularis Oculi activation is the only thing that distinguishes
Duchenne from non-Duchenne smiles, we would still expect
both smiles to sound the same, as the Zygomaticus Major
activation would be the same.
Still, research on the acoustic characteristics of different
types of smiles is lacking. Drahota et al. (2008) obtained three different smiling expressions – Duchenne smiles, non-Duchenne smiles, and suppressed smiles – as well as a neutral
baseline, from English speakers, and asked participants to correctly identify these four expressions. Participants were only
able to reliably distinguish Duchenne smiles from non-smiles,
but the majority of the other smile types were classified as
non-smiles. Furthermore, they only performed pairwise comparisons between each smile type and a non-smile, and did not compare identification rates between different smile types. Even though the study had only 11 participants, and is therefore in need of replication, this finding suggests that people might only be able to discriminate acoustically between two categories: smile and non-smile.
Similar results were obtained in studies using different
types of visual smiles in decision-making tasks. Previous work using cooperative games with Duchenne and non-Duchenne (facial) smiles has shown that people made the same decisions regardless of the type of smile (Reed et al., 2012; Krumhuber et al., 2007). This suggests that people might react according to a broad, dichotomous smile category (smile vs. non-smile), even though the smiles in the experimental stimuli were of different qualities. This corroborates previous findings on nonconscious mimicry, whereby facial EMG responses differed between viewing a face with a Duchenne smile and viewing a neutral expression, but not between a non-Duchenne smile and a neutral expression (Surakka & Hietanen, 1998). This contrasts with Chu et al. (2019), who found that participants cooperated more with a confederate expressing a non-Duchenne smile than with a confederate expressing a Duchenne smile, following a breach of trust. However, in that study the confederate only showed the smiling expression after the cooperate/defect decision was made, whereas in Reed et al. (2012) and Krumhuber et al. (2007), as well as in the current study, the smiling expression was displayed before the decision was made. As Chu et al. (2019)
point out, this factor might have influenced the decisions and
could explain the different behaviors. For example, participants might interpret an emotional expression – such as a
smile – after a decision as being an appraisal of that decision.
People might put more cognitive effort into understanding
this appraisal, as this is essential for shaping future interactions, hence the more accurate discrimination of different
smile types. As de Melo et al. (2013, 2015) suggest, a happy expression following the decision to cooperate conveys a different meaning than a happy expression following the decision to defect. This is also consistent with the EASI model (Van Kleef et al., 2010). On the other hand, a happy expression shown before the cooperate/defect decision might rather convey information about the emotional state of the person in question, and might be interpreted independently of that person’s actual behavior in the game. Also, counterparts’
smiles may lead people to anticipate positive social outcomes
(Kringelbach & Rolls, 2003). Thus, it seems that the timing
of emotional expression in relation to the behavior of interest drastically changes the interpretation of that, and future,
behaviors.
It would be very interesting to replicate the current experiment with different smiling voices, shown before and after
the action is taken in the game. Also, in any such replication, the speakers’ actual facial expressions could be recorded in order to determine whether different facial expressions correspond to different auditory smiles, both in terms of objective measures (acoustics) and in terms of perceptual and behavioral correlates in the game.
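As a sketch of how such objective acoustic measures might be obtained, the example below uses the parselmouth interface to Praat to track formant frequencies, which tend to rise when the lips are spread in a smile; the file name and analysis parameters are illustrative assumptions, not those used in this study.

```python
# Sketch of how formant measurements could be extracted from the recordings,
# using the parselmouth interface to Praat. The file name and analysis
# parameters are illustrative assumptions, not those used in this study.
import numpy as np
import parselmouth

snd = parselmouth.Sound("speaker_B2_smiling.wav")  # hypothetical recording
formants = snd.to_formant_burg(time_step=0.01, max_number_of_formants=5)

# Sample F1 and F2 every 10 ms; lip spreading during smiling tends to
# shorten the vocal tract and raise formant frequencies.
times = np.arange(0.0, snd.duration, 0.01)
f1 = [formants.get_value_at_time(1, t) for t in times]
f2 = [formants.get_value_at_time(2, t) for t in times]
print(np.nanmean(f1), np.nanmean(f2))  # mean formants over the utterance
```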
So far, we have compared our results with previous studies
that used facial smiles. These comparisons are necessary,
as at the time of writing there are virtually no studies that
have employed trust games with expressive voices. However,
emotional expressions are naturally multimodal, and it is
possible that a certain emotion expressed only in the voice
might elicit different behaviors than if it were expressed only
in the face, or in a voice + face combination. In fact, previous
research suggested that an ’Emotional McGurk Effect’ might
be at play (Fagel, 2006; Mower et al., 2009; Pourtois et al.,
2005). Thus, our current results can only inform the design
of voice-based artificial agents, but should not be extended
to the design of embodied agents.
The results from questionnaires validate the behavioral
measures obtained from the investment game. We found
that people consistently gave higher ratings of trustworthiness and liking to the smiling agents, and to the agents that
behaved generously in the game. Again, the lack of interactions between smiling and behavior suggests that the smiling voice mitigates negative reactions following untrustworthy behavior.
We also found some evidence that individual differences
among participants might play a role in trusting behavior, as
shown by the 3-way interaction between behavior, game turn,
and gender (Section 4.1). The effect of gender on trusting and
trustworthiness has been widely studied using game theoretic
paradigms, but so far there has been no definitive conclusion on whether women trust more, or are more trustworthy, than men, or vice versa (e.g. Chaudhuri et al., 2013; Bonein & Serra, 2009; Slonim & Guillen, 2010). Our results support previous findings showing that we tend to trust people of the opposite gender more (Slonim & Guillen, 2010), as men in our experiment invested more money than women in the virtual agents, which had female voices. They also support findings
that men trust more than women in general (Chaudhuri &
Gangadharan, 2007). However, these conclusions only hold
insofar as the generous behavior condition is concerned, as in
the mean condition men actually trusted the virtual agent less
than women did. A similar behavior was previously observed
in Haselhuhn et al. (2015), who found that men showed less trust following a trust breach on the trustee’s part. Also, Torre et al. (2018) showed that people who formed a first impression of trustworthiness of a virtual agent punished it when the agent behaved in an untrustworthy manner, investing less money in it than in an agent that had made a less trustworthy first impression. Thus, a ‘congruency effect’ might be
at play here: our male participants might have formed a first
impression of trustworthiness of the female agents (Slonim
& Guillen, 2010); when this first impression was congruent
with the observed behavior (in the generous condition), the
agent received more monetary investments from the male
participants. On the other hand, when the first impression
was incongruent with the observed behavior (mean condition), it received less (cf. Torre et al., 2018). Participants’
age did not have an effect on the behavioral results from the
investment game, but it did influence participants’ explicit
ratings of the artificial agents’ trustworthiness, with older
people indicating lower trust. This is consistent with the idea
that younger people trust technology more, perhaps due to
a higher degree of familiarity (e.g. Scopelliti et al., 2005;
Giuliani et al., 2005; Czaja & Sharit, 1998). However, we did
not match participants’ age – or gender – systematically, so
more research is needed on the role of individual differences
on trust towards voice-based artificial agents.
Finally, speaker identity was varied randomly rather than
wholly systematically in our experimental design, and so
we included speaker identity as a random rather than fixed
effect in our analyses. It is possible, indeed likely, that participants’ trust attributions were influenced by the virtual
agents’ unique vocal profiles as well as their behavior and
smiling status. In fact, Fig. 7 shows that people invested
more money with speaker B2, followed by speakers R1, R2,
and B1 (mean overall investments = £5.46, £4.76, £4.11,
£3.56, respectively). This is not unexpected: voices carry a
wide variety of information about the speaker, such as gender, accent, age, emotional state, socioeconomic background,
etc., and all this information is implicitly used by listeners to
form an initial impression of the speaker; a short exposure
to someone’s voice is enough to determine whether that person can be trusted (McAleer et al., 2014). For example, in the
free-text comments explaining the liking rating to each voice,
one participant remarked that smiling speaker B2 “varied in
tone and was much more interesting to listen to” and neutral
speaker B2 was “calm and convincing”; on the other hand,
smiling speaker R2 was “mellow and monotone” and neutral
speaker R2 “sounded bored and insincere”. Smiling speaker
B1 was “quite annoying” and the neutral version “didn’t seem
trustworthy or reassuring”, “sounded too neutral” and even
“too fake”. Thus, when designing a voice for an artificial
agent, it is important to also keep in mind what effect its
specific vocal imprint will have on the user (see also McGinn
& Torre, 2019). Nevertheless, any potential between-speaker
differences in the current experiment were nested within the
effect of smiling voice, as all speakers were recorded in both
smiling and neutral conditions.
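For illustration, a simplified version of such an analysis, with speaker identity as a random grouping factor, might look as follows in Python with statsmodels; the data file, column names, and the single random factor are assumptions for exposition, not our exact model.

```python
# Simplified sketch (not the authors' actual analysis code) of a mixed
# model with speaker identity as a random grouping factor. The data file
# and column names (investment, smiling, behavior, speaker) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("investment_game.csv")  # hypothetical per-turn data

# Fixed effects: smiling voice, agent behavior, and their interaction;
# random intercepts grouped by speaker identity. A fuller analysis would
# also include participant as a (crossed) random effect, e.g. in lme4.
model = smf.mixedlm("investment ~ smiling * behavior", data,
                    groups=data["speaker"])
result = model.fit()
print(result.summary())
```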