
Title: Differences in Gradient Emotion Perception: Human vs. Alexa Voices
The present study compares how individuals perceive gradient acoustic realizations of emotion produced by a human voice versus an Amazon Alexa text-to-speech (TTS) voice. We manipulated semantically neutral sentences spoken by both talkers with identical emotional synthesis methods, using three levels of increasing ‘happiness’ (0%, 33%, 66% ‘happier’). On each trial, listeners (native speakers of American English, n=99) rated a given sentence on two scales to assess dimensions of emotion: valence (negative-positive) and arousal (calm-excited). Participants also rated the Alexa voice on several parameters to assess anthropomorphism (e.g., naturalness, human-likeness). Results showed that the emotion manipulations led to increases in perceived positive valence and excitement. Yet, the effect differed by interlocutor: increasing ‘happiness’ manipulations led to larger changes for the human voice than the Alexa voice. Additionally, we observed individual differences in perceived valence/arousal based on participants’ anthropomorphism scores. Overall, this line of research can speak to theories of computer personification and elucidate our changing relationship with voice-AI technology.
Journal Name: Proceedings of Interspeech
Page Range or eLocation-ID: 1818 to 1822
Sponsoring Org: National Science Foundation
More Like this
  1. The goal of this research is to develop Animated Pedagogical Agents (APA) that can convey clearly perceivable emotions through speech, facial expressions and body gestures. In particular, the two studies reported in the paper investigated the extent to which modifications to the range of movement of 3 beat gestures, e.g., both arms synchronous outward gesture, both arms synchronous forward gesture, and upper body lean, and the agent’s gender have significant effects on viewers’ perception of the agent’s emotion in terms of valence and arousal. For each gesture, the range of movement was varied at 2 discrete levels. The stimuli of the studies were two sets of 12-s animation clips generated using fractional factorial designs; in each clip, an animated agent who speaks and gestures gives a lecture segment on binomial probability. 50% of the clips featured a female agent and 50% of the clips featured a male agent. In the first study, which used a within-subject design and metric conjoint analysis, 120 subjects were asked to watch 8 stimuli clips and rank them according to perceived valence and arousal (from highest to lowest). In the second study, which used a between-subject design, 300 participants were assigned to two groups of 150 subjects each. One group watched 8 clips featuring the male agent and one group watched 8 clips featuring the female agent. Each participant was asked to rate perceived valence and arousal for each clip using a 7-point Likert scale. Results from the two studies suggest that the more open and forward the gestures the agent makes, the higher the perceived valence and arousal. Surprisingly, agents who lean their body forward more are not perceived as having higher arousal and valence. Findings also show that female agents’ emotions are perceived as having higher arousal and more positive valence than male agents’ emotions.
  2. Despite significant vision loss, humans can still recognize various emotional stimuli through hearing and express diverse emotional responses, which can be sorted into two dimensions, arousal and valence. Yet, many research studies have focused on sighted people, leading to a lack of knowledge about the emotion perception mechanisms of people with visual impairment. This study aims at advancing knowledge of the degree to which people with visual impairment perceive various emotions – high/low arousal and positive/negative emotions. A total of 30 individuals with visual impairment participated in interviews where they listened to stories of people who became visually impaired and encountered and overcame various challenges, and they were instructed to share their emotions. Participants perceived different kinds and intensities of emotions, depending on their demographic variables such as living alone, loneliness, onset of visual impairment, visual acuity, race/ethnicity, and employment status. The advanced knowledge of emotion perceptions in people with visual impairment is anticipated to contribute toward better designing social supports that can adequately accommodate those with visual impairment.
  3. Objectives: Traditional police procedural justice theory argues that citizen perceptions of fair treatment by police officers increase police legitimacy, which leads to an increased likelihood of legal compliance. Recently, Nagin and Telep (2017) criticized these causal assumptions, arguing that prior literature has not definitively ruled out reverse causality, that is, legitimacy influences perceptions of fairness and/or compliance influences perceptions of both fairness and legitimacy. The goal of the present paper was to explore this critique using experimental and correlational methodologies within a longitudinal framework. Methods: Adolescents completed a vignette-based experiment that manipulated two aspects of officer behavior linked to perceptions of fairness: voice and impartiality. After reading the vignette, participants rated the fairness and legitimacy of the officer within the situation. At three time points prior to the experiment (1, 17, and 31 months), participants completed surveys measuring their global perceptions of police legitimacy and self-reported delinquency. Data were analyzed to assess the extent to which global legitimacy and delinquency predicted responses to the vignette net of experimental manipulations and controls. Results: Both experimental manipulations led to higher perceptions of situational procedural justice and officer legitimacy. Prior perceptions of police legitimacy did not predict judgments of situational procedural justice; however, in some cases, prior engagement in delinquency was negatively related to situational procedural justice. Prior perceptions of legitimacy were positively associated with situational perceptions of legitimacy regardless of experimental manipulations. Conclusions: This study showed mixed support for the case of reverse causality among police procedural justice, legitimacy, and compliance.
  4. This study examines an aspect of the role of emotion in multimedia learning, i.e., whether participants can recognize the instructor’s positive or negative emotion based on hearing short clips involving only the instructor’s voice just as well as also seeing an embodied onscreen agent. Participants viewed 16 short video clips from a statistics lecture in which an animated instructor, conveying a happy, content, frustrated, or bored emotion, stands next to a slide as she lectures (agent present) or uses only her voice (agent absent). For each clip, participants rated the instructor on five-point scales for how happy, content, frustrated, and bored the instructor seemed. First, for happy, content, and bored instructors, participants were just as accurate in rating emotional tone based on voice only as with voice plus onscreen agent. This supports the voice hypothesis, which posits that voice is a powerful source of social-emotional information. Second, participants rated happy and content instructors higher on happy and content scales and rated frustrated and bored instructors higher on frustrated and bored scales. This supports the positivity hypothesis, which posits that people are particularly sensitive to the positive or negative tone of multimedia instructional messages.
  5. Multivariate pattern analysis (MVPA) of functional magnetic resonance imaging (fMRI) data has critically advanced the neuroanatomical understanding of affect processing in the human brain. Central to these advancements is the brain state, a temporally-succinct fMRI-derived pattern of neural activation, which serves as a processing unit. Establishing the brain state’s central role in affect processing, however, requires that it predicts multiple independent measures of affect. We employed MVPA-based regression to predict the valence and arousal properties of visual stimuli sampled from the International Affective Picture System (IAPS) along with the corollary skin conductance response (SCR) for demographically diverse healthy human participants (n = 19). We found that brain states significantly predicted the normative valence and arousal scores of the stimuli as well as the attendant individual SCRs. In contrast, SCRs significantly predicted arousal only. The prediction effect size of the brain state was more than three times greater than that of SCR. Moreover, neuroanatomical analysis of the regression parameters found remarkable agreement with regions long-established by fMRI univariate analyses in the emotion processing literature. Finally, geometric analysis of these parameters also found that the neuroanatomical encodings of valence and arousal are orthogonal as originally posited by the circumplex model of dimensional emotion.