%A Cohn, Michelle
%A Raveh, Eran
%A Predeck, Kristin
%A Gessinger, Iona
%A Möbius, Bernd
%A Zellou, Georgia
%D 2020
%M OSTI ID: 10275345
%T Differences in Gradient Emotion Perception: Human vs. Alexa Voices
%U https://doi.org/10.21437/Interspeech.2020-1938
%X The present study compares how individuals perceive gradient acoustic realizations of emotion produced by a human voice versus an Amazon Alexa text-to-speech (TTS) voice. We manipulated semantically neutral sentences spoken by both talkers with identical emotional synthesis methods, using three levels of increasing ‘happiness’ (0%, 33%, 66% ‘happier’). On each trial, listeners (native speakers of American English, n = 99) rated a given sentence on two scales to assess dimensions of emotion: valence (negative-positive) and arousal (calm-excited). Participants also rated the Alexa voice on several parameters to assess anthropomorphism (e.g., naturalness, human-likeness). Results showed that the emotion manipulations led to increases in perceived positive valence and excitement. Yet the effect differed by interlocutor: increasing ‘happiness’ manipulations led to larger changes for the human voice than for the Alexa voice. Additionally, we observed individual differences in perceived valence/arousal based on participants’ anthropomorphism scores. Overall, this line of research can speak to theories of computer personification and elucidate our changing relationship with voice-AI technology.