NSF PAR Search | NSF Public Access Repository

Prosodic alignment toward emotionally expressive speech: Comparing human and Alexa model talkers

https://doi.org/10.1016/j.specom.2021.10.003

Cohn, Michelle; Predeck, Kristin; Sarian, Melina; Zellou, Georgia (December 2021, Speech Communication)
Full Text Available
Individual Variation in Language Attitudes Toward Voice-AI: The Role of Listeners’ Autistic-Like Traits

https://doi.org/10.21437/Interspeech.2020-1339

Cohn, Michelle; Sarian, Melina; Predeck, Kristin; Zellou, Georgia (October 2020, Proceedings of Interspeech)
More and more, humans are engaging with voice-activated artificially intelligent (voice-AI) systems that have names (e.g., Alexa), apparent genders, and even emotional expression; they are in many ways a growing ‘social’ presence. But to what extent do people display sociolinguistic attitudes, developed from human-human interaction, toward these disembodied text-to-speech (TTS) voices? And how might they vary based on the cognitive traits of the individual user? The current study addresses these questions, testing native English speakers’ judgments for 6 traits (intelligent, likeable, attractive, professional, human-like, and age) for a naturally produced female human voice and the US-English default Amazon Alexa voice. Following exposure to the voices, participants completed these ratings for each speaker, as well as the Autism Quotient (AQ) survey, to assess individual differences in cognitive processing style. Results show differences in individuals’ ratings of the likeability and human-likeness of the human and AI talkers based on AQ score. Results suggest that humans transfer social assessment of human voices to voice-AI, but that the way they do so is mediated by their own cognitive characteristics.
Full Text Available
Differences in Gradient Emotion Perception: Human vs. Alexa Voices

https://doi.org/10.21437/Interspeech.2020-1938

Cohn, Michelle; Raveh, Eran; Predeck, Kristin; Gessinger, Iona; Möbius, Bernd; Zellou, Georgia (October 2020, Proceedings of Interspeech)
The present study compares how individuals perceive gradient acoustic realizations of emotion produced by a human voice versus an Amazon Alexa text-to-speech (TTS) voice. We manipulated semantically neutral sentences spoken by both talkers with identical emotional synthesis methods, using three levels of increasing ‘happiness’ (0%, 33%, 66% ‘happier’). On each trial, listeners (native speakers of American English, n=99) rated a given sentence on two scales to assess dimensions of emotion: valence (negative-positive) and arousal (calm-excited). Participants also rated the Alexa voice on several parameters to assess anthropomorphism (e.g., naturalness, human-likeness). Results showed that the emotion manipulations led to increases in perceived positive valence and excitement. Yet, the effect differed by interlocutor: increasing ‘happiness’ manipulations led to larger changes for the human voice than the Alexa voice. Additionally, we observed individual differences in perceived valence/arousal based on participants’ anthropomorphism scores. Overall, this line of research can speak to theories of computer personification and elucidate our changing relationship with voice-AI technology.
Full Text Available
