Title: Individual Variation in Language Attitudes Toward Voice-AI: The Role of Listeners’ Autistic-Like Traits
More and more, humans are engaging with voice-activated artificially intelligent (voice-AI) systems that have names (e.g., Alexa), apparent genders, and even emotional expression; they are in many ways a growing ‘social’ presence. But to what extent do people display sociolinguistic attitudes, developed through human-human interaction, toward these disembodied text-to-speech (TTS) voices? And how might such attitudes vary with the cognitive traits of the individual user? The current study addresses these questions, testing native English speakers’ judgments of six traits (intelligence, likeability, attractiveness, professionalism, human-likeness, and age) for a naturally produced female human voice and the US-English default Amazon Alexa voice. Following exposure to the voices, participants completed these ratings for each speaker, as well as the Autism Quotient (AQ) survey, to assess individual differences in cognitive processing style. Results show differences in individuals’ ratings of the likeability and human-likeness of the human and AI talkers based on AQ score. These findings suggest that humans transfer social assessments of human voices to voice-AI, but that the way they do so is mediated by their own cognitive characteristics.
Award ID(s):
1911855
NSF-PAR ID:
10275342
Journal Name:
Proceedings of Interspeech
Page Range / eLocation ID:
1813 to 1817
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Speech alignment is the phenomenon whereby talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages speak with voice-activated, artificially intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines the effects of participants’ age (older adults, 53–81 years old vs. younger adults, 18–39 years old) and gender (female and male) on degree of speech alignment during shadowing of (female and male) human and voice-AI (Apple’s Siri) productions. Degree of alignment was assessed holistically via a perceptual-ratings AXB task completed by a separate group of listeners. Results reveal that older and younger adults display distinct patterns of alignment based on the humanness and gender of the model talkers: older adults displayed greater alignment toward the female human and device voices, while younger adults aligned to a greater extent toward the male human voice. Additionally, other gender-mediated differences were observed, all of which interacted with model talker category (voice-AI vs. human) or shadower age category (older vs. younger adults). Taken together, these results suggest a complex interplay of social dynamics in alignment, which can inform models of speech production in both human-human and human-device interaction.
  2.
    The present study compares how individuals perceive gradient acoustic realizations of emotion produced by a human voice versus an Amazon Alexa text-to-speech (TTS) voice. We manipulated semantically neutral sentences spoken by both talkers with identical emotional synthesis methods, using three levels of increasing ‘happiness’ (0%, 33%, 66% ‘happier’). On each trial, listeners (native speakers of American English, n=99) rated a given sentence on two scales to assess dimensions of emotion: valence (negative-positive) and arousal (calm-excited). Participants also rated the Alexa voice on several parameters to assess anthropomorphism (e.g., naturalness, human-likeness). Results showed that the emotion manipulations led to increases in perceived positive valence and excitement. Yet the effect differed by interlocutor: increasing ‘happiness’ manipulations led to larger changes for the human voice than for the Alexa voice. Additionally, we observed individual differences in perceived valence/arousal based on participants’ anthropomorphism scores. Overall, this line of research can speak to theories of computer personification and elucidate our changing relationship with voice-AI technology.
  3.
    Increasingly, people are having conversational interactions with voice-AI systems, such as Amazon’s Alexa. Do the same social and functional pressures that mediate alignment toward human interlocutors also predict alignment patterns toward voice-AI? We designed an interactive dialogue task to investigate this question. Each trial consisted of scripted, interactive turns between a participant and a model talker (pre-recorded from either a natural production or voice-AI): first, participants produced target words in a carrier phrase; then, a model talker responded with an utterance containing the target word. The interlocutor responses varied by 1) communicative affect (social) and 2) correctness (functional). Finally, participants repeated the carrier phrase. Degree of phonetic alignment was assessed acoustically between the target word in the model’s response and in the participant’s response. Results indicate that social and functional factors distinctly mediate alignment toward AI and humans. Findings are discussed with reference to theories of alignment and human-computer interaction.
  4. Abstract  
  5. A hallmark of human intelligence is the ability to understand and influence other minds. Humans engage in inferential social learning (ISL) by using commonsense psychology to learn from others and help others learn. Recent advances in artificial intelligence (AI) are raising new questions about the feasibility of human–machine interactions that support such powerful modes of social learning. Here, we envision what it means to develop socially intelligent machines that can learn, teach, and communicate in ways that are characteristic of ISL. Rather than machines that simply predict human behaviours or recapitulate superficial aspects of human sociality (e.g. smiling, imitating), we should aim to build machines that can learn from human inputs and generate outputs for humans by proactively considering human values, intentions and beliefs. While such machines can inspire next-generation AI systems that learn more effectively from humans (as learners) and even help humans acquire new knowledge (as teachers), achieving these goals will also require scientific study of the complementary question: how humans reason about machine minds and behaviours. We close by discussing the need for closer collaborations between the AI/ML and cognitive science communities to advance a science of both natural and artificial intelligence. This article is part of a discussion meeting issue ‘Cognitive artificial intelligence’.