Title: Age- and Gender-Related Differences in Speech Alignment Toward Humans and Voice-AI
Speech alignment occurs when talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages are speaking with voice-activated, artificially-intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines the effects of participants’ age (older adults, 53–81 years old vs. younger adults, 18–39 years old) and gender (female and male) on degree of speech alignment during shadowing of (female and male) human and voice-AI (Apple’s Siri) productions. Degree of alignment was assessed holistically via a perceptual ratings AXB task by a separate group of listeners. Results reveal that older and younger adults display distinct patterns of alignment based on humanness and gender of the human model talkers: older adults displayed greater alignment toward the female human and device voices, while younger adults aligned to a greater extent toward the male human voice. Additionally, there were other gender-mediated differences observed, all of which interacted with model talker category (voice-AI vs. human) or shadower age category (older vs. younger adults). Taken together, these results suggest a complex interplay of social dynamics in alignment, which can inform models of speech production both in human-human and human-device interaction.
Award ID(s):
1911855
PAR ID:
10275348
Journal Name:
Frontiers in Communication
Volume:
5
ISSN:
2297-900X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. More and more, humans are engaging with voice-activated artificially intelligent (voice-AI) systems that have names (e.g., Alexa), apparent genders, and even emotional expression; they are in many ways a growing ‘social’ presence. But to what extent do people display sociolinguistic attitudes, developed from human-human interaction, toward these disembodied text-to-speech (TTS) voices? And how might they vary based on the cognitive traits of the individual user? The current study addresses these questions, testing native English speakers’ judgments for 6 traits (intelligent, likeable, attractive, professional, human-like, and age) for a naturally-produced female human voice and the US-English default Amazon Alexa voice. Following exposure to the voices, participants completed these ratings for each speaker, as well as the Autism Quotient (AQ) survey, to assess individual differences in cognitive processing style. Results show differences in individuals’ ratings of the likeability and human-likeness of the human and AI talkers based on AQ score. Results suggest that humans transfer social assessment of human voices to voice-AI, but that the way they do so is mediated by their own cognitive characteristics.
  2. This study replicates and extends the recent findings of Lee, Keating, and Kreiman [J. Acoust. Soc. Am. 146(3), 1568–1579 (2019)] on acoustic voice variation in read speech, which showed remarkably similar acoustic voice spaces for groups of female and male talkers and the individual talkers within these groups. Principal component analysis was applied to acoustic indices of voice quality measured from phone conversations for 99/100 of the same talkers studied previously. The acoustic voice spaces derived from spontaneous speech are highly similar to those based on read speech, except that unlike read speech, variability in fundamental frequency accounted for significant acoustic variability. Implications of these findings for prototype models of speaker recognition and discrimination are considered. 
  3. Listening to music is an enjoyable behaviour that engages multiple networks of brain regions. As such, the act of music listening may offer a way to interrogate network activity, and to examine the reconfigurations of brain networks that have been observed in healthy aging. The present study is an exploratory examination of brain network dynamics during music listening in healthy older and younger adults. Network measures were extracted and analyzed together with behavioural data using a combination of hidden Markov modelling and partial least squares. We found age- and preference-related differences in fMRI data collected during music listening in healthy younger and older adults. Both age groups showed higher occupancy (the proportion of time a network was active) in a temporal-mesolimbic network while listening to self-selected music. Activity in this network was strongly positively correlated with liking and familiarity ratings in younger adults, but less so in older adults. Additionally, older adults showed a higher degree of correlation between liking and familiarity ratings, consistent with past behavioural work on age-related dedifferentiation. We conclude that, while older adults do show network and behaviour patterns consistent with dedifferentiation, activity in the temporal-mesolimbic network is relatively robust to dedifferentiation. These findings may help explain how music listening remains meaningful and rewarding in old age.
  4. Increasingly, people are having conversational interactions with voice-AI systems, such as Amazon’s Alexa. Do the same social and functional pressures that mediate alignment toward human interlocutors also predict alignment patterns toward voice-AI? We designed an interactive dialogue task to investigate this question. Each trial consisted of scripted, interactive turns between a participant and a model talker (pre-recorded from either a natural production or voice-AI): First, participants produced target words in a carrier phrase. Then, a model talker responded with an utterance containing the target word. The interlocutor responses varied by 1) communicative affect (social) and 2) correctness (functional). Finally, participants repeated the carrier phrase. Degree of phonetic alignment was assessed acoustically between the target word in the model’s response and participants’ response. Results indicate that social and functional factors distinctly mediate alignment toward AI and humans. Findings are discussed with reference to theories of alignment and human-computer interaction.
  5. The impact of educators in informal science learning sites (ISLS) remains understudied from the perspective of youth visitors. Less is known about whether engagement with educators differs based on the age and gender of both visitor and educator. Here, visitors (5–17 years old) to six ISLS in the United States and United Kingdom (n = 488, female n = 244) were surveyed following an interaction with either a youth (14–18 years old) or adult educator (19+ years old). For participants who reported lower interest in the exhibit, more educator engagement was related to greater self-reported learning. Younger children and adolescents reported more engagement with an adult educator, whereas engagement in middle childhood did not differ based on educator age. Participants in middle childhood showed a trend toward answering more conceptual knowledge questions correctly following an interaction with a youth educator. Together, these findings emphasize the promise of tailoring educator experiences to visitor demographics.