

Title: The influence of conversational role on phonetic alignment toward voice-AI and human interlocutors
Two studies investigated the influence of conversational role on phonetic imitation toward human and voice-AI interlocutors. In a Word List Task, the giver instructed the receiver on which of two lists to place a word; this dialogue task is similar to the simple spoken interactions users have with voice-AI systems. In a Map Task, participants completed a fill-in-the-blank worksheet with the interlocutors, a more complex interactive task. Participants completed each task twice with both interlocutors, once as giver-of-information and once as receiver-of-information. Phonetic alignment was assessed through similarity ratings, analysed using mixed-effects logistic regressions. In the Word List Task, participants aligned to a greater extent toward the human interlocutor only. In the Map Task, participants aligned more toward the human interlocutor only when acting as the giver. Results indicate that phonetic alignment is mediated by the type of interlocutor and that the influence of conversational role varies across tasks and interlocutors.
Award ID(s): 1911855
NSF-PAR ID: 10275444
Journal Name: Language, Cognition and Neuroscience
ISSN: 2327-3798
Page Range / eLocation ID: 1 to 15
Sponsoring Org: National Science Foundation
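
The abstract reports mixed-effects logistic regressions over binary similarity ratings. As a rough illustration only (the paper's analysis code is not published, so the model structure and every column name below are placeholders), such a model can be fit in Python with statsmodels:

    # Sketch of a mixed-effects logistic regression on binary similarity
    # judgments (1 = rater judged the post-task token closer to the model).
    # All column names are hypothetical, not the study's actual data.
    import pandas as pd
    from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

    ratings = pd.read_csv("similarity_ratings.csv")  # hypothetical file

    # Fixed effects: interlocutor (human vs. voice-AI) and conversational
    # role (giver vs. receiver); random intercepts for participant and word.
    vc = {"participant": "0 + C(participant)", "word": "0 + C(word)"}
    model = BinomialBayesMixedGLM.from_formula(
        "chose_aligned ~ C(interlocutor) * C(role)", vc, ratings
    )
    result = model.fit_vb()  # variational Bayes estimation
    print(result.summary())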
More Like this
  1.
    Increasingly, people are having conversational interactions with voice-AI systems, such as Amazon's Alexa. Do the same social and functional pressures that mediate alignment toward human interlocutors also predict alignment patterns toward voice-AI? We designed an interactive dialogue task to investigate this question. Each trial consisted of scripted, interactive turns between a participant and a model talker (pre-recorded from either a natural human production or a voice-AI system): First, participants produced target words in a carrier phrase. Then, the model talker responded with an utterance containing the target word. The interlocutor responses varied by (1) communicative affect (social) and (2) correctness (functional). Finally, participants repeated the carrier phrase. Degree of phonetic alignment was assessed acoustically between the target word in the model's response and in the participants' response. Results indicate that social and functional factors distinctly mediate alignment toward AI and humans. Findings are discussed with reference to theories of alignment and human-computer interaction.
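
    This abstract does not specify which acoustic features were compared, so the following is only a sketch of one common acoustic alignment metric, difference-in-distance (DID), computed here over hypothetical word durations:

        # DID: how much closer the post-exposure production is to the model
        # than the baseline production was; positive values = convergence.
        import numpy as np

        def did(baseline, post, model):
            return np.abs(baseline - model) - np.abs(post - model)

        # Made-up word durations (seconds) for three trials.
        baseline_dur = np.array([0.42, 0.38, 0.51])  # before hearing the model
        post_dur = np.array([0.45, 0.40, 0.49])      # after hearing the model
        model_dur = np.array([0.48, 0.44, 0.47])
        print(did(baseline_dur, post_dur, model_dur))  # per-trial scores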
  2.
    Speech alignment is the phenomenon whereby talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages speak with voice-activated, artificially intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines the effects of participants' age (older adults, 53–81 years old, vs. younger adults, 18–39 years old) and gender (female and male) on degree of speech alignment during shadowing of (female and male) human and voice-AI (Apple's Siri) productions. Degree of alignment was assessed holistically via a perceptual AXB ratings task completed by a separate group of listeners. Results reveal that older and younger adults display distinct patterns of alignment based on the humanness and gender of the model talkers: older adults displayed greater alignment toward the female human and device voices, while younger adults aligned to a greater extent toward the male human voice. Additionally, other gender-mediated differences were observed, all of which interacted with model talker category (voice-AI vs. human) or shadower age category (OA vs. YA). Taken together, these results suggest a complex interplay of social dynamics in alignment, which can inform models of speech production in both human-human and human-device interaction.
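
    Holistic AXB ratings like these can be summarized as the proportion of trials on which raters picked the shadowed token as more model-like, with values above 0.5 indicating perceptible alignment. A minimal sketch with made-up data and hypothetical column names:

        import pandas as pd

        axb = pd.DataFrame({
            "shadower_age": ["OA", "OA", "YA", "YA"],
            "model_type": ["human", "voice-AI", "human", "voice-AI"],
            "chose_shadowed": [1, 0, 1, 1],  # 1 = shadowed token judged closer
        })
        # Proportion of trials judged as aligned, by age and model type.
        print(axb.groupby(["shadower_age", "model_type"])["chose_shadowed"].mean())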
  3. As the influence of social robots in people's daily lives grows, research on understanding people's perception of robots, including sociability, trust, acceptance, and preference, becomes more pervasive. Research has considered visual, vocal, or tactile cues to express robots' emotions, whereas little research has provided a holistic view of the interactions among the different factors influencing emotion perception. We investigated multiple facets of user perception of robots during a conversational task by varying the robots' voice types, appearances, and emotions. In our experiment, 20 participants interacted with two robots having four different voice types. While participants read fairy tales to the robot, the robot gave vocal feedback with seven emotions, and the participants evaluated the robot's profiles through post-surveys. The results indicate that (1) the accuracy of emotion perception differed depending on the presented emotion, (2) a regular human voice showed higher user preference and naturalness, (3) a characterized voice was nevertheless more appropriate for expressing emotions, with significantly higher accuracy in emotion perception, and (4) participants showed significantly higher emotion recognition accuracy with the animal robot than with the humanoid robot. A follow-up study ([Formula: see text]) with voice-only conditions confirmed the importance of embodiment. The results from this study could provide the guidelines needed to design social robots that consider emotional aspects in conversations between robots and users.
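
    As a rough illustration of the accuracy comparison (not the study's analysis code; all names are placeholders), per-condition emotion-recognition accuracy can be tabulated from trial-level responses:

        import pandas as pd

        trials = pd.read_csv("emotion_trials.csv")  # hypothetical trial log
        trials["correct"] = trials["perceived"] == trials["presented"]
        # Accuracy by robot embodiment and voice type, mirroring the reported
        # comparisons (e.g., animal vs. humanoid robot, voice categories).
        print(trials.groupby(["robot", "voice_type"])["correct"].mean().unstack())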
  4.
    Previous research suggests that individuals with weaker receptive language show increased reliance on lexical information for speech perception relative to individuals with stronger receptive language, which may reflect a difference in how acoustic-phonetic and lexical cues are weighted for speech processing. Here we examined whether this relationship is the consequence of conflict between acoustic-phonetic and lexical cues in speech input, which has been found to mediate lexical reliance in sentential contexts. Two groups of participants completed standardized measures of language ability and a phonetic identification task to assess lexical recruitment (i.e., a Ganong task). In the high conflict group, the stimulus input distribution removed natural correlations between acoustic-phonetic and lexical cues, thus placing the two cues in high competition with each other; in the low conflict group, these correlations were present and thus competition was reduced as in natural speech. The results showed that 1) the Ganong effect was larger in the low compared to the high conflict condition in single-word contexts, suggesting that cue conflict dynamically influences online speech perception, 2) the Ganong effect was larger for those with weaker compared to stronger receptive language, and 3) the relationship between the Ganong effect and receptive language was not mediated by the degree to which acoustic-phonetic and lexical cues conflicted in the input. These results suggest that listeners with weaker language ability down-weight acoustic-phonetic cues and rely more heavily on lexical knowledge, even when stimulus input distributions reflect characteristics of natural speech input. 
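
    The Ganong effect itself is typically quantified as the difference in word-consistent responses between the two lexically biased continua; the following sketch assumes a hypothetical data layout (e.g., gift-kift vs. giss-kiss continua):

        import pandas as pd

        resp = pd.read_csv("ganong_responses.csv")  # one row per trial
        # Proportion of /g/ responses per participant and continuum.
        p_g = (resp.groupby(["participant", "continuum"])["responded_g"]
                   .mean()
                   .unstack())
        # More /g/ responses where /g/ forms a word (gift-kift) than where
        # /k/ does (giss-kiss); larger difference = stronger lexical reliance.
        ganong = p_g["gift_kift"] - p_g["giss_kiss"]
        print(ganong.describe())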
  5. In a social context where two or more interlocutors interact with each other in the same space, one's sense of copresence with the others is an important factor for the quality of communication and engagement in the interaction. Although augmented reality (AR) technology enables the superposition of virtual humans (VHs) as interlocutors in the real world, the resulting sense of copresence is usually far lower than with a real human interlocutor. In this paper, we describe a human-subject study in which we investigated the effects that subtle multi-modal interaction between the virtual environment and the real world, where a VH and human participants were co-located, can have on copresence. We compared two levels of gradually increased multi-modal interaction: (i) virtual objects being affected by real airflow, as commonly experienced with fans in summer, and (ii) a VH showing awareness of this airflow. We chose airflow as one example of an environmental factor that can noticeably affect both the real and virtual worlds and also cause subtle responses in interlocutors. We hypothesized that our two levels of treatment would gradually increase the sense of being together with the VH, i.e., participants would report higher copresence with airflow influence than without it, and copresence would be even higher when the VH showed awareness of the airflow. The statistical analysis of the participant-reported copresence scores showed an improvement in perceived copresence with the VH when both the physical–virtual interactivity via airflow and the VH's awareness behaviors were present together. As the considered environmental factors are directed at the VH, i.e., they are not part of the direct interaction with the real human, they can provide a reasonably generalizable approach to supporting copresence in AR beyond the particular use case in the present experiment.
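
    The abstract does not name the statistical test used, but a repeated-measures comparison of ordinal copresence ratings across the three conditions (no airflow, airflow only, airflow plus VH awareness) could look like the following sketch, with invented data values:

        import numpy as np
        from scipy.stats import friedmanchisquare

        # Hypothetical 7-point copresence ratings for five participants.
        no_airflow = np.array([3, 4, 2, 3, 4])
        airflow = np.array([4, 4, 3, 4, 5])
        airflow_aware = np.array([5, 6, 4, 5, 6])
        stat, p = friedmanchisquare(no_airflow, airflow, airflow_aware)
        print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")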