
Search for: All records

Creators/Authors contains: "Zellou, Georgia"

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge until the publisher's embargo period ends.


  1. Free, publicly-accessible full text available January 1, 2023
  2. Free, publicly-accessible full text available December 1, 2022
  3. Two studies investigated the influence of conversational role on phonetic imitation toward human and voice-AI interlocutors. In a Word List Task, the giver instructed the receiver about which of two lists to place a word on; this dialogue task is similar to the simple spoken interactions users have with voice-AI systems. In a Map Task, participants completed a fill-in-the-blank worksheet with the interlocutors, a more complex interactive task. Participants completed each task twice with both interlocutors, once as giver-of-information and once as receiver-of-information. Phonetic alignment was assessed through similarity ratings, analysed using mixed effects logistic regressions. In the Word List Task, participants aligned to a greater extent toward the human interlocutor only. In the Map Task, participants aligned more toward the human interlocutor only in the giver role. Results indicate that phonetic alignment is mediated by the type of interlocutor and that the influence of conversational role varies across tasks and interlocutors.
  4. This paper investigates users’ speech rate adjustments during conversations with an Amazon Alexa socialbot in response to situational (in-lab vs. at-home) and communicative (ASR comprehension errors) factors. We conducted user interaction studies and measured speech rate at each turn in the conversation and in baseline productions (collected prior to the interaction). Overall, we find that users slow their speech rate when talking to the bot, relative to their pre-interaction productions, consistent with hyperarticulation. Speakers use an even slower speech rate in the in-lab setting (relative to at-home). We also see evidence for turn-level entrainment: the user follows the directionality of Alexa’s changes in rate in the immediately preceding turn. Yet, we do not see differences in hyperarticulation or entrainment in response to ASR errors, or on the basis of user ratings of the interaction. Overall, this work has implications for human-computer interaction and theories of linguistic adaptation and entrainment.
  5. Speech alignment is the phenomenon whereby talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages speak with voice-activated, artificially intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines the effects of participants’ age (older adults, 53–81 years old vs. younger adults, 18–39 years old) and gender (female and male) on degree of speech alignment during shadowing of (female and male) human and voice-AI (Apple’s Siri) productions. Degree of alignment was assessed holistically via a perceptual ratings AXB task by a separate group of listeners. Results reveal that older and younger adults display distinct patterns of alignment based on humanness and gender of the model talkers: older adults displayed greater alignment toward the female human and device voices, while younger adults aligned to a greater extent toward the male human voice. Additionally, there were other gender-mediated differences observed, all of which interacted with model talker category (voice-AI vs. human) or shadower age category (OA vs. YA). Taken together, these results suggest a complex interplay of social dynamics in alignment, which can inform models of speech production in both human-human and human-device interaction.
  6. Increasingly, people are having conversational interactions with voice-AI systems, such as Amazon’s Alexa. Do the same social and functional pressures that mediate alignment toward human interlocutors also predict alignment patterns toward voice-AI? We designed an interactive dialogue task to investigate this question. Each trial consisted of scripted, interactive turns between a participant and a model talker (pre-recorded from either a natural production or voice-AI): First, participants produced target words in a carrier phrase. Then, a model talker responded with an utterance containing the target word. The interlocutor responses varied by 1) communicative affect (social) and 2) correctness (functional). Finally, participants repeated the carrier phrase. Degree of phonetic alignment was assessed acoustically between the target word in the model’s response and participants’ response. Results indicate that social and functional factors distinctly mediate alignment toward AI and humans. Findings are discussed with reference to theories of alignment and human-computer interaction.
  7. This study tests speech-in-noise perception and social ratings of speech produced by different text-to-speech (TTS) synthesis methods. We used identical speaker training datasets for a set of 4 voices (using AWS Polly TTS), generated using neural and concatenative TTS. In Experiment 1, listeners identified target words in semantically predictable and unpredictable sentences in concatenative and neural TTS at two noise levels (-3 dB, -6 dB SNR). Correct word identification was lower for neural TTS than for concatenative TTS, at the lower SNR, and for semantically unpredictable sentences. In Experiment 2, listeners rated the voices on 4 social attributes. Neural TTS was rated as more human-like, natural, likeable, and familiar than concatenative TTS. Furthermore, how natural listeners rated the neural TTS voice was positively related to their speech-in-noise accuracy. Together, these findings show that the TTS method influences both intelligibility and social judgments of speech, and that these patterns are linked. Overall, this work contributes to our understanding of the nexus of speech technology and human speech perception.
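Several of the abstracts above (items 3, 5, and 6) quantify the degree of phonetic alignment, either perceptually or acoustically. The papers do not publish their analysis code, but as a minimal illustration, a difference-in-distance (DID) score is one common way such acoustic convergence is operationalized; the function name and all values below are hypothetical:

```python
def did_score(baseline, shadowed, model):
    """Difference-in-distance alignment score for one acoustic feature.

    Compares how far the participant's production is from the model
    talker before vs. after exposure. Positive values indicate the
    participant moved toward the model (convergence); negative values
    indicate divergence.
    """
    return abs(baseline - model) - abs(shadowed - model)


# Illustrative (made-up) vowel duration values, in milliseconds:
baseline = 180.0   # participant's pre-exposure production
model = 220.0      # model talker's production
shadowed = 205.0   # participant's post-exposure production
print(did_score(baseline, shadowed, model))  # 25.0: converged toward model
```

A per-trial score like this can then serve as the dependent variable in a mixed-effects model with interlocutor type (human vs. voice-AI) and conversational role as predictors, in the spirit of the analyses the abstracts describe.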
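Item 7 presents sentences at -3 dB and -6 dB SNR. As a hedged sketch only (not taken from the paper, and the function name is my own), mixing speech with noise at a target SNR typically means rescaling the noise so the speech-to-noise power ratio matches the target:

```python
import numpy as np

def scale_noise_to_snr(speech, noise, snr_db):
    """Rescale `noise` so that mixing it with `speech` yields `snr_db`.

    SNR in dB is 10 * log10(P_speech / P_noise), where P is mean power.
    A negative target SNR (e.g. -3 dB) means the noise is more powerful
    than the speech.
    """
    p_speech = np.mean(speech ** 2)          # mean power of the speech signal
    p_noise = np.mean(noise ** 2)            # current mean power of the noise
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    return noise * np.sqrt(target_p_noise / p_noise)


# Toy example: unit-power "speech" mixed with Gaussian noise at -3 dB SNR.
rng = np.random.default_rng(0)
speech = np.ones(1000)
noise = scale_noise_to_snr(speech, rng.normal(size=1000), -3.0)
mixture = speech + noise
```

Because power scales with the square of amplitude, the square root in the rescaling step is what makes the resulting noise power hit the target exactly.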
  8. More and more, humans are engaging with voice-activated artificially intelligent (voice-AI) systems that have names (e.g., Alexa), apparent genders, and even emotional expression; they are in many ways a growing ‘social’ presence. But to what extent do people display sociolinguistic attitudes, developed from human-human interaction, toward these disembodied text-to-speech (TTS) voices? And how might they vary based on the cognitive traits of the individual user? The current study addresses these questions, testing native English speakers’ judgments for 6 traits (intelligent, likeable, attractive, professional, human-like, and age) for a naturally-produced female human voice and the US-English default Amazon Alexa voice. Following exposure to the voices, participants completed these ratings for each speaker, as well as the Autism Quotient (AQ) survey, to assess individual differences in cognitive processing style. Results show differences in individuals’ ratings of the likeability and human-likeness of the human and AI talkers based on AQ score. Results suggest that humans transfer social assessment of human voices to voice-AI, but that the way they do so is mediated by their own cognitive characteristics.