The problem of characterizing voice quality has long caused debate and frustration. The richness of the available descriptive vocabulary is overwhelming, but the density and complexity of the information voices convey lead some to conclude that language can never adequately specify what we hear. Others argue that terminology lacks an empirical basis, so that language-based scales are inadequate a priori. Efforts to provide meaningful instrumental characterizations have also had limited success. Such measures may capture sound patterns but cannot at present explain what characteristics, intentions, or identity listeners attribute to the speaker based on those patterns. However, some terms continually reappear across studies. These terms align with acoustic dimensions accounting for variance across speakers and languages and correlate with size and arousal across species. This suggests that labels for quality rest on a bedrock of biology: We have evolved to perceive voices in terms of size/arousal, and these factors structure both voice acoustics and descriptive language. Such linkages could help integrate studies of signals and their meaning, producing a truly interdisciplinary approach to the study of voice.
More and more, humans are engaging with voice-activated artificially intelligent (voice-AI) systems that have names (e.g., Alexa), apparent genders, and even emotional expression; they are in many ways a growing ‘social’ presence. But to what extent do people display sociolinguistic attitudes, developed from human-human interaction, toward these disembodied text-to-speech (TTS) voices? And how might they vary based on the cognitive traits of the individual user? The current study addresses these questions, testing native English speakers’ judgments for 6 traits (intelligent, likeable, attractive, professional, human-like, and age) for a naturally-produced female human voice and the US-English default Amazon Alexa voice. Following exposure to the voices, participants completed these ratings for each speaker, as well as the Autism Quotient (AQ) survey, to assess individual differences in cognitive processing style. Results show differences in individuals’ ratings of the likeability and human-likeness of the human and AI talkers based on AQ score. Results suggest that humans transfer social assessment of human voices to voice-AI, but that the way they do so is mediated by their own cognitive characteristics.
Chen, Chao; Mahar, Rohit; Merritt, Matthew E.; Denlinger, David L.; Hahn, Daniel A.
(Proceedings of the National Academy of Sciences)
Metabolic suppression is a hallmark of animal dormancy that promotes overall energy savings. Some diapausing insects and some mammalian hibernators have regular cyclic patterns of substantial metabolic depression alternating with periodic arousal where metabolic rates increase dramatically. Previous studies, largely in mammalian hibernators, have shown that periodic arousal is driven by an increase in aerobic mitochondrial metabolism and that many molecules related to energy metabolism fluctuate predictably across periodic arousal cycles. However, it is still not clear how these rapid metabolic shifts are regulated. We first found that diapausing flesh fly pupae primarily use anaerobic glycolysis during metabolic depression but engage in aerobic respiration through the tricarboxylic acid cycle during periodic arousal. Diapausing pupae also clear anaerobic by-products and regenerate many metabolic intermediates depleted in metabolic depression during arousal, consistent with patterns in mammalian hibernators. We found that decreased levels of reactive oxygen species (ROS) induced metabolic arousal and elevated ROS extended the duration of metabolic depression. Our data suggest ROS regulates the timing of metabolic arousal by changing the activity of two critical metabolic enzymes, pyruvate dehydrogenase and carnitine palmitoyltransferase I, by modulating the levels of hypoxia inducible transcription factor (HIF) and phosphorylation of adenosine 5′-monophosphate-activated protein kinase (AMPK). Our study shows that ROS signaling regulates periodic arousal in our insect diapause system, suggesting the possible importance of ROS for regulating other types of metabolic cycles in dormancy as well.
Zellou, Georgia; Cohn, Michelle; Ferenc Segedin, Bruno
(Frontiers in Communication)
Speech alignment occurs when talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages are speaking with voice-activated, artificially-intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines the effects of participants’ age (older adults, 53–81 years old vs. younger adults, 18–39 years old) and gender (female and male) on degree of speech alignment during shadowing of (female and male) human and voice-AI (Apple’s Siri) productions. Degree of alignment was assessed holistically via a perceptual ratings AXB task by a separate group of listeners. Results reveal that older and younger adults display distinct patterns of alignment based on humanness and gender of the human model talkers: older adults displayed greater alignment toward the female human and device voices, while younger adults aligned to a greater extent toward the male human voice. Additionally, there were other gender-mediated differences observed, all of which interacted with model talker category (voice-AI vs. human) or shadower age category (OA vs. YA). Taken together, these results suggest a complex interplay of social dynamics in alignment, which can inform models of speech production both in human-human and human-device interaction.
Over the past decade, the growth of voice assistants has presented new challenges within domestic life. Prior research has shown that such technologies affect users unevenly or fail to account for some relationships and domesticities entirely. Our work investigates the under-examined topic of queerness at home. Drawing on the experience of queer breakup, we describe a design inquiry and a first-person research approach exploring two concurrent relationships in separate households both using Alexa. We explore issues of temporality, glitch, and shared accountability. We also ask critical questions with audio experiments, including: How do voice assistants differentiate between queer voices? How should we converse with voice assistants about queerness? And are voice assistants “queer enough”? We contribute a discussion of difference, inclusion, and queer cultures of adversarial use to highlight the limitations of both everyday and professional language for describing and analyzing the particulars of queerness and interaction design.
Language in autism is heterogeneous, with a significant proportion of individuals having structural language difficulties and inclusion of language impairment as a specifier under Diagnostic and Statistical Manual of Mental Disorders (5th ed.) criteria for autism. This systematic review asked: What are the reporting patterns of variables pertaining to structural language in autism prior to and after publication of the Diagnostic and Statistical Manual of Mental Disorders (5th ed.)? What norm-referenced assessments does research use to characterize the language abilities of autistic individuals with respect to language impairment? This preregistered review (PROSPERO: CRD42021260394) followed Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Searches took place in September 2022 and included Linguistics and Language Behavior Abstracts, PsycINFO, PubMed, and the Directory of Open Access Journals. Search terms included three essential concepts: autism, language, and age. Two coders independently screened and evaluated articles. Searches yielded 57 qualifying studies, with mostly consistent reporting practices prior to and after the Diagnostic and Statistical Manual of Mental Disorders (5th ed.). Studies varied in how they defined language groups and in what norm-referenced measures they used. Interpreting research on structural language in autism requires attention to diagnostic and grouping criteria. Although inconsistency in reporting in original studies limited this review, better understanding the available information on structural language in autistic individuals aged 3–21 years may support identification of language needs. Lay abstract: Under the Diagnostic and Statistical Manual of Mental Disorders (5th ed.), language impairment can co-occur with autism.
It is not yet clear how research defines, reports, and characterizes structural language abilities of autistic individuals eligible for school-based special education services (aged 3–21 years) in the United States. In the United States, students typically must be formally diagnosed to be eligible for services and supports. However, the quality of diagnosis is only as good as the research evidence on which diagnosis depends. To evaluate evidence quality, we examined how studies of school-aged autistic individuals report assessments of language ability. This systematic review included 57 studies using English language age-referenced assessments used to measure structural language. Findings showed many differences across studies in how language abilities were measured and reported. Also, none of the studies fully reported the variables relevant to characterizing language impairment. Outcomes were similar across versions of the Diagnostic and Statistical Manual of Mental Disorders. Findings indicate that researchers and clinicians should pay attention to reporting diagnostic and grouping criteria. Carefully interpreting research evidence is critical for ensuring that diagnostic criteria and supports are representative of and accessible to autistic individuals and relevant parties.
Kreiman, Jody. Information conveyed by voice quality. Retrieved from https://par.nsf.gov/biblio/10491469. The Journal of the Acoustical Society of America 155.2 Web. doi:10.1121/10.0024609.
@article{osti_10491469,
place = {Country unknown/Code not available},
title = {Information conveyed by voice quality},
url = {https://par.nsf.gov/biblio/10491469},
DOI = {10.1121/10.0024609},
abstractNote = {The problem of characterizing voice quality has long caused debate and frustration. The richness of the available descriptive vocabulary is overwhelming, but the density and complexity of the information voices convey lead some to conclude that language can never adequately specify what we hear. Others argue that terminology lacks an empirical basis, so that language-based scales are inadequate a priori. Efforts to provide meaningful instrumental characterizations have also had limited success. Such measures may capture sound patterns but cannot at present explain what characteristics, intentions, or identity listeners attribute to the speaker based on those patterns. However, some terms continually reappear across studies. These terms align with acoustic dimensions accounting for variance across speakers and languages and correlate with size and arousal across species. This suggests that labels for quality rest on a bedrock of biology: We have evolved to perceive voices in terms of size/arousal, and these factors structure both voice acoustics and descriptive language. Such linkages could help integrate studies of signals and their meaning, producing a truly interdisciplinary approach to the study of voice.},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {2},
publisher = {The Acoustical Society of America},
author = {Kreiman, Jody},
}