
Search for: All records

Creators/Authors contains: "Zellou, Georgia"

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge until the publisher's embargo period ends.


  1. Free, publicly-accessible full text available January 1, 2023
  2. Free, publicly-accessible full text available December 1, 2022
  3. Two studies investigated the influence of conversational role on phonetic imitation toward human and voice-AI interlocutors. In a Word List Task, the giver instructed the receiver about which of two lists to place a word on; this dialogue task is similar to the simple spoken interactions users have with voice-AI systems. In a Map Task, participants completed a fill-in-the-blank worksheet with the interlocutors, a more complex interactive task. Participants completed each task twice with both interlocutors, once as giver-of-information and once as receiver-of-information. Phonetic alignment was assessed through similarity ratings, analysed using mixed effects logistic regressions. In the Word List Task, participants aligned to a greater extent toward the human interlocutor only. In the Map Task, participants aligned more toward the human interlocutor only in the giver role. Results indicate that phonetic alignment is mediated by the type of interlocutor and that the influence of conversational role varies across tasks and interlocutors.
  4. This paper investigates users’ speech rate adjustments during conversations with an Amazon Alexa socialbot in response to situational (in-lab vs. at-home) and communicative (ASR comprehension errors) factors. We conducted user interaction studies and measured speech rate at each turn in the conversation and in baseline productions (collected prior to the interaction). Overall, we find that users slow their speech rate when talking to the bot, relative to their pre-interaction productions, consistent with hyperarticulation. Speakers use an even slower speech rate in the in-lab setting (relative to at-home). We also see evidence for turn-level entrainment: the user follows the directionality of Alexa’s changes in rate in the immediately preceding turn. Yet, we do not see differences in hyperarticulation or entrainment in response to ASR errors, or on the basis of user ratings of the interaction. Overall, this work has implications for human-computer interaction and theories of linguistic adaptation and entrainment.
  5. Speech alignment is the phenomenon whereby talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages speak with voice-activated, artificially intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines the effects of participants’ age (older adults, 53–81 years old vs. younger adults, 18–39 years old) and gender (female and male) on degree of speech alignment during shadowing of (female and male) human and voice-AI (Apple’s Siri) productions. Degree of alignment was assessed holistically via a perceptual ratings AXB task by a separate group of listeners. Results reveal that older and younger adults display distinct patterns of alignment based on humanness and gender of the model talkers: older adults displayed greater alignment toward the female human and device voices, while younger adults aligned to a greater extent toward the male human voice. Additionally, there were other gender-mediated differences observed, all of which interacted with model talker category (voice-AI vs. human) or shadower age category (OA vs. YA). Taken together, these results suggest a complex interplay of social dynamics in alignment, which can inform models of speech production in both human-human and human-device interaction.
  6. Increasingly, people are having conversational interactions with voice-AI systems, such as Amazon’s Alexa. Do the same social and functional pressures that mediate alignment toward human interlocutors also predict alignment patterns toward voice-AI? We designed an interactive dialogue task to investigate this question. Each trial consisted of scripted, interactive turns between a participant and a model talker (pre-recorded from either a natural production or voice-AI): First, participants produced target words in a carrier phrase. Then, a model talker responded with an utterance containing the target word. The interlocutor responses varied by 1) communicative affect (social) and 2) correctness (functional). Finally, participants repeated the carrier phrase. Degree of phonetic alignment was assessed acoustically between the target word in the model’s response and participants’ response. Results indicate that social and functional factors distinctly mediate alignment toward AI and humans. Findings are discussed with reference to theories of alignment and human-computer interaction.
  7. This study tests speech-in-noise perception and social ratings of speech produced by different text-to-speech (TTS) synthesis methods. We used identical speaker training datasets for a set of 4 voices (using AWS Polly TTS), generated using neural and concatenative TTS. In Experiment 1, listeners identified target words in semantically predictable and unpredictable sentences in concatenative and neural TTS at two noise levels (-3 dB, -6 dB SNR). Correct word identification was lower for neural TTS than for concatenative TTS, at the lower SNR, and for semantically unpredictable sentences. In Experiment 2, listeners rated the voices on 4 social attributes. Neural TTS was rated as more human-like, natural, likeable, and familiar than concatenative TTS. Furthermore, how natural listeners rated the neural TTS voice was positively related to their speech-in-noise accuracy. Together, these findings show that the TTS method influences both intelligibility and social judgments of speech, and that these patterns are linked. Overall, this work contributes to our understanding of the nexus of speech technology and human speech perception.
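Several of the abstracts above (items 3, 5, and 6) quantify the degree of phonetic alignment, either perceptually or acoustically. The papers do not publish their analysis code, but as a minimal illustration, a difference-in-distance (DID) score is one common way such acoustic convergence is operationalized; the function name and all values below are hypothetical:

```python
def did_score(baseline, shadowed, model):
    """Difference-in-distance alignment score for one acoustic feature.

    Compares how far the participant's production is from the model
    talker before vs. after exposure. Positive values indicate the
    participant moved toward the model (convergence); negative values
    indicate divergence.
    """
    return abs(baseline - model) - abs(shadowed - model)


# Illustrative (made-up) vowel duration values, in milliseconds:
baseline = 180.0   # participant's pre-exposure production
model = 220.0      # model talker's production
shadowed = 205.0   # participant's post-exposure production
print(did_score(baseline, shadowed, model))  # 25.0: converged toward model
```

A per-trial score like this can then serve as the dependent variable in a mixed-effects model with interlocutor type (human vs. voice-AI) and conversational role as predictors, in the spirit of the analyses the abstracts describe.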
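Item 7 presents sentences at -3 dB and -6 dB SNR. As a hedged sketch only (not taken from the paper, and the function name is my own), mixing speech with noise at a target SNR typically means rescaling the noise so the speech-to-noise power ratio matches the target:

```python
import numpy as np

def scale_noise_to_snr(speech, noise, snr_db):
    """Rescale `noise` so that mixing it with `speech` yields `snr_db`.

    SNR in dB is 10 * log10(P_speech / P_noise), where P is mean power.
    A negative target SNR (e.g. -3 dB) means the noise is more powerful
    than the speech.
    """
    p_speech = np.mean(speech ** 2)          # mean power of the speech signal
    p_noise = np.mean(noise ** 2)            # current mean power of the noise
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    return noise * np.sqrt(target_p_noise / p_noise)


# Toy example: unit-power "speech" mixed with Gaussian noise at -3 dB SNR.
rng = np.random.default_rng(0)
speech = np.ones(1000)
noise = scale_noise_to_snr(speech, rng.normal(size=1000), -3.0)
mixture = speech + noise
```

Because power scales with the square of amplitude, the square root in the rescaling step is what makes the resulting noise power hit the target exactly.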
  8. More and more, humans are engaging with voice-activated artificially intelligent (voice-AI) systems that have names (e.g., Alexa), apparent genders, and even emotional expression; they are in many ways a growing ‘social’ presence. But to what extent do people display sociolinguistic attitudes, developed from human-human interaction, toward these disembodied text-to-speech (TTS) voices? And how might they vary based on the cognitive traits of the individual user? The current study addresses these questions, testing native English speakers’ judgments for 6 traits (intelligent, likeable, attractive, professional, human-like, and age) for a naturally-produced female human voice and the US-English default Amazon Alexa voice. Following exposure to the voices, participants completed these ratings for each speaker, as well as the Autism Quotient (AQ) survey, to assess individual differences in cognitive processing style. Results show differences in individuals’ ratings of the likeability and human-likeness of the human and AI talkers based on AQ score. Results suggest that humans transfer social assessment of human voices to voice-AI, but that the way they do so is mediated by their own cognitive characteristics.