Title: Integration of fundamental frequency and voice-onset-time to voicing categorization: Listeners with normal hearing and bimodal hearing configurations

This study investigates the integration of word-initial fundamental frequency (F0) and voice-onset-time (VOT) in stop voicing categorization for adult listeners with normal hearing (NH) and unilateral cochlear implant (CI) recipients utilizing a bimodal hearing configuration [CI + contralateral hearing aid (HA)]. Categorization was assessed for ten adults with NH and ten adult bimodal listeners, using synthesized consonant stimuli interpolating between /ba/ and /pa/ exemplars with five-step VOT and F0 conditions. All participants demonstrated the expected categorization pattern by reporting /ba/ for shorter VOTs and /pa/ for longer VOTs, with NH listeners showing more use of VOT as a voicing cue than CI listeners in general. When VOT was ambiguous between voiced and voiceless stops, NH listeners made more use of F0 as a cue to voicing than CI listeners, and CI listeners showed greater utilization of initial F0 during voicing identification in their bimodal (CI + HA) condition than in the CI-alone condition. The results demonstrate the adjunctive benefit of acoustic hearing from the non-implanted ear for listening conditions involving spectrotemporally complex stimuli. This finding may lead to the development of a clinically feasible perceptual weighting task that could inform clinicians about bimodal efficacy and the risk-benefit profile associated with bilateral CI recommendation.
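As an illustration of what "use of a cue" can mean quantitatively, the sketch below fits a logistic regression to /pa/ response proportions over a five-step VOT by five-step F0 grid and reads the fitted coefficients as cue weights. This is a common approach in the cue-weighting literature, not necessarily the analysis used in this study. The step coding, the hypothetical listener weights in `TRUE_WEIGHTS`, and all function names are assumptions made for the example; no data from the study are used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed coding: five VOT steps and five F0 steps, centered at zero.
STEPS = [-2, -1, 0, 1, 2]
GRID = [(vot, f0) for vot in STEPS for f0 in STEPS]

# Hypothetical listener (intercept, VOT weight, F0 weight): relies mostly
# on VOT and uses F0 as a secondary cue. These values are invented.
TRUE_WEIGHTS = (0.0, 2.0, 0.5)

def predict(weights, vot, f0):
    """Probability of a /pa/ response for one stimulus."""
    b0, b_vot, b_f0 = weights
    return sigmoid(b0 + b_vot * vot + b_f0 * f0)

# Simulated proportion of /pa/ responses in each of the 25 cells.
observed = [predict(TRUE_WEIGHTS, vot, f0) for vot, f0 in GRID]

def fit(grid, proportions, lr=0.5, n_iter=5000):
    """Fit logistic regression by gradient descent on mean cross-entropy."""
    w = [0.0, 0.0, 0.0]
    n = len(grid)
    for _ in range(n_iter):
        grad = [0.0, 0.0, 0.0]
        for (vot, f0), y in zip(grid, proportions):
            err = predict(w, vot, f0) - y
            grad[0] += err
            grad[1] += err * vot
            grad[2] += err * f0
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

weights = fit(GRID, observed)
print("intercept, VOT weight, F0 weight:", [round(x, 3) for x in weights])
```

Under this reading, a larger VOT coefficient relative to the F0 coefficient corresponds to heavier reliance on VOT, and a group or condition difference in the F0 coefficient (for example, bimodal versus CI-alone) would reflect a difference in F0 use.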

 
NSF-PAR ID:
10400668
Publisher / Repository:
Acoustical Society of America (ASA)
Journal Name:
The Journal of the Acoustical Society of America
Volume:
153
Issue:
3
ISSN:
0001-4966
Page Range / eLocation ID:
p. 1580-1590
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Speech categories are defined by multiple acoustic dimensions and their boundaries are generally fuzzy and ambiguous in part because listeners often give differential weighting to these cue dimensions during phonetic categorization. This study explored how a listener's perception of a speaker's socio-indexical and personality characteristics influences the listener's perceptual cue weighting. In a matched-guise study, three groups of listeners classified a series of gender-neutral /b/-/p/ continua that vary in VOT and F0 at the onset of the following vowel. Listeners were assigned to one of three prompt conditions (i.e., a visually male talker, a visually female talker, or audio-only) and rated the talker in terms of vocal (and facial, in the visual prompt conditions) gender prototypicality, attractiveness, friendliness, confidence, trustworthiness, and gayness. Male listeners and listeners who saw a male face showed less reliance on VOT compared to listeners in the other conditions. Listeners' visual evaluation of the talker also affected their weighting of VOT and onset F0 cues, although the effects of facial impressions differ depending on the gender of the listener. The results demonstrate that individual differences in perceptual cue weighting are modulated by the listener's gender and his/her subjective evaluation of the talker. These findings lend support to exemplar-based models of speech perception and production where socio-indexical features are encoded as a part of the episodic traces in the listeners' mental lexicon. This study also sheds light on the relationship between individual variation in cue weighting and community-level sound change by demonstrating that VOT and onset F0 co-variation in North American English has acquired a certain degree of socio-indexical significance.
  2. Abstract Research Highlights

    Children and adults conceptually and perceptually categorize speech and song from age 4.

    Listeners use F0 instability, harmonicity, spectral flux, and utterance duration to determine whether vocal stimuli sound like song.

    Acoustic cue weighting changes with age, becoming adult‐like at age 8 for perceptual categorization and at age 12 for conceptual differentiation.

    Young children are still learning to categorize speech and song, which leaves open the possibility that music‐ and language‐specific skills are not so domain‐specific.

     
  3. Purpose: The goal of this study was to assess the listening behavior and social engagement of cochlear implant (CI) users and normal-hearing (NH) adults in daily life and relate these actions to objective hearing outcomes. Method: Ecological momentary assessments (EMAs) collected using a smartphone app were used to probe patterns of listening behavior in CI users and age-matched NH adults to detect differences in social engagement and listening behavior in daily life. Participants completed very short surveys every 2 hr to provide snapshots of typical, everyday listening and socializing, as well as longer, reflective surveys at the end of the day to assess listening strategies and coping behavior. Speech perception testing, with accompanying ratings of task difficulty, was also performed in a lab setting to uncover possible correlations between objective and subjective listening behavior. Results: Comparisons between speech intelligibility testing and EMA responses showed poorer performing CI users spending more time at home and less time conversing with others than higher performing CI users and their NH peers. Perception of listening difficulty was also very different for CI users and NH listeners, with CI users reporting little difficulty despite poor speech perception performance. However, both CI users and NH listeners spent most of their time in listening environments they considered “not difficult.” CI users also reported using several compensatory listening strategies, such as visual cues, whereas NH listeners did not. Conclusion: Overall, the data indicate systematic differences between how individual CI users and NH adults navigate and manipulate listening and social environments in everyday life. 
  4. Abstract

    Communicating with a speaker with a different accent can affect one’s own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/–/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0 × VOT relationship reversed to create an “accent” with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners’ own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening and these adjustments transfer to influence listeners’ own speech productions.

     
  5. Period-doubled voice consists of two alternating periods with multiple frequencies and is often perceived as rough with an indeterminate pitch. Past pitch-matching studies in period-doubled voice found that the perceived pitch was lower as the degree of amplitude and frequency modulation between the two alternating periods increased. The perceptual outcome also differed across f0s and modulation types: a lower f0 prompted earlier identification of a lower pitch, and the matched pitch dropped more quickly in frequency- than amplitude-modulated tokens (Sun & Xu, 2002; Bergan & Titze, 2001). However, it is unclear how listeners perceive period doubling when identifying linguistic tones. In an artificial language learning paradigm, this study used resynthesized stimuli with alternating amplitudes and/or frequencies of varying degrees, based on a production study of period-doubled voice (Huang, 2022). Listeners were native speakers of English and Mandarin. We confirm the positive relationship between the modulation degree and the proportion of low tones heard, and find that frequency modulation biased listeners to choose more low-tone options than amplitude modulation. However, a higher f0 (300 Hz) leads to a low-tone percept in more amplitude-modulated tokens than a lower f0 (200 Hz). Both English and Mandarin listeners behaved similarly, suggesting that pitch perception during period doubling is not language-specific. Furthermore, period doubling is predicted to signal low tones in languages, even when the f0 is high. 