Title: Revisiting the left ear advantage for phonetic cues to talker identification
Previous research suggests that learning to use a phonetic property [e.g., voice-onset-time (VOT)] for talker identity supports a left ear processing advantage. Specifically, listeners trained to identify two “talkers” who differed only in characteristic VOTs showed faster talker identification for stimuli presented to the left ear than for those presented to the right ear, which is interpreted as evidence of hemispheric lateralization consistent with task demands. Experiment 1 (n = 97) aimed to replicate this finding and identify predictors of performance; experiment 2 (n = 79) aimed to replicate this finding under conditions that better facilitate observation of laterality effects. Listeners completed a talker identification task during pretest, training, and posttest phases. Inhibition, category identification, and auditory acuity were also assessed in experiment 1. Listeners learned to use VOT for talker identity, which was positively associated with auditory acuity. Talker identification was not influenced by ear of presentation, and Bayes factors indicated strong support for the null. These results suggest that talker-specific phonetic variation is not sufficient to induce a left ear advantage for talker identification; together with the extant literature, this instead suggests that hemispheric lateralization for talker-specific phonetic variation requires phonetic variation to be conditioned on talker differences in source characteristics.
Award ID(s):
1827591
PAR ID:
10387943
Journal Name:
The Journal of the Acoustical Society of America
Volume:
152
Issue:
5
ISSN:
0001-4966
Page Range / eLocation ID:
3107 to 3123
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Though the right hemisphere has been implicated in talker processing, it is thought to play a minimal role in phonetic processing, at least relative to the left hemisphere. Recent evidence suggests that the right posterior temporal cortex may support learning of phonetic variation associated with a specific talker. In the current study, listeners heard a male talker and a female talker, one of whom produced an ambiguous fricative in /s/-biased lexical contexts (e.g., epi?ode) and one who produced it in /ʃ/-biased contexts (e.g., friend?ip). Listeners in a behavioral experiment (Experiment 1) showed evidence of lexically guided perceptual learning, categorizing ambiguous fricatives in line with their previous experience. Listeners in an fMRI experiment (Experiment 2) showed differential phonetic categorization as a function of talker, allowing for an investigation of the neural basis of talker-specific phonetic processing, though they did not exhibit perceptual learning (likely due to characteristics of our in-scanner headphones). Searchlight analyses revealed that the patterns of activation in the right superior temporal sulcus (STS) contained information about who was talking and what phoneme they produced. We take this as evidence that talker information and phonetic information are integrated in the right STS. Functional connectivity analyses suggested that the process of conditioning phonetic identity on talker information depends on the coordinated activity of a left-lateralized phonetic processing system and a right-lateralized talker processing system. Overall, these results clarify the mechanisms through which the right hemisphere supports talker-specific phonetic processing.

     
  2. Speech categories are defined by multiple acoustic dimensions, and their boundaries are generally fuzzy and ambiguous, in part because listeners often give differential weighting to these cue dimensions during phonetic categorization. This study explored how a listener's perception of a speaker's socio-indexical and personality characteristics influences the listener's perceptual cue weighting. In a matched-guise study, three groups of listeners classified a series of gender-neutral /b/-/p/ continua that vary in VOT and F0 at the onset of the following vowel. Listeners were assigned to one of three prompt conditions (i.e., a visually male talker, a visually female talker, or audio-only) and rated the talker in terms of vocal (and facial, in the visual prompt conditions) gender prototypicality, attractiveness, friendliness, confidence, trustworthiness, and gayness. Male listeners and listeners who saw a male face showed less reliance on VOT compared to listeners in the other conditions. Listeners' visual evaluation of the talker also affected their weighting of VOT and onset F0 cues, although the effects of facial impressions differ depending on the gender of the listener. The results demonstrate that individual differences in perceptual cue weighting are modulated by the listener's gender and his/her subjective evaluation of the talker. These findings lend support to exemplar-based models of speech perception and production where socio-indexical features are encoded as a part of the episodic traces in the listeners' mental lexicon. This study also sheds light on the relationship between individual variation in cue weighting and community-level sound change by demonstrating that VOT and onset F0 co-variation in North American English has acquired a certain degree of socio-indexical significance.
  3. This study investigates the integration of word-initial fundamental frequency (F0) and voice-onset-time (VOT) in stop voicing categorization for adult listeners with normal hearing (NH) and unilateral cochlear implant (CI) recipients utilizing a bimodal hearing configuration [CI + contralateral hearing aid (HA)]. Categorization was assessed for ten adults with NH and ten adult bimodal listeners, using synthesized consonant stimuli interpolating between /ba/ and /pa/ exemplars with five-step VOT and F0 conditions. All participants demonstrated the expected categorization pattern by reporting /ba/ for shorter VOTs and /pa/ for longer VOTs, with NH listeners showing more use of VOT as a voicing cue than CI listeners in general. When VOT becomes ambiguous between voiced and voiceless stops, NH users make more use of F0 as a cue to voicing than CI listeners, and CI listeners showed greater utilization of initial F0 during voicing identification in their bimodal (CI + HA) condition than in the CI-alone condition. The results demonstrate the adjunctive benefit of acoustic hearing from the non-implanted ear for listening conditions involving spectrotemporally complex stimuli. This finding may lead to the development of a clinically feasible perceptual weighting task that could inform clinicians about bimodal efficacy and the risk-benefit profile associated with bilateral CI recommendation.

     
  4. Abstract

    Though listeners readily recognize speech from a variety of talkers, accommodating talker variability comes at a cost: Myriad studies have shown that listeners are slower to recognize a spoken word when there is talker variability compared with when talker is held constant. This review focuses on two possible theoretical mechanisms for the emergence of these processing penalties. One view is that multitalker processing costs arise through a resource-demanding talker accommodation process, wherein listeners compare sensory representations against hypothesized perceptual candidates and error signals are used to adjust the acoustic-to-phonetic mapping (an active control process known as contextual tuning). An alternative proposal is that these processing costs arise because talker changes involve salient stimulus-level discontinuities that disrupt auditory attention. Some recent data suggest that multitalker processing costs may be driven by both mechanisms operating over different time scales. Fully evaluating this claim requires a foundational understanding of both talker accommodation and auditory streaming; this article provides a primer on each literature and also reviews several studies that have observed multitalker processing costs. The review closes by underscoring a need for comprehensive theories of speech perception that better integrate auditory attention and by highlighting important considerations for future research in this area.

     
  5. This study uses a response mouse-tracking paradigm to examine the role of sub-phonemic information in online lexical ambiguity resolution of continuous speech. We examine listeners’ sensitivity to the sub-phonemic information that is specific to the ambiguous internal open juncture /s/-stop sequences in American English (e.g., “place kin” vs. “play skin”), that is, voice onset time (VOT) indicating different degrees of aspiration (e.g., long VOT for “kin” vs. short VOT for “skin”) in connected speech contexts. A cross-splicing method was used to create two-word sequences (e.g., “place kin” or “play skin”) with matching VOTs (long for “kin”; short for “skin”) or mismatching VOTs (short for “kin”; long for “skin”). Participants (n = 20) heard the two-word sequences, while looking at computer displays with the second word in the left/right corner (“KIN” and “SKIN”). Then, listeners’ click responses and mouse movement trajectories were recorded. Click responses show significant effects of VOT manipulation, while mouse trajectories do not. Our results show that stop-release information, whether temporal or spectral, can (mis)guide listeners’ interpretation of the possible location of a word boundary between /s/ and a following stop, even when other aspects in the acoustic signal (e.g., duration of /s/) point to the alternative segmentation. Taken together, our results suggest that segmentation and lexical access are highly attuned to bottom-up phonetic information; our results have implications for a model of spoken language recognition with position-specific representations available at the prelexical level and also allude to the possibility that detailed phonetic information may be stored in the listeners’ lexicons.

     