Though the right hemisphere has been implicated in talker processing, it is thought to play a minimal role in phonetic processing, at least relative to the left hemisphere. Recent evidence suggests that the right posterior temporal cortex may support learning of phonetic variation associated with a specific talker. In the current study, listeners heard a male talker and a female talker, one of whom produced an ambiguous fricative in /s/-biased lexical contexts (e.g., epi?ode) and the other of whom produced it in /ʃ/-biased contexts (e.g., friend?ip). Listeners in a behavioral experiment (Experiment 1) showed evidence of lexically guided perceptual learning, categorizing ambiguous fricatives in line with their previous experience. Listeners in an fMRI experiment (Experiment 2) showed differential phonetic categorization as a function of talker, allowing for an investigation of the neural basis of talker-specific phonetic processing, though they did not exhibit perceptual learning (likely due to characteristics of our in-scanner headphones). Searchlight analyses revealed that the patterns of activation in the right superior temporal sulcus (STS) contained information about who was talking and what phoneme they produced. We take this as evidence that talker information and phonetic information are integrated in the right STS. Functional connectivity analyses suggested that the process of conditioning phonetic identity on talker information depends on the coordinated activity of a left-lateralized phonetic processing system and a right-lateralized talker processing system. Overall, these results clarify the mechanisms through which the right hemisphere supports talker-specific phonetic processing.
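As a concrete illustration of the searchlight logic described above, the sketch below uses synthetic data and scikit-learn to ask whether local multivoxel patterns can be decoded for talker identity and phoneme category. All array shapes, effect sizes, and variable names are illustrative assumptions; this is a minimal sketch of the general technique, not the study's actual pipeline.

```python
# Minimal sketch of searchlight-style decoding: test whether the voxel
# patterns inside one searchlight sphere carry information about talker
# identity and phoneme category. Synthetic data stand in for fMRI patterns.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_trials, n_voxels = 120, 50              # trials x voxels within one sphere
talker = rng.integers(0, 2, n_trials)     # 0 = male talker, 1 = female talker
phoneme = rng.integers(0, 2, n_trials)    # 0 = /s/, 1 = /sh/

# Hypothetical patterns in which both labels are weakly encoded.
patterns = rng.normal(size=(n_trials, n_voxels))
patterns[:, :10] += 0.8 * talker[:, None]
patterns[:, 10:20] += 0.8 * phoneme[:, None]

# Cross-validated classification accuracy above chance (0.5) within the
# sphere is the searchlight's evidence that the local region encodes a label.
for name, labels in [("talker", talker), ("phoneme", phoneme)]:
    acc = cross_val_score(LinearSVC(), patterns, labels, cv=5).mean()
    print(f"{name} decoding accuracy: {acc:.2f}")
```

In a full searchlight analysis, this decoding step is repeated at every voxel's local neighborhood, yielding a whole-brain map of where each label can be read out.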
- Award ID(s): 1827591
- PAR ID: 10387943
- Date Published:
- Journal Name: The Journal of the Acoustical Society of America
- Volume: 152
- Issue: 5
- ISSN: 0001-4966
- Page Range / eLocation ID: 3107–3123
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract
Speech categories are defined by multiple acoustic dimensions, and their boundaries are generally fuzzy and ambiguous, in part because listeners weight these cue dimensions differently during phonetic categorization. This study explored how a listener's perception of a speaker's socio-indexical and personality characteristics influences the listener's perceptual cue weighting. In a matched-guise study, three groups of listeners classified a series of gender-neutral /b/-/p/ continua that varied in voice onset time (VOT) and F0 at the onset of the following vowel. Listeners were assigned to one of three prompt conditions (i.e., a visually male talker, a visually female talker, or audio-only) and rated the talker in terms of vocal (and facial, in the visual prompt conditions) gender prototypicality, attractiveness, friendliness, confidence, trustworthiness, and gayness. Male listeners and listeners who saw a male face showed less reliance on VOT compared with listeners in the other conditions. Listeners' visual evaluation of the talker also affected their weighting of VOT and onset F0 cues, although the effects of facial impressions differed depending on the gender of the listener. The results demonstrate that individual differences in perceptual cue weighting are modulated by the listener's gender and their subjective evaluation of the talker. These findings lend support to exemplar-based models of speech perception and production in which socio-indexical features are encoded as part of the episodic traces in the listener's mental lexicon. This study also sheds light on the relationship between individual variation in cue weighting and community-level sound change by demonstrating that VOT and onset F0 co-variation in North American English has acquired a certain degree of socio-indexical significance.
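The notion of perceptual cue weighting can be made concrete with a simple model: fit a logistic categorization function to trial-level responses and compare the standardized coefficients for each cue. The sketch below does this on simulated data; the trial counts, cue ranges, and simulated listener are invented for illustration, not taken from the study.

```python
# Minimal sketch of estimating perceptual cue weights from categorization
# responses. A listener labels each token /b/ (0) or /p/ (1); the fitted
# coefficient magnitudes index relative reliance on VOT vs. onset F0.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical trial-level stimuli: VOT in ms, onset F0 in Hz.
n_trials = 500
vot = rng.uniform(0, 60, n_trials)
f0 = rng.uniform(180, 260, n_trials)

# Simulate a listener who relies mostly on VOT and somewhat on F0.
logit = 0.15 * (vot - 30) + 0.02 * (f0 - 220)
resp = rng.random(n_trials) < 1 / (1 + np.exp(-logit))

# Standardize the cues so their coefficients are directly comparable.
X = StandardScaler().fit_transform(np.column_stack([vot, f0]))
model = LogisticRegression().fit(X, resp)
w_vot, w_f0 = model.coef_[0]
print(f"relative cue weights: VOT={abs(w_vot):.2f}, F0={abs(w_f0):.2f}")
```

A group difference such as "less reliance on VOT" then corresponds to a smaller standardized VOT coefficient in one condition than another.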
-
This study investigates the integration of word-initial fundamental frequency (F0) and voice onset time (VOT) in stop voicing categorization for adult listeners with normal hearing (NH) and unilateral cochlear implant (CI) recipients using a bimodal hearing configuration [CI + contralateral hearing aid (HA)]. Categorization was assessed for ten adults with NH and ten adult bimodal listeners, using synthesized consonant stimuli interpolating between /ba/ and /pa/ exemplars across five-step VOT and F0 conditions. All participants demonstrated the expected categorization pattern, reporting /ba/ for shorter VOTs and /pa/ for longer VOTs, with NH listeners generally making more use of VOT as a voicing cue than CI listeners. When VOT was ambiguous between voiced and voiceless stops, NH listeners made more use of F0 as a cue to voicing than CI listeners, and CI listeners showed greater utilization of initial F0 during voicing identification in their bimodal (CI + HA) condition than in the CI-alone condition. The results demonstrate the adjunctive benefit of acoustic hearing from the non-implanted ear for listening conditions involving spectrotemporally complex stimuli. This finding may lead to the development of a clinically feasible perceptual weighting task that could inform clinicians about bimodal efficacy and the risk-benefit profile associated with bilateral CI recommendation.
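One standard way to quantify this kind of cue integration is to fit a psychometric function to /pa/ responses along the VOT continuum separately at each F0 level and compare the 50% crossovers: a boundary shift across F0 levels indicates that F0 is being used as a secondary voicing cue. The sketch below shows that analysis on invented response proportions; the numbers are placeholders, not the study's data.

```python
# Minimal sketch: fit a logistic psychometric function to /pa/ responses at
# each onset-F0 level and compare the category boundaries (50% crossovers).
import numpy as np
from scipy.optimize import curve_fit

def psychometric(vot_step, boundary, slope):
    """Probability of a /pa/ response as a function of VOT step."""
    return 1.0 / (1.0 + np.exp(-slope * (vot_step - boundary)))

vot_steps = np.arange(1, 6)  # five-step VOT continuum

# Hypothetical proportions of /pa/ responses at low vs. high onset F0.
p_pa = {
    "low F0":  np.array([0.05, 0.15, 0.40, 0.80, 0.95]),
    "high F0": np.array([0.10, 0.30, 0.65, 0.90, 0.98]),
}

for label, props in p_pa.items():
    (boundary, slope), _ = curve_fit(
        psychometric, vot_steps, props, p0=[3.0, 1.0]
    )
    print(f"{label}: boundary at VOT step {boundary:.2f}, slope {slope:.2f}")
# A lower boundary in the high-F0 condition means high F0 pushes ambiguous
# tokens toward /pa/, i.e., the listener is using F0 to cue voicelessness.
```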
-
Abstract Though listeners readily recognize speech from a variety of talkers, accommodating talker variability comes at a cost: myriad studies have shown that listeners are slower to recognize a spoken word when there is talker variability compared with when talker is held constant. This review focuses on two possible theoretical mechanisms for the emergence of these processing penalties. One view is that multitalker processing costs arise through a resource-demanding talker accommodation process, wherein listeners compare sensory representations against hypothesized perceptual candidates and error signals are used to adjust the acoustic-to-phonetic mapping (an active control process known as contextual tuning). An alternative proposal is that these processing costs arise because talker changes involve salient stimulus-level discontinuities that disrupt auditory attention. Some recent data suggest that multitalker processing costs may be driven by both mechanisms operating over different time scales. Fully evaluating this claim requires a foundational understanding of both talker accommodation and auditory streaming; this article provides a primer on each literature and also reviews several studies that have observed multitalker processing costs. The review closes by underscoring a need for comprehensive theories of speech perception that better integrate auditory attention and by highlighting important considerations for future research in this area.
-
This study uses a mouse-tracking response paradigm to examine the role of sub-phonemic information in online lexical ambiguity resolution of continuous speech. We examine listeners’ sensitivity to the sub-phonemic information that is specific to ambiguous internal open-juncture /s/-stop sequences in American English (e.g., “place kin” vs. “play skin”), that is, voice onset time (VOT) indicating different degrees of aspiration (e.g., long VOT for “kin” vs. short VOT for “skin”) in connected speech contexts. A cross-splicing method was used to create two-word sequences (e.g., “place kin” or “play skin”) with matching VOTs (long for “kin”; short for “skin”) or mismatching VOTs (short for “kin”; long for “skin”). Participants (n = 20) heard the two-word sequences while looking at computer displays with the second word in the left/right corner (“KIN” and “SKIN”), and their click responses and mouse-movement trajectories were recorded. Click responses showed significant effects of the VOT manipulation, while mouse trajectories did not. Our results show that stop-release information, whether temporal or spectral, can (mis)guide listeners’ interpretation of the possible location of a word boundary between /s/ and a following stop, even when other aspects of the acoustic signal (e.g., duration of /s/) point to the alternative segmentation. Taken together, our results suggest that segmentation and lexical access are highly attuned to bottom-up phonetic information; our results have implications for a model of spoken-language recognition with position-specific representations available at the prelexical level, and they also allude to the possibility that detailed phonetic information may be stored in listeners’ lexicons.
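Mouse-tracking trajectories like those recorded here are commonly summarized with measures such as maximum deviation (MD) and area under the curve (AUC) relative to the straight line from a trial's start to its endpoint. The sketch below computes both on an invented trajectory; the abstract does not specify which trajectory measures were used, so this is a generic illustration rather than the study's analysis.

```python
# Minimal sketch of two common mouse-tracking measures, maximum deviation
# (MD) and area under the curve (AUC), computed against the straight line
# from the trajectory's start to its end. The sample trajectory is invented.
import numpy as np

def trajectory_measures(xy):
    """xy: (n_samples, 2) array of mouse positions over one trial."""
    start, end = xy[0], xy[-1]
    line = end - start
    line_len = np.linalg.norm(line)
    # Signed perpendicular distance of each sample from the start-to-end
    # line; the sign indicates which side of the line the cursor was on.
    rel = xy - start
    cross = rel[:, 0] * line[1] - rel[:, 1] * line[0]
    deviation = cross / line_len
    md = deviation[np.argmax(np.abs(deviation))]      # maximum deviation
    # Approximate AUC by trapezoidal integration along the line direction.
    along = rel @ line / line_len
    auc = np.sum((deviation[1:] + deviation[:-1]) / 2 * np.diff(along))
    return md, auc

# A hypothetical trial that bows toward the competitor before settling.
t = np.linspace(0, 1, 50)
xy = np.column_stack([t, t + 0.3 * np.sin(np.pi * t)])
md, auc = trajectory_measures(xy)
print(f"max deviation = {md:.3f}, AUC = {auc:.3f}")
```

Larger MD or AUC on a trial is standardly read as greater attraction toward the unselected response option, which is how trajectory measures could index competition between the two segmentations.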