Title: Bears don’t always mess with beers: Limits on generalization of statistical learning in speech
Abstract: Perception changes rapidly and implicitly as a function of passive exposure to speech that samples different acoustic distributions. Past research has shown that this statistical learning generalizes across talkers and, to some extent, new items, but these studies involved listeners’ active engagement in processing statistics-bearing stimuli. In this study, we manipulated the relationship between voice onset time (VOT) and fundamental frequency (F0) to establish distributional regularities either aligned with American English or reversed to create a subtle foreign accent. We then tested whether statistical learning across passive exposure to these distributions generalized to new items never experienced in the accent. Experiment 1 showed statistical learning across passive exposure but no generalization of learning when exposure and test items shared the same initial consonant but differed in vowels (bear/pear → beer/pier) or when they differed in initial consonant but shared distributional regularities across VOT and F0 dimensions (deer/tear → beer/pier). Experiment 2 showed generalization to stimuli that shared the statistics-bearing phoneme (bear/pear → beer/pier), but only when the response set included tokens from both exposure and generalization stimuli. Moreover, statistical learning transferred to influence the subtle acoustics of listeners’ own speech productions but did not generalize to influence productions of stimuli not heard in the accent. In sum, passive exposure is sufficient to support statistical learning and its generalization, but task demands modulate this dynamic. Moreover, production does not simply mirror perception: generalization in perception was not accompanied by transfer to production.
Award ID(s):
1950054 2346989
PAR ID:
10611898
Publisher / Repository:
Psychonomic Society
Journal Name:
Psychonomic Bulletin & Review
ISSN:
1069-9384
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Communicating with a speaker with a different accent can affect one’s own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/–/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0 × VOT relationship reversed to create an “accent” with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners’ own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening and these adjustments transfer to influence listeners’ own speech productions.
  2. In speech perception, when a primary acoustic cue (e.g., VOT) is ambiguous, listeners may increase the weight of a secondary cue (e.g., F0). In experiment 1, we compared the cue-weighting adjustment strategies across younger and older normal-hearing adults with a distributional learning paradigm. Two groups of native English listeners were exposed to voicing contrasts that were ambiguous in either VOT or F0. Additionally, listeners may access lexical information to help resolve the ambiguity in the acoustic signal. Older listeners have been reported to use lexical information to a greater degree than younger listeners. In experiment 2, using a lexically guided learning paradigm, we tested if younger and older adults differ in their use of lexical information when learning to interpret ambiguous acoustic tokens. There were four types of exposure, in which stimuli differed in lexical status (day-*tay; *doy-toy) and the acoustic ambiguity involved either only VOT or both VOT and F0. Preliminary results from younger normal-hearing listeners showed significant speech adaptation effects, with a significant change in cue weights in distributional learning and salient lexical bias in lexically guided learning. More data will be collected from older adults to assess the extent of perceptual learning relative to younger adults.
  3. This dissertation compared speech perception across younger and older normal-hearing adults. We asked four research questions to assess acoustic cue weighting and the role of contextual information (lexical information and speaking rate) in speech perception. Experiment 1 tested for age-related changes in cue-weighting. The absolute weights showed that older listeners relied on both VOT and F0 more than younger listeners, and some listeners’ reliance on VOT correlated with inhibitory control when perceiving the /d/-/t/ contrast. The relative weights suggested that older listeners relied on VOT less and F0 more than younger listeners. Experiment 2 tested whether younger and older listeners used different cue-weighting adjustment strategies in distributional learning. Both older and younger listeners adjusted their reliance on acoustic cues when the primary acoustic cue (VOT) became ambiguous. Older listeners adjusted F0 to a greater degree than younger listeners, while younger listeners adjusted VOT more than older listeners. With a lexically-guided learning paradigm, Experiment 3 explored if younger and older adults differed in their use of lexical information when learning to map acoustic tokens that were ambiguous. Older and younger listeners utilized the lexical context to the same extent. In Experiment 4, the contextual effect of speaking rate was examined by embedding voicing contrasts in short and long syllables and presenting these syllables to younger and older listeners. Older listeners compensated for variation in speaking rate in a manner similar to younger listeners. The findings in perceptual learning demonstrate perceptual flexibility among normal-hearing older listeners, despite an assumed decline in temporal processing.
  4. Speech categories are defined by multiple acoustic dimensions and their boundaries are generally fuzzy and ambiguous, in part because listeners often give differential weighting to these cue dimensions during phonetic categorization. This study explored how a listener's perception of a speaker's socio-indexical and personality characteristics influences the listener's perceptual cue weighting. In a matched-guise study, three groups of listeners classified a series of gender-neutral /b/-/p/ continua that vary in VOT and F0 at the onset of the following vowel. Listeners were assigned to one of three prompt conditions (i.e., a visually male talker, a visually female talker, or audio-only) and rated the talker in terms of vocal (and facial, in the visual prompt conditions) gender prototypicality, attractiveness, friendliness, confidence, trustworthiness, and gayness. Male listeners and listeners who saw a male face showed less reliance on VOT compared to listeners in the other conditions. Listeners' visual evaluation of the talker also affected their weighting of VOT and onset F0 cues, although the effects of facial impressions differ depending on the gender of the listener. The results demonstrate that individual differences in perceptual cue weighting are modulated by the listener's gender and his/her subjective evaluation of the talker. These findings lend support to exemplar-based models of speech perception and production where socio-indexical features are encoded as a part of the episodic traces in the listeners' mental lexicon. This study also sheds light on the relationship between individual variation in cue weighting and community-level sound change by demonstrating that VOT and onset F0 co-variation in North American English has acquired a certain degree of socio-indexical significance.
  5. Normal-hearing older listeners are as accurate as younger listeners when perceiving native English words in quiet despite challenges in temporal processing. Older listeners may compensate for the declined use of fine-grained temporal cues by reducing the weight of temporal cues (VOT) and increasing their reliance on other acoustic correlates (F0) of the sound contrast. In Experiment 1, younger (age 18–25) and older (age 55–65) normal-hearing listeners participate in an online 2AFC identification task with a /d/-/t/ contrast varying in both VOT and F0. We predict that, while both younger and older listeners rely more on VOT than on F0, older listeners, because of their reduced temporal processing abilities, rely on F0 to a larger degree than younger listeners. Temporal processing not only involves local durational cues of the target segments, but also global contextual cues such as speaking rate. In Experiment 2, the same listeners complete another online 2AFC identification task with /dɑ/-/tɑ/ syllables that vary in VOT and vowel duration (short versus long). We predict that older listeners exhibit a smaller shift in the /d/-/t/ category boundary between the long and short vowel durations than younger listeners since older adults are less sensitive to contextual temporal information.