Purpose: The “bubble noise” technique has recently been introduced as a method for identifying the regions of time–frequency maps (i.e., spectrograms) of speech that are especially important for listeners during speech recognition. Because the regions of “importance” it identifies are specific to both the speech stimulus and the listener, these regions can be compared across listener groups. In cross-linguistic and second-language (L2) speech perception, for example, the method reveals differences in the regions listeners rely on when deciding phoneme category membership. This research note describes the application of bubble noise to the study of language learning in three different language pairs: Hindi–English bilinguals' perception of the /v/–/w/ contrast in American English, native English speakers' perception of the tense/lax contrast for Korean fricatives and affricates, and native English speakers' perception of Mandarin lexical tone.
Conclusion: We demonstrate that this technique provides insight into what information in the speech signal is important for native/first-language listeners compared with nonnative/L2 listeners. Furthermore, the method can be used to examine whether L2 speech perception training is effective in directing the listener's attention to the important cues.
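The core of the bubble-noise paradigm can be illustrated with a short sketch: speech is masked by noise everywhere except inside randomly placed Gaussian "bubbles" in the time–frequency plane, and regions whose audibility predicts correct responses are treated as important. This is a minimal illustration only, assuming a spectrogram stored as a NumPy array; the function names and bubble parameters (`sigma_f`, `sigma_t`) are hypothetical, not the authors' implementation.

```python
import numpy as np

def bubble_mask(n_freq, n_time, n_bubbles, sigma_f=3.0, sigma_t=5.0, rng=None):
    """Time-frequency mask that lets speech through only inside randomly
    placed Gaussian 'bubbles' (1 = audible speech, 0 = masked by noise)."""
    rng = np.random.default_rng(rng)
    f = np.arange(n_freq)[:, None]   # frequency-bin indices (column vector)
    t = np.arange(n_time)[None, :]   # time-frame indices (row vector)
    mask = np.zeros((n_freq, n_time))
    for _ in range(n_bubbles):
        cf = rng.uniform(0, n_freq)  # bubble center in frequency
        ct = rng.uniform(0, n_time)  # bubble center in time
        mask += np.exp(-0.5 * (((f - cf) / sigma_f) ** 2
                               + ((t - ct) / sigma_t) ** 2))
    return np.clip(mask, 0.0, 1.0)

def importance_map(masks, correct):
    """Estimate regions driving correct identification across many trials:
    mean mask on correct trials minus mean mask on incorrect trials."""
    masks = np.asarray(masks)
    correct = np.asarray(correct, dtype=bool)
    return masks[correct].mean(axis=0) - masks[~correct].mean(axis=0)
```

In use, each trial's stimulus would be the spectrogram multiplied by a fresh random mask (with noise filling the masked regions), and `importance_map` would be computed over hundreds of trials per listener, yielding a listener- and stimulus-specific map.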
This content will become publicly available on January 10, 2026
Incidental Nonspeech Auditory Learning Scaffolds Phonetic, Category, and Word Learning in a Foreign Language Classroom
Abstract: There is considerable lab-based evidence for successful incidental learning, in which a learner's attention is directed away from the to-be-learned stimulus and toward another stimulus. In this study, we extend incidental learning research into the language learning classroom. Three groups of adult second language (L2) learners (N = 52) engaged in structured classroom Mandarin learning took part in an 8-week study. One group served as a classroom-only control group. The second group underwent additional intentional auditory training involving Mandarin speech and explicit feedback. The third group underwent additional incidental learning built around nonspeech "perceptual building block" categories: categories that share critical perceptual dimensions with the target L2 speech categories but are not perceived as speech. We demonstrate that, when combined with structured classroom learning, incidental learning involving nonspeech analogs promotes phonetic, category, and word learning equivalent to that from more traditional intentional auditory training.
- Award ID(s): 2420979
- PAR ID: 10611903
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Language Learning
- ISSN: 0023-8333
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
One of the main challenges individuals face when learning an additional language (L2) is learning its sound system, which includes learning to perceive L2 sounds accurately. High variability phonetic training (HVPT) is one method that has proven highly effective at helping individuals develop robust L2 perceptual categories, and recent meta-analytic work suggests that multi-talker training conditions provide a small but statistically reliable benefit compared to single-talker training. However, no study has compared lower and higher variability multi-talker conditions to determine how the number of talkers affects training outcomes, even though such information can shed additional light on how talker variability affects phonetic training. In this study, we randomly assigned 458 L2 Spanish learners to a two-talker or six-talker HVPT group or to a control group that did not receive HVPT. Training focused on L2 Spanish stops. We tested performance on trained talkers and words as well as several forms of generalization. The experimental groups improved more and demonstrated greater generalization than the control group, but neither experimental group outpaced the other. The number of sessions experimental participants completed moderated learning gains.
-
Recent work on perceptual learning for speech has suggested that while high-variability training typically results in generalization, low-variability exposure can sometimes be sufficient for cross-talker generalization. We tested predictions of a similarity-based account, according to which generalization depends on training-test talker similarity rather than on exposure to variability. We compared perceptual adaptation to second-language (L2) speech following single- or multiple-talker training with a round-robin design in which four L2 English talkers from four different first-language (L1) backgrounds served as both training and test talkers. After exposure to 60 L2 English sentences in one training session, cross-talker/cross-accent generalization was possible (but not guaranteed) following either multiple- or single-talker training, with variation across training-test talker pairings. Contrary to predictions of the similarity-based account, adaptation was not consistently better for identical than for mismatched training-test talker pairings, and generalization patterns were asymmetrical across training-test talker pairs. Acoustic analyses also revealed a dissociation between phonetic similarity and cross-talker/cross-accent generalization. Notably, variation in adaptation and generalization related to variation in training phase intelligibility. Together with prior evidence, these data suggest that perceptual learning for speech may benefit from some combination of exposure to talker variability, training-test similarity, and high training phase intelligibility.
-
Speech recognition by both humans and machines frequently fails in non-optimal yet common situations. For example, word recognition error rates for second-language (L2) speech can be high, especially under conditions involving background noise. At the same time, both human and machine speech recognition sometimes shows remarkable robustness against signal- and noise-related degradation. Which acoustic features of speech explain this substantial variation in intelligibility? Current approaches align speech to text to extract a small set of pre-defined spectro-temporal properties from specific sounds in particular words. However, variation in these properties leaves much cross-talker variation in intelligibility unexplained. We examine an alternative approach utilizing a perceptual similarity space acquired using self-supervised learning. This approach encodes distinctions between speech samples without requiring pre-defined acoustic features or speech-to-text alignment. We show that L2 English speech samples are less tightly clustered in the space than L1 samples, reflecting variability in English proficiency among L2 talkers. Critically, distances in this similarity space are perceptually meaningful: L1 English listeners have lower recognition accuracy for L2 speakers whose speech is more distant in the space from L1 speech. These results indicate that perceptual similarity may form the basis for an entirely new speech and language analysis approach.
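The two similarity-space measures described in that abstract, cluster tightness and distance from L1 speech, reduce to simple geometry once each speech sample is embedded as a vector. The sketch below assumes such embeddings already exist as NumPy arrays; the function names are illustrative and do not reproduce the authors' self-supervised model or analysis pipeline.

```python
import numpy as np

def cluster_spread(embeddings):
    """Mean distance of a group's samples from the group centroid.
    Larger values = less tightly clustered (e.g., L2 vs. L1 speech)."""
    centroid = np.mean(embeddings, axis=0)
    return float(np.mean(np.linalg.norm(embeddings - centroid, axis=1)))

def distance_from_l1(l1_embeddings, l2_embeddings):
    """Distance of each L2 sample from the centroid of L1 speech in the
    similarity space; per the abstract, larger distances should predict
    lower recognition accuracy by L1 listeners."""
    centroid = np.mean(l1_embeddings, axis=0)
    return np.linalg.norm(l2_embeddings - centroid, axis=1)
```

In an analysis like the one described, `distance_from_l1` values per talker would then be correlated with listeners' word recognition accuracy for that talker.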
-
Category learning is fundamental to cognition, but little is known about how it proceeds in real-world environments when learners do not have instructions to search for category-relevant information, do not make overt category decisions, and do not experience direct feedback. Prior research demonstrates that listeners can acquire task-irrelevant auditory categories incidentally as they engage in primarily visuomotor tasks. The current study examines the factors that support this incidental category learning. Three experiments systematically manipulated the relationship of four novel auditory categories with a consistent visual feature (color or location) that informed a simple behavioral keypress response regarding the visual feature. In both an in-person experiment and two online replications with extensions, incidental auditory category learning occurred reliably when category exemplars consistently aligned with visuomotor demands of the primary task, but not when they were misaligned. The presence of an additional irrelevant visual feature that was uncorrelated with the primary task demands neither enhanced nor harmed incidental learning. By contrast, incidental learning did not occur when auditory categories were aligned consistently with one visual feature, but the motor response in the primary task was aligned with another, category-unaligned visual feature. Moreover, category learning did not reliably occur across passive observation or when participants made a category-nonspecific, generic motor response. These findings show that incidental learning of categories is strongly mediated by the character of coincident behavior.
