Title: Generalized perceptual adaptation to second-language speech: Variability, similarity, and intelligibility
Recent work on perceptual learning for speech has suggested that while high-variability training typically results in generalization, low-variability exposure can sometimes be sufficient for cross-talker generalization. We tested predictions of a similarity-based account, according to which generalization depends on training-test talker similarity rather than on exposure to variability. We compared perceptual adaptation to second-language (L2) speech following single- or multiple-talker training with a round-robin design in which four L2 English talkers from four different first-language (L1) backgrounds served as both training and test talkers. After exposure to 60 L2 English sentences in one training session, cross-talker/cross-accent generalization was possible (but not guaranteed) following either multiple- or single-talker training with variation across training-test talker pairings. Contrary to predictions of the similarity-based account, adaptation was not consistently better for identical than for mismatched training-test talker pairings, and generalization patterns were asymmetrical across training-test talker pairs. Acoustic analyses also revealed a dissociation between phonetic similarity and cross-talker/cross-accent generalization. Notably, variation in adaptation and generalization related to variation in training phase intelligibility. Together with prior evidence, these data suggest that perceptual learning for speech may benefit from some combination of exposure to talker variability, training-test similarity, and high training phase intelligibility.
Award ID(s):
1921678
PAR ID:
10593937
Publisher / Repository:
Acoustical Society of America (ASA)
Journal Name:
The Journal of the Acoustical Society of America
Volume:
154
Issue:
3
ISSN:
0001-4966
Size(s):
p. 1601-1613
Sponsoring Org:
National Science Foundation
More Like this
  1. Speech recognition by both humans and machines frequently fails in non-optimal yet common situations. For example, word recognition error rates for second-language (L2) speech can be high, especially under conditions involving background noise. At the same time, both human and machine speech recognition sometimes shows remarkable robustness against signal- and noise-related degradation. Which acoustic features of speech explain this substantial variation in intelligibility? Current approaches align speech to text to extract a small set of pre-defined spectro-temporal properties from specific sounds in particular words. However, variation in these properties leaves much cross-talker variation in intelligibility unexplained. We examine an alternative approach utilizing a perceptual similarity space acquired using self-supervised learning. This approach encodes distinctions between speech samples without requiring pre-defined acoustic features or speech-to-text alignment. We show that L2 English speech samples are less tightly clustered in the space than L1 samples, reflecting variability in English proficiency among L2 talkers. Critically, distances in this similarity space are perceptually meaningful: L1 English listeners have lower recognition accuracy for L2 speakers whose speech is more distant in the space from L1 speech. These results indicate that perceptual similarity may form the basis for an entirely new speech and language analysis approach.
  2. When listeners encounter a difficult-to-understand talker in a difficult-to-understand situation, their perceptual mechanisms can adapt, making the talker in the situation easier to understand. This study examined talker-specific perceptual adaptation experimentally by embedding speech from second-language (L2) English talkers in varying levels of noise and collecting transcriptions from first-language English listeners (ten talkers, 100 listeners per experiment). Experiments 1 and 2 demonstrated that prior experience with an L2 talker's speech presented first without noise and then with gradually increasing levels of noise facilitated recognition of that talker in loud noise. Experiment 3 tested whether adaptation is driven by tuning-in to the talker's voice and speech patterns, by examining recognition of speech-in-loud-noise following experience with the talker in quiet. Finally, experiment 4 tested whether adaptation is driven by tuning-out the background noise, by measuring speech-in-loud-noise recognition after experience with the talker in consistently loud noise. The results showed that both tuning-in to the talker and tuning-out the noise contribute to talker-specific perceptual adaptation to L2 speech-in-noise.
  3. One of the main challenges individuals face when learning an additional language (L2) is learning its sound system, which includes learning to perceive L2 sounds accurately. High variability phonetic training (HVPT) is one method that has proven highly effective at helping individuals develop robust L2 perceptual categories, and recent meta-analytic work suggests that multi-talker training conditions provide a small but statistically reliable benefit compared to single-talker training. However, no study has compared lower and higher variability multi-talker conditions to determine how the number of talkers affects training outcomes, even though such information can shed additional light on how talker variability affects phonetic training. In this study, we randomly assigned 458 L2 Spanish learners to a two-talker or six-talker HVPT group or to a control group that did not receive HVPT. Training focused on L2 Spanish stops. We tested performance on trained talkers and words as well as several forms of generalization. The experimental groups improved more and demonstrated greater generalization than the control group, but neither experimental group outpaced the other. The number of sessions experimental participants completed moderated learning gains. 
  4. Unfamiliar accents can cause word recognition challenges, particularly in noisy environments, but few studies have incorporated quantitative pronunciation distance metrics to explain intelligibility differences across accents. To address this gap, intelligibility was measured for 18 talkers -- two from each of three first-language, one bilingual, and five second-language accents -- in quiet and two noise conditions. The relations between two edit distance metrics, which quantify phonetic differences from a reference accent, and intelligibility scores were assessed. Intelligibility was quantified through both fuzzy string matching and percent words correct. Both edit distance metrics were significantly related to intelligibility scores; a heuristic edit distance metric was the best predictor of intelligibility for both scoring methods. Further, there were stronger effects of edit distance as the listening condition increased in difficulty. Talker accent also contributed substantially to intelligibility models, but relations between accent and edit distance did not consistently pattern for the two talkers representing each accent. Frequency of production differences in vowels and consonants was negatively correlated with intelligibility, particularly for consonants. Together, these results suggest that significant amounts of variability in intelligibility across accents can be predicted by phonetic differences from the listener’s home accent. However, talker- and accent-specific pronunciation features, including suprasegmental characteristics, must be quantified to fully explain intelligibility across talkers and listening conditions. 
  5. Native talkers are able to enhance acoustic characteristics of their speech in a speaking style known as “clear speech,” which is better understood by listeners than “plain speech.” However, despite substantial research in the area of clear speech, it is less clear whether non-native talkers of various proficiency levels are able to adopt a clear speaking style and, if so, whether this style has perceptual benefits for native listeners. In the present study, native English listeners evaluated plain and clear speech produced by three groups: native English talkers, non-native talkers with lower proficiency, and non-native talkers with higher proficiency. Listeners completed a transcription task (i.e., an objective measure of speech intelligibility). We investigated intelligibility as a function of language background and proficiency and also investigated the acoustic modifications that are associated with these perceptual benefits. The results of the study suggest that both native and non-native talkers modulate their speech when asked to adopt a clear speaking style, but that the size of the acoustic modifications, as well as the consequences of this speaking style for perception, differ as a function of language background and language proficiency.