skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Perceptual learning alters effects of foreign language backgrounds in speech-in-speech recognition
The effect of training on linguistic release from masking (LRM) was examined. In a pre-test and post-test, English monolingual listeners transcribed sentences presented with English and Dutch maskers. During training, participants transcribed sentences with either Dutch, English, or white noise maskers and received feedback. LRM was evident in the pre-test (performance was better with Dutch maskers) but was eliminated after training (masker conditions did not differ). Thus, the informational masking driving LRM can be ameliorated through training. This study is a basis for future research examining the specific aspects of informational masking that change as a function of experience.  more » « less
Award ID(s):
2126888
PAR ID:
10589344
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Acoustical Society of America (ASA)
Date Published:
Journal Name:
JASA Express Letters
Volume:
2
Issue:
8
ISSN:
2691-1191
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent work on perceptual learning for speech has suggested that while high-variability training typically results in generalization, low-variability exposure can sometimes be sufficient for cross-talker generalization. We tested predictions of a similarity-based account, according to which, generalization depends on training-test talker similarity rather than on exposure to variability. We compared perceptual adaptation to second-language (L2) speech following single- or multiple-talker training with a round-robin design in which four L2 English talkers from four different first-language (L1) backgrounds served as both training and test talkers. After exposure to 60 L2 English sentences in one training session, cross-talker/cross-accent generalization was possible (but not guaranteed) following either multiple- or single-talker training with variation across training-test talker pairings. Contrary to predictions of the similarity-based account, adaptation was not consistently better for identical than for mismatched training-test talker pairings, and generalization patterns were asymmetrical across training-test talker pairs. Acoustic analyses also revealed a dissociation between phonetic similarity and cross-talker/cross-accent generalization. Notably, variation in adaptation and generalization related to variation in training phase intelligibility. Together with prior evidence, these data suggest that perceptual learning for speech may benefit from some combination of exposure to talker variability, training-test similarity, and high training phase intelligibility. 
    more » « less
  2. Learning to process speech in a foreign language involves learning new representations for mapping the auditory signal to linguistic structure. Behavioral experiments suggest that even listeners that are highly proficient in a non-native language experience interference from representations of their native language. However, much of the evidence for such interference comes from tasks that may inadvertently increase the salience of native language competitors. Here we tested for neural evidence of proficiency and native language interference in a naturalistic story listening task. We studied electroencephalography responses of 39 native speakers of Dutch (14 male) to an English short story, spoken by a native speaker of either American English or Dutch. We modeled brain responses with multivariate temporal response functions, using acoustic and language models. We found evidence for activation of Dutch language statistics when listening to English, but only when it was spoken with a Dutch accent. This suggests that a naturalistic, monolingual setting decreases the interference from native language representations, whereas an accent in the listener's own native language may increase native language interference, by increasing the salience of the native language and activating native language phonetic and lexical representations. Brain responses suggest that such interference stems from words from the native language competing with the foreign language in a single word recognition system, rather than being activated in a parallel lexicon. We further found that secondary acoustic representations of speech (after 200 ms latency) decreased with increasing proficiency. This may reflect improved acoustic–phonetic models in more proficient listeners. Significance StatementBehavioral experiments suggest that native language knowledge interferes with foreign language listening, but such effects may be sensitive to task manipulations, as tasks that increase metalinguistic awareness may also increase native language interference. This highlights the need for studying non-native speech processing using naturalistic tasks. We measured neural responses unobtrusively while participants listened for comprehension and characterized the influence of proficiency at multiple levels of representation. We found that salience of the native language, as manipulated through speaker accent, affected activation of native language representations: significant evidence for activation of native language (Dutch) categories was only obtained when the speaker had a Dutch accent, whereas no significant interference was found to a speaker with a native (American) accent. 
    more » « less
  3. Abstract Multilingual speakers can find speech recognition in everyday environments like restaurants and open-plan offices particularly challenging. In a world where speaking multiple languages is increasingly common, effective clinical and educational interventions will require a better understanding of how factors like multilingual contexts and listeners’ language proficiency interact with adverse listening environments. For example, word and phrase recognition is facilitated when competing voices speak different languages. Is this due to a “release from masking” from lower-level acoustic differences between languages and talkers, or higher-level cognitive and linguistic factors? To address this question, we created a “one-man bilingual cocktail party” selective attention task using English and Mandarin speech from one bilingual talker to reduce low-level acoustic cues. In Experiment 1, 58 listeners more accurately recognized English targets when distracting speech was Mandarin compared to English. Bilingual Mandarin–English listeners experienced significantly more interference and intrusions from the Mandarin distractor than did English listeners, exacerbated by challenging target-to-masker ratios. In Experiment 2, 29 Mandarin–English bilingual listeners exhibited linguistic release from masking in both languages. Bilinguals experienced greater release from masking when attending to English, confirming an influence of linguistic knowledge on the “cocktail party” paradigm that is separate from primarily energetic masking effects. Effects of higher-order language processing and expertise emerge only in the most demanding target-to-masker contexts. The “one-man bilingual cocktail party” establishes a useful tool for future investigations and characterization of communication challenges in the large and growing worldwide community of Mandarin–English bilinguals. 
    more » « less
  4. We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve SpanishEnglish ST from 10.8 to 20.2 BLEU when only 20 hours of Spanish-English ST training data are available. Through an ablation study, we find that the pre-trained encoder (acoustic model) accounts for most of the improvement, despite the fact that the shared language in these tasks is the target language text, not the source language audio. Applying this insight, we show that pre-training on ASR helps ST even when the ASR language differs from both source and target ST languages: pre-training on French ASR also improves Spanish-English ST. Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU. 
    more » « less
  5. We examined the neural correlates underlying the semantic processing of native- and nonnative-accented sentences, presented in quiet or embedded in multi-talker noise. Implementing a semantic violation paradigm, 36 English monolingual young adults listened to American-accented (native) and Chinese-accented (nonnative) English sentences with or without semantic anomalies, presented in quiet or embedded in multi-talker noise, while EEG was recorded. After hearing each sentence, participants verbally repeated the sentence, which was coded and scored as an offline comprehension accuracy measure. In line with earlier behavioral studies, the negative impact of background noise on sentence repetition accuracy was higher for nonnative-accented than for native-accented sentences. At the neural level, the N400 effect for semantic anomaly was larger for nativeaccented than for nonnative-accented sentences, and was also larger for sentences presented in quiet than in noise, indicating impaired lexical-semantic access when listening to nonnative-accented speech or sentences embedded in noise. No semantic N400 effect was observed for nonnative-accented sentences presented in noise. Furthermore, the frequency of neural oscillations in the alpha frequency band (an index of online cognitive listening effort) was higher when listening to sentences in noise versus in quiet, but no difference was observed across the accent conditions. Semantic anomalies presented in background noise also elicited higher theta activity, whereas processing nonnative-accented anomalies was associated with decreased theta activity. Taken together, we found that listening to nonnative accents or background noise is associated with processing challenges during online semantic access, leading to decreased comprehension accuracy. However, the underlying cognitive mechanism (e.g., associated listening efforts) might manifest differently across accented speech processing and speech in noise processing. 
    more » « less