

Title: Perceptual consequences of native and non-native clear speech
Native talkers are able to enhance acoustic characteristics of their speech in a speaking style known as "clear speech," which listeners understand better than "plain speech." However, despite substantial research on clear speech, it is less clear whether non-native talkers of various proficiency levels are able to adopt a clear speaking style and, if so, whether this style has perceptual benefits for native listeners. In the present study, native English listeners evaluated plain and clear speech produced by three groups: native English talkers, non-native talkers with lower proficiency, and non-native talkers with higher proficiency. Listeners completed a transcription task, which served as an objective measure of speech intelligibility. We investigated intelligibility as a function of language background and proficiency, and we also investigated the acoustic modifications associated with these perceptual benefits. The results suggest that both native and non-native talkers modulate their speech when asked to adopt a clear speaking style, but that the size of the acoustic modifications, as well as the perceptual consequences of this speaking style, differ as a function of language background and language proficiency.
Award ID(s):
1941739
NSF-PAR ID:
10325320
Journal Name:
The Journal of the Acoustical Society of America
Volume:
151
Issue:
2
ISSN:
0001-4966
Page Range / eLocation ID:
1246 to 1258
Sponsoring Org:
National Science Foundation
More Like this
  1. Previous research has shown that native listeners benefit from clearly produced speech, as well as from predictable semantic context, when these enhancements are delivered in native speech. However, it is unclear whether native listeners benefit differently from acoustic and semantic enhancements when listening to other varieties of speech, including non-native speech. The current study examines to what extent native English listeners benefit from acoustic and semantic cues present in native and non-native English speech. Native English listeners transcribed sentence-final words that differed in semantic predictability, produced in plain or clear speaking styles by native English talkers and by native Mandarin talkers of higher and lower proficiency in English. The perception results demonstrated that listeners benefited from semantic cues in higher- and lower-proficiency talkers' speech (i.e., transcribed speech more accurately), but not from acoustic cues, even though higher-proficiency talkers did make substantial acoustic enhancements from plain to clear speech. The current results suggest that native listeners benefit more robustly from semantic cues than from acoustic cues when those cues are embedded in non-native speech.

     
    more » « less
  2. Speech recognition by both humans and machines frequently fails in non-optimal yet common situations. For example, word recognition error rates for second-language (L2) speech can be high, especially under conditions involving background noise. At the same time, both human and machine speech recognition sometimes show remarkable robustness against signal- and noise-related degradation. Which acoustic features of speech explain this substantial variation in intelligibility? Current approaches align speech to text in order to extract a small set of predefined spectro-temporal properties from specific sounds in particular words. However, variation in these properties leaves much of the cross-talker variation in intelligibility unexplained. We examine an alternative approach that uses a perceptual similarity space acquired through self-supervised learning. This approach encodes distinctions between speech samples without requiring predefined acoustic features or speech-to-text alignment. We show that L2 English speech samples are less tightly clustered in the space than L1 samples, reflecting variability in English proficiency among L2 talkers. Critically, distances in this similarity space are perceptually meaningful: L1 English listeners have lower recognition accuracy for L2 speakers whose speech lies farther from L1 speech in the space. These results indicate that perceptual similarity may form the basis for an entirely new approach to speech and language analysis.

     
    more » « less
  3. Learning to process speech in a foreign language involves learning new representations for mapping the auditory signal to linguistic structure. Behavioral experiments suggest that even listeners who are highly proficient in a non-native language experience interference from representations of their native language. However, much of the evidence for such interference comes from tasks that may inadvertently increase the salience of native-language competitors. Here we tested for neural evidence of proficiency and native-language interference in a naturalistic story-listening task. We studied electroencephalography (EEG) responses of 39 native speakers of Dutch (14 male) to an English short story, spoken by a native speaker of either American English or Dutch. We modeled brain responses with multivariate temporal response functions, using acoustic and language models. We found evidence for activation of Dutch language statistics when participants listened to English, but only when it was spoken with a Dutch accent. This suggests that a naturalistic, monolingual setting decreases interference from native-language representations, whereas an accent from the listener's own native language may increase native-language interference by increasing the salience of the native language and activating native-language phonetic and lexical representations. Brain responses suggest that such interference stems from native-language words competing with the foreign language in a single word recognition system, rather than being activated in a parallel lexicon. We further found that secondary acoustic representations of speech (after 200 ms latency) decreased with increasing proficiency. This may reflect improved acoustic-phonetic models in more proficient listeners.

    Significance Statement: Behavioral experiments suggest that native-language knowledge interferes with foreign-language listening, but such effects may be sensitive to task manipulations, as tasks that increase metalinguistic awareness may also increase native-language interference. This highlights the need to study non-native speech processing using naturalistic tasks. We measured neural responses unobtrusively while participants listened for comprehension and characterized the influence of proficiency at multiple levels of representation. We found that the salience of the native language, as manipulated through speaker accent, affected activation of native-language representations: significant evidence for activation of native-language (Dutch) categories was obtained only when the speaker had a Dutch accent, whereas no significant interference was found for a speaker with a native (American) accent.

     
    more » « less
  4. This study uses non-native perception data to examine the relationship between the perceived phonetic similarity of segments and their phonological patterning. Segments that are phonetically similar to one another are expected to pattern together phonologically, and segments that share articulatory or acoustic properties are also expected to be perceived as similar. What is not yet clear is whether segments that pattern together phonologically are perceived as similar. This study addresses this question by examining how L1 English listeners and L1 Guébie listeners perceive non-native implosive consonants compared with plosives and sonorants. English does not have contrastive implosives, whereas Guébie has a bilabial implosive. The bilabial implosive patterns phonologically with sonorants in Guébie, to the exclusion of obstruents. Two perception experiments show that English listeners make more perceptual categorization errors between implosives and voiced plosives than Guébie listeners do, but that both listener groups are more likely to classify implosives as similar to voiced plosives than to sonorants. The results also show that Guébie listeners are better than English listeners at categorizing non-native implosive consonants (i.e., alveolar implosives), indicating that listeners are able to extend features or gestures from their L1 to non-native implosive consonants. Together, these experiments suggest a cross-linguistic perceptual similarity hierarchy of implosives relative to other segments, one that is not affected by L1 phonological patterning.

     
    more » « less
  5. Multilingual speakers can find speech recognition in everyday environments like restaurants and open-plan offices particularly challenging. In a world where speaking multiple languages is increasingly common, effective clinical and educational interventions will require a better understanding of how factors like multilingual contexts and listeners' language proficiency interact with adverse listening environments. For example, word and phrase recognition is facilitated when competing voices speak different languages. Is this due to a "release from masking" arising from lower-level acoustic differences between languages and talkers, or to higher-level cognitive and linguistic factors? To address this question, we created a "one-man bilingual cocktail party" selective attention task using English and Mandarin speech from one bilingual talker to reduce low-level acoustic cues. In Experiment 1, 58 listeners more accurately recognized English targets when the distracting speech was Mandarin rather than English. Bilingual Mandarin–English listeners experienced significantly more interference and intrusions from the Mandarin distractor than English listeners did, and this was exacerbated by challenging target-to-masker ratios. In Experiment 2, 29 Mandarin–English bilingual listeners exhibited linguistic release from masking in both languages. Bilinguals experienced greater release from masking when attending to English, confirming an influence of linguistic knowledge on the "cocktail party" paradigm that is separate from primarily energetic masking effects. Effects of higher-order language processing and expertise emerge only in the most demanding target-to-masker contexts. The "one-man bilingual cocktail party" establishes a useful tool for future investigations and characterization of communication challenges in the large and growing worldwide community of Mandarin–English bilinguals.

     
    more » « less