This content will become publicly available on December 9, 2025

Title: Race Identification in American English
Purpose: This study examined the race identification of Southern American English speakers from two geographically distant regions in North Carolina. The purpose of this work is to explore how talkers' self-identified race, talker dialect region, and acoustic speech variables contribute to listener categorization of talker race. Method: Two groups of listeners heard a series of /h/–vowel–/d/ (/hVd/) words produced by Black and White talkers from East and West North Carolina, respectively. Results: Both Southern (North Carolina) and Midland (Indiana) listeners categorized the race of all speakers with greater-than-chance accuracy; however, West North Carolina Black talkers were categorized with the lowest accuracy, just above chance. Conclusions: The results suggest that similarities in the speech production patterns of West North Carolina Black and White talkers affect the racial categorization of Black, but not White, talkers. The results are discussed with respect to the acoustic spectral features of the voices present in the sample population.
Award ID(s):
2126414 2126405
PAR ID:
10559584
Publisher / Repository:
Journal of Speech, Language, and Hearing Research
Journal Name:
Journal of Speech, Language, and Hearing Research
Volume:
67
Issue:
12
ISSN:
1092-4388
Page Range / eLocation ID:
4614 to 4627
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. When listeners encounter a difficult-to-understand talker in a difficult-to-understand situation, their perceptual mechanisms can adapt, making the talker in the situation easier to understand. This study examined talker-specific perceptual adaptation experimentally by embedding speech from second-language (L2) English talkers in varying levels of noise and collecting transcriptions from first-language English listeners (ten talkers, 100 listeners per experiment). Experiments 1 and 2 demonstrated that prior experience with an L2 talker's speech presented first without noise and then with gradually increasing levels of noise facilitated recognition of that talker in loud noise. Experiment 3 tested whether adaptation is driven by tuning in to the talker's voice and speech patterns, by examining recognition of speech-in-loud-noise following experience with the talker in quiet. Finally, Experiment 4 tested whether adaptation is driven by tuning out the background noise, by measuring speech-in-loud-noise recognition after experience with the talker in consistently loud noise. The results showed that both tuning in to the talker and tuning out the noise contribute to talker-specific perceptual adaptation to L2 speech-in-noise.
  2. Speech recognition by both humans and machines frequently fails in non-optimal yet common situations. For example, word recognition error rates for second-language (L2) speech can be high, especially under conditions involving background noise. At the same time, both human and machine speech recognition sometimes show remarkable robustness against signal- and noise-related degradation. Which acoustic features of speech explain this substantial variation in intelligibility? Current approaches align speech to text to extract a small set of pre-defined spectro-temporal properties from specific sounds in particular words. However, variation in these properties leaves much cross-talker variation in intelligibility unexplained. We examine an alternative approach utilizing a perceptual similarity space acquired using self-supervised learning. This approach encodes distinctions between speech samples without requiring pre-defined acoustic features or speech-to-text alignment. We show that L2 English speech samples are less tightly clustered in the space than L1 samples, reflecting variability in English proficiency among L2 talkers. Critically, distances in this similarity space are perceptually meaningful: L1 English listeners have lower recognition accuracy for L2 speakers whose speech is more distant in the space from L1 speech. These results indicate that perceptual similarity may form the basis for an entirely new speech and language analysis approach.
  3. Multilingual speakers can find speech recognition in everyday environments like restaurants and open-plan offices particularly challenging. In a world where speaking multiple languages is increasingly common, effective clinical and educational interventions will require a better understanding of how factors like multilingual contexts and listeners’ language proficiency interact with adverse listening environments. For example, word and phrase recognition is facilitated when competing voices speak different languages. Is this due to a “release from masking” from lower-level acoustic differences between languages and talkers, or higher-level cognitive and linguistic factors? To address this question, we created a “one-man bilingual cocktail party” selective attention task using English and Mandarin speech from one bilingual talker to reduce low-level acoustic cues. In Experiment 1, 58 listeners more accurately recognized English targets when distracting speech was Mandarin compared to English. Bilingual Mandarin–English listeners experienced significantly more interference and intrusions from the Mandarin distractor than did English listeners, exacerbated by challenging target-to-masker ratios. In Experiment 2, 29 Mandarin–English bilingual listeners exhibited linguistic release from masking in both languages. Bilinguals experienced greater release from masking when attending to English, confirming an influence of linguistic knowledge on the “cocktail party” paradigm that is separate from primarily energetic masking effects. Effects of higher-order language processing and expertise emerge only in the most demanding target-to-masker contexts. The “one-man bilingual cocktail party” establishes a useful tool for future investigations and characterization of communication challenges in the large and growing worldwide community of Mandarin–English bilinguals.
  4. Speech categories are defined by multiple acoustic dimensions, and their boundaries are generally fuzzy and ambiguous, in part because listeners often give differential weighting to these cue dimensions during phonetic categorization. This study explored how a listener's perception of a speaker's socio-indexical and personality characteristics influences the listener's perceptual cue weighting. In a matched-guise study, three groups of listeners classified a series of gender-neutral /b/-/p/ continua that vary in VOT and F0 at the onset of the following vowel. Listeners were assigned to one of three prompt conditions (i.e., a visually male talker, a visually female talker, or audio-only) and rated the talker in terms of vocal (and facial, in the visual prompt conditions) gender prototypicality, attractiveness, friendliness, confidence, trustworthiness, and gayness. Male listeners and listeners who saw a male face showed less reliance on VOT compared to listeners in the other conditions. Listeners' visual evaluation of the talker also affected their weighting of VOT and onset F0 cues, although the effects of facial impressions differ depending on the gender of the listener. The results demonstrate that individual differences in perceptual cue weighting are modulated by the listener's gender and his/her subjective evaluation of the talker. These findings lend support to exemplar-based models of speech perception and production where socio-indexical features are encoded as a part of the episodic traces in the listeners' mental lexicon. This study also sheds light on the relationship between individual variation in cue weighting and community-level sound change by demonstrating that VOT and onset F0 co-variation in North American English has acquired a certain degree of socio-indexical significance.
  5. Communicating with a speaker with a different accent can affect one’s own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/–/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0 × VOT relationship reversed to create an “accent” with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners’ own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening and these adjustments transfer to influence listeners’ own speech productions.