Title: Relating pronunciation distance metrics to intelligibility across English accents
Unfamiliar accents can cause word recognition challenges, particularly in noisy environments, but few studies have incorporated quantitative pronunciation distance metrics to explain intelligibility differences across accents. To address this gap, intelligibility was measured for 18 talkers (two from each of three first-language, one bilingual, and five second-language accents) in quiet and two noise conditions. The relations between intelligibility scores and two edit distance metrics, which quantify phonetic differences from a reference accent, were assessed. Intelligibility was quantified through both fuzzy string matching and percent words correct. Both edit distance metrics were significantly related to intelligibility scores; a heuristic edit distance metric was the best predictor of intelligibility for both scoring methods. Further, the effects of edit distance grew stronger as the listening condition increased in difficulty. Talker accent also contributed substantially to intelligibility models, but relations between accent and edit distance did not pattern consistently across the two talkers representing each accent. The frequency of production differences in vowels and consonants was negatively correlated with intelligibility, particularly for consonants. Together, these results suggest that a substantial amount of variability in intelligibility across accents can be predicted by phonetic differences from the listener’s home accent. However, talker- and accent-specific pronunciation features, including suprasegmental characteristics, must be quantified to fully explain intelligibility across talkers and listening conditions.
Award ID(s): 1941691
PAR ID: 10560262
Author(s) / Creator(s): ; ; ;
Publisher / Repository: Elsevier
Date Published:
Journal Name: Journal of Phonetics
Volume: 107
Issue: C
ISSN: 0095-4470
Page Range / eLocation ID: 101357
Format(s): Medium: X
Sponsoring Org: National Science Foundation
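The abstract above names two concrete scoring computations: phoneme-level edit distance from a reference accent and fuzzy string matching between the target sentence and the listener’s response. As a rough, assumption-laden illustration (not the authors' implementation, and not necessarily the heuristic metric they found best), the sketch below shows one common form of each; the function names and example data are hypothetical.

```python
# Illustrative sketch only: an unweighted phoneme-level edit distance and a
# fuzzy string-matching intelligibility score. Names and example data are
# hypothetical stand-ins, not taken from the study's materials.
from difflib import SequenceMatcher

def levenshtein(ref, hyp):
    """Unweighted edit distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def normalized_distance(ref, hyp):
    """Length-normalized distance, so longer items are not penalized more."""
    return levenshtein(ref, hyp) / max(len(ref), len(hyp))

def fuzzy_intelligibility(target, response):
    """Fuzzy string match (0-1) between a target sentence and a listener response."""
    return SequenceMatcher(None, target.lower(), response.lower()).ratio()

# Hypothetical example: reference-accent vs. talker phoneme strings for one word.
print(normalized_distance(["b", "ae", "th"], ["b", "a", "th"]))  # 1 edit / 3 phonemes ~ 0.33
print(fuzzy_intelligibility("the dog chased the cat", "the dog chase a cat"))
```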
More Like this
  1. Radek Skarnitzl & Jan Volín (Ed.)
    Unfamiliar native and non-native accents can cause word recognition challenges, particularly in noisy environments, but few studies have incorporated quantitative pronunciation distance metrics to explain intelligibility differences across accents. Here, intelligibility was measured for 18 talkers (two from each of three native, one bilingual, and five non-native accents) in three listening conditions (quiet and two noise conditions). Two variations of the Levenshtein pronunciation distance metric, which quantifies phonemic differences from a reference accent, were assessed for their ability to predict intelligibility. An unweighted Levenshtein distance metric was the best intelligibility predictor; talker accent further predicted performance. Accuracy did not fall along a native versus non-native divide. Thus, phonemic differences from the listener’s home accent primarily determine intelligibility, but other accent-specific pronunciation features, including suprasegmental characteristics, must be quantified to fully explain intelligibility across talkers and listening conditions. These results have implications for pedagogical practices and speech perception theories.
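    This study contrasts unweighted and weighted variants of the Levenshtein metric. As a minimal sketch (not the metric used in the paper), the weighted variant below grades substitution cost by shared phonetic features instead of charging a flat cost of 1; the feature table is an illustrative assumption and far smaller than a real one.

```python
# Illustrative sketch: a feature-weighted Levenshtein distance over phoneme strings.
# The feature table below is hypothetical.
FEATURES = {
    "t": {"consonant", "alveolar", "stop", "voiceless"},
    "d": {"consonant", "alveolar", "stop", "voiced"},
    "s": {"consonant", "alveolar", "fricative", "voiceless"},
    "i": {"vowel", "high", "front"},
    "e": {"vowel", "mid", "front"},
}

def sub_cost(a, b):
    """Substitution cost in [0, 1]: 1 minus the proportion of shared features."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a, set()), FEATURES.get(b, set())
    if not fa or not fb:
        return 1.0
    return 1.0 - len(fa & fb) / len(fa | fb)

def weighted_levenshtein(ref, hyp):
    """Dynamic-programming edit distance with graded substitution costs."""
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [float(i)]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1.0,                  # deletion
                            curr[j - 1] + 1.0,              # insertion
                            prev[j - 1] + sub_cost(r, h)))  # substitution
        prev = curr
    return prev[-1]

# /t/ -> /d/ differs only in voicing, so it costs less than a flat substitution.
print(weighted_levenshtein(["t", "i"], ["d", "i"]))  # 0.4 rather than 1.0
```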
  2. Children exhibit preferences for familiar accents early in life. However, they frequently have more difficulty distinguishing between first-language (L1) accents than second-language (L2) accents in categorization tasks. Few studies have addressed children’s perception of accent strength, or the relation between accent strength and objective measures of pronunciation distance. To address these gaps, 6- and 12-year-olds and adults ranked talkers’ perceived distance from the local accent (i.e., Midland American English). Rankings were compared with objective distance measures. Acoustic and phonetic distance measures were significant predictors of ladder rankings, but there was no evidence that children and adults differed significantly in their sensitivity to accent strength. Levenshtein distance, a phonetic distance metric, was the strongest predictor of perceptual rankings for both children and adults. As a percept, accent strength has critical implications for social judgments, which determine real-world social outcomes for talkers with non-local accents.
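    To make the ranking analysis described above concrete, the sketch below regresses perceived-distance rankings on objective distance measures with a random intercept per listener; the synthetic data, column names, and model structure are assumptions for illustration, not the study's actual analysis.

```python
# Illustrative only: relate perceived accent-distance rankings to objective
# distance measures, with a random intercept for each listener.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_listeners, n_talkers = 20, 9
lev = rng.uniform(0, 1, n_talkers)        # hypothetical phonetic (Levenshtein) distances
acoustic = rng.uniform(0, 1, n_talkers)   # hypothetical holistic acoustic distances

rows = []
for listener in range(n_listeners):
    noise = rng.normal(0, 0.5, n_talkers)
    rows.append(pd.DataFrame({
        "listener": listener,
        "talker": np.arange(n_talkers),
        "levenshtein": lev,
        "acoustic_dist": acoustic,
        # Made-up rankings driven mostly by phonetic distance.
        "ranking": 1 + 6 * lev + 2 * acoustic + noise,
    }))
df = pd.concat(rows, ignore_index=True)

model = smf.mixedlm("ranking ~ levenshtein + acoustic_dist",
                    data=df, groups=df["listener"])
print(model.fit().summary())
```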
  3. Recent work on perceptual learning for speech has suggested that while high-variability training typically results in generalization, low-variability exposure can sometimes be sufficient for cross-talker generalization. We tested predictions of a similarity-based account, according to which generalization depends on training-test talker similarity rather than on exposure to variability. We compared perceptual adaptation to second-language (L2) speech following single- or multiple-talker training with a round-robin design in which four L2 English talkers from four different first-language (L1) backgrounds served as both training and test talkers. After exposure to 60 L2 English sentences in one training session, cross-talker/cross-accent generalization was possible (but not guaranteed) following either multiple- or single-talker training, with variation across training-test talker pairings. Contrary to predictions of the similarity-based account, adaptation was not consistently better for identical than for mismatched training-test talker pairings, and generalization patterns were asymmetrical across training-test talker pairs. Acoustic analyses also revealed a dissociation between phonetic similarity and cross-talker/cross-accent generalization. Notably, variation in adaptation and generalization related to variation in training phase intelligibility. Together with prior evidence, these data suggest that perceptual learning for speech may benefit from some combination of exposure to talker variability, training-test similarity, and high training phase intelligibility.
  4. Listeners attend to variation in segmental and prosodic cues when judging accent strength. The relative contributions of these cues to perceptions of accentedness in English remain open for investigation, although objective accent distance measures (such as Levenshtein distance) appear to be reliable tools for predicting perceptual distance. Levenshtein distance, however, only accounts for phonemic information in the signal. The purpose of the current study was to examine the relative contributions of phonemic (Levenshtein) and holistic acoustic (dynamic time warping) distances from the local accent to listeners’ accent rankings for nine non-local native and nonnative accents. Listeners (n = 52) ranked talkers on perceived distance from the local accent (Midland American English) using a ladder task for three sentence-length stimuli. Phonemic and holistic acoustic distances between Midland American English and the other accents were quantified using both weighted and unweighted Levenshtein distance measures and dynamic time warping (DTW). Results reveal that all three metrics contribute to perceived accent distance, with the weighted Levenshtein distance slightly outperforming the other measures. Moreover, the relative contribution of phonemic and holistic acoustic cues was driven by the speaker’s accent. Both nonnative and non-local native accents were included in this study, and the benefits of considering both of these accent groups when studying the phonemic and acoustic cues used by listeners are discussed.
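    Dynamic time warping, used above as the holistic acoustic measure, aligns two frame-level feature sequences of unequal length and sums the local frame distances along the best alignment path. The sketch below is a minimal illustration using toy arrays in place of real MFCC matrices; it is not the acoustic pipeline used in the study.

```python
# Illustrative sketch: dynamic time warping (DTW) distance between two sequences
# of frame-level acoustic feature vectors (e.g., MFCC frames). The toy arrays
# below stand in for real feature matrices.
import numpy as np

def dtw_distance(x, y):
    """DTW alignment cost between feature matrices x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m] / (n + m)  # normalize by combined sequence length

# Hypothetical frame sequences for the same sentence from two talkers.
local_accent = np.random.randn(120, 13)   # ~120 frames x 13 coefficients
other_accent = np.random.randn(150, 13)
print(dtw_distance(local_accent, other_accent))
```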
  5.
    Automatic pronunciation assessment (APA) plays an important role in providing feedback for self-directed language learners in computer-assisted pronunciation training (CAPT). Several mispronunciation detection and diagnosis (MDD) systems have achieved promising performance based on end-to-end phoneme recognition. However, assessing the intelligibility of second-language (L2) speech remains a challenging problem. One issue is the lack of large-scale labeled speech data from non-native speakers. Additionally, relying on only one aspect (e.g., accuracy) at the phonetic level may not provide a sufficient assessment of pronunciation quality and L2 intelligibility. It is possible to leverage segmental/phonetic-level features such as goodness of pronunciation (GOP); however, feature granularity may cause a discrepancy in prosodic-level (suprasegmental) pronunciation assessment. In this study, Wav2vec 2.0-based MDD and a GOP feature-based Transformer are employed to characterize L2 intelligibility. Here, an L2 speech dataset with human-annotated prosodic (suprasegmental) labels is used for multi-granular and multi-aspect pronunciation assessment and for identifying factors important for intelligibility in L2 English speech. The study provides a comparative assessment of automated pronunciation scores and the relationship between suprasegmental features and listener perceptions, which together can help support the development of instantaneous assessment tools and solutions for L2 learners.
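    Goodness of pronunciation (GOP) scores come in several formulations; one common version scores each canonical phone by its average log posterior over the frames aligned to it. The sketch below illustrates that version with made-up posteriors and a made-up alignment, and is not necessarily the formulation used in this work.

```python
# Illustrative sketch of a common GOP formulation: for each canonical phone,
# average the log posterior probability of that phone over its aligned frames.
# Posteriors and the alignment below are stand-ins for acoustic-model output.
import numpy as np

def gop(frame_posteriors, phone_index, frame_range):
    """GOP score for one phone: mean log posterior over its aligned frames."""
    start, end = frame_range
    segment = frame_posteriors[start:end, phone_index]
    return float(np.mean(np.log(segment + 1e-10)))

# Hypothetical posteriors: 50 frames x 40 phone classes.
posteriors = np.random.dirichlet(np.ones(40), size=50)
# Canonical phone with index 7 aligned to frames 10-20; values near 0 suggest
# the expected phone was produced, large negative values flag mispronunciation.
print(gop(posteriors, phone_index=7, frame_range=(10, 20)))
```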