skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Dynamic acoustic vowel distances within and across dialects
Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects.  more » « less
Award ID(s):
1843454
PAR ID:
10560980
Author(s) / Creator(s):
Publisher / Repository:
Acoustical Society of America
Date Published:
Journal Name:
The Journal of the Acoustical Society of America
Volume:
156
Issue:
4
ISSN:
0001-4966
Page Range / eLocation ID:
2497 to 2507
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Online data collection allows for access to diverse populations. In the current study, we used online recruitment and data collection methods to obtain a corpus of read speech from adult talkers representing three authentic regional dialects of American English and one novel dialect created for the corpus. The authentic dialects (New England, Northern, and Southern American English) are each represented by 8–10 talkers, ranging in age from 22 to 75 years old. The novel dialect was produced by five Spanish-English bilinguals with training in linguistics, who were asked to produce Spanish /o/ in an otherwise English segmental context. One vowel contrast was selected for each dialect, in which the vowels within the contrast are acoustically more similar in the target dialect than in the other dialects. Each talker produced one familiar short story with 40 tokens of each vowel within the target contrast for their dialect, as well as a set of real words and nonwords that represent both the target vowel contrast for their dialect and the other three vowel contrasts for comparison across dialects. Preliminary acoustic analysis reveals both cross-dialect and within-dialect variability in the target vowel contrasts. The corpus materials are available to the scholarly community. 
    more » « less
  2. Anticipatory coarticulation is a highly informative cue to upcoming linguistic information: listeners can identify that the word is ben and not bed by hearing the vowel alone. The present study compares the relative performances of human listeners and a self-supervised pre-trained speech model (wav2vec 2.0) in the use of nasal coarticulation to classify vowels. Stimuli consisted of nasalized (from CVN words) and non-nasalized (from CVCs) American English vowels produced by 60 humans and generated in 36 TTS voices. wav2vec 2.0 performance is similar to human listener performance, in aggregate. Broken down by vowel type: both wav2vec 2.0 and listeners perform higher for non-nasalized vowels produced naturally by humans. However, wav2vec 2.0 shows higher correct classification performance for nasalized vowels, than for non-nasalized vowels, for TTS voices. Speaker-level patterns reveal that listeners' use of coarticulation is highly variable across talkers. wav2vec 2.0 also shows cross-talker variability in performance. Analyses also reveal differences in the use of multiple acoustic cues in nasalized vowel classifications across listeners and the wav2vec 2.0. Findings have implications for understanding how coarticulatory variation is used in speech perception. Results also can provide insight into how neural systems learn to attend to the unique acoustic features of coarticulation. 
    more » « less
  3. Listeners attend to variation in segmental and prosodic cues when judging accent strength. The relative contributions of these cues to perceptions of accentedness in English remains open for investigation, although objective accent distance measures (such as Levenshtein distance) appear to be reliable tools for predicting perceptual distance. Levenshtein distance, however, only accounts for phonemic information in the signal. The purpose of the current study was to examine the relative contributions of phonemic (Levenshtein) and holistic acoustic (dynamic time warping) distances from the local accent to listeners’ accent rankings for nine non-local native and nonnative accents. Listeners (n =52) ranked talkers on perceived distance from the local accent (Midland American English) using a ladder task for three sentence-length stimuli. Phonemic and holistic acoustic distances between Midland American English and the other accents were quantified using both weighted and unweighted Levenshtein distance measures, and dynamic time warping (DTW). Results reveal that all three metrics contribute to perceived accent distance, with the weighted Levenshtein slightly outperforming the other measures. Moreover, the relative contribution of phonemic and holistic acoustic cues was driven by the speaker’s accent. Both nonnative and non-local native accents were included in this study, and the benefits of considering both of these accent groups in studying phonemic and acoustic cues used by listeners is discussed. 
    more » « less
  4. This study examines apparent-time variation in the use of multiple acoustic cues present on coarticulatorily nasalized vowels in California English. Eighty-nine listeners ranging in age from 18-58 (grouped into 3 apparent-time categories based on year of birth) performed lexical identifications on syllables excised from words with oral and nasal codas from six speakers who produced either minimal (n=3) or extensive (n=3) anticipatory nasal coarticulation (realized by greater vowel nasalization, F1 bandwidth, and diphthongization on vowels in CVN contexts). Results showed no differences across listeners’ identification for Extensively coarticulated vowels, as well as oral vowels by both types of speakers (all at-ceiling). Yet, performance for the Minimal Coarticulators’ nasalized vowels was lowest for the older listener group and increased over apparent-time. Perceptual cue-weighting analyses revealed that older listeners rely more on F1 bandwidth, while younger listeners rely more on acoustic nasality, as coarticulatory cues providing information about lexical identity. Thus, there is evidence for variation in apparent- time in the use of the different coarticulatory cues present on vowels. Younger listeners’ cue weighting allows them flexibility to identify lexical items given a range of coarticulatory variation across (here, younger) speakers, while older listeners’ cue weighting leads to reduced performance for talkers producing innovative phonetic forms. This study contributes to our understanding of the relationship between multidimensional acoustic features resulting from coarticulation and the perceptual re-weighting of cues that can lead to sound change over time. 
    more » « less
  5. To test the hypothesis that intraspeaker variation in vowel formants is related to the direction of diachronic change, we compare the direction of change in apparent time with the axis of intraspeaker variation in F1 and F2 for vowel phonemes in several corpora of North American and Scottish English. These vowels were measured automatically with a scheme (tested on hand-measured vowels) that considers the frequency, bandwidth, and amplitude of the first three formants in reference to a prototype. In the corpus data, we find that the axis of intraspeaker variation is typically aligned vertically, presumably corresponding to the degree of jaw opening for individual tokens, but for the North American GOOSE vowel, the axis of intraspeaker variation is aligned with the (horizontal) axis of diachronic change for this vowel across North America. This may help to explain why fronting and unrounding of high back vowels are common shifts across languages. 
    more » « less