Title: The stories and words online regional dialect (SWORD) corpus
Online data collection allows for access to diverse populations. In the current study, we used online recruitment and data collection methods to obtain a corpus of read speech from adult talkers representing three authentic regional dialects of American English and one novel dialect created for the corpus. The authentic dialects (New England, Northern, and Southern American English) are each represented by 8–10 talkers, ranging in age from 22 to 75 years old. The novel dialect was produced by five Spanish-English bilinguals with training in linguistics, who were asked to produce Spanish /o/ in an otherwise English segmental context. One vowel contrast was selected for each dialect, in which the vowels within the contrast are acoustically more similar in the target dialect than in the other dialects. Each talker produced one familiar short story with 40 tokens of each vowel within the target contrast for their dialect, as well as a set of real words and nonwords that represent both the target vowel contrast for their dialect and the other three vowel contrasts for comparison across dialects. Preliminary acoustic analysis reveals both cross-dialect and within-dialect variability in the target vowel contrasts. The corpus materials are available to the scholarly community.
Award ID(s):
1843454
PAR ID:
10537790
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
AIP Publishing
Date Published:
Journal Name:
Proceedings of meetings on acoustics
ISSN:
1939-800X
Page Range / eLocation ID:
060003
Format(s):
Medium: X
Location:
Ottawa, Ontario, Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects. 
  2. Existing large language models (LLMs) that mainly focus on Standard American English (SAE) often lead to significantly worse performance when being applied to other English dialects. While existing mitigations tackle discrepancies for individual target dialects, they assume access to high-accuracy dialect identification systems. The boundaries between dialects are inherently flexible, making it difficult to categorize language into discrete predefined categories. In this paper, we propose DADA (Dialect Adaptation via Dynamic Aggregation), a modular approach to imbue SAE-trained models with multi-dialectal robustness by composing adapters which handle specific linguistic features. The compositional architecture of DADA allows for both targeted adaptation to specific dialect variants and simultaneous adaptation to various dialects. We show that DADA is effective for both single task and instruction finetuned language models, offering an extensible and interpretable framework for adapting existing LLMs to different English dialects. 
  3. Purpose: This study examined the race identification of Southern American English speakers from two geographically distant regions in North Carolina. The purpose of this work is to explore how talkers' self-identified race, talker dialect region, and acoustic speech variables contribute to listener categorization of talker races. Method: Two groups of listeners heard a series of /h/–vowel–/d/ (/hVd/) words produced by Black and White talkers from East and West North Carolina, respectively. Results: Both Southern (North Carolina) and Midland (Indiana) listeners accurately categorized the race of all speakers with greater-than-chance accuracy; however, Western North Carolina Black talkers were categorized with the lowest accuracy, just above chance. Conclusions: The results suggest that similarities in the speech production patterns of West North Carolina Black and White talkers affect the racial categorization of Black, but not White talkers. The results are discussed with respect to the acoustic spectral features of the voices present in the sample population.
  4. This study describes linguistic and social factors favoring acquisition of a low back vowel contrast by native speakers of Canadian English living in New York City (NYC). Previous literature has found that new phonemic distinctions seem difficult to acquire, both in L2 and D2 (second dialect) learning contexts. In contrast, this analysis shows that Canadian expats who have been exposed to NYC English due to mobility show small but significant distinctions between the COT and CAUGHT classes. Intriguingly, the social factor most strongly influencing the magnitude of this new contrast is not total years spent in NYC or even identification as a New Yorker, but choice of partner: Canadians married to New Yorkers show greater COT/CAUGHT contrast. These findings suggest that long term, consistent input from a regular and important interlocutor may facilitate the acquisition of new contrasts in a second dialect. 
  5. Research has suggested that children who speak African American English (AAE) have difficulty using features produced in Mainstream American English (MAE) but not AAE to comprehend sentences in MAE. However, past studies mainly examined dialect features, such as verbal -s, that are produced as final consonants with shorter durations when produced in conversation, which impacts their phonetic saliency. Therefore, it is unclear if previous results are due to the phonetic saliency of the feature or how AAE speakers process MAE dialect features more generally. This study evaluated if there were group differences in how AAE- and MAE-speaking children used the auxiliary verbs was and were, a dialect feature with increased phonetic saliency but produced differently between the dialects, to interpret sentences in MAE. Participants aged 6;5–10;0 (years;months), who spoke MAE or AAE, completed the DELV-ST, a vocabulary measure (PVT), and a sentence comprehension task. In the sentence comprehension task, participants heard sentences in MAE that had either unambiguous or ambiguous subjects. Sentences with ambiguous subjects were used to evaluate group differences in sentence comprehension. AAE-speaking children were less likely than MAE-speaking children to use the auxiliary verbs was and were to interpret sentences in MAE. Furthermore, dialect density was predictive of Black participants' sensitivity to the auxiliary verb. This finding is consistent with how the auxiliary verb is produced between the two dialects: was is used to mark both singular and plural subjects in AAE, while MAE uses was for singular and were for plural subjects. This study demonstrated that even when the dialect feature is more phonetically salient, differences between how verb morphology is produced in AAE and MAE impact how AAE-speaking children comprehend MAE sentences.