Title: Automatic Dialect Density Estimation for African American English
In this paper, we explore automatic prediction of dialect density for the African American English (AAE) dialect, where dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect. We investigate several acoustic and language modeling features, including the commonly used X-vector representation and the ComParE feature set, in addition to information extracted from ASR transcripts of the audio files and prosodic information. To address issues of limited labeled data, we use a weakly supervised model to project prosodic and X-vector features into low-dimensional, task-relevant representations. An XGBoost model is then used to predict the speaker's dialect density from these features and to show which of them are most significant during inference. We evaluate the utility of these features both alone and in combination for the given task. This work, which does not rely on hand-labeled transcripts, is performed on audio segments from the CORAAL database. We show a significant correlation between our predicted and ground-truth dialect density measures for AAE speech in this database and propose this work as a tool for explaining and mitigating bias in speech technology.
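The pipeline described above (fused utterance-level features feeding a gradient-boosted regressor) can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: the data is synthetic, and the fused feature dimensionality and train/test split are assumptions made only so the example runs end to end.

```python
# Minimal sketch, not the authors' code: regress dialect density on fused
# utterance-level features with XGBoost. All data below is synthetic, and the
# dimensionality (512 projected X-vector dims + 16 prosodic stats) is an
# assumption for illustration only.
import numpy as np
from scipy.stats import pearsonr
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n_utts = 200
X = rng.normal(size=(n_utts, 512 + 16))        # fused feature vectors
y = rng.uniform(0.0, 1.0, size=n_utts)         # dialect density in [0, 1]

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X[:150], y[:150])

pred = model.predict(X[150:])
r, p = pearsonr(pred, y[150:])                 # agreement with ground truth
print(f"Pearson r = {r:.3f} (p = {p:.3f})")

# Tree ensembles expose per-feature importances, one simple way to ask which
# inputs matter most during inference, in the spirit of the analysis above.
top = np.argsort(model.feature_importances_)[::-1][:5]
print("most important feature indices:", top)
```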
Award ID(s):
2202585
PAR ID:
10426185
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
ISCA
Date Published:
Journal Name:
INTERSPEECH 2023
Page Range / eLocation ID:
1283-1287
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper evaluates an innovative framework for spoken dialect density prediction on children's and adults' African American English. A speaker's dialect density is defined as the frequency with which dialect-specific language characteristics occur in their speech. Rather than treating the presence or absence of a target dialect in a user's speech as a binary decision, a classifier is trained to predict the level of dialect density, providing a higher degree of specificity in downstream tasks. To this end, self-supervised learning representations from HuBERT, handcrafted grammar-based features extracted from ASR transcripts, prosodic features, and other feature sets are evaluated as inputs to an XGBoost classifier, which is trained to assign dialect density labels to short recorded utterances. The system achieves high dialect density level classification accuracy for child and adult speech and demonstrates robust performance across age and regional varieties of dialect. Additionally, this work serves as a basis for analyzing which acoustic and grammatical cues affect machine perception of dialect. A sketch of the feature-extraction-plus-classifier setup follows below.
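A minimal sketch of that setup, assuming a mean-pooled HuBERT utterance embedding fed to XGBoost. The checkpoint choice, pooling strategy, number of density levels, and the random stand-in audio are all illustrative assumptions, not details from the paper.

```python
# Sketch: HuBERT embeddings -> XGBoost dialect density level classifier.
# Assumptions: facebook/hubert-base-ls960 checkpoint, mean pooling over
# frames, four discrete density levels; audio is random stand-in data.
import numpy as np
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor
from xgboost import XGBClassifier

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

def utterance_embedding(waveform_16k: np.ndarray) -> np.ndarray:
    """Mean-pool HuBERT hidden states into one vector per utterance."""
    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = hubert(**inputs).last_hidden_state   # (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

rng = np.random.default_rng(0)
X = np.stack([utterance_embedding(rng.normal(size=16000)) for _ in range(8)])
y = np.array([0, 1, 2, 3, 0, 1, 2, 3])                # assumed density levels
clf = XGBClassifier(n_estimators=100).fit(X, y)
print(clf.predict(X[:2]))
```

In the paper these acoustic representations are combined with grammar-based and prosodic features before classification; only the acoustic branch is shown here.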
  2. IEEE SIGNAL PROCESSING SOCIETY (Ed.)
    This paper presents a novel system which utilizes acoustic, phonological, morphosyntactic, and prosodic information for binary automatic dialect detection of African American English. We train this system on adult speech data and then evaluate it on both children's and adults' speech under unmatched training and testing scenarios. The proposed system combines novel and state-of-the-art architectures, including a multi-source transformer language model pre-trained on Twitter text data and fine-tuned on ASR transcripts, as well as an LSTM acoustic model trained on self-supervised learning representations, in order to learn a comprehensive view of dialect. We show robust, explainable performance across recording conditions for different features on adult speech, but fusing multiple features is important for good results on children's speech. The acoustic branch is sketched below.
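A minimal sketch of an LSTM acoustic model over per-frame self-supervised representations; the feature dimension, single LSTM layer, and sigmoid head are assumed for illustration rather than taken from the paper.

```python
# Sketch: LSTM binary dialect detector over per-frame SSL features.
# Assumptions: 768-dim input frames, one LSTM layer, and the final hidden
# state mapped to a single AAE / non-AAE logit.
import torch
import torch.nn as nn

class LSTMDialectDetector(nn.Module):
    def __init__(self, feat_dim: int = 768, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim) sequence of SSL representations
        _, (h_n, _) = self.lstm(frames)
        return self.head(h_n[-1]).squeeze(-1)   # one logit per utterance

model = LSTMDialectDetector()
logits = model(torch.randn(4, 200, 768))        # 4 utterances, 200 frames
print(torch.sigmoid(logits))                    # P(dialect present)
```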
  3. Large language models (LLMs) are fast becoming ubiquitous and have shown impressive performance in various natural language processing (NLP) tasks. Annotating data for downstream applications is a resource-intensive task in NLP. Recently, the use of LLMs as cost-effective data annotators, labeling data used to train other models or serving as an assistive tool, has been explored. Yet, little is known regarding the societal implications of using LLMs for data annotation. In this work, focusing on hate speech detection, we investigate how using LLMs such as GPT-4 and Llama-3 for hate speech detection can lead to performance disparities across text dialects and to racial bias in online hate detection classifiers. We used LLMs to predict hate speech in seven hate speech datasets and trained classifiers on the LLM annotations of each dataset. Using tweets written in African-American English (AAE) and Standard American English (SAE), we show that classifiers trained on LLM annotations assign tweets written in AAE to negative classes (e.g., hate, offensive, abuse, racism) at a higher rate than tweets written in SAE, and that the classifiers have a higher false positive rate on AAE tweets. We explore the effect of incorporating dialect priming in the prompting techniques used in prediction, showing that introducing dialect increases the rate at which AAE tweets are assigned to negative classes. The per-dialect false-positive-rate comparison is sketched below.
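The core bias measurement, comparing false positive rates by dialect group, reduces to a few lines. The sketch below uses synthetic labels in place of the seven annotated datasets; the variable names are illustrative.

```python
# Sketch: compare a classifier's false positive rate on AAE vs. SAE tweets.
# Labels and predictions are synthetic stand-ins for the real datasets.
import numpy as np

def false_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of true negatives (non-hate) predicted as positive (hate)."""
    negatives = y_true == 0
    return float((y_pred[negatives] == 1).mean())

rng = np.random.default_rng(0)
dialect = rng.choice(["AAE", "SAE"], size=1000)   # dialect of each tweet
y_true = rng.integers(0, 2, size=1000)            # gold hate/non-hate labels
y_pred = rng.integers(0, 2, size=1000)            # classifier trained on LLM labels

for d in ("AAE", "SAE"):
    mask = dialect == d
    print(d, round(false_positive_rate(y_true[mask], y_pred[mask]), 3))
```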
  4. Research has suggested that children who speak African American English (AAE) have difficulty using features produced in Mainstream American English (MAE), but not in AAE, to comprehend sentences in MAE. However, past studies mainly examined dialect features, such as verbal -s, that are produced as final consonants with shorter durations in conversation, which impacts their phonetic saliency. Therefore, it is unclear whether previous results are due to the phonetic saliency of the feature or to how AAE speakers process MAE dialect features more generally. This study evaluated whether there were group differences in how AAE- and MAE-speaking children used the auxiliary verbs was and were, a dialect feature with increased phonetic saliency but produced differently between the dialects, to interpret sentences in MAE. Participants aged 6;5–10;0 (years;months) who spoke MAE or AAE completed the DELV-ST, a vocabulary measure (PVT), and a sentence comprehension task. In the sentence comprehension task, participants heard sentences in MAE that had either unambiguous or ambiguous subjects. Sentences with ambiguous subjects were used to evaluate group differences in sentence comprehension. AAE-speaking children were less likely than MAE-speaking children to use the auxiliary verbs was and were to interpret sentences in MAE. Furthermore, dialect density was predictive of Black participants' sensitivity to the auxiliary verb. This finding is consistent with how the auxiliary verb is produced in the two dialects: was is used to mark both singular and plural subjects in AAE, while MAE uses was for singular and were for plural subjects. This study demonstrated that even when the dialect feature is more phonetically salient, differences between how verb morphology is produced in AAE and MAE impact how AAE-speaking children comprehend MAE sentences. A sketch of the density-sensitivity analysis follows below.
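One way to test whether dialect density predicts sensitivity to the auxiliary contrast is a simple logistic regression. The sketch below uses simulated data; the variable names and effect size are assumptions, not values from the study.

```python
# Sketch: does per-child dialect density predict use of the was/were cue?
# All data is simulated; the negative slope is built in for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
density = rng.uniform(0, 1, size=120)             # per-child dialect density
p_use = 1 / (1 + np.exp(-(1.0 - 2.5 * density)))  # assumed decreasing trend
used_auxiliary_cue = rng.binomial(1, p_use)       # 1 = interpreted via was/were

X = sm.add_constant(density)
fit = sm.Logit(used_auxiliary_cue, X).fit(disp=0)
print(fit.params)                                 # intercept, density slope
```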
  5. We investigate how annotators’ insensitivity to differences in dialect can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations. We first uncover unexpected correlations between surface markers of African American English (AAE) and ratings of toxicity in several widely used hate speech datasets. Then, we show that models trained on these corpora acquire and propagate these biases, such that AAE tweets and tweets by self-identified African Americans are up to two times more likely to be labelled as offensive compared to others. Finally, we propose dialect and race priming as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive. 
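The first step described in (5), checking whether surface markers of AAE correlate with toxicity ratings, can be sketched with a point-biserial correlation. The data below is synthetic, and the binary marker indicator is an assumed stand-in for a real dialect estimator run over an annotated corpus.

```python
# Sketch: correlation between an AAE surface-marker indicator and toxicity
# ratings. Synthetic data; a real analysis would use a dialect estimator's
# output over a hate speech corpus with human toxicity annotations.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)
has_aae_marker = rng.integers(0, 2, size=500)            # marker present?
toxicity = rng.normal(0.3 + 0.1 * has_aae_marker, 0.1)   # annotator ratings

r, p = pointbiserialr(has_aae_marker, toxicity)
print(f"point-biserial r = {r:.3f}, p = {p:.3g}")
```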