Large language models (LLMs) are fast becoming ubiquitous and have shown impressive performance on various natural language processing (NLP) tasks. Annotating data for downstream applications is a resource-intensive task in NLP, and LLMs have recently been explored as cost-effective data annotators, either for labeling data used to train other models or as assistive tools. Yet little is known about the societal implications of using LLMs for data annotation. In this work, focusing on hate speech detection, we investigate how using LLMs such as GPT-4 and Llama-3 can lead to performance disparities across text dialects and to racial bias in online hate detection classifiers. We used LLMs to predict hate speech in seven hate speech datasets and trained classifiers on the LLM annotations of each dataset. Using tweets written in African-American English (AAE) and Standard American English (SAE), we show that classifiers trained on LLM annotations assign AAE tweets to negative classes (e.g., hate, offensive, abuse, racism) at a higher rate than SAE tweets, and that the classifiers have a higher false positive rate on AAE tweets. We also explore the effect of incorporating dialect priming into the prompts used for prediction, showing that introducing dialect information increases the rate at which AAE tweets are assigned to negative classes.
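The dialect-gap evaluation described above reduces to comparing false positive rates per dialect group among truly benign tweets. A minimal sketch of that measurement (function, labels, and toy data are illustrative, not from the paper):

```python
from collections import defaultdict

def false_positive_rate_by_group(y_true, y_pred, groups):
    """False positive rate per group: FP / (FP + TN) among true negatives.

    y_true / y_pred: 1 = negative class (e.g. "hate"), 0 = benign.
    groups: dialect label per example (e.g. "AAE" or "SAE").
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 0:  # truly benign tweet
            if p == 1:
                counts[g]["fp"] += 1
            else:
                counts[g]["tn"] += 1
    return {g: c["fp"] / (c["fp"] + c["tn"]) for g, c in counts.items()}

# Toy illustration: all eight tweets are benign, but the classifier
# flags more of the AAE tweets than the SAE tweets.
y_true = [0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
groups = ["AAE", "AAE", "AAE", "AAE", "SAE", "SAE", "SAE", "SAE"]
rates = false_positive_rate_by_group(y_true, y_pred, groups)
# Here rates["AAE"] (0.5) exceeds rates["SAE"] (0.25): the disparity
# pattern the abstract reports.
```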
DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules
Existing large language models (LLMs) that mainly focus on Standard American English (SAE) often perform significantly worse when applied to other English dialects. While existing mitigations tackle discrepancies for individual target dialects, they assume access to high-accuracy dialect identification systems. The boundaries between dialects are inherently flexible, making it difficult to categorize language into discrete predefined categories. In this paper, we propose DADA (Dialect Adaptation via Dynamic Aggregation), a modular approach to imbue SAE-trained models with multi-dialectal robustness by composing adapters that handle specific linguistic features. The compositional architecture of DADA allows for both targeted adaptation to specific dialect variants and simultaneous adaptation to various dialects. We show that DADA is effective for both single-task and instruction-finetuned language models, offering an extensible and interpretable framework for adapting existing LLMs to different English dialects.
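The dynamic aggregation step can be sketched as a per-input weighting of adapter outputs, added residually to the backbone's hidden state. This is a toy illustration under assumed shapes and scoring; the adapters, the scorer, and the dimensions here are invented for exposition and are not the paper's implementation:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def aggregate(hidden, adapters, scorer):
    """Dynamically weight each feature adapter's output for this input,
    then add the fused correction back to the hidden state (residual)."""
    outputs = [adapter(hidden) for adapter in adapters]
    weights = softmax([scorer(hidden, out) for out in outputs])
    fused = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(hidden))
    ]
    return [h + f for h, f in zip(hidden, fused)]

# Two toy "adapters", each standing in for one linguistic feature
# (e.g. copula absence, negative concord), plus a dot-product scorer.
adapters = [
    lambda h: [0.1 * x for x in h],
    lambda h: [-0.2 * x for x in h],
]
scorer = lambda h, out: sum(a * b for a, b in zip(h, out))
fused = aggregate([1.0, 2.0], adapters, scorer)
```

The key property illustrated is that the weights are computed per input, so no upstream dialect identification step is needed: whichever feature adapters fit the input receive more weight.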
- Award ID(s): 2247357
- PAR ID: 10506663
- Publisher / Repository: Association for Computational Linguistics
- Date Published:
- Journal Name: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Page Range / eLocation ID: 13776 to 13793
- Format(s): Medium: X
- Location: Singapore
- Sponsoring Org: National Science Foundation
More Like this
- 
            Online data collection allows for access to diverse populations. In the current study, we used online recruitment and data collection methods to obtain a corpus of read speech from adult talkers representing three authentic regional dialects of American English and one novel dialect created for the corpus. The authentic dialects (New England, Northern, and Southern American English) are each represented by 8–10 talkers, ranging in age from 22 to 75 years old. The novel dialect was produced by five Spanish-English bilinguals with training in linguistics, who were asked to produce Spanish /o/ in an otherwise English segmental context. One vowel contrast was selected for each dialect, in which the vowels within the contrast are acoustically more similar in the target dialect than in the other dialects. Each talker produced one familiar short story with 40 tokens of each vowel within the target contrast for their dialect, as well as a set of real words and nonwords that represent both the target vowel contrast for their dialect and the other three vowel contrasts for comparison across dialects. Preliminary acoustic analysis reveals both cross-dialect and within-dialect variability in the target vowel contrasts. The corpus materials are available to the scholarly community.
- 
            Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects.
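The dynamic-time-warping distance mentioned above aligns two variable-length frame sequences (e.g. per-frame MFCC vectors) before summing frame costs, so two tokens of the same vowel spoken at different rates can still score as similar. A minimal pure-Python sketch, using invented one-dimensional "MFCC" frames (real MFCC frames would have a dozen or more coefficients each):

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature
    frames, with Euclidean distance as the per-frame cost."""
    def frame_cost(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # d[i][j] = cost of the best alignment of seq_a[:i] with seq_b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_cost(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

# Toy trajectories for the "same vowel" at two speaking rates: DTW warps
# the slower token onto the faster one, so the distance stays zero.
fast = [[0.0], [1.0], [2.0]]
slow = [[0.0], [0.0], [1.0], [1.0], [2.0]]
```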
- 
            Identifying linguistic differences between dialects of a language often requires expert knowledge and meticulous human analysis, largely due to the complexity and nuance involved in studying dialectal variation. We present a novel approach to extracting distinguishing lexical features of dialects by utilizing interpretable dialect classifiers, even in the absence of human experts. We explore both post-hoc and intrinsic approaches to interpretability, conduct experiments on Mandarin, Italian, and Low Saxon, and experimentally demonstrate that our method successfully identifies key language-specific lexical features that contribute to dialectal variation.
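One simple stand-in for this kind of interpretable lexical-feature extraction is to rank words by their smoothed log-odds of occurring in one dialect's corpus versus another's. This sketch is illustrative only and is not the paper's method; the toy Scots-flavoured examples are invented:

```python
import math
from collections import Counter

def distinguishing_words(texts_a, texts_b, smoothing=1.0):
    """Rank vocabulary by smoothed log-odds of dialect A vs dialect B.

    Large positive scores mark lexical features of dialect A; large
    negative scores mark dialect B. Add-one smoothing avoids log(0).
    """
    counts_a = Counter(w for t in texts_a for w in t.split())
    counts_b = Counter(w for t in texts_b for w in t.split())
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    v = len(vocab)
    scores = {}
    for w in vocab:
        p_a = (counts_a[w] + smoothing) / (total_a + smoothing * v)
        p_b = (counts_b[w] + smoothing) / (total_b + smoothing * v)
        scores[w] = math.log(p_a / p_b)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = distinguishing_words(
    ["wee bairn at the kirk", "the wee kirk"],
    ["small child at the church", "the small church"],
)
# Dialect-A markers ("wee", "kirk", "bairn") surface at the top of the
# ranking; dialect-B markers ("small", "church") at the bottom.
```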
- 
            The retraction of /s/ in /str/, e.g. street, is a sound change found in certain English dialects. Previous work suggests that /s/-retraction arises from lower spectral frequency /s/ in /str/. The extent to which /s/-retraction differs across English dialects is unclear. This paper presents results from a large-scale acoustic phonetic study of sibilants in 420 speakers, drawn from 6 spontaneous speech corpora (9 dialects) of North American and Scottish English. Spectral Centre of Gravity was modelled from automatic measures of word-initial sibilants. Female speakers show higher frequency sibilants than males, but more so for /s/ than /ʃ/; /s/ is also higher in American than Canadian/Scottish dialects; /ʃ/ is surprisingly variable. /s/-retraction, modelled as retraction ratios, is generally greater for /str/ than /spr skr/, but varies by dialect; females show more retraction in /str/ than males. Dialectal and social factors clearly influence /s/-retraction in English clusters /sp st sk/, /spr skr/, and /str/.
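Spectral Centre of Gravity is the power-weighted mean frequency of a spectrum, so a retracted /s/ (energy shifted downward, towards /ʃ/) yields a lower value. A minimal sketch with invented toy spectra; the frequencies and power weights below are illustrative, not corpus measurements:

```python
def spectral_centre_of_gravity(freqs, power):
    """First spectral moment: the power-weighted mean frequency (Hz)."""
    total = sum(power)
    return sum(f * p for f, p in zip(freqs, power)) / total

# Toy 4-bin spectra: a plain /s/ concentrates energy high in the
# spectrum; a retracted /s/ (as in /str/) shifts energy downward.
freqs = [2000, 4000, 6000, 8000]
plain_s = spectral_centre_of_gravity(freqs, [1, 2, 7, 10])      # 6600.0 Hz
retracted_s = spectral_centre_of_gravity(freqs, [2, 10, 6, 2])  # 4800.0 Hz
```

A retraction ratio in the spirit of the abstract could then be formed by comparing a speaker's /s/ CoG in /str/ against their /s/ CoG in other contexts.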
 An official website of the United States government