FuzzE: Fuzzy Fairness Evaluation of Offensive Language Classifiers on African-American English

Rios, Anthony

doi:10.1609/aaai.v34i01.5434

Hate speech and offensive language are rampant on social media. Machine learning has provided a way to moderate foul language at scale. However, much of the current research focuses on overall performance. Models may perform poorly on text written in a minority dialectal language. For instance, a hate speech classifier may produce more false positives on tweets written in African-American Vernacular English (AAVE). To measure these problems, we need text written in both AAVE and Standard American English (SAE). Unfortunately, it is challenging to curate data for all linguistic styles in a timely manner, especially when we are constrained to specific problems or social media platforms, or by limited resources. In this paper, we answer the question, “How can we evaluate the performance of classifiers across minority dialectal languages when they are not present within a particular dataset?” Specifically, we propose an automated fairness fuzzing tool called FuzzE to quantify the fairness of text classifiers applied to AAVE text using a dataset that only contains text written in SAE. Overall, we find that the fairness estimates returned by our technique moderately correlate with those obtained using real ground-truth AAVE text. Warning: Offensive language is displayed in this manuscript.

Award ID(s): 1947697

PAR ID: 10412932

Author(s) / Creator(s): Rios, Anthony

Date Published: 2020-06-02

Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence

Volume: 34

Issue: 01

ISSN: 2159-5399

Page Range / eLocation ID: 881 to 889

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1609/aaai.v34i01.5434