Impact of Annotator Demographics on Sentiment Dataset Labeling

Ding, Yi; You, Jacob; Machulla, Tonja-Katrin; Jacobs, Jennifer; Sen, Pradeep; Höllerer, Tobias

doi:10.1145/3555632

As machine learning methods become more powerful and capture more nuances of human behavior, biases in a dataset can shape what a model learns and how it is evaluated. This paper explores and attempts to quantify the uncertainties and biases due to annotator demographics when creating sentiment analysis datasets. We ask more than 1,000 crowdworkers to provide their demographic information and annotations for multimodal sentiment data and its component modalities. We show that demographic differences among annotators have a significant effect on their ratings, and that these effects also occur in each component modality. We compare predictions of different state-of-the-art multimodal machine learning algorithms against annotations provided by different demographic groups, and find that changing annotator demographics can cause an accuracy difference of more than 4.5 when determining positive versus negative sentiment. Our findings underscore the importance of accounting for crowdworker attributes, such as demographics, when building datasets, evaluating algorithms, and interpreting results for sentiment analysis.
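The comparison described in the abstract, scoring one set of model predictions against labels from different annotator groups, can be illustrated with a minimal sketch. This is not the authors' code or data: the function names, the toy ratings, the group names `g1` and `g2`, and the rule that an item counts as positive when a group's mean rating is above zero are all assumptions made for illustration.

```python
from collections import defaultdict


def group_labels(annotations):
    """Build one binary label per (group, item) from raw ratings.

    annotations: iterable of (item_id, group, rating) tuples, where rating
    is a signed sentiment score. Assumption: an item is labeled positive
    for a group when that group's mean rating is strictly above zero.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (group, item) -> [total, count]
    for item_id, group, rating in annotations:
        cell = sums[(group, item_id)]
        cell[0] += rating
        cell[1] += 1
    labels = defaultdict(dict)
    for (group, item_id), (total, count) in sums.items():
        labels[group][item_id] = total / count > 0
    return dict(labels)


def accuracy_by_group(predictions, annotations):
    """Score the same predictions against each group's labels.

    predictions: dict mapping item_id to a predicted boolean (True = positive).
    Returns a dict mapping group to accuracy over the items it annotated.
    """
    result = {}
    for group, item_labels in group_labels(annotations).items():
        scored = [predictions[i] == label
                  for i, label in item_labels.items() if i in predictions]
        result[group] = sum(scored) / len(scored) if scored else float("nan")
    return result


# Hypothetical toy data: two items rated by two demographic groups.
toy_annotations = [
    ("a", "g1", 2), ("a", "g1", 1), ("a", "g2", -1),
    ("b", "g1", -2), ("b", "g2", -1),
]
toy_predictions = {"a": True, "b": False}
acc = accuracy_by_group(toy_predictions, toy_annotations)
```

In this toy case the predictions are unchanged, yet the measured accuracy depends on which group's labels serve as ground truth, which is the kind of effect the abstract reports.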

Award ID(s): 1911230

PAR ID: 10477639

Author(s) / Creator(s): Ding, Yi; You, Jacob; Machulla, Tonja-Katrin; Jacobs, Jennifer; Sen, Pradeep; Höllerer, Tobias

Publisher / Repository: ACM CSCW

Date Published: 2022-11-07

Journal Name: Proceedings of the ACM on Human-Computer Interaction

Volume: 6

Issue: CSCW2

ISSN: 2573-0142

Page Range / eLocation ID: 1 to 22

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1145/3555632
