Who Broke Amazon Mechanical Turk?: An Analysis of Crowdsourcing Data Quality over Time

Marshall, Catherine C.; Goguladinne, Partha S.R.; Maheshwari, Mudit; Sathe, Apoorva; Shipman, Frank M.

doi:10.1145/3578503.3583622

We present the results of a survey fielded in June of 2022 as a lens to examine recent data reliability issues on Amazon Mechanical Turk. We contrast bad data from this survey with bad data from the same survey fielded among US workers in October 2013, April 2018, and February 2019. Application of an established data cleaning scheme reveals that unusable data has risen from a little over 2% in 2013 to almost 90% in 2022. Through symptomatic diagnosis, we attribute the data reliability drop not to an increase in bad faith work, but rather to a continuum of English proficiency levels. A qualitative analysis of workers’ responses to open-ended questions allows us to distinguish between low fluency workers, ultra-low fluency workers, satisficers, and bad faith workers. We go on to show the effects of the new low fluency work on Likert scale data and on the study’s qualitative results. Attention checks are shown to be much less effective than they once were at identifying survey responses that should be discarded.

Award ID(s): 1816923

PAR ID: 10462328

Author(s) / Creator(s): Marshall, Catherine C.; Goguladinne, Partha S.R.; Maheshwari, Mudit; Sathe, Apoorva; Shipman, Frank M.

Date Published: 2023-04-30

Journal Name: WebSci '23: Proceedings of the 15th ACM Web Science Conference 2023

Page Range / eLocation ID: 335 to 345

Sponsoring Org: National Science Foundation

Conference Paper: https://doi.org/10.1145/3578503.3583622
