Title: The Expertise Involved in Deciding which HITs are Worth Doing on Amazon Mechanical Turk
Crowdworkers depend on Amazon Mechanical Turk (AMT) as an important source of income, and it is left to workers to determine which tasks on AMT are fair and worth completing. While existing tools assist workers in making these decisions, workers still spend significant amounts of time finding fair labor. Difficulties in this process may contribute to the imbalance between workers' median hourly earnings ($2.00/hour) and what the average requester pays ($11.00/hour). In this paper, we study how novices and experts select which tasks are worth doing, and we argue that differences between the two populations likely contribute to the wage imbalance. For this purpose, we first examine workers' comments in TurkOpticon (a tool where workers share their experience with requesters on AMT). We use this study to begin unraveling what fair labor means to workers. In particular, we identify the characteristics of labor that workers consider to be of "good quality" and of "poor quality" (e.g., work that pays too little). Armed with this knowledge, we then conduct an experiment to study how experts and novices rate tasks of both kinds. We find that experts and novices treat good-quality labor in the same way, but differ significantly in how they rate poor-quality labor and in whether they believe it is worth doing. This points to several future directions, including machine learning models that support workers in detecting poor-quality labor, and paths for educating novice workers on how to make better labor decisions on AMT.
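As a rough illustration of the closing direction above, machine learning support for detecting poor-quality labor, the sketch below trains a toy classifier on hypothetical HIT features. The features (posted reward, estimated minutes, requester rating, e.g., from a tool like TurkOpticon), the example data, and the labels are placeholders for illustration and do not come from the paper.

```python
# Toy sketch of a model that flags likely poor-quality HITs before a worker
# accepts them. Features, data, and labels are hypothetical placeholders;
# the paper only proposes such models as a future direction.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per HIT: [reward in USD, estimated minutes,
# requester rating on a 1-5 scale].
X_train = np.array([
    [0.05, 10.0, 2.1],   # low pay, long task, poorly rated requester
    [1.50,  8.0, 4.6],   # pays roughly $11/hour, well-rated requester
    [0.10, 15.0, 3.0],
    [2.00, 12.0, 4.8],
])
y_train = np.array([1, 0, 1, 0])  # 1 = poor quality, 0 = good quality

model = LogisticRegression().fit(X_train, y_train)

# Score a newly posted HIT before deciding whether it is worth doing.
new_hit = np.array([[0.25, 20.0, 2.5]])
print("poor-quality probability:", model.predict_proba(new_hit)[0, 1])
```

In practice, the labels for such a model would have to come from workers' own judgments, which is exactly where the differences between experts and novices reported above matter.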
Award ID(s):
1928528
PAR ID:
10276156
Author(s) / Creator(s):
Date Published:
Journal Name:
Computer Supported Cooperative Work (CSCW)
ISSN:
1573-7551
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Crowdsourcing markets provide workers with a centralized place to find paid work. What may not be obvious at first glance is that, in addition to the work they do for pay, crowd workers also have to shoulder a variety of unpaid invisible labor in these markets, which ultimately reduces their hourly wages. Invisible labor includes finding good tasks, messaging requesters, and managing payments. However, we currently know little about how much time crowd workers actually spend on invisible labor or how much it costs them economically. To ensure a fair and equitable future for crowd work, we need to be certain that workers are being paid fairly for ALL of the work they do. In this paper, we conduct a field study to quantify the invisible labor in crowd work. We build a plugin to record the amount of time that 100 workers on Amazon Mechanical Turk dedicate to invisible labor while completing 40,903 tasks. If we ignore the time workers spent on invisible labor, workers' median hourly wage was $3.76. But we estimated that crowd workers in our study spent 33% of their time daily on invisible labor, dropping their median hourly wage to $2.83. We found that invisible labor differentially impacts workers depending on their skill level and demographics. The invisible labor category that took the most time, and that was also the most common, revolved around workers having to manage their payments. The second most time-consuming category involved hyper-vigilance, where workers vigilantly watched requesters' profiles for newly posted work or vigilantly searched for labor. We hope that through our paper the invisible labor in crowdsourcing becomes more visible, and that our results help reveal the larger implications of the continuing invisibility of labor in crowdsourcing.
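A quick back-of-the-envelope check of the wage figures above: the drop from $3.76 to about $2.83 is reproduced below under the assumption that the 33% of invisible labor is counted relative to paid task time; the study's exact accounting may differ.

```python
# Back-of-the-envelope reconstruction of the reported wage drop.
# Assumption (not stated explicitly in the abstract): every paid hour
# carries an extra 0.33 hours of unpaid invisible labor.
paid_wage = 3.76           # median hourly wage counting only paid task time
invisible_fraction = 0.33  # unpaid time per unit of paid time (assumed)

effective_wage = paid_wage / (1 + invisible_fraction)
print(f"effective hourly wage: ${effective_wage:.2f}")  # ~$2.83
```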
  2. Many AI system designers grapple with how best to collect human input for different types of training data. Online crowds provide a cheap, on-demand source of intelligence, but they often lack the expertise required in many domains. Experts offer tacit knowledge and more nuanced input, but they are harder to recruit. To explore this trade-off, we compared novices and experts in terms of performance and perceptions on human intelligence tasks in the context of designing a text-based conversational agent. We developed a preliminary chatbot that simulates conversations with someone seeking mental health advice, to help educate volunteer listeners at 7cups.com. We then recruited experienced listeners (domain experts) and novice MTurk workers (crowd workers) to conduct tasks of varying complexity to improve the chatbot. Novice crowds performed comparably to experts on tasks that only require natural language understanding, such as correcting how the system classifies a user statement. For more generative tasks, like creating new lines of chatbot dialogue, the experts demonstrated higher quality, novelty, and emotion. We also uncovered a motivational gap: crowd workers enjoyed the interactive tasks, while experts found the work tedious and repetitive. We offer design considerations for allocating crowd workers and experts to input tasks for AI systems, and for better motivating experts to participate in low-level data work for AI.
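One way to read the allocation finding above is as a simple routing rule: send understanding-style tasks to the crowd and generative tasks to experts. The sketch below illustrates that reading; the task-type labels and worker pools are hypothetical, not an interface from the paper.

```python
# Illustrative routing rule based on the finding above: novices handle
# natural-language-understanding tasks well, while generative dialogue
# writing benefits from domain experts. Labels below are hypothetical.
UNDERSTANDING_TASKS = {"classify_user_statement", "correct_intent_label"}
GENERATIVE_TASKS = {"write_chatbot_reply", "author_new_dialogue_line"}

def route_task(task_type: str) -> str:
    if task_type in UNDERSTANDING_TASKS:
        return "crowd_workers"    # comparable quality, cheaper, on demand
    if task_type in GENERATIVE_TASKS:
        return "domain_experts"   # higher quality, novelty, and emotion
    return "needs_manual_review"

print(route_task("classify_user_statement"))  # crowd_workers
print(route_task("write_chatbot_reply"))      # domain_experts
```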
  3. Ethical decision-making is difficult, certainly for robots let alone humans. If a robot's ethical decision-making process is to be designed based on some approximation of how humans operate, then the assumption is that a good model of how humans make an ethical choice is readily available. Yet no single ethical framework seems sufficient to capture the diversity of human ethical decision making. Our work seeks to develop the computational underpinnings that will allow a robot to use multiple ethical frameworks that guide it towards doing the right thing. As a step towards this goal, we have collected data investigating how regular adults and ethics experts approach ethical decisions related to robot use in a healthcare scenario and a game-playing scenario. The decisions made by the former group are intended to represent an approximation of a folk-morality approach to these dilemmas. The experts, on the other hand, were asked to judge what decision would result if a person were using one of several different types of ethical frameworks. The resulting data may reveal which features of the pill-sorting and game-playing scenarios contribute to similarities and differences between expert and non-expert responses. A robot programmed with this type of approach may one day be able to rely on specific features of an interaction to determine which ethical framework to use in its decision making.
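The closing idea above, selecting an ethical framework from features of the interaction, could be prototyped as a small dispatcher like the one below. The feature names, thresholds, and framework mapping are invented purely for illustration; the paper reports data collection toward such a mechanism, not this rule set.

```python
# Invented toy dispatcher: choose an ethical framework from features of
# the interaction. Feature names, thresholds, and the mapping are purely
# illustrative and are not taken from the paper.
def choose_framework(features: dict) -> str:
    if features.get("risk_of_harm", 0.0) > 0.7:
        return "deontological"      # hard constraints when harm is likely
    if features.get("outcome_measurable", False):
        return "consequentialist"   # optimize a measurable outcome
    return "virtue_ethics"          # fall back to character-based judgment

pill_sorting = {"risk_of_harm": 0.9, "outcome_measurable": True}
game_playing = {"risk_of_harm": 0.1, "outcome_measurable": True}
print(choose_framework(pill_sorting))  # deontological
print(choose_framework(game_playing))  # consequentialist
```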
  4. Crowd workers struggle to earn adequate wages. Given the limited task-related information provided on crowd platforms, workers often fail to estimate how long it would take to complete certain microtasks. Although a few third-party tools and online communities provide estimates of working times, such information is limited to microtasks that have been previously completed by other workers, and such tasks are usually booked immediately by experienced workers. This paper presents a computational technique for predicting microtask working times (i.e., how much time it takes to complete a microtask) based on workers' past experiences with similar tasks. Two challenges were addressed during development of the proposed predictive model: (i) collecting sufficient training data labeled with accurate working times, and (ii) evaluating and optimizing the prediction model. The paper first describes how 7,303 microtask submission records were collected using a web browser extension installed by 83 Amazon Mechanical Turk (AMT) workers, designed to capture the diversity of worker behavior and to record working times accurately. It then describes the challenges of defining evaluation and objective functions that reflect workers' tolerance for prediction errors. To this end, surveys were conducted on AMT asking workers how they felt about prediction errors in working times for microtasks, simulated using an "imaginary" AI system. Based on 91,060 survey responses submitted by 875 workers, objective and evaluation functions were derived for the prediction model to reflect whether the calculated prediction errors would be tolerated by workers. Evaluation results based on worker perceptions of prediction errors revealed that the proposed model was capable of predicting worker-tolerable working times in 73.6% of all tested microtask cases. Further, the derived objective function contributed to accurate predictions across microtasks with more diverse durations.
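The tolerance-aware evaluation described above can be sketched as a function that counts a predicted working time as acceptable when its error stays within a band workers tolerate. The fixed +/-25% band below is an assumed placeholder; the paper derives its evaluation and objective functions from the 91,060 survey responses rather than from a fixed ratio.

```python
# Sketch of a tolerance-aware evaluation in the spirit described above:
# a predicted working time counts as acceptable if its error falls inside
# the band workers tolerate. The +/-25% band is an assumed placeholder.
def within_tolerance(predicted_min: float, actual_min: float,
                     tolerance: float = 0.25) -> bool:
    return abs(predicted_min - actual_min) <= tolerance * actual_min

def tolerable_rate(predictions, actuals) -> float:
    hits = sum(within_tolerance(p, a) for p, a in zip(predictions, actuals))
    return hits / len(actuals)

# Example: a model that over- or under-estimates some task durations.
preds   = [4.0, 10.0, 30.0, 2.0]   # predicted minutes
actuals = [5.0,  9.0, 20.0, 2.1]   # actual minutes
print(f"worker-tolerable predictions: {tolerable_rate(preds, actuals):.0%}")
```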
  5. Dialog system developers need high-quality data to train, fine-tune and assess their systems. They often use crowdsourcing for this since it provides large quantities of data from many workers. However, the data may not be of sufficiently good quality. This can be due to the way that the requester presents a task and how they interact with the workers. This paper introduces DialCrowd 2.0 to help requesters obtain higher quality data by, for example, presenting tasks more clearly and facilitating effective communication with workers. DialCrowd 2.0 guides developers in creating improved Human Intelligence Tasks (HITs) and is directly applicable to the workflows used currently by developers and researchers. 