Title: Performance of Paid and Volunteer Image Labeling in Citizen Science — A Retrospective Analysis
Citizen science projects that rely on human computation can solicit volunteers or recruit paid workers through microwork platforms such as Amazon Mechanical Turk. To better understand these two approaches, this paper analyzes crowdsourced image-label data from an environmental justice project studying wetland loss off the coast of Louisiana. The retrospective analysis identifies key differences between the two populations: Mechanical Turk workers are accessible, cost-efficient, and rate more images than volunteers on average, but their labels are of lower quality, whereas volunteers achieve high accuracy with comparatively few votes. Volunteer organizations can also interface with the educational or outreach goals of an organization in ways that the limited context of microwork prevents.
Award ID(s): 1816426
PAR ID: 10382191
Journal Name: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing
Volume: 10
Issue: 1
ISSN: 2769-1330
Page Range / eLocation ID: 64 to 73
Format(s): Medium: X
Sponsoring Org: National Science Foundation
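
The comparison in the abstract above rests on aggregating multiple crowd labels per image and checking the consensus against reference labels. The sketch below illustrates that general idea with simple majority voting; the function names, the toy data, and the choice of plurality voting are assumptions for illustration, not the paper's actual aggregation method or data.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the votes cast for one image."""
    return Counter(labels).most_common(1)[0][0]

def pool_accuracy(votes_by_image, gold):
    """Accuracy of per-image majority-vote consensus against reference labels.

    votes_by_image: dict mapping image id -> list of labels from one worker pool
    gold: dict mapping image id -> reference (expert) label
    """
    scored = [img for img in votes_by_image if img in gold]
    if not scored:
        return 0.0
    correct = sum(majority_vote(votes_by_image[img]) == gold[img] for img in scored)
    return correct / len(scored)

# Hypothetical toy data: 'wetland' vs 'water' labels from two labeler pools.
volunteer_votes = {"img1": ["wetland", "wetland"], "img2": ["water", "water", "wetland"]}
mturk_votes = {"img1": ["wetland", "water", "water"], "img2": ["water", "wetland", "water"]}
gold = {"img1": "wetland", "img2": "water"}

print("volunteer consensus accuracy:", pool_accuracy(volunteer_votes, gold))  # 1.0
print("mturk consensus accuracy:    ", pool_accuracy(mturk_votes, gold))      # 0.5
```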
More Like This
1. We present a crowd-driven adjudication system for rejected work on Amazon Mechanical Turk. The Mechanical Turk crowdsourcing platform allows Requesters to approve or reject assignments submitted by Workers. If work is rejected, Workers are not paid and their reputation suffers. Currently there is no built-in mechanism for Workers to appeal rejections other than contacting Requesters directly, and the time it takes Requesters to review potentially incorrect rejections makes the cost of handling a dispute substantially higher than the payment amount in dispute. As a solution, we present an automated appeals system called Turkish Judge, which employs crowd workers as judges to adjudicate whether work was fairly rejected when their peers initiate an appeal. We describe our system, analyze the added cost to Requesters, and discuss the advantages of such a system for the Mechanical Turk marketplace and similar microtasking platforms. (A rough cost sketch follows this list.)
2. We present and analyze results from a pilot study that explores how crowdsourcing can be used to generate distractors (incorrect answer choices) in multiple-choice concept inventories (conceptual tests of understanding). To our knowledge, we are the first to propose and study this approach. Using Amazon Mechanical Turk, we collected approximately 180 open-ended responses to several question stems from the Cybersecurity Concept Inventory of the Cybersecurity Assessment Tools Project and from the Digital Logic Concept Inventory. We generated preliminary distractors by filtering responses, grouping similar responses, selecting the four most frequent groups, and refining a representative distractor for each of these groups. We analyzed our data in two ways. First, we compared the responses and resulting distractors with those from the aforementioned inventories. Second, we obtained feedback on the resulting new draft test items (including distractors) from additional subjects on Amazon Mechanical Turk. Challenges in using crowdsourcing include controlling the selection of subjects and filtering out responses that do not reflect genuine effort. Despite these challenges, our results suggest that crowdsourcing can be a very useful tool for generating effective distractors (those attractive to subjects who do not understand the targeted concept). Our results also suggest that this method is faster, easier, and cheaper than the traditional method of having one or more experts draft distractors, building on talk-aloud interviews with subjects to uncover their misconceptions. Our results are significant because generating effective distractors is one of the most difficult steps in creating multiple-choice assessments. (A sketch of the grouping step follows this list.)
3. Confirmation bias is a type of cognitive bias in which people seek and prioritize information that conforms to a pre-existing view or hypothesis, which can negatively affect decision-making. We investigate the manifestation and mitigation of confirmation bias with an emphasis on the use of visualization. In a series of Amazon Mechanical Turk studies, participants selected evidence that supported or refuted a given hypothesis. We demonstrated the presence of confirmation bias and investigated the use of five simple visual representations, using color, positional, and length encodings, for mitigating this bias. We found that at worst, visualization had no effect on the amount of confirmation bias present, and at best, it was successful in mitigating the bias. We discuss these results in light of factors that can complicate visual debiasing in non-experts.
4. Teachable interfaces can enable end-users to personalize machine learning applications by explicitly providing a few training examples. They promise higher robustness in the real world by significantly constraining the conditions of the learning task to a specific user and their environment. While facilitating user control, their effectiveness can be hindered by a lack of expertise or by misconceptions. Through a mobile teachable testbed on Amazon Mechanical Turk, we explore how non-experts conceptualize, experience, and reflect on their engagement with machine teaching in the context of object recognition.
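
For item 1 above, the cost sketch referenced there: a back-of-the-envelope illustration of the kind of added-cost accounting the abstract mentions. The flat per-judgment fee, the number of judges per appeal, and all dollar amounts are assumed for illustration and are not figures from the paper.

```python
def added_appeal_cost(num_appeals, judges_per_appeal=3, judge_pay=0.05):
    """Estimated extra Requester cost of crowd-adjudicated appeals under a flat per-judgment fee."""
    return num_appeals * judges_per_appeal * judge_pay

# e.g., 100 appealed rejections, each reviewed by 3 judges paid $0.05 per judgment
print(f"added cost: ${added_appeal_cost(100):.2f}")  # -> added cost: $15.00
```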
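For item 2 above, the grouping sketch referenced there: a minimal illustration of the filter/group/select steps used to derive candidate distractors from open-ended responses. The normalization rule and helper names are hypothetical; grouping real free-text responses would likely require manual review or semantic similarity rather than exact string matching.

```python
import re
from collections import Counter

def normalize(response: str) -> str:
    """Crude grouping key: lowercase, treat hyphens as spaces, drop other punctuation."""
    text = response.lower().replace("-", " ")
    return re.sub(r"[^a-z0-9 ]", "", text).strip()

def top_distractor_groups(responses, k=4):
    """Drop empty responses, group by normalized text, and return the k most frequent groups."""
    counts = Counter(normalize(r) for r in responses if r.strip())
    return counts.most_common(k)

# Hypothetical open-ended responses to a single question stem
responses = ["Brute force", "brute-force", "Phishing", "phishing!", "a dictionary attack", "guessing"]
print(top_distractor_groups(responses))
# -> [('brute force', 2), ('phishing', 2), ('a dictionary attack', 1), ('guessing', 1)]
```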