Title: Comparing Experts and Novices for AI Data Work: Insights on Allocating Human Intelligence to Design a Conversational Agent
Many AI system designers grapple with how best to collect human input for different types of training data. Online crowds provide a cheap on-demand source of intelligence, but they often lack the expertise required in many domains. Experts offer tacit knowledge and more nuanced input, but they are harder to recruit. To explore this trade-off, we compared novices and experts in terms of performance and perceptions on human intelligence tasks in the context of designing a text-based conversational agent. We developed a preliminary chatbot that simulates conversations with someone seeking mental health advice to help educate volunteer listeners at We then recruited experienced listeners (domain experts) and MTurk novice workers (crowd workers) to conduct tasks to improve the chatbot with different levels of complexity. Novice crowds perform comparably to experts on tasks that only require natural language understanding, such as correcting how the system classifies a user statement. For more generative tasks, like creating new lines of chatbot dialogue, the experts demonstrated higher quality, novelty, and emotion. We also uncovered a motivational gap: crowd workers enjoyed the interactive tasks, while experts found the work to be tedious and repetitive. We offer design considerations for allocating crowd workers and experts on input tasks for AI systems, and for better motivating experts to participate in low-level data work for AI.
Award ID(s):
2001851 1952085 1939606 2000782
Journal Name:
Proceedings of the AAAI Conference on Human Computation and Crowdsourcing
Page Range / eLocation ID:
195 to 206
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In recent years, the popularity of AI-enabled conversational agents or chatbots has risen as an alternative to traditional online surveys to elicit information from people. However, there is a gap in using single-agent chatbots to converse and gather multi-faceted information across a wide variety of topics. Prior works suggest that single-agent chatbots struggle to understand user intentions and interpret human language during a multi-faceted conversation. In this work, we investigated how multi-agent chatbot systems can be utilized to conduct a multi-faceted conversation across multiple domains. To that end, we conducted a Wizard of Oz study to investigate the design of a multi-agent chatbot for gathering public input across multiple high-level domains and their associated topics. Next, we designed, developed, and evaluated CommunityBots - a multi-agent chatbot platform where each chatbot handles a different domain individually. To manage conversation across multiple topics and chatbots, we proposed a novel Conversation and Topic Management (CTM) mechanism that handles topic-switching and chatbot-switching based on user responses and intentions. We conducted a between-subject study comparing CommunityBots to a single-agent chatbot baseline with 96 crowd workers. The results from our evaluation demonstrate that CommunityBots participants were significantly more engaged, provided higher quality responses, and experienced fewer conversation interruptions while conversing with multiple different chatbots in the same session. We also found that the visual cues integrated with the interface helped the participants better understand the functionalities of the CTM mechanism, which enabled them to perceive changes in textual conversation, leading to better user satisfaction. Based on the empirical insights from our study, we discuss future research avenues for multi-agent chatbot design and its application for rich information elicitation.

  2. As AI-based face recognition technologies are increasingly adopted for high-stakes applications like locating suspected criminals, public concerns about the accuracy of these technologies have grown as well. These technologies often present a human expert with a shortlist of high-confidence candidate faces from which the expert must select correct match(es) while avoiding false positives, which we term the “last-mile problem.” We propose Second Opinion, a web-based software tool that employs a novel crowdsourcing workflow inspired by cognitive psychology, seed-gather-analyze, to assist experts in solving the last-mile problem. We evaluated Second Opinion with a mixed-methods lab study involving 10 experts and 300 crowd workers who collaborate to identify people in historical photos. We found that crowds can eliminate 75% of false positives from the highest-confidence candidates suggested by face recognition, and that experts were enthusiastic about using Second Opinion in their work. We also discuss broader implications for crowd–AI interaction and crowdsourced person identification. 
  3. Crowdsourced content creation like articles or slogans can be powered by crowds of volunteers or workers from paid task markets. Volunteers often have expertise and are intrinsically motivated, but are a limited resource, and are not always reliably available. On the other hand, paid crowd workers are reliably available, can be guided to produce high-quality content, but cost money. How can these different populations of crowd workers be leveraged together to power cost-effective yet high-quality crowd-powered content-creation systems? To answer this question, we need to understand the strengths and weaknesses of each. We conducted an online study where we hired paid crowd workers and recruited volunteers from social media to complete three content creation tasks for three real-world non-profit organizations that focus on empowering women. These tasks ranged in complexity from simply generating keywords or slogans to creating a draft biographical article. Our results show that paid crowds completed work and structured content following editorial guidelines more effectively. However, volunteer crowds provide content that is more original. Based on the findings, we suggest that crowd-powered content-creation systems could gain the best of both worlds by leveraging volunteers to scaffold the direction that original content should take, while having paid crowd workers structure content and prepare it for real-world use.
  4. The artificial intelligence (AI) industry has created new jobs that are essential to the real-world deployment of intelligent systems. Part of the job focuses on labeling data for machine learning models or having workers complete tasks that AI alone cannot do. These workers are usually known as ‘crowd workers’: they are part of a large distributed crowd that is jointly (but separately) working on the tasks. They are often invisible to end-users, which leads to workers often being paid below minimum wage and having limited career growth. In this chapter, we draw upon the field of human–computer interaction to provide research methods for studying and empowering crowd workers. We present our Computational Worker Leagues which enable workers to work towards their desired professional goals and also supply quantitative information about crowdsourcing markets. This chapter demonstrates the benefits of this approach and highlights important factors to consider when researching the experiences of crowd workers.
  5. AI-based educational technologies may be most welcome in classrooms when they align with teachers' goals, preferences, and instructional practices. Teachers, however, have scarce time to make such customizations themselves. How might the crowd be leveraged to help time-strapped teachers? Crowdsourcing pipelines have traditionally focused on content generation. It is an open question how a pipeline might be designed so the crowd can succeed in a revision/customization task. In this paper, we explore an initial version of a teacher-guided crowdsourcing pipeline designed to improve the adaptive math hints of an AI-based tutoring system so they fit teachers' preferences, while requiring minimal expert guidance. In two experiments involving 144 math teachers and 481 crowdworkers, we found that such an expert-guided revision pipeline could save experts' time and produce better crowd-revised hints (in terms of teacher satisfaction) than two comparison conditions. The revised hints, however, did not improve on the existing hints in the AI tutor, which were carefully written but still have room for improvement and customization. Further analysis revealed that the main challenge for crowdworkers may lie in understanding teachers' brief written comments and implementing them in the form of effective edits, without introducing new problems. We also found that teachers preferred their own revisions over other sources of hints, and exhibited varying preferences for hints. Overall, the results confirm that there is a clear need for customizing hints to individual teachers' preferences. They also highlight the need for more elaborate scaffolds so the crowd can have specific knowledge of the requirements that teachers have for hints. The study represents a first exploration in the literature of how to support crowds with minimal expert guidance in revising and customizing instructional materials.