This content will become publicly available on October 31, 2026

Title: A Survey on the Recent Advancements in Human-Centered Dialog Systems
Dialog systems (e.g., chatbots) have been widely studied, yet related research that leverages artificial intelligence (AI) and natural language processing (NLP) is constantly evolving. These systems have typically been developed to interact with humans through speech, visual, or text conversation. As humans continue to adopt dialog systems for various objectives, there is a need to involve humans in every facet of the dialog development life cycle so that both the human and the dialog system actors are augmented synergistically in real-world settings. We provide a holistic literature survey on the recent advancements in human-centered dialog systems (HCDS). Specifically, we provide background context surrounding the recent advancements in machine learning-based dialog systems and human-centered AI. We then bridge the gap between the two AI sub-fields and organize the research works on HCDS under three major categories (i.e., Human-Chatbot Collaboration, Human-Chatbot Alignment, and Human-Centered Chatbot Design & Governance). In addition, we discuss the applicability and accessibility of HCDS implementations through benchmark datasets, application scenarios, and downstream NLP tasks.
Award ID(s): 2006816, 2007100
PAR ID: 10627006
Author(s) / Creator(s):
Publisher / Repository: ACM
Date Published:
Journal Name: ACM Computing Surveys
Volume: 57
Issue: 10
ISSN: 0360-0300
Page Range / eLocation ID: 1 to 36
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The prevalence and success of AI applications have been tempered by concerns about the controllability of AI systems and about AI's impact on the future of work. These concerns reflect two aspects of a central question: how would humans work with AI systems? While research on AI safety focuses on designing AI systems that allow humans to safely instruct and control them, research on AI and the future of work focuses on the impact of AI on humans who may be unable to do so. This Blue Sky Ideas paper proposes a unifying set of declarative principles that enable a more uniform evaluation of arbitrary AI systems along multiple dimensions of their suitability for use by specific classes of human operators. It leverages recent AI research and the unique strengths of the field to develop human-centric principles for AI systems that address the concerns noted above.
  2. This paper presents an innovative testing framework, testFAILS, designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard and Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies. 
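The orthogonal array coverage mentioned in the testFAILS abstract above can be illustrated with a small combinatorial test generator. The sketch below is not the testFAILS implementation; the factors, their levels, and the greedy pairwise-covering strategy are hypothetical stand-ins meant only to show how a compact suite of chatbot test cases can cover every pair of factor levels.

```python
# Minimal sketch of pairwise (orthogonal-array-style) test generation for
# chatbot evaluation. Factor names and levels are hypothetical examples.
from itertools import combinations, product

factors = {
    "language": ["English", "Spanish", "Mandarin"],
    "task": ["question answering", "summarization", "code generation"],
    "prompt_style": ["zero-shot", "few-shot"],
    "length": ["short", "long"],
}
names = list(factors)

def uncovered_pairs(tests):
    """Return the factor-level pairs not yet covered by any selected test."""
    needed = set()
    for f1, f2 in combinations(names, 2):
        for v1, v2 in product(factors[f1], factors[f2]):
            needed.add(((f1, v1), (f2, v2)))
    for t in tests:
        for f1, f2 in combinations(names, 2):
            needed.discard(((f1, t[f1]), (f2, t[f2])))
    return needed

def greedy_pairwise_suite():
    """Greedily pick full-factorial candidates until every pair is covered."""
    candidates = [dict(zip(names, vals)) for vals in product(*factors.values())]
    suite = []
    while True:
        remaining = uncovered_pairs(suite)
        if not remaining:
            return suite
        def gain(t):  # number of still-uncovered pairs this candidate covers
            return sum(((f1, t[f1]), (f2, t[f2])) in remaining
                       for f1, f2 in combinations(names, 2))
        suite.append(max(candidates, key=gain))

for i, test in enumerate(greedy_pairwise_suite(), 1):
    print(i, test)
```

A true orthogonal array would additionally balance how often each level combination appears; a greedy pairwise cover is a looser but commonly used approximation of the same coverage idea.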
  3. Abstract Recent work has explored how complementary strengths of humans and artificial intelligence (AI) systems might be productively combined. However, successful forms of human–AI partnership have rarely been demonstrated in real‐world settings. We present the iterative design and evaluation of Lumilo, smart glasses that help teachers help their students in AI‐supported classrooms by presenting real‐time analytics about students’ learning, metacognition, and behavior. Results from a field study conducted in K‐12 classrooms indicate that students learn more when teachers and AI tutors work together during class. We discuss implications of this research for the design of human–AI partnerships. We argue for more participatory approaches to research and design in this area, in which practitioners and other stakeholders are deeply, meaningfully involved throughout the process. Furthermore, we advocate for theory‐building and for principled approaches to the study of human–AI decision‐making in real‐world contexts. 
  4. Abstract AI assistance is readily available to humans in a variety of decision-making applications. In order to fully understand the efficacy of such joint decision-making, it is important to first understand the human's reliance on AI. However, there is a disconnect between how joint decision-making is studied and how it is practiced in the real world. More often than not, researchers ask humans to provide independent decisions before they are shown AI assistance. This is done to make explicit the influence of AI assistance on the human's decision. We develop a cognitive model that allows us to infer the latent reliance strategy of humans on AI assistance without asking the human to make an independent decision. We validate the model's predictions through two behavioral experiments. The first experiment follows a concurrent paradigm where humans are shown AI assistance alongside the decision problem. The second experiment follows a sequential paradigm where humans provide an independent judgment on a decision problem before AI assistance is made available. The model's predicted reliance strategies closely track the strategies employed by humans in the two experimental paradigms. Our model provides a principled way to infer reliance on AI assistance and may be used to expand the scope of investigation on human-AI collaboration.
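To make the idea of inferring latent reliance concrete, the toy model below treats each final decision as a mixture: with probability w the person adopts the AI's recommendation, otherwise they answer from their own (assumed) accuracy. This is an illustrative sketch, not the cognitive model from the paper; the accuracy parameter p_h, the binary task, and the synthetic data are assumptions made only for the example.

```python
# Illustrative only: a toy mixture model of reliance on AI advice, fit by
# maximum likelihood on observed final decisions, without ever eliciting an
# independent human judgment.
import math
import random

def log_likelihood(w, p_h, trials):
    """trials: list of (final_decision, ai_advice, true_label), all 0/1."""
    ll = 0.0
    for d, a, y in trials:
        own = p_h if d == y else 1.0 - p_h      # prob. of d under own judgment
        adopt = 1.0 if d == a else 0.0          # prob. of d when copying the AI
        ll += math.log(w * adopt + (1.0 - w) * own + 1e-12)
    return ll

def infer_reliance(trials, p_h=0.7):
    """Grid-search MLE for the latent reliance weight w."""
    grid = [i / 100 for i in range(101)]
    return max(grid, key=lambda w: log_likelihood(w, p_h, trials))

# Tiny synthetic check: simulate a human who copies the AI about 60% of the time.
random.seed(0)
trials = []
for _ in range(500):
    y = random.randint(0, 1)
    a = y if random.random() < 0.8 else 1 - y        # AI is 80% accurate
    if random.random() < 0.6:                        # true reliance ~0.6
        d = a
    else:
        d = y if random.random() < 0.7 else 1 - y    # own accuracy 0.7
    trials.append((d, a, y))
print(infer_reliance(trials))  # should land near the true reliance of 0.6
```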
  5. Automatic evaluation metrics are a crucial component of dialog systems research. Standard language evaluation metrics are known to be ineffective for evaluating dialog. As such, recent research has proposed a number of novel, dialog-specific metrics that correlate better with human judgements. Due to the fast pace of research, many of these metrics have been assessed on different datasets, and there has not yet been a systematic comparison between them. To this end, this paper provides a comprehensive assessment of recently proposed dialog evaluation metrics on a number of datasets. In this paper, 23 different automatic evaluation metrics are evaluated on 10 different datasets. Furthermore, the metrics are assessed in different settings to better qualify their respective strengths and weaknesses. Metrics are assessed (1) on both the turn level and the dialog level, (2) for different dialog lengths, (3) for different dialog qualities (e.g., coherence, engaging), (4) for different types of response generation models (i.e., generative, retrieval, simple models and state-of-the-art models), (5) taking into account the similarity of different metrics, and (6) exploring combinations of different metrics. This comprehensive assessment offers several takeaways pertaining to dialog evaluation metrics in general. It also suggests how to best assess evaluation metrics and indicates promising directions for future work.
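Meta-evaluation of a dialog metric, as described in the abstract above, typically reduces to correlating the metric's scores with human judgements. The sketch below computes a turn-level and a dialog-level Spearman correlation for a placeholder metric; the data layout, field names, and the length-based metric are assumptions for illustration, not the paper's actual 23 metrics or 10 datasets.

```python
# Minimal sketch of how a dialog evaluation metric is meta-evaluated against
# human judgements, at the turn level and aggregated to the dialog level.
from collections import defaultdict
from scipy.stats import spearmanr

def meta_evaluate(turns, metric_fn):
    """turns: list of dicts with 'dialog_id', 'context', 'response', 'human_score'."""
    metric_scores = [metric_fn(t["context"], t["response"]) for t in turns]
    human_scores = [t["human_score"] for t in turns]

    # Turn-level correlation between metric scores and human judgements.
    turn_rho, _ = spearmanr(metric_scores, human_scores)

    # Dialog-level correlation: average turn scores within each dialog first.
    by_dialog = defaultdict(lambda: ([], []))
    for t, m in zip(turns, metric_scores):
        by_dialog[t["dialog_id"]][0].append(m)
        by_dialog[t["dialog_id"]][1].append(t["human_score"])
    dialog_metric = [sum(ms) / len(ms) for ms, _ in by_dialog.values()]
    dialog_human = [sum(hs) / len(hs) for _, hs in by_dialog.values()]
    dialog_rho, _ = spearmanr(dialog_metric, dialog_human)

    return turn_rho, dialog_rho

# Placeholder metric: score a response by its length (a deliberately weak baseline).
length_metric = lambda context, response: len(response.split())
```

Stronger or weaker correlations across these two levels, and across quality dimensions such as coherence, are exactly the kinds of differences the surveyed assessment is designed to expose.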