skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on April 11, 2026

Title: A No Free Lunch Theorem for Human-AI Collaboration
The gold standard in human-AI collaboration is complementarity: when combined performance exceeds both the human and algorithm alone. We investigate this challenge in binary classification settings where the goal is to maximize 0-1 accuracy. Given two or more agents who can make calibrated probabilistic predictions, we show a No Free Lunch-style result. Any deterministic collaboration strategy (a function mapping calibrated probabilities into binary classifications) that does not essentially always defer to the same agent will sometimes perform worse than the least accurate agent. In other words, complementarity cannot be achieved for free. The result does suggest one model of collaboration with guarantees, where one agent identifies obvious errors of the other agent. We also use the result to understand the necessary conditions enabling the success of other collaboration techniques, providing guidance to human-AI collaboration.  more » « less
Award ID(s):
2339427
PAR ID:
10616972
Author(s) / Creator(s):
; ;
Publisher / Repository:
Proceedings of the AAAI Conference on Artificial Intelligence
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
39
Issue:
13
ISSN:
2159-5399
Page Range / eLocation ID:
14369 to 14376
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Dynamic conversational agent-based support for collaborative learning has shown significant positive effects on learning over no-support or static-support control conditions in prior studies. In order to understand the boundary between human-led and AI-led support for collaboration, we compare in this study an approach where the agent’s primary role is to help students regulate their own collaboration with two more typical prompting strategies that are used only during a reflection phase: one designed to provide a specific informational focus for the reflection, and the other designed to draw out evaluation, elaboration, and exploration of alternative perspectives. Significant positive effects on learning over and above just the human-led form of support are observed when either of the prompting strategies are used. 
    more » « less
  2. In order to achieve effective human-AI collaboration, it is necessary for an AI agent to align its behavior with the human's expectations. When the agent generates a task plan without such considerations, it may often result in inexplicable behavior from the human's point of view. This may have serious implications for the human, from increased cognitive load to more serious concerns of safety around the physical agent. In this work, we present an approach to generate explicable behavior by minimizing the distance between the agent's plan and the plan expected by the human. To this end, we learn a mapping between plan distances (distances between expected and agent plans) and human's plan scoring scheme. The plan generation process uses this learned model as a heuristic. We demonstrate the effectiveness of our approach in a delivery robot domain. 
    more » « less
  3. Artificial intelligence (AI) has the potential to improve human decision-making by providing decision recommendations and problem-relevant information to assist human decision-makers. However, the full realization of the potential of human–AI collaboration continues to face several challenges. First, the conditions that support complementarity (i.e., situations in which the performance of a human with AI assistance exceeds the performance of an unassisted human or the AI in isolation) must be understood. This task requires humans to be able to recognize situations in which the AI should be leveraged and to develop new AI systems that can learn to complement the human decision-maker. Second, human mental models of the AI, which contain both expectations of the AI and reliance strategies, must be accurately assessed. Third, the effects of different design choices for human-AI interaction must be understood, including both the timing of AI assistance and the amount of model information that should be presented to the human decision-maker to avoid cognitive overload and ineffective reliance strategies. In response to each of these three challenges, we present an interdisciplinary perspective based on recent empirical and theoretical findings and discuss new research directions. 
    more » « less
  4. Human-AI collaboration is an increasingly commonplace part of decision-making in real world applications. However, how humans behave when collaborating with AI is not well understood. We develop metacognitive bandits, a computational model of a human's advice-seeking behavior when working with an AI. The model describes a person's metacognitive process of deciding when to rely on their own judgment and when to solicit the advice of the AI. It also accounts for the difficulty of each trial in making the decision to solicit advice. We illustrate that the metacognitive bandit makes decisions similar to humans in a behavioral experiment. We also demonstrate that algorithm aversion, a widely reported bias, can be explained as the result of a quasi-optimal sequential decision-making process. Our model does not need to assume any prior biases towards AI to produce this behavior. 
    more » « less
  5. Artificial intelligence (AI) and machine learning models are being increasingly deployed in real-world applications. In many of these applications, there is strong motivation to develop hybrid systems in which humans and AI algorithms can work together, leveraging their complementary strengths and weaknesses. We develop a Bayesian framework for combining the predictions and different types of confidence scores from humans and machines. The framework allows us to investigate the factors that influence complementarity, where a hybrid combination of human and machine predictions leads to better performance than combinations of human or machine predictions alone. We apply this framework to a large-scale dataset where humans and a variety of convolutional neural networks perform the same challenging image classification task. We show empirically and theoretically that complementarity can be achieved even if the human and machine classifiers perform at different accuracy levels as long as these accuracy differences fall within a bound determined by the latent correlation between human and machine classifier confidence scores. In addition, we demonstrate that hybrid human–machine performance can be improved by differentiating between the errors that humans and machine classifiers make across different class labels. Finally, our results show that eliciting and including human confidence ratings improve hybrid performance in the Bayesian combination model. Our approach is applicable to a wide variety of classification problems involving human and machine algorithms. 
    more » « less