Title: Bayesian modeling of human–AI complementarity
Artificial intelligence (AI) and machine learning models are being increasingly deployed in real-world applications. In many of these applications, there is strong motivation to develop hybrid systems in which humans and AI algorithms can work together, leveraging their complementary strengths and weaknesses. We develop a Bayesian framework for combining the predictions and different types of confidence scores from humans and machines. The framework allows us to investigate the factors that influence complementarity, where a hybrid combination of human and machine predictions leads to better performance than combinations of human or machine predictions alone. We apply this framework to a large-scale dataset where humans and a variety of convolutional neural networks perform the same challenging image classification task. We show empirically and theoretically that complementarity can be achieved even if the human and machine classifiers perform at different accuracy levels, as long as these accuracy differences fall within a bound determined by the latent correlation between human and machine classifier confidence scores. In addition, we demonstrate that hybrid human–machine performance can be improved by differentiating between the errors that humans and machine classifiers make across different class labels. Finally, our results show that eliciting and including human confidence ratings improve hybrid performance in the Bayesian combination model. Our approach is applicable to a wide variety of classification problems involving human and machine algorithms.
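To make the combination step concrete, here is a minimal sketch of Bayesian prediction fusion under a conditional-independence assumption. The paper's full framework also models the latent correlation between human and machine confidence scores, which this sketch omits; the function name, the uniform prior, and the toy numbers are illustrative, not the authors' implementation.

```python
# Minimal product-of-experts fusion of two calibrated classifiers.
# Assumes conditional independence given the true class (the paper's
# framework relaxes this by modeling correlated confidence scores).
import numpy as np

def combine_predictions(p_human, p_machine, prior=None):
    """Fuse two calibrated probability vectors over the same K classes.

    Bayes' rule with independent evidence gives
        P(y | h, m) ∝ P(y | h) * P(y | m) / P(y),
    i.e., multiply the two posteriors and divide out the shared prior once.
    """
    p_human, p_machine = np.asarray(p_human, float), np.asarray(p_machine, float)
    if prior is None:
        prior = np.full_like(p_human, 1.0 / p_human.size)  # uniform class prior
    posterior = p_human * p_machine / prior                # unnormalized fusion
    return posterior / posterior.sum()

# Toy 3-class example: the human leans toward class 0, the machine toward class 1.
p_h = [0.6, 0.3, 0.1]  # calibrated human confidence ratings
p_m = [0.2, 0.7, 0.1]  # calibrated machine softmax scores
print(combine_predictions(p_h, p_m))  # fused posterior, ~[0.35, 0.62, 0.03]
```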
Award ID(s):
1927245 1900644
PAR ID:
10349250
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
119
Issue:
11
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Keathley, H.; Enos, J.; Parrish, M. (Ed.)
    The role of human-machine teams in society is increasing as big data and computing power explode. One popular approach to AI is deep learning, which is useful for classification, feature identification, and predictive modeling. However, deep learning models often suffer from inadequate transparency and poor explainability. One aspect of human systems integration is the design of interfaces that support human decision-making. AI models have multiple types of uncertainty embedded, which may be difficult for users to understand. Humans who use these tools need to understand how much they should trust the AI. This study evaluates one simple approach for communicating uncertainty: a visual confidence bar ranging from 0 to 100%. We perform a human-subjects online experiment using an existing image recognition deep learning model to test the effect of (1) providing single vs. multiple recommendations from the AI and (2) including uncertainty information. For each image, participants described the subject in an open textbox and rated their confidence in their answers. Performance was evaluated at four levels of accuracy, ranging from the same as the image label to the correct category of the image. The results suggest that AI recommendations increase accuracy, even if the human and AI have different definitions of accuracy. In addition, providing multiple ranked recommendations, with or without the confidence bar, increases operator confidence and reduces perceived task difficulty. More research is needed to determine how people approach uncertain information from an AI system and to develop effective visualizations for communicating uncertainty.
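As a rough illustration of the stimulus described above, the following matplotlib sketch draws a 0-100% confidence bar beneath a single AI recommendation; the layout, colors, and example values are assumptions, not a reproduction of the study's interface.

```python
# Hypothetical rendering of a 0-100% confidence bar for one AI suggestion.
import matplotlib.pyplot as plt

def confidence_bar(label, confidence):
    """Draw one recommendation with a horizontal confidence bar (0-100%)."""
    fig, ax = plt.subplots(figsize=(4, 0.9))
    ax.barh(0, confidence, color="tab:blue")                     # filled portion
    ax.barh(0, 100 - confidence, left=confidence, color="0.85")  # remainder
    ax.set_xlim(0, 100)
    ax.set_yticks([])
    ax.set_xlabel("model confidence (%)")
    ax.set_title(f"AI suggestion: {label} ({confidence:.0f}%)")
    fig.tight_layout()
    plt.show()

confidence_bar("golden retriever", 87)
```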
  2. An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human nor model are perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probabilistic output of a model with the class-level output of a human. We show theoretically that the accuracy of our combination model is driven not only by the individual human and model accuracies, but also by the model's confidence. Empirical results on image classification with CIFAR-10 and a subset of ImageNet demonstrate that such human-model combinations consistently have higher accuracies than the model or human alone, and that the parameters of the combination method can be estimated effectively with as few as ten labeled datapoints. 
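A minimal sketch of the combination rule this abstract describes: the model's probability vector is fused with the human's single class label via Bayes' rule and an estimated human confusion matrix. The Laplace smoothing and the ten-point toy dataset below are illustrative choices, not the paper's exact estimator.

```python
# Fuse model probabilities with a human's hard label via Bayes' rule:
#     P(y | m, h) ∝ p_model(y) * P(human says h | true class y)
import numpy as np

def estimate_confusion(human_preds, true_labels, n_classes, alpha=1.0):
    """Estimate P(h | y) from labeled data with additive (Laplace) smoothing."""
    counts = np.full((n_classes, n_classes), alpha)
    for h, y in zip(human_preds, true_labels):
        counts[y, h] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)  # rows index true class y

def combine(model_probs, human_label, confusion):
    """Posterior over classes given the model's probabilities and the human's label."""
    posterior = np.asarray(model_probs) * confusion[:, human_label]
    return posterior / posterior.sum()

# Toy example: confusion matrix estimated from just ten labeled datapoints,
# mirroring the abstract's point about data efficiency.
human = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
truth = [0, 0, 1, 1, 2, 2, 2, 1, 1, 0]
C = estimate_confusion(human, truth, n_classes=3)
print(combine([0.5, 0.4, 0.1], human_label=1, confusion=C))
```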
  3. The gold standard in human-AI collaboration is complementarity: the combined performance exceeds that of both the human and the algorithm alone. We investigate this challenge in binary classification settings where the goal is to maximize 0-1 accuracy. Given two or more agents who can make calibrated probabilistic predictions, we show a No Free Lunch-style result: any deterministic collaboration strategy (a function mapping calibrated probabilities into binary classifications) that does not essentially always defer to the same agent will sometimes perform worse than the least accurate agent. In other words, complementarity cannot be achieved for free. The result does suggest one model of collaboration with guarantees, in which one agent identifies obvious errors of the other agent. We also use the result to characterize the necessary conditions for the success of other collaboration techniques, providing guidance for human-AI collaboration.
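The sketch below illustrates the one collaboration pattern the result endorses, in which one agent identifies obvious errors of the other: defer to agent A by default, and let agent B override only when B is near-certain that A's vote is wrong. The threshold, the synthetic calibration noise, and the agents' accuracies are assumptions for illustration, not the paper's formal construction.

```python
# Deferral with an "obvious error" override, on a synthetic binary task.
import numpy as np

def collaborate(p_a, p_b, threshold=0.99):
    """Return A's 0/1 vote unless B is near-certain that A's vote is wrong."""
    vote_a = int(p_a > 0.5)
    # B's probability that the true label disagrees with A's vote:
    p_b_against_a = p_b if vote_a == 0 else 1.0 - p_b
    return 1 - vote_a if p_b_against_a > threshold else vote_a

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)                   # true binary labels
p_a = np.clip(y + rng.normal(0, 0.35, y.size), 0, 1)  # agent A's scores
p_b = np.clip(y + rng.normal(0, 0.25, y.size), 0, 1)  # agent B's scores

def accuracy(preds):
    return float((np.asarray(preds) == y).mean())

combined = [collaborate(a, b) for a, b in zip(p_a, p_b)]
print(accuracy(p_a > 0.5), accuracy(p_b > 0.5), accuracy(combined))
```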
  4. The classification of variable objects provides insight into a wide variety of astrophysics ranging from stellar interiors to galactic nuclei. The Zwicky Transient Facility (ZTF) provides time-series observations that record the variability of more than a billion sources. The scale of these data necessitates automated approaches to make a thorough analysis. Building on previous work, this paper reports the results of the ZTF Source Classification Project (SCoPe), which trains neural network and XGBoost (XGB) machine-learning (ML) algorithms to perform dichotomous classification of variable ZTF sources using a manually constructed training set containing 170,632 light curves. We find that several classifiers achieve high precision and recall scores, suggesting the reliability of their predictions for 209,991,147 light curves across 77 ZTF fields. We also identify the most important features for XGB classification and compare the performance of the two ML algorithms, finding a pattern of higher precision among XGB classifiers. The resulting classification catalog is available to the public, and the software developed for SCoPe is open source and adaptable to future time-domain surveys.
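For a sense of the classification setup, here is a hedged sketch of one binary ("dichotomous") XGBoost classifier over per-light-curve summary features. The feature names and synthetic labels are stand-ins; SCoPe's actual features, training set, and pipeline are described in the paper and its open-source repository.

```python
# Toy XGBoost binary classifier in the spirit of one SCoPe "dichotomous" task.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
n = 2_000
# Illustrative per-light-curve summary features (stand-ins, not SCoPe's set).
X = np.column_stack([
    rng.lognormal(0.0, 1.0, n),   # period (days)
    rng.exponential(0.3, n),      # amplitude (mag)
    rng.normal(0.0, 1.0, n),      # skew of the magnitude distribution
])
y = (X[:, 1] > 0.4).astype(int)   # toy "variable vs. non-variable" label

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X[:1500], y[:1500])                        # train split
probs = clf.predict_proba(X[1500:])[:, 1]          # held-out scores
print("held-out mean score:", probs.mean())
print("feature importances:", clf.feature_importances_)  # cf. XGB feature ranking
```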