skip to main content


Title: Combining human predictions with model probabilities via confusion matrices and calibration
An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human nor model are perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probabilistic output of a model with the class-level output of a human. We show theoretically that the accuracy of our combination model is driven not only by the individual human and model accuracies, but also by the model's confidence. Empirical results on image classification with CIFAR-10 and a subset of ImageNet demonstrate that such human-model combinations consistently have higher accuracies than the model or human alone, and that the parameters of the combination method can be estimated effectively with as few as ten labeled datapoints.  more » « less
Award ID(s):
1927245
NSF-PAR ID:
10349255
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Advances in Neural Information Processing Systems, 34
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. An increasingly common use case for machine learning models is augmenting the abilities of human decision makers. For classification tasks where neither the human nor model are perfectly accurate, a key step in obtaining high performance is combining their individual predictions in a manner that leverages their relative strengths. In this work, we develop a set of algorithms that combine the probabilistic output of a model with the class-level output of a human. We show theoretically that the accuracy of our combination model is driven not only by the individual human and model accuracies, but also by the model's confidence. Empirical results on image classification with CIFAR-10 and a subset of ImageNet demonstrate that such human-model combinations consistently have higher accuracies than the model or human alone, and that the parameters of the combination method can be estimated effectively with as few as ten labeled datapoints. 
    more » « less
  2. Algorithmic case-based decision support provides examples to help human make sense of predicted labels and aid human in decision-making tasks. Despite the promising performance of supervised learning, representations learned by supervised models may not align well with human intuitions: what models consider as similar examples can be perceived as distinct by humans. As a result, they have limited effectiveness in case-based decision support. In this work, we incorporate ideas from metric learning with supervised learning to examine the importance of alignment for effective decision support. In addition to instance-level labels, we use human-provided triplet judgments to learn human-compatible decision-focused representations. Using both synthetic data and human subject experiments in multiple classification tasks, we demonstrate that such representation is better aligned with human perception than representation solely optimized for classification. Human-compatible representations identify nearest neighbors that are perceived as more similar by humans and allow humans to make more accurate predictions, leading to substantial improvements in human decision accuracies (17.8% in butterfly vs. moth classification and 13.2% in pneumonia classification). 
    more » « less
  3. Over the last century, direct human modification has been a major driver of coastal wetland degradation, resulting in widespread losses of wetland vegetation and a transition to open water. High-resolution satellite imagery is widely available for monitoring changes in present-day wetlands; however, understanding the rates of wetland vegetation loss over the last century depends on the use of historical panchromatic aerial photographs. In this study, we compared manual image thresholding and an automated machine learning (ML) method in detecting wetland vegetation and open water from historical panchromatic photographs in the Florida Everglades, a subtropical wetland landscape. We compared the same classes delineated in the historical photographs to 2012 multispectral satellite imagery and assessed the accuracy of detecting vegetation loss over a 72 year timescale (1940 to 2012) for a range of minimum mapping units (MMUs). Overall, classification accuracies were >95% across the historical photographs and satellite imagery, regardless of the classification method and MMUs. We detected a 2.3–2.7 ha increase in open water pixels across all change maps (overall accuracies > 95%). Our analysis demonstrated that ML classification methods can be used to delineate wetland vegetation from open water in low-quality, panchromatic aerial photographs and that a combination of images with different resolutions is compatible with change detection. The study also highlights how evaluating a range of MMUs can identify the effect of scale on detection accuracy and change class estimates as well as in determining the most relevant scale of analysis for the process of interest. 
    more » « less
  4. IEEE Open Journal of the Computer Society (Ed.)
    While neural networks have been achieving increasingly significant excitement in solving classification tasks such as natural language processing, their lack of interpretability becomes a great challenge for neural networks to be deployed in certain high-stakes human-centered applications. To address this issue, we propose a new approach for generating interpretable predictions by inferring a simple three-layer neural network with threshold activations, so that it can benefit from effective neural network training algorithms and at the same time, produce human-understandable explanations for the results. In particular, the hidden layer neurons in the proposed model are trained with floating point weights and binary output activations. The output neuron is also trainable as a threshold logic function that implements a disjunctive operation, forming the logical-OR of the first-level threshold logic functions. This neural network can be trained using state-of-the-art training methods to achieve high prediction accuracy. An important feature of the proposed architecture is that only a simple greedy algorithm is required to provide an explanation with the prediction that is human-understandable. In comparison with other explainable decision models, our proposed approach achieves more accurate predictions on a broad set of tabular data classification datasets. 
    more » « less
  5. Artificial intelligence (AI) and machine learning models are being increasingly deployed in real-world applications. In many of these applications, there is strong motivation to develop hybrid systems in which humans and AI algorithms can work together, leveraging their complementary strengths and weaknesses. We develop a Bayesian framework for combining the predictions and different types of confidence scores from humans and machines. The framework allows us to investigate the factors that influence complementarity, where a hybrid combination of human and machine predictions leads to better performance than combinations of human or machine predictions alone. We apply this framework to a large-scale dataset where humans and a variety of convolutional neural networks perform the same challenging image classification task. We show empirically and theoretically that complementarity can be achieved even if the human and machine classifiers perform at different accuracy levels as long as these accuracy differences fall within a bound determined by the latent correlation between human and machine classifier confidence scores. In addition, we demonstrate that hybrid human–machine performance can be improved by differentiating between the errors that humans and machine classifiers make across different class labels. Finally, our results show that eliciting and including human confidence ratings improve hybrid performance in the Bayesian combination model. Our approach is applicable to a wide variety of classification problems involving human and machine algorithms. 
    more » « less