skip to main content

Title: Active Bayesian Assessment of Black-Box Classifiers
Recent advances in machine learning have led to increased deployment of black-box classifiers across a wide variety of applications. In many such situations there is a critical need to both reliably assess the performance of these pre-trained models and to perform this assessment in a label-efficient manner (given that labels may be scarce and costly to collect). In this paper, we introduce an active Bayesian approach for assessment of classifier performance to satisfy the desiderata of both reliability and label-efficiency. We begin by developing inference strategies to quantify uncertainty for common assessment metrics such as accuracy, misclassification cost, and calibration error. We then propose a general framework for active Bayesian assessment using inferred uncertainty to guide efficient selection of instances for labeling, enabling better performance assessment with fewer labels. We demonstrate significant gains from our proposed active Bayesian approach via a series of systematic empirical experiments assessing the performance of modern neural classifiers (e.g., ResNet and BERT) on several standard image and text classification datasets.
; ; ;
Award ID(s):
1927245 1900644
Publication Date:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Page Range or eLocation-ID:
Sponsoring Org:
National Science Foundation
More Like this
  1. Active learning methods, like uncertainty sampling, combined with probabilistic prediction techniques have achieved success in various problems like image classification and text classification. For more complex multivariate prediction tasks, the relationships between labels play an important role in designing structured classifiers with better performance. However, computational time complexity limits prevalent probabilistic methods from effectively supporting active learning. Specifically, while non-probabilistic methods based on structured support vector ma-chines can be tractably applied to predicting cuts and bipartite matchings, conditional random fields are intractable for these structures. We propose an adversarial approach for active learning with structured prediction domains that is tractable for cuts and matching. We evaluate this approach algorithmically in two important structured prediction problems: multi-label classification and object tracking in videos. We demonstrate better accuracy and computational efficiency for our proposed method.
  2. Abstract Neural state classification (NSC) is a recently proposed method for runtime predictive monitoring of hybrid automata (HA) using deep neural networks (DNNs). NSC trains a DNN as an approximate reachability predictor that labels an HA state x as positive if an unsafe state is reachable from x within a given time bound, and labels x as negative otherwise. NSC predictors have very high accuracy, yet are prone to prediction errors that can negatively impact reliability. To overcome this limitation, we present neural predictive monitoring (NPM), a technique that complements NSC predictions with estimates of the predictive uncertainty. These measures yield principled criteria for the rejection of predictions likely to be incorrect, without knowing the true reachability values. We also present an active learning method that significantly reduces the NSC predictor’s error rate and the percentage of rejected predictions. We develop two versions of NPM based, respectively, on the use of frequentist and Bayesian techniques to learn the predictor and the rejection rule. Both versions are highly efficient, with computation times on the order of milliseconds, and effective, managing in our experimental evaluation to successfully reject almost all incorrect predictions. In our experiments on a benchmark suite of six hybridmore »systems, we found that the frequentist approach consistently outperforms the Bayesian one. We also observed that the Bayesian approach is less practical, requiring a careful and problem-specific choice of hyperparameters.« less
  3. Semi-supervised learning uses underlying relationships in data with a scarcity of ground-truth labels. In this paper, we introduce an uncertainty quantification (UQ) method for graph-based semi-supervised multi-class classification problems. We not only predict the class label for each data point, but also provide a confidence score for the prediction. We adopt a Bayesian approach and propose a graphical multi-class probit model together with an effective Gibbs sampling procedure. Furthermore, we propose a confidence measure for each data point that correlates with the classification performance. We use the empirical properties of the proposed confidence measure to guide the design of a humanin-the-loop system. The uncertainty quantification algorithm and the human-in-the-loop system are successfully applied to classification problems in image processing and ego-motion analysis of body-worn videos.
  4. Gorodkin, Jan (Ed.)
    Abstract Motivation When learning to subtype complex disease based on next-generation sequencing data, the amount of available data is often limited. Recent works have tried to leverage data from other domains to design better predictors in the target domain of interest with varying degrees of success. But they are either limited to the cases requiring the outcome label correspondence across domains or cannot leverage the label information at all. Moreover, the existing methods cannot usually benefit from other information available a priori such as gene interaction networks. Results In this article, we develop a generative optimal Bayesian supervised domain adaptation (OBSDA) model that can integrate RNA sequencing (RNA-Seq) data from different domains along with their labels for improving prediction accuracy in the target domain. Our model can be applied in cases where different domains share the same labels or have different ones. OBSDA is based on a hierarchical Bayesian negative binomial model with parameter factorization, for which the optimal predictor can be derived by marginalization of likelihood over the posterior of the parameters. We first provide an efficient Gibbs sampler for parameter inference in OBSDA. Then, we leverage the gene-gene network prior information and construct an informed and flexible variationalmore »family to infer the posterior distributions of model parameters. Comprehensive experiments on real-world RNA-Seq data demonstrate the superior performance of OBSDA, in terms of accuracy in identifying cancer subtypes by utilizing data from different domains. Moreover, we show that by taking advantage of the prior network information we can further improve the performance. Availability and implementation The source code for implementations of OBSDA and SI-OBSDA are available at the following link. Supplementary information Supplementary data are available at Bioinformatics online.« less
  5. Artificial intelligence (AI) and machine learning models are being increasingly deployed in real-world applications. In many of these applications, there is strong motivation to develop hybrid systems in which humans and AI algorithms can work together, leveraging their complementary strengths and weaknesses. We develop a Bayesian framework for combining the predictions and different types of confidence scores from humans and machines. The framework allows us to investigate the factors that influence complementarity, where a hybrid combination of human and machine predictions leads to better performance than combinations of human or machine predictions alone. We apply this framework to a large-scale dataset where humans and a variety of convolutional neural networks perform the same challenging image classification task. We show empirically and theoretically that complementarity can be achieved even if the human and machine classifiers perform at different accuracy levels as long as these accuracy differences fall within a bound determined by the latent correlation between human and machine classifier confidence scores. In addition, we demonstrate that hybrid human–machine performance can be improved by differentiating between the errors that humans and machine classifiers make across different class labels. Finally, our results show that eliciting and including human confidence ratings improve hybridmore »performance in the Bayesian combination model. Our approach is applicable to a wide variety of classification problems involving human and machine algorithms.« less