Prediction and Outlier Detection in Classification Problems

Guan, Leying; Tibshirani, Robert

doi:10.1111/rssb.12443

Abstract

We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction sets). BCOPS constructs a prediction set C(x) as a subset of class labels, possibly empty. It tries to optimize the out-of-sample performance, aiming to include the correct class and to detect outliers x as often as possible. BCOPS returns no prediction (corresponding to C(x) equal to the empty set) if it infers x to be an outlier. The proposed method combines supervised learning algorithms with conformal prediction to minimize a misclassification loss averaged over the out-of-sample distribution. The constructed prediction sets have a finite sample coverage guarantee without distributional assumptions. We also propose a method to estimate the outlier detection rate of a given procedure. We prove asymptotic consistency and optimality of our proposals under suitable assumptions and illustrate our methods on real data examples.
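To make the idea of a prediction set that may be empty concrete, here is a minimal sketch of generic class-wise split conformal prediction. It is not the BCOPS procedure itself, which additionally learns its scores to optimize out-of-sample performance. The function names, the toy labels "a" and "b", and the convention that a higher conformity score means "more typical of the class" are all assumptions made for illustration.

```python
import numpy as np


def conformal_p_value(cal_scores, test_score):
    """Conformal p-value of a test point for one class.

    cal_scores: conformity scores of held-out calibration points from that class
    (assumed convention: higher score = more typical of the class).
    Counts how many calibration scores are no larger than the test score,
    with the test point itself included in the ranking.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    return (1 + np.sum(cal_scores <= test_score)) / (len(cal_scores) + 1)


def prediction_set(cal_scores_by_class, test_scores_by_class, alpha=0.05):
    """Return every label whose conformal p-value exceeds alpha.

    An empty list means no class looks plausible, so x is treated as an outlier.
    """
    return [
        label
        for label, cal in cal_scores_by_class.items()
        if conformal_p_value(cal, test_scores_by_class[label]) > alpha
    ]


# Hypothetical calibration scores: 99 points per class with scores 1, 2, ..., 99.
cal = {"a": np.arange(1, 100), "b": np.arange(1, 100)}

typical_of_a = prediction_set(cal, {"a": 50, "b": 0})   # plausible only for "a"
ambiguous = prediction_set(cal, {"a": 50, "b": 50})     # plausible for both
outlier = prediction_set(cal, {"a": 0, "b": 0})         # plausible for neither
print(typical_of_a, ambiguous, outlier)
```

If the test point is exchangeable with the calibration points of its true class, that class is included with probability at least 1 - alpha, which is the kind of finite sample, distribution-free coverage the abstract refers to. How the scores are chosen determines how small the sets are and how often outliers receive an empty set.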

NSF-PAR ID: 10398638

Author(s) / Creator(s): Guan, Leying; Tibshirani, Robert

Publisher / Repository: Oxford University Press

Date Published: 2022-02-15

Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology

Volume: 84

Issue: 2

ISSN: 1369-7412

Format(s): Medium: X; Size: p. 524-546

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1111/rssb.12443
