Abstract: Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn test point with a prescribed probability. However, in practice, data-driven methods are often used to identify specific test unit(s) of interest, requiring uncertainty quantification tailored to these focal units. In such cases, marginally valid conformal prediction intervals may fail to provide valid coverage for the focal unit(s) due to selection bias. This article presents a general framework for constructing a prediction set with finite-sample exact coverage, conditional on the unit being selected by a given procedure. The general form of our method accommodates arbitrary selection rules that are invariant to the permutation of the calibration units and generalizes Mondrian Conformal Prediction to multiple test units and non-equivariant classifiers. We also work out computationally efficient implementations of our framework for a number of realistic selection rules, including top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. The performance of our methods is demonstrated via applications in drug discovery and health risk prediction.
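The coverage gap this abstract targets can be seen in a small simulation. The sketch below is illustrative only and is not the article's method: it assumes a generic linear model, a hypothetical top-K-by-prediction selection rule, and heteroskedastic noise, and shows how marginal split-conformal intervals can undercover the selected units.

```python
# Minimal illustrative sketch (not the article's method): marginal split-conformal
# intervals, followed by a hypothetical top-K selection of test units by predicted
# value. With heteroskedastic noise, coverage among the selected units typically
# drops below the nominal level, which is the selection bias described above.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
beta = np.array([2.0, 1.0, 0.5, 0.0, 0.0])
noise_scale = 0.5 + 2.0 * (X[:, 0] > 1.0)          # noisier where x0 is large
y = X @ beta + noise_scale * rng.normal(size=600)

X_tr, y_tr = X[:300], y[:300]                       # model fitting
X_cal, y_cal = X[300:500], y[300:500]               # calibration
X_te, y_te = X[500:], y[500:]                       # test

model = LinearRegression().fit(X_tr, y_tr)

# Split-conformal half-width: finite-sample (1 - alpha) quantile of calibration residuals.
alpha = 0.1
scores = np.sort(np.abs(y_cal - model.predict(X_cal)))
k = int(np.ceil((1 - alpha) * (len(scores) + 1))) - 1
q = scores[min(k, len(scores) - 1)]

pred = model.predict(X_te)
lower, upper = pred - q, pred + q                   # marginally valid intervals

# Hypothetical selection rule: keep the K test units with the largest predictions.
K = 20
selected = np.argsort(pred)[-K:]
covered = (y_te[selected] >= lower[selected]) & (y_te[selected] <= upper[selected])
print("marginal coverage on all test units:", ((y_te >= lower) & (y_te <= upper)).mean())
print("coverage among the selected units:  ", covered.mean())
```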
This content will become publicly available on March 6, 2026.

Assessing Classification Models of Pharmaceuticals With Conformal Prediction
ABSTRACT: Conformal predictions transform a measurable, heuristic notion of uncertainty into statistically valid confidence intervals such that, for a future sample, the true class prediction will be included in the conformal prediction set at a predetermined confidence. From a Bayesian perspective, common estimates of uncertainty in multivariate classification, namely p-values, only provide the probability that the data fits the presumed class model, P(D|M). Conformal predictions, on the other hand, address the more meaningful probability that a model fits the data, P(M|D). Herein, two methods to perform inductive conformal predictions are investigated: the traditional Split Conformal Prediction, which uses an external calibration set, and a novel Bagged Conformal Prediction, closely related to Cross Conformal Predictions, which utilizes bagging to calibrate the heuristic notions of uncertainty. Methods for preprocessing the conformal prediction scores to improve performance are discussed and investigated. These conformal prediction strategies are applied to identifying four non-steroidal anti-inflammatory drugs (NSAIDs) from hyperspectral Raman imaging data. In addition to assigning meaningful confidence intervals on the model results, we herein demonstrate how conformal predictions can add additional diagnostics for model quality and method stability.
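For concreteness, a minimal sketch of split (inductive) conformal classification with an external calibration set follows. The dataset, classifier, and nonconformity score (one minus the predicted probability of the true class) are illustrative stand-ins, not the article's pipeline for hyperspectral Raman data, and the bagged variant is not shown.

```python
# Minimal sketch of split (inductive) conformal classification with an external
# calibration set. The classifier and nonconformity score are illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

# Calibration: nonconformity score of each calibration sample's true class.
cal_probs = clf.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

# (1 - alpha) conformal quantile with the finite-sample correction.
alpha = 0.10
n = len(cal_scores)
q_level = np.ceil((1 - alpha) * (n + 1)) / n
q_hat = np.quantile(cal_scores, min(q_level, 1.0), method="higher")

# Prediction set for each test sample: all classes whose score is within q_hat.
test_scores = 1.0 - clf.predict_proba(X_test)
pred_sets = test_scores <= q_hat          # boolean matrix: samples x classes

coverage = pred_sets[np.arange(len(y_test)), y_test].mean()
print(f"empirical coverage at alpha={alpha}: {coverage:.3f}")
print(f"average prediction-set size: {pred_sets.sum(axis=1).mean():.2f}")
```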
- Award ID(s): 2003839
- PAR ID: 10576958
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Journal of Chemometrics
- Volume: 39
- Issue: 3
- ISSN: 0886-9383
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract: To assess the effect of uncertainties in solar wind driving on the predictions from the operational configuration of the Space Weather Modeling Framework, we have developed a nonparametric method for generating multiple possible realizations of the solar wind just upstream of the bow shock, based on observations near the first Lagrangian point. We have applied this method to the solar wind inputs at the upstream boundary of the Space Weather Modeling Framework and have simulated the geomagnetic storm of 5 April 2010. We ran a 40-member ensemble for this event and have used this ensemble to quantify the uncertainty in the predicted Sym-H index and ground magnetic disturbances due to the uncertainty in the upstream boundary conditions. Both the ensemble mean and the unperturbed simulation tend to underpredict the magnitude of Sym-H in the quiet interval before the storm and overpredict it in the storm itself, consistent with previous work. The ensemble mean is a more accurate predictor of Sym-H, improving the mean absolute error by nearly 2 nT for this interval and displaying a smaller bias. We also examine the uncertainty in predicted maxima in ground magnetic disturbances. The confidence intervals are typically narrow during periods where the predicted dBH/dt is low. The confidence intervals are often much wider where the median prediction is for enhanced dBH/dt. The ensemble also allows us to identify intervals of activity that cannot be explained by uncertainty in the solar wind driver, driving further model improvements. This work demonstrates the feasibility and importance of ensemble modeling for space weather applications.
- Conformal prediction is a powerful tool to generate uncertainty sets with guaranteed coverage using any predictive model, under the assumption that the training and test data are i.i.d. Recently, it has been shown that adversarial examples are able to manipulate conformal methods into constructing prediction sets with invalid coverage rates, as the i.i.d. assumption is violated. To address this issue, Randomized Smoothed Conformal Prediction (RSCP) was recently proposed to certify the robustness of conformal prediction methods to adversarial noise. However, RSCP has two major limitations: (i) its robustness guarantee is flawed when used in practice, and (ii) it tends to produce large uncertainty sets. To address these limitations, we first propose a novel framework called RSCP+ that provides a provable robustness guarantee in evaluation, fixing the issues in the original RSCP method. Next, we propose two novel methods, Post-Training Transformation (PTT) and Robust Conformal Training (RCT), to effectively reduce prediction set size with little computational overhead. Experimental results on CIFAR10, CIFAR100, and ImageNet suggest that the baseline method yields only trivial prediction sets containing the full label set, while our methods boost efficiency by up to 4.36×, 5.46×, and 16.9×, respectively, and provide a practical robustness guarantee.
- In regression problems where there is no known true underlying model, conformal prediction methods enable prediction intervals to be constructed without any assumptions on the distribution of the underlying data, except that the training and test data are assumed to be exchangeable. However, these methods bear a heavy computational cost: to be carried out exactly, the regression algorithm would need to be fitted infinitely many times. In practice, the conformal prediction method is run by simply considering only a finite grid of finely spaced values for the response variable. This paper develops discretized conformal prediction algorithms that are guaranteed to cover the target value with the desired probability and that offer a trade-off between computational cost and prediction accuracy. Copyright © 2018 John Wiley & Sons, Ltd.
- ABSTRACT: Determining whether target samples are members of a particular source class of samples has a large variety of applications within many disciplines. In particular, one-class classification (OCC) is essential in many areas, such as food contamination or product authentication. There are numerous widely accepted methods for OCC, but these methods involve optimizing tuning parameters such as the number of principal components (PCs). This study presents the development and application of a rigorous autonomous OCC process based on a hybrid fusion consensus technique, termed consensus OCC (Con OCC). The Con OCC method uses the new physicochemical responsive integrated similarity measure (PRISM), composed of multiple similarity measures, all independent of optimization. Similarity values are fused into a single value describing the degree of sample similarity to a collection of samples. Two approaches are developed to translate each sample-wise PRISM value to a probability of class membership: conformal prediction p-values and z-scores. These two methods are evaluated as separate Con OCC processes using seven datasets measured across a variety of instruments. In both cases, class membership labels are not used to set decision thresholds, and classifiers are not optimized relative to respective tuning parameters. Results indicate that z-scoring often produces better results, but conformal prediction provides greater consistency across datasets; that is, z-score values tend to vary in range across datasets while conformal prediction p-values do not.
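The conformal p-value route mentioned in the last abstract above can be illustrated with a small sketch. The mean-distance nonconformity score below is a generic stand-in for PRISM (which is not public in this listing), and the 0.05 acceptance threshold is only an example.

```python
# Minimal sketch (illustrative only): converting a sample-wise similarity score into a
# conformal p-value for one-class membership. A simple mean-distance score stands in
# for the PRISM measure described in the abstract.
import numpy as np

rng = np.random.default_rng(1)
source_class = rng.normal(loc=0.0, scale=1.0, size=(200, 10))   # reference samples
targets = np.vstack([rng.normal(0.0, 1.0, size=(5, 10)),        # in-class targets
                     rng.normal(4.0, 1.0, size=(5, 10))])       # out-of-class targets

def nonconformity(x, reference):
    # Larger values = less similar to the source class (here: mean Euclidean distance).
    return np.linalg.norm(reference - x, axis=1).mean()

# Calibration scores: leave-one-out nonconformity of each source-class sample.
cal_scores = np.array([
    nonconformity(source_class[i], np.delete(source_class, i, axis=0))
    for i in range(len(source_class))
])

for i, x in enumerate(targets):
    score = nonconformity(x, source_class)
    # Conformal p-value: fraction of calibration scores at least as extreme.
    p_value = (np.sum(cal_scores >= score) + 1) / (len(cal_scores) + 1)
    print(f"target {i}: p-value = {p_value:.3f} ->",
          "accept membership" if p_value > 0.05 else "reject membership")
```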