Agrawal, Shipra; Roth, Aaron (Ed.) Free, publicly-accessible full text available July 3, 2025
Abstract Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails to characterise the fit of the model to the underlying conditional law of the label given the feature vector ($Y \mid X$), e.g. due to model misspecification, overfitting, and high dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit of a general binary classifier. Our framework makes no parametric assumption on the conditional law $Y \mid X$ and treats it as a black-box oracle model that can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis test of the form $H_0: \mathbb{E}\left[D_f\left(\mathrm{Bern}(\eta(X)) \,\|\, \mathrm{Bern}(\hat{\eta}(X))\right)\right] \le \tau$, where $D_f$ represents an $f$-divergence function, and $\eta(x)$ and $\hat{\eta}(x)$ denote, respectively, the true and the estimated likelihood that a feature vector $x$ admits a positive label. We propose a novel test for $H_0$, called Goodness-of-fit with Randomisation and Scoring Procedure (GRASP), which works in finite-sample settings regardless of the feature distribution (distribution-free). We also propose model-X GRASP, designed for model-X settings where the joint distribution of the feature vector is known. Model-X GRASP uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.
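To make the quantity inside the null hypothesis concrete, the following is a minimal Python sketch of an f-divergence between two Bernoulli laws and its average over paired values of the true and estimated probabilities. It is not the GRASP test itself, which the abstract does not specify. The function names (`f_kl`, `f_tv`, `bernoulli_f_divergence`, `mean_divergence`) are hypothetical, and the sketch assumes the true probabilities are known, which is only the case in a simulation; in the setting of the abstract the conditional law is accessible only through queries.

```python
import math


def f_kl(t):
    """Generator f(t) = t*log(t) of the KL divergence, with f(0) = 0 by continuity."""
    return 0.0 if t == 0.0 else t * math.log(t)


def f_tv(t):
    """Generator f(t) = |t - 1| / 2 of the total variation distance."""
    return 0.5 * abs(t - 1.0)


def bernoulli_f_divergence(p, q, f=f_kl):
    """D_f(Bern(p) || Bern(q)) = q*f(p/q) + (1-q)*f((1-p)/(1-q)), for 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return q * f(p / q) + (1.0 - q) * f((1.0 - p) / (1.0 - q))


def mean_divergence(eta_values, eta_hat_values, f=f_kl):
    """Empirical average of D_f(Bern(eta) || Bern(eta_hat)) over paired values.

    Assumption: eta_values holds the true probabilities, which are available
    only in a simulation where the data-generating law is known.
    """
    pairs = list(zip(eta_values, eta_hat_values))
    return sum(bernoulli_f_divergence(p, q, f) for p, q in pairs) / len(pairs)
```

Under these assumptions, a perfectly calibrated estimate gives an average divergence of zero, and the null hypothesis asks whether this average stays at or below the tolerance level. Choosing `f_tv` reduces the divergence to the absolute difference between the two probabilities.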