NSF PAR Search | NSF Public Access Repository

Online conformal prediction with decaying step sizes

Angelopoulos, Anastasios Nikolas; Barber, Rina Foygel; Bates, Stephen (July 2024, Proceedings of the 41st International Conference on Machine Learning)

We introduce a method for online conformal prediction with decaying step sizes. Like previous methods, ours possesses a retrospective guarantee of coverage for arbitrary sequences. However, unlike previous methods, we can simultaneously estimate a population quantile when it exists. Our theory and experiments indicate substantially improved practical properties: in particular, when the distribution is stable, the coverage is close to the desired level for every time point, not just on average over the observed sequence.
Full Text Available
Learn then test: Calibrating predictive algorithms to achieve risk control

https://doi.org/10.1214/24-AOAS1998

Angelopoulos, Anastasios N; Bates, Stephen; Candès, Emmanuel J; Jordan, Michael I; Lei, Lihua (June 2025, The Annals of Applied Statistics)

Free, publicly-accessible full text available June 1, 2026
Delegating Data Collection in Decentralized Machine Learning

Ananthakrishnan, Nivasini; Bates, Stephen; Jordan, Michael; Haghtalab, Nika (May 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research)

Motivated by the emergence of decentralized machine learning (ML) ecosystems, we study the delegation of data collection. Taking the field of contract theory as our starting point, we design optimal and near-optimal contracts that deal with two fundamental information asymmetries that arise in decentralized ML: uncertainty in the assessment of model quality and uncertainty regarding the optimal performance of any model. We show that a principal can cope with such asymmetry via simple linear contracts that achieve a $$1-1/\epsilon$$ fraction of the optimal utility. To address the lack of a priori knowledge regarding the optimal performance, we give a convex program that can adaptively and efficiently compute the optimal contract. We also analyze the optimal utility and linear contracts for the more complex setting of multiple interactions.
Full Text Available
Cross-Validation: What Does It Estimate and How Well Does It Do It?

https://doi.org/10.1080/01621459.2023.2197686

Bates, Stephen; Hastie, Trevor; Tibshirani, Robert (May 2023, Journal of the American Statistical Association)

Full Text Available
Testing for outliers with conformal p-values

https://doi.org/10.1214/22-AOS2244

Bates, Stephen; Candès, Emmanuel; Lei, Lihua; Romano, Yaniv; Sesia, Matteo (February 2023, The Annals of Statistics)

Full Text Available
Private Prediction Sets

https://doi.org/10.1162/99608f92.16c71dad

Angelopoulos, Anastasios Nikolas; Bates, Stephen; Zrnic, Tijana; Jordan, Michael I. (April 2022, Harvard Data Science Review)

Full Text Available
Robust Calibration with Multi-domain Temperature Scaling

Yu, Yaodong; Bates, Stephen; Ma, Yi; Jordan, Michael (January 2022, Advances in neural information processing systems)

Full Text Available
Conformal prediction for the design problem

Fannjiang, Clara; Bates, Stephen; Angelopoulos, Anastasios N.; Listgarten, Jennifer; Jordan, Michael I. (January 2022, Proceedings of the National Academy of Sciences of the United States of America)

Full Text Available
False discovery rate control in genome-wide association studies with population structure

https://doi.org/10.1073/pnas.2105841118

Sesia, Matteo; Bates, Stephen; Candès, Emmanuel; Marchini, Jonathan; Sabatti, Chiara (October 2021, Proceedings of the National Academy of Sciences)

We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing interpretable findings while controlling the false discovery rate. In contrast with standard approaches, our method can leverage sophisticated multivariate algorithms but makes no parametric assumptions about the unknown relation between genotypes and phenotype. Instead, we recognize that genotypes can be considered as a random sample from an appropriate model, encapsulating our knowledge of genetic inheritance and human populations. This allows the generation of imperfect copies (knockoffs) of these variables that serve as ideal negative controls, correcting for linkage disequilibrium and accounting for unknown population structure, which may be due to diverse ancestries or familial relatedness. The validity and effectiveness of our method are demonstrated by extensive simulations and by applications to the UK Biobank data. These analyses confirm our method is powerful relative to state-of-the-art alternatives, while comparisons with other studies validate most of our discoveries. Finally, fast software is made available for researchers to analyze Biobank-scale datasets.
Full Text Available