A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis

De Stefani, Lorenzo; Upfal, Eli

doi:10.1109/DSAA.2019.00021

Citation Details

Standard statistical inference techniques and machine learning generalization bounds assume that tests are run on data selected independently of the hypotheses. In practice, data analysis and machine learning are usually iterative and adaptive processes: the same holdout data is often used to test a sequence of hypotheses (or models), each of which may depend on the outcomes of previous tests on the same data. In this work, we present RADABOUND, a rigorous, efficient, and practical procedure for controlling the generalization error when using a holdout sample for multiple adaptive testing. Our solution is based on a new application of Rademacher complexity generalization bounds, adapted to dependent tests. We demonstrate the statistical power and practicality of our method through extensive simulations and comparisons with alternative approaches. In particular, we show that our rigorous solution is substantially more powerful and efficient than the differential-privacy-based approach proposed by Dwork et al. [1]-[3].
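For readers unfamiliar with the quantity the abstract refers to, the following is a minimal sketch of a Monte Carlo estimate of the empirical Rademacher complexity of a finite set of functions evaluated on a holdout sample. It is not the RADABOUND procedure itself, whose details are not given in this record; the function name `empirical_rademacher` and its parameters `values`, `trials`, and `seed` are assumptions made for illustration.

```python
import random


def empirical_rademacher(values, trials=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    values[j][i] is the value of the j-th function (for example, the
    loss of the j-th candidate model) on the i-th holdout point.
    All names here are hypothetical, chosen for this illustration.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(trials):
        # Draw one vector of independent, uniformly random +/-1 signs.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the function set of the sign-weighted average.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / n for row in values
        )
    # Average over sign draws approximates the expectation over sigma.
    return total / trials
```

A set containing only the zero function has complexity 0, while a set that realizes every sign pattern on the sample has complexity 1; richer sets of hypotheses tested on the same holdout data fall in between, which is why bounds of this kind tighten or loosen with the set of tests actually considered.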

Award ID(s): 1813444

PAR ID: 10183277

Author(s) / Creator(s): De Stefani, Lorenzo; Upfal, Eli

Date Published: 2019-10-01

Journal Name: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)

Page Range / eLocation ID: 71 to 80

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1109/DSAA.2019.00021
