skip to main content


This content will become publicly available on December 26, 2024

Title: The HulC: confidence regions from convex hulls
We develop and analyse the HulC, an intuitive and general method for constructing confidence sets using the convex hull of estimates constructed from subsets of the data. Unlike classical methods which are based on estimating the (limiting) distribution of an estimator, the HulC is often simpler to use and effectively bypasses this step. In comparison to the bootstrap, the HulC requires fewer regularity conditions and succeeds in many examples where the bootstrap provably fails. Unlike sub-sampling, the HulC does not require knowledge of the rate of convergence of the estimators on which it is based. The validity of the HulC requires knowledge of the (asymptotic) median bias of the estimators. We further analyse a variant of our basic method, called the Adaptive HulC, which is fully data-driven and estimates the median bias using sub-sampling. We discuss these methods in the context of several challenging inferential problems which arise in parametric, semi-parametric, and non-parametric inference. Although our focus is on validity under weak regularity conditions, we also provide some general results on the width of the HulC confidence sets, showing that in many cases the HulC confidence sets have near-optimal width.  more » « less
Award ID(s):
2113684
NSF-PAR ID:
10524531
Author(s) / Creator(s):
; ;
Publisher / Repository:
Royal Statistical Society
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
86
Issue:
3
ISSN:
1369-7412
Page Range / eLocation ID:
586 to 622
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The goal of this paper is to develop a practical and general-purpose approach to construct confidence intervals for differentially private parametric estimation. We find that the parametric bootstrap is a simple and effective solution. It cleanly reasons about variability of both the data sample and the randomized privacy mechanism and applies "out of the box" to a wide class of private estimation routines. It can also help correct bias caused by clipping data to limit sensitivity. We prove that the parametric bootstrap gives consistent confidence intervals in two broadly relevant settings, including a novel adaptation to linear regression that avoids accessing the covariate data multiple times. We demonstrate its effectiveness for a variety of estimators, and find empirically that it provides confidence intervals with good coverage even at modest sample sizes and performs better than alternative approaches. 
    more » « less
  2. Summary

    We propose a new method to construct confidence intervals for quantities that are associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike the existing tuning-parameter-dependent approaches, our method has the attractive convenience of being free of any user-chosen number or smoothing parameter. The interval is constructed on the basis of an asymptotically distribution-free self-normalized statistic, in which the normalizing matrix is computed by using recursive estimates. Under mild conditions, we establish the theoretical validity of our method for a broad class of statistics that are functionals of the empirical distribution of fixed or growing dimension. From a practical point of view, our method is conceptually simple, easy to implement and can be readily used by the practitioner. Monte Carlo simulations are conducted to compare the finite sample performance of the new method with those delivered by the normal approximation and the block bootstrap approach.

     
    more » « less
  3. Summary

    We derive non-parametric confidence intervals for the eigenvalues of the Hessian at modes of a density estimate. This provides information about the strength and shape of modes and can also be used as a significance test. We use a data-splitting approach in which potential modes are identified by using the first half of the data and inference is done with the second half of the data. To obtain valid confidence sets for the eigenvalues, we use a bootstrap based on an elementary symmetric polynomial transformation. This leads to valid bootstrap confidence sets regardless of any multiplicities in the eigenvalues. We also suggest a new method for bandwidth selection, namely choosing the bandwidth to maximize the number of significant modes. We show by example that this method works well. Even when the true distribution is singular, and hence does not have a density (in which case cross-validation chooses a zero bandwidth), our method chooses a reasonable bandwidth.

     
    more » « less
  4. This paper highlights a tension between semiparametric efficiency and bootstrap consistency in the context of a canonical semiparametric estimation problem, namely the problem of estimating the average density. It is shown that although simple plug-in estimators suffer from bias problems preventing them from achieving semiparametric efficiency under minimal smoothness conditions, the nonparametric bootstrap automatically corrects for this bias and that, as a result, these seemingly inferior estimators achieve bootstrap consistency under minimal smoothness conditions. In contrast, several “debiased” estimators that achieve semiparametric efficiency under minimal smoothness conditions do not achieve bootstrap consistency under those same conditions. 
    more » « less
  5. We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as “universal.” The method is very simple and is based on a modified version of the usual likelihood-ratio statistic that we call “the split likelihood-ratio test” (split LRT) statistic. The (limiting) null distribution of the classical likelihood-ratio statistic is often intractable when used to test composite null hypotheses in irregular statistical models. Our method is especially appealing for statistical inference in these complex setups. The method we suggest works for any parametric model and also for some nonparametric models, as long as computing a maximum-likelihood estimator (MLE) is feasible under the null. Canonical examples arise in mixture modeling and shape-constrained inference, for which constructing tests and confidence sets has been notoriously difficult. We also develop various extensions of our basic methods. We show that in settings when computing the MLE is hard, for the purpose of constructing valid tests and intervals, it is sufficient to upper bound the maximum likelihood. We investigate some conditions under which our methods yield valid inferences under model misspecification. Further, the split LRT can be used with profile likelihoods to deal with nuisance parameters, and it can also be run sequentially to yield anytime-valid P values and confidence sequences. Finally, when combined with the method of sieves, it can be used to perform model selection with nested model classes.

     
    more » « less