skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Corrected generalized cross-validation for finite ensembles of penalized estimators
Abstract Generalized cross-validation (GCV) is a widely used method for estimating the squared out-of-sample prediction risk that employs scalar degrees of freedom adjustment (in a multiplicative sense) to the squared training error. In this paper, we examine the consistency of GCV for estimating the prediction risk of arbitrary ensembles of penalized least-squares estimators. We show that GCV is inconsistent for any finite ensemble of size greater than one. Towards repairing this shortcoming, we identify a correction that involves an additional scalar correction (in an additive sense) based on degrees of freedom adjusted training errors from each ensemble component. The proposed estimator (termed CGCV) maintains the computational advantages of GCV and requires neither sample splitting, model refitting, or out-of-bag risk estimation. The estimator stems from a finer inspection of the ensemble risk decomposition and two intermediate risk estimators for the components in this decomposition. We provide a non-asymptotic analysis of the CGCV and the two intermediate risk estimators for ensembles of convex penalized estimators under Gaussian features and a linear response model. Furthermore, in the special case of ridge regression, we extend the analysis to general feature and response distributions using random matrix theory, which establishes model-free uniform consistency of CGCV.  more » « less
Award ID(s):
1945428
PAR ID:
10542681
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
87
Issue:
2
ISSN:
1369-7412
Format(s):
Medium: X Size: p. 289-318
Size(s):
p. 289-318
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies M-estimators with gradient-Lipschitz loss function regularized with convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty and the noise distribution has heavy-tails. Our main contributions are three-fold. (i) We provide general formulae for the derivatives of regularized M-estimators $$\hat\beta(y,X)$$ where differentiation is taken with respect to both X and y; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residuals in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select tuning parameters of regularized M-estimators. The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion does not require the knowledge of the noise distribution or of the covariance of the design. Simulated data confirms the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as a proxy of the out-of-sample error. Finally our results reveal new relationships between the derivatives of the $$\hat\beta$$ and the effective degrees of freedom of the M-estimators, which are of independent interest. 
    more » « less
  2. Summary The lasso has been studied extensively as a tool for estimating the coefficient vector in the high-dimensional linear model; however, considerably less is known about estimating the error variance in this context. In this paper, we propose the natural lasso estimator for the error variance, which maximizes a penalized likelihood objective. A key aspect of the natural lasso is that the likelihood is expressed in terms of the natural parameterization of the multi-parameter exponential family of a Gaussian with unknown mean and variance. The result is a remarkably simple estimator of the error variance with provably good performance in terms of mean squared error. These theoretical results do not require placing any assumptions on the design matrix or the true regression coefficients. We also propose a companion estimator, called the organic lasso, which theoretically does not require tuning of the regularization parameter. Both estimators do well empirically compared to pre-existing methods, especially in settings where successful recovery of the true support of the coefficient vector is hard. Finally, we show that existing methods can do well under fewer assumptions than previously known, thus providing a fuller story about the problem of estimating the error variance in high-dimensional linear models. 
    more » « less
  3. This paper addresses the deconvolution problem of estimating a square-integrable probability density from observations contaminated with additive measurement errors having a known density. The estimator begins with a density estimate of the contaminated observations and minimizes a reconstruction error penalized by an integrated squared m-th derivative. Theory for deconvolution has mainly focused on kernel- or wavelet-based techniques, but other methods including spline-based techniques and this smoothnesspenalized estimator have been found to outperform kernel methods in simulation studies. This paper fills in some of these gaps by establishing asymptotic guarantees for the smoothness-penalized approach. Consistency is established in mean integrated squared error, and rates of convergence are derived for Gaussian, Cauchy, and Laplace error densities, attaining some lower bounds already in the literature. The assumptions are weak for most results; the estimator can be used with a broader class of error densities than the deconvoluting kernel. Our application example estimates the density of the mean cytotoxicity of certain bacterial isolates under random sampling; this mean cytotoxicity can only be measured experimentally with additive error, leading to the deconvolution problem. We also describe a method for approximating the solution by a cubic spline, which reduces to a quadratic program. 
    more » « less
  4. We study the problem of out-of-sample risk estimation in the high dimensional regime where both the sample size n and number of features p are large, and n/p can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained an open problem. This paper aims to fill this gap for penalized regression in the generalized linear family. With minor assumptions about the data generating process, and without any sparsity assumptions on the regression coefficients, our theoretical analysis obtains finite sample upper bounds on the expected squared error of LO in estimating the out-of-sample error. Our bounds show that the error goes to zero as n,pā†’āˆž, even when the dimension p of the feature vectors is comparable with or greater than the sample size n. One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate LO. 
    more » « less
  5. We study the problem of out-of-sample risk estimation in the high dimensional regime where both the sample size n and number of features p are large, and n/p can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained an open problem. This paper aims to fill this gap for penalized regression in the generalized linear family. With minor assumptions about the data generating process, and without any sparsity assumptions on the regression coefficients, our theoretical analysis obtains finite sample upper bounds on the expected squared error of LO in estimating the out-of-sample error. Our bounds show that the error goes to zero as n,pā†’āˆž, even when the dimension p of the feature vectors is comparable with or greater than the sample size n. One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate LO. 
    more » « less