Title: Confidence intervals for the Cox model test error from cross‐validation
Summary: Cross‐validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. Standard confidence intervals for test error based on CV estimates may have coverage below nominal levels. This occurs because each sample is used in both the training and testing procedures during CV, so the CV error estimates become correlated; without accounting for this correlation, the estimate of the variance is smaller than it should be. One way to mitigate this issue is to instead estimate the mean squared error of the prediction error using nested CV, an approach that has been shown to achieve superior coverage compared with intervals derived from standard CV. In this work, we generalize the nested CV idea to the Cox proportional hazards model and explore various choices of test error for this setting.
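As a rough illustration of the nested-CV idea the summary describes, the sketch below estimates the mean squared error of the CV error estimate for squared-error loss and an ordinary least-squares model, rather than the Cox setting of the paper. The fold counts, the debiasing term, and all function names are illustrative simplifications, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_fit(X, y):
    """Least-squares coefficients (a stand-in predictive model)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def squared_errors(beta, X, y):
    return (y - X @ beta) ** 2

def nested_cv(X, y, n_outer=5, n_inner=5):
    """Simplified nested CV: point estimate of test error plus an
    interval whose width comes from an MSE estimate of that error."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_outer)
    err_hat, mse_terms = [], []
    for k, hold in enumerate(folds):
        train = np.concatenate([f for i, f in enumerate(folds) if i != k])
        # Inner CV on the training portion only.
        inner = np.array_split(rng.permutation(train), n_inner)
        e_in = []
        for j, ihold in enumerate(inner):
            itrain = np.concatenate([f for i, f in enumerate(inner) if i != j])
            b = ols_fit(X[itrain], y[itrain])
            e_in.extend(squared_errors(b, X[ihold], y[ihold]))
        e_in = np.asarray(e_in)
        # Hold-out errors from the outer split.
        b = ols_fit(X[train], y[train])
        e_out = squared_errors(b, X[hold], y[hold])
        err_hat.append(e_out.mean())
        # (inner-CV estimate minus hold-out mean)^2, debiased by the
        # sampling variance of the hold-out mean.
        mse_terms.append((e_in.mean() - e_out.mean()) ** 2
                         - e_out.var(ddof=1) / len(e_out))
    err = float(np.mean(err_hat))
    mse = max(float(np.mean(mse_terms)), 0.0)  # clip negative estimates
    half = 1.96 * np.sqrt(mse)
    return err, err - half, err + half

# Toy data: linear model with unit-variance noise.
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(size=200)
err, lo, hi = nested_cv(X, y)
```

The key point is that the interval width is driven by an estimate of the MSE of the CV error, obtained by comparing inner-CV estimates to hold-out errors, rather than by the naive (too small) variance of the pooled CV errors.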
Award ID(s):
2113389
PAR ID:
10552981
Author(s) / Creator(s):
;
Publisher / Repository:
Statistics in Medicine
Date Published:
Journal Name:
Statistics in Medicine
Volume:
42
Issue:
25
ISSN:
0277-6715
Page Range / eLocation ID:
4532 to 4541
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many machine learning models have tuning parameters to be determined by the training data, and cross‐validation (CV) is perhaps the most commonly used method for selecting tuning parameters. This work concerns the problem of estimating the generalization error of a CV‐tuned predictive model. We propose to use an honest leave‐one‐out cross‐validation framework to produce a nearly unbiased estimator of the post‐tuning generalization error. Using the kernel support vector machine and kernel logistic regression as examples, we demonstrate that honest leave‐one‐out cross‐validation is highly competitive even against the state‐of‐the‐art .632+ estimator.
  2. We propose a method for constructing confidence intervals that account for many forms of spatial correlation. The interval has the familiar “estimator plus and minus a standard error times a critical value” form, but we propose new methods for constructing the standard error and the critical value. The standard error is constructed using population principal components from a given “worst‐case” spatial correlation model. The critical value is chosen to ensure coverage in a benchmark parametric model for the spatial correlations. The method is shown to control coverage in finite sample Gaussian settings in a restricted but nonparametric class of models and in large samples whenever the spatial correlation is weak, that is, with average pairwise correlations that vanish as the sample size gets large. We also provide results on the efficiency of the method. 
  3. Abstract: The accurate estimation of prediction errors in time series is an important problem. It immediately affects the accuracy of prediction intervals, but also the quality of a number of widely used time series model selection criteria such as AIC and others. Except for simple cases, however, it is difficult or even infeasible to obtain exact analytical expressions for one-step and multi-step predictions. This may be one reason why, unlike in the independent case (see Efron, 2004), there is still no fully established methodology for time series prediction error estimation. Starting from an approximation to the bias-variance decomposition of the squared prediction error, this work is therefore concerned with the estimation of prediction errors in both univariate and multivariate stationary time series. In particular, several estimates are developed for a general class of predictors that includes most of the popular linear, nonlinear, parametric and nonparametric time series models used in practice, where causal invertible ARMA and nonparametric AR processes are discussed as lead examples. Simulation results indicate that the proposed estimators perform quite well in finite samples. The estimates may also be used for model selection when the purpose of modeling is prediction.
  4. Abstract: In this paper, we propose a flexible nested error regression small area model with high-dimensional parameter that incorporates heterogeneity in regression coefficients and variance components. We develop a new robust small area-specific estimating equations method that allows appropriate pooling of a large number of areas in estimating small area-specific model parameters. We propose a parametric bootstrap and jackknife method to estimate not only the mean squared errors but also other commonly used uncertainty measures such as standard errors and coefficients of variation. We conduct both model-based and design-based simulation experiments and real-life data analysis to evaluate the proposed methodology.
  5. Abstract: Machine learning (ML) has been applied to space weather problems with increasing frequency in recent years, driven by an influx of in-situ measurements and a desire to improve modeling and forecasting capabilities throughout the field. Space weather originates from solar perturbations and comprises the complex variations they cause within the numerous systems between the Sun and Earth. These systems are often tightly coupled and not well understood, which creates a need for skillful models that quantify the confidence of their predictions. One example of a dynamical system highly impacted by space weather is the thermosphere, the neutral region of Earth's upper atmosphere. Our inability to forecast it has severe repercussions for satellite drag and for computing the probability of collision between two space objects in low Earth orbit (LEO) for decision making in space operations. Even with an (assumed) perfect forecast of model drivers, our incomplete knowledge of the system often results in inaccurate predictions of thermospheric neutral mass density. Continuing efforts are being made to improve model accuracy, but density models rarely provide estimates of confidence in their predictions. In this work, we propose two techniques for developing nonlinear ML regression models that predict thermospheric density while providing robust and reliable uncertainty estimates: Monte Carlo (MC) dropout and direct prediction of the probability distribution, both using the negative logarithm of predictive density (NLPD) loss function. We show the performance capabilities of models trained on both local and global datasets, and find that the NLPD loss yields similar results for both techniques, while the direct probability-distribution prediction method has a much lower computational cost.
For the global model regressed on the Space Environment Technologies High Accuracy Satellite Drag Model (HASDM) density database, we achieve errors of approximately 11% on independent test data with well-calibrated uncertainty estimates. Using an in-situ CHAllenging Minisatellite Payload (CHAMP) density dataset, models developed using both techniques provide test error on the order of 13%. On validation and test data, the CHAMP models are within 2% of perfect calibration for the twenty prediction intervals tested. We show that this model can also be used to obtain global density predictions with uncertainties at a given epoch.
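The NLPD loss mentioned in this abstract has a closed form when the predictive distribution is Gaussian. The minimal numpy sketch below (all names are illustrative, not from the paper's models) shows the loss and why a well-calibrated predicted scale scores better than an overconfident one:

```python
import numpy as np

def nlpd_loss(y, mu, sigma):
    """Mean negative log predictive density under N(mu, sigma^2).

    Training a regression model to output both mu and sigma under this
    loss makes it predict a full distribution, which is what yields the
    uncertainty estimates described above.
    """
    sigma = np.maximum(sigma, 1e-12)  # guard against zero variance
    return np.mean(0.5 * np.log(2.0 * np.pi * sigma**2)
                   + (y - mu) ** 2 / (2.0 * sigma**2))

# Synthetic targets drawn from N(0, 1).
rng = np.random.default_rng(1)
y = rng.normal(loc=0.0, scale=1.0, size=10_000)

# A sigma matching the data-generating scale scores better (lower NLPD)
# than an overconfident (too small) sigma.
good = nlpd_loss(y, mu=0.0, sigma=1.0)
overconfident = nlpd_loss(y, mu=0.0, sigma=0.2)
```

Because the loss penalizes both large residuals relative to the predicted scale and needlessly wide predicted scales, minimizing it pushes the model toward calibrated uncertainty rather than point accuracy alone.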