Summary Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. It has been shown that standard confidence intervals for test error built from CV estimates may have coverage below nominal levels. This phenomenon occurs because each sample is used in both the training and testing procedures during CV; as a result, the CV estimates of the errors become correlated. Without accounting for this correlation, the estimate of the variance is smaller than it should be. One way to mitigate this issue is to estimate the mean squared error of the prediction error using nested CV instead. This approach has been shown to achieve superior coverage compared to intervals derived from standard CV. In this work, we generalize the nested CV idea to the Cox proportional hazards model and explore various choices of test error for this setting.
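To make the coverage issue concrete, here is a minimal sketch of the standard (naive) CV interval that the summary describes. It is an illustration under stated assumptions, not the method of the paper: the function name `naive_cv_interval`, the ordinary least squares model, and the squared-error loss are all hypothetical choices made for the example, and the Cox model and nested CV correction are not shown.

```python
import numpy as np

def naive_cv_interval(X, y, n_folds=5, z=1.96, seed=0):
    """Naive K-fold CV estimate of squared-error test error with a normal interval.

    Hypothetical helper for illustration only: it fits ordinary least squares
    on each training split and treats the per-sample errors as independent.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Randomly partition the sample indices into n_folds disjoint test folds.
    folds = np.array_split(rng.permutation(n), n_folds)
    errors = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Least squares with an intercept, fitted on the training folds only.
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        errors[test_idx] = (y[test_idx] - A_test @ coef) ** 2
    estimate = errors.mean()
    # Naive standard error: ignores the correlation between per-sample errors
    # that arises because every sample is used for both training and testing.
    se = errors.std(ddof=1) / np.sqrt(n)
    return estimate, (estimate - z * se, estimate + z * se)

# Example on simulated data (the data-generating model is an assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
est, (lo, hi) = naive_cv_interval(X, y)
print(f"CV error {est:.3f}, naive 95% interval ({lo:.3f}, {hi:.3f})")
```

The standard error in this sketch is the quantity the summary says is too small: it is computed as if the per-sample errors were independent, which is why intervals built this way can undercover.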
Estimation of prediction error in time series
Abstract The accurate estimation of prediction errors in time series is an important problem. It immediately affects the accuracy of prediction intervals but also the quality of a number of widely used time series model selection criteria such as AIC and others. Except for simple cases, however, it is difficult or even infeasible to obtain exact analytical expressions for one-step and multi-step predictions. This may be one of the reasons that, unlike in the independent case (see Efron, 2004), until today there has been no fully established methodology for time series prediction error estimation. Starting from an approximation to the bias-variance decomposition of the squared prediction error, this work is therefore concerned with the estimation of prediction errors in both univariate and multivariate stationary time series. In particular, several estimates are developed for a general class of predictors that includes most of the popular linear, nonlinear, parametric and nonparametric time series models used in practice, where causal invertible ARMA and nonparametric AR processes are discussed as lead examples. Simulation results indicate that the proposed estimators perform quite well in finite samples. The estimates may also be used for model selection when the purpose of modeling is prediction.
- Award ID(s): 1934568
- PAR ID: 10466964
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Biometrika
- ISSN: 0006-3444
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract The Markov property is widely imposed in the analysis of time series data. Correspondingly, testing the Markov property, and relatedly, inferring the order of a Markov model, are of paramount importance. In this article, we propose a nonparametric test for the Markov property in high-dimensional time series via deep conditional generative learning. We also apply the test sequentially to determine the order of the Markov model. We show that the test controls the type-I error asymptotically and has power approaching one. Our proposal makes novel contributions in several ways. We utilise and extend state-of-the-art deep generative learning to estimate the conditional density functions, and establish a sharp upper bound on the approximation error of the estimators. We derive a doubly robust test statistic, which employs a nonparametric estimation but achieves a parametric convergence rate. We further adopt sample splitting and cross-fitting to minimise the conditions required to ensure the consistency of the test. We demonstrate the efficacy of the test through both simulations and three data applications.
Abstract Estimation of uncertainties (random error statistics) of radio occultation (RO) observations is important for their effective assimilation in numerical weather prediction (NWP) models. Average uncertainties can be estimated for large samples of RO observations and these statistics may be used for specifying the observation errors in NWP data assimilation. However, the uncertainties of individual RO observations vary, and so using average uncertainty estimates will overestimate the uncertainties of some observations and underestimate those of others, reducing their overall effectiveness in the assimilation. Several parameters associated with RO observations or their atmospheric environments have been proposed to estimate individual RO errors. These include the standard deviation of bending angle (BA) departures from either climatology in the upper stratosphere and lower mesosphere (STDV) or the sample mean between 40 and 60 km (STD4060), the local spectral width (LSW), and the magnitude of the horizontal gradient of refractivity (|∇_H N|). In this paper we show how the uncertainties of two RO datasets, COSMIC-2 and Spire BA, as well as their combination, vary with these parameters. We find that the uncertainties are highly correlated with STDV and STD4060 in the stratosphere, and with LSW and |∇_H N| in the lower troposphere. These results suggest a hybrid error model for individual BA observations that uses an average statistical model of RO errors modified by STDV or STD4060 above 30 km, and LSW or |∇_H N| below 8 km. Significance Statement: These results contribute to the understanding of the sources of uncertainties in radio occultation observations. They could be used to improve the effectiveness of these observations in their assimilation into numerical weather prediction and reanalysis models by improving the estimation of their observational errors.
Abstract Over the last three decades, many growth and yield systems developed for the southeast USA have incorporated methods to create a compatible basal area (BA) prediction and projection equation. This technique allows practitioners to calibrate BA models using both measurements at a given arbitrary age, as well as the increment in BA when time series panel data are available. As a result, model parameters for either prediction or projection alternatives are compatible. One caveat of this methodology is that pairs of observations used to project forward have the same weight as observations from a single measurement age, regardless of the projection time interval. To address this problem, we introduce a variance–covariance structure giving different weights to predictions with variable intervals. To test this approach, prediction and projection equations were fitted simultaneously using an ad hoc matrix structure. We tested three different error structures in fitting models with (i) homoscedastic errors described by a single parameter (Method 1); (ii) heteroscedastic errors described with a weighting factor $${w}_t$$ (Method 2); and (iii) errors including both prediction ($$\overset{\smile }{\varepsilon }$$) and projection errors ($$\tilde{\varepsilon}$$) in the weighting factor $${w}_t$$ (Method 3). A rotation-age dataset covering nine sites, each including four blocks with four silvicultural treatments per block, was used for model calibration and validation, including explicit terms for each treatment. Fitting using an error structure which incorporated the combined error term ($$\overset{\smile }{\varepsilon }$$ and $$\tilde{\varepsilon}$$) into the weighting factor $${w}_t$$ (Method 3) generated better results according to the root mean square error with respect to the other two methods evaluated. Also, the system of equations that incorporated silvicultural treatments as dummy variables generated lower root mean square error (RMSE) and Akaike’s index values (AIC) in all methods. Our results show a substantial improvement over the current prediction-projection approach, resulting in consistent estimators for BA.
Abstract Conformal prediction provides machine learning models with prediction sets that offer theoretical guarantees, but the underlying assumption of exchangeability limits its applicability to time series data. Furthermore, existing approaches struggle to handle multi-step ahead prediction tasks, where uncertainty estimates across multiple future time points are crucial. We propose JANET (Joint Adaptive predictioN-region Estimation for Time-series), a novel framework for constructing conformal prediction regions that are valid for both univariate and multivariate time series. JANET generalises the inductive conformal framework and efficiently produces joint prediction regions with controlled K-familywise error rates, enabling flexible adaptation to specific application needs. Our empirical evaluation demonstrates JANET’s superior performance in multi-step prediction tasks across diverse time series datasets, highlighting its potential for reliable and interpretable uncertainty quantification in sequential data.

