Title: STATIONARY JACKKNIFE
Variance estimation is an important aspect of statistical inference, especially for dependent data. Resampling methods are well suited to this problem because they do not require restrictive distributional assumptions. In this paper, we develop a novel resampling method in the jackknife family called the stationary jackknife. It can be used to estimate the variance of a statistic when the observations come from a general stationary sequence. Unlike the moving block jackknife, the stationary jackknife computes each jackknife replication by deleting a block of variable length, where the length follows a truncated geometric distribution. Under appropriate assumptions, we show that the stationary jackknife variance estimator is consistent for the sample mean and, more generally, for a class of nonlinear statistics. Further, simulations show that the stationary jackknife provides reasonable variance estimates over a wider range of expected block lengths than the moving block jackknife.
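The block-deletion idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's estimator: the abstract does not give the exact form or normalization, so the sketch assumes random block starts, a fixed number of Monte Carlo replications, and the per-block scaling factor (n - l)^2 / (n l) familiar from the moving block jackknife. All function names and the parameters `p`, `n_rep`, and `seed` are hypothetical.

```python
import numpy as np

def truncated_geometric(p, max_len, rng):
    """Draw L in {1, ..., max_len} with P(L = k) proportional to (1 - p)**(k - 1) * p."""
    k = np.arange(1, max_len + 1)
    w = (1 - p) ** (k - 1) * p
    return int(rng.choice(k, p=w / w.sum()))

def delete_block_replicate(x, start, length, stat):
    """Recompute the statistic after deleting x[start:start + length]."""
    kept = np.concatenate([x[:start], x[start + length:]])
    return stat(kept)

def random_block_jackknife_var(x, stat=np.mean, p=0.1, n_rep=500, seed=0):
    """Illustrative delete-block variance estimate with random block lengths.

    Assumption (not from the paper): each replicate is scaled by
    (n - l)**2 / (n * l), as in the moving block jackknife, and the
    scaled squared deviations are averaged over n_rep random blocks.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    theta = stat(x)
    terms = []
    for _ in range(n_rep):
        l = truncated_geometric(p, n - 1, rng)   # block length, at most n - 1
        start = int(rng.integers(0, n - l + 1))  # block start in [0, n - l]
        rep = delete_block_replicate(x, start, l, stat)
        terms.append((n - l) ** 2 / (n * l) * (rep - theta) ** 2)
    return float(np.mean(terms))
```

Here `1 / p` is roughly the expected block length before truncation, so a smaller `p` deletes longer blocks on average. For independent data and the sample mean, the result should be on the order of the usual variance of the mean, the sample variance divided by n.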
Award ID(s):
2235457
PAR ID:
10542182
Author(s) / Creator(s):
;
Publisher / Repository:
John Wiley & Sons Ltd
Date Published:
Journal Name:
Journal of time series analysis
Volume:
45
Issue:
3
ISSN:
0143-9782
Page Range / eLocation ID:
333 - 360
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Chen, Yi-Hau; Stufken, John; Judy_Wang, Huixia (Ed.)
    Though introduced nearly 50 years ago, the infinitesimal jackknife (IJ) remains a popular modern tool for quantifying predictive uncertainty in complex estimation settings. In particular, when supervised learning ensembles are constructed via bootstrap samples, recent work demonstrated that the IJ estimate of variance is particularly convenient and useful. However, despite the algebraic simplicity of its final form, its derivation is rather complex. As a result, studies clarifying the intuition behind the estimator or rigorously investigating its properties have been severely lacking. This work aims to take a step forward on both fronts. We demonstrate that surprisingly, the exact form of the IJ estimator can be obtained via a straightforward linear regression of the individual bootstrap estimates on their respective weights or via the classical jackknife. The latter realization allows us to formally investigate the bias of the IJ variance estimator and better characterize the settings in which its use is appropriate. Finally, we extend these results to the case of U-statistics where base models are constructed via subsampling rather than bootstrapping and provide a consistent estimate of the resulting variance. 
  2. The weighted nearest neighbors (WNN) estimator has been popularly used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically generated to the nearest neighbors (Steele, 2009; Biau et al., 2010); we name the resulting estimator the distributional nearest neighbors (DNN) for easy reference. Yet, there is a lack of distributional results for such an estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of the bias issue. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent representation of WNN with weights admitting explicit forms and some being negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under the fourth order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For the practical implementation, we also provide variance estimators and a distribution estimator using the jackknife and bootstrap techniques for the two-scale DNN. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application. 
  3. Understanding the impacts of pandemics on public health and related societal issues at granular levels is of great interest. COVID-19 is affecting everyone in the globe and mask wearing is one of the few precautions against it. To quantify people’s perception of mask effectiveness and to prevent the spread of COVID-19 for small areas, we use Understanding America Study’s (UAS) survey data on COVID-19 as our primary data source. Our data analysis shows that direct survey-weighted estimates for small areas could be highly unreliable. In this paper, we develop a synthetic estimation method to estimate proportions of perceived mask effectiveness for small areas using a logistic model that combines information from multiple data sources. We select our working model using an extensive data analysis facilitated by a new variable selection criterion for survey data and benchmarking ratios. We suggest a jackknife method to estimate the variance of our estimator. From our data analysis, it is evident that our proposed synthetic method outperforms the direct survey-weighted estimator with respect to commonly used evaluation measures. 
  4. Longitudinal clinical trials for which recurrent events endpoints are of interest are commonly subject to missing event data. Primary analyses in such trials are often performed assuming events are missing at random, and sensitivity analyses are necessary to assess robustness of primary analysis conclusions to missing data assumptions. Control‐based imputation is an attractive approach in superiority trials for imposing conservative assumptions on how data may be missing not at random. A popular approach to implementing control‐based assumptions for recurrent events is multiple imputation (MI), but Rubin's variance estimator is often biased for the true sampling variability of the point estimator in the control‐based setting. We propose distributional imputation (DI) with corresponding wild bootstrap variance estimation procedure for control‐based sensitivity analyses of recurrent events. We apply control‐based DI to a type I diabetes trial. In the application and simulation studies, DI produced more reasonable standard error estimates than MI with Rubin's combining rules in control‐based sensitivity analyses of recurrent events. 
  5. Harmonizable processes are a class of nonstationary time series that are characterized by the dependence between different frequencies of a time series. The covariance between two frequencies is the dual frequency spectral density, an object analogous to the spectral density function. Local stationarity is another popular form of nonstationarity, though thus far, little attention has been paid to the dual frequency spectral density of a locally stationary process. The focus of this paper is on the dual frequency spectral density of locally stationary time series and locally periodic stationary time series, its natural extension. We show that there are some subtle but important differences between the dual frequency spectral density of an almost periodic stationary process and a locally periodic stationary time series. Estimation of the dual frequency spectral density is typically done by smoothing the dual frequency periodogram. We study the sampling properties of this estimator under the assumption of locally periodic stationarity. In particular, we obtain a Gaussian approximation for the smoothed dual frequency periodogram over a group of frequencies, allowing for the number of frequency lags to grow with sample size. These results are used to test for correlation between different frequency bands in the time series. The variance of the smoothed dual frequency periodogram is quite complex. However, by identifying which covariances are the most pertinent, we propose a nonparametric method for consistently estimating the variance. This is necessary for constructing confidence intervals or testing aspects of the dual frequency spectral density. Simulations are given to illustrate our results. 