Variance estimation is an important aspect of statistical inference, especially in dependent data situations. Resampling methods are ideal for this problem since they do not require restrictive distributional assumptions. In this paper, we develop a novel resampling method in the jackknife family called the stationary jackknife. It can be used to estimate the variance of a statistic when the observations come from a general stationary sequence. Unlike the moving block jackknife, the stationary jackknife computes each jackknife replication by deleting a variable-length block, where the length has a truncated geometric distribution. Under appropriate assumptions, we show the stationary jackknife variance estimator is consistent for the sample mean and, more generally, for a class of nonlinear statistics. Further, simulations show the stationary jackknife provides reasonable variance estimation over a wider range of expected block lengths than the moving block jackknife.
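A minimal NumPy sketch of the mechanism just described, for the sample mean: each replicate deletes a block of truncated-geometric length. The non-circular deletion, the truncation point n//2, and the delete-d-style (n - L)/L scaling are our assumptions for illustration, not the paper's exact formula.

```python
import numpy as np

def stationary_jackknife_var(x, p=0.1, seed=0):
    """Sketch of a stationary-jackknife variance estimate for the sample mean.

    The truncated-geometric block deletion follows the abstract; the
    per-replicate (n - L)/L scaling is borrowed from the delete-d jackknife
    and is an assumption, not the paper's formula.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    max_len = n // 2                             # truncation point (assumed)
    reps, scales = [], []
    for i in range(n):
        L = min(int(rng.geometric(p)), max_len)  # block length, mean about 1/p
        keep = np.ones(n, dtype=bool)
        keep[i:i + L] = False                    # delete x[i], ..., x[i+L-1] (non-circular)
        reps.append(x[keep].mean())
        scales.append((n - L) / L)
    reps = np.asarray(reps)
    return float(np.mean(np.asarray(scales) * (reps - reps.mean()) ** 2))
```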
The Impact of Application of the Jackknife to the Sample Median
The jackknife is a reliable tool for reducing the bias of a wide range of estimators. This note demonstrates that even such versatile tools rest on regularity conditions that can be violated in relatively simple cases, and that caution must be exercised in their use. In particular, we show that the jackknife does not provide the expected bias reduction for the sample median, because of subtle changes in the behavior of the sample median as one moves between even and odd sample sizes. These considerations arose out of class discussions in an MS-level nonparametrics course.
- Award ID(s): 1712839
- PAR ID: 10298438
- Date Published:
- Journal Name: The American Statistician
- ISSN: 0003-1305
- Page Range / eLocation ID: 1 to 5
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
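A small simulation of the point made in the abstract above: the classical delete-1 jackknife bias correction applied to the sample median, compared across an even and an odd sample size. The Exp(1) population and the sample sizes are illustrative choices, not taken from the note.

```python
import numpy as np

def jackknife_corrected_median(x):
    # Classical delete-1 jackknife bias correction: n*theta - (n-1)*mean(loo).
    n = len(x)
    loo = np.array([np.median(np.delete(x, i)) for i in range(n)])
    return n * np.median(x) - (n - 1) * loo.mean()

rng = np.random.default_rng(1)
for n in (20, 21):  # compare an even and an odd sample size
    est = [jackknife_corrected_median(rng.exponential(size=n)) for _ in range(5000)]
    print(n, np.mean(est) - np.log(2))  # residual bias; true median of Exp(1) is ln 2
```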
More Like this
- Chen, Yi-Hau; Stufken, John; Judy Wang, Huixia (Eds.) Though introduced nearly 50 years ago, the infinitesimal jackknife (IJ) remains a popular modern tool for quantifying predictive uncertainty in complex estimation settings. In particular, when supervised learning ensembles are constructed via bootstrap samples, recent work demonstrated that the IJ estimate of variance is particularly convenient and useful. However, despite the algebraic simplicity of its final form, its derivation is rather complex. As a result, studies clarifying the intuition behind the estimator or rigorously investigating its properties have been severely lacking. This work aims to take a step forward on both fronts. We demonstrate that, surprisingly, the exact form of the IJ estimator can be obtained via a straightforward linear regression of the individual bootstrap estimates on their respective weights, or via the classical jackknife. The latter realization allows us to formally investigate the bias of the IJ variance estimator and better characterize the settings in which its use is appropriate. Finally, we extend these results to the case of U-statistics, where base models are constructed via subsampling rather than bootstrapping, and provide a consistent estimate of the resulting variance. (A sketch of the IJ variance estimate appears after this list.)
- Computerised record linkage methods help us combine multiple data sets from different sources when a single data set with all necessary information is unavailable or when data collection on additional variables is time consuming and extremely costly. Linkage errors are inevitable in the linked data set because error-free unique identifiers are unavailable. A small amount of linkage error can lead to substantial bias and increased variability in estimating the parameters of a statistical model. In this paper, we propose a unified theory for statistical analysis with linked data. Unlike existing methods for secondary analysis of linked data, our proposed method exploits record linkage process data as an alternative to taking a costly sample to evaluate error rates from the record linkage procedure. A jackknife method is introduced to estimate the bias, covariance matrix, and mean squared error of our proposed estimators. Simulation results are presented to evaluate the performance of the proposed estimators that account for linkage errors. (The generic delete-1 jackknife machinery is sketched after this list.)
- Ribeiro, Pedro; Silva, Fernando; Mendes, José Fernando; Laureano, Rosário (Eds.) The availability of large datasets composed of graphs creates an unprecedented need to invent novel tools in statistical learning for graph-valued random variables. To characterize the average of a sample of graphs, one can compute the sample Fréchet mean and median graphs. In this paper, we address the following foundational question: does a mean or median graph inherit the structural properties of the graphs in the sample? An important graph property is edge density; we establish that edge density is a hereditary property, which is transmitted from a graph sample to its sample Fréchet mean or median graph, irrespective of the method used to estimate the mean or the median. Because of the prominence of the Fréchet mean in graph-valued machine learning, this novel theoretical result has significant practical consequences. (A toy majority-vote illustration appears after this list.)
- The weighted nearest neighbors (WNN) estimator has been popularly used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically generated for the nearest neighbors (Steele, 2009; Biau et al., 2010); we name the resulting estimator the distributional nearest neighbors (DNN) estimator for easy reference. Yet distributional results for this estimator have been lacking, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of the bias issue. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias-reduction approach for the DNN estimator: linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent representation as a WNN estimator with weights admitting explicit forms, some of them negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under a fourth-order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator using the jackknife and bootstrap techniques for the two-scale DNN. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application. (A sketch of DNN and the two-scale combination appears after this list.)
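As referenced in the infinitesimal jackknife item above, the IJ variance estimate for a bagged ensemble has a compact closed form. The sketch below uses the classical covariance form V_IJ = sum_j Cov(N_bj, t_b)^2 (Efron, 2014), with the bagged sample mean standing in for a base learner; the paper's regression and jackknife derivations are not reproduced, and the Monte Carlo setup is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
n, B = len(x), 5000
N = rng.multinomial(n, np.full(n, 1.0 / n), size=B)  # bootstrap counts N_bj
t = N @ x / n                                        # bootstrap estimates t_b
# Per-observation covariance between its bootstrap count and the estimate.
cov = ((N - N.mean(axis=0)) * (t - t.mean())[:, None]).mean(axis=0)
print(np.sum(cov ** 2), x.var() / n)  # IJ estimate vs. usual variance of the mean
```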
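The record-linkage abstract introduces a jackknife for bias, covariance, and mean squared error estimation but does not give its form; the sketch below shows only the generic delete-1 jackknife machinery such a method builds on, applied to an arbitrary scalar estimator on made-up Gaussian data.

```python
import numpy as np

def jackknife_bias_var(data, estimator):
    # Generic delete-1 jackknife: bias-corrected estimate and variance estimate.
    n = len(data)
    theta = estimator(data)
    loo = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    bias = (n - 1) * (loo.mean() - theta)
    var = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
    return theta - bias, var

x = np.random.default_rng(3).normal(loc=2.0, size=50)
print(jackknife_bias_var(x, np.mean))
```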
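For the Fréchet mean/median abstract: under the Hamming distance on adjacency matrices (our choice of metric, made purely for illustration), the sample Fréchet median reduces to the entrywise majority-vote graph, which makes the heritability of edge density easy to check numerically. The paper's result is stated irrespective of the estimation method.

```python
import numpy as np

rng = np.random.default_rng(4)
n_nodes, n_graphs = 30, 200
P = np.triu(rng.random((n_nodes, n_nodes)), 1)   # inhomogeneous edge probabilities
P = P + P.T
graphs = (rng.random((n_graphs, n_nodes, n_nodes)) < P).astype(int)
graphs = np.triu(graphs, 1)
graphs = graphs + graphs.transpose(0, 2, 1)      # symmetrize: undirected graphs
# Frechet median under Hamming distance = entrywise majority-vote graph.
median_graph = (graphs.mean(axis=0) > 0.5).astype(int)
density = lambda A: A.sum() / (n_nodes * (n_nodes - 1))
print(density(median_graph), np.mean([density(g) for g in graphs]))  # close values
```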
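For the DNN/TDNN abstract: the sketch below writes DNN as a weighted-NN rule with the explicit bagged-1-NN weights (Biau et al., 2010) and forms a two-scale combination whose weights cancel a leading bias term assumed to scale like s^(-2/d). That scaling is our reading of the abstract; the paper's exact weight constants may differ. Note that one of the two combination weights is negative, as the abstract describes.

```python
import numpy as np
from scipy.special import comb

def dnn(X, y, x0, s):
    # DNN at x0 with subsampling scale s, as a weighted nearest-neighbor rule.
    n = len(X)
    order = np.argsort(np.linalg.norm(X - x0, axis=1))
    i = np.arange(1, n + 1)
    w = comb(n - i, s - 1) / comb(n, s)  # P(i-th NN is the 1-NN of a size-s subsample)
    return float(w @ y[order])

def tdnn(X, y, x0, s1, s2):
    # Two-scale combination; weights solve w1 + w2 = 1 and
    # w1 * s1^(-2/d) + w2 * s2^(-2/d) = 0 (assumed leading-bias scaling).
    d = X.shape[1]
    r = (s2 / s1) ** (2.0 / d)
    w1 = 1.0 / (1.0 - r)                 # negative when s1 < s2
    return w1 * dnn(X, y, x0, s1) + (1.0 - w1) * dnn(X, y, x0, s2)

rng = np.random.default_rng(5)
X = rng.random((500, 2))
y = np.sin(4 * X[:, 0]) + rng.normal(0, 0.1, size=500)
print(tdnn(X, y, np.array([0.5, 0.5]), s1=50, s2=100), np.sin(2.0))
```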