skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Bias, Consistency, and Alternative Perspectives of the Infinitesimal Jackknife
Though introduced nearly 50 years ago, the infinitesimal jackknife (IJ) remains a popular modern tool for quantifying predictive uncertainty in complex estimation settings. In particular, when supervised learning ensembles are constructed via bootstrap samples, recent work demonstrated that the IJ estimate of variance is particularly convenient and useful. However, despite the algebraic simplicity of its final form, its derivation is rather complex. As a result, studies clarifying the intuition behind the estimator or rigorously investigating its properties have been severely lacking. This work aims to take a step forward on both fronts. We demonstrate that surprisingly, the exact form of the IJ estimator can be obtained via a straightforward linear regression of the individual bootstrap estimates on their respective weights or via the classical jackknife. The latter realization allows us to formally investigate the bias of the IJ variance estimator and better characterize the settings in which its use is appropriate. Finally, we extend these results to the case of U-statistics where base models are constructed via subsampling rather than bootstrapping and provide a consistent estimate of the resulting variance.  more » « less
Award ID(s):
2015400
PAR ID:
10536191
Author(s) / Creator(s):
; ;
Editor(s):
Chen, Yi-Hau; Stufken, John; Judy_Wang, Huixia
Publisher / Repository:
Statistica Sinica
Date Published:
Journal Name:
Statistica Sinica
ISSN:
1017-0405
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Variance estimation is an important aspect in statistical inference, especially in the dependent data situations. Resamplingmethods are ideal for solving this problem since these do not require restrictive distributional assumptions. In this paper, wedevelop a novel resampling method in the Jackknife family called the stationary jackknife. It can be used to estimatethe variance of a statistic in the cases where observations are from a general stationary sequence. Unlike the moving blockjackknife, the stationary jackknife computes the jackknife replication by deleting a variable length block and thelength has a truncated geometric distribution. Under appropriate assumptions, we can show the stationary jackknifevariance estimator is a consistent estimator for the case of the sample mean and, more generally, for a class of nonlinearstatistics. Further, the stationary jackknife is shown to provide reasonable variance estimation for a wider range ofexpected block lengths when compared with the moving block jackknife by simulation. 
    more » « less
  2. The weighted nearest neighbors (WNN) estimator has been popularly used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically generated to the nearest neighbors (Steele, 2009; Biau et al., 2010); we name the resulting estimator as the distributional nearest neighbors (DNN) for easy reference. Yet, there is a lack of distributional results for such estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of the bias issue. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent representation of WNN with weights admitting explicit forms and some being negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under the fourth order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For the practical implementation, we also provide variance estimators and a distribution estimator using the jackknife and bootstrap techniques for the two-scale DNN. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing nite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application. 
    more » « less
  3. Abstract While researchers commonly use the bootstrap to quantify the uncertainty of an estimator, it has been noticed that the standard bootstrap, in general, does not work for Chatterjee’s rank correlation. In this paper, we provide proof of this issue under an additional independence assumption, and complement our theory with simulation evidence for general settings. Chatterjee’s rank correlation thus falls into a category of statistics that are asymptotically normal, but bootstrap inconsistent. Valid inferential methods in this case are Chatterjee’s original proposal for testing independence and the analytic asymptotic variance estimator of Lin & Han (2022) for more general purposes. [Received on 5 April 2023. Editorial decision on 10 January 2024] 
    more » « less
  4. Abstract In this paper, we propose a flexible nested error regression small area model with high-dimensional parameter that incorporates heterogeneity in regression coefficients and variance components. We develop a new robust small area-specific estimating equations method that allows appropriate pooling of a large number of areas in estimating small area-specific model parameters. We propose a parametric bootstrap and jackknife method to estimate not only the mean squared errors but also other commonly used uncertainty measures such as standard errors and coefficients of variation. We conduct both model-based and design-based simulation experiments and real-life data analysis to evaluate the proposed methodology. 
    more » « less
  5. Abstract Standard estimators of the global average treatment effect can be biased in the presence of interference. This paper proposes regression adjustment estimators for removing bias due to interference in Bernoulli randomized experiments. We use a fitted model to predict the counterfactual outcomes of global control and global treatment. Our work differs from standard regression adjustments in that the adjustment variables are constructed from functions of the treatment assignment vector, and that we allow the researcher to use a collection of any functions correlated with the response, turning the problem of detecting interference into a feature engineering problem. We characterize the distribution of the proposed estimator in a linear model setting and connect the results to the standard theory of regression adjustments under SUTVA. We then propose an estimator that allows for flexible machine learning estimators to be used for fitting a nonlinear interference functional form. We propose conducting statistical inference via bootstrap and resampling methods, which allow us to sidestep the complicated dependences implied by interference and instead rely on empirical covariance structures. Such variance estimation relies on an exogeneity assumption akin to the standard unconfoundedness assumption invoked in observational studies. In simulation experiments, our methods are better at debiasing estimates than existing inverse propensity weighted estimators based on neighborhood exposure modeling. We use our method to reanalyze an experiment concerning weather insurance adoption conducted on a collection of villages in rural China. 
    more » « less