This content will become publicly available on November 20, 2024
 Award ID(s):
 2113171
 NSFPAR ID:
 10520182
 Publisher / Repository:
 Project Euclid
 Date Published:
 Journal Name:
 Electronic Journal of Statistics
 Volume:
 17
 Issue:
 2
 ISSN:
 19357524
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

null (Ed.)Summary We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size $n$, the smoothing spline estimator can be expressed as a linear combination of $n$ basis functions, requiring $O(n^3)$ computational time when the number $d$ of predictors is two or more. Such a sizeable computational cost hinders the broad applicability of smoothing splines. In practice, the fullsample smoothing spline estimator can be approximated by an estimator based on $q$ randomly selected basis functions, resulting in a computational cost of $O(nq^2)$. It is known that these two estimators converge at the same rate when $q$ is of order $O\{n^{2/(pr+1)}\}$, where $p\in [1,2]$ depends on the true function and $r > 1$ depends on the type of spline. Such a $q$ is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting basis functions corresponding to approximately equally spaced observations, the proposed method chooses a set of basis functions with great diversity. The asymptotic analysis shows that the proposed smoothing spline estimator can decrease $q$ to around $O\{n^{1/(pr+1)}\}$ when $d\leq pr+1$. Applications to synthetic and realworld datasets show that the proposed method leads to a smaller prediction error than other basis selection methods.more » « less

Abstract Analysis of time‐to‐event data using Cox's proportional hazards (PH) model is ubiquitous in scientific research. A sample is taken from the population of interest and covariate information is collected on everyone. If the event of interest is rare and covariate information is difficult to collect, the nested case‐control (NCC) design reduces costs with minimal impact on inferential precision. Under PH, application of the Cox model to data from a NCC sample provides consistent estimation of the hazard ratio. However, under non‐PH, the finite‐sample estimates corresponding to the Cox estimator depend on the number of controls sampled and the censoring distribution. We propose two estimators based on a binary predictor of interest: one recovers the estimand corresponding to the Cox model under a simple random sample, while the other recovers an estimand that does not depend on the censoring distribution. We derive the asymptotic distribution and provide finite‐sample variance estimators.

In this article, we study nonparametric inference for a covariateadjusted regression function. This parameter captures the average association between a continuous exposure and an outcome after adjusting for other covariates. Under certain causal conditions, it also corresponds to the average outcome had all units been assigned to a specific exposure level, known as the causal dose–response curve. We propose a debiased local linear estimator of the covariateadjusted regression function and demonstrate that our estimator converges pointwise to a meanzero normal limit distribution. We use this result to construct asymptotically valid confidence intervals for function values and differences thereof. In addition, we use approximation results for the distribution of the supremum of an empirical process to construct asymptotically valid uniform confidence bands. Our methods do not require undersmoothing, permit the use of dataadaptive estimators of nuisance functions, and our estimator attains the optimal rate of convergence for a twice differentiable regression function. We illustrate the practical performance of our estimator using numerical studies and an analysis of the effect of air pollution exposure on cardiovascular mortality.more » « less

Abstract We consider high‐dimensional inference for potentially misspecified Cox proportional hazard models based on low‐dimensional results by Lin and Wei (1989). A desparsified Lasso estimator is proposed based on the log partial likelihood function and shown to converge to a pseudo‐true parameter vector. Interestingly, the sparsity of the true parameter can be inferred from that of the above limiting parameter. Moreover, each component of the above (nonsparse) estimator is shown to be asymptotically normal with a variance that can be consistently estimated even under model misspecifications. In some cases, this asymptotic distribution leads to valid statistical inference procedures, whose empirical performances are illustrated through numerical examples.

The weighted nearest neighbors (WNN) estimator has been popularly used as a flexible and easytoimplement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically generated to the nearest neighbors (Steele, 2009; Biau et al., 2010); we name the resulting estimator as the distributional nearest neighbors (DNN) for easy reference. Yet, there is a lack of distributional results for such estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higherorder smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of the bias issue. In this work, we provide an indepth technical analysis of the DNN, based on which we suggest a bias reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel twoscale DNN (TDNN) estimator. The twoscale DNN estimator has an equivalent representation of WNN with weights admitting explicit forms and some being negative. We prove that, thanks to the use of negative weights, the twoscale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under the fourth order smoothness condition. We further go beyond estimation and establish that the DNN and twoscale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For the practical implementation, we also provide variance estimators and a distribution estimator using the jackknife and bootstrap techniques for the twoscale DNN. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing nitesample performance of the suggested twoscale DNN method are illustrated with several simulation examples and a real data application.more » « less