Title: An asymptotic and empirical smoothing parameters selection method for smoothing spline ANOVA models in large samples
Summary Large samples are generated routinely from various sources. Classic statistical models, such as smoothing spline ANOVA models, are not well equipped to analyse such large samples because of high computational costs. In particular, the daunting computational cost of selecting smoothing parameters renders smoothing spline ANOVA models impractical. In this article, we develop an asympirical, i.e., asymptotic and empirical, smoothing parameters selection method for smoothing spline ANOVA models in large samples. The idea of our approach is to use asymptotic analysis to show that the optimal smoothing parameter is a polynomial function of the sample size and an unknown constant. The unknown constant is then estimated through empirical subsample extrapolation. The proposed method significantly reduces the computational burden of selecting smoothing parameters in high-dimensional and large samples. We show that smoothing parameters chosen by the proposed method tend to the optimal smoothing parameters that minimize a specific risk function. In addition, the estimator based on the proposed smoothing parameters achieves the optimal convergence rate. Extensive simulation studies demonstrate the numerical advantage of the proposed method over competing methods in terms of relative efficacy and running time. In an application to molecular dynamics data containing nearly one million observations, the proposed method has the best prediction performance.
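The subsample extrapolation step described in the summary can be illustrated with a minimal sketch. It assumes, purely for illustration, that the optimal smoothing parameter follows a power law lambda(n) = C * n**(-exponent) with a known exponent; the summary only states that the dependence on the sample size is polynomial, so the exact form, the exponent value, and the function name `extrapolate_lambda` are hypothetical. How the smoothing parameters are chosen on each subsample is also left outside the sketch.

```python
import numpy as np

def extrapolate_lambda(sub_sizes, sub_lambdas, n_full, exponent):
    """Extrapolate a smoothing parameter from subsamples to the full sample.

    Assumption (not taken from the article): lambda(n) = C * n**(-exponent),
    where `exponent` is known from asymptotic theory and C is unknown.
    """
    sub_sizes = np.asarray(sub_sizes, dtype=float)
    sub_lambdas = np.asarray(sub_lambdas, dtype=float)
    # With the exponent fixed, every subsample yields one estimate of log C.
    log_c = np.log(sub_lambdas) + exponent * np.log(sub_sizes)
    # Average on the log scale, then map back to obtain the constant.
    c_hat = np.exp(log_c.mean())
    # Plug the estimated constant into the assumed power law at full size.
    return c_hat * float(n_full) ** (-exponent)

# Hypothetical inputs: smoothing parameters selected on three small subsamples.
sizes = [1000, 2000, 4000]
lams = [2.0 * n ** -0.5 for n in sizes]
lam_full = extrapolate_lambda(sizes, lams, n_full=1_000_000, exponent=0.5)
```

The point of the design is that the expensive selection step only ever runs on subsamples; the full-sample value comes from a closed-form extrapolation.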
Award ID(s):
1903226
PAR ID:
10230058
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Biometrika
Volume:
108
Issue:
1
ISSN:
0006-3444
Page Range / eLocation ID:
149 to 166
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size $n$, the smoothing spline estimator can be expressed as a linear combination of $n$ basis functions, requiring $O(n^3)$ computational time when the number $d$ of predictors is two or more. Such a sizeable computational cost hinders the broad applicability of smoothing splines. In practice, the full-sample smoothing spline estimator can be approximated by an estimator based on $q$ randomly selected basis functions, resulting in a computational cost of $O(nq^2)$. It is known that these two estimators converge at the same rate when $q$ is of order $O\{n^{2/(pr+1)}\}$, where $p\in [1,2]$ depends on the true function and $r > 1$ depends on the type of spline. Such a $q$ is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting basis functions corresponding to approximately equally spaced observations, the proposed method chooses a set of basis functions with great diversity. The asymptotic analysis shows that the proposed smoothing spline estimator can decrease $q$ to around $O\{n^{1/(pr+1)}\}$ when $d\leq pr+1$. Applications to synthetic and real-world datasets show that the proposed method leads to a smaller prediction error than other basis selection methods.
  2. A popular method for flexible function estimation in nonparametric models is the smoothing spline. When applying the smoothing spline method, the nonparametric function is estimated via penalized least squares, where the penalty imposes a soft constraint on the function to be estimated. The specification of the penalty functional is usually based on a set of assumptions about the function. Choosing a reasonable penalty function is the key to the success of the smoothing spline method. In practice, there may exist multiple sets of widely accepted assumptions, leading to different penalties, which then yield different estimates. We refer to this problem as the problem of ambiguous penalties. Neglecting the underlying ambiguity and proceeding to the model with one of the candidate penalties may produce misleading results. In this article, we adopt a Bayesian perspective and propose a fully Bayesian approach that takes into consideration all the penalties as well as the ambiguity in choosing them. We also propose a sampling algorithm for drawing samples from the posterior distribution. Data analysis based on simulated and real‐world examples is used to demonstrate the efficiency of our proposed method. 
  3. Abstract Motivated by an analysis of single molecular experiments in the study of T‐cell signaling, a new model called varying coefficient frailty model with local linear estimation is proposed. Frailty models have been extensively studied, but extensions to nonconstant coefficients are limited to spline‐based methods that tend to produce estimation bias near the boundary. To address this problem, we introduce a local polynomial kernel smoothing technique with a modified expectation‐maximization algorithm to estimate the unknown parameters. Theoretical properties of the estimators, including their unbiased property near the boundary, are derived along with discussions on the asymptotic bias‐variance trade‐off. The finite sample performance is examined by simulation studies, and comparisons with existing spline‐based approaches are conducted to show the potential advantages of the proposed approach. The proposed method is implemented for the analysis of T‐cell signaling. The fitted varying coefficient model provides a rigorous quantification of an early and rapid impact on T‐cell signaling from the accumulation of bond lifetime, which can shed new light on the fundamental understanding of how T cells initiate immune responses. 
  4. Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against $\ell_2$-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest $\ell_p$-radius that can be certified decreases as $O(1/d^{\frac{1}{2}-\frac{1}{p}})$ with dimension $d$ for $p>2$. Notably, for $p\geq 2$, this dependence on $d$ is no better than that of the $\ell_p$-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to generalized Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when $p\geq 2$. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as a uniform distribution within an $\ell_1$ or an $\ell_\infty$-norm ball, we show upper bounds of the form $O(1/d)$ and $O(1/d^{1-\frac{1}{p}})$, respectively, which have an even worse dependence on $d$.
  5. Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against $\ell_2$-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest $\ell_p$-radius that can be certified decreases as $O(1/d^{\frac{1}{2}-\frac{1}{p}})$ with dimension $d$ for $p>2$. Notably, for $p\geq 2$, this dependence on $d$ is no better than that of the $\ell_p$-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to generalized Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when $p\geq 2$. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as a uniform distribution within an $\ell_1$ or an $\ell_\infty$-norm ball, we show upper bounds of the form $O(1/d)$ and $O(1/d^{1-\frac{1}{p}})$, respectively, which have an even worse dependence on $d$.
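The first related abstract above describes choosing basis functions at approximately equally spaced observations instead of at random. A rough one-dimensional sketch of that idea follows. It is an illustration under simplifying assumptions, not the method of that article: it handles a single predictor only, snaps an even grid to the nearest observations, and the function name `select_spaced_basis` is hypothetical.

```python
import numpy as np

def select_spaced_basis(x, q):
    """Return indices of up to q observations that are roughly equally spaced.

    Assumes a single predictor with at least two observations (a
    simplification; the abstract concerns d >= 2 predictors as well).
    """
    x = np.asarray(x, dtype=float)
    # Target locations: q evenly spaced points over the observed range.
    grid = np.linspace(x.min(), x.max(), q)
    order = np.argsort(x)
    xs = x[order]
    # For each grid point, find the insertion position in the sorted data.
    pos = np.clip(np.searchsorted(xs, grid), 1, len(xs) - 1)
    # Choose whichever neighbour (left or right) is closer to the grid point.
    left_closer = (grid - xs[pos - 1]) <= (xs[pos] - grid)
    pos = np.where(left_closer, pos - 1, pos)
    # Map back to original indices; duplicates collapse, so fewer than q
    # indices may be returned when observations are sparse.
    return np.unique(order[pos])

x = np.array([0.5, 1.0, 0.0, 0.99, 0.01])
idx = select_spaced_basis(x, q=3)
```

With clustered data, random selection tends to pick several nearly identical basis functions, whereas the spaced selection spreads the q choices over the whole range, which is the "diversity" the abstract refers to.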