

Title: Simultaneous Selection and Inference for Varying Coefficients with Zero Regions: A Soft-Thresholding Approach
Abstract

Varying coefficient models have been used to explore dynamic effects in many scientific areas, such as in medicine, finance, and epidemiology. As most existing models ignore the existence of zero regions, we propose a new soft-thresholded varying coefficient model, where the coefficient functions are piecewise smooth with zero regions. Our new modeling approach enables us to perform variable selection, detect the zero regions of selected variables, obtain point estimates of the varying coefficients with zero regions, and construct a new type of sparse confidence intervals that accommodate zero regions. We prove the asymptotic properties of the estimator, based on which we draw statistical inference. Our simulation study reveals that the proposed sparse confidence intervals achieve the desired coverage probability. We apply the proposed method to analyze a large-scale preoperative opioid study.
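
How a smooth function can give rise to zero regions may be easier to see with a small example. The sketch below assumes the standard soft-thresholding operator, sign(x) * max(|x| - lam, 0), applied pointwise to a smooth latent function. The latent function `theta`, the threshold `lam`, and the evaluation grid are hypothetical choices made for illustration; they are not taken from the paper, and no estimation is performed.

```python
import math


def soft_threshold(x, lam):
    """Standard soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def theta(t):
    # Hypothetical smooth latent function, chosen only for illustration.
    return math.sin(2 * math.pi * t)


lam = 0.5  # assumed threshold, not a value from the paper
grid = [i / 20 for i in range(21)]

# Thresholded coefficient: exactly zero wherever |theta(t)| <= lam,
# and shrunk toward zero by lam elsewhere.
beta = [soft_threshold(theta(t), lam) for t in grid]

# Grid points that fall inside a zero region of the thresholded function.
zero_region = [t for t, b in zip(grid, beta) if b == 0.0]
```

The thresholded function stays continuous because the shrinkage removes exactly `lam` from the magnitude at the boundary of each zero region.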

 
NSF-PAR ID:
10432533
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
79
Issue:
4
ISSN:
0006-341X
Format(s):
Medium: X
Size(s):
p. 3388-3401
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    In this paper, we propose a new framework to construct confidence sets for a $d$-dimensional unknown sparse parameter ${\boldsymbol \theta }$ under the normal mean model ${\boldsymbol X}\sim N({\boldsymbol \theta },\sigma ^{2}\mathbf{I})$. A key feature of the proposed confidence set is its capability to account for the sparsity of ${\boldsymbol \theta }$, hence the name sparse confidence set. This is in sharp contrast with classical methods, such as the Bonferroni confidence intervals and resampling-based procedures, where the sparsity of ${\boldsymbol \theta }$ is often ignored. Specifically, we require the desired sparse confidence set to satisfy the following two conditions: (i) uniformly over the parameter space, the coverage probability for ${\boldsymbol \theta }$ is above a pre-specified level; (ii) there exists a random subset $S$ of $\{1,\ldots,d\}$ such that $S$ guarantees the pre-specified true negative rate for detecting non-zero $\theta _{j}$’s. To exploit the sparsity of ${\boldsymbol \theta }$, we allow the confidence interval for $\theta _{j}$ to degenerate to the single point 0 for any $j\notin S$. Under this new framework, we first consider whether there exist sparse confidence sets that satisfy the above two conditions. To address this question, we establish a non-asymptotic minimax lower bound for the non-coverage probability over a suitable class of sparse confidence sets. The lower bound deciphers the role of sparsity and minimum signal-to-noise ratio (SNR) in the construction of sparse confidence sets. Furthermore, under suitable conditions on the SNR, a two-stage procedure is proposed to construct a sparse confidence set. To evaluate its optimality, the proposed sparse confidence set is shown to attain a minimax lower bound of a properly defined risk function up to a constant factor. Finally, we develop a procedure that adapts to the unknown sparsity. Numerical studies are conducted to verify the theoretical results.
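
The shape of a sparse confidence set, with intervals that collapse to the single point 0 outside a selected subset $S$, can be sketched in a few lines. The code below is only an illustration of that shape and is not the two-stage procedure of the paper: the selection rule (a universal threshold `sigma * sqrt(2 log d)`), the Bonferroni-style interval width, and the data vector `x` are all assumptions introduced here, and no coverage or true-negative-rate guarantee is claimed for them.

```python
import math
from statistics import NormalDist


def sparse_confidence_set(x, sigma, alpha=0.05):
    """Illustrative select-then-interval construction (assumed, not the paper's)."""
    d = len(x)
    # Stage 1 (assumed rule): keep coordinates exceeding the universal threshold.
    threshold = sigma * math.sqrt(2.0 * math.log(d))
    selected = [j for j, xj in enumerate(x) if abs(xj) > threshold]

    # Coordinates outside the selected set get the degenerate interval {0}.
    intervals = [(0.0, 0.0)] * d

    # Stage 2 (assumed rule): Bonferroni-style intervals over the selected set.
    if selected:
        q = NormalDist().inv_cdf(1.0 - alpha / (2.0 * len(selected)))
        for j in selected:
            intervals[j] = (x[j] - q * sigma, x[j] + q * sigma)
    return selected, intervals


# Hypothetical observations: two strong signals among mostly small values.
x = [0.2, -0.5, 4.1, 0.1, -3.8, 0.3]
selected, intervals = sparse_confidence_set(x, sigma=1.0)
```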

     
  2. Abstract

    This paper is motivated by studying differential brain activities to multiple experimental condition presentations in intracranial electroencephalography (iEEG) experiments. Contrasting effects of experimental conditions are often zero in most regions and nonzero in some local regions, yielding locally sparse functions. Such studies are essentially a function-on-scalar regression problem, with interest being focused not only on estimating nonparametric functions but also on recovering the function supports. We propose a weighted group bridge approach for simultaneous function estimation and support recovery in function-on-scalar mixed effect models, while accounting for heterogeneity present in functional data. We use B-splines to transform sparsity of functions to its sparse vector counterpart of increasing dimension, and propose a fast nonconvex optimization algorithm using nested alternating direction method of multipliers (ADMM) for estimation. Large sample properties are established. In particular, we show that the estimated coefficient functions are rate optimal in the minimax sense under the L2 norm and resemble a phase transition phenomenon. For support estimation, we derive a convergence rate under the norm that leads to a selection consistency property under δ-sparsity, and obtain a result under strict sparsity using a simple sufficient regularity condition. An adjusted extended Bayesian information criterion is proposed for parameter tuning. The developed method is illustrated through simulations and an application to a novel iEEG data set to study multisensory integration.

     
  3. Abstract

    For statistical inference on regression models with a diverging number of covariates, the existing literature typically makes sparsity assumptions on the inverse of the Fisher information matrix. Such assumptions, however, are often violated under Cox proportional hazards models, leading to biased estimates and confidence intervals with under‐coverage. We propose a modified debiased lasso method, which solves a series of quadratic programming problems to approximate the inverse information matrix without imposing sparse matrix assumptions. We establish asymptotic results for the estimated regression coefficients when the dimension of covariates diverges with the sample size. As demonstrated by extensive simulations, our proposed method provides consistent estimates and confidence intervals with nominal coverage probabilities. The utility of the method is further demonstrated by assessing the effects of genetic markers on patients' overall survival with the Boston Lung Cancer Survival Cohort, a large‐scale epidemiology study investigating mechanisms underlying lung cancer.

     
  4. Abstract

    Statistical bias correction techniques are commonly used in climate model projections to reduce systematic biases. Among the several bias correction techniques, univariate linear bias correction (e.g., quantile mapping) is the most popular, given its simplicity. Univariate linear bias correction can accurately reproduce the observed mean of a given climate variable. However, when performed separately on multiple variables, it does not yield the observed multivariate cross‐correlation structure. In the current study, we consider the intrinsic properties of two candidate univariate linear bias‐correction approaches (simple linear regression and asynchronous regression) in estimating the observed cross‐correlation between precipitation and temperature. Two linear regression models are applied separately on both the observed and the projected variables. The analytical solution suggests that the two candidate approaches simply reproduce the cross‐correlation from the general circulation models (GCMs) in the bias‐corrected data set because of their linearity. Our study adopts two frameworks, based on the Fisher z‐transformation and bootstrapping, to provide 95% lower and upper confidence limits (referred to as the permissible bound) for the GCM cross‐correlation. Beyond the permissible bound, the raw/bias‐corrected GCM cross‐correlation differs significantly from the observed one. The two frameworks are applied on three GCMs from the CMIP5 multimodel ensemble over the coterminous United States. We found that (a) the univariate linear techniques fail to reproduce the observed cross‐correlation in the bias‐corrected data set over 90% (30–50%) of the grid points where the multivariate skewness coefficient values are substantial (small) and statistically significantly (insignificantly) different from zero; (b) the performance of the univariate linear techniques under bootstrapping (Fisher z‐transformation) remains uniform (non‐uniform) across climate regions, months, and GCMs; (c) grid points where the observed cross‐correlation is statistically significant witness a failure fraction of around 0.2 (0.8) under the Fisher z‐transformation (bootstrapping). The importance of reproducing cross‐correlations is also discussed, along with an enquiry into the multivariate approaches that can potentially address the bias in yielding cross‐correlations.
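
Two points in this abstract lend themselves to a short numerical check: a separate linear correction of each variable (with positive slope) leaves the Pearson cross‐correlation unchanged, and a Fisher z‐transformation gives approximate confidence limits for a correlation. The sketch below uses the textbook interval tanh(atanh(r) ± q / sqrt(n − 3)). The series `precip` and `temp`, the correction slopes and intercepts, and the function names are made up for illustration and do not come from the study; the interval is the generic Fisher z construction, which may differ in detail from the framework the authors used.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_interval(r, n, level=0.95):
    """Approximate confidence limits for a correlation via the Fisher z-transformation."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)                      # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - q * se), math.tanh(z + q * se)


# Made-up model output, for illustration only.
precip = [2.1, 0.4, 3.3, 1.8, 0.9, 2.7, 1.2, 3.9]
temp = [14.0, 19.5, 12.2, 15.1, 18.0, 13.4, 17.2, 11.8]

# Hypothetical univariate linear corrections (positive slopes), applied separately.
precip_bc = [1.2 * p + 0.3 for p in precip]
temp_bc = [0.9 * t - 1.5 for t in temp]

r_raw = pearson(precip, temp)
r_bc = pearson(precip_bc, temp_bc)         # equals r_raw up to rounding error

# Limits around the model correlation, in the spirit of a "permissible bound".
lower, upper = fisher_z_interval(r_raw, len(precip))
```

An observed correlation falling outside `(lower, upper)` would, under this simplified reading, count as a failure to reproduce the observed cross‐correlation.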

     
  5. Abstract

    In light of the significant damage observed after earthquakes in Japan and New Zealand, enhanced performing seismic force‐resisting systems and energy dissipation devices are increasingly being utilized in buildings. Numerical models are needed to estimate the seismic response of these systems for seismic design or assessment. While there have been studies on modeling uncertainty, selecting the model features most important to response can remain ambiguous, especially if the structure employs less well‐established lateral force‐resisting systems and components. Herein, a global sensitivity analysis was used to address modeling uncertainty in specimens with elastic spines and force‐limiting connections (FLCs) physically tested at full‐scale at the E‐Defense shake table in Japan. Modeling uncertainty was addressed for both model class and model parameter uncertainty by varying primary models to develop several secondary models according to pre‐established uncertainty groups. Numerical estimates of peak story drift ratio and floor acceleration were compared to the results from the experimental testing program using confidence intervals and root‐mean‐square error. Metrics such as the coefficient of variation, variance, linear Pearson correlation coefficient, and Sobol index were used to gain intuition about each model feature's contribution to the dispersion in estimates of the engineering demands. Peak floor acceleration was found to be more sensitive to modeling uncertainty compared to story drift ratio. Assumptions for the spine‐to‐frame connection significantly impacted estimates of peak floor accelerations, which could influence future design methods for spines and FLC in enhanced lateral‐force resisting systems.

     