skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, January 16 until 2:00 AM ET on Friday, January 17 due to maintenance. We apologize for the inconvenience.


Title: A Regularization-Based Adaptive Test for High-Dimensional Generalized Linear Models
In spite of its urgent importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type I error rates under high-dimensional nuisance parameter situations. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, called truncated Lasso penalty (TLP), which can maintain correct Type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Via simulations, its superior finite-sample performance is demonstrated over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer’s Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer’s disease. We also put R package “aispu” implementing the proposed test on GitHub.  more » « less
Award ID(s):
1711226
PAR ID:
10185057
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of machine learning research
Volume:
21
Issue:
129
ISSN:
1532-4435
Page Range / eLocation ID:
1-67
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fan, J ; Pan, J. (Ed.)
    Testing whether the mean vector from some population is zero or not is a fundamental problem in statistics. In the high-dimensional regime, where the dimension of data p is greater than the sample size n, traditional methods such as Hotelling’s T2 test cannot be directly applied. One can project the high-dimensional vector onto a space of low dimension and then traditional methods can be applied. In this paper, we propose a projection test based on a new estimation of the optimal projection direction Σ^{−1}μ. Under the assumption that the optimal projection Σ^{−1}μ is sparse, we use a regularized quadratic programming with nonconvex penalty and linear constraint to estimate it. Simulation studies and real data analysis are conducted to examine the finite sample performance of different tests in terms of type I error and power. 
    more » « less
  2. Summary

    The paper considers the problem of hypothesis testing and confidence intervals in high dimensional proportional hazards models. Motivated by a geometric projection principle, we propose a unified likelihood ratio inferential framework, including score, Wald and partial likelihood ratio statistics for hypothesis testing. Without assuming model selection consistency, we derive the asymptotic distributions of these test statistics, establish their semiparametric optimality and conduct power analysis under Pitman alternatives. We also develop new procedures to construct pointwise confidence intervals for the baseline hazard function and conditional hazard function. Simulation studies show that all tests proposed perform well in controlling type I errors. Moreover, the partial likelihood ratio test is empirically more powerful than the other tests. The methods proposed are illustrated by an example of a gene expression data set.

     
    more » « less
  3. Summary

    Covariate-adaptive randomization is popular in clinical trials with sequentially arrived patients for balancing treatment assignments across prognostic factors that may have influence on the response. However, existing theory on tests for the treatment effect under covariate-adaptive randomization is limited to tests under linear or generalized linear models, although the covariate-adaptive randomization method has been used in survival analysis for a long time. Often, practitioners will simply adopt a conventional test to compare two treatments, which is controversial since tests derived under simple randomization may not be valid in terms of type I error under other randomization schemes. We derive the asymptotic distribution of the partial likelihood score function under covariate-adaptive randomization and a working model that is subject to possible model misspecification. Using this general result, we prove that the partial likelihood score test that is robust against model misspecification under simple randomization is no longer robust but conservative under covariate-adaptive randomization. We also show that the unstratified log-rank test is conservative and the stratified log-rank test remains valid under covariate-adaptive randomization. We propose a modification to variance estimation in the partial likelihood score test, which leads to a score test that is valid and robust against arbitrary model misspecification under a large family of covariate-adaptive randomization schemes including simple randomization. Furthermore, we show that the modified partial likelihood score test derived under a correctly specified model is more powerful than log-rank-type tests in terms of Pitman’s asymptotic relative efficiency. Simulation studies about the type I error and power of various tests are presented under several popular randomization schemes.

     
    more » « less
  4. Summary

    We consider testing regression coefficients in high dimensional generalized linear models. By modifying the test statistic of Goeman and his colleagues for large but fixed dimensional settings, we propose a new test, based on an asymptotic analysis, that is applicable for diverging dimensions and is robust to accommodate a wide range of link functions. The power properties of the tests are evaluated asymptotically under two families of alternative hypotheses. In addition, a test in the presence of nuisance parameters is also proposed. The tests can provide p-values for testing significance of multiple gene sets, whose application is demonstrated in a case-study on lung cancer.

     
    more » « less
  5. This study introduces the statistical theory of using the Standardized Root Mean Squared Error (SRMR) to test close fit in ordinal factor analysis. We also compare the accuracy of confidence intervals (CIs) and tests of close fit based on the Standardized Root Mean Squared Error (SRMR) with those obtained based on the Root Mean Squared Error of Approximation (RMSEA). We use Unweighted Least Squares (ULS) estimation with a mean and variance corrected test statistic. The current (biased) implementation for the RMSEA never rejects that a model fits closely when data are binary and almost invariably rejects the model in large samples if data consist of five categories. The unbiased RMSEA produces better rejection rates, but it is only accurate enough when the number of variables is small (e.g., p = 10) and the degree of misfit is small. In contrast, across all simulated conditions, the tests of close fit based on the SRMR yield acceptable type I error rates. SRMR tests of close fit are also more powerful than those using the unbiased RMSEA. 
    more » « less