

Title: Testing and Confidence Intervals for High Dimensional Proportional Hazards Models
Summary

The paper considers the problem of hypothesis testing and confidence intervals in high dimensional proportional hazards models. Motivated by a geometric projection principle, we propose a unified likelihood ratio inferential framework, including score, Wald and partial likelihood ratio statistics for hypothesis testing. Without assuming model selection consistency, we derive the asymptotic distributions of these test statistics, establish their semiparametric optimality and conduct power analysis under Pitman alternatives. We also develop new procedures to construct pointwise confidence intervals for the baseline hazard function and conditional hazard function. Simulation studies show that all tests proposed perform well in controlling type I errors. Moreover, the partial likelihood ratio test is empirically more powerful than the other tests. The methods proposed are illustrated by an example of a gene expression data set.

 
NSF-PAR ID:
10397807
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
79
Issue:
5
ISSN:
1369-7412
Page Range / eLocation ID:
p. 1415-1437
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as “universal.” The method is very simple and is based on a modified version of the usual likelihood-ratio statistic that we call “the split likelihood-ratio test” (split LRT) statistic. The (limiting) null distribution of the classical likelihood-ratio statistic is often intractable when used to test composite null hypotheses in irregular statistical models. Our method is especially appealing for statistical inference in these complex setups. The method we suggest works for any parametric model and also for some nonparametric models, as long as computing a maximum-likelihood estimator (MLE) is feasible under the null. Canonical examples arise in mixture modeling and shape-constrained inference, for which constructing tests and confidence sets has been notoriously difficult. We also develop various extensions of our basic methods. We show that in settings where computing the MLE is hard, it suffices to upper bound the maximum likelihood for the purpose of constructing valid tests and intervals. We investigate some conditions under which our methods yield valid inferences under model misspecification. Further, the split LRT can be used with profile likelihoods to deal with nuisance parameters, and it can also be run sequentially to yield anytime-valid P values and confidence sequences. Finally, when combined with the method of sieves, it can be used to perform model selection with nested model classes.
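    The split-LRT recipe in this abstract (split the data, compute the MLE on one half, evaluate the likelihood ratio on the other, and reject when the ratio exceeds 1/α by Markov's inequality) can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes the simplest possible setting, a simple null for a unit-variance Gaussian mean, and the function name is hypothetical:

    ```python
    import numpy as np

    def split_lrt_evalue(x, theta0=0.0):
        """Split likelihood-ratio statistic for H0: mean = theta0,
        unit-variance Gaussian (illustrative sketch only).

        Rejecting H0 when U >= 1/alpha is a finite-sample level-alpha
        test by Markov's inequality, with no regularity conditions."""
        n = len(x)
        d0, d1 = x[:n // 2], x[n // 2:]    # D0 evaluates, D1 estimates
        theta_hat = d1.mean()              # MLE on the held-out half
        # log U = log L_D0(theta_hat) - log L_D0(theta0)
        log_u = (0.5 * np.sum((d0 - theta0) ** 2)
                 - 0.5 * np.sum((d0 - theta_hat) ** 2))
        return np.exp(log_u)

    rng = np.random.default_rng(0)
    u = split_lrt_evalue(rng.normal(loc=1.0, size=200))  # H0 is false here
    print(u >= 20)  # compare U against 1/alpha for alpha = 0.05
    ```

    Note the asymmetry of the split: the held-out MLE need not be exact; any value of θ̂ yields a valid test, which is why upper-bounding the maximum likelihood suffices, as the abstract notes.
    
    
    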

     
  2. Summary

    A formal likelihood ratio hypothesis test for the validity of a parametric regression function is proposed, using a large dimensional, non-parametric double-cone alternative. For example, the test against a constant function uses the alternative of increasing or decreasing regression functions, and the test against a linear function uses the convex or concave alternative. The test proposed is exact and unbiased, and the critical value is easily computed. The power of the test increases to 1 as the sample size increases, under very mild assumptions—even when the alternative is misspecified, i.e. the power of the test converges to 1 for any true regression function that deviates (in a non-degenerate way) from the parametric null hypothesis. We also formulate tests for the linear versus partial linear model and consider the special case of the additive model. Simulations show that our procedure performs consistently well compared with other methods. Although the alternative fit is non-parametric, no tuning parameters are involved. Supplementary materials with proofs and technical details are available online.
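    The simplest case described above (constant null against the double cone of increasing or decreasing functions) can be sketched numerically: fit the constant, fit the best monotone function in each direction via the pool-adjacent-violators algorithm, and compare sums of squared errors. This is an illustrative sketch, not the paper's exact statistic; the function names are hypothetical:

    ```python
    import numpy as np

    def pava(y):
        """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
        vals, wts = list(map(float, y)), [1] * len(y)
        i = 0
        while i < len(vals) - 1:
            if vals[i] > vals[i + 1]:            # violator: pool the blocks
                w = wts[i] + wts[i + 1]
                vals[i] = (vals[i] * wts[i] + vals[i + 1] * wts[i + 1]) / w
                wts[i] = w
                del vals[i + 1], wts[i + 1]
                i = max(i - 1, 0)                # pooling may expose a new violator
            else:
                i += 1
        return np.repeat(vals, wts)

    def doublecone_stat(y):
        """LR-type statistic for H0: constant mean vs. the double cone of
        monotone (increasing or decreasing) regression functions.

        Returns (SSE0 - SSE1) / SSE0 in [0, 1]; large values favour a
        monotone trend over a constant function."""
        y = np.asarray(y, dtype=float)
        sse0 = np.sum((y - y.mean()) ** 2)                 # constant fit
        if sse0 == 0.0:
            return 0.0
        sse_up = np.sum((y - pava(y)) ** 2)                # increasing fit
        sse_dn = np.sum((y - pava(y[::-1])[::-1]) ** 2)    # decreasing fit
        return (sse0 - min(sse_up, sse_dn)) / sse0

    t = doublecone_stat(np.linspace(0.0, 1.0, 20))  # a strictly increasing signal
    ```

    No tuning parameters appear anywhere in the alternative fit, which mirrors the point made in the abstract.
    
    
    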

     
  3. Summary

    Penalized regression spline models afford a simple mixed model representation in which variance components control the degree of non-linearity in the smooth function estimates. This motivates the study of lack-of-fit tests based on the restricted maximum likelihood ratio statistic, which tests whether the variance components are zero against the alternative that they take positive values. For this one-sided testing problem a further complication is that the variance component belongs to the boundary of the parameter space under the null hypothesis. Conditions are obtained on the design of the regression spline models under which asymptotic distribution theory applies, and finite sample approximations to the asymptotic distribution are provided. Test statistics are studied for simple as well as multiple regression models.

     
  4. Biscarat, C. ; Campana, S. ; Hegner, B. ; Roiser, S. ; Rovelli, C.I. ; Stewart, G.A. (Ed.)
    High Energy Physics (HEP) experiments generally employ sophisticated statistical methods to present results in searches for new physics. In the problem of searching for sterile neutrinos, likelihood ratio tests are applied to short-baseline neutrino oscillation experiments to construct confidence intervals for the parameters of interest. A test statistic of the form Δχ² is often used to form the confidence intervals; however, this approach can lead to statistical inaccuracies due to the small signal rate in the region of interest. In this paper, we present an efficient implementation of the computationally expensive Feldman-Cousins corrections, which construct a statistically accurate confidence interval for neutrino oscillation analysis. The program performs a grid-based minimization over oscillation parameters and is written in C++. Our algorithms make use of vectorization through Eigen3, yielding a single-core speed-up of 350 compared to the original implementation, and achieve MPI data parallelism by employing DIY. We demonstrate the strong scaling of the application at High-Performance Computing (HPC) sites. We utilize HDF5 along with HighFive to write the results of the calculation to file.
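    The Feldman-Cousins construction the abstract refers to orders outcomes by a likelihood ratio with the best-fit parameter constrained to the physical region, then inverts the resulting acceptance regions. A minimal single-parameter sketch for a Poisson counting experiment with known background is shown below; the paper's actual code is a vectorized C++/MPI application over oscillation parameters, so this is only a toy illustration with hypothetical function names and parameter choices:

    ```python
    import math

    def poisson_pmf(n, mu):
        return math.exp(-mu) * mu ** n / math.factorial(n)

    def fc_accept(s, b, alpha=0.10, nmax=50):
        """Feldman-Cousins acceptance region for signal s, background b:
        rank counts n by P(n | s + b) / P(n | s_best + b), with the
        best-fit signal constrained to be non-negative, and add counts
        until the coverage reaches 1 - alpha."""
        def rank(n):
            s_best = max(0.0, n - b)            # physical (>= 0) best fit
            return poisson_pmf(n, s + b) / poisson_pmf(n, s_best + b)
        acc, cov = set(), 0.0
        for n in sorted(range(nmax + 1), key=rank, reverse=True):
            acc.add(n)
            cov += poisson_pmf(n, s + b)
            if cov >= 1 - alpha:
                break
        return acc

    def fc_interval(n_obs, b, alpha=0.10):
        """Invert the acceptance regions over a grid of signal values."""
        grid = [0.01 * k for k in range(1501)]  # s in [0, 15] in steps of 0.01
        accepted = [s for s in grid if n_obs in fc_accept(s, b, alpha)]
        return min(accepted), max(accepted)

    lo, hi = fc_interval(n_obs=0, b=0.0)        # 90% CL interval for n = 0
    ```

    The expensive step in practice is exactly this inversion: each grid point requires re-ranking outcomes (or, in the oscillation setting, pseudo-experiments), which is why the paper parallelizes it.
    
    
    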
  5. Abstract

    Children exposed to mixtures of endocrine disrupting compounds such as phthalates are at high risk of experiencing significant friction in their growth and sexual maturation. This article is primarily motivated by a study that aims to assess the toxicants‐modified effects of risk factors related to the hazards of early or delayed onset of puberty among children living in Mexico City. To address the hypothesis of potential nonlinear modification of covariate effects, we propose a new Cox regression model with multiple functional covariate‐environment interactions, which allows covariate effects to be altered nonlinearly by mixtures of exposed toxicants. This new class of models is rather flexible and includes many existing semiparametric Cox models as special cases. To achieve efficient estimation, we develop the global partial likelihood method of inference, in which we establish key large‐sample results, including estimation consistency, asymptotic normality, semiparametric efficiency and the generalized likelihood ratio test for both parameters and nonparametric functions. The proposed methodology is examined via simulation studies and applied to the analysis of the motivating data, where maternal exposures to phthalates during the third trimester of pregnancy are found to be important risk modifiers for the age of attaining the first stage of puberty. The Canadian Journal of Statistics 47: 204–221; 2019 © 2019 Statistical Society of Canada

     