Title: Testing and Confidence Intervals for High Dimensional Proportional Hazards Models
Summary

The paper considers the problem of hypothesis testing and confidence intervals in high dimensional proportional hazards models. Motivated by a geometric projection principle, we propose a unified likelihood ratio inferential framework, including score, Wald and partial likelihood ratio statistics for hypothesis testing. Without assuming model selection consistency, we derive the asymptotic distributions of these test statistics, establish their semiparametric optimality and conduct power analysis under Pitman alternatives. We also develop new procedures to construct pointwise confidence intervals for the baseline hazard function and conditional hazard function. Simulation studies show that all tests proposed perform well in controlling type I errors. Moreover, the partial likelihood ratio test is empirically more powerful than the other tests. The methods proposed are illustrated by an example of a gene expression data set.
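For orientation, the sketch below computes the classical low-dimensional partial likelihood ratio statistic for a single Cox regression coefficient, the building block that the paper extends to the high-dimensional regime. The simulated data, the optimizer, and all variable names are our own illustrative choices, not the authors' implementation.

```python
# A minimal, low-dimensional illustration of the partial likelihood ratio
# statistic that the paper generalizes to high dimensions. Data are simulated;
# this is not the authors' procedure.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, 0.0, -0.3])
T = rng.exponential(1.0 / np.exp(X @ beta_true))   # latent event times
C = rng.exponential(2.0, size=n)                   # censoring times
time = np.minimum(T, C)
event = (T <= C).astype(float)

def neg_partial_loglik(beta, X, time, event):
    """Negative Cox partial log-likelihood (continuous times, so no ties)."""
    order = np.argsort(time)
    X, event = X[order], event[order]
    eta = X @ beta
    # log-sum-exp over each risk set, accumulated from latest to earliest time
    log_risk = np.logaddexp.accumulate(eta[::-1])[::-1]
    return -np.sum(event * (eta - log_risk))

# Unrestricted fit, and fit under H0: beta_1 = 0 (the first coordinate).
full = minimize(neg_partial_loglik, np.zeros(p), args=(X, time, event))
null = minimize(lambda b: neg_partial_loglik(np.insert(b, 0, 0.0), X, time, event),
                np.zeros(p - 1))
lrt = 2.0 * (null.fun - full.fun)                  # partial likelihood ratio
print("LRT =", lrt, "p-value =", chi2.sf(lrt, df=1))
```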

 
PAR ID:
10397807
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
79
Issue:
5
ISSN:
1369-7412
Format(s):
Medium: X
Size(s):
p. 1415-1437
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    In this work, we address the question of how to enhance signal-agnostic searches by leveraging multiple testing strategies. Specifically, we consider hypothesis tests relying on machine learning, where model selection can introduce a bias towards specific families of new physics signals. Focusing on the New Physics Learning Machine, a methodology to perform a signal-agnostic likelihood-ratio test, we explore a number of approaches to multiple testing, such as combining p-values and aggregating test statistics. Our findings show that it is beneficial to combine different tests, characterised by distinct choices of hyperparameters, and that performance comparable to the best available test is generally achieved, while also providing a more uniform response to various types of anomalies. This study proposes a methodology that is valid beyond machine learning approaches and could in principle be applied to a larger class of model-agnostic analyses based on hypothesis testing.
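    One of the multiple-testing strategies this abstract mentions, combining p-values across tests run with different hyperparameters, can be sketched in a few lines. The p-values below are placeholders, and note that Fisher's method assumes independent tests; tests evaluated on the same data are generally dependent, so in practice such combinations require care.

```python
# A hedged sketch of p-value combination (Fisher's method); the individual
# p-values are placeholders standing in for tests with distinct hyperparameters.
import numpy as np
from scipy.stats import combine_pvalues

pvals = np.array([0.04, 0.20, 0.11, 0.55])  # hypothetical per-test p-values

stat, p_combined = combine_pvalues(pvals, method='fisher')
print(f"Fisher statistic = {stat:.3f}, combined p-value = {p_combined:.4f}")
```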

     
  2. Summary

    A formal likelihood ratio hypothesis test for the validity of a parametric regression function is proposed, using a large dimensional, non-parametric double-cone alternative. For example, the test against a constant function uses the alternative of increasing or decreasing regression functions, and the test against a linear function uses the convex or concave alternative. The test proposed is exact and unbiased and the critical value is easily computed. The power of the test increases to 1 as the sample size increases, under very mild assumptions, even when the alternative is misspecified; that is, the power converges to 1 for any true regression function that deviates (in a non-degenerate way) from the parametric null hypothesis. We also formulate tests for the linear versus partial linear model and consider the special case of the additive model. Simulations show that our procedure performs consistently well compared with other methods. Although the alternative fit is non-parametric, no tuning parameters are involved. Supplementary materials with proofs and technical details are available online.
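    To make the double-cone idea concrete, here is a rough sketch of a test of a constant regression function against the increasing-or-decreasing alternative, built from isotonic fits. The statistic, the simulated critical value, and the data are our illustrative choices; the paper's exact test and its unbiasedness results are more refined.

```python
# A rough sketch of the double-cone test against a constant function.
# Under H0 (constant mean, Gaussian errors) the statistic below is invariant
# to the unknown mean and variance, so its null distribution can be simulated.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def double_cone_stat(x, y):
    """1 - min(SSE_incr, SSE_decr) / SSE_const; larger means stronger evidence."""
    sse_const = np.sum((y - y.mean()) ** 2)
    fit_up = IsotonicRegression(increasing=True).fit_transform(x, y)
    fit_dn = IsotonicRegression(increasing=False).fit_transform(x, y)
    sse_cone = min(np.sum((y - fit_up) ** 2), np.sum((y - fit_dn) ** 2))
    return 1.0 - sse_cone / sse_const

rng = np.random.default_rng(1)
n = 50
x = np.linspace(0, 1, n)
y = 0.8 * x + rng.normal(scale=0.5, size=n)        # truly increasing trend

null_stats = [double_cone_stat(x, rng.normal(size=n)) for _ in range(2000)]
crit = np.quantile(null_stats, 0.95)
print("statistic =", double_cone_stat(x, y), "critical value =", crit)
```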

     
  3. We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as “universal.” The method is very simple and is based on a modified version of the usual likelihood-ratio statistic that we call “the split likelihood-ratio test” (split LRT) statistic. The (limiting) null distribution of the classical likelihood-ratio statistic is often intractable when used to test composite null hypotheses in irregular statistical models. Our method is especially appealing for statistical inference in these complex setups. The method we suggest works for any parametric model and also for some nonparametric models, as long as computing a maximum-likelihood estimator (MLE) is feasible under the null. Canonical examples arise in mixture modeling and shape-constrained inference, for which constructing tests and confidence sets has been notoriously difficult. We also develop various extensions of our basic methods. We show that in settings where computing the MLE is hard, for the purpose of constructing valid tests and intervals, it is sufficient to upper bound the maximum likelihood. We investigate some conditions under which our methods yield valid inferences under model misspecification. Further, the split LRT can be used with profile likelihoods to deal with nuisance parameters, and it can also be run sequentially to yield anytime-valid P values and confidence sequences. Finally, when combined with the method of sieves, it can be used to perform model selection with nested model classes.
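    The split LRT construction is simple enough to sketch directly. The toy example below tests H0: mu = 0 for N(mu, 1) data; the model, sample, and significance level are our illustrative choices, but the recipe (split the data, compute the MLE on one half, evaluate the likelihood ratio on the other, reject when it exceeds 1/alpha) follows the abstract's description.

```python
# A minimal sketch of the split likelihood-ratio test ("universal inference")
# for H0: mu = 0 under a N(mu, 1) model. Data and alpha are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=0.4, scale=1.0, size=100)   # data with a small true shift
alpha = 0.05

# 1) Split the sample into two halves.
half = len(x) // 2
x0, x1 = x[:half], x[half:]

# 2) Estimate mu from the second half (any estimator preserves validity).
mu_hat = x1.mean()

# 3) Split likelihood ratio evaluated on the first half:
#    L(mu_hat; x0) / L(0; x0), since the null here is the single point mu = 0.
log_lr = norm.logpdf(x0, loc=mu_hat).sum() - norm.logpdf(x0, loc=0.0).sum()

# 4) Markov's inequality gives P(LR > 1/alpha) <= alpha under H0,
#    in finite samples and without regularity conditions.
reject = log_lr > np.log(1.0 / alpha)
print("log split-LR =", log_lr, "reject H0:", reject)
```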

     
  4. Summary

    Penalized regression spline models afford a simple mixed model representation in which variance components control the degree of non-linearity in the smooth function estimates. This motivates the study of lack-of-fit tests based on the restricted maximum likelihood ratio statistic, which tests whether a variance component is zero against the alternative that it takes a positive value. A further complication of this one-sided testing problem is that the variance component lies on the boundary of the parameter space under the null hypothesis. Conditions are obtained on the design of the regression spline models under which asymptotic distribution theory applies, and finite sample approximations to the asymptotic distribution are provided. Test statistics are studied for simple as well as multiple regression models.
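    Because the null value of the variance component sits on the boundary, the restricted likelihood ratio statistic is not asymptotically chi-squared. A commonly cited reference point is the Self and Liang equal mixture of chi2_0 and chi2_1; the snippet below computes a p-value under that approximation with a placeholder statistic. Note this is background context, not the paper's method: the paper gives conditions under which asymptotic theory applies for spline designs and supplies finite-sample approximations.

```python
# P-value under the classical 0.5*chi2_0 + 0.5*chi2_1 boundary mixture.
# The observed RLRT statistic below is a placeholder value, and this mixture
# is only a reference approximation, not the paper's finite-sample result.
from scipy.stats import chi2

rlrt_observed = 3.2   # hypothetical restricted likelihood ratio statistic
# Under the equal mixture, P(RLRT > t) = 0.5 * P(chi2_1 > t) for t > 0.
p_mixture = 0.5 * chi2.sf(rlrt_observed, df=1)
print("mixture-approximation p-value:", p_mixture)
```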

     
  5. Summary

    Since the introduction of fiducial inference by Fisher in the 1930s, its application has been largely confined to relatively simple, parametric problems. In this paper, we present what may be the first systematic application of fiducial inference to the estimation of a nonparametric survival function under right censoring. We find that the resulting fiducial distribution gives rise to surprisingly good statistical procedures applicable to both one-sample and two-sample problems. In particular, we use the fiducial distribution of a survival function to construct pointwise and curvewise confidence intervals for the survival function, and propose tests based on the curvewise confidence interval. We establish a functional Bernstein–von Mises theorem, and perform thorough simulation studies in scenarios with different levels of censoring. The proposed fiducial-based confidence intervals maintain coverage in situations where asymptotic methods often have substantial coverage problems, and their average length is often shorter than that of competing methods that maintain coverage. Finally, the proposed fiducial test is more powerful than various types of log-rank tests and sup log-rank tests in some scenarios. We illustrate the proposed fiducial test by comparing chemotherapy against chemotherapy combined with radiotherapy, using data from the treatment of locally unresectable gastric cancer.
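    The fiducial test above is benchmarked against log-rank tests. As a point of reference, here is the classical two-sample log-rank test implemented from scratch; the survival data are made up for illustration, and this is the standard comparator, not the paper's fiducial procedure.

```python
# Plain two-sample log-rank test, the classical comparator mentioned in the
# abstract. Data arrays are hypothetical; event = 1 means observed, 0 censored.
import numpy as np
from scipy.stats import chi2

def logrank(time1, event1, time2, event2):
    times = np.concatenate([time1, time2])
    events = np.concatenate([event1, event2]).astype(bool)
    group = np.concatenate([np.zeros(len(time1)), np.ones(len(time2))])
    O = E = V = 0.0
    for t in np.unique(times[events]):         # distinct event times
        at_risk = times >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 0)).sum()    # at risk in group 1
        d = (events & (times == t)).sum()      # events at time t
        d1 = (events & (times == t) & (group == 0)).sum()
        O += d1
        E += d * n1 / n
        if n > 1:
            V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = (O - E) ** 2 / V
    return stat, chi2.sf(stat, df=1)

t1 = np.array([5., 8., 12., 20., 23.]); e1 = np.array([1, 1, 0, 1, 0])
t2 = np.array([4., 6., 9., 11., 16.]); e2 = np.array([1, 1, 1, 0, 1])
print("log-rank chi2 = %.3f, p = %.3f" % logrank(t1, e1, t2, e2))
```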