NSF PAR Search | NSF Public Access Repository


Exact selective inference with randomization

https://doi.org/10.1093/biomet/asae019

Panigrahi, Snigdha; Fry, Kevin; Taylor, Jonathan (April 2024, Biometrika)

We introduce a pivot for exact selective inference with randomization. Not only does our pivot lead to exact inference in Gaussian regression models, but it is also available in closed form. We reduce this problem to inference for a bivariate truncated Gaussian variable. In doing so, we give up some of the power achieved by approximate maximum likelihood estimation in Panigrahi & Taylor (2023). Yet our pivot always produces narrower confidence intervals than a closely related data-splitting procedure. We investigate the trade-off between power and exact selective inference on simulated datasets and an HIV drug resistance dataset.
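The reduction to a truncated Gaussian can be illustrated with the simpler univariate case. The sketch below is not the paper's closed-form bivariate pivot. It assumes a single Gaussian statistic that is reported only when it falls in a known interval [lower, upper], and it uses the fact that the truncated-Gaussian CDF evaluated at the observation is Uniform(0, 1) under the true mean. The function name and the example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist


def truncated_gaussian_pivot(x, mu, sigma, lower, upper):
    """CDF of N(mu, sigma^2) truncated to [lower, upper], evaluated at x.

    If x is drawn from that truncated law, the result is Uniform(0, 1), so it
    can serve as a pivot for mu. Naive sketch: no tail-stable arithmetic, so
    the denominator underflows when mu is far from the interval.
    """
    if not lower < upper:
        raise ValueError("need lower < upper")
    if not lower <= x <= upper:
        raise ValueError("x must lie inside the truncation interval")
    dist = NormalDist(mu, sigma)
    mass = dist.cdf(upper) - dist.cdf(lower)  # probability of the selection event
    return (dist.cdf(x) - dist.cdf(lower)) / mass


# Hypothetical example: a z-statistic is reported only because it exceeded 2.
z_obs = 2.5
pivot = truncated_gaussian_pivot(z_obs, mu=0.0, sigma=1.0, lower=2.0, upper=math.inf)
selective_p = 1.0 - pivot                 # one-sided p-value for H0: mu = 0
naive_p = 1.0 - NormalDist().cdf(z_obs)   # ignores the selection step
print(f"naive p = {naive_p:.4f}, selective p = {selective_p:.4f}")
```

Here the selective p-value is roughly 0.27, versus a naive value of about 0.006: conditioning on the statistic having cleared the threshold removes the overoptimism that the naive test carries.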
Carving model-free inference

https://doi.org/10.1214/23-AOS2318

Panigrahi, Snigdha (December 2023, The Annals of Statistics)

Complex studies involve many steps. Selecting promising findings based on pilot data is a first step. As more observations are collected, the investigator must decide how to combine the new data with the pilot data to construct valid selective inference. Carving, introduced by Fithian, Sun and Taylor (2014), enables the reuse of pilot data during selective inference and accounts for overoptimism from the selection process. However, carving is currently justified only for parametric models, such as the commonly used Gaussian model. In this paper, we develop the asymptotic theory to substantiate the use of carving beyond Gaussian models. Our results indicate that carving produces valid and tight confidence intervals in a model-free setting, as demonstrated through simulated and real instances.
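A toy version of carving makes the reuse of pilot data concrete. The sketch below assumes a Gaussian model with known variance (the parametric setting the paper moves beyond), a single mean that was selected because its pilot z-statistic exceeded a threshold, and a one-sided test. It approximates the selective p-value by rejection sampling under the null: only simulated datasets whose pilot mean passes the same selection rule are kept, and the observed full-data mean is compared against them. The function names, example numbers and simulation budget are illustrative assumptions, not the paper's procedure.

```python
import math
import random
from statistics import NormalDist


def carved_p_value(xbar_obs, mu0, sigma, n1, n2, z_thresh, n_draws=200_000, seed=0):
    """Monte Carlo selective p-value for H0: mu = mu0 against mu > mu0.

    Toy Gaussian model with known sigma (an assumption): the effect was
    selected because the pilot z-statistic sqrt(n1) * xbar1 / sigma exceeded
    z_thresh. Carving tests with the full-data mean but conditions on that
    selection event, so the pilot data are reused rather than discarded.
    """
    rng = random.Random(seed)
    sd1 = sigma / math.sqrt(n1)        # sd of the pilot mean
    sd2 = sigma / math.sqrt(n2)        # sd of the new-data mean
    cutoff = z_thresh * sd1            # selected iff pilot mean > cutoff
    kept = exceed = 0
    for _ in range(n_draws):
        x1 = rng.gauss(mu0, sd1)       # simulated pilot mean under H0
        if x1 <= cutoff:               # fails selection: discard the draw
            continue
        x2 = rng.gauss(mu0, sd2)       # simulated new-data mean under H0
        xbar = (n1 * x1 + n2 * x2) / (n1 + n2)
        kept += 1
        exceed += xbar >= xbar_obs
    if kept == 0:
        raise ValueError("no draws passed selection; increase n_draws")
    return exceed / kept


def naive_p_value(xbar_obs, mu0, sigma, n):
    """Unadjusted one-sided p-value that ignores the selection step."""
    z = (xbar_obs - mu0) * math.sqrt(n) / sigma
    return 1.0 - NormalDist().cdf(z)


# Hypothetical numbers: 50 pilot + 50 new observations, sigma = 1,
# selection threshold 2, observed full-data mean 0.25.
p_naive = naive_p_value(0.25, mu0=0.0, sigma=1.0, n=100)
p_carve = carved_p_value(0.25, mu0=0.0, sigma=1.0, n1=50, n2=50, z_thresh=2.0)
print(f"naive p = {p_naive:.4f}, carved p = {p_carve:.4f}")
```

The carved p-value comes out far larger than the naive one, because the simulated datasets that survive selection already have inflated pilot means. Its exact value depends on the seed and on `n_draws`; only a few percent of draws pass this threshold, so the budget has to be generous.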
Approximate Post-Selective Inference for Regression with the Group LASSO

Panigrahi, Snigdha; MacDonald, Peter W; Kessler, Daniel (March 2023, Journal of Machine Learning Research)

After selection with the Group LASSO (or generalized variants such as the overlapping, sparse, or standardized Group LASSO), inference for the selected parameters is unreliable in the absence of adjustments for selection bias. In the penalized Gaussian regression setup, existing approaches provide adjustments for selection events that can be expressed as linear inequalities in the data variables. Such a representation, however, fails to hold for selection with the Group LASSO, which substantially limits the scope of subsequent post-selective inference. Key questions of inferential interest (for example, inference for the effects of selected variables on the outcome) remain unanswered. In the present paper, we develop a consistent, post-selective, Bayesian method to address the existing gaps by deriving a likelihood adjustment factor and an approximation thereof that eliminates bias from the selection of groups. Experiments on simulated data and data from the Human Connectome Project demonstrate that our method recovers the effects of parameters within the selected groups while paying only a small price for bias adjustment.
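To see why Group LASSO selection does not reduce to linear inequalities, consider the unweighted penalty λ Σ_g ||β_g||_2 with an orthonormal design, an assumption made only for this sketch. The estimate is then obtained by block soft-thresholding z = X^T y, and group g is selected exactly when ||z_g||_2 > λ: a condition on a Euclidean norm, not a set of linear inequalities in the data. The code below is just this proximal step, not the paper's Bayesian adjustment; the function name and inputs are illustrative.

```python
import math


def group_soft_threshold(z, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2 (block soft-thresholding).

    z: list of floats; groups: list of index lists partitioning range(len(z)).
    A group is zeroed out (not selected) when its l2 norm is at most lam;
    otherwise it is shrunk toward zero by the factor 1 - lam / ||z_g||_2.
    """
    beta = [0.0] * len(z)
    active = []
    for g, idx in enumerate(groups):
        norm = math.sqrt(sum(z[i] ** 2 for i in idx))
        if norm <= lam:              # selection event depends on the norm
            continue
        scale = 1.0 - lam / norm
        for i in idx:
            beta[i] = scale * z[i]
        active.append(g)
    return beta, active


# Hypothetical input: two groups of two coordinates each, lam = 1.
beta, active = group_soft_threshold([3.0, 4.0, 0.3, -0.4], [[0, 1], [2, 3]], lam=1.0)
print(active, beta)
```

The first group (norm 5) is kept and shrunk by the factor 0.8, while the second (norm 0.5) is dropped. Whether a group survives depends on the whole vector z_g through its length, which is the nonlinearity the abstract refers to.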
An omnibus test for detection of subgroup treatment effects via data partitioning

https://doi.org/10.1214/21-AOAS1589

Sun, Yifei; He, Xuming; Hu, Jianhua (December 2022, The Annals of Applied Statistics)

Approximate Selective Inference via Maximum Likelihood

https://doi.org/10.1080/01621459.2022.2081575

Panigrahi, Snigdha; Taylor, Jonathan (January 2022, Journal of the American Statistical Association)
Regina Liu (Ed.)
Several strategies have been developed recently to ensure valid inference after model selection; some of these are easy to compute, while others fare better in terms of inferential power. In this article, we consider a selective inference framework for Gaussian data. We propose a new method for inference through approximate maximum likelihood estimation. Our goals are (a) to achieve better inferential power with the aid of randomization and (b) to bypass expensive MCMC sampling from exact conditional distributions that are hard to evaluate in closed form. We construct approximate inference, for example p-values and confidence intervals, by solving a fairly simple convex optimization problem. We illustrate the potential of our method across a wide range of signal-to-noise ratios in simulations. On a cancer gene expression dataset, we find that our method improves upon the inferential power of some commonly used strategies for selective inference.
Smoothed quantile regression with large-scale inference

https://doi.org/10.1016/j.jeconom.2021.07.010

He, Xuming; Pan, Xiaoou; Tan, Kean Ming; Zhou, Wen-Xin (August 2021, Journal of Econometrics)

A tail-based test to detect differential expression in RNA-sequencing data

https://doi.org/10.1177/0962280220951907

Chen, Jiong; Mi, Xinlei; Ning, Jing; He, Xuming; Hu, Jianhua (January 2021, Statistical Methods in Medical Research)
RNA sequencing data have been abundantly generated in biomedical research for biomarker discovery and other studies. Such data at the exon level are usually heavy-tailed and correlated. Conventional statistical tests based on the mean or median difference for differential expression likely suffer from low power when the between-group difference occurs mostly in the upper or lower tail of the distribution of gene expression. We propose a tail-based test that compares groups over a specific area of the distribution rather than at a single location. The proposed test, which is derived from quantile regression, adjusts for covariates and accounts for within-sample dependence among the exons through a specified correlation structure. Through Monte Carlo simulation studies, we show that the proposed test is generally more powerful and robust in detecting differential expression than commonly used tests based on the mean or a single quantile. An application to TCGA lung adenocarcinoma data demonstrates the promise of the proposed method in terms of biomarker discovery.
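A simplified two-sample version of a tail-based comparison is sketched below. It drops the covariate adjustment and the within-sample exon correlation that the proposed test handles, treats all observations as independent, and uses a permutation p-value, which is our choice for the sketch and not necessarily the paper's. It relies on the fact that, in an intercept-only model, the inverse-ECDF quantile is a minimizer of the quantile-regression check loss ρ_τ(u) = u(τ − 1{u < 0}). The statistic averages the group differences in these quantiles over a grid of upper-tail levels. All names, the τ grid and the data are illustrative assumptions.

```python
import math
import random


def empirical_quantile(xs, tau):
    """Inverse-ECDF quantile: the smallest x with F_n(x) >= tau (0 < tau <= 1)."""
    s = sorted(xs)
    k = max(1, math.ceil(len(s) * tau))  # 1-based order statistic
    return s[k - 1]


def tail_statistic(group_a, group_b, taus):
    """Average quantile difference (B minus A) over a grid of tail levels."""
    diffs = [empirical_quantile(group_b, t) - empirical_quantile(group_a, t) for t in taus]
    return sum(diffs) / len(diffs)


def permutation_p_value(group_a, group_b, taus, n_perm=2000, seed=0):
    """One-sided permutation p-value for 'B has a heavier upper tail than A'."""
    rng = random.Random(seed)
    observed = tail_statistic(group_a, group_b, taus)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # relabel observations at random
        if tail_statistic(pooled[:n_a], pooled[n_a:], taus) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: B matches A except that its four largest values are
# shifted up by 10, so the groups differ only in the upper tail.
group_a = [float(v) for v in range(1, 21)]
group_b = group_a[:16] + [27.0, 28.0, 29.0, 30.0]
taus = (0.86, 0.91, 0.96)
print("median difference:", empirical_quantile(group_b, 0.5) - empirical_quantile(group_a, 0.5))
print("tail statistic:", tail_statistic(group_a, group_b, taus))
print("permutation p:", permutation_p_value(group_a, group_b, taus))
```

In this toy data the medians coincide and the means differ by only 2, while the tail statistic equals 10. That is the kind of difference a test based on a single location can miss.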