Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available January 2, 2026
-
We introduce a pivot for exact selective inference with randomization. Not only does our pivot lead to exact inference in Gaussian regression models, but it is also available in closed form. We reduce this problem to inference for a bivariate truncated Gaussian variable. By doing so, we give up some power that is achieved with approximate maximum likelihood estimation in Panigrahi & Taylor (2023). Yet our pivot always produces narrower confidence intervals than a closely related data-splitting procedure. We investigate the trade-off between power and exact selective inference on simulated datasets and an HIV drug resistance dataset.more » « less
-
Complex studies involve many steps. Selecting promising findings based on pilot data is a first step. As more observations are collected, the investigator must decide how to combine the new data with the pilot data to construct valid selective inference. Carving, introduced by Fithian, Sun and Taylor (2014), enables the reuse of pilot data during selective inference and accounts for overoptimism from the selection process. However, currently, carving is only justified for parametric models such as the commonly used Gaussian model. In this paper, we develop the asymptotic theory to substantiate the use of carving beyond Gaussian models. Our results indicate that carving produces valid and tight confidence intervals within a model-free setting, as demonstrated through simulated and real instances.more » « less
-
After selection with the Group LASSO (or generalized variants such as the overlapping, sparse, or standardized Group LASSO), inference for the selected parameters is unreliable in the absence of adjustments for selection bias. In the penalized Gaussian regression setup, existing approaches provide adjustments for selection events that can be expressed as linear inequalities in the data variables. Such a representation, however, fails to hold for selection with the Group LASSO and substantially obstructs the scope of subsequent post-selective inference. Key questions of inferential interest—for example, inference for the effects of selected variables on the outcome—remain unanswered. In the present paper, we develop a consistent, post-selective, Bayesian method to address the existing gaps by deriving a likelihood adjustment factor and an approximation thereof that eliminates bias from the selection of groups. Experiments on simulated data and data from the Human Connectome Project demonstrate that our method recovers the effects of parameters within the selected groups while paying only a small price for bias adjustment.more » « less
-
Regina Liu (Ed.)Several strategies have been developed recently to ensure valid inference after model selection; some of these are easy to compute, while others fare better in terms of inferential power. In this article, we consider a selective inference framework for Gaussian data. We propose a new method for inference through approximate maximum likelihood estimation. Our goal is to: (a) achieve better inferential power with the aid of randomization, (b) bypass expensive MCMC sampling from exact conditional distributions that are hard to evaluate in closed forms. We construct approximate inference, for example, p-values, confidence intervals etc., by solving a fairly simple, convex optimization problem. We illustrate the potential of our method across wide-ranging values of signal-to-noise ratio in simulations. On a cancer gene expression dataset we find that our method improves upon the inferential power of some commonly used strategies for selective inference.more » « less
-
Summary The goal of expression quantitative trait loci (eQTL) studies is to identify the genetic variants that influence the expression levels of the genes in an organism. High throughput technology has made such studies possible: in a given tissue sample, it enables us to quantify the expression levels of approximately 20 000 genes and to record the alleles present at millions of genetic polymorphisms. While obtaining this data is relatively cheap once a specimen is at hand, obtaining human tissue remains a costly endeavor: eQTL studies continue to be based on relatively small sample sizes, with this limitation particularly serious for tissues as brain, liver, etc.—often the organs of most immediate medical relevance. Given the high-dimensional nature of these datasets and the large number of hypotheses tested, the scientific community has adopted early on multiplicity adjustment procedures. These testing procedures primarily control the false discoveries rate for the identification of genetic variants with influence on the expression levels. In contrast, a problem that has not received much attention to date is that of providing estimates of the effect sizes associated with these variants, in a way that accounts for the considerable amount of selection. Yet, given the difficulty of procuring additional samples, this challenge is of practical importance. We illustrate in this work how the recently developed conditional inference approach can be deployed to obtain confidence intervals for the eQTL effect sizes with reliable coverage. The procedure we propose is based on a randomized hierarchical strategy with a 2-fold contribution: (1) it reflects the selection steps typically adopted in state of the art investigations and (2) it introduces the use of randomness instead of data-splitting to maximize the use of available data. Analysis of the GTEx Liver dataset (v6) suggests that naively obtained confidence intervals would likely not cover the true values of effect sizes and that the number of local genetic polymorphisms influencing the expression level of genes might be underestimated.more » « less
An official website of the United States government

Full Text Available