Title: Constructing confidence intervals for selected parameters
Abstract

In large-scale problems, it is common practice to select important parameters by a procedure such as the Benjamini-Hochberg procedure and to construct confidence intervals (CIs) for further investigation while controlling the false coverage-statement rate (FCR) for the CIs at a desired level. Although the well-known Benjamini-Yekutieli (BY) CIs control the FCR, they are uniformly inflated. In this paper, we propose two methods to construct shorter selective CIs. The first method produces shorter CIs by allowing a reduced number of selective CIs. The second method produces shorter CIs by allowing a prefixed proportion of CIs to contain the values of uninteresting parameters. We prove that the proposed CIs are uniformly shorter than BY CIs and control the FCR asymptotically for independent data. Numerical results confirm our theoretical results and show that the proposed CIs still work for correlated data. We illustrate the advantage of the proposed procedures by analyzing microarray data from an HIV study.
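
The baseline that these methods shorten is the BY construction: select parameters with the Benjamini-Hochberg (BH) procedure at level q, then report each selected CI at the adjusted marginal level 1 - Rq/m, where R is the number of selections among m parameters. Below is a minimal Python sketch of that baseline for independent normal estimates with known standard errors; the function name and setup are illustrative assumptions, and the paper's shorter procedures are not reproduced here.

```python
import numpy as np
from scipy import stats

def by_fcr_intervals(est, se, q=0.05):
    """Select parameters via BH on two-sided z-tests, then build
    BY FCR-adjusted confidence intervals for the selected ones.

    A sketch for independent normal estimates `est` with known
    standard errors `se`; not the paper's shorter procedures.
    """
    m = len(est)
    pvals = 2 * stats.norm.sf(np.abs(est / se))       # two-sided p-values
    order = np.argsort(pvals)
    thresh = q * np.arange(1, m + 1) / m              # BH step-up thresholds
    below = pvals[order] <= thresh
    R = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    if R == 0:
        return np.array([], dtype=int), np.empty((0, 2))
    selected = order[:R]
    # BY CIs: marginal level 1 - R*q/m for each selected parameter,
    # which controls the FCR at level q (Benjamini & Yekutieli, 2005).
    z = stats.norm.ppf(1 - R * q / (2 * m))
    ci = np.column_stack([est[selected] - z * se[selected],
                          est[selected] + z * se[selected]])
    return selected, ci
```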

 
NSF-PAR ID:
10455755
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
76
Issue:
4
ISSN:
0006-341X
Format(s):
Medium: X
Size(s):
p. 1098-1108
Sponsoring Org:
National Science Foundation
More Like this
  1.
    The reproducibility debate has caused a renewed interest in changing how one reports uncertainty, from a p-value for testing a null hypothesis to a confidence interval (CI) for the corresponding parameter. When CIs for multiple selected parameters are reported, the analog of the false discovery rate (FDR) is the false coverage rate (FCR): the expected ratio of the number of reported CIs failing to cover their respective parameters to the total number of reported CIs. Here, we consider the general problem of FCR control in the online setting, where one encounters an infinite sequence of fixed unknown parameters ordered by time. We propose a novel solution to the problem which only requires the scientist to be able to construct marginal CIs. As special cases, our framework yields algorithms for online FDR control and online sign-classification procedures that control the false sign rate (FSR). All of our methodology applies equally well to prediction intervals, with particular implications for selective conformal inference.
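
    For intuition, the simplest way to control online FCR with only marginal CIs is non-adaptive alpha-spending: give the t-th interval a budget q·gamma_t, with the gamma_t summing to at most one, so the expected number of miscovering reported CIs, and hence the FCR, is at most q. The sketch below shows only that deliberately conservative baseline; it is an assumption-laden illustration, not the adaptive algorithm proposed in the paper.

```python
import numpy as np
from scipy import stats

def online_marginal_cis(stream, q=0.05):
    """Naive online alpha-spending baseline: the t-th confidence
    interval is built at marginal level 1 - q*gamma_t, where the
    gamma_t sum to one, so E[#miscovering CIs] <= q and the FCR is
    controlled.  A conservative illustration only, not the paper's
    adaptive method.  `stream` yields (estimate, std_error) pairs.
    """
    for t, (est, se) in enumerate(stream, start=1):
        gamma_t = 6 / (np.pi ** 2 * t ** 2)        # sum over t equals 1
        z = stats.norm.ppf(1 - q * gamma_t / 2)    # widening as t grows
        yield (est - z * se, est + z * se)
```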
  2. Summary

    Benjamini and Yekutieli suggested that it is important to account for multiplicity when only some of the selected confidence intervals are reported. They introduced the concept of the false coverage rate (FCR) for confidence intervals, which parallels the false discovery rate in the multiple-hypothesis testing problem, and they developed confidence intervals for selected parameters which control the FCR. Their approach requires the FCR to be controlled in the frequentist sense, i.e. for all possible values of the unknown parameters. In modern applications, the number of parameters can be large, tens of thousands or even more, as in microarray experiments. We propose a less conservative criterion, the Bayes FCR, and study confidence intervals controlling it for a class of distributions. The Bayes FCR refers to the average FCR with respect to a distribution of parameters. Under this criterion, we propose confidence intervals which, by analytic and numerical calculations, are demonstrated to have the Bayes FCR controlled at level q for a class of prior distributions, including mixtures of normal distributions and zero, where the mixing probability is unknown. The confidence intervals are shrinkage-type procedures which are more efficient for θi's with a sparsity structure, a common feature of microarray data. More importantly, the centre of the proposed shrinkage intervals removes much of the bias due to selection. Consequently, the proposed empirical Bayes intervals are always shorter in average length than the intervals of Benjamini and Yekutieli and can be only 50% or 60% as long in some cases. We apply these procedures to the data of Choe and colleagues and obtain similar results.
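
    The shrinkage centre can be illustrated with a two-group prior of the kind described above. If θ ~ π0·δ0 + (1 − π0)·N(0, τ²) and x | θ ~ N(θ, 1), the posterior mean w(x)·b·x, with b = τ²/(1 + τ²) and w(x) the posterior non-null probability, pulls selected estimates back toward zero and thereby offsets selection bias. A minimal sketch, with π0 and τ² treated as known even though an empirical Bayes analysis would estimate them from the data:

```python
import numpy as np
from scipy import stats

def shrinkage_center(x, pi0=0.9, tau2=4.0):
    """Posterior mean of theta under the two-group prior
    theta ~ pi0 * delta_0 + (1 - pi0) * N(0, tau2), x | theta ~ N(theta, 1).

    Illustrates the shrinkage centre that reduces selection bias;
    pi0 and tau2 are assumed known here but would be estimated in an
    empirical Bayes analysis.
    """
    f_null = stats.norm.pdf(x, scale=1.0)                 # marginal under theta = 0
    f_alt = stats.norm.pdf(x, scale=np.sqrt(1.0 + tau2))  # marginal under the normal component
    w = (1 - pi0) * f_alt / ((1 - pi0) * f_alt + pi0 * f_null)  # P(theta != 0 | x)
    b = tau2 / (1.0 + tau2)                               # shrinkage factor
    return w * b * x
```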

     
  3. Abstract

    With advances in biomedical research, biomarkers are becoming increasingly important prognostic factors for predicting overall survival, yet biomarker measurements are often censored by instruments' lower limits of detection. This leads to two types of censoring: random censoring in overall survival outcomes and fixed censoring in biomarker covariates, posing new challenges in statistical modeling and inference. Existing methods for analyzing such data focus primarily on linear regression ignoring censored responses, or on semiparametric accelerated failure time models with covariates under detection limits (DLs). In this paper, we propose a quantile regression method for survival data with covariates subject to DLs. Compared with existing methods, the proposed approach provides a more versatile tool for modeling the distribution of survival outcomes: it allows covariate effects to vary across conditional quantiles of the survival time and requires no parametric distributional assumptions on the outcome data. To estimate the quantile process of regression coefficients, we develop a novel multiple imputation approach based on another quantile regression for the covariates under DLs, avoiding the stringent parametric restrictions on censored covariates often assumed in the literature. Under regularity conditions, we show that the estimation procedure yields uniformly consistent and asymptotically normal estimators. Simulation results demonstrate satisfactory finite-sample performance of the method. We also apply our method to motivating data from a study of genetic and inflammatory markers of sepsis.
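
    To make the imputation idea concrete, the sketch below fills in biomarker values below a detection limit by fitting quantile regressions of the biomarker on fully observed covariates over a grid of quantile levels, then drawing fitted quantiles that fall below the limit. It is a simplified illustration under strong assumptions (fitting on uncensored cases only, a single covariate), not the authors' full estimation procedure.

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def impute_below_dl(w, z, dl, n_imp=10,
                    taus=np.linspace(0.05, 0.95, 19), seed=None):
    """Multiple-imputation sketch for a biomarker w subject to a lower
    detection limit `dl`: fit quantile regressions of w on a fully
    observed covariate z at a grid of levels, then replace each
    censored value with a randomly drawn fitted quantile that is
    consistent with being below the DL.  A simplified illustration,
    not the full procedure in the paper.
    """
    rng = np.random.default_rng(seed)
    obs = w > dl                                  # uncensored cases
    X = np.column_stack([np.ones_like(z), z])     # intercept + covariate
    fits = [QuantReg(w[obs], X[obs]).fit(q=t) for t in taus]
    imputations = []
    for _ in range(n_imp):
        w_imp = w.copy()
        for i in np.where(~obs)[0]:
            preds = np.array([f.predict(X[i:i + 1])[0] for f in fits])
            below = preds[preds <= dl]            # candidates below the DL
            w_imp[i] = rng.choice(below) if below.size else dl
        imputations.append(w_imp)
    return imputations
```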

     
  4. Abstract

    Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision-making problems. The goodness of a policy is measured by its value function starting from some initial state. The focus of this paper is on constructing confidence intervals (CIs) for a policy's value in infinite-horizon settings where the number of decision points diverges to infinity. We propose to model the state-action value function (Q-function) associated with a policy using a series/sieve method and derive its confidence interval. When the target policy depends on the observed data as well, we propose a SequentiAl Value Evaluation (SAVE) method to recursively update the estimated policy and its value estimator. As long as either the number of trajectories or the number of decision points diverges to infinity, we show that the proposed CI achieves nominal coverage even in cases where the optimal policy is not unique. Simulation studies are conducted to back up our theoretical findings. We apply the proposed method to a dataset from mobile health studies and find that reinforcement learning algorithms could help improve patients' health status. A Python implementation of the proposed procedure is available at https://github.com/shengzhang37/SAVE.
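
    The series/sieve idea can be sketched with a linear basis: estimate the Q-function coefficients from Bellman-equation moment conditions (an LSTD-style linear system), then form a plug-in normal CI for the value at the initial state via a sandwich variance. The function below is a minimal illustration with an assumed discount factor and fixed target policy; it does not implement the sequential updating of SAVE.

```python
import numpy as np
from scipy import stats

def sieve_value_ci(phi, phi_next, r, phi0, gamma=0.95, alpha=0.05):
    """Plug-in CI for a policy's value using a linear sieve for the
    Q-function.  `phi` holds basis features of observed (state, action)
    pairs, `phi_next` the features at the next state under the target
    policy's action, `r` the rewards, and `phi0` the features at the
    initial state/action; `gamma` is an assumed discount factor.
    A minimal LSTD-style sketch, not the SAVE procedure itself.
    """
    n = len(r)
    A = phi.T @ (phi - gamma * phi_next) / n      # Bellman moment matrix
    b = phi.T @ r / n
    beta = np.linalg.solve(A, b)                  # sieve Q-function coefficients
    eps = r + gamma * phi_next @ beta - phi @ beta        # Bellman residuals
    score = phi * eps[:, None]
    omega = score.T @ score / n                   # meat of the sandwich
    cov = np.linalg.solve(A, np.linalg.solve(A, omega).T) / n  # A^-1 Omega A^-T / n
    v = phi0 @ beta                               # estimated initial-state value
    se = np.sqrt(phi0 @ cov @ phi0)
    z = stats.norm.ppf(1 - alpha / 2)
    return v - z * se, v + z * se
```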

     
  5. Abstract

    Many problems that arise in biomedical decision-making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The support vector machine (SVM) is a popular classification technique that is robust to model misspecification and effectively handles high-dimensional data. The relative costs of false positives and false negatives can vary across application domains. The receiver operating characteristic (ROC) curve provides a visual representation of the trade-off between these two types of errors. Because the SVM does not produce a predicted probability, an ROC curve cannot be constructed in the traditional way by thresholding a predicted probability. However, a sequence of weighted SVMs can be used to construct an ROC curve. Although ROC curves constructed using weighted SVMs have great potential for enabling ROC analyses that cannot be done by thresholding predicted probabilities, their theoretical properties have heretofore been underdeveloped. We propose a method for constructing confidence bands for the SVM ROC curve and provide theoretical justification by showing that the risk function of the estimated decision rule is uniformly consistent across the weight parameter. We demonstrate the proposed confidence band method using simulation studies and present a predictive model for treatment response in breast cancer as an illustrative example.
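
    Concretely, one (FPR, TPR) point is obtained per class-weight setting: each weighted SVM reprices the two error types, and its hard classifications on held-out data give one operating point on the curve. A minimal sketch of the curve construction follows; the paper's contribution, the confidence band around this curve, is not shown.

```python
import numpy as np
from sklearn.svm import SVC

def weighted_svm_roc(X_train, y_train, X_test, y_test,
                     weights=np.linspace(0.05, 0.95, 19)):
    """Trace an ROC curve from a sequence of weighted SVMs: each
    class-weight setting reprices false negatives relative to false
    positives, and the fitted SVM's hard classifications give one
    (FPR, TPR) point.  A sketch of the curve construction only.
    """
    points = []
    for w in weights:
        clf = SVC(kernel="linear", class_weight={0: 1 - w, 1: w})
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        tpr = np.mean(pred[y_test == 1] == 1)   # sensitivity at this weight
        fpr = np.mean(pred[y_test == 0] == 1)   # 1 - specificity at this weight
        points.append((fpr, tpr))
    return sorted(points)                       # ordered along the FPR axis
```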

     