Title: Empirical Bayes False Coverage Rate Controlling Confidence Intervals
Summary

Benjamini and Yekutieli argued that multiplicity must be accounted for in confidence intervals when only some of the selected intervals are reported. They introduced the concept of the false coverage rate (FCR) for confidence intervals, which parallels the false discovery rate in the multiple-hypothesis testing problem, and they developed confidence intervals for selected parameters which control the FCR. Their approach requires the FCR to be controlled in the frequentist sense, i.e. for all possible values of the unknown parameters. In modern applications the number of parameters can be large, reaching tens of thousands or more, as in microarray experiments. We propose a less conservative criterion, the Bayes FCR, which refers to the average FCR with respect to a distribution of the parameters, and study confidence intervals controlling it for a class of distributions. Under this criterion, we propose confidence intervals which, by analytic and numerical calculations, are demonstrated to control the Bayes FCR at level q for a class of prior distributions, including mixtures of normal distributions and a point mass at zero, where the mixing probability is unknown. The confidence intervals are shrinkage-type procedures, which are more efficient when the θi's have a sparsity structure, a common feature of microarray data. More importantly, the centre of the proposed shrinkage intervals removes much of the bias due to selection. Consequently, the proposed empirical Bayes intervals are always shorter in average length than the intervals of Benjamini and Yekutieli, and can be only 50% or 60% as long in some cases. We apply these procedures to the data of Choe and colleagues and obtain similar results.
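For concreteness, here is a minimal Python sketch of the Benjamini and Yekutieli baseline that the proposed intervals are compared with: select by the Benjamini-Hochberg rule, then report marginal intervals at the adjusted level 1 - Rq/m. The normal model, the sparsity settings and the helper names are illustrative assumptions; this is the baseline, not the empirical Bayes procedure of the paper.

    import numpy as np
    from scipy import stats

    def by_fcr_intervals(x, q=0.05, sigma=1.0):
        """BH selection at level q, then marginal CIs at level 1 - R*q/m."""
        m = len(x)
        p = 2 * stats.norm.sf(np.abs(x) / sigma)         # two-sided p-values
        order = np.argsort(p)
        below = p[order] <= q * np.arange(1, m + 1) / m  # BH step-up comparison
        r = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
        selected = order[:r]
        if r == 0:
            return selected, np.empty((0, 2))
        z = stats.norm.ppf(1 - r * q / (2 * m))          # widened critical value
        return selected, np.column_stack([x[selected] - z * sigma,
                                          x[selected] + z * sigma])

    rng = np.random.default_rng(0)
    theta = np.where(rng.random(10_000) < 0.05,          # sparse means
                     rng.normal(3.0, 1.0, 10_000), 0.0)
    x = theta + rng.standard_normal(10_000)
    sel, ci = by_fcr_intervals(x)
    miss = (theta[sel] < ci[:, 0]) | (theta[sel] > ci[:, 1])
    print(len(sel), miss.mean())  # non-coverage proportion among the selected

The average length of these intervals is what the shrinkage intervals shorten, by recentring away from the selection-biased estimates.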

 
NSF-PAR ID: 10401216
Author(s) / Creator(s):
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume: 74
Issue: 5
ISSN: 1369-7412
Pages: 871-891
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    In large-scale problems, it is common practice to select important parameters by a procedure such as the Benjamini and Hochberg procedure and to construct confidence intervals (CIs) for further investigation, while the false coverage-statement rate (FCR) of the CIs is controlled at a desired level. Although the well-known Benjamini-Yekutieli (BY) CIs control the FCR, they are uniformly inflated. In this paper, we propose two methods to construct shorter selective CIs. The first produces shorter CIs by allowing a reduced number of selective CIs; the second, by allowing a prefixed proportion of CIs to contain the values of uninteresting parameters. We prove that the proposed CIs are uniformly shorter than the BY CIs and control the FCR asymptotically for independent data. Numerical results confirm the theory and show that the proposed CIs still work for correlated data. We illustrate the advantage of the proposed procedures by analyzing microarray data from an HIV study.
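    The inflation of unadjusted intervals under selection, which motivates both the BY CIs and the shorter alternatives, is easy to see by simulation. The harness below (model, rule and settings are illustrative assumptions, not this paper's procedures) estimates the FCR of any selective-CI rule by Monte Carlo.

        import numpy as np
        from scipy import stats

        def naive_rule(x, sigma=1.0):
            """Report an unadjusted 95% CI for every coordinate with |x_i| > 2."""
            sel = np.nonzero(np.abs(x) > 2.0)[0]
            z = stats.norm.ppf(0.975)
            return sel, np.column_stack([x[sel] - z * sigma, x[sel] + z * sigma])

        def estimate_fcr(ci_rule, theta, sigma=1.0, reps=2000, seed=1):
            """Average of (#non-covering reported CIs) / max(#reported, 1)."""
            rng = np.random.default_rng(seed)
            fcp = np.zeros(reps)
            for b in range(reps):
                x = theta + sigma * rng.standard_normal(theta.size)
                sel, ci = ci_rule(x)
                if len(sel):
                    miss = (theta[sel] < ci[:, 0]) | (theta[sel] > ci[:, 1])
                    fcp[b] = miss.mean()
            return fcp.mean()

        theta = np.zeros(1000)
        theta[:50] = 3.0                        # sparse signal, mostly true nulls
        print(estimate_fcr(naive_rule, theta))  # far above a nominal q = 0.05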

     
  2. The reproducibility debate has caused a renewed interest in changing how one reports uncertainty, from a p-value for testing a null hypothesis to a confidence interval (CI) for the corresponding parameter. When CIs for multiple selected parameters are reported, the analogue of the false discovery rate (FDR) is the false coverage rate (FCR): the expected ratio of the number of reported CIs failing to cover their respective parameters to the total number of reported CIs. Here, we consider the general problem of FCR control in the online setting, where one encounters an infinite sequence of fixed unknown parameters ordered by time. We propose a novel solution to the problem which only requires the scientist to be able to construct marginal CIs. As special cases, our framework yields algorithms for online FDR control and online sign-classification procedures that control the false sign rate (FSR). All of our methodology applies equally well to prediction intervals, with particular implications for selective conformal inference.
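    As a point of contrast with the method above, a crude alpha-spending scheme already gives very conservative online FCR control when every CI is reported: assign the t-th interval a miscoverage budget q_t with the sum over t of q_t equal to q, so the expected number of non-covering intervals, and hence the FCR, is at most q. The sketch below (normal model and spending sequence are assumptions) shows only this baseline, not the algorithm of the paper, which avoids the ever-widening intervals this baseline produces.

        import math
        from scipy import stats

        def online_marginal_cis(stream, q=0.1, sigma=1.0):
            """Yield one marginal CI per observation; levels spend a total budget q."""
            for t, x in enumerate(stream, start=1):
                q_t = q * 6.0 / (math.pi ** 2 * t ** 2)  # sums to q over t
                z = stats.norm.ppf(1 - q_t / 2)          # interval widens with t
                yield x - z * sigma, x + z * sigma

        for lo, hi in online_marginal_cis([2.1, -0.4, 3.7]):
            print(f"[{lo:.2f}, {hi:.2f}]")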
  3. Abstract

    In this paper, we propose a new framework to construct confidence sets for a $d$-dimensional unknown sparse parameter ${\boldsymbol \theta }$ under the normal mean model ${\boldsymbol X}\sim N({\boldsymbol \theta },\sigma ^{2}\bf{I})$. A key feature of the proposed confidence set is its capability to account for the sparsity of ${\boldsymbol \theta }$, hence the name sparse confidence set. This is in sharp contrast with classical methods, such as Bonferroni confidence intervals and other resampling-based procedures, where the sparsity of ${\boldsymbol \theta }$ is often ignored. Specifically, we require the desired sparse confidence set to satisfy the following two conditions: (i) uniformly over the parameter space, the coverage probability for ${\boldsymbol \theta }$ is above a pre-specified level; (ii) there exists a random subset $S$ of $\{1,\ldots ,d\}$ such that $S$ guarantees the pre-specified true negative rate for detecting non-zero $\theta _{j}$’s. To exploit the sparsity of ${\boldsymbol \theta }$, we allow the confidence interval for $\theta _{j}$ to degenerate to the single point 0 for any $j\notin S$. Under this new framework, we first consider whether there exist sparse confidence sets satisfying these two conditions. To address this question, we establish a non-asymptotic minimax lower bound for the non-coverage probability over a suitable class of sparse confidence sets. The lower bound deciphers the role of sparsity and of the minimum signal-to-noise ratio (SNR) in the construction of sparse confidence sets. Furthermore, under suitable conditions on the SNR, a two-stage procedure is proposed to construct a sparse confidence set. To evaluate its optimality, the proposed sparse confidence set is shown to attain a minimax lower bound of a properly defined risk function up to a constant factor. Finally, we develop a procedure that is adaptive to the unknown sparsity. Numerical studies are conducted to verify the theoretical results.
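    The shape of such a set is easy to make concrete in code. The sketch below is a deliberately naive instance of the framework, with a universal threshold and a Bonferroni level standing in for the paper's two-stage minimax construction (all numeric choices are illustrative assumptions): intervals degenerate to the single point 0 off the selected set $S$.

        import numpy as np
        from scipy import stats

        def sparse_confidence_set(x, sigma=1.0, alpha=0.05):
            """Degenerate {0} intervals off S, widened intervals on S."""
            d = len(x)
            lam = sigma * np.sqrt(2 * np.log(d))  # universal threshold
            S = np.nonzero(np.abs(x) > lam)[0]    # detected coordinates
            z = stats.norm.ppf(1 - alpha / (2 * max(len(S), 1)))
            ci = np.zeros((d, 2))                 # row [0, 0] is the point {0}
            ci[S, 0] = x[S] - z * sigma
            ci[S, 1] = x[S] + z * sigma
            return S, ci

        rng = np.random.default_rng(2)
        theta = np.zeros(500)
        theta[:5] = 6.0                           # five strong signals
        S, ci = sparse_confidence_set(theta + rng.standard_normal(500))
        print(S, ci[S].round(2))                  # a handful of live intervals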

     
  4. Differential privacy provides a rigorous framework for privacy-preserving data analysis. This paper proposes the first differentially private procedure for controlling the false discovery rate (FDR) in multiple hypothesis testing. Inspired by the Benjamini-Hochberg procedure (BHq), our approach is to first repeatedly add noise to the logarithms of the p-values to ensure differential privacy and to select an approximately smallest p-value serving as a promising candidate at each iteration; the selected p-values are further supplied to the BHq and our private procedure releases only the rejected ones. Moreover, we develop a new technique that is based on a backward submartingale for proving FDR control of a broad class of multiple testing procedures, including our private procedure, and both the BHq step-up and step-down procedures. As a novel aspect, the proof works for arbitrary dependence between the true null and false null test statistics, while FDR control is maintained up to a small multiplicative factor.
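    For reference, the non-private BHq step-up rule that the selected p-values are ultimately supplied to can be sketched as follows; the comment marks where, per the description above, the noisy selection of approximately smallest p-values would precede it. The data are illustrative.

        import numpy as np

        def bh_stepup(pvals, q=0.05):
            """Reject the k smallest p-values, k the largest index passing the rule."""
            # In the private procedure, noise on the log p-values and the
            # approximately-smallest selection step come before this call.
            m = len(pvals)
            order = np.argsort(pvals)
            below = pvals[order] <= q * np.arange(1, m + 1) / m
            k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
            return order[:k]

        print(bh_stepup(np.array([0.001, 0.009, 0.04, 0.1, 0.3, 0.7])))  # [0 1]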
  5. Abstract

    Background

    No versatile web app exists that allows epidemiologists and managers around the world to comprehensively analyze the impacts of COVID-19 mitigation. The web app presented here, http://covid-webapp.numerusinc.com/, fills this gap.

    Methods

    Our web app uses a model that explicitly identifies susceptible, contact, latent, asymptomatic, symptomatic and recovered classes of individuals, and a parallel set of response classes, subject to lower pathogen-contact rates. The user inputs a CSV file of incidence and, if of interest, mortality rate data. A default set of parameters is available that can be overwritten through input or online entry, and a user-selected subset of these can be fitted to the model using maximum-likelihood estimation (MLE). Model fitting and forecasting intervals are specifiable and changes to parameters allow counterfactual and forecasting scenarios. Confidence or credible intervals can be generated using stochastic simulations, based on MLE values, or on an inputted CSV file containing Markov chain Monte Carlo (MCMC) estimates of one or more parameters.
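    As a rough illustration of the class-structured ODE machinery underlying such a model, the reduced sketch below simulates only four classes with placeholder rates; the app's full model adds the asymptomatic/symptomatic split and the parallel response classes, and fits parameters by MLE. All numbers here are assumptions, not the app's defaults.

        import numpy as np
        from scipy.integrate import solve_ivp

        def seir(t, y, beta, kappa, gamma):
            s, e, i, r = y                 # susceptible, latent, infectious, recovered
            n = s + e + i + r
            return [-beta * s * i / n,     # susceptible -> latent via contact
                    beta * s * i / n - kappa * e,
                    kappa * e - gamma * i, # latent -> infectious -> recovered
                    gamma * i]

        sol = solve_ivp(seir, (0, 180), [1e6, 10, 1, 0],
                        args=(0.3, 0.2, 0.1), dense_output=True)
        print(np.round(sol.sol(np.linspace(0, 180, 7))[2]))  # infectious over time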

    Results

    We illustrate the use of our web app in extracting social distancing, social relaxation, surveillance or virulence switching functions (i.e., time-varying drivers) from the incidence and mortality rates of COVID-19 epidemics in Israel, South Africa, and England. The Israeli outbreak exhibits four distinct phases: initial outbreak, social distancing, social relaxation, and a second-wave mitigation phase. An MCMC projection of this latter phase suggests the Israeli epidemic will continue to produce into late November an average of around 1500 new cases per day, unless the population practices social-relaxation measures at least 5-fold below the level in August, which itself is 4-fold below the level at the start of July. Our analysis of the relatively late South African outbreak, which became the world’s fifth largest COVID-19 epidemic in July, revealed that the decline through late July and early August was characterised by a social distancing driver operating at more than twice the per-capita applicable-disease-class (pc-adc) rate of the social relaxation driver. Our analysis of the relatively early English outbreak identified a more than 2-fold improvement in surveillance over the course of the epidemic. It also identified a pc-adc social distancing rate in early August that, though nearly four times the pc-adc social relaxation rate, appeared to barely contain a second wave that would break out if social distancing were further relaxed.

    Conclusion

    Our web app provides policy makers and health officers who have no epidemiological modelling or computer coding expertise with an invaluable tool for assessing the impacts of different outbreak mitigation policies and measures. This includes an ability to generate an epidemic-suppression or curve-flattening index that measures the intensity with which behavioural responses suppress or flatten the epidemic curve in the region under consideration.

     