skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, May 17 until 8:00 AM ET on Saturday, May 18 due to maintenance. We apologize for the inconvenience.


Title: Robust Empirical Bayes Confidence Intervals
We construct robust empirical Bayes confidence intervals (EBCIs) in a normal means problem. The intervals are centered at the usual linear empirical Bayes estimator, but use a critical value accounting for shrinkage. Parametric EBCIs that assume a normal distribution for the means (Morris (1983b)) may substantially undercover when this assumption is violated. In contrast, our EBCIs control coverage regardless of the means distribution, while remaining close in length to the parametric EBCIs when the means are indeed Gaussian. If the means are treated as fixed, our EBCIs have an average coverage guarantee: the coverage probability is at least 1 −  α on average across the n EBCIs for each of the means. Our empirical application considers the effects of U.S. neighborhoods on intergenerational mobility.  more » « less
Award ID(s):
2049356 1851665
NSF-PAR ID:
10389999
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Econometrica
Volume:
90
Issue:
6
ISSN:
0012-9682
Page Range / eLocation ID:
2567 to 2602
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Benjamini and Yekutieli suggested that it is important to account for multiplicity correction for confidence intervals when only some of the selected intervals are reported. They introduced the concept of the false coverage rate (FCR) for confidence intervals which is parallel to the concept of the false discovery rate in the multiple-hypothesis testing problem and they developed confidence intervals for selected parameters which control the FCR. Their approach requires the FCR to be controlled in the frequentist’s sense, i.e. controlled for all the possible unknown parameters. In modern applications, the number of parameters could be large, as large as tens of thousands or even more, as in microarray experiments. We propose a less conservative criterion, the Bayes FCR, and study confidence intervals controlling it for a class of distributions. The Bayes FCR refers to the average FCR with respect to a distribution of parameters. Under such a criterion, we propose some confidence intervals, which, by some analytic and numerical calculations, are demonstrated to have the Bayes FCR controlled at level q for a class of prior distributions, including mixtures of normal distributions and zero, where the mixing probability is unknown. The confidence intervals are shrinkage-type procedures which are more efficient for the θis that have a sparsity structure, which is a common feature of microarray data. More importantly, the centre of the proposed shrinkage intervals reduces much of the bias due to selection. Consequently, the proposed empirical Bayes intervals are always shorter in average length than the intervals of Benjamini and Yekutieli and can be only 50% or 60% as long in some cases. We apply these procedures to the data of Choe and colleagues and obtain similar results.

     
    more » « less
  2. Summary

    We construct empirical Bayes intervals for a large number p of means. The existing intervals in the literature assume that variances σi2 are either equal or unequal but known. When the variances are unequal and unknown, the suggestion is typically to replace them by unbiased estimators Si2. However, when p is large, there would be advantage in ‘borrowing strength’ from each other. We derive double-shrinkage intervals for means on the basis of our empirical Bayes estimators that shrink both the means and the variances. Analytical and simulation studies and application to a real data set show that, compared with the t-intervals, our intervals have higher coverage probabilities while yielding shorter lengths on average. The double-shrinkage intervals are on average shorter than the intervals from shrinking the means alone and are always no longer than the intervals from shrinking the variances alone. Also, the intervals are explicitly defined and can be computed immediately.

     
    more » « less
  3. Abstract

    Many large‐scale surveys collect both discrete and continuous variables. Small‐area estimates may be desired for means of continuous variables, proportions in each level of a categorical variable, or for domain means defined as the mean of the continuous variable for each level of the categorical variable. In this paper, we introduce a conditionally specified bivariate mixed‐effects model for small‐area estimation, and provide a necessary and sufficient condition under which the conditional distributions render a valid joint distribution. The conditional specification allows better model interpretation. We use the valid joint distribution to calculate empirical Bayes predictors and use the parametric bootstrap to estimate the mean squared error. Simulation studies demonstrate the superior performance of the bivariate mixed‐effects model relative to univariate model estimators. We apply the bivariate mixed‐effects model to construct estimates for small watersheds using data from the Conservation Effects Assessment Project, a survey developed to quantify the environmental impacts of conservation efforts. We construct predictors of mean sediment loss, the proportion of land where the soil loss tolerance is exceeded, and the average sediment loss on land where the soil loss tolerance is exceeded. In the data analysis, the bivariate mixed‐effects model leads to more scientifically interpretable estimates of domain means than those based on two independent univariate models.

     
    more » « less
  4. Summary Since the introduction of fiducial inference by Fisher in the 1930s, its application has been largely confined to relatively simple, parametric problems. In this paper, we present what might be the first time fiducial inference is systematically applied to estimation of a nonparametric survival function under right censoring. We find that the resulting fiducial distribution gives rise to surprisingly good statistical procedures applicable to both one-sample and two-sample problems. In particular, we use the fiducial distribution of a survival function to construct pointwise and curvewise confidence intervals for the survival function, and propose tests based on the curvewise confidence interval. We establish a functional Bernstein–von Mises theorem, and perform thorough simulation studies in scenarios with different levels of censoring. The proposed fiducial-based confidence intervals maintain coverage in situations where asymptotic methods often have substantial coverage problems. Furthermore, the average length of the proposed confidence intervals is often shorter than the length of confidence intervals for competing methods that maintain coverage. Finally, the proposed fiducial test is more powerful than various types of log-rank tests and sup log-rank tests in some scenarios. We illustrate the proposed fiducial test by comparing chemotherapy against chemotherapy combined with radiotherapy, using data from the treatment of locally unresectable gastric cancer. 
    more » « less
  5. Abstract

    When the dimension of data is comparable to or larger than the number of data samples, principal components analysis (PCA) may exhibit problematic high-dimensional noise. In this work, we propose an empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the principal components. EB-PCA is based on the classical Kiefer–Wolfowitz non-parametric maximum likelihood estimator for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs and iterative refinement using an approximate message passing (AMP) algorithm. In theoretical ‘spiked’ models, EB-PCA achieves Bayes-optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB-PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for analysis of gene expression data obtained by single-cell RNA-seq.

     
    more » « less