

Title: Leveraging the Fisher Randomization Test using Confidence Distributions: Inference, Combination and Fusion Learning
Abstract The flexibility and wide applicability of the Fisher randomization test (FRT) make it an attractive tool for assessing causal effects of interventions in modern-day randomized experiments of increasing size and complexity. This paper provides a theoretical inferential framework for the FRT by establishing its connection with confidence distributions. This connection leads to the development of (i) an unambiguous procedure for inverting FRTs to generate confidence intervals with guaranteed coverage, (ii) new insights into the effect of the Monte Carlo sample size on the estimation of a p-value curve, and (iii) generic and specific methods for combining FRTs from multiple independent experiments with theoretical guarantees. Our developments pertain to finite-sample settings but extend directly to large samples. Simulations and a case example demonstrate the benefit of these new developments.
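As a concrete illustration of items (i) and (ii), the sketch below computes a Monte Carlo FRT p-value under a sharp constant-effect null and inverts the test over a grid of hypothesized effects to form a confidence interval. This is a minimal sketch under simplifying assumptions (difference-in-means statistic, arbitrary grid and Monte Carlo sizes), not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def frt_pvalue(y, z, tau0=0.0, draws=500):
    """Monte Carlo FRT p-value for the sharp null Y_i(1) - Y_i(0) = tau0.

    Statistic: absolute difference in means of the tau0-adjusted outcomes.
    """
    y_adj = y - z * tau0  # under H0 these are the control potential outcomes

    def stat(z_):
        return abs(y_adj[z_ == 1].mean() - y_adj[z_ == 0].mean())

    t_obs = stat(z)
    hits = sum(stat(rng.permutation(z)) >= t_obs for _ in range(draws))
    return (hits + 1) / (draws + 1)  # add-one correction keeps p in (0, 1]

def frt_interval(y, z, grid, alpha=0.05, draws=500):
    """Invert the FRT: the interval is the set of tau0 not rejected at level alpha."""
    kept = [t for t in grid if frt_pvalue(y, z, t, draws) > alpha]
    return (min(kept), max(kept)) if kept else None

# Toy completely randomized experiment with a constant effect of 2.
n = 60
z = rng.permutation(np.repeat([0, 1], n // 2))
y = rng.normal(0.0, 1.0, n) + 2.0 * z
lo, hi = frt_interval(y, z, grid=np.linspace(0.0, 4.0, 41))
```

Increasing `draws` refines the resolution of the p-value curve and hence of the interval endpoints, which is the Monte Carlo effect studied in (ii).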
Award ID(s):
1812048 2015373 2027855 1737857
PAR ID:
10398634
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
83
Issue:
4
ISSN:
1369-7412
Format(s):
Medium: X Size: p. 777-797
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract A critical task to better quantify changes in precipitation (P) mean and extreme statistics due to global warming is to gain insights into the underlying physical generating mechanisms (GMs). Here, the dominant GMs associated with daily P recorded at 2861 gauges in the Conterminous United States from 1980 to 2018 were identified from atmospheric reanalyses and publicly available datasets. The GMs include fronts (FRT), extratropical cyclones (ETC), atmospheric rivers (AR), tropical cyclones (TC), and North American Monsoon (NAM). Climatologies of the GM occurrences were developed for the nonzero P (NZP) and annual P maxima (APM) samples, characterizing the marginal and extreme P distributions, respectively. FRT is everywhere the most frequent (45-75%) GM of NZP followed by ETC (12-33%). The FRT contribution declines for APM (19-66%), which are dominated by AR (50-65%) in western regions and affected by TC (10-18%) in southern and eastern regions. The GM frequencies exhibit trends with the same signs over large regions, which are not statistically significant except for an increase in FRT (TC) frequency in the Northeast (central region). Two-sample tests showed well-defined spatial patterns with regions where (1) both the marginal and extreme P distributions of the two dominant GMs likely belong to different statistical populations, and (2) only the marginal or the extreme distributions could be considered statistically different. These results were interpreted through L-moments and parametric distributions that adequately model NZP and APM frequency. This work provides useful insights to incorporate mixed populations and nonstationarity in P frequency analyses.
  2. Summary A popular method for variance reduction in causal inference is propensity-based trimming, the practice of removing units with extreme propensities from the sample. This practice has theoretical grounding when the data are homoscedastic and the propensity model is parametric (Crump et al., 2009; Yang & Ding, 2018), but in modern settings where heteroscedastic data are analysed with nonparametric models, existing theory fails to support current practice. In this work, we address this challenge by developing new methods and theory for sample trimming. Our contributions are three-fold. First, we describe novel procedures for selecting which units to trim. Our procedures differ from previous works in that we trim, not only units with small propensities, but also units with extreme conditional variances. Second, we give new theoretical guarantees for inference after trimming. In particular, we show how to perform inference on the trimmed subpopulation without requiring that our regressions converge at parametric rates. Instead, we make only fourth-root rate assumptions like those in the double machine learning literature. This result applies to conventional propensity-based trimming as well, and thus may be of independent interest. Finally, we propose a bootstrap-based method for constructing simultaneously valid confidence intervals for multiple trimmed subpopulations, which are valuable for navigating the trade-off between sample size and variance reduction inherent in trimming. We validate our methods in simulation, on the 2007–2008 National Health and Nutrition Examination Survey and on a semisynthetic Medicare dataset, and find promising results in all settings. 
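The conventional propensity-based trimming this work starts from can be sketched as follows. The example uses the known propensity and a fixed [0.1, 0.9] overlap rule in the spirit of Crump et al. (2009); it does not implement the paper's additional trimming on extreme conditional variances, and all simulation settings here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Simulated observational data: the true propensity e(x) = sigmoid(2x) takes
# extreme values in the tails of x, and the treatment effect is a constant 1.
n = 20000
x = rng.normal(0.0, 1.5, n)
e = sigmoid(2.0 * x)
z = rng.binomial(1, e)
y = 1.0 * z + x + rng.normal(0.0, 1.0, n)

# Propensity-based trimming: drop units whose (here, known) propensity lies
# outside [0.1, 0.9], so no inverse-propensity weight exceeds 10.
keep = (e >= 0.1) & (e <= 0.9)

# Hajek (normalized IPW) estimate of the ATE on the trimmed subpopulation.
zk, yk, ek = z[keep], y[keep], e[keep]
w1, w0 = zk / ek, (1 - zk) / (1 - ek)
ate_trimmed = (w1 * yk).sum() / w1.sum() - (w0 * yk).sum() / w0.sum()
```

Because the simulated effect is constant, the estimand on the trimmed subpopulation coincides with the overall effect of 1; in general, trimming changes the target population, which is why the paper's inference is explicitly about trimmed subpopulations.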
  3.
    Collecting credible training samples $(x, y)$ is important for building data-intensive learning systems (e.g., a deep learning system). Asking people to report a complex distribution $p(x)$, though theoretically viable, is challenging in practice, primarily because of the cognitive load required for human agents to articulate such highly complicated information. While classical elicitation mechanisms apply to eliciting a complex, generative (and continuous) distribution $p(x)$, we are interested in eliciting samples $x_i \sim p(x)$ from agents directly; we coin this problem sample elicitation. This paper introduces a deep-learning-aided method to incentivize credible sample contributions from self-interested and rational agents. We show that with an accurate estimate of a certain $f$-divergence function we can achieve approximate incentive compatibility in eliciting truthful samples. We then present an efficient estimator with theoretical guarantees by studying the variational forms of the $f$-divergence function. We also show a connection between this sample elicitation problem and $f$-GAN, and how this connection helps reconstruct an estimator of the distribution from the collected samples. Experiments on synthetic data, MNIST, and CIFAR-10 datasets demonstrate that our mechanism elicits truthful samples. Our implementation is available at https://github.com/weijiaheng/Credible-sample-elicitation.git.
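To illustrate what estimating an $f$-divergence via its variational form can look like in the simplest case, the sketch below estimates the KL divergence (one member of the $f$-divergence family) from samples using the Donsker-Varadhan representation with a linear critic, which happens to be sufficient for this Gaussian pair. This is a toy illustration, not the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(2)

# Samples from p = N(0, 1) and q = N(1, 1); the true KL(p || q) is 0.5.
n = 20000
xp = rng.normal(0.0, 1.0, n)
xq = rng.normal(1.0, 1.0, n)

def dv_bound(a, xp, xq):
    """Donsker-Varadhan lower bound on KL(p || q) with linear critic T(x) = a*x.

    KL(p || q) >= E_p[T] - log E_q[exp(T)]; for this Gaussian pair the optimal
    critic is T*(x) = log p(x)/q(x) + const = -x + const, i.e. a = -1.
    """
    t_q = a * xq
    m = t_q.max()  # log-mean-exp computed stably
    log_mean_exp = m + np.log(np.exp(t_q - m).mean())
    return a * xp.mean() - log_mean_exp

# Crude maximization over the single critic parameter by grid search; a
# neural-network critic trained by gradient ascent plays this role in practice.
grid = np.linspace(-3.0, 3.0, 601)
kl_hat = max(dv_bound(a, xp, xq) for a in grid)
```

The same plug-in-samples-and-maximize-a-critic structure is what links sample-based divergence estimation to $f$-GAN training.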
  4. We consider the problem of sequential multiple hypothesis testing with nontrivial data collection costs. This problem arises, for example, when conducting biological experiments to identify differentially expressed genes in a disease process. This work builds on the generalized α-investing framework, which enables control of the marginal false discovery rate in a sequential testing setting. We provide a theoretical analysis of the long-term asymptotic behavior of α-wealth that motivates incorporating sample size into the α-investing decision rule. Posing the testing process as a game with nature, we construct a decision rule that optimizes the expected α-wealth reward (ERO) and provides an optimal sample size for each test. Empirical results show that a cost-aware ERO decision rule correctly rejects more false null hypotheses than other methods for $n = 1$, where $n$ is the sample size. When the sample size is not fixed, cost-aware ERO uses a prior on the null hypothesis to adaptively allocate the sample budget to each test. We extend cost-aware ERO investing to finite-horizon testing, which enables the decision rule to allocate samples in a non-myopic manner. Finally, empirical tests on real data sets from biological experiments show that cost-aware ERO balances the allocation of samples to an individual test against the allocation of samples across multiple tests.
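For context, the wealth dynamics that α-investing rules share can be sketched with the classic rule of Foster and Stine (2008), a special case of the generalized α-investing framework; the half-wealth spending rule below is an arbitrary illustrative choice, not the cost-aware ERO rule described above:

```python
def alpha_investing(pvalues, alpha=0.05, eta=1.0, omega=0.05):
    """Classic alpha-investing (Foster & Stine, 2008).

    Start with wealth W = alpha * eta and test hypothesis j at level alpha_j:
    on a rejection, earn omega (omega <= alpha preserves mFDR control);
    otherwise pay alpha_j / (1 - alpha_j).

    The spending rule here (stake half of the current wealth on each test)
    is an arbitrary illustrative choice; it guarantees wealth stays positive.
    """
    wealth = alpha * eta
    rejections, wealth_path = [], []
    for p in pvalues:
        stake = wealth / 2.0             # chosen cost alpha_j / (1 - alpha_j)
        alpha_j = stake / (1.0 + stake)  # solve stake = alpha_j / (1 - alpha_j)
        reject = p <= alpha_j
        wealth += omega if reject else -stake
        rejections.append(reject)
        wealth_path.append(wealth)
    return rejections, wealth_path

# A short stream of p-values: two strong signals around a null.
rejections, wealth_path = alpha_investing([0.001, 0.8, 0.0001])
```

Making each stake depend on the cost and expected payoff of collecting $n$ samples for the current test is, roughly, where the cost-aware decision rule departs from this baseline.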
  5. In many species, demographic assessments of population viability require an estimate of the number or proportion of breeding adults in a population that are male (the breeding sex ratio). However, this estimate is often difficult to obtain directly in species with multiple paternity when males are difficult to sample. Parentage analysis of breeding females and offspring can produce this estimate by identifying the number of unique males that contribute genetic information to (i.e., sired) a given cohort. There is an added challenge of choosing a sample design with the desired level of confidence to identify all the fathers contributing to a cohort, either at the scale of individual clutches or an entire nesting season, given limited resources. Sampling effort can be defined as the number of offspring sampled per clutch, or the number of clutches sampled per breeding season, depending on the analysis. The minimum number of samples required may depend on the proportions of eggs that different fathers fertilize in a clutch (the paternal contribution mode), the total number of fathers fertilizing a clutch, the proportion of adults available for breeding that are male (the operational sex ratio), and population size. We conducted power analyses to quantify the confidence in identifying all fathers in animal populations with multiple paternity. We simulated sampling a theoretical sea turtle population with a range of population demographics, mating systems, and sampling effort, and used the proportion of 10,000 simulations in which all fathers were identified as a proxy for confidence. At the clutch level, confidence was strongly dependent on the paternal contribution mode, and when it was skewed, it also depended on the total number of fathers contributing and the number of offspring sampled. 
However, sampling about one third of a clutch was sufficient to identify all fathers with high confidence for most scenarios, unless the paternal contribution mode was extremely skewed and there were many contributing fathers, such that some fathers fertilized very few eggs and were difficult to detect. At the scale of an entire nesting season, confidence was more strongly affected by the operational sex ratio, the proportion of clutches sampled, and the presence or absence of polygyny than by the lesser effects of paternal contribution mode and within-clutch sample size. Sampling fewer offspring from more clutches increased confidence compared to sampling more offspring from fewer clutches. Relaxing the minimum required proportion of fathers identified from 100% to 90% led to high confidence while sampling 50% to a maximum of 75% of clutches, depending on the mating system, even as the population size increased by an order of magnitude. Our approach and results can be widely informative for sample design as well as quantifying uncertainty in existing and future estimates of the number of breeding males in populations with multiple paternity. 
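The clutch-level power analysis described above can be sketched as follows, under two simplifying assumptions: paternity assignment is perfect, and the clutch is large enough that each sampled offspring's sire is an independent draw from the paternal contribution proportions. The contribution vectors below are hypothetical examples, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(3)

def detection_confidence(contributions, n_sampled, n_sims=10000):
    """Fraction of simulated clutches in which every father appears at least
    once among the sampled offspring (a proxy for confidence, as in the study).

    Assumes perfect paternity assignment and a clutch large enough that each
    sampled offspring's father is an independent draw from the contribution
    proportions (a large-clutch approximation).
    """
    contributions = np.asarray(contributions, dtype=float)
    contributions /= contributions.sum()
    n_fathers = len(contributions)
    hits = 0
    for _ in range(n_sims):
        fathers = rng.choice(n_fathers, size=n_sampled, p=contributions)
        if len(np.unique(fathers)) == n_fathers:
            hits += 1
    return hits / n_sims

# Three fathers with equal contributions, 10 offspring sampled per clutch.
conf_equal = detection_confidence([1 / 3, 1 / 3, 1 / 3], n_sampled=10)
# A skewed contribution mode makes the minor father much harder to detect.
conf_skewed = detection_confidence([0.80, 0.15, 0.05], n_sampled=10)
```

Comparing the two calls reproduces the qualitative finding that clutch-level confidence degrades sharply when the paternal contribution mode is skewed, even at the same sampling effort.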