skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The Generalized Fisher's Combination and Accurate P -Value Calculation under Dependence
Abstract Combining dependent tests of significance has broad applications but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (eg, Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially less than 0.05. The problem could lead to significant false discoveries in big data analyses. This paper provides two main contributions. First, it presents a general family of Fisher type statistics, referred to as the GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, weighted Z-score combination, and so forth. The GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts proper weights and the statistic-defining parameters to a given data. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under multivariate Gaussian, and more robust under the generalized linear model and the multivariate t-distribution. The applications of the GFisher and the new p-value calculation methods are demonstrated by a gene-based single nucleotide polymorphism (SNP)-set association study. Relevant computation has been implemented to an R package GFisher available on the Comprehensive R Archive Network.  more » « less
Award ID(s):
2113570 1812082
PAR ID:
10485788
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
79
Issue:
2
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 1159-1172
Size(s):
p. 1159-1172
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT This article introduces a comprehensive framework to adjust a discrete test statistic for improving its hypothesis testing procedure. The adjustment minimizes the Wasserstein distance to a null‐approximating continuous distribution, tackling some fundamental challenges inherent in combining statistical significances derived from discrete distributions. The related theory justifies Lancaster's mid‐p and mean‐value chi‐squared statistics for Fisher's combination as special cases. To counter the conservative nature of Lancaster's testing procedures, we propose an updated null‐approximating distribution. It is achieved by further minimizing the Wasserstein distance to the adjusted statistics within an appropriate distribution family. Specifically, in the context of Fisher's combination, we propose an optimal gamma distribution as a substitute for the traditionally used chi‐squared distribution. This new approach yields an asymptotically consistent test that significantly improves Type I error control and enhances statistical power. 
    more » « less
  2. Two-sample testing is a fundamental problem in statistics. While many powerful nonparametric methods exist for both the univariate and multivariate context, it is comparatively less common to see a framework for determining which data features lead to rejection of the null. In this paper, we propose a new nonparametric two-sample test named AUGUST, which incorporates a framework for interpretation while maintaining power comparable to existing methods. AUGUST tests for inequality in distribution up to a predetermined resolution using symmetry statistics from binary expansion. Designed for univariate and low to moderate-dimensional multivariate data, this construction allows us to understand distributional differences as a combination of fundamental orthogonal signals. Asymptotic theory for the test statistic facilitates p-value computation and power analysis, and an efficient algorithm enables computation on large data sets. In empirical studies, we show that our test has power comparable to that of popular existing methods, as well as greater power in some circumstances. We illustrate the interpretability of our method using NBA shooting data. 
    more » « less
  3. Schwartz, Russell (Ed.)
    Abstract Summary PERMANOVA (permutational multivariate analysis of variance based on distances) has been widely used for testing the association between the microbiome and a covariate of interest. Statistical significance is established by permutation, which is computationally intensive for large sample sizes. As large-scale microbiome studies, such as American Gut Project (AGP), become increasingly popular, a computationally efficient version of PERMANOVA is much needed. To achieve this end, we derive the asymptotic distribution of the PERMANOVA pseudo-F statistic and provide analytical P-value calculation based on chi-square approximation. We show that the asymptotic P-value is close to the PERMANOVA P-value even under a moderate sample size. Moreover, it is more accurate and an order-of-magnitude faster than the permutation-free method MDMR. We demonstrated the use of our procedure D-MANOVA on the AGP dataset. Availability and implementation D-MANOVA is implemented by the dmanova function in the CRAN package GUniFrac. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  4. Summary Combining dependent $ p $-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, $ p $-value combination tests based on regularly-varying-tailed distributions, such as the Cauchy combination test and harmonic mean $ p $-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of $ p $-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the $ p $-values. First, when $ p $-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided $ p $-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise quasi-asymptotic dependence, such as with bivariate $ t $-distributed test statistics, our simulations suggest that these combination tests can remain valid and exhibit notable power gains over the Bonferroni test, even as the significance level diminishes. These findings highlight the potential advantages of these combination tests in scenarios where $ p $-values exhibit substantial dependence. Our simulations also examine how test performance depends on the support and tail heaviness of the underlying distributions. 
    more » « less
  5. Abstract This paper presents a new statistical method that enables the use of systematic errors in the maximum-likelihood regression of integer-count Poisson data to a parametric model. The method is primarily aimed at the characterization of the goodness-of-fit statistic in the presence of the over-dispersion that is induced by sources of systematic error, and is based on a quasi-maximum-likelihood method that retains the Poisson distribution of the data. We show that the Poisson deviance, which is the usual goodness-of-fit statistic and that is commonly referred to in astronomy as the Cash statistics, can be easily generalized in the presence of systematic errors, under rather general conditions. The method and the associated statistics are first developed theoretically, and then they are tested with the aid of numerical simulations and further illustrated with real-life data from astronomical observations. The statistical methods presented in this paper are intended as a simple general-purpose framework to include additional sources of uncertainty for the analysis of integer-count data in a variety of practical data analysis situations. 
    more » « less