Title: A unified combination framework for dependent tests with applications to microbiome association studies
Abstract: We introduce a novel meta-analysis framework for combining dependent tests under a general setting, and use it to synthesize various microbiome association tests that are calculated from the same dataset. Our development builds upon the classical meta-analysis methods of aggregating P-values and a more recent general method of combining confidence distributions, and generalizes them to handle dependent tests. The proposed framework ensures rigorous statistical guarantees, and we provide a comprehensive study comparing it with various existing dependent combination methods. Notably, we demonstrate that the widely used Cauchy combination method for dependent tests, referred to as the vanilla Cauchy combination in this article, can be viewed as a special case within our framework. Moreover, the proposed framework provides a way to address the problem that arises when the distributional assumptions underlying the vanilla Cauchy combination are violated. Our numerical results demonstrate that ignoring the dependence among the to-be-combined components can lead to severe size distortion. Compared with existing P-value combination methods, including the vanilla Cauchy combination method, the proposed framework is flexible: it can be adapted to handle the dependence accurately, and it uses the information efficiently to construct tests with accurate size and enhanced power. We apply the framework to microbiome association studies, where we aggregate information from multiple existing tests computed on the same dataset. The combined tests harness the strengths of each individual test across a wide range of alternative spaces, enabling more efficient and meaningful discoveries of vital microbiome associations.
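For orientation, the vanilla Cauchy combination named in the abstract is commonly written as a weighted sum of Cauchy-transformed P-values, T = sum_i w_i tan((0.5 - p_i) * pi), with the combined P-value read off the standard Cauchy tail as 0.5 - arctan(T) / pi. The sketch below illustrates only that standard form; it is not the framework proposed in the article. The function name `cauchy_combination` and the default of equal weights are assumptions made here for illustration.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine P-values with the standard (vanilla) Cauchy combination.

    Illustrative sketch only: the function name and the equal-weight
    default are assumptions, not taken from the article.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one P-value")
    if weights is None:
        # Assumed default: equal nonnegative weights summing to one.
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("P-values must lie strictly between 0 and 1")

    # Map each P-value to a standard Cauchy variate and take the weighted sum.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))

    # Upper-tail probability of a standard Cauchy variable at t.
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    # One small P-value among unremarkable ones dominates the combination.
    print(cauchy_combination([0.01, 0.5, 0.6]))
```

With a single P-value the function returns that P-value unchanged, which is a quick sanity check on the transformation. The tail approximation behind this test depends on distributional assumptions about the underlying statistics, which is the setting the abstract says can be violated.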
Award ID(s):
2319260 2311064
PAR ID:
10569297
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
81
Issue:
1
ISSN:
0006-341X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary Phylogenetic association analysis plays a crucial role in investigating the correlation between microbial compositions and specific outcomes of interest in microbiome studies. However, existing methods for testing such associations have limitations related to the assumption of a linear association in high-dimensional settings and the handling of confounding effects. Hence, there is a need for methods capable of characterizing complex associations, including nonmonotonic relationships. This article introduces a novel phylogenetic association analysis framework and associated tests to address these challenges by employing conditional rank correlation as a measure of association. The proposed tests account for confounders in a fully nonparametric manner, ensuring robustness against outliers and the ability to detect diverse dependencies. The proposed framework aggregates conditional rank correlations for subtrees using weighted sum and maximum approaches to capture both dense and sparse signals. The significance level of the test statistics is determined by calibration through a nearest-neighbour bootstrapping method, which is straightforward to implement and can accommodate additional datasets when these are available. The practical advantages of the proposed framework are demonstrated through numerical experiments using both simulated and real microbiome datasets. 
  2. Summary Combining dependent $p$-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, $p$-value combination tests based on regularly-varying-tailed distributions, such as the Cauchy combination test and harmonic mean $p$-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of $p$-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the $p$-values. First, when $p$-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided $p$-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise quasi-asymptotic dependence, such as with bivariate $t$-distributed test statistics, our simulations suggest that these combination tests can remain valid and exhibit notable power gains over the Bonferroni test, even as the significance level diminishes. These findings highlight the potential advantages of these combination tests in scenarios where $p$-values exhibit substantial dependence. Our simulations also examine how test performance depends on the support and tail heaviness of the underlying distributions.
  3. Deep learning architectures are usually proposed with millions of parameters, resulting in a memory issue when training deep neural networks with stochastic gradient descent type methods using large batch sizes. However, training with small batch sizes tends to produce low-quality solutions due to the large variance of stochastic gradients. In this paper, we tackle this problem by proposing a new framework for training deep neural networks with small batches and noisy gradients. During optimization, our method iteratively applies a proximal-type regularizer to make the loss function strongly convex. This regularizer stabilizes the gradient, leading to better training performance. We prove that our algorithm achieves a convergence rate comparable to that of vanilla SGD even with a small batch size. Our framework is simple to implement and can potentially be combined with many existing optimization algorithms. Empirical results show that our method outperforms SGD and Adam when the batch size is small. Our implementation is available at https://github.com/huiqu18/TRAlgorithm.
  4. Abstract Standard protocols for meta-analysis of association studies are inadequate for microbiome data due to their complex compositional structure, leading to inaccurate and unstable microbial signature selection. To address this issue, we introduce Melody, a framework that generates, harmonizes, and combines study-specific summary association statistics to powerfully and robustly identify microbial signatures in meta-analysis. Comprehensive and realistic simulations demonstrate that Melody substantially outperforms existing approaches in prioritizing true signatures. In the meta-analyses of five studies on colorectal cancer and eight studies on the gut metabolome, we showcase the superior stability, reliability, and predictive performance of Melody-identified signatures. 
  5. A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators M from noisy vector measurements y=Mx+w, where the covariates x may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic meta-learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning to be bottlenecked by the single-task data size. We introduce an adaptation, De-bias & Feature-Whiten (DFW), of the popular alternating minimization-descent (AMD) scheme proposed in Collins et al. (2021), and establish linear convergence to the optimal representation with the noise level scaling down with the total source data size. This leads to generalization bounds on the same order as an oracle empirical risk minimizer. We verify the vital importance of DFW in various numerical simulations. In particular, we show that vanilla alternating minimization-descent fails catastrophically even for i.i.d. but mildly non-isotropic data. Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications, such as in controls and dynamical systems.