Title: D-MANOVA: fast distance-based multivariate analysis of variance for large-scale microbiome association studies
Abstract Summary: PERMANOVA (permutational multivariate analysis of variance based on distances) has been widely used for testing the association between the microbiome and a covariate of interest. Statistical significance is established by permutation, which is computationally intensive for large sample sizes. As large-scale microbiome studies, such as the American Gut Project (AGP), become increasingly popular, a computationally efficient version of PERMANOVA is much needed. To this end, we derive the asymptotic distribution of the PERMANOVA pseudo-F statistic and provide analytical P-value calculation based on a chi-square approximation. We show that the asymptotic P-value is close to the PERMANOVA P-value even under a moderate sample size. Moreover, it is more accurate and an order of magnitude faster than the permutation-free method MDMR. We demonstrate the use of our procedure, D-MANOVA, on the AGP dataset. Availability and implementation: D-MANOVA is implemented by the dmanova function in the CRAN package GUniFrac. Supplementary information: Supplementary data are available at Bioinformatics online.
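To make the computational cost mentioned in the abstract concrete, the following is a minimal Python sketch of the classical permutation-based procedure for a single categorical covariate. It is not the D-MANOVA method and not the dmanova function from GUniFrac; the function names pseudo_f and permanova, the default of 999 permutations, and the one-way design are assumptions made for illustration only. Each permutation recomputes the pseudo-F statistic from the distance matrix, which is the step an analytical P-value would avoid.

```python
import numpy as np


def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F from a square, symmetric distance matrix.

    Hypothetical helper for illustration; `groups` holds one label per sample.
    """
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n = dist.shape[0]
    labels = np.unique(groups)
    a = labels.size
    d2 = dist ** 2

    # Total sum of squares: squared distances over all pairs i < j, divided by n.
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n

    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(idx.size, k=1)].sum() / idx.size

    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, n_perm=999, seed=0):
    """Permutation P-value for the pseudo-F statistic (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    f_obs = pseudo_f(dist, groups)
    # Count permutations whose statistic is at least as large as the observed one.
    exceed = sum(
        pseudo_f(dist, rng.permutation(groups)) >= f_obs for _ in range(n_perm)
    )
    # The +1 terms count the observed labelling as one of the permutations.
    return f_obs, (exceed + 1) / (n_perm + 1)
```

For Euclidean distances on a single variable, this pseudo-F coincides with the classical one-way ANOVA F statistic, which gives a simple sanity check. The cost grows with both the number of permutations and the square of the sample size, since every permutation revisits the full distance matrix.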
Award ID(s):
1830392 2113360 2113359
PAR ID:
10282829
Author(s) / Creator(s):
;
Editor(s):
Schwartz, Russell
Date Published:
Journal Name:
Bioinformatics
ISSN:
1367-4803
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract In microbiome analysis, researchers often seek to identify taxonomic features associated with an outcome of interest. However, microbiome features are intercorrelated and linked by phylogenetic relationships, making it challenging to assess the association between an individual feature and an outcome. This paper proposes a novel conditional association test, CAT, that can account for other features and phylogenetic relatedness when testing the association between a feature and an outcome. CAT adopts a permutation approach, measuring the importance of a feature in predicting the outcome by permuting operational taxonomic unit/amplicon sequence variant counts belonging to that feature from the data and quantifying how much the association with the outcome is weakened through the change in the coefficient of determination $$R^{2}$$. Compared with marginal association tests, it focuses on the added value of a feature in explaining outcome variation that is not captured by other features. By leveraging global tests including PERMANOVA and MiRKAT-based methods, CAT allows association testing for continuous, binary, categorical, count, survival, and correlated outcomes. We demonstrate through simulation studies that CAT can provide a direct quantification of feature importance that is distinct from that of marginal association tests, and illustrate CAT with applications to two real-world studies on the microbiome in melanoma patients: one examining the role of the microbiome in shaping immunotherapy response, and one investigating the association between the microbiome and survival outcomes. Our results illustrate the potential of CAT to inform the design of microbiome interventions aimed at improving clinical outcomes. 
  2. Abstract Summary: Due to the sparsity and high dimensionality, microbiome data are routinely summarized into pairwise distances capturing the compositional differences. Many biological insights can be gained by analyzing the distance matrix in relation to some covariates. A microbiome sampling method that characterizes the inter-sample relationship more reproducibly is expected to yield higher statistical power. Traditionally, the intraclass correlation coefficient (ICC) has been used to quantify the degree of reproducibility for a univariate measurement using technical replicates. In this work, we extend the traditional ICC to distance measures and propose a distance-based ICC (dICC). We derive the asymptotic distribution of the sample-based dICC to facilitate statistical inference. We illustrate dICC using a real dataset from a metagenomic reproducibility study. Availability and implementation: dICC is implemented in the R CRAN package GUniFrac. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. We are interested in testing general linear hypotheses in a high-dimensional multivariate linear regression model. The framework includes many well-studied problems such as two-sample tests for equality of population means, MANOVA and others as special cases. A family of rotation-invariant tests is proposed that involves a flexible spectral shrinkage scheme applied to the sample error covariance matrix. The asymptotic normality of the test statistic under the null hypothesis is derived in the setting where dimensionality is comparable to sample sizes, assuming the existence of certain moments for the observations. The asymptotic power of the proposed test is studied under various local alternatives. The power characteristics are then utilized to propose a data-driven selection of the spectral shrinkage function. As an illustration of the general theory, we construct a family of tests involving ridge-type regularization and suggest possible extensions to more complex regularizers. A simulation study is carried out to examine the numerical performance of the proposed tests. 
  4. Abstract Background: Children are less susceptible to SARS-CoV-2 infection and typically have milder illness courses than adults, but the factors underlying these age-associated differences are not well understood. The upper respiratory microbiome undergoes substantial shifts during childhood and is increasingly recognized to influence host defense against respiratory pathogens. Thus, we sought to identify upper respiratory microbiome features associated with SARS-CoV-2 infection susceptibility and illness severity. Methods: We collected clinical data and nasopharyngeal swabs from 285 children, adolescents, and young adults (<21 years) with documented SARS-CoV-2 exposure. We used 16S ribosomal RNA gene sequencing to characterize the nasopharyngeal microbiome and evaluated for age-adjusted associations between microbiome characteristics and SARS-CoV-2 infection status and respiratory symptoms. Results: Nasopharyngeal microbiome composition varied with age (PERMANOVA, P < .001; R² = 0.06) and between SARS-CoV-2–infected individuals with and without respiratory symptoms (PERMANOVA, P = .002; R² = 0.009). SARS-CoV-2–infected participants with Corynebacterium/Dolosigranulum-dominant microbiome profiles were less likely to have respiratory symptoms than infected participants with other nasopharyngeal microbiome profiles (OR: .38; 95% CI: .18–.81). Using generalized joint attributed modeling, we identified 9 bacterial taxa associated with SARS-CoV-2 infection and 6 taxa differentially abundant among SARS-CoV-2–infected participants with respiratory symptoms; the magnitude of these associations was strongly influenced by age. Conclusions: We identified interactive relationships between age and specific nasopharyngeal microbiome features that are associated with SARS-CoV-2 infection susceptibility and symptoms in children, adolescents, and young adults. Our data suggest that the upper respiratory microbiome may be a mechanism by which age influences SARS-CoV-2 susceptibility and illness severity.
  5. Blow flies (Lucilia sericata and Phormia regina) are necrophagous insects that interact with dense microbial reservoirs and are opportunistic vectors of human and animal pathogens. Despite constant exposure to diverse environmental microbes, it is unclear whether their bacterial communities are primarily acquired stochastically or shaped by host factors that could influence pathogen carriage. We conducted a systematic comparison of wild L. sericata and P. regina collected from seven cities across an urban-rural gradient to determine whether microbiome composition is structured by host species identity or environmental variables. Using 16S rRNA gene sequencing of individual flies, we profiled bacterial communities and applied alpha- and beta-diversity analyses, PERMANOVA, and Random Forest classification to quantify species-level microbiome differentiation. Species identity was the strongest predictor of microbiome composition (PERMANOVA, p = 0.001), while location, land cover type, sampling month, and sex had no significant effects. Random Forest modeling identified multiple bacterial taxa that consistently distinguished the two species, including Ignatzschineria and Dysgonomonas, which were enriched in P. regina, and Vagococcus and Escherichia-Shigella, which were enriched in L. sericata. These taxa are of clinical relevance, with Ignatzschineria in particular increasingly reported from human myiasis and soft-tissue infections, sometimes exhibiting antimicrobial resistance. Our findings demonstrate that wild blow flies maintain species-specific microbiomes despite shared environments, suggesting that host identity strongly filters microbial communities. The presence of opportunistic pathogens within these structured microbiomes underscores the need to understand how blow fly–microbe associations contribute to pathogen persistence and dissemination. By revealing predictable, species-dependent microbiome patterns, this study highlights potential targets for microbiome-based strategies aimed at mitigating blow fly–associated disease risks.