
Title: Exact variance component tests for longitudinal microbiome studies

In metagenomic studies, testing the association between microbiome composition and clinical outcomes translates to testing the nullity of variance components. Motivated by a lung human immunodeficiency virus (HIV) microbiome project, we study longitudinal microbiome data by using variance component models with more than two variance components. Current testing strategies apply only to models with exactly two variance components and to large sample sizes. Therefore, they are not applicable to longitudinal microbiome studies. In this paper, we propose exact tests (score test, likelihood ratio test, and restricted likelihood ratio test) to (a) test the association of the overall microbiome composition in a longitudinal design and (b) detect the association of one specific microbiome cluster while adjusting for the effects from related clusters. Our approach combines the exact tests for a null hypothesis involving a single variance component with a strategy of reducing multiple variance components to a single one. Simulation studies demonstrate that our method has a correct type I error rate and superior power compared to existing methods at small sample sizes and weak signals. Finally, we apply our method to a longitudinal pulmonary microbiome study of HIV‐infected patients and reveal two interesting genera, Prevotella and Veillonella, associated with forced vital capacity. Our findings shed light on the impact of the lung microbiome on HIV complexities. The method is implemented in the open‐source, high‐performance computing language Julia and is freely available at
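
One common way to write the kind of model the abstract refers to is sketched below. The notation (y, X, β, V_i, σ_i², σ_e², m) is assumed here for illustration and is not taken from the paper itself; the paper's own parameterization may differ.

```latex
% Generic variance component model with m random-effect components
% (assumed notation, for illustration only)
\mathbf{y} \sim \mathcal{N}\!\left( \mathbf{X}\boldsymbol{\beta},\;
    \sum_{i=1}^{m} \sigma_i^2 \mathbf{V}_i + \sigma_e^2 \mathbf{I} \right),
\qquad \sigma_i^2 \ge 0,\ \sigma_e^2 > 0

% "Testing the nullity" of the k-th component
H_0 : \sigma_k^2 = 0
\quad \text{versus} \quad
H_A : \sigma_k^2 > 0
```

In this sketch the null value lies on the boundary of the parameter space, which is one reason large-sample approximations to the null distribution can be unreliable at small sample sizes.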

Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Genetic Epidemiology
Page Range / eLocation ID:
p. 250-262
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The quantification of Hutchinson's n‐dimensional hypervolume has enabled substantial progress in community ecology, species niche analysis and beyond. However, most existing methods do not support a partitioning of the different components of hypervolume. Such a partitioning is crucial to address the ‘curse of dimensionality’ in hypervolume measures and interpret the metrics on the original niche axes instead of principal components. Here, we propose the use of multivariate normal distributions for the comparison of niche hypervolumes and introduce this as the multivariate‐normal hypervolume (MVNH) framework (R package available on

    The framework provides parametric measures of the size and dissimilarity of niche hypervolumes, each of which can be partitioned into biologically interpretable components. Specifically, the determinant of the covariance matrix (i.e. the generalized variance) of an MVNH is a measure of total niche size, which can be partitioned into univariate niche variance components and a correlation component (a measure of dimensionality, i.e. the effective number of independent niche axes standardized by the number of dimensions). The Bhattacharyya distance (BD; a function of the geometric mean of two probability distributions) between two MVNHs is a measure of niche dissimilarity. The BD partitions total dissimilarity into two components: the Mahalanobis distance (standardized Euclidean distance with correlated variables) between hypervolume centroids, and the determinant ratio, which measures the difference in hypervolume size. The Mahalanobis distance and determinant ratio can be further partitioned into univariate divergences and a correlation component.

    We use empirical examples of community‐ and species‐level analysis to demonstrate the new insights provided by these metrics. We show that the newly proposed framework enables us to quantify the relative contributions of different hypervolume components and to connect these analyses to the ecological drivers of functional diversity and environmental niche variation.

    Our approach overcomes several operational and computational limitations of popular nonparametric methods and provides a partitioning framework that has wide implications for understanding functional diversity, niche evolution, niche shifts and expansion during biotic invasions, etc.

  2. Background

    Cognitive training may partially reverse cognitive deficits in people with HIV (PWH). Previous functional MRI (fMRI) studies demonstrate that working memory training (WMT) alters brain activity during working memory tasks, but its effects on resting brain network organization remain unknown.


    To test whether WMT affects PWH brain functional connectivity in resting‐state fMRI (rsfMRI).

    Study Type



    A total of 53 PWH (ages 50.7 ± 1.5 years, two women) and 53 HIV‐seronegative controls (SN, ages 49.5 ± 1.6 years, six women).

    Field Strength/Sequence

    Axial single‐shot gradient‐echo echo‐planar imaging at 3.0 T was performed at baseline (TL1), at 1 month (TL2), and at 6 months (TL3) after WMT.


    All participants had rsfMRI and clinical assessments (including neuropsychological tests) at TL1 before randomization to Cogmed WMT (adaptive training, n = 58: 28 PWH, 30 SN; nonadaptive training, n = 48: 25 PWH, 23 SN), 25 sessions over 5–8 weeks. All assessments were repeated at TL2 and at TL3. The functional connectivity estimated by independent component analysis (ICA) or graph theory (GT) metrics (eigenvector centrality, etc.) for different link densities (LDs) was compared between PWH and SN groups at TL1 and TL2.

    Statistical Tests

    Two‐way analyses of variance (ANOVA) on GT metrics and two‐sample t‐tests on FC or GT metrics were performed. Cognitive (e.g., memory) measures were correlated with eigenvector centrality (eCent) using Pearson's correlations. The significance level was set at P < 0.05 after false discovery rate correction.


    The ventral default mode network (vDMN) eCent differed between PWH and SN groups at TL1 but not at TL2 (P = 0.28). In PWH, changes in vDMN eCent correlated significantly with changes in memory ability (r = −0.62 at LD = 50%), and vDMN eCent before training correlated significantly with changes in memory performance (r = 0.53 at LD = 50%).

    Data Conclusion

    ICA and GT analyses showed that adaptive WMT normalized graph properties of the vDMN in PWH.

    Evidence Level


    Technical Efficacy


  3. Abstract

    One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. Exact-RFS-2 is available in open source form on Github at

  4. Summary Open Research Badges

    This article has earned an Open Data Badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at;

  5. Abstract Background

    Advances in microbiome science are being driven in large part by our ability to study and infer microbial ecology from genomes reconstructed from mixed microbial communities using metagenomics and single-cell genomics. Such omics-based techniques allow us to read genomic blueprints of microorganisms, decipher their functional capacities and activities, and reconstruct their roles in biogeochemical processes. Currently available tools for analyses of genomic data can annotate and depict metabolic functions to some extent; however, no standardized approaches are currently available for the comprehensive characterization of metabolic predictions, metabolite exchanges, microbial interactions, and microbial contributions to biogeochemical cycling.


    We present METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), a scalable software to advance microbial ecology and biogeochemistry studies using genomes at the resolution of individual organisms and/or microbial communities. The genome-scale workflow includes annotation of microbial genomes, motif validation of biochemically validated conserved protein residues, metabolic pathway analyses, and calculation of contributions to individual biogeochemical transformations and cycles. The community-scale workflow supplements genome-scale analyses with determination of genome abundance in the microbiome, potential microbial metabolic handoffs and metabolite exchange, reconstruction of functional networks, and determination of microbial contributions to biogeochemical cycles. METABOLIC can take input genomes from isolates, metagenome-assembled genomes, or single-cell genomes. Results are presented in the form of tables for metabolism and a variety of visualizations including biogeochemical cycling potential, representation of sequential metabolic transformations, community-scale microbial functional networks using a newly defined metric “MW-score” (metabolic weight score), and metabolic Sankey diagrams. METABOLIC takes ~ 3 h with 40 CPU threads to process ~ 100 genomes and corresponding metagenomic reads within which the most compute-demanding part of hmmsearch takes ~ 45 min, while it takes ~ 5 h to complete hmmsearch for ~ 3600 genomes. Tests of accuracy, robustness, and consistency suggest METABOLIC provides better performance compared to other software and online servers. To highlight the utility and versatility of METABOLIC, we demonstrate its capabilities on diverse metagenomic datasets from the marine subsurface, terrestrial subsurface, meadow soil, deep sea, freshwater lakes, wastewater, and the human gut.


    METABOLIC enables the consistent and reproducible study of microbial community ecology and biogeochemistry using a foundation of genome-informed microbial metabolism, and will advance the integration of uncultivated organisms into metabolic and biogeochemical models. METABOLIC is written in Perl and R and is freely available under GPLv3 at
