

Title: Hypothesis testing for phylogenetic composition: a minimum-cost flow perspective
Summary: Quantitative comparison of microbial composition from different populations is a fundamental task in various microbiome studies. We consider two-sample testing for microbial compositional data by leveraging phylogenetic information. Motivated by existing phylogenetic distances, we take a minimum-cost flow perspective to study such testing problems. We first show that multivariate analysis of variance with permutation using phylogenetic distances, one of the most commonly used methods in practice, is essentially a sum-of-squares type of test and has better power for dense alternatives. However, empirical evidence from real datasets suggests that the phylogenetic microbial composition difference between two populations is usually sparse. Motivated by this observation, we propose a new maximum type test, detector of active flow on a tree, and investigate its properties. We show that the proposed method is particularly powerful against sparse phylogenetic composition difference and enjoys certain optimality. The practical merit of the proposed method is demonstrated by simulation studies and an application to a human intestinal biopsy microbiome dataset on patients with ulcerative colitis.
Award ID(s):
2015259
NSF-PAR ID:
10340068
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Biometrika
Volume:
108
Issue:
1
ISSN:
0006-3444
Page Range / eLocation ID:
17 to 36
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    In metagenomic studies, testing the association between microbiome composition and clinical outcomes translates to testing the nullity of variance components. Motivated by a lung human immunodeficiency virus (HIV) microbiome project, we study longitudinal microbiome data by using variance component models with more than two variance components. Current testing strategies only apply to models with exactly two variance components and when sample sizes are large. Therefore, they are not applicable to longitudinal microbiome studies. In this paper, we propose exact tests (score test, likelihood ratio test, and restricted likelihood ratio test) to (a) test the association of the overall microbiome composition in a longitudinal design and (b) detect the association of one specific microbiome cluster while adjusting for the effects from related clusters. Our approach combines the exact tests for null hypothesis with a single variance component with a strategy of reducing multiple variance components to a single one. Simulation studies demonstrate that our method has a correct type I error rate and superior power compared to existing methods at small sample sizes and weak signals. Finally, we apply our method to a longitudinal pulmonary microbiome study of HIV‐infected patients and reveal two interesting genera, Prevotella and Veillonella, associated with forced vital capacity. Our findings shed light on the impact of the lung microbiome on HIV complexities. The method is implemented in the open‐source, high‐performance computing language Julia and is freely available at https://github.com/JingZhai63/VCmicrobiome.

     
  2. Abstract

    A critical task in microbiome data analysis is to explore the association between a scalar response of interest and a large number of microbial taxa that are summarized as compositional data at different taxonomic levels. Motivated by fine‐mapping of the microbiome, we propose a two‐step compositional knockoff filter to provide the effective finite‐sample false discovery rate (FDR) control in high‐dimensional linear log‐contrast regression analysis of microbiome compositional data. In the first step, we propose a new compositional screening procedure to remove insignificant microbial taxa while retaining the essential sum‐to‐zero constraint. In the second step, we extend the knockoff filter to identify the significant microbial taxa in the sparse regression model for compositional data. Thereby, a subset of the microbes is selected from the high‐dimensional microbial taxa as related to the response under a prespecified FDR threshold. We study the theoretical properties of the proposed two‐step procedure, including both sure screening and effective false discovery control. We demonstrate these properties in numerical simulation studies to compare our methods to some existing ones and show power gain of the new method while controlling the nominal FDR. The potential usefulness of the proposed method is also illustrated with application to an inflammatory bowel disease data set to identify microbial taxa that influence host gene expressions.

     
  3. Abstract Background and Aims

    Sphagnum (peatmoss) comprises a moss (Bryophyta) clade with ~300–500 species. The genus has unparalleled ecological importance because Sphagnum-dominated peatlands store almost a third of the terrestrial carbon pool and peatmosses engineer the formation and microtopography of peatlands. Genomic resources for Sphagnum are being actively expanded, but many aspects of their biology are still poorly known. Among these are the degree to which Sphagnum species reproduce asexually, and the relative frequencies of male and female gametophytes in these haploid-dominant plants. We assess clonality and gametophyte sex ratios and test hypotheses about the local-scale distribution of clones and sexes in four North American species of the S. magellanicum complex. These four species are difficult to distinguish morphologically and are very closely related. We also assess microbial communities associated with Sphagnum host plant clones and sexes at two sites.

    Methods

    Four hundred and five samples of the four species, representing 57 populations, were subjected to restriction site-associated DNA sequencing (RADseq). Analyses of population structure and clonality based on the molecular data utilized both phylogenetic and phenetic approaches. Multi-locus genotypes (genets) were identified using the RADseq data. Sexes of sampled ramets were determined using a molecular approach that utilized coverage of loci on the sex chromosomes after the method was validated using a sample of plants that expressed sex phenotypically. Sex ratios were estimated for each species, and populations within species. Differences in fitness between genets were estimated as the number of ramets each genet comprised. Degrees of clonality [numbers of genets/numbers of ramets (samples)] within species, among sites, and between gametophyte sexes were estimated. Sphagnum-associated microbial communities were assessed at two sites in relation to Sphagnum clonality and sex.

    Key Results

    All four species appear to engage in a mixture of sexual and asexual (clonal) reproduction. A single ramet represents most genets but two to eight ramets were detected for some genets. Only one genet is represented by ramets in multiple populations; all other genets are restricted to a single population. Within populations ramets of individual genets are spatially clustered, suggesting limited dispersal even within peatlands. Sex ratios are male-biased in S. diabolicum but female-biased in the other three species, although significantly so only in S. divinum. Neither species nor males/females differ in levels of clonal propagation. At St Regis Lake (NY) and Franklin Bog (VT), microbial community composition is strongly differentiated between the sites, but differences between species, genets and sexes were not detected. Within S. divinum, however, female gametophytes harboured two to three times the number of microbial taxa as males.

    Conclusions

    These four Sphagnum species all exhibit similar reproductive patterns that result from a mixture of sexual and asexual reproduction. The spatial patterns of clonally replicated ramets of genets suggest that these species fall between the so-called phalanx patterns, where genets abut one another but do not extensively mix because of limited ramet fragmentation, and the guerrilla patterns, where extensive genet fragmentation and dispersal result in greater mixing of different genets. Although sex ratios in bryophytes are most often female-biased, both male and female biases occur in this complex of closely related species. The association of far greater microbial diversity for female gametophytes in S. divinum, which has a female-biased sex ratio, suggests additional research to determine if levels of microbial diversity are consistently correlated with differing patterns of sex ratio biases.

     
  4. We introduce Operational Genomic Unit (OGU), a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent from taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in one synthetic and two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome datasets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project dataset, and more accurate prediction of human age by the gut microbiomes in the Finnish population. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate OGU adoption in future metagenomics studies.

    Importance

    Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities. However, current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution compared to 16S rRNA amplicon sequence variant analysis. To solve these challenges, we introduce Operational Genomic Units (OGUs), which are the individual reference genomes derived from sequence alignment results, without further assigning them taxonomy. The OGU method advances current read-based metagenomics in two dimensions: (i) providing maximal resolution of community composition while (ii) permitting use of phylogeny-aware tools. Our analysis of real-world datasets shows several advantages over currently adopted metagenomic analysis methods and the finest-grained 16S rRNA analysis methods in predicting biological traits. We thus propose the adoption of OGU as standard practice in metagenomic studies.
  5. Abstract

    Research on animal microbiomes is increasingly aimed at determining the evolutionary and ecological factors that govern host–microbiome dynamics, which are invariably intertwined and potentially synergistic. We present three empirical studies related to this topic, each of which relies on the diversity of Malagasy lemurs (representing a total of 19 species) and the comparative approach applied across scales of analysis. In Study 1, we compare gut microbial membership across 14 species in the wild to test the relative importance of host phylogeny and feeding strategy in mediating microbiome structure. Whereas host phylogeny strongly predicted community composition, the same feeding strategies shared by distant relatives did not produce convergent microbial consortia, but rather shaped microbiomes in host lineage‐specific ways, particularly in folivores. In Study 2, we compare 14 species of wild and captive folivores, frugivores, and omnivores, to highlight the importance of captive populations for advancing gut microbiome research. We show that the perturbational effect of captivity is mediated by host feeding strategy and can be mitigated, in part, by modified animal management. In Study 3, we examine various scent‐gland microbiomes across three species in the wild or captivity and show them to vary by host species, sex, body site, and a proxy of social status. These rare data provide support for the bacterial fermentation hypothesis in olfactory signal production and implicate steroid hormones as mediators of microbial community structure. We conclude by discussing the role of scale in comparative microbial studies, the links between feeding strategy and host–microbiome coadaptation, the underappreciated benefits of captive populations for advancing conservation research, and the need to consider the entirety of an animal's microbiota. Ultimately, these studies will help move the field from exploratory to hypothesis‐driven research.

     