Title: Multisample estimation of bacterial composition matrices in metagenomics data
Summary: Metagenomics sequencing is routinely applied to quantify bacterial abundances in microbiome studies, where bacterial composition is estimated based on the sequencing read counts. Due to limited sequencing depth and DNA dropouts, many rare bacterial taxa might not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation using count normalization leads to many zero proportions, which tend to result in inaccurate estimates of bacterial abundance and diversity. This paper takes a multisample approach to the estimation of bacterial abundances in order to borrow information across samples and across species. Empirical results from real datasets suggest that the composition matrix over multiple samples is approximately low rank, which motivates a regularized maximum likelihood estimation with a nuclear norm penalty. An efficient optimization algorithm using the generalized accelerated proximal gradient and Euclidean projection onto the simplex is developed. Theoretical upper bounds and minimax lower bounds on the estimation errors, measured by the Kullback–Leibler divergence and the Frobenius norm, are established. Simulation studies demonstrate that the proposed estimator outperforms the naive estimators. The method is applied to an analysis of a human gut microbiome dataset.
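The summary names two computational building blocks: the nuclear norm penalty, whose proximal operator soft-thresholds singular values, and Euclidean projection onto the probability simplex. A minimal NumPy sketch of both, alongside the naive row-normalization estimator, follows. It assumes a samples-by-taxa count matrix with one sample per row; the function names are hypothetical, and this is not the authors' implementation. In particular, it does not show how their generalized accelerated proximal gradient combines the two steps.

```python
import numpy as np


def naive_composition(counts):
    """Naive estimator: divide each sample's counts by its total.

    Taxa with zero reads in a sample get an estimated proportion of exactly 0.
    """
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto {x : x >= 0, sum(x) = 1}.

    Standard sort-and-threshold algorithm: find a per-row shift theta such
    that max(v - theta, 0) sums to one.
    """
    V = np.atleast_2d(np.asarray(V, dtype=float))
    n, p = V.shape
    U = -np.sort(-V, axis=1)                        # rows sorted in decreasing order
    css = np.cumsum(U, axis=1) - 1.0                # partial sums minus the target total
    ks = np.arange(1, p + 1)                        # candidate support sizes 1..p
    cond = U - css / ks > 0                         # True for every k up to the support size
    rho = p - 1 - np.argmax(cond[:, ::-1], axis=1)  # largest index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1.0)    # per-row shift
    return np.maximum(V - theta[:, None], 0.0)


def singular_value_threshold(X, tau):
    """Proximal operator of tau * ||X||_* (the nuclear norm).

    Soft-thresholds the singular values, which shrinks X toward low rank.
    """
    U, s, Vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

For example, `naive_composition([[5, 0, 3], [2, 1, 1]])` leaves the zero entry at exactly 0. `project_rows_to_simplex` maps any real-valued row to a valid composition, and `singular_value_threshold` with a large enough `tau` returns a lower-rank matrix.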
Page Range / eLocation ID: 75 to 92
Medium: X
Sponsoring Org: National Science Foundation
More like this
  1. Host-associated microbiomes play important roles in host health and pathogen defense. In amphibians, the skin-associated microbiota can contribute to innate immunity with potential implications for disease management. Few studies have examined season-long temporal variation in the amphibian skin-associated microbiome, and the interactions between bacteria and fungi on amphibian skin remain poorly understood. We characterize season-long temporal variation in the skin-associated microbiome of the western tiger salamander (Ambystoma mavortium) for both bacteria and fungi between sites and across salamander life stages. Two hundred seven skin-associated microbiome samples were collected from salamanders at two Rocky Mountain lakes throughout the summer and fall of 2018, and 127 additional microbiome samples were collected from lake water and lake substrate. We used 16S rRNA and ITS amplicon sequencing with Bayesian Dirichlet-multinomial regression to estimate the relative abundances of bacterial and fungal taxa, test for differential abundance, examine microbial selection, and derive alpha diversity. We predicted the ability of bacterial communities to inhibit the amphibian chytrid fungus Batrachochytrium dendrobatidis (Bd), a cutaneous fungal pathogen, using stochastic character mapping and a database of Bd-inhibitory bacterial isolates. For both bacteria and fungi, we observed variation in community composition through time, between sites, and with salamander age and life stage. We further found that temporal trends in community composition were specific to each combination of salamander age, life stage, and lake. We found salamander skin to be selective for microbes, with many taxa disproportionately represented relative to the environment. Salamander skin appeared to select for predicted Bd-inhibitory bacteria, and we found a negative relationship between the relative abundances of predicted Bd-inhibitory bacteria and Bd. We hope these findings will assist in the conservation of amphibian species threatened by chytridiomycosis and other emerging diseases.
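The Dirichlet-multinomial model named in this summary has a simple conjugate core. With multinomial read counts and a Dirichlet prior on relative abundances, the posterior is again Dirichlet. The sketch below shows only that single-sample core, using a symmetric prior with a hypothetical concentration `alpha`. It does not reproduce the regression with covariates used in the study. It does illustrate why taxa with zero reads still receive a positive estimated abundance.

```python
import numpy as np


def dirichlet_posterior_mean(counts, alpha=0.5):
    """Posterior mean of relative abundances under a symmetric Dirichlet(alpha) prior.

    With counts n_1..n_p and N = sum(n), the posterior is Dirichlet(n + alpha),
    whose mean is (n_j + alpha) / (N + p * alpha).
    """
    post = np.asarray(counts, dtype=float) + alpha
    return post / post.sum()


def dirichlet_posterior_draws(counts, alpha=0.5, size=1000, seed=0):
    """Draw relative-abundance vectors from the Dirichlet(n + alpha) posterior."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.asarray(counts, dtype=float) + alpha, size=size)
```

Each row of the posterior draws is a full composition. Summaries such as alpha diversity can therefore be computed per draw to carry posterior uncertainty forward.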
  2. Objectives

    Establishment and development of the infant gastrointestinal microbiome (GIM) varies cross‐culturally and is thought to be influenced by factors such as gestational age, birth mode, diet, and antibiotic exposure. However, there is little data as to how the composition of infants' households may play a role, particularly from a cross‐cultural perspective. Here, we examined relationships between infant fecal microbiome (IFM) diversity/composition and infants' household size, number of siblings, and number of other household members.

    Materials and methods

    We analyzed 377 fecal samples from healthy, breastfeeding infants across 11 sites in eight different countries (Ethiopia, The Gambia, Ghana, Kenya, Peru, Spain, Sweden, and the United States). Fecal microbial community structure was determined by amplifying, sequencing, and classifying (to the genus level) the V1–V3 region of the bacterial 16S rRNA gene. Surveys administered to infants' mothers identified household members and composition.


    Results

    Our results indicated that household composition (represented by the number of cohabitating siblings and other household members) did not have a measurable impact on the bacterial diversity, evenness, or richness of the IFM. However, we observed that variation in household composition categories did correspond to differential relative abundances of specific taxa, namely: Lactobacillus, Clostridium, Enterobacter, and Klebsiella.
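Diversity, evenness, and richness have several standard definitions. One common choice is observed richness, the Shannon index, and Pielou's evenness, sketched below for a single sample's counts. The study summary does not say which indices were used, so treat this choice as an assumption.

```python
import math


def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum p_j ln p_j over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S); undefined (NaN) when S < 2."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s >= 2 else float("nan")
```

A perfectly even two-taxon sample such as `[10, 10, 0, 0]` has richness 2, Shannon index ln 2, and evenness 1.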


    This study, to our knowledge, is the largest cross‐cultural study to date examining the association between household composition and the IFM. Our results indicate that the social environment of infants (represented here by the proxy of household composition) may influence the bacterial composition of the infant GIM, although the mechanism is unknown. A higher number and diversity of cohabitants and potential caregivers may facilitate social transmission of beneficial bacteria to the infant gastrointestinal tract, by way of shared environment or through direct physical and social contact between the maternal–infant dyad and other household members. These findings contribute to the discussion concerning ways by which infants are influenced by their social environments and add further dimensionality to the ongoing exploration of social transmission of gut microbiota and the “old friends” hypothesis.

  3. Summary: In microbiome and genomic studies, regression on compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for variation in sequencing depth, the classic log-contrast model is often used, in which read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain critical issues. We introduce a surprisingly simple, interpretable and efficient method for estimating compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method provides corrections on sequencing data with possible overdispersion and simultaneously avoids any subjective imputation of zero read counts. We provide theoretical justifications with matching upper and lower bounds for the estimation error. The merit of the procedure is illustrated through real data analysis and simulation studies.
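The log-contrast model regresses a phenotype on log-compositions with coefficients that sum to zero. That zero-sum constraint is what makes the model insensitive to sequencing depth. A minimal sketch of the linear predictor, with hypothetical names, shows two properties. Rescaling a sample's counts leaves the predictor unchanged, and any zero count makes it undefined, which is the zero-count issue the summary raises.

```python
import math


def log_contrast_predictor(x, beta, tol=1e-12):
    """Compute sum_j beta_j * log(x_j) under the constraint sum_j beta_j = 0.

    Because the coefficients sum to zero, multiplying x by any positive constant
    (e.g. dividing raw counts by the library size) does not change the result.
    """
    if abs(sum(beta)) > tol:
        raise ValueError("log-contrast coefficients must sum to zero")
    if any(v <= 0 for v in x):
        raise ValueError("log-contrast is undefined when a count or proportion is zero")
    return sum(b * math.log(v) for b, v in zip(beta, x))
```

With `beta = [1.0, -0.5, -0.5]`, raw counts `[10, 20, 70]` and proportions `[0.1, 0.2, 0.7]` give the same value, while `[10, 0, 70]` raises an error.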
  4. Coelho, Luis Pedro (Ed.)
    It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2’s linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles. 
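False-discovery control across many features is typically done by converting per-feature p-values into q-values. The Benjamini-Hochberg step-up procedure is a standard way to do this. The pure-Python sketch below shows that procedure in isolation. It is not MaAsLin 2's code, so consult the package documentation for the correction it actually applies.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For p-values `[0.01, 0.04, 0.03, 0.20]` this yields q-values of about `[0.04, 0.053, 0.053, 0.20]`. A feature is then reported as associated if its q-value falls below a chosen threshold.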
  5. Hird, Sarah M. (Ed.)
    The gut microbiome provides vital functions for mammalian hosts, yet research on its variability and function across adult life spans and multiple generations is limited in large mammalian carnivores. Here, we used 16S rRNA gene and metagenomic high-throughput sequencing to profile the bacterial taxonomic composition, genomic diversity, and metabolic function of fecal samples collected from 12 wild spotted hyenas (Crocuta crocuta) residing in the Masai Mara National Reserve, Kenya, over a 23-year period spanning three generations. The metagenomic data came from four of these hyenas and spanned two 2-year periods. With these data, we determined the extent to which host factors predicted variation in the gut microbiome and identified the core microbes present in the guts of hyenas. We also investigated novel genomic diversity in the mammalian gut by reporting the first metagenome-assembled genomes (MAGs) for hyenas. We found that gut microbiome taxonomic composition varied temporally, but despite this, a core set of 14 bacterial genera were identified. The strongest predictors of the microbiome were host identity and age, suggesting that hyenas possess individualized microbiomes and that these may change with age during adulthood. The gut microbiome functional profiles of the four adult hyenas were also individual specific and were associated with prey abundance, indicating that the functions of the gut microbiome vary with host diet. We recovered 149 high-quality MAGs from the hyenas’ guts; some MAGs were classified as taxa previously reported for other carnivores, but many were novel and lacked species-level matches to genomes in existing reference databases.

    IMPORTANCE

    There is a gap in knowledge regarding the genomic diversity and variation of the gut microbiome across a host’s life span and across multiple generations of hosts in wild mammals. Using two types of sequencing approaches, we found that although gut microbiomes were individualized and temporally variable among hyenas, they correlated similarly to large-scale changes in the ecological conditions experienced by their hosts. We also recovered 149 high-quality MAGs from the hyena gut, greatly expanding the microbial genome repertoire known for hyenas, carnivores, and wild mammals in general. Some MAGs came from genera abundant in the gastrointestinal tracts of canid species and other carnivores, but over 80% of MAGs were novel and from species not previously represented in genome databases. Collectively, our novel body of work illustrates the importance of surveying the gut microbiome of nonmodel wild hosts, using multiple sequencing methods and computational approaches and at distinct scales of analysis.
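Definitions of a "core" microbiome vary, and the summary does not say how the 14 core genera were selected. One common approach is a prevalence filter, which keeps taxa detected in at least a given fraction of samples. The sketch below assumes that definition. The function name and both thresholds (`min_prevalence`, `detection`) are hypothetical.

```python
def core_taxa(table, min_prevalence=0.9, detection=0.0):
    """Return taxa whose abundance exceeds `detection` in at least
    `min_prevalence` of samples.

    `table` maps each taxon name to a list of per-sample abundances
    (all lists have the same length, one entry per sample).
    """
    core = []
    for taxon, abundances in table.items():
        prevalence = sum(a > detection for a in abundances) / len(abundances)
        if prevalence >= min_prevalence:
            core.append(taxon)
    return sorted(core)
```

Raising `min_prevalence` or `detection` shrinks the core set, so the size of any reported core depends on these choices.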