Title: Multivariable association discovery in population-scale meta-omics studies
It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2’s linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
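The abstract mentions controlling false discovery across many per-feature tests without naming the procedure. As a minimal sketch, assuming the Benjamini-Hochberg adjustment (an assumption, not something the abstract states), the step from per-feature p-values to q-values might look like the following. The function name, the p-values, and the 0.1 threshold are all hypothetical and are not taken from MaAsLin 2 itself.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative helper only; not part of any MaAsLin 2 API.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per microbial feature tested against a metadata variable.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)

# Hypothetical significance threshold on the adjusted values.
significant = [i for i, qi in enumerate(q) if qi <= 0.1]
```

Adjusting all feature-level tests jointly, rather than thresholding raw p-values, is what keeps the expected fraction of false associations bounded when thousands of features are screened.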
Award ID(s):
2109688 2028280
PAR ID:
10314019
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
Editor(s):
Coelho, Luis Pedro
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
17
Issue:
11
ISSN:
1553-7358
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Latent Interacting Variable Effects (LIVE) modeling is a framework to integrate different types of microbiome multi-omics data by combining latent variables from single-omic models into a structured meta-model to determine discriminative, interacting multi-omics features driving disease status. We implemented and tested LIVE modeling in publicly available metagenomics and metabolomics datasets from Crohn’s Disease (CD) and Ulcerative Colitis (UC) patients. Here, LIVE modeling reduced the number of feature correlations from the original data set for CD and UC to tractable numbers and facilitated prioritization of biological associations between microbes, metabolites, enzymes and IBD status through the application of stringent thresholds on generated inferential statistics. We determined that LIVE modeling confirmed previously reported IBD biomarkers and uncovered potentially novel disease mechanisms in IBD. LIVE modeling makes a distinct and complementary contribution to the current methods to integrate microbiome data to predict IBD status because of its flexibility to adapt to different types of microbiome multi-omics data, scalability for large and small cohort studies via reliance on latent variables and dimensionality reduction, and the intuitive interpretability of the linear meta-model integrating -omic data types. The results of LIVE modeling and the biological relationships can be represented in networks that connect local correlation structure of single omic data types with global community and omic structure in the latent variable VIP scores. This model offers a novel tool that allows researchers to be more selective about omic feature interaction without disrupting the structural correlation framework provided by sPLS-DA interaction effects modeling. It should help form testable hypotheses by identifying potential and unique interactions between the metabolome and microbiome that must be considered in future studies. 
  2. Despite advances in sequencing, lack of standardization makes comparisons across studies challenging and hampers insights into the structure and function of microbial communities across multiple habitats on a planetary scale. Here we present a multi-omics analysis of a diverse set of 880 microbial community samples collected for the Earth Microbiome Project. We include amplicon (16S, 18S, ITS) and shotgun metagenomic sequence data, and untargeted metabolomics data (liquid chromatography-tandem mass spectrometry and gas chromatography mass spectrometry). We used standardized protocols and analytical methods to characterize microbial communities, focusing on relationships and co-occurrences of microbially related metabolites and microbial taxa across environments, thus allowing us to explore diversity at extraordinary scale. In addition to a reference database for metagenomic and metabolomic data, we provide a framework for incorporating additional studies, enabling the expansion of existing knowledge in the form of an evolving community resource. We demonstrate the utility of this database by testing the hypothesis that every microbe and metabolite is everywhere but the environment selects. Our results show that metabolite diversity exhibits turnover and nestedness related to both microbial communities and the environment, whereas the relative abundances of microbially related metabolites vary and co-occur with specific microbial consortia in a habitat-specific manner. We additionally show the power of certain chemistry, in particular terpenoids, in distinguishing Earth’s environments (for example, terrestrial plant surfaces and soils, freshwater and marine animal stool), as well as that of certain microbes including Conexibacter woesei (terrestrial soils), Haloquadratum walsbyi (marine deposits) and Pantoea dispersa (terrestrial plant detritus). 
This Resource provides insight into the taxa and metabolites within microbial communities from diverse habitats across Earth, informing both microbial and chemical ecology, and provides a foundation and methods for multi-omics microbiome studies of hosts and the environment. 
  3. Beiko, Robert G (Ed.)
    ABSTRACT Inflammatory bowel disease (IBD) is characterized by complex etiology and a disrupted colonic ecosystem. We provide a framework for the analysis of multi-omic data, which we apply to study the gut ecosystem in IBD. Specifically, we train and validate models using data on the metagenome, metatranscriptome, virome, and metabolome from the Human Microbiome Project 2 IBD multi-omic database, with 1,785 repeated samples from 130 individuals (103 cases and 27 controls). After splitting the participants into training and testing groups, we used mixed-effects least absolute shrinkage and selection operator regression to select features for each omic. These features, with demographic covariates, were used to generate separate single-omic prediction scores. All four single-omic scores were then combined into a final regression to assess the relative importance of the individual omics and the predictive benefits when considered together. We identified several species, pathways, and metabolites known to be associated with IBD risk, and we explored the connections between data sets. Individually, metabolomic and viromic scores were more predictive than metagenomics or metatranscriptomics, and when all four scores were combined, we predicted disease diagnosis with a Nagelkerke’s R² of 0.46 and an area under the curve of 0.80 (95% confidence interval: 0.63, 0.98). Our work supports that some single-omic models for complex traits are more predictive than others, that incorporating multiple omic data sets may improve prediction, and that each omic data type provides a combination of unique and redundant information. This modeling framework can be extended to other complex traits and multi-omic data sets.
    IMPORTANCE Complex traits are characterized by many biological and environmental factors, such that multi-omic data sets are well-positioned to help us understand their underlying etiologies. We applied a prediction framework across multiple omics (metagenomics, metatranscriptomics, metabolomics, and viromics) from the gut ecosystem to predict inflammatory bowel disease (IBD) diagnosis. The predicted scores from our models highlighted key features and allowed us to compare the relative utility of each omic data set in single-omic versus multi-omic models. Our results emphasized the importance of metabolomics and viromics over metagenomics and metatranscriptomics for predicting IBD status. The greater predictive capability of metabolomics and viromics is likely because these omics serve as markers of lifestyle factors such as diet. This study provides a modeling framework for multi-omic data, and our results show the utility of combining multiple omic data types to disentangle complex disease etiologies and biological signatures. 
  4. The integration of multiple ‘omics’ datasets is a promising avenue for answering many important and challenging questions in biology, particularly those relating to complex ecological systems. Whereas multi-omics was developed using data from model organisms with significant prior knowledge and resources, its application to non-model organisms, such as coral holobionts, is less clear-cut. We explore, in the emerging rice coral model Montipora capitata, the intersection of holobiont transcriptomic, proteomic, metabolomic, and microbiome amplicon data and investigate how well they correlate under high temperature treatment. Using a typical thermal stress regime, we show that transcriptomic and proteomic data broadly capture the stress response of the coral, whereas the metabolome and microbiome datasets show patterns that likely reflect stochastic and homeostatic processes associated with each sample. These results provide a framework for interpreting multi-omics data generated from non-model systems, particularly those with complex biotic interactions among microbial partners. 
  5. Auchtung, Jennifer M (Ed.)
    ABSTRACT Studies have suggested that phytochemicals in green tea have systemic anti-inflammatory and neuroprotective effects. However, the mechanisms behind these effects are poorly understood, possibly due to the differential metabolism of phytochemicals resulting from variations in gut microbiome composition. To unravel this complex relationship, our team utilized a novel combined microbiome analysis and metabolomics approach applied to low complexity microbiome (LCM) and human colonized (HU) gnotobiotic mice treated with an acute dose of powdered matcha green tea. A total of 20 LCM mice received 10 distinct human fecal slurries for an n = 2 mice per human gut microbiome; 9 LCM mice remained un-colonized with human slurries throughout the experiment. We performed untargeted metabolomics on green tea and plasma to identify green tea compounds that were found in the plasma of LCM and HU mice that had consumed green tea. 16S ribosomal RNA gene sequencing was performed on feces of all mice at study end to assess microbiome composition. We found multiple green tea compounds in plasma associated with microbiome presence and diversity (including acetylagmatine, lactiflorin, and aspartic acid negatively associated with diversity). Additionally, we detected strong associations between bioactive green tea compounds in plasma and specific gut bacteria, including associations between spiramycin and Gemmiger and between wildforlide and Anaerorhabdus. Notably, some of the physiologically relevant green tea compounds are likely derived from plant-associated microbes, highlighting the importance of considering foods and food products as meta-organisms. Overall, we describe a novel workflow for discovering relationships between individual food compounds and the composition of the gut microbiome.
    IMPORTANCE Foods contain thousands of unique and biologically important compounds beyond the macro- and micro-nutrients listed on nutrition facts labels. In mammals, many of these compounds are metabolized or co-metabolized by the community of microbes in the colon. These microbes may impact the thousands of biologically important compounds we consume; therefore, understanding microbial metabolism of food compounds will be important for understanding how foods impact health. We used metabolomics to track green tea compounds in plasma of mice with and without complex microbiomes. From this, we can start to recognize certain groups of green tea-derived compounds that are impacted by mammalian microbiomes. This research presents a novel technique for understanding microbial metabolism of food-derived compounds in the gut, which can be applied to other foods. 