Microbial communities are often composed of taxa from different taxonomic groups. The associations among the constituent members of a microbial community play an important role in determining the functional characteristics of the community, and these associations can be modeled using an edge-weighted graph (microbial network). A microbial network is typically inferred from a sample-taxa matrix that is obtained by sequencing multiple biological samples and identifying the taxa abundance in each sample. Motivated by microbiome studies that involve a large number of samples collected across a range of study parameters, here we consider the computational problem of identifying the number of microbial networks underlying the observed sample-taxa abundance matrix. Specifically, we consider the problem of determining the number of sparse microbial networks in this setting. We use a mixture model framework to address this problem, and present formulations to model both count data and proportion data. We propose several variational approximation based algorithms that allow the incorporation of the sparsity constraint while estimating the number of components in the mixture model. We evaluate these algorithms on a large number of simulated datasets generated using a collection of different graph structures (band, hub, cluster, random, and scale-free).
Variational Approximation based Model Selection for Microbial Network Inference
Microbial associations are characterized by both direct and indirect interactions between the constituent taxa in a microbial community, and play an important role in determining the structure, organization, and function of the community. Microbial associations can be represented using a weighted graph (microbial network) whose nodes represent taxa and edges represent pairwise associations. A microbial network is typically inferred from a sample-taxa matrix that is obtained by sequencing multiple biological samples and identifying the taxa counts in each sample. However, it is known that microbial associations are impacted by environmental and/or host factors. Thus, a sample-taxa matrix generated in a microbiome study involving a wide range of values for the environmental and/or clinical metadata variables may in fact be associated with more than one microbial network. Here we consider the problem of inferring multiple microbial networks from a given sample-taxa count matrix. Each sample is a count vector assumed to be generated by a mixture model consisting of component distributions that are Multivariate Poisson Log-Normal. We present a variational Expectation Maximization algorithm for the model selection problem to infer the correct number of components of this mixture model. Our approach involves reframing the mixture model as a latent variable model, treating only the mixing coefficients as parameters, and subsequently approximating the marginal likelihood using an evidence lower bound framework. Our algorithm is evaluated on a large simulated dataset generated using a collection of different graph structures (band, hub, cluster, random, and scale-free).
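The generative assumption described above (each sample drawn from a mixture whose components are Multivariate Poisson Log-Normal) can be illustrated with a minimal simulation sketch. This is not the authors' code: the function names `band_precision` and `sample_mpln_mixture`, the tridiagonal "band" precision matrix, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def band_precision(p, off_diag=0.3):
    """Tridiagonal (band-graph) precision matrix.

    With unit diagonal and |off_diag| < 0.5 it is strictly diagonally
    dominant, hence positive definite. Illustrative choice only.
    """
    omega = np.eye(p)
    idx = np.arange(p - 1)
    omega[idx, idx + 1] = off_diag
    omega[idx + 1, idx] = off_diag
    return omega

def sample_mpln_mixture(n, weights, means, precisions, rng):
    """Draw n count vectors from a mixture of multivariate Poisson log-normals.

    For each sample: pick a component k from the mixing weights, draw a
    latent Gaussian vector with mean means[k] and covariance
    inv(precisions[k]), then draw Poisson counts with rate exp(latent).
    """
    weights = np.asarray(weights, dtype=float)
    labels = rng.choice(len(weights), size=n, p=weights)
    covariances = [np.linalg.inv(omega) for omega in precisions]
    p = len(means[0])
    counts = np.empty((n, p), dtype=np.int64)
    for i, k in enumerate(labels):
        latent = rng.multivariate_normal(means[k], covariances[k])
        counts[i] = rng.poisson(np.exp(latent))
    return counts, labels

# Hypothetical example: two components over five taxa.
rng = np.random.default_rng(0)
p = 5
means = [np.full(p, 1.0), np.full(p, 3.0)]
precisions = [band_precision(p, 0.3), band_precision(p, -0.3)]
counts, labels = sample_mpln_mixture(200, [0.6, 0.4], means, precisions, rng)
```

In this sketch the nonzero off-diagonal entries of each precision matrix play the role of the network edges, so a sparse precision matrix corresponds to a sparse microbial network. Model selection then amounts to recovering the number of components (two here) from `counts` alone.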
- Award ID(s): 2051283
- PAR ID: 10323782
- Editor(s): Singh, Mona
- Date Published:
- Journal Name: Journal of computational biology
- ISSN: 1066-5277
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
ABSTRACT Host-associated microbial communities are shaped by extrinsic and intrinsic factors to the holobiont organism. Environmental factors and microbe-microbe interactions act simultaneously on the microbial community structure, making the microbiome dynamics challenging to predict. The coral microbiome is essential to the health of coral reefs and sensitive to environmental changes. Here, we develop a dynamic model to determine the microbial community structure associated with the surface mucus layer (SML) of corals using temperature as an extrinsic factor and microbial network as an intrinsic factor. The model was validated by comparing the predicted relative abundances of microbial taxa to the relative abundances of microbial taxa from the sample data. The SML microbiome from Pseudodiploria strigosa was collected across reef zones in Bermuda, where inner and outer reefs are exposed to distinct thermal profiles. A shotgun metagenomics approach was used to describe the taxonomic composition and the microbial network of the coral SML microbiome. By simulating the annual temperature fluctuations at each reef zone, the model output is statistically identical to the observed data. The model was further applied to six scenarios that combined different profiles of temperature and microbial network to investigate the influence of each of these two factors on the model accuracy. The SML microbiome was best predicted by model scenarios with the temperature profile that was closest to the local thermal environment, regardless of the microbial network profile. Our model shows that the SML microbiome of P. strigosa in Bermuda is primarily structured by seasonal fluctuations in temperature at a reef scale, while the microbial network is a secondary driver. IMPORTANCE Coral microbiome dysbiosis (i.e., shifts in the microbial community structure or complete loss of microbial symbionts) caused by environmental changes is a key player in the decline of coral health worldwide. 
Multiple factors in the water column and the surrounding biological community influence the dynamics of the coral microbiome. However, by including only temperature as an external factor, our model proved to be successful in describing the microbial community associated with the surface mucus layer (SML) of the coral P. strigosa. The dynamic model developed and validated in this study is a potential tool to predict the coral microbiome under different temperature conditions.
-
Abstract Linking sequence-derived microbial taxa abundances to host (patho-)physiology or habitat characteristics in a reproducible and interpretable manner has remained a formidable challenge for the analysis of microbiome survey data. Here, we introduce a flexible probabilistic modeling framework, VI-MIDAS (variational inference for microbiome survey data analysis), that enables joint estimation of context-dependent drivers and broad patterns of associations of microbial taxon abundances from microbiome survey data. VI-MIDAS comprises mechanisms for direct coupling of taxon abundances with covariates and taxa-specific latent coupling, which can incorporate spatio-temporal information and taxon–taxon interactions. We leverage mean-field variational inference for posterior VI-MIDAS model parameter estimation and illustrate model building and analysis using Tara Ocean Expedition survey data. Using VI-MIDAS’ latent embedding model and tools from network analysis, we show that marine microbial communities can be broadly categorized into five modules, including SAR11-, Nitrosopumilus-, and Alteromonadales-dominated communities, each associated with specific environmental and spatiotemporal signatures. VI-MIDAS also finds evidence for largely positive taxon–taxon associations in SAR11 or Rhodospirillales clades, and negative associations with Alteromonadales and Flavobacteriales classes. Our results indicate that VI-MIDAS provides a powerful integrative statistical analysis framework for discovering broad patterns of associations between microbial taxa and context-specific covariate data from microbiome survey data.
-
We propose a structure-preserving model-reduction methodology for large-scale dynamic networks with tightly-connected components. First, the coherent groups are identified by a spectral clustering algorithm on the graph Laplacian matrix that models the network feedback. Then, a reduced network is built, where each node represents the aggregate dynamics of each coherent group, and the reduced network captures the dynamic coupling between the groups. We provide an upper bound on the approximation error when the network graph is randomly generated from a weighted stochastic block model. Finally, numerical experiments align with and validate our theoretical findings.
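The two steps named in this abstract (spectral clustering on the graph Laplacian, then building a reduced network over the groups) can be sketched in a heavily simplified form. The sketch below is an assumption-laden illustration, not the paper's method: it splits the graph into only two groups using the sign of the Fiedler vector, and it aggregates static edge weights rather than node dynamics. The names `fiedler_partition` and `aggregate_weights` are hypothetical.

```python
import numpy as np

def fiedler_partition(adj):
    """Split a connected weighted graph into two groups.

    Uses the sign of the eigenvector belonging to the second-smallest
    eigenvalue of the combinatorial Laplacian L = D - A (Fiedler vector).
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return (eigvecs[:, 1] > 0).astype(int)

def aggregate_weights(adj, labels):
    """Group-level weight matrix M^T A M, where M is the one-hot membership matrix.

    Off-diagonal entries sum the edge weights between two groups;
    diagonal entries sum within-group weights (each edge counted twice).
    """
    adj = np.asarray(adj, dtype=float)
    membership = np.eye(labels.max() + 1)[labels]
    return membership.T @ adj @ membership

# Hypothetical example: two triangles joined by one weak edge.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
adj[2, 3] = adj[3, 2] = 0.01
labels = fiedler_partition(adj)
reduced = aggregate_weights(adj, labels)
```

Here each triangle ends up in its own group, and `reduced` is a 2×2 matrix whose off-diagonal entry is the weak coupling between the groups.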
-
We study a matrix completion problem that leverages a hierarchical structure of social similarity graphs as side information in the context of recommender systems. We assume that users are categorized into clusters, each of which comprises sub-clusters (or what we call “groups”). We consider a low-rank matrix model for the rating matrix, and a hierarchical stochastic block model that well respects practically-relevant social graphs. Under this setting, we characterize the information-theoretic limit on the number of observed matrix entries (i.e., optimal sample complexity) as a function of the quality of graph side information (to be detailed) by proving sharp upper and lower bounds on the sample complexity. Furthermore, we develop a matrix completion algorithm and empirically demonstrate via extensive experiments that the proposed algorithm achieves the optimal sample complexity.