This content will become publicly available on December 1, 2025

Title: Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations
Abstract: The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm which separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. The generality of the algorithm allows addressing subset selection problems in areas beyond biology.
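The subset selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mixed integer linear program is replaced here by an exhaustive search over predictor subsets, which is only feasible for very small matrices, and the least-squares error used as the objective is an assumption rather than the published formulation. The function names (`solve_linear`, `fit_error`, `select_predictors`) and the toy matrix `Y` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_error(Y, predictors, response):
    """Squared error when one response column is fit as a linear
    combination of the predictor columns (assumed objective)."""
    # Normal equations: (X^T X) w = X^T y
    XtX = [[sum(row[p] * row[q] for row in Y) for q in predictors]
           for p in predictors]
    Xty = [sum(row[p] * row[response] for row in Y) for p in predictors]
    w = solve_linear(XtX, Xty)
    return sum(
        (row[response] - sum(wi * row[p] for wi, p in zip(w, predictors))) ** 2
        for row in Y
    )


def select_predictors(Y, k):
    """Try every set of k predictor columns and return (subset, error)
    for the set that best reconstructs all remaining columns."""
    n_cond = len(Y[0])
    best = None
    for subset in combinations(range(n_cond), k):
        responses = [j for j in range(n_cond) if j not in subset]
        try:
            err = sum(fit_error(Y, subset, j) for j in responses)
        except ValueError:
            continue  # skip collinear predictor sets
        if best is None or err < best[1]:
            best = (subset, err)
    return best


# Invented toy data: rows are strains, columns are conditions.
# Columns 2 and 3 are exact linear combinations of columns 0 and 1.
Y = [
    [1, 0, 1, 2],
    [0, 1, 1, 1],
    [2, 1, 3, 5],
    [1, 3, 4, 5],
]

subset, err = select_predictors(Y, 2)
print("predictor conditions:", subset, "total squared error:", err)
```

Because the toy matrix has rank 2, several pairs of conditions reconstruct the rest equally well, so the selected pair is not unique here. Real phenotype matrices are noisy, and the number of candidate subsets grows combinatorially with the number of conditions, which is why an optimization formulation such as the one described in the abstract is needed beyond toy sizes.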
Award ID(s):
2317079 2200052 1914792 2246707 2019589
PAR ID:
10523256
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Nature
Date Published:
Journal Name:
Communications Biology
Volume:
7
Issue:
1
ISSN:
2399-3642
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Behavior is shaped by genes, environment, and evolutionary history in different ways. Nest architecture is an extended phenotype that results from the interaction between the behavior of animals and their environment. Nests built by ants are extended phenotypes that differ in structure among species and among colonies within a species, but the source of these differences remains an open question. To investigate the impact of colony identity (genetics), evolutionary history (species), and the environment on nest architecture, we compared how two species of harvester ants, Pogonomyrmex californicus and Veromessor andrei, construct their nests under different environmental conditions. For each species, we allowed workers from four colonies to excavate nests in environments that differed in temperature and humidity for seven days. We then created casts of each nest to compare nest structures among colonies, between species, and across environmental conditions. We found differences in nest structure among colonies of the same species and between species. Interestingly, however, environmental conditions did not have a strong influence on nest structure in either species. Our results suggest that extended phenotypes are shaped more strongly by internal factors, such as genes and evolutionary history, and are less plastic in response to the abiotic environment, like many physical and physiological phenotypes. 
  2. Abstract Premise: Endophytic plant‐microbe interactions range from mutualistic relationships that confer important ecological and agricultural traits to neutral or quasi‐parasitic relationships. In contrast to root‐associated endophytes, the role of environmental and host‐related factors in the acquisition of leaf endophyte communities at broad spatial and phylogenetic scales remains sparsely studied. We assessed endofoliar diversity to test the hypothesis that membership in these microbial communities is driven primarily by abiotic environment and host phylogeny. Methods: We used a broad geographic coverage of North America in the genus Heuchera L. (Saxifragaceae), representing 32 species and varieties across 161 populations. Bacterial and fungal communities were characterized using 16S and ITS amplicon sequencing, respectively, and standard diversity metrics were calculated. We assembled environmental predictors for microbial diversity at collection sites, including latitude, elevation, temperature, precipitation, and soil parameters. Results: Assembly patterns differed between bacterial and fungal endophytes. Host phylogeny was significantly associated with bacteria, while geographic distance was the best predictor of fungal community composition. Species richness and phylogenetic diversity were consistent across sites and species, with only fungi showing a response to aridity and precipitation for some metrics. Unlike what has been observed with root‐associated microbial communities, in this system microbes show no relationship with pH or other soil factors. Conclusions: Overall, this work improves our understanding of the large‐scale patterns of diversity and community composition in leaf endophytes and highlights the relative significance of environmental and host‐related factors in driving different microbial communities within the leaf microbiome.
  3. ABSTRACT In this paper, we propose Varying Effects Regression with Graph Estimation (VERGE), a novel Bayesian method for feature selection in regression. Our model has key aspects that allow it to leverage the complex structure of data sets arising from genomics or imaging studies. We distinguish between the predictors, which are the features utilized in the outcome prediction model, and the subject-level covariates, which modulate the effects of the predictors on the outcome. We construct a varying coefficients modeling framework where we infer a network among the predictor variables and utilize this network information to encourage the selection of related predictors. We employ variable selection spike-and-slab priors that enable the selection of both network-linked predictor variables and covariates that modify the predictor effects. We demonstrate through simulation studies that our method outperforms existing alternative methods in terms of both feature selection and predictive accuracy. We illustrate VERGE with an application to characterizing the influence of gut microbiome features on obesity, where we identify a set of microbial taxa and their ecological dependence relations. We allow subject-level covariates, including sex and dietary intake variables to modify the coefficients of the microbiome predictors, providing additional insight into the interplay between these factors. 
  4. Abstract A critical task in microbiome data analysis is to explore the association between a scalar response of interest and a large number of microbial taxa that are summarized as compositional data at different taxonomic levels. Motivated by fine‐mapping of the microbiome, we propose a two‐step compositional knockoff filter to provide the effective finite‐sample false discovery rate (FDR) control in high‐dimensional linear log‐contrast regression analysis of microbiome compositional data. In the first step, we propose a new compositional screening procedure to remove insignificant microbial taxa while retaining the essential sum‐to‐zero constraint. In the second step, we extend the knockoff filter to identify the significant microbial taxa in the sparse regression model for compositional data. Thereby, a subset of the microbes is selected from the high‐dimensional microbial taxa as related to the response under a prespecified FDR threshold. We study the theoretical properties of the proposed two‐step procedure, including both sure screening and effective false discovery control. We demonstrate these properties in numerical simulation studies to compare our methods to some existing ones and show power gain of the new method while controlling the nominal FDR. The potential usefulness of the proposed method is also illustrated with application to an inflammatory bowel disease data set to identify microbial taxa that influence host gene expressions. 
  5. Klassen, Jonathan L. (Ed.)
    ABSTRACT Omnivorous animals, including humans, harbor diverse, species-rich gut communities that impact their growth, development, and homeostasis. Model invertebrates are broadly accessible experimental platforms that enable linking specific species or species groups to host phenotypes, yet often their specialized diets and distinct gut microbiota make them less comparable to human and other mammalian gut communities. The omnivorous cockroach Periplaneta americana harbors ∼4 × 10² bacterial genera within its digestive tract and is enriched with taxa commonly found in omnivorous mammals (i.e., Proteobacteria, Bacteroidetes, and Firmicutes). These features make P. americana a valuable platform for identifying microbe-mediated host phenotypes with potential translations to mammals. Rearing P. americana insects under germfree conditions resulted in prolonging development time by ∼30% and an up to ∼8% reduction in body size along three dimensions. Germfree rearing resulted in downregulation of gene networks involved in growth, energy homeostasis, and nutrient availability. Reintroduction of a defined microbiota comprised of a subset of P. americana commensals to germfree insects did not recover normal growth and developmental phenotypes or transcriptional profiles observed in conventionally reared insects. These results are in contrast with specialist-feeding model insects (e.g., Drosophila), where introduction of a single endemic bacterial species to germfree condition-reared specimens recovered normal host phenotypes. These data suggest that understanding microbe-mediated host outcomes in animals with species-rich communities should include models that typically maintain similarly diverse microbiomes. The dramatic transcriptional, developmental, and morphological phenotypes linked to gut microbiome status in this study illustrate how microbes are key players in animal growth and evolution.
IMPORTANCE Broadly accessible model organisms are essential for illustrating how microbes are engaged in the growth, development, and evolution of animals. We report that germfree rearing of omnivorous Periplaneta americana cockroaches resulted in growth defects and severely disrupted gene networks that regulate development, which highlights the importance of gut microbiota in these host processes. Absence of gut microbiota elicited a starvation-like transcriptional response in which growth and development were inhibited while nutrient scavenging was enhanced. Additionally, reintroduction of a subset of cockroach gut bacterial commensals did not broadly recover normal expression patterns, illustrating that a particular microbiome composition may be necessary for normal host development. Invertebrate microbiota model systems that enable disentangling complex, species-rich communities are essential for linking microbial taxa to specific host phenotypes. 