The built environment provides an excellent setting for interdisciplinary research on the dynamics of microbial communities. The system is simplified compared to many natural settings, and to some extent the entire environment can be manipulated, from architectural design to materials use, air flow, human traffic, and the capacity to disrupt microbial communities through cleaning. Here, we provide an overview of the ecology of the microbiome in the built environment. We address niche space and refugia, population and community (metagenomic) dynamics, spatial ecology within a building (including the major microbial transmission mechanisms), and evolution. We also address landscape ecology, connecting microbiomes between physically separated buildings. At each stage, we pay particular attention to the actual and potential interface between disciplines, such as ecology, epidemiology, materials science, and human social behavior. We end by identifying some opportunities for future interdisciplinary research on the microbiome of the built environment.
-
Abstract Functional metagenomics enables the study of unexplored bacterial diversity, gene families, and pathways essential to microbial communities. However, discovering biological insights with these data is impeded by the scarcity of quality annotations. Here, we use a co-occurrence-based analysis of predicted microbial protein functions to uncover pathways in genomic and metagenomic biological systems. Our approach, based on phylogenetic profiles, improves the identification of functional relationships, or participation in the same biochemical pathway, between enzymes over a comparable homology-based approach. We optimized the design of our profiles to identify potential pathways using minimal data, clustered functionally related enzyme pairs into multi-enzymatic pathways, and evaluated our predictions against reference pathways in the KEGG database. We then demonstrated a novel extension of this approach to predict inter-bacterial protein interactions amongst members of a marine microbiome. Most significantly, we show our method predicts emergent biochemical pathways between known and unknown functions. Thus, our work establishes a basis for identifying the potential functional capacities of the entire metagenome, capturing previously unknown and abstract functions into discrete putative pathways.
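As a rough illustration of the co-occurrence idea described in this abstract, the sketch below scores pairs of functions by how similar their presence/absence (phylogenetic) profiles are across genomes. It is not the authors' implementation: the toy data, the use of Jaccard similarity, the 0.7 threshold, and the names `profiles`, `jaccard`, and `cooccurring_pairs` are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy data: function -> set of genomes in which it is present.
profiles = {
    "enzyme_A": {"g1", "g2", "g3", "g5"},
    "enzyme_B": {"g1", "g2", "g3"},
    "enzyme_C": {"g4", "g6"},
}


def jaccard(a, b):
    """Jaccard similarity of two presence sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurring_pairs(profiles, threshold=0.7):
    """Return (function, function, score) for pairs at or above the threshold."""
    pairs = []
    for f1, f2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[f1], profiles[f2])
        if score >= threshold:
            pairs.append((f1, f2, score))
    return pairs


print(cooccurring_pairs(profiles))
```

With this toy input, only `enzyme_A` and `enzyme_B` share enough genomes (3 of 4) to be reported as a candidate functional pair. Grouping such pairs into multi-enzyme pathways would require a further clustering step that is not shown here.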
-
Over the years, many computational methods have been created for the analysis of the impact of single amino acid substitutions resulting from single-nucleotide variants in genome coding regions. Historically, all methods have been supervised and thus limited by the inadequate sizes of experimentally curated data sets and by the lack of a standardized definition of variant effect. The emergence of unsupervised, deep learning (DL)-based methods raised an important question: Can machines learn the language of life from the unannotated protein sequence data well enough to identify significant errors in the protein “sentences”? Our analysis suggests that some unsupervised methods perform as well as or better than existing supervised methods. Unsupervised methods are also faster and can, thus, be useful in large-scale variant evaluations. For all other methods, however, their performance varies by both evaluation metrics and by the type of variant effect being predicted. We also note that the evaluation of method performance is still lacking on less-studied, nonhuman proteins where unsupervised methods hold the most promise.
Free, publicly-accessible full text available July 1, 2025
-
Abstract Motivation Protein language models based on the transformer architecture are increasingly improving performance on protein prediction tasks, including secondary structure, subcellular localization, and more. Despite being trained only on protein sequences, protein language models appear to implicitly learn protein structure. This paper investigates whether sequence representations learned by protein language models encode structural information and to what extent.
Results We address this by evaluating protein language models on remote homology prediction, where identifying remote homologs from sequence information alone requires structural knowledge, especially in the “twilight zone” of very low sequence identity. Through rigorous testing at progressively lower sequence identities, we profile the performance of protein language models ranging from millions to billions of parameters in a zero-shot setting. Our findings indicate that while transformer-based protein language models outperform traditional sequence alignment methods, they still struggle in the twilight zone. This suggests that current protein language models have not sufficiently learned protein structure to address remote homology prediction when sequence signals are weak.
Availability and implementation We believe this opens the way for further research both on remote homology prediction and on the broader goal of learning sequence- and structure-rich representations of protein molecules. All code, data, and models are made publicly available.
-
Abstract Determining the repertoire of a microbe's molecular functions is a central question in microbial biology. Modern techniques achieve this goal by comparing microbial genetic material against reference databases of functionally annotated genes/proteins or known taxonomic markers such as 16S rRNA. Here, we describe a novel approach to exploring bacterial functional repertoires without reference databases. Our Fusion scheme establishes functional relationships between bacteria and assigns organisms to Fusion-taxa that differ from otherwise defined taxonomic clades. Three key findings of our work stand out. First, bacterial functional comparisons outperform marker genes in assigning taxonomic clades. Fusion profiles are also better for this task than other functional annotation schemes. Second, Fusion-taxa are robust to addition of novel organisms and are, arguably, able to capture the environment-driven bacterial diversity. Finally, our alignment-free nucleic acid-based Siamese Neural Network model, created using Fusion functions, enables finding shared functionality of very distant, possibly structurally different, microbial homologs. Our work can thus help annotate functional repertoires of bacterial organisms and further guide our understanding of microbial communities.
-
Abstract Purpose of Review Preparing for pandemics requires a degree of interdisciplinary work that is challenging under the current paradigm. This review summarizes the challenges faced by the field of pandemic science and proposes how to address them.
Recent Findings The structure of current siloed systems of research organizations hinders effective interdisciplinary pandemic research. Moreover, effective pandemic preparedness requires stakeholders in public policy and health to interact and integrate new findings rapidly, relying on a robust, responsive, and productive research domain. Neither of these requirements is well supported under the current system.
Summary We propose a new paradigm for pandemic preparedness wherein interdisciplinary research and close collaboration with public policy and health practitioners can improve our ability to prevent, detect, and treat pandemics through tighter integration among domains, rapid and accurate integration, and translation of science to public policy, outreach and education, and improved venues and incentives for sustainable and robust interdisciplinary work.
-
Abstract Background The number of applications of deep learning algorithms in bioinformatics is increasing as they usually achieve superior performance over classical approaches, especially when larger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as continuous vectors through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the continuous iteration of the model to optimize the target prediction – a process called ‘end-to-end learning’ – has led to state-of-the-art results in many fields. Although usage of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared to more classical manually-curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, to end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN.
Results By using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacities. We found that the embedding dimension is a major factor in controlling the model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension.
Conclusion Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect provided by the scheme.
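To make the suggested baseline concrete, the sketch below builds a one-hot encoding table and a fixed random-vector table of a chosen dimension for the 20 standard amino acids, so that a model could be trained on either. It is a minimal illustration and not the study's code: the function names, the uniform sampling range, and the seed are assumptions made for this example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot_table():
    """Map each amino acid to a 20-dimensional one-hot vector."""
    n = len(AMINO_ACIDS)
    return {
        aa: [1.0 if i == j else 0.0 for j in range(n)]
        for i, aa in enumerate(AMINO_ACIDS)
    }


def random_table(dim, seed=0):
    """Map each amino acid to a fixed random vector of length `dim`."""
    rng = random.Random(seed)  # seeded so the table is reproducible
    return {aa: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}


def encode(sequence, table):
    """Encode a protein sequence as a list of per-residue vectors."""
    return [table[aa] for aa in sequence]


one_hot = encode("ACD", one_hot_table())
baseline = encode("ACD", random_table(dim=20))
print(len(one_hot), len(one_hot[0]), len(baseline[0]))
```

Keeping the random table at the same dimension as the scheme under test is what allows the comparison the conclusion calls for: any gap in downstream performance can then be attributed to the information in the encoding and not to the vectors merely being distinguishable.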
-
The past two decades of analytical efforts have highlighted how much more remains to be learned about the human genome and, particularly, its complex involvement in promoting disease development and progression. While numerous computational tools exist for the assessment of the functional and pathogenic effects of genome variants, their precision is far from satisfactory, particularly for clinical use. Accumulating evidence also suggests that the human microbiome's interaction with the human genome plays a critical role in determining health and disease states. While numerous microbial taxonomic groups and molecular functions of the human microbiome have been associated with disease, the reproducibility of these findings is lacking. The human microbiome–genome interaction in healthy individuals is even less well understood. This review summarizes the available computational methods built to analyze the effect of variation in the human genome and microbiome. We address the applicability and precision of these methods across their possible uses. We also briefly discuss the exciting, necessary, and now possible integration of the two types of data to improve the understanding of pathogenicity mechanisms.
-
Abstract Microbes active in extreme cold are not as well explored as those of other extreme environments. Studies have revealed a substantial microbial diversity and identified cold‐specific microbiome molecular functions. We analyzed the metagenomes and metatranscriptomes of 20 snow samples collected in early and late spring in Svalbard, Norway using mi‐faser, our read‐based computational microbiome function annotation tool. Our results reveal a more diverse microbiome functional capacity and activity in the early‐ vs. late‐spring samples. We also find that functional dissimilarity between the same‐sample metagenomes and metatranscriptomes is significantly higher in early than late spring samples. These findings suggest that early spring samples may contain a larger fraction of DNA of dormant (or dead) organisms, while late spring samples reflect a new, metabolically active community. We further show that the abundance of sequencing reads mapping to the fatty acid synthesis‐related microbial pathways in late spring metagenomes and metatranscriptomes is significantly correlated with the organic acid levels measured in these samples. Similarly, the organic acid levels correlate with the pathway read abundances of geraniol degradation and inversely correlate with those of styrene degradation, suggesting a possible nutrient change. Our study thus highlights the activity of microbial degradation pathways of complex organic compounds previously unreported at low temperatures.