Title: Using high-abundance proteins as guides for fast and effective peptide/protein identification from human gut metaproteomic data
Abstract
Background: Several recent large-scale efforts have significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of entities including reference complete/draft genomes and metagenome-assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and its relationship with human health and disease. One application of these genomes is to provide a universal reference for database search in metaproteomic studies when matched metagenomic/metatranscriptomic data are unavailable. However, a larger collection of reference genomes does not necessarily result in better peptide/protein identification, because the increased search space often leads to fewer spectrum-peptide matches, in addition to a drastic increase in computation time.
Methods: Here, we present a new approach that uses two steps to optimize the use of the reference genomes and MAGs as the universal reference for human gut metaproteomic MS/MS data analysis. The first step is to use only the high-abundance proteins (HAPs) (i.e., ribosomal proteins and elongation factors) for metaproteomic MS/MS database search and, based on the identification results, to derive the taxonomic composition of the underlying microbial community. The second step is to expand the search database by including all proteins from the identified abundant species. We call our approach HAPiID (HAPs guided metaproteomics IDentification).
Results: We tested our approach using human gut metaproteomic datasets from a previous study and compared it to MetaPro-IQ, the state-of-the-art reference database search method for metaproteomic identification in studies of the human gut microbiota. Our results show that our two-step method was not only significantly faster but also identified more peptides. We further demonstrated the application of HAPiID to revealing the protein profiles of individual human-associated bacterial species, one or a few species at a time, using metaproteomic data.
Conclusions: The HAP-guided profiling approach presents a novel and effective way of constructing a target database for metaproteomic data analysis. The HAPiID pipeline built upon this approach provides a universal tool for analyzing human gut-associated metaproteomic data.
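The two-step idea described in the abstract can be illustrated with a small sketch. This is not the HAPiID implementation: the protein records, peptide strings, the `min_hap_hits` threshold, and the function names are all hypothetical, and exact substring matching stands in for a real MS/MS spectrum-to-peptide search engine. The abstract does not state how abundant species are selected, so the hit-count cutoff below is an assumption made only for illustration.

```python
from collections import Counter

# Toy reference collection: (protein_id, species, is_hap, sequence).
# Every entry is invented for illustration; is_hap marks ribosomal
# proteins and elongation factors (the high-abundance proteins).
REFERENCE = [
    ("sp1_rpsA", "species_1", True,  "MKTAYIAKQRQISFVK"),
    ("sp1_xyl",  "species_1", False, "MSHHWGYGKHNGPEHW"),
    ("sp2_tuf",  "species_2", True,  "MAKEKFERTKPHVNVG"),
    ("sp2_amy",  "species_2", False, "MLLQAFLFLLAGFAAK"),
    ("sp3_rplB", "species_3", True,  "MAVVKCKPTSPGRRHV"),
    ("sp3_lac",  "species_3", False, "MTMITDSLAVVLQRRD"),
]


def search(peptides, database):
    """Stand-in for a database search: exact substring matching (assumption)."""
    hits = []
    for pep in peptides:
        for prot_id, species, _is_hap, seq in database:
            if pep in seq:
                hits.append((pep, prot_id, species))
    return hits


def two_step_search(peptides, reference, min_hap_hits=1):
    # Step 1: search only the HAPs and count hits per species to
    # approximate the taxonomic composition of the sample.
    hap_db = [p for p in reference if p[2]]
    counts = Counter(species for _, _, species in search(peptides, hap_db))
    abundant = {s for s, n in counts.items() if n >= min_hap_hits}

    # Step 2: expand the database to all proteins of the abundant
    # species and search again.
    expanded_db = [p for p in reference if p[1] in abundant]
    return abundant, search(peptides, expanded_db)


observed = ["TAYIAKQ", "GYGKHNG", "EKFERTK", "ITDSLAV"]
abundant, hits = two_step_search(observed, REFERENCE)
print(sorted(abundant))
for pep, prot_id, species in hits:
    print(pep, prot_id, species)
```

In this toy run the expanded database contains only the proteins of species whose HAPs were matched in step 1, so the second search covers a smaller space than the full reference. The trade-off is visible too: a peptide from a species with no HAP hit is not searched for in step 2.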
Award ID(s): 2025451
NSF-PAR ID: 10297311
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: Microbiome
Volume: 9
Issue: 1
ISSN: 2049-2618
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    The human cervical-vaginal area contains proteins derived from microorganisms that may prevent or predispose women to gynecological conditions. The liquid Pap test fixative is an unexplored resource for analysis of microbial communities and the microbe-host interaction. Previously, we showed that the residual cell-free fixative from discarded Pap tests of healthy women could be used for mass spectrometry (MS) based proteomic identification of cervical-vaginal proteins. In this study, we reprocessed these MS raw data files for metaproteomic analysis to characterize the microbial community composition and function of microbial proteins in the cervical-vaginal region. This was accomplished by developing a customized protein sequence database encompassing microbes likely present in the vagina. High mass accuracy data were searched against the protein FASTA database using a two-step search method within the Galaxy for proteomics platform. Data were analyzed by MEGAN6 (MetaGenomeAnalyzer) for phylogenetic and functional characterization. We identified over 300 unique peptides from a variety of bacterial phyla and Candida. Peptides corresponding to proteins involved in carbohydrate metabolism, oxidation-reduction, and transport were identified. By identifying microbial peptides in Pap test supernatants it may be possible to acquire a functional signature of these microbes, as well as detect specific proteins associated with cervical health and disease.
  2. Sangwan, Naseer (Ed.)
    ABSTRACT Bacterially secreted proteins play an important role in microbial physiology and ecology in many environments, including the mammalian gut. While gut microbes have been extensively studied over the past decades, little is known about the proteins that they secrete into the gastrointestinal tract. In this study, we developed and applied a computational pipeline to a comprehensive catalog of human-associated metagenome-assembled genomes in order to predict and analyze the bacterial metasecretome of the human gut, i.e., the collection of proteins secreted out of the cytoplasm by human gut bacteria. We identified the presence of large and diverse families of secreted carbohydrate-active enzymes and assessed their phylogenetic distributions across different taxonomic groups, which revealed an enrichment in Bacteroidetes and Verrucomicrobia. By mapping secreted proteins to available metagenomic data from endoscopic sampling of the human gastrointestinal tract, we specifically pinpointed regions in the upper and lower intestinal tract along the lumen and mucosa where specific glycosidases are secreted by resident microbes. The metasecretome analyzed in this study constitutes the most comprehensive list of secreted proteins produced by human gut bacteria reported to date and serves as a useful resource for the microbiome research community. IMPORTANCE Bacterially secreted proteins are necessary for the proper functioning of bacterial cells and communities. Secreted proteins provide bacterial cells with the ability to harvest resources from the exterior, import these resources into the cell, and signal to other bacteria. In the human gut microbiome, these actions impact host health and allow the maintenance of a healthy gut bacterial community. We utilized computational tools to identify the major components of human gut bacterially secreted proteins and determined their spatial distribution in the gastrointestinal tract. 
Our analysis of human gut bacterial secreted proteins will allow a better understanding of the impact of gut bacteria on human health and represents a step toward identifying new protein functions with interesting applications in biomedicine and industry. 
  3. Coelho, Luis Pedro (Ed.)
    Host-microbiome interactions and the microbial community have a broad impact on human health and disease. Most microbiome-based studies are performed at the genome level using next-generation sequencing techniques, but metaproteomics is emerging as a powerful technique to study microbiome functional activity by characterizing the complex and dynamic composition of microbial proteins. We conducted a large-scale survey of human gut microbiome metaproteomic data to identify generalist species that are ubiquitously expressed across all samples and specialists that are highly expressed in a small subset of samples associated with a certain phenotype. We were able to utilize the metaproteomic mass spectrometry data to reveal the protein landscapes of these species, which enables the characterization of the expression levels of proteins of different functions and underlying regulatory mechanisms, such as operons. Finally, we were able to recover a large number of open reading frames (ORFs) with spectral support, which were missed by de novo protein-coding gene predictors. We showed that a majority of the rescued ORFs overlapped with de novo predicted protein-coding genes, but on opposite strands or in different frames. Together, these results demonstrate applications of metaproteomics for the characterization of important gut bacterial species.
  4. Hird, Sarah M. (Ed.)
    The gut microbiome provides vital functions for mammalian hosts, yet research on its variability and function across adult life spans and multiple generations is limited in large mammalian carnivores. Here, we used 16S rRNA gene and metagenomic high-throughput sequencing to profile the bacterial taxonomic composition, genomic diversity, and metabolic function of fecal samples collected from 12 wild spotted hyenas (Crocuta crocuta) residing in the Masai Mara National Reserve, Kenya, over a 23-year period spanning three generations. The metagenomic data came from four of these hyenas and spanned two 2-year periods. With these data, we determined the extent to which host factors predicted variation in the gut microbiome and identified the core microbes present in the guts of hyenas. We also investigated novel genomic diversity in the mammalian gut by reporting the first metagenome-assembled genomes (MAGs) for hyenas. We found that gut microbiome taxonomic composition varied temporally, but despite this, a core set of 14 bacterial genera was identified. The strongest predictors of the microbiome were host identity and age, suggesting that hyenas possess individualized microbiomes and that these may change with age during adulthood. The gut microbiome functional profiles of the four adult hyenas were also individual specific and were associated with prey abundance, indicating that the functions of the gut microbiome vary with host diet. We recovered 149 high-quality MAGs from the hyenas’ guts; some MAGs were classified as taxa previously reported for other carnivores, but many were novel and lacked species-level matches to genomes in existing reference databases. IMPORTANCE There is a gap in knowledge regarding the genomic diversity and variation of the gut microbiome across a host’s life span and across multiple generations of hosts in wild mammals. 
Using two types of sequencing approaches, we found that although gut microbiomes were individualized and temporally variable among hyenas, they correlated similarly to large-scale changes in the ecological conditions experienced by their hosts. We also recovered 149 high-quality MAGs from the hyena gut, greatly expanding the microbial genome repertoire known for hyenas, carnivores, and wild mammals in general. Some MAGs came from genera abundant in the gastrointestinal tracts of canid species and other carnivores, but over 80% of MAGs were novel and from species not previously represented in genome databases. Collectively, our novel body of work illustrates the importance of surveying the gut microbiome of nonmodel wild hosts, using multiple sequencing methods and computational approaches and at distinct scales of analysis. 
  5. Ercolini, Danilo (Ed.)
    ABSTRACT Dietary polyphenols can significantly benefit human health, but their bioavailability is metabolically controlled by human gut microbiota. To facilitate the study of polyphenol metabolism for human gut health, we have manually curated experimentally characterized polyphenol utilization proteins (PUPs) from published literature. This resulted in 60 experimentally characterized PUPs (named seeds) with various metadata, such as species and substrate. Further database search found 107,851 homologs of the seeds from UniProt and UHGP (unified human gastrointestinal protein) databases. All PUP seeds and homologs were classified into protein classes, families, and subfamilies based on Enzyme Commission (EC) numbers, Pfam (protein family) domains, and sequence similarity networks. By locating PUP homologs in the genomes of UHGP, we have identified 1,074 physically linked PUP gene clusters (PGCs), which are potentially involved in polyphenol metabolism in the human gut. The gut microbiome of Africans was consistently ranked the top in terms of the abundance and prevalence of PUP homologs and PGCs among all geographical continents. This reflects the fact that dietary polyphenols are consumed by the African population more commonly than by other populations, such as Europeans and North Americans. A case study of the Hadza hunter-gatherer microbiome verified the feasibility of using dbPUP to profile metagenomic data for biologically meaningful discovery, suggesting an association between diet and PUP abundance. A Pfam domain enrichment analysis of PGCs identified a number of putatively novel PUP families. Lastly, a user-friendly web interface (https://bcb.unl.edu/dbpup/) provides all the data online to facilitate the research of polyphenol metabolism for improved human health. IMPORTANCE Long-term consumption of polyphenol-rich foods has been shown to lower the risk of various human diseases, such as cardiovascular diseases, cancers, and metabolic diseases. 
Raw polyphenols are often enzymatically processed by the gut microbiome, which contains various polyphenol utilization proteins (PUPs) to produce metabolites with much higher bioaccessibility to gastrointestinal cells. This study delivered dbPUP as an online database for experimentally characterized PUPs and their homologs in the human gut microbiome. This work also performed a systematic classification of PUPs into enzyme classes, families, and subfamilies. The signature Pfam domains were identified for PUP families, enabling conserved domain-based PUP annotation. This standardized sequence similarity-based PUP classification system offered a guideline for the future inclusion of new experimentally characterized PUPs and the creation of new PUP families. An in-depth data analysis was further conducted on PUP homologs and physically linked PUP gene clusters (PGCs) in gut microbiomes of different human populations. 