The human cervical-vaginal area contains proteins derived from microorganisms that may prevent or predispose women to gynecological conditions. The liquid Pap test fixative is an unexplored resource for analysis of microbial communities and the microbe-host interaction. Previously, we showed that the residual cell-free fixative from discarded Pap tests of healthy women could be used for mass spectrometry (MS) based proteomic identification of cervical-vaginal proteins. In this study, we reprocessed these MS raw data files for metaproteomic analysis to characterize the microbial community composition and function of microbial proteins in the cervical-vaginal region. This was accomplished by developing a customized protein sequence database encompassing microbes likely present in the vagina. High-mass accuracy data were searched against the protein FASTA database using a two-step search method within the Galaxy for proteomics platform. Data was analyzed by MEGAN6 (MetaGenomeAnalyzer) for phylogenetic and functional characterization. We identified over 300 unique peptides from a variety of bacterial phyla and
- Award ID(s):
- 2025451
- PAR ID:
- 10297311
- Date Published:
- Journal Name:
- Microbiome
- Volume:
- 9
- Issue:
- 1
- ISSN:
- 2049-2618
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Candida . Peptides corresponding to proteins involved in carbohydrate metabolism, oxidation-reduction, and transport were identified. By identifying microbial peptides in Pap test supernatants it may be possible to acquire a functional signature of these microbes, as well as detect specific proteins associated with cervical health and disease. -
Abstract Motivation Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. The protein database search or the spectral library search are commonly used for peptide identification from MS/MS spectra, which, however, may face challenges due to experimental variations between replicated spectra and similar fragmentation patterns among distinct peptides. To address this challenge, we present SpecEncoder, a deep metric learning approach to address these challenges by transforming MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification.
Results We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%–2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%–15% more unique peptides than MSGF+ enhanced by Percolator, Furthermore, SpecEncoder identified 6%–12% additional unique peptides when utilizing a combined library of experimental and predicted spectra. SpecEncoder can also identify more peptides when compared to deep-learning enhanced methods (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder’s potential to enhance peptide identification for proteomic data analyses.
Availability and Implementation The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder. Contact: hatang@iu.edu.
-
Coelho, Luis Pedro (Ed.)Host-microbiome interactions and the microbial community have broad impact in human health and diseases. Most microbiome based studies are performed at the genome level based on next-generation sequencing techniques, but metaproteomics is emerging as a powerful technique to study microbiome functional activity by characterizing the complex and dynamic composition of microbial proteins. We conducted a large-scale survey of human gut microbiome metaproteomic data to identify generalist species that are ubiquitously expressed across all samples and specialists that are highly expressed in a small subset of samples associated with a certain phenotype. We were able to utilize the metaproteomic mass spectrometry data to reveal the protein landscapes of these species, which enables the characterization of the expression levels of proteins of different functions and underlying regulatory mechanisms, such as operons. Finally, we were able to recover a large number of open reading frames (ORFs) with spectral support, which were missed by de novo protein-coding gene predictors. We showed that a majority of the rescued ORFs overlapped with de novo predicted protein-coding genes, but on opposite strands or in different frames. Together, these demonstrate applications of metaproteomics for the characterization of important gut bacterial species.more » « less
-
Abstract De novo peptide sequencing, which does not rely on a comprehensive target sequence database, provides us with a way to identify novel peptides from tandem mass spectra. However, current de novo sequencing algorithms suffer from low accuracy and coverage, which hinders their application in proteomics. In this paper, we present
PepNet , a fully convolutional neural network for high accuracy de novo peptide sequencing. PepNet takes an MS/MS spectrum (represented as a high-dimensional vector) as input, and outputs the optimal peptide sequence along with its confidence score. The PepNet model is trained using a total of 3 million high-energy collisional dissociation MS/MS spectra from multiple human peptide spectral libraries. Evaluation results show that PepNet significantly outperforms current best-performing de novo sequencing algorithms (e.g. PointNovo and DeepNovo) in both peptide-level accuracy and positional-level accuracy. PepNet can sequence a large fraction of spectra that were not identified by database search engines, and thus could be used as a complementary tool to database search engines for peptide identification in proteomics. In addition, PepNet runs around 3x and 7x faster than PointNovo and DeepNovo on GPUs, respectively, thus being more suitable for the analysis of large-scale proteomics data. -
Advances in sequencing technologies and bioinformatics tools have dramatically increased the recovery rate of microbial genomes from metagenomic data. Assessing the quality of metagenome-assembled genomes (MAGs) is a critical step before downstream analysis. Here, we present CheckM2, an improved method of predicting genome quality of MAGs using machine learning. Using synthetic and experimental data, we demonstrate that CheckM2 outperforms existing tools in both accuracy and computational speed. In addition, CheckM2’s database can be rapidly updated with new high-quality reference genomes, including taxa represented only by a single genome. We also show that CheckM2 accurately predicts genome quality for MAGs from novel lineages, even for those with reduced genome size (for example, Patescibacteria and the DPANN superphylum). CheckM2 provides accurate genome quality predictions across bacterial and archaeal lineages, giving increased confidence when inferring biological conclusions from MAGs.more » « less