Title: Evaluating the potential of residual Pap test fluid as a resource for the metaproteomic analysis of the cervical-vaginal microbiome
Abstract

The human cervical-vaginal area contains proteins derived from microorganisms that may prevent or predispose women to gynecological conditions. The liquid Pap test fixative is an unexplored resource for analysis of microbial communities and the microbe-host interaction. Previously, we showed that the residual cell-free fixative from discarded Pap tests of healthy women could be used for mass spectrometry (MS) based proteomic identification of cervical-vaginal proteins. In this study, we reprocessed these MS raw data files for metaproteomic analysis to characterize the microbial community composition and the function of microbial proteins in the cervical-vaginal region. This was accomplished by developing a customized protein sequence database encompassing microbes likely present in the vagina. High-mass-accuracy data were searched against the protein FASTA database using a two-step search method within the Galaxy for proteomics platform. Data were analyzed by MEGAN6 (MetaGenomeAnalyzer) for phylogenetic and functional characterization. We identified over 300 unique peptides from a variety of bacterial phyla and Candida. Peptides corresponding to proteins involved in carbohydrate metabolism, oxidation-reduction, and transport were identified. By identifying microbial peptides in Pap test supernatants, it may be possible to acquire a functional signature of these microbes, as well as detect specific proteins associated with cervical health and disease.
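
The abstract does not spell out the two-step search, so the following is only a minimal sketch of the general idea under stated assumptions: a first-pass search against the large database yields a set of matched protein accessions, and a second search is then run against a reduced FASTA database containing only those proteins. The function names, accessions, and sequences are hypothetical, and the search engine itself (run in Galaxy in the study) is not modeled here.

```python
from typing import Dict, Set


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each accession (first token of the header line) to its sequence."""
    records: Dict[str, str] = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            accession = line[1:].split()[0]
            records[accession] = ""
        elif accession is not None:
            records[accession] += line
    return records


def reduce_database(records: Dict[str, str], first_pass_hits: Set[str]) -> Dict[str, str]:
    """Keep only proteins matched in the first-pass search (hypothetical step)."""
    return {acc: seq for acc, seq in records.items() if acc in first_pass_hits}


def to_fasta(records: Dict[str, str]) -> str:
    """Serialize records back to FASTA text for the second-pass search."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in records.items())


# Toy database; accessions and sequences are made up for illustration.
large_db = parse_fasta(""">P1 hypothetical Lactobacillus protein
MKTAYIAK
>P2 hypothetical Gardnerella protein
MSLLNQ
>P3 hypothetical Candida protein
MAAPK
""")

# Assumed output of a first-pass search: accessions with at least one match.
first_pass_hits = {"P1", "P3"}
reduced_db = reduce_database(large_db, first_pass_hits)
reduced_fasta = to_fasta(reduced_db)
```

Searching the smaller second-pass database narrows the search space, which is the usual motivation for this kind of refinement when the starting database covers many organisms.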

 
NSF-PAR ID:
10154107
Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Reports
Volume:
8
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Understanding the molecular profile of every human cell type is essential for understanding its role in normal physiology and disease. Technological advancements in DNA sequencing, mass spectrometry, and computational methods allow us to carry out multiomics analyses, although such approaches are not yet routine. Human umbilical vein endothelial cells (HUVECs) are a widely used model system to study pathological and physiological processes associated with the cardiovascular system. In this study, next‐generation sequencing and high‐resolution mass spectrometry are employed to profile the transcriptome and proteome of primary HUVECs. Analysis of 145 million paired‐end reads from next‐generation sequencing confirmed expression of 12 186 protein‐coding genes (FPKM ≥0.1) and 439 novel long non‐coding RNAs, and revealed 6089 novel isoforms that were not annotated in GENCODE. Proteomics analysis identifies 6477 proteins, including confirmation of N‐termini for 1091 proteins, isoforms for 149 proteins, and 1034 phosphosites. A database search to specifically identify other post‐translational modifications provides evidence for a number of modification sites on 117 proteins, which include ubiquitylation, lysine acetylation, and mono‐, di‐ and tri‐methylation events. Evidence for 11 “missing proteins,” which are proteins for which there was insufficient or no protein level evidence, is provided. Peptides supporting missing protein and novel events are validated by comparison of MS/MS fragmentation patterns with synthetic peptides. Finally, 245 variant peptides derived from 207 expressed proteins, in addition to alternate translational start sites for seven proteins and evidence for novel proteoforms for five proteins resulting from alternative splicing, are identified. Overall, it is believed that the integrated approach employed in this study is widely applicable to study any primary cell type for deeper molecular characterization.

     
  2. Abstract Background

    A few recent large efforts significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of entities including reference complete/draft genomes and metagenome assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and its relationship with human health and diseases. One application of these genomes is to provide a universal reference for database search in metaproteomic studies, when matched metagenomic/metatranscriptomic data are unavailable. However, a greater collection of reference genomes may not necessarily result in better peptide/protein identification because the increase of search space often leads to fewer spectrum-peptide matches, not to mention the drastic increase of computation time.

    Methods

    Here, we present a new approach that uses two steps to optimize the use of the reference genomes and MAGs as the universal reference for human gut metaproteomic MS/MS data analysis. The first step is to use only the high-abundance proteins (HAPs) (i.e., ribosomal proteins and elongation factors) for metaproteomic MS/MS database search and, based on the identification results, to derive the taxonomic composition of the underlying microbial community. The second step is to expand the search database by including all proteins from identified abundant species. We call our approach HAPiID (HAPs guided metaproteomics IDentification).

    Results

    We tested our approach using human gut metaproteomic datasets from a previous study and compared it to the state-of-the-art reference database search method MetaPro-IQ for metaproteomic identification in studying human gut microbiota. Our results show that our two-step method not only performed significantly faster but also was able to identify more peptides. We further demonstrated the application of HAPiID to revealing protein profiles of individual human-associated bacterial species, one or a few species at a time, using metaproteomic data.

    Conclusions

    The HAP guided profiling approach presents a novel, effective way of constructing a target database for metaproteomic data analysis. The HAPiID pipeline built upon this approach provides a universal tool for analyzing human gut-associated metaproteomic data.
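
The two steps described in this abstract can be illustrated with a toy sketch. This is not the HAPiID implementation: the `Protein` record, the keyword-based HAP selection, the per-accession match counts, and the `min_matches` threshold are all assumptions made for illustration, and the MS/MS search itself is represented only by its assumed output.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Set


class Protein(NamedTuple):
    accession: str
    species: str
    description: str


# Assumption: HAPs are recognized here by annotation keywords only.
HAP_KEYWORDS = ("ribosomal protein", "elongation factor")


def select_haps(proteins: List[Protein]) -> List[Protein]:
    """Step 1a: build the small first-pass database of high-abundance proteins."""
    return [p for p in proteins
            if any(k in p.description.lower() for k in HAP_KEYWORDS)]


def abundant_species(haps: List[Protein],
                     matches_per_accession: Dict[str, int],
                     min_matches: int) -> Set[str]:
    """Step 1b: sum spectrum matches per species and keep the abundant ones."""
    counts: Counter = Counter()
    for p in haps:
        counts[p.species] += matches_per_accession.get(p.accession, 0)
    return {species for species, n in counts.items() if n >= min_matches}


def expand_database(proteins: List[Protein], species: Set[str]) -> List[Protein]:
    """Step 2: include all proteins from the species identified in step 1."""
    return [p for p in proteins if p.species in species]


# Hypothetical reference proteins from two species.
proteins = [
    Protein("A1", "Species A", "50S ribosomal protein L2"),
    Protein("A2", "Species A", "Elongation factor Tu"),
    Protein("A3", "Species A", "Glycoside hydrolase"),
    Protein("B1", "Species B", "30S ribosomal protein S3"),
    Protein("B2", "Species B", "ABC transporter"),
]

haps = select_haps(proteins)
# Assumed result of searching spectra against the HAP-only database.
matches = {"A1": 4, "A2": 3, "B1": 1}
keep = abundant_species(haps, matches, min_matches=5)
expanded = expand_database(proteins, keep)
```

In this toy case only Species A passes the threshold, so the second-pass database contains its three proteins and none from Species B, which keeps the search space small relative to searching every reference genome.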
  3. Abstract Background

    Stable isotope probing (SIP) approaches are a critical tool in microbiome research to determine associations between species and substrates, as well as the activity of species. The application of these approaches ranges from studying microbial communities important for global biogeochemical cycling to host-microbiota interactions in the intestinal tract. Current SIP approaches, such as DNA-SIP or nanoSIMS, allow the incorporation of stable isotopes to be analyzed with high coverage of taxa in a community and at the single-cell level, respectively; however, they are limited in terms of sensitivity, resolution, or throughput.

    Results

    Here, we present an ultra-sensitive, high-throughput protein-based stable isotope probing approach (Protein-SIP), which cuts the cost of labeled substrates by 50–99% as compared to other SIP and Protein-SIP approaches and thus enables isotope labeling experiments on much larger scales and with higher replication. The approach allows for the determination of isotope incorporation into microbiome members with species-level resolution using standard metaproteomics liquid chromatography-tandem mass spectrometry (LC–MS/MS) measurements. At the core of the approach are new algorithms to analyze the data, which have been implemented in open-source software (https://sourceforge.net/projects/calis-p/). We demonstrate sensitivity, precision, and accuracy using bacterial cultures and mock communities with different labeling schemes. Furthermore, we benchmark our approach against two existing Protein-SIP approaches and show that, in the low labeling range used, our approach is the most sensitive and accurate. Finally, we measure translational activity using 18O heavy water labeling in a 63-species community derived from human fecal samples grown on media simulating two different diets. Activity could be quantified on average for 27 species per sample, with 9 species showing significantly higher activity on a high protein diet, as compared to a high fiber diet. Surprisingly, among the species with increased activity on high protein were several Bacteroides species known as fiber consumers. Apparently, protein supply is a critical consideration when assessing growth of intestinal microbes on fiber, including fiber-based prebiotics.

    Conclusions

    We demonstrate that our Protein-SIP approach allows for the ultra-sensitive (0.01 to 10% label) detection of stable isotopes of elements found in proteins, using standard metaproteomics data.

     
  4. Abstract

    De novo peptide sequencing, which does not rely on a comprehensive target sequence database, provides us with a way to identify novel peptides from tandem mass spectra. However, current de novo sequencing algorithms suffer from low accuracy and coverage, which hinders their application in proteomics. In this paper, we present PepNet, a fully convolutional neural network for high accuracy de novo peptide sequencing. PepNet takes an MS/MS spectrum (represented as a high-dimensional vector) as input, and outputs the optimal peptide sequence along with its confidence score. The PepNet model is trained using a total of 3 million high-energy collisional dissociation MS/MS spectra from multiple human peptide spectral libraries. Evaluation results show that PepNet significantly outperforms current best-performing de novo sequencing algorithms (e.g. PointNovo and DeepNovo) in both peptide-level accuracy and positional-level accuracy. PepNet can sequence a large fraction of spectra that were not identified by database search engines, and thus could be used as a complementary tool to database search engines for peptide identification in proteomics. In addition, PepNet runs around 3x and 7x faster than PointNovo and DeepNovo on GPUs, respectively, thus being more suitable for the analysis of large-scale proteomics data.

     
  5. Rationale

    Purification of recombinant proteins is a necessary step for functional or structural studies and other applications. Immobilized metal affinity chromatography is a common recombinant protein purification method. Mass spectrometry (MS) allows for confirmation of identity of expressed proteins and unambiguous detection of enzymatic substrates and reaction products. We demonstrate the detection of enzymes purified on immobilized metal affinity surfaces by direct or ambient ionization MS, and follow their enzymatic reactions by direct electrospray ionization (ESI) or desorption electrospray ionization (DESI).

    Methods

    A protein standard, His‐Ubq, and two recombinant proteins, His‐SAHN and His‐CS, expressed in Escherichia coli were immobilized on two immobilized metal affinity systems, Cu–nitrilotriacetic acid (Cu‐NTA) and Ni‐NTA. The proteins were purified on surface, and released in the ESI spray solvent for direct infusion, when using the 96‐well plate form factor, or analyzed directly from immobilized metal affinity‐coated microscope slides by DESI‐MS. Enzyme activity was followed by incubating the substrates in wells or by depositing substrate on immobilized protein on coated slides for analysis.

    Results

    Small proteins (His‐Ubq) and medium proteins (His‐SAHN) could readily be detected from 96‐well plates by direct infusion ESI, or from microscope slides by DESI‐MS after purification on surface from clarified E. coli cell lysate. Protein oxidation was observed for immobilized proteins on both Cu‐NTA and Ni‐NTA; however, this did not hamper the enzymatic reactions of these proteins. Both the nucleosidase reaction products for His‐SAHN and the methylation product of His‐CS (theobromine to caffeine) were detected.

    Conclusions

    The immobilization, purification, release and detection of His‐tagged recombinant proteins using immobilized metal affinity surfaces for direct infusion ESI‐MS or ambient DESI‐MS analyses were successfully demonstrated. Recombinant proteins were purified to allow identification directly out of clarified cell lysate. Biological activities of the recombinant proteins were preserved allowing the investigation of enzymatic activity via MS.

     