


Title: HypoRiPPAtlas as an Atlas of hypothetical natural products for mass spectrometry database search

Recent analyses of public microbial genomes have found over a million biosynthetic gene clusters, most of which encode natural products that remain unknown. Additionally, GNPS harbors billions of mass spectra of natural products whose structures and biosynthetic genes are unknown. We bridge the gap between large-scale genome mining and mass spectral datasets for natural product discovery by developing HypoRiPPAtlas, an Atlas of hypothetical natural product structures that is ready to use for in silico database search of tandem mass spectra. HypoRiPPAtlas is constructed by mining genomes with seq2ripp, a machine-learning tool for predicting ribosomally synthesized and post-translationally modified peptides (RiPPs). Using HypoRiPPAtlas, we identify RiPPs in microbes and plants. HypoRiPPAtlas could be extended to other natural product classes in the future by implementing the corresponding biosynthetic logic. This study paves the way for large-scale exploration of the biosynthetic pathways and chemical structures of microbial and plant RiPP classes.
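HypoRiPPAtlas is described as ready for in silico database search of tandem mass spectra. As a rough illustration of what such a search involves, the sketch below scores a query spectrum against candidate structures by counting predicted fragment masses that fall within a tolerance of an observed peak. This is a generic shared-peak-count scheme, not the scoring actually used with HypoRiPPAtlas. It assumes unmodified linear peptides with singly charged b- and y-ions, whereas real RiPPs carry post-translational modifications and often cyclic or cross-linked backbones. The function names and the 0.02 Da tolerance are hypothetical; the residue masses are standard monoisotopic values.

```python
from bisect import bisect_left

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ladder(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified linear peptide
    (hypothetical helper; ignores PTMs, cyclization, and higher charge states)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def shared_peak_count(query_mz, theoretical_mz, tol=0.02):
    """Count theoretical fragments lying within tol (Da) of at least one query peak."""
    peaks = sorted(query_mz)
    hits = 0
    for mz in theoretical_mz:
        i = bisect_left(peaks, mz - tol)  # first peak >= mz - tol
        if i < len(peaks) and peaks[i] <= mz + tol:
            hits += 1
    return hits


def rank_candidates(query_mz, database, tol=0.02):
    """Score every candidate structure; highest shared-peak count first."""
    scored = [(name, shared_peak_count(query_mz, frags, tol))
              for name, frags in database.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy usage: a "database" of two hypothetical candidates, queried with the
# idealized fragment spectrum of one of them.
database = {name: fragment_ladder(name) for name in ["PEPTIDE", "GGGGG"]}
print(rank_candidates(fragment_ladder("PEPTIDE"), database))
```

A real search would also need decoy-based false discovery rate control and a statistical score rather than a raw peak count; the sketch only shows the matching step.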

Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Sponsoring Org:
National Science Foundation
More Like this
  1. Microbial natural products are a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class of natural products that include antibiotics, immunosuppressants, and anticancer agents. Recent breakthroughs in natural product discovery have revealed the chemical structure of several thousand NRPs. However, biosynthetic gene clusters (BGCs) encoding them are known only for a few hundred compounds. Here, we developed Nerpa, a computational method for the high-throughput discovery of novel BGCs responsible for producing known NRPs. After searching 13,399 representative bacterial genomes from the RefSeq repository against 8368 known NRPs, Nerpa linked 117 BGCs to their products. We further experimentally validated the predicted BGC of ngercheumicin from Photobacterium galatheae via mass spectrometry. Nerpa supports searching new genomes against thousands of known NRP structures, and novel molecular structures against tens of thousands of bacterial genomes. The availability of these tools can enhance our understanding of NRP synthesis and the function of their biosynthetic enzymes. 
  2. Traxler, Matthew F. (Ed.)
    Abstract: Marine sponge holobionts are prolific sources of natural products. One of the most geographically widespread classes of sponge-derived natural products is the bromotyrosine alkaloids. A distinguishing feature of bromotyrosine alkaloids is that they are present in phylogenetically disparate sponges. In this study, using sponge specimens collected from Guam, the Solomon Islands, the Florida Keys, and Puerto Rico, we queried whether the presence of bromotyrosine alkaloids potentiates metabolomic and microbiome conservation among geographically distant and phylogenetically different marine sponges. A multi-omic characterization of sponge holobionts revealed vastly different metabolomic and microbiome architectures among different bromotyrosine alkaloid-harboring sponges. However, we find statistically significant correlations between the microbiomes and metabolomes, signifying that the microbiome plays an important role in shaping the overall metabolome, even in low-microbial-abundance sponges. Molecules mined from the polar metabolomes of these sponges revealed conservation of biosynthetic logic between bromotyrosine alkaloids and brominated pyrrole-imidazole alkaloids, another class of marine sponge-derived natural products. In light of prior findings postulating the sponge host itself to be the biosynthetic source of bromotyrosine alkaloids, our data now set the stage for investigating the causal relationships that dictate the microbiome-metabolome interconnectedness for marine sponges in which the microbiome may not contribute to natural product biogenesis.

    Importance: Our work demonstrates that phylogenetically and geographically distant sponges with very different microbiomes can harbor natural product chemical classes that are united in their core chemical structures and biosynthetic logic. Furthermore, we show that independent of geographical dispersion, natural product chemistry, and microbial abundance, overall sponge metabolomes tightly correlate with their microbiomes.
  3. Abstract With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at 
    Background: Halogenation is a recurring feature in natural products, especially those from marine organisms. The selectivity with which halogenating enzymes act on their substrates renders halogenases interesting targets for biocatalyst development. Recently, CylC, the first predicted dimetal-carboxylate halogenase to be characterized, was shown to regio- and stereoselectively install a chlorine atom onto an unactivated carbon center during cylindrocyclophane biosynthesis. Homologs of CylC are also found in other characterized cyanobacterial secondary metabolite biosynthetic gene clusters. Due to its novelty in biological catalysis, selectivity and ability to perform C-H activation, this halogenase class is of considerable fundamental and applied interest. The study of CylC-like enzymes will provide insights into substrate scope, mechanism and catalytic partners, and will also enable engineering these biocatalysts for similar or additional C-H activating functions. Still, little is known regarding the diversity and distribution of these enzymes.

    Results: In this study, we used both genome mining and PCR-based screening to explore the genetic diversity of CylC homologs and their distribution in bacteria. While we found non-cyanobacterial homologs of these enzymes to be rare, we identified a large number of genes encoding CylC-like enzymes in publicly available cyanobacterial genomes and in our in-house culture collection of cyanobacteria. Genes encoding CylC homologs are widely distributed throughout the cyanobacterial tree of life, within biosynthetic gene clusters of distinct architectures (combination of unique gene groups). These enzymes are found in a variety of biosynthetic contexts, which include fatty-acid activating enzymes, type I or type III polyketide synthases, dialkylresorcinol-generating enzymes, monooxygenases or Rieske proteins. Our study also reveals that dimetal-carboxylate halogenases are among the most abundant types of halogenating enzymes in the phylum Cyanobacteria.

    Conclusions: Our data show that dimetal-carboxylate halogenases are widely distributed throughout the Cyanobacteria phylum and that BGCs encoding CylC homologs are diverse and mostly uncharacterized. This work will help guide the search for new halogenating biocatalysts and natural product scaffolds.
    Abstract: Small molecules are the primary communication media of the microbial world. Recent bioinformatic studies exploring the biosynthetic gene clusters (BGCs) that produce many small molecules have highlighted the incredible biochemical potential of the signaling molecules encoded by the human microbiome. Thus far, most research efforts have focused on understanding the social language of the gut microbiome, leaving crucial signaling molecules produced by oral bacteria, and their connection to health versus disease, in need of investigation. In this study, a total of 4,915 BGCs were identified across 461 genomes representing a broad taxonomic diversity of oral bacteria. Sequence similarity networking provided a putative product class for more than 100 unclassified novel BGCs. The newly identified BGCs were cross-referenced against 254 metagenomes and metatranscriptomes derived from individuals either with good oral health or with dental caries or periodontitis. This analysis revealed 2,473 BGCs that were differentially represented across the oral microbiomes associated with health versus disease. Coabundance network analysis identified numerous inverse correlations between BGCs and specific oral taxa. These correlations were present in healthy individuals but greatly reduced in individuals with dental caries, which may suggest a defect in colonization resistance. Finally, corroborating mass spectrometry identified several compounds with homology to products of the predicted BGC classes. Together, these findings greatly expand the number of known biosynthetic pathways present in the oral microbiome and provide an atlas for experimental characterization of these abundant, yet poorly understood, molecules and socio-chemical relationships, which impact the development of caries and periodontitis, two of the world’s most common chronic diseases.

    Importance: The healthy oral microbiome is symbiotic with the human host, importantly providing colonization resistance against potential pathogens. Dental caries and periodontitis are two of the world’s most common and costly chronic infectious diseases and are caused by a localized dysbiosis of the oral microbiome. Bacterially produced small molecules, often encoded by BGCs, are the primary communication media of bacterial communities and play a crucial, yet largely unknown, role in the transition from health to dysbiosis. This study provides a comprehensive mapping of the BGC repertoire of the human oral microbiome and identifies major differences in health compared to disease. Furthermore, BGC representation and expression are linked to the abundance of particular oral bacterial taxa in health versus dental caries and periodontitis. Overall, this study provides significant insight into the chemical communication network of the healthy oral microbiome and how it devolves in the case of two prominent diseases.
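The Nerpa abstract (item 1 in the list above) describes linking BGCs to the known NRPs they produce, but does not describe Nerpa's scoring model. As a deliberately simple stand-in, the sketch below treats the problem as global (Needleman-Wunsch) alignment between the monomer sequence predicted from a BGC and the monomer sequence of each known NRP, then reports the best-scoring NRP. The unit match, mismatch, and gap scores, the function names, and the NRP entries are all hypothetical; Nerpa itself is not claimed to work this way.

```python
def align_score(predicted, known, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score between two monomer sequences
    (hypothetical unit scores, not Nerpa's scoring model)."""
    n, m = len(predicted), len(known)
    # dp[i][j] = best score for aligning predicted[:i] with known[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if predicted[i - 1] == known[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,  # pair the two monomers
                           dp[i - 1][j] + gap,     # skip a predicted monomer
                           dp[i][j - 1] + gap)     # skip a known monomer
    return dp[n][m]


def best_match(predicted, nrp_library):
    """Return (nrp_name, score) for the highest-scoring known NRP."""
    return max(((name, align_score(predicted, seq))
                for name, seq in nrp_library.items()),
               key=lambda pair: pair[1])


# Toy usage with hypothetical NRP entries.
library = {"nrp_A": ["Ser", "Thr", "Leu", "Gly"], "nrp_B": ["Phe", "Pro", "Val"]}
print(best_match(["Ser", "Thr", "Leu"], library))
```

Real NRP matching must also cope with uncertain adenylation-domain specificity predictions, modified and non-proteinogenic monomers, and non-linear assembly, which a plain string alignment ignores.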
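The oral-microbiome abstract (item 5 in the list above) reports that sequence similarity networking assigned a putative product class to more than 100 unclassified BGCs. The abstract does not specify the similarity metric, cutoff, or labeling rule, so the sketch below assumes a precomputed pairwise similarity score in [0, 1], a hypothetical cutoff of 0.3, and majority-vote labeling within each connected component of the thresholded network. The function names and BGC identifiers are illustrative only.

```python
from collections import Counter


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def assign_putative_classes(labels, similarities, threshold=0.3):
    """labels: BGC -> known class or None; similarities: (a, b) -> score.
    Unlabeled BGCs inherit the majority class of their component (assumed rule)."""
    edges = [pair for pair, score in similarities.items() if score >= threshold]
    result = dict(labels)
    for component in connected_components(list(labels), edges):
        known = Counter(labels[n] for n in component if labels[n] is not None)
        if not known:
            continue  # no classified neighbor: leave the BGC unclassified
        majority = known.most_common(1)[0][0]
        for n in component:
            if result[n] is None:
                result[n] = majority
    return result


# Toy usage with hypothetical BGC identifiers and similarity scores.
labels = {"bgc1": "lanthipeptide", "bgc2": None, "bgc3": "NRPS",
          "bgc4": None, "bgc5": None}
sims = {("bgc1", "bgc2"): 0.8, ("bgc3", "bgc4"): 0.5, ("bgc4", "bgc5"): 0.1}
print(assign_putative_classes(labels, sims))
```

Majority voting within a component is only one possible rule; with a permissive cutoff, large mixed components can spread a class label to BGCs that share little with the labeled members.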