Title: Nerpa: A Tool for Discovering Biosynthetic Gene Clusters of Bacterial Nonribosomal Peptides
Microbial natural products are a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class of natural products that include antibiotics, immunosuppressants, and anticancer agents. Recent breakthroughs in natural product discovery have revealed the chemical structure of several thousand NRPs. However, biosynthetic gene clusters (BGCs) encoding them are known only for a few hundred compounds. Here, we developed Nerpa, a computational method for the high-throughput discovery of novel BGCs responsible for producing known NRPs. After searching 13,399 representative bacterial genomes from the RefSeq repository against 8368 known NRPs, Nerpa linked 117 BGCs to their products. We further experimentally validated the predicted BGC of ngercheumicin from Photobacterium galatheae via mass spectrometry. Nerpa supports searching new genomes against thousands of known NRP structures, and novel molecular structures against tens of thousands of bacterial genomes. The availability of these tools can enhance our understanding of NRP synthesis and the function of their biosynthetic enzymes.
Award ID(s):
2117640
PAR ID:
10330330
Journal Name:
Metabolites
Volume:
11
Issue:
10
ISSN:
2218-1989
Page Range / eLocation ID:
693
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Simmons, Lyle A.; Bush, Karen (Ed.)
    ABSTRACT Unique DNA repair enzymes that provide self-resistance against therapeutically important, genotoxic natural products have been discovered in bacterial biosynthetic gene clusters (BGCs). Among these, the DNA glycosylase AlkZ is essential for azinomycin B production and belongs to the HTH_42 superfamily of uncharacterized proteins. Despite their widespread existence in antibiotic producers and pathogens, the roles of these proteins in production of other natural products are unknown. Here, we determine the evolutionary relationship and genomic distribution of all HTH_42 proteins from Streptomyces and use a resistance-based genome mining approach to identify homologs associated with known and uncharacterized BGCs. We find that AlkZ-like (AZL) proteins constitute one distinct HTH_42 subfamily and are highly enriched in BGCs and variable in sequence, suggesting each has evolved to protect against a specific secondary metabolite. As a validation of the approach, we show that the AZL protein, HedH4, associated with biosynthesis of the alkylating agent hedamycin, excises hedamycin-DNA adducts with exquisite specificity and provides resistance to the natural product in cells. We also identify a second, phylogenetically and functionally distinct subfamily whose proteins are never associated with BGCs, are highly conserved with respect to sequence and genomic neighborhood, and repair DNA lesions not associated with a particular natural product. This work delineates two related families of DNA repair enzymes (one specific for complex alkyl-DNA lesions and involved in self-resistance to antimicrobials, the other likely involved in protection against an array of genotoxins) and provides a framework for targeted discovery of new genotoxic compounds with therapeutic potential.
    IMPORTANCE Bacteria are rich sources of secondary metabolites that include DNA-damaging genotoxins with antitumor/antibiotic properties. Although Streptomyces produce a diverse number of therapeutic genotoxins, efforts toward targeted discovery of biosynthetic gene clusters (BGCs) producing DNA-damaging agents are lacking. Moreover, work on toxin-resistance genes has lagged behind our understanding of those involved in natural product synthesis. Here, we identified over 70 uncharacterized BGCs producing potentially novel genotoxins through resistance-based genome mining using the azinomycin B-resistance DNA glycosylase AlkZ. We validate our analysis by characterizing the enzymatic activity and cellular resistance of one AlkZ ortholog in the BGC of hedamycin, a potent DNA alkylating agent. Moreover, we uncover a second, phylogenetically distinct family of proteins related to Escherichia coli YcaQ, a DNA glycosylase capable of unhooking interstrand DNA cross-links, which differs from the AlkZ-like family in sequence, genomic location, proximity to BGCs, and substrate specificity. This work defines two families of DNA glycosylases for specialized repair of complex genotoxic natural products and generalized repair of a broad range of alkyl-DNA adducts and provides a framework for targeted discovery of new compounds with therapeutic potential.
  2. Reguera, Gemma (Ed.)
    ABSTRACT Polycyclic tetramate macrolactams (PTMs) are bioactive natural products commonly associated with certain actinobacterial and proteobacterial lineages. These molecules have been the subject of numerous structure-activity investigations since the 1970s. New members continue to be pursued in wild and engineered bacterial strains, and advances in PTM biosynthesis suggest their outwardly simplistic biosynthetic gene clusters (BGCs) belie unexpected product complexity. To address the origins of this complexity and understand its influence on PTM discovery, we engaged in a combination of bioinformatics to systematically classify PTM BGCs and PTM-targeted metabolomics to compare the products of select BGC types. By comparing groups of producers and BGC mutants, we exposed knowledge gaps that complicate bioinformatics-driven product predictions. In sum, we provide new insights into the evolution of PTM BGCs while systematically accounting for the PTMs discovered thus far. The combined computational and metabologenomic findings presented here should prove useful for guiding future discovery.
    IMPORTANCE Polycyclic tetramate macrolactam (PTM) pathways are frequently found within the genomes of biotechnologically important bacteria, including Streptomyces and Lysobacter spp. Their molecular products are typically bioactive, having substantial agricultural and therapeutic interest. Leveraging bacterial genomics for the discovery of new related molecules is thus desirable, but drawing accurate structural predictions from bioinformatics alone remains challenging. This difficulty stems from a combination of previously underappreciated biosynthetic complexity and remaining knowledge gaps, compounded by a stream of yet-uncharacterized PTM biosynthetic loci gleaned from recently sequenced bacterial genomes. We engaged in the following study to create a useful framework for cataloging historic PTM clusters, identifying new cluster variations, and tracing evolutionary paths for these molecules. Our data suggest new PTM chemistry remains discoverable in nature. However, our metabolomic and mutational analyses emphasize the practical limitations of genomics-based discovery by exposing hidden complexity.
  3. Davies, Julian E. (Ed.)
    ABSTRACT Bacteria isolated from soils are major sources of specialized metabolites, including antibiotics and other compounds with clinical value that likely shape interactions among microbial community members and impact biogeochemical cycles. Yet, isolated lineages represent a small fraction of all soil bacterial diversity. It remains unclear how the production of specialized metabolites varies across the phylogenetic diversity of bacterial species in soils and whether the genetic potential for production of these metabolites differs with soil depth and vegetation type within a geographic region. We sampled soils and saprolite from three sites in a northern California Critical Zone Observatory with various vegetation and bedrock characteristics and reconstructed 1,334 metagenome-assembled genomes containing diverse biosynthetic gene clusters (BGCs) for secondary metabolite production. We obtained genomes for prolific producers of secondary metabolites, including novel groups within the Actinobacteria, Chloroflexi, and candidate phylum “Candidatus Dormibacteraeota.” Surprisingly, one genome of a candidate phyla radiation (CPR) bacterium coded for a ribosomally synthesized linear azole/azoline-containing peptide, a capacity we found in other publicly available CPR bacterial genomes. Overall, bacteria with higher biosynthetic potential were enriched in shallow soils and grassland soils, with patterns of abundance of BGC type varying by taxonomy.
    IMPORTANCE Microbes produce specialized compounds to compete or communicate with one another and their environment. Some of these compounds, such as antibiotics, are also useful in medicine and biotechnology. Historically, most antibiotics have come from soil bacteria which can be isolated and grown in the lab. Though the vast majority of soil bacteria cannot be isolated, we can extract their genetic information and search it for genes which produce these specialized compounds. These understudied soil bacteria offer a wealth of potential for the discovery of new and important microbial products. Here, we identified the ability to produce these specialized compounds in diverse and novel bacteria in a range of soil environments. This information will be useful to other researchers who wish to isolate certain products. Beyond their use to humans, understanding the distribution and function of microbial products is key to understanding microbial communities and their effects on biogeochemical cycles.
  4. Streptomyces genomes harbor numerous biosynthetic gene clusters (BGCs) encoding drug-like compounds. While some of these BGCs readily yield expected products, many do not. Biosynthetic crypticity represents a significant hurdle to drug discovery, and the biological mechanisms that underpin it remain poorly understood. Polycyclic tetramate macrolactam (PTM) antibiotic production is widespread within the Streptomyces genus, and examples of active and cryptic PTM BGCs are known. To reveal further insights into the causes of biosynthetic crypticity, we employed a PTM-targeted comparative metabologenomics approach to analyze a panel of S. griseus clade strains that included both poor and robust PTM producers. By comparing the genomes and PTM production profiles of these strains, we systematically mapped the PTM promoter architecture within the group, revealed that these promoters are directly activated via the global regulator AdpA, and discovered that small promoter insertion–deletion lesions (indels) differentiate weaker PTM producers from stronger ones. We also revealed an unexpected link between robust PTM expression and griseorhodin pigment coproduction, with weaker S. griseus-clade PTM producers being unable to produce the latter compound. This study highlights promoter indels and biosynthetic interactions as important, genetically encoded factors that impact BGC outputs, providing mechanistic insights that will undoubtedly extend to other Streptomyces BGCs. We highlight comparative metabologenomics as a powerful approach to expose genomic features that differentiate strong antibiotic producers from weaker ones. This should prove useful for rational discovery efforts and is orthogonal to current engineering and molecular signaling approaches now standard in the field.
  5. ABSTRACT Small molecules are the primary communication media of the microbial world. Recent bioinformatic studies, exploring the biosynthetic gene clusters (BGCs) which produce many small molecules, have highlighted the incredible biochemical potential of the signaling molecules encoded by the human microbiome. Thus far, most research efforts have focused on understanding the social language of the gut microbiome, leaving crucial signaling molecules produced by oral bacteria and their connection to health versus disease in need of investigation. In this study, a total of 4,915 BGCs were identified across 461 genomes representing a broad taxonomic diversity of oral bacteria. Sequence similarity networking provided a putative product class for more than 100 unclassified novel BGCs. The newly identified BGCs were cross-referenced against 254 metagenomes and metatranscriptomes derived from individuals either with good oral health or with dental caries or periodontitis. This analysis revealed 2,473 BGCs, which were differentially represented across the oral microbiomes associated with health versus disease. Coabundance network analysis identified numerous inverse correlations between BGCs and specific oral taxa. These correlations were present in healthy individuals but greatly reduced in individuals with dental caries, which may suggest a defect in colonization resistance. Finally, corroborating mass spectrometry identified several compounds with homology to products of the predicted BGC classes. Together, these findings greatly expand the number of known biosynthetic pathways present in the oral microbiome and provide an atlas for experimental characterization of these abundant, yet poorly understood, molecules and socio-chemical relationships, which impact the development of caries and periodontitis, two of the world’s most common chronic diseases. 
    IMPORTANCE The healthy oral microbiome is symbiotic with the human host, importantly providing colonization resistance against potential pathogens. Dental caries and periodontitis are two of the world’s most common and costly chronic infectious diseases and are caused by a localized dysbiosis of the oral microbiome. Bacterially produced small molecules, often encoded by BGCs, are the primary communication media of bacterial communities and play a crucial, yet largely unknown, role in the transition from health to dysbiosis. This study provides a comprehensive mapping of the BGC repertoire of the human oral microbiome and identifies major differences in health compared to disease. Furthermore, BGC representation and expression are linked to the abundance of particular oral bacterial taxa in health versus dental caries and periodontitis. Overall, this study provides significant insight into the chemical communication network of the healthy oral microbiome and how it devolves in the case of two prominent diseases.