skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: MIBiG 2.0: a repository for biosynthetic gene clusters of known function
Abstract Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.  more » « less
Award ID(s):
1652424
PAR ID:
10130405
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; « less
Date Published:
Journal Name:
Nucleic Acids Research
ISSN:
0305-1048
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/. 
    more » « less
  2. Microbial natural products are a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class of natural products that include antibiotics, immunosuppressants, and anticancer agents. Recent breakthroughs in natural product discovery have revealed the chemical structure of several thousand NRPs. However, biosynthetic gene clusters (BGCs) encoding them are known only for a few hundred compounds. Here, we developed Nerpa, a computational method for the high-throughput discovery of novel BGCs responsible for producing known NRPs. After searching 13,399 representative bacterial genomes from the RefSeq repository against 8368 known NRPs, Nerpa linked 117 BGCs to their products. We further experimentally validated the predicted BGC of ngercheumicin from Photobacterium galatheae via mass spectrometry. Nerpa supports searching new genomes against thousands of known NRP structures, and novel molecular structures against tens of thousands of bacterial genomes. The availability of these tools can enhance our understanding of NRP synthesis and the function of their biosynthetic enzymes. 
    more » « less
  3. Abstract Specialized or secondary metabolites are small molecules of biological origin, often showing potent biological activities with applications in agriculture, engineering and medicine. Usually, the biosynthesis of these natural products is governed by sets of co-regulated and physically clustered genes known as biosynthetic gene clusters (BGCs). To share information about BGCs in a standardized and machine-readable way, the Minimum Information about a Biosynthetic Gene cluster (MIBiG) data standard and repository was initiated in 2015. Since its conception, MIBiG has been regularly updated to expand data coverage and remain up to date with innovations in natural product research. Here, we describe MIBiG version 4.0, an extensive update to the data repository and the underlying data standard. In a massive community annotation effort, 267 contributors performed 8304 edits, creating 557 new entries and modifying 590 existing entries, resulting in a new total of 3059 curated entries in MIBiG. Particular attention was paid to ensuring high data quality, with automated data validation using a newly developed custom submission portal prototype, paired with a novel peer-reviewing model. MIBiG 4.0 also takes steps towards a rolling release model and a broader involvement of the scientific community. MIBiG 4.0 is accessible online at https://mibig.secondarymetabolites.org/. 
    more » « less
  4. Abstract  Secondary metabolites (SMs) are biologically active small molecules, many of which are medically valuable. Fungal genomes contain vast numbers of SM biosynthetic gene clusters (BGCs) with unknown products, suggesting that huge numbers of valuable SMs remain to be discovered. It is challenging, however, to identify SM BGCs, among the millions present in fungi, that produce useful compounds. One solution is resistance gene-guided genome mining, which takes advantage of the fact that some BGCs contain a gene encoding a resistant version of the protein targeted by the compound produced by the BGC. The bioinformatic signature of such BGCs is that they contain an allele of an essential gene with no SM biosynthetic function, and there is a second allele elsewhere in the genome. We have developed a computer-assisted approach to resistance gene-guided genome mining that allows users to query large databases for BGCs that putatively make compounds that have targets of therapeutic interest. Working with the MycoCosm genome database, we have applied this approach to look for SM BGCs that target the proteasome β6 subunit, the target of the proteasome inhibitor fellutamide B, or HMG-CoA reductase, the target of cholesterol reducing therapeutics such as lovastatin. Our approach proved effective, finding known fellutamide and lovastatin BGCs as well as fellutamide- and lovastatin-related BGCs with variations in the SM genes that suggest they may produce structural variants of fellutamides and lovastatin. Gratifyingly, we also found BGCs that are not closely related to lovastatin BGCs but putatively produce novel HMG-CoA reductase inhibitors. One-Sentence SummaryA new computer-assisted approach to resistance gene-directed genome mining is reported along with its use to identify fungal biosynthetic gene clusters that putatively produce proteasome and HMG-CoA reductase inhibitors. 
    more » « less
  5. Abstract Streptomycesbacteria are known for their prolific production of secondary metabolites, many of which have been widely used in human medicine, agriculture and animal health. To guide the effective prioritization of specific biosynthetic gene clusters (BGCs) for drug development and targeting the most prolific producer strains, knowledge about phylogenetic relationships ofStreptomycesspecies, genome-wide diversity and distribution patterns of BGCs is critical. We used genomic and phylogenetic methods to elucidate the diversity of major classes of BGCs in 1,110 publicly availableStreptomycesgenomes. Genome mining ofStreptomycesreveals high diversity of BGCs and variable distribution patterns in theStreptomycesphylogeny, even among very closely related strains. The most common BGCs are non-ribosomal peptide synthetases, type 1 polyketide synthases, terpenes, and lantipeptides. We also found that numerousStreptomycesspecies harbor BGCs known to encode antitumor compounds. We observed that strains that are considered the same species can vary tremendously in the BGCs they carry, suggesting that strain-level genome sequencing can uncover high levels of BGC diversity and potentially useful derivatives of any one compound. These findings suggest that a strain-level strategy for exploring secondary metabolites for clinical use provides an alternative or complementary approach to discovering novel pharmaceutical compounds from microbes. 
    more » « less