Title: MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters
Abstract With an ever-increasing amount of (meta)genomic data being deposited in sequence databases, (meta)genome mining for natural product biosynthetic pathways occupies a critical role in the discovery of novel pharmaceutical drugs, crop protection agents and biomaterials. The genes that encode these pathways are often organised into biosynthetic gene clusters (BGCs). In 2015, we defined the Minimum Information about a Biosynthetic Gene cluster (MIBiG): a standardised data format that describes the minimally required information to uniquely characterise a BGC. We simultaneously constructed an accompanying online database of BGCs, which has since been widely used by the community as a reference dataset for BGCs and was expanded to 2021 entries in 2019 (MIBiG 2.0). Here, we describe MIBiG 3.0, a database update comprising large-scale validation and re-annotation of existing entries and 661 new entries. Particular attention was paid to the annotation of compound structures and biological activities, as well as protein domain selectivities. Together, these new features keep the database up-to-date, and will provide new opportunities for the scientific community to use its freely available data, e.g. for the training of new machine learning models to predict sequence-structure-function relationships for diverse natural products. MIBiG 3.0 is accessible online at https://mibig.secondarymetabolites.org/.
Award ID(s):
2032243
PAR ID:
10389667
Journal Name:
Nucleic Acids Research
ISSN:
0305-1048
Sponsoring Org:
National Science Foundation
More Like This
  1. Abstract Specialized or secondary metabolites are small molecules of biological origin, often showing potent biological activities with applications in agriculture, engineering and medicine. Usually, the biosynthesis of these natural products is governed by sets of co-regulated and physically clustered genes known as biosynthetic gene clusters (BGCs). To share information about BGCs in a standardized and machine-readable way, the Minimum Information about a Biosynthetic Gene cluster (MIBiG) data standard and repository was initiated in 2015. Since its conception, MIBiG has been regularly updated to expand data coverage and remain up to date with innovations in natural product research. Here, we describe MIBiG version 4.0, an extensive update to the data repository and the underlying data standard. In a massive community annotation effort, 267 contributors performed 8304 edits, creating 557 new entries and modifying 590 existing entries, resulting in a new total of 3059 curated entries in MIBiG. Particular attention was paid to ensuring high data quality, with automated data validation using a newly developed custom submission portal prototype, paired with a novel peer-reviewing model. MIBiG 4.0 also takes steps towards a rolling release model and a broader involvement of the scientific community. MIBiG 4.0 is accessible online at https://mibig.secondarymetabolites.org/. 
  2. Abstract Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/. 
  3. Abstract Secondary metabolites (SMs) are biologically active small molecules, many of which are medically valuable. Fungal genomes contain vast numbers of SM biosynthetic gene clusters (BGCs) with unknown products, suggesting that huge numbers of valuable SMs remain to be discovered. It is challenging, however, to identify SM BGCs, among the millions present in fungi, that produce useful compounds. One solution is resistance gene-guided genome mining, which takes advantage of the fact that some BGCs contain a gene encoding a resistant version of the protein targeted by the compound produced by the BGC. The bioinformatic signature of such BGCs is that they contain an allele of an essential gene with no SM biosynthetic function, and there is a second allele elsewhere in the genome. We have developed a computer-assisted approach to resistance gene-guided genome mining that allows users to query large databases for BGCs that putatively make compounds that have targets of therapeutic interest. Working with the MycoCosm genome database, we have applied this approach to look for SM BGCs that target the proteasome β6 subunit, the target of the proteasome inhibitor fellutamide B, or HMG-CoA reductase, the target of cholesterol-reducing therapeutics such as lovastatin. Our approach proved effective, finding known fellutamide and lovastatin BGCs as well as fellutamide- and lovastatin-related BGCs with variations in the SM genes that suggest they may produce structural variants of fellutamides and lovastatin. Gratifyingly, we also found BGCs that are not closely related to lovastatin BGCs but putatively produce novel HMG-CoA reductase inhibitors. One-Sentence Summary: A new computer-assisted approach to resistance gene-directed genome mining is reported along with its use to identify fungal biosynthetic gene clusters that putatively produce proteasome and HMG-CoA reductase inhibitors.
  4. Abstract Immune checkpoint inhibitors (ICIs) have revolutionized melanoma treatment, yet patient responses remain highly variable, underscoring the need for predictive biomarkers. Emerging evidence suggests that gut microbiome composition influences ICI efficacy, though findings remain inconsistent across studies. Here, we present a meta-analysis of seven melanoma-associated microbiome cohorts (N=678) using a standardized computational pipeline to integrate microbial species, biosynthetic gene clusters (BGCs), and functional pathways. We identify Faecalibacterium SGB15346 as a key species enriched in responders, alongside the RiPP biosynthetic class and pathways involved in short-chain fatty acid fermentation. Conversely, dTDP-sugar biosynthesis correlates with non-response. Our results highlight microbial signatures and metabolic pathways associated with ICI outcomes, offering potential targets for microbiome-based interventions in personalized immunotherapy.
  5. Abstract Carbohydrate active enzymes (CAZymes) are made by various organisms for complex carbohydrate metabolism. Genome mining of CAZymes has become a routine data analysis in (meta-)genome projects, owing to the importance of CAZymes in bioenergy, microbiome, nutrition, agriculture, and global carbon recycling. In 2012, dbCAN was provided as an online web server for automated CAZyme annotation. dbCAN2 (https://bcb.unl.edu/dbCAN2) was further developed in 2018 as a meta server to combine multiple tools for improved CAZyme annotation. dbCAN2 also included CGC-Finder, a tool for identifying CAZyme gene clusters (CGCs) in (meta-)genomes. We have updated the meta server to dbCAN3 with the following new functions and components: (i) dbCAN-sub as a profile Hidden Markov Model database (HMMdb) for substrate prediction at the CAZyme subfamily level; (ii) searching against experimentally characterized polysaccharide utilization loci (PULs) with known glycan substrates of the dbCAN-PUL database for substrate prediction at the CGC level; (iii) a majority voting method to consider all CAZymes with substrate predicted from dbCAN-sub for substrate prediction at the CGC level; (iv) improved data browsing and visualization of substrate prediction results on the website. In summary, dbCAN3 not only inherits all the functions of dbCAN2, but also integrates three new methods for glycan substrate prediction.