Abstract Carbohydrate active enzymes (CAZymes) are made by various organisms for complex carbohydrate metabolism. Genome mining of CAZymes has become a routine data analysis in (meta-)genome projects, owing to the importance of CAZymes in bioenergy, microbiome, nutrition, agriculture, and global carbon recycling. In 2012, dbCAN was provided as an online web server for automated CAZyme annotation. dbCAN2 (https://bcb.unl.edu/dbCAN2) was further developed in 2018 as a meta server to combine multiple tools for improved CAZyme annotation. dbCAN2 also included CGC-Finder, a tool for identifying CAZyme gene clusters (CGCs) in (meta-)genomes. We have updated the meta server to dbCAN3 with the following new functions and components: (i) dbCAN-sub as a profile Hidden Markov Model database (HMMdb) for substrate prediction at the CAZyme subfamily level; (ii) searching against experimentally characterized polysaccharide utilization loci (PULs) with known glycan substrates of the dbCAN-PUL database for substrate prediction at the CGC level; (iii) a majority voting method to consider all CAZymes with substrates predicted from dbCAN-sub for substrate prediction at the CGC level; (iv) improved data browsing and visualization of substrate prediction results on the website. In summary, dbCAN3 not only inherits all the functions of dbCAN2, but also integrates three new methods for glycan substrate prediction.
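To illustrate item (iii), a majority vote over per-CAZyme substrate predictions within one CGC might be sketched as follows. This is a minimal sketch, not the dbCAN3 implementation: the function name `majority_vote_substrate`, the input format, and the tie-handling rule are assumptions made for illustration, since the abstract does not say how votes are weighted or how ties are resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote_substrate(predictions: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most common substrate among the CAZymes of one CGC.

    `predictions` holds one entry per CAZyme in the cluster; None means
    no substrate was predicted for that CAZyme. Hypothetical helper,
    not part of dbCAN3.
    """
    # Count only CAZymes that actually received a substrate prediction.
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    top_substrate, top_count = ranked[0]
    # Assumed rule: a tie for first place yields no substrate call.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_substrate
```

For example, a cluster whose CAZymes are predicted as xylan, xylan, and pectin would be assigned xylan under this rule, while a cluster split evenly between two substrates would be left unassigned.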
dbCAN-seq update: CAZyme gene clusters and substrates in microbiomes
Abstract Carbohydrate Active EnZymes (CAZymes) are important for microbial communities to thrive in carbohydrate rich environments such as animal guts, agricultural soils, forest floors, and ocean sediments. Since 2017, microbiome sequencing and assembly have produced numerous metagenome assembled genomes (MAGs). We have updated our dbCAN-seq database (https://bcb.unl.edu/dbCAN_seq) to include the following new data and features: (i) ∼498 000 CAZymes and ∼169 000 CAZyme gene clusters (CGCs) from 9421 MAGs of four ecological environments (human gut, human oral, cow rumen, and marine); (ii) glycan substrates for 41 447 (24.54%) CGCs inferred by two novel approaches (dbCAN-PUL homology search and eCAMI subfamily majority voting), which agreed on substrate assignments for 4183 CGCs; (iii) a redesigned CGC page to include the graphical display of CGC gene compositions, the alignment of query CGC and subject PUL (polysaccharide utilization loci) of dbCAN-PUL, and the eCAMI subfamily table to support the predicted substrates; (iv) a statistics page to organize all the data for easy CGC access according to substrates and taxonomic phyla; and (v) a batch download page. In summary, this updated dbCAN-seq database highlights glycan substrates predicted for CGCs from microbiomes. Future work will implement the substrate prediction function in our dbCAN2 web server.
- Award ID(s): 1933521
- PAR ID: 10380857
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Nucleic Acids Research
- Volume: 51
- Issue: D1
- ISSN: 0305-1048
- Page Range / eLocation ID: p. D557-D563
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract PULs (polysaccharide utilization loci) are discrete gene clusters of CAZymes (Carbohydrate Active EnZymes) and other genes that work together to digest and utilize carbohydrate substrates. While PULs have been extensively characterized in Bacteroidetes, there exist PULs from other bacterial phyla, as well as archaea and metagenomes, that remain to be catalogued in a database for efficient retrieval. We have developed an online database dbCAN-PUL (http://bcb.unl.edu/dbCAN_PUL/) to display experimentally verified CAZyme-containing PULs from literature with pertinent metadata, sequences, and annotation. Compared to other online CAZyme and PUL resources, dbCAN-PUL has the following new features: (i) batch download of PUL data by target substrate, species/genome, genus, or experimental characterization method; (ii) annotation for each PUL that displays associated metadata such as substrate(s), experimental characterization method(s), and protein sequence information; (iii) links to external annotation pages for CAZymes (CAZy), transporters (UniProt), and other genes; (iv) display of homologous gene clusters in GenBank sequences via the integrated MultiGeneBlast tool; and (v) an integrated BLASTX service for users to query their sequences against PUL proteins in dbCAN-PUL. With these features, dbCAN-PUL will be an important repository for CAZyme and PUL research, complementing our other web servers and databases (dbCAN2, dbCAN-seq).
-
Abstract Chemoenzymatic approaches using carbohydrate‐active enzymes (CAZymes) offer a promising avenue for the synthesis of glycans like oligosaccharides. Here, we report a novel chemoenzymatic route for cellodextrin synthesis employed by chimeric CAZymes, akin to native glycosyltransferases, involving the unprecedented participation of a “non‐catalytic” lectin‐like domain or carbohydrate‐binding modules (CBMs) in the catalytic step for glycosidic bond synthesis using β‐cellobiosyl donor sugars as activated substrates. CBMs are often thought to play a passive substrate targeting role in enzymatic glycosylation reactions, mostly by overcoming substrate diffusion limitations for tethered catalytic domains (CDs), but are not known to participate directly in any nucleophilic substitution mechanisms that impact the actual glycosyl transfer step. This study provides evidence for the direct participation of CBMs in the catalytic reaction step for β‐glucan glycosidic bond synthesis, enhancing activity for CBM‐based CAZyme chimeras by >140‐fold over CDs alone. Dynamic intradomain interactions that facilitate this poorly understood reaction mechanism were further revealed by small‐angle X‐ray scattering structural analysis along with detailed mutagenesis studies, shedding light on our currently limited understanding of similar transglycosylation‐type reaction mechanisms. In summary, our study provides a novel strategy for engineering similar CBM‐based CAZyme chimeras for the synthesis of bespoke oligosaccharides using simple activated sugar monomers.
-
Abstract Fungi play pivotal roles in terrestrial ecosystems as decomposers, pathogens, and endophytes, yet their significance in marine environments is often understudied. Seagrasses, as globally distributed marine flowering plants, have critical ecological functions, but knowledge about their associated fungal communities remains relatively limited. Previous amplicon surveys of the fungal community associated with the seagrass Zostera marina have revealed an abundance of potentially novel chytrids. In this study, we employed deep metagenomic sequencing to extract metagenome-assembled genomes (MAGs) from these chytrids and other microbial eukaryotes associated with Z. marina leaves. Our efforts resulted in the recovery of five eukaryotic MAGs, including a single fungal MAG in the order Lobulomycetales (65% BUSCO completeness), three MAGs representing diatoms in the family Bacillariaceae (93%, 70% and 31% BUSCO completeness) and a single MAG representing a haptophyte alga in the genus Prymnesium (40% BUSCO completeness). Whole-genome phylogenomic assessment of these MAGs suggests they all largely represent under-sequenced, and possibly novel, eukaryotic lineages. Of particular interest, the chytrid MAG was placed within the order Lobulomycetales, consistent with the identity of the dominant chytrid from previous Z. marina amplicon survey results. Annotation of this MAG yielded 5,650 gene models, of which 77% shared homology to current databases. Within these gene models, we predicted 121 carbohydrate-active enzymes and 393 secreted proteins (103 cytoplasmic effectors, 30 apoplastic effectors). Exploration of orthologs between the Lobulomycetales MAG and existing Chytridiomycota genomes has revealed a landscape of high-copy gene families related to host recognition and interaction. Further machine learning analyses based on carbohydrate-active enzyme composition predict that this MAG is a symbiont. Overall, these five eukaryotic MAGs represent substantial genomic novelty and valuable community resources, contributing to a deeper understanding of the roles of fungi and other microbial eukaryotes in the larger seagrass ecosystem.
-
Birol, Inanc (Ed.) Abstract Motivation: Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad hoc effort that requires expert biological knowledge. Results: Here, we introduce CellMeSH, a new automated approach to identifying cell types for clusters based on prior literature. CellMeSH combines a database of gene–cell-type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell-type information from millions of publications using existing indexed literature resources. Compared to manually constructed databases, CellMeSH is more comprehensive and is easily updated with new data. The probabilistic query method enables reliable information retrieval even though the gene–cell-type associations extracted from the literature are noisy. CellMeSH is also able to optionally utilize prior knowledge about tissues or cells for further annotation improvement. CellMeSH achieves top-one and top-three accuracies on a number of mouse and human datasets that are consistently better than existing approaches. Availability and implementation: Web server at https://uncurl.cs.washington.edu/db_query and API at https://github.com/shunfumao/cellmesh. Supplementary information: Supplementary data are available at Bioinformatics online.
