Title: BPPRC database: a web-based tool to access and analyse bacterial pesticidal proteins
Abstract Pesticidal proteins derived from the bacterium Bacillus thuringiensis have provided the bases for a diverse array of pest management tools ranging from natural products used in organic agriculture to modern biotechnological approaches. With advances in genome sequencing technologies and protein structure determination, an increasing number of pesticidal proteins from myriad bacterial species have been identified. The Bacterial Pesticidal Protein Resource Center (BPPRC) has been established to provide informational and analytical resources on the wide range of pesticidal proteins derived from bacteria that have potential utility for arthropod management. In association with a revised nomenclature for these proteins, BPPRC contains a database that allows users to browse and download sequences. Users can search the database for the best matches to sequences of interest and can incorporate their own sequences into basic informatic analyses. These analyses include the ability to draw and export guide trees from either whole protein sequences or, in the case of the three-domain Cry proteins, from individual domains. The associated website also provides a portal for users to submit protein sequences for naming. The BPPRC provides a single authoritative source of information to which all stakeholders can be referred, including academics, government regulatory bodies, and research and development personnel in the industrial sector. The database provides information on more than 1060 pesticidal proteins derived from 13 species of bacteria, including insecticidal activities for a subset of these proteins. Database URL: www.bpprc.org and www.bpprc-db.org/
Award ID(s):
1821914
PAR ID:
10327622
Journal Name:
Database
Volume:
2022
ISSN:
1758-0463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Nojiri, Hideaki (Ed.)
    ABSTRACT Bacterial mobile genetic elements (MGEs) encode functional modules that perform both core and accessory functions for the element, the latter of which are often only transiently associated with the element. The presence of these accessory genes, which are often close homologs to primarily immobile genes, incurs high rates of false positives and, therefore, limits the usability of these databases for MGE annotation. To overcome this limitation, we analyzed 10,776,849 protein sequences derived from eight MGE databases to compile a comprehensive set of 6,140 manually curated protein families that are linked to the “life cycle” (integration/excision, replication/recombination/repair, transfer, stability/transfer/defense, and phage-specific processes) of plasmids, phages, integrative, transposable, and conjugative elements. We overlay experimental information where available to create a tiered annotation scheme of high-quality annotations and annotations inferred exclusively through bioinformatic evidence. We additionally provide an MGE-class label for each entry (e.g., plasmid or integrative element), and assign to each entry a major and minor category. The resulting database, mobileOG-db (for mobile orthologous groups), comprises over 700,000 deduplicated sequences encompassing five major mobileOG categories and more than 50 minor categories, providing a structured language and interpretable basis for an array of MGE-centered analyses. mobileOG-db can be accessed at mobileogdb.flsi.cloud.vt.edu/, where users can select, refine, and analyze custom subsets of the dynamic mobilome. IMPORTANCE The analysis of bacterial mobile genetic elements (MGEs) in genomic data is a critical step toward profiling the root causes of antibiotic resistance, phenotypic or metabolic diversity, and the evolution of bacterial genera. Existing methods for MGE annotation pose high barriers of biological and computational expertise to properly harness. 
To bridge this gap, we systematically analyzed 10,776,849 proteins derived from eight databases of MGEs to identify 6,140 MGE protein families that can serve as candidate hallmarks, i.e., proteins that can be used as “signatures” of MGEs to aid annotation. The resulting resource, mobileOG-db, provides a multilevel classification scheme that encompasses plasmid, phage, integrative, and transposable element protein families categorized into five major mobileOG categories and more than 50 minor categories. mobileOG-db thus provides a rich resource for simple and intuitive element annotation that can be integrated seamlessly into existing MGE detection pipelines and colocalization analyses. 
  2. Faust, Karoline (Ed.)
    ABSTRACT Much of our knowledge of bacterial transcription initiation has been derived from studying the promoters of Escherichia coli and Bacillus subtilis. Given the expansive diversity across the bacterial phylogeny, it is unclear how much of this knowledge can be applied to other organisms. Here, we report on bioinformatic analyses of promoter sequences of the primary σ factor (σ70) by leveraging publicly available transcription start site (TSS) sequencing data sets for nine bacterial species spanning five phyla. This analysis identifies previously unreported differences in the −35 and −10 elements of σ70-dependent promoters in several groups of bacteria. We found that Actinobacteria and Betaproteobacteria σ70-dependent promoters lack the TTG triad in their −35 element, which is predicted to be conserved across the bacterial phyla. In addition, the majority of the Alphaproteobacteria σ70-dependent promoters analyzed lacked the thymine at position −7 that is highly conserved in other phyla. Bioinformatic examination of the Alphaproteobacteria σ70-dependent promoters identifies a significant overrepresentation of essential genes and ones encoding proteins with common cellular functions downstream of promoters containing an A, C, or G at position −7. We propose that transcription of many σ70-dependent promoters in Alphaproteobacteria depends on the transcription factor CarD, which is an essential protein in several members of this phylum. Our analysis expands the knowledge of promoter architecture across the bacterial phylogeny and provides new information that can be used to engineer bacteria for use in medical, environmental, agricultural, and biotechnological processes. IMPORTANCE Transcription of DNA to RNA by RNA polymerase is essential for cells to grow, develop, and respond to stress. Understanding the process and control of transcription is important for health, disease, the environment, and biotechnology. 
Decades of research on a few bacteria have identified promoter DNA sequences that are recognized by the σ subunit of RNA polymerase. We used bioinformatic analyses to reveal previously unreported differences in promoter DNA sequences across the bacterial phylogeny. We found that many Actinobacteria and Betaproteobacteria promoters lack a sequence in their −35 DNA recognition element that was previously assumed to be conserved and that Alphaproteobacteria lack a thymine residue at position −7, also previously assumed to be conserved. Our work reports important new information about bacterial transcription, illustrates the benefits of studying bacteria across the phylogenetic tree, and proposes new lines of future investigation. 
  3.
    Abstract PULs (polysaccharide utilization loci) are discrete gene clusters of CAZymes (Carbohydrate Active EnZymes) and other genes that work together to digest and utilize carbohydrate substrates. While PULs have been extensively characterized in Bacteroidetes, there exist PULs from other bacterial phyla, as well as archaea and metagenomes, that remain to be catalogued in a database for efficient retrieval. We have developed an online database dbCAN-PUL (http://bcb.unl.edu/dbCAN_PUL/) to display experimentally verified CAZyme-containing PULs from literature with pertinent metadata, sequences, and annotation. Compared to other online CAZyme and PUL resources, dbCAN-PUL has the following new features: (i) batch download of PUL data by target substrate, species/genome, genus, or experimental characterization method; (ii) annotation for each PUL that displays associated metadata such as substrate(s), experimental characterization method(s), and protein sequence information; (iii) links to external annotation pages for CAZymes (CAZy), transporters (UniProt), and other genes; (iv) display of homologous gene clusters in GenBank sequences via an integrated MultiGeneBlast tool; and (v) an integrated BLASTX service available for users to query their sequences against PUL proteins in dbCAN-PUL. With these features, dbCAN-PUL will be an important repository for CAZyme and PUL research, complementing our other web servers and databases (dbCAN2, dbCAN-seq). 
  4. SARSNTdb offers a curated, nucleotide-centric database for users of varying levels of SARS-CoV-2 knowledge. Its user-friendly interface enables querying coding regions and coordinate intervals to find out the various functional and selective constraints that act upon the corresponding nucleotides and amino acids. Users can easily obtain information about viral genes and proteins, functional domains, repeats, secondary structure formation, intragenomic interactions, and mutation prevalence. Currently, many databases are focused on the phylogeny and amino acid substitutions, mainly in the spike protein. We took a novel, more nucleotide-focused approach as RNA does more than just code for proteins and many insights can be gleaned from its study. For example, RNA-targeted drug therapies for SARS-CoV-2 are currently being developed and it is essential to understand the features only visible at that level. This database enables the user to identify regions that are more prone to forming secondary structures that drugs can target. SARSNTdb also provides illustrative mutation data from a subset of ~25,000 patient samples with a reliable read coverage across the whole genome, from different locations and time points in the pandemic. Finally, the database allows for comparing SARS-CoV-2 and SARS-CoV domains and sequences. SARSNTdb can serve the research community by being a curated repository for information that gives a jump start to analyzing a mutation’s effect far beyond just determining synonymous/non-synonymous substitutions in protein sequences. 
  5. Ruby, Edward G. (Ed.)
    ABSTRACT Iron acquisition is essential for almost all living organisms. In certain environments, ferrous iron is the most prevalent form of this element. Feo is the most widespread system for ferrous iron uptake in bacteria and is critical for virulence in some species. The canonical architecture of Feo consists of a large transmembrane nucleoside triphosphatase (NTPase) protein, FeoB, and two accessory cytoplasmic proteins, FeoA and FeoC. The role of the latter components and the mechanism by which Feo orchestrates iron transport are unclear. In this study, we conducted a comparative analysis of Feo protein sequences to gain insight into the evolutionary history of this transporter. We identified instances of how horizontal gene transfer contributed to the evolution of Feo. Also, we found that FeoC, while absent in most lineages, is largely present in the Gammaproteobacteria group, although its sequence is poorly conserved. We propose that FeoC, which may couple FeoB NTPase activity with pore opening, was an ancestral element that has been dispensed with through mutations in FeoA and FeoB in some lineages. We provide experimental evidence supporting this hypothesis by isolating and characterizing FeoC-independent mutants of the Vibrio cholerae Feo system. Also, we confirmed that the closely related species Shewanella oneidensis does not require FeoC; thus, Vibrio FeoC sequences may resemble transitional forms on an evolutionary pathway toward FeoC-independent transporters. Finally, by combining data from our bioinformatic analyses with this experimental evidence, we propose an evolutionary model for the Feo system in bacteria. IMPORTANCE Feo, a ferrous iron transport system composed of three proteins (FeoA, -B, and -C), is the most prevalent bacterial iron transporter. It plays an important role in iron acquisition in low-oxygen environments and some host-pathogen interactions. 
The large transmembrane protein FeoB provides the channel for the transport of iron into the bacterial cell, but the functions of the two small, required accessory proteins FeoA and FeoC are not well understood. Analysis of the evolution of this transporter shows that FeoC is poorly conserved and has been lost from many bacterial lineages. Experimental evidence indicates that FeoC may have different functions in different species that retain this protein, and the loss of FeoC is promoted by mutations in FeoA or by the fusion of FeoA and FeoB. 