

Title: Bioinformatics Investigations of Universal Stress Proteins from Mercury-Methylating Desulfovibrionaceae
The presence of methylmercury in aquatic environments and marine food sources is of global concern. The chemical reaction that adds a methyl group to inorganic mercury occurs in diverse bacterial taxonomic groups, including the Gram-negative, sulfate-reducing Desulfovibrionaceae family, whose members inhabit extreme aquatic environments. The availability of whole-genome sequence datasets for members of the Desulfovibrionaceae presents opportunities to understand the microbial mechanisms that contribute to methylmercury production in extreme aquatic environments. We have applied bioinformatics resources and developed visual analytics resources to categorize a collection of 719 putative universal stress protein (USP) sequences predicted from 93 genomes of Desulfovibrionaceae. We have focused our bioinformatics investigations on protein sequence analytics by developing interactive visualizations to categorize Desulfovibrionaceae universal stress proteins by protein domain composition and functionally important amino acids. We identified 651 Desulfovibrionaceae universal stress protein sequences, of which 488 sequences had only one USP domain and 163 had two USP domains. The 488 single USP domain sequences were further categorized into 340 sequences with an ATP-binding motif and 148 sequences without an ATP-binding motif. The 163 double USP domain sequences were categorized into (1) both USP domains with an ATP-binding motif (3 sequences); (2) both USP domains without an ATP-binding motif (138 sequences); and (3) one USP domain with an ATP-binding motif (21 sequences). We developed visual analytics resources to facilitate the investigation of these categories of datasets in the presence or absence of the mercury-methylating gene pair (hgcAB).
Future research could utilize these functional categories to investigate the participation of universal stress proteins in the bacterial cellular uptake of inorganic mercury and methylmercury production, especially in anaerobic aquatic environments.
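The categorization scheme described in the abstract (number of USP domains, then presence of an ATP-binding motif in each domain) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `categorize_usp`, the category labels, the "other" fallback, and the example records are all hypothetical, and the sketch assumes that domain prediction and motif detection have already been done upstream, yielding one boolean per USP domain.

```python
from collections import Counter


def categorize_usp(domain_has_atp_motif):
    """Assign a sequence to a category from per-domain motif flags.

    domain_has_atp_motif: list of booleans, one per predicted USP domain,
    True if that domain carries an ATP-binding motif (hypothetical input).
    """
    n_domains = len(domain_has_atp_motif)
    n_with_motif = sum(bool(flag) for flag in domain_has_atp_motif)

    if n_domains == 1:
        if n_with_motif == 1:
            return "single domain, with ATP-binding motif"
        return "single domain, without ATP-binding motif"
    if n_domains == 2:
        if n_with_motif == 2:
            return "double domain, both with ATP-binding motif"
        if n_with_motif == 0:
            return "double domain, both without ATP-binding motif"
        return "double domain, one with ATP-binding motif"
    # Assumption: anything else (no domain, or more than two) is set aside.
    return "other"


# Hypothetical example records: sequence identifier -> per-domain flags.
records = {
    "seqA": [True],
    "seqB": [False],
    "seqC": [True, False],
    "seqD": [False, False],
    "seqE": [True, True],
}

# Tally how many sequences fall into each category.
category_counts = Counter(categorize_usp(flags) for flags in records.values())

for category, count in sorted(category_counts.items()):
    print(f"{category}: {count}")
```

Tallying with `Counter` keeps the per-category totals in one place, so the same table could be split further, for example by whether the source genome carries the hgcAB gene pair.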
Award ID(s):
2029363 1901377
NSF-PAR ID:
10295331
Journal Name:
Microorganisms
Volume:
9
Issue:
8
ISSN:
2076-2607
Page Range / eLocation ID:
1780
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Motivation

    The development of proteomic methods for the characterization of domain/motif interactions has greatly expanded our understanding of signal transduction. However, proteomics-based binding screens have limitations including that the queried tissue or cell type may not harbor all potential interacting partners or post-translational modifications (PTMs) required for the interaction. Therefore, we sought a generalizable, complementary in silico approach to identify potentially novel motif and PTM-dependent binding partners of high priority.

    Results

    We used as an initial example the interaction between the Src homology 2 (SH2) domains of the adaptor proteins CT10 regulator of kinase (CRK) and CRK-like (CRKL) and phosphorylated-YXXP motifs. Employing well-curated, publicly-available resources, we scored and prioritized potential CRK/CRKL–SH2 interactors possessing signature characteristics of known interacting partners. Our approach gave high priority scores to 102 of the >9000 YXXP motif-containing proteins. Within this 102 were 21 of the 25 curated CRK/CRKL–SH2-binding partners showing a more than 80-fold enrichment. Several predicted interactors were validated biochemically. To demonstrate generalized applicability, we used our workflow to predict protein–protein interactions dependent upon motif-specific arginine methylation. Our data demonstrate the applicability of our approach to, conceivably, any modular binding domain that recognizes a specific post-translationally modified motif.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Abstract

    Motivation

    Due to the nature of experimental annotation, most protein function prediction methods operate at the protein-level, where functions are assigned to full-length proteins based on overall similarities. However, most proteins function by interacting with other proteins or molecules, and many functional associations should be limited to specific regions rather than the entire protein length. Most domain-centric function prediction methods depend on accurate domain family assignments to infer relationships between domains and functions, with regions that are unassigned to a known domain-family left out of functional evaluation. Given the abundance of residue-level annotations currently available, we present a function prediction methodology that automatically infers function labels of specific protein regions using protein-level annotations and multiple types of region-specific features.

    Results

    We apply this method to local features obtained from InterPro, UniProtKB and amino acid sequences and show that this method improves both the accuracy and region-specificity of protein function transfer and prediction. We compare region-level predictive performance of our method against that of a whole-protein baseline method using proteins with structurally verified binding sites and also compare protein-level temporal holdout predictive performances to expand the variety and specificity of GO terms we could evaluate. Our results can also serve as a starting point to categorize GO terms into region-specific and whole-protein terms and select prediction methods for different classes of GO terms.

    Availability and implementation

    The code and features are freely available at: https://github.com/ek1203/rsfp.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Ubiquitin-like proteins (Ubls) share some features with ubiquitin (Ub), such as their globular 3D structure and the ability to attach covalently to other proteins. Interferon Stimulated Gene 15 (ISG15) is an abundant Ubl that, similar to Ub, marks many hundreds of cellular proteins, altering their fate. In contrast to Ub, ISG15 requires interferon (IFN) induction to conjugate efficiently to other proteins. Moreover, despite the multitude of E3 ligases for Ub-modified targets, a single E3 ligase termed HERC5 (in humans) is responsible for the bulk of ISG15 conjugation. Targets include both viral and cellular proteins spanning an array of cellular compartments and metabolic pathways. So far, no common structural or biochemical feature has been attributed to these diverse substrates, raising questions about how and why they are selected. Conjugation of ISG15 mitigates some viral and bacterial infections and is linked to a lower viral load, pointing to the role of ISG15 in the cellular immune response. In an apparent attempt to evade the immune response, some viruses try to interfere with the ISG15 pathway. For example, deconjugation of ISG15 appears to be an approach taken by coronaviruses to interfere with ISG15 conjugates. Specifically, coronaviruses such as SARS-CoV, MERS-CoV, and SARS-CoV-2 encode papain-like proteases (PLpro) that bear striking structural and catalytic similarities to the catalytic core domain of eukaryotic deubiquitinating enzymes of the Ubiquitin-Specific Protease (USP) sub-family. The cleavage specificity of these PLpro enzymes is for flexible polypeptides containing a consensus sequence (R/K)LXGG, enabling them to function on two seemingly unrelated categories of substrates: (i) the viral polyprotein 1 (PP1a, PP1ab) and (ii) Ub- or ISG15-conjugates.
As a result, PLpro enzymes process the viral polyprotein 1 into an array of functional proteins for viral replication (termed non-structural proteins; NSPs), and they can remove Ub or ISG15 units from conjugates. However, by de-conjugating ISG15, the virus also creates free ISG15, which in turn may affect the immune response in two opposite ways: free ISG15 negatively regulates IFN signaling in humans by binding non-catalytically to USP18, yet at the same time free ISG15 can be secreted from the cell and induce the IFN pathway of the neighboring cells. A deeper understanding of this protein-modification pathway and the mechanisms of the enzymes that counteract it will bring about effective clinical strategies related to viral and bacterial infections.
  4. The CRISPR-associated protein 9 (Cas9) has been engineered as a precise gene editing tool to make double-strand breaks. CRISPR-associated protein 9 binds the folded guide RNA (gRNA) that serves as a binding scaffold to guide it to the target DNA duplex via a RecA-like strand-displacement mechanism but without ATP binding or hydrolysis. The target search begins with the protospacer adjacent motif or PAM-interacting domain, recognizing it at the major groove of the duplex and melting its downstream duplex where an RNA-DNA heteroduplex is formed at nanomolar affinity. The rate-limiting step is the formation of an R-loop structure where the HNH domain inserts between the target heteroduplex and the displaced non-target DNA strand. Once the R-loop structure is formed, the non-target strand is rapidly cleaved by RuvC and ejected from the active site. This event is immediately followed by cleavage of the target DNA strand by the HNH domain and product release. Within CRISPR-associated protein 9, the HNH domain is inserted into the RuvC domain near the RuvC active site via two linker loops that provide allosteric communication between the two active sites. Due to the high flexibility of these loops and active sites, biophysical techniques have been instrumental in characterizing the dynamics and mechanism of the CRISPR-associated protein 9 nucleases, aiding structural studies in the visualization of the complete active sites and relevant linker structures. Here, we review biochemical, structural, and biophysical studies on the underlying mechanism with emphasis on how CRISPR-associated protein 9 selects the target DNA duplex and rejects non-target sequences. 