
Title: Negatively charged, intrinsically disordered regions can accelerate target search by DNA-binding proteins

In eukaryotes, many DNA/RNA-binding proteins possess intrinsically disordered regions (IDRs) with large negative charge, some of which involve a consecutive sequence of aspartate (D) or glutamate (E) residues. We refer to them as D/E repeats. The functional role of D/E repeats is not well understood, though some of them are known to cause autoinhibition through intramolecular electrostatic interaction with functional domains. In this work, we investigated the impacts of D/E repeats on the target DNA search kinetics for the high-mobility group box 1 (HMGB1) protein and the artificial protein constructs of the Antp homeodomain fused with D/E repeats of varied lengths. Our experimental data showed that D/E repeats of particular lengths can accelerate the target association in the overwhelming presence of non-functional high-affinity ligands (‘decoys’). Our coarse-grained molecular dynamics (CGMD) simulations showed that the autoinhibited proteins can bind to DNA and transition into the uninhibited complex with DNA through an electrostatically driven induced-fit process. In conjunction with the CGMD simulations, our kinetic model can explain how D/E repeats can accelerate the target association process in the presence of decoys. This study illuminates an unprecedented role of the negatively charged IDRs in the target search process.

Publisher / Repository:
Oxford University Press
Journal Name:
Nucleic Acids Research
Page Range / eLocation ID:
p. 4701-4712
Sponsoring Org:
National Science Foundation
More Like this
  1. Marshall, Christopher W. (Ed.)
    ABSTRACT Identification of genes encoding β-lactamases (BLs) from short-read sequences remains challenging due to the high frequency of shared amino acid functional domains and motifs in proteins encoded by BL genes and related non-BL gene sequences. Divergent BL homologs can be frequently missed during similarity searches, which has important practical consequences for monitoring antibiotic resistance. To address this limitation, we built ROCker models that targeted broad classes (e.g., class A, B, C, and D) and individual families (e.g., TEM) of BLs and challenged them with mock 150-bp- and 250-bp-read data sets of known composition. ROCker identifies most-discriminant bit score thresholds in sliding windows along the sequence of the target protein sequence and hence can account for nondiscriminative domains shared by unrelated proteins. BL ROCker models showed a 0% false-positive rate (FPR), a 0% to 4% false-negative rate (FNR), and an up-to-50-fold-higher F1 score [2 × precision × recall/(precision + recall)] compared to alternative methods, such as similarity searches using BLASTx with various e-value thresholds and BL hidden Markov models, or tools like DeepARG, ShortBRED, and AMRFinder. The ROCker models and the underlying protein sequence reference data sets and phylogenetic trees for read placement are freely available through . Application of these BL ROCker models to metagenomics, metatranscriptomics, and high-throughput PCR gene amplicon data should facilitate the reliable detection and quantification of BL variants encoded by environmental or clinical isolates and microbiomes and more accurate assessment of the associated public health risk, compared to the current practice. IMPORTANCE Resistance genes encoding β-lactamases (BLs) confer resistance to the widely prescribed antibiotic class β-lactams. 
    Therefore, it is important to assess the prevalence of BL genes in clinical or environmental samples to monitor the spread of these genes into pathogens and to estimate public health risk. However, detecting BLs in short-read sequence data is technically challenging. Our ROCker model-based bioinformatics approach showcases the reliable detection and typing of BLs in complex data sets and thus contributes toward solving an important problem in antibiotic resistance surveillance. The ROCker models developed substantially expand the toolbox for monitoring antibiotic resistance in clinical or environmental settings.
  2. Background

    Sequence‐specific binding by transcription factors (TFs) plays a significant role in the selection and regulation of target genes. At the protein:DNA interface, amino acid side‐chains construct a diverse physicochemical network of specific and non‐specific interactions, and seemingly subtle changes in amino acid identity at certain positions may dramatically impact TF:DNA binding. Variation of these specificity‐determining residues (SDRs) is a major mechanism of functional divergence between TFs with strong structural or sequence homology.


    In this study, we employed a combination of high‐throughput specificity profiling by SELEX and Spec‐seq, structural modeling, and evolutionary analysis to probe the binding preferences of winged helix‐turn‐helix TFs belonging to the OmpR sub‐family in Escherichia coli.


    We found that E. coli OmpR paralogs recognize tandem, variably spaced repeats composed of “GT‐A” or “GCT”‐containing half‐sites. Some divergent sequence preferences observed within the “GT‐A” mode correlate with amino acid similarity; conversely, “GCT”‐based motifs were observed for a subset of paralogs with low sequence homology. Direct specificity profiling of a subset of OmpR homologues (CpxR, RstA, and OmpR) as well as predicted “SDR‐swap” variants revealed that individual SDRs may impact sequence preferences locally through direct contact with DNA bases or distally via the DNA backbone.


    Overall, our work provides evidence for a common structural “code” for sequence‐specific wHTH‐DNA interactions, and demonstrates that surprisingly modest residue changes can enable recognition of highly divergent sequence motifs. Further examination of SDR predictions will likely reveal additional mechanisms controlling the evolutionary divergence of this important class of transcriptional regulators.

  3. Naturally occurring amino acids have been broadly used as additives to improve protein solubility and inhibit aggregation. In this study, improvements in protein signal intensity obtained with the addition of L-serine and structural analogs to the desorption electrospray ionization mass spectrometry (DESI-MS) spray solvent were measured. The results were interpreted in light of proposed mechanisms of solution additive effects on protein solubility and dissolution. DESI-MS allows these processes to be studied efficiently using dilute concentrations of additives and small amounts of proteins, advantages that represent real benefits compared to classical methods of studying protein stability and aggregation. We show that serine significantly increases the protein signal in DESI-MS when native proteins are undergoing unfolding during the dissolution process with an acidic solvent system (p-value = 0.0001), or with ammonium bicarbonate under denaturing conditions for proteins with high isoelectric points (p-value = 0.001). We establish that a similar increase in the protein signal cannot be observed with direct ESI-MS, and the observed increase is therefore not related to ionization processes or changes in the physical properties of the bulk solution. The importance of the presence of serine during protein conformational changes while undergoing dissolution is demonstrated through comparisons between the analyses of proteins deposited in native or unfolded states and by using native state-preserving and denaturing desorption solvents. We hypothesize that direct, non-covalent interactions involving all three functional groups of serine are involved in the beneficial effect on protein solubility and dissolution.
    Supporting evidence for a direct interaction includes a reduction in efficacy with D-serine or the racemic mixture, indicating a non-bulk-solution physical property effect; insensitivity to the sample surface type or relative placement of serine addition; and a reduction in efficacy with any modifications to the serine structure, most notably the carboxyl functional group. An alternative hypothesis, also supported by some of our observations, could involve the role of serine clusters in the mechanism of solubility enhancement. Our study demonstrates the capability of DESI-MS together with complementary ESI-MS experiments as a novel tool for understanding protein solubility and dissolution and investigating the mechanism of action for solubility-enhancing additives.
  4. Ubiquitin and ubiquitin-like proteins (UBLs) play key roles in eukaryotes. These proteins are attached to their target proteins through an E1-E2-E3 cascade and modify the functions of these proteins. Since the discovery of ubiquitin, several UBLs have been identified, including Nedd8, SUMO, ISG15, and Atg8. Ubiquitin and UBLs share a similar three-dimensional structure: a β-grasp fold and an X-X-[R/A/E/K]-X-X-[G/X]-G motif at the C-terminus. We have previously reported that ubiquitin-, Nedd8-, and SUMO-mimicking peptides, which all contain the conserved motif X-X-[R/A/E/K]-X-X-[G/X]-G, still retained their reactivity toward their corresponding E1, E2, and E3 enzymes. In our current study, we investigated whether such C-terminal peptides could still be transferred onto related pathway enzymes to probe the function of these enzymes when they are fused with a protein. By bioinformatic search of protein databases, we selected eight proteins carrying the X-X-[R/A/E/K]-X-X-[G/X]-G motif at the C-terminus of the β-grasp fold. We synthesized the C-terminal sequences of these candidates as short peptides and found that three of them showed significant reactivity with the ubiquitin E1 enzyme Ube1. We next fused the three reactive short peptides to three different protein frames, including their respective native protein frames, a ubiquitin frame, and a peptidyl carrier protein (PCP) frame, and measured the reactivities of these peptide-fused proteins with Ube1. Peptide-fused proteins on ubiquitin and PCP frames showed obvious reactivity with Ube1. However, when we measured transfer to the E2 enzyme UbcH7, we found that the PCP-peptide fusions lost their reactivity with UbcH7. Taken together, these results suggest that the recognition of peptide-fused proteins by E2 enzymes depends not only on the C-terminal sequences of the ubiquitin-mimicking peptides, but also on the overall structures of the protein frames.
  5. Protein-DNA interactions are critical for the successful functioning of all natural systems. The key role in these interactions is played by the processes by which proteins search for specific sites on DNA. Although these processes have been studied for many years, only recently have their microscopic aspects become clearer. In this work, we present a review of the current theoretical understanding of the molecular mechanisms of the protein target search. A comprehensive discrete-state stochastic method to explain the dynamics of the protein search phenomena is introduced and explained. Our theoretical approach utilizes a first-passage analysis and takes into account the most relevant physical-chemical processes. It is able to describe many fascinating features of the protein search, including unusually high effective association rates, high selectivity and specificity, and robustness in the presence of crowders and sequence heterogeneity.