


Title: Structural predictions of protein–DNA binding: MELD-DNA
Abstract

Structural, regulatory and enzymatic proteins interact with DNA to maintain a healthy and functional genome. Yet our structural understanding of how proteins interact with DNA remains limited. We present MELD-DNA, a novel computational approach for predicting the structures of protein–DNA complexes. The method combines molecular dynamics simulations with general knowledge or experimental information through Bayesian inference. The physical model is sensitive to sequence-dependent properties and to the conformational changes required for binding, while the incorporated information accelerates sampling of bound conformations. MELD-DNA can: (i) sample multiple binding modes; (ii) identify the preferred binding mode from the ensembles; and (iii) provide qualitative binding preferences between DNA sequences. We first assess performance on a dataset of 15 protein–DNA complexes and compare it with state-of-the-art methodologies. We then show, for three selected complexes, that MELD predictions capture sequence-dependent effects on binding. We expect that the results presented herein, together with the freely available software, will impact structural biology (by complementing DNA structural databases) and molecular recognition (by bringing new insights into aspects governing protein–DNA interactions).
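Restraint-guided simulation methods in the MELD family are generally described as turning prior knowledge or experimental data into distance restraints and enforcing only a fraction of them at any time, typically the best-satisfied ones, so that unreliable or mode-specific information can be ignored. The sketch below illustrates only that restraint-selection idea in plain Python. The flat-bottom form, the force constant `k`, the function names and the distances are hypothetical assumptions; this is not the MELD-DNA API, and it omits the molecular dynamics and Bayesian sampling machinery the method actually relies on.

```python
def flat_bottom_energy(r, r_low, r_high, k=250.0):
    """Flat-bottom harmonic restraint: zero inside [r_low, r_high], harmonic outside.

    The force constant k is an illustrative assumption, not a MELD-DNA default.
    """
    if r < r_low:
        return 0.5 * k * (r_low - r) ** 2
    if r > r_high:
        return 0.5 * k * (r - r_high) ** 2
    return 0.0


def active_restraint_energy(distances, bounds, fraction, k=250.0):
    """Sum the energies of only the best-satisfied `fraction` of restraints.

    The remaining (most violated) restraints are ignored, so information that is
    partly wrong, or that belongs to a different binding mode, need not be obeyed.
    """
    energies = sorted(
        flat_bottom_energy(r, lo, hi, k) for r, (lo, hi) in zip(distances, bounds)
    )
    n_active = int(round(fraction * len(energies)))
    return sum(energies[:n_active])


# Hypothetical protein-DNA contact distances (nm), each with an upper bound of 0.6 nm.
distances = [0.4, 0.9, 1.5, 0.5]
bounds = [(0.0, 0.6)] * 4
print(active_restraint_energy(distances, bounds, fraction=0.5))   # 0.0
print(active_restraint_energy(distances, bounds, fraction=0.75))  # about 11.25
```

Raising `fraction` makes more restraints count toward the energy: in this toy example half of the contacts can be satisfied at zero cost, whereas three quarters cannot.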

 
Award ID(s):
2235785
NSF-PAR ID:
10394947
Publisher / Repository:
Oxford University Press
Journal Name:
Nucleic Acids Research
Volume:
51
Issue:
4
ISSN:
0305-1048
Size(s):
p. 1625-1636
Sponsoring Org:
National Science Foundation
More Like this
  1. Parent, Kristin N (Ed.)
    ABSTRACT

    Most icosahedral DNA viruses package and condense their genomes into pre-formed, volumetrically constrained capsids. However, concurrent genome biosynthesis and packaging are specific to single-stranded (ss) DNA micro- and parvoviruses. Before packaging, ~120 copies of the øX174 DNA-binding protein J interact with double-stranded DNA. Sixty J proteins enter the procapsid with the ssDNA genome, guiding it between 60 icosahedrally ordered DNA-binding pockets formed by the capsid proteins. Although J proteins are small, 28–37 residues in length, they have two domains. The basic, positively charged N-terminus guides the genome between binding pockets, whereas the C-terminus acts as an anchor to the capsid’s inner surface. Three C-terminal aromatic residues, W30, Y31, and F37, interact most extensively with the coat protein. Their corresponding codons were mutated, and the resulting strains were biochemically and genetically characterized. Depending on the mutation, the substitutions produced unstable packaging complexes, unstable virions, infectious progeny, or particles packaged with smaller genomes, the latter being a novel phenomenon. The smaller genomes contained internal deletions. The juncture sequences suggest that the unessential A* (A star) protein mediates deletion formation.

    IMPORTANCE

    Unessential but strongly conserved gene products are understudied, especially when mutations do not confer discernible phenotypes or when the protein’s contribution to fitness is too small to determine reliably in laboratory-based assays. Consequently, their functions and evolutionary impact remain obscure. The data presented herein suggest that microvirus A* proteins, discovered over 40 years ago, may hasten the termination of non-productive packaging events, thereby performing a salvage function by liberating the reusable components of failed packaging complexes, such as DNA templates and replication enzymes.

     
  2. D’Auria, Sabato (Ed.)
    Biolayer interferometry (BLI) is a widely used technique for determining macromolecular interaction dynamics in real time. Using changes in the interference pattern of white light reflected off a biosensor tip, BLI can determine binding parameters for protein-protein (e.g., antibody-substrate kinetics) or protein-small molecule (e.g., drug discovery) interactions. However, a less-appreciated application of BLI analysis is the study of DNA-protein interactions. DNA-binding proteins play a central role in cellular biology, controlling critical processes including transcription, DNA replication, and DNA repair. Understanding how proteins interact with DNA often provides important insight into their biological function, and novel technologies to assay DNA-protein interactions are of broad interest. Currently, a detailed protocol for using BLI to study DNA-protein interactions is lacking. In the following protocol, we describe the use of BLI and biotinylated-DNA probes to determine the binding kinetics of a transcription factor to a specific DNA sequence. The experimental steps include the generation of biotinylated-DNA probes, the execution of the BLI experiment, and data analysis with scientific graphing and statistical software (e.g., GraphPad Prism). Although the example experiment used throughout this protocol involves a prokaryotic transcription factor, the technique can be readily adapted to any DNA-binding protein. Pitfalls and potential solutions for investigating DNA-binding proteins by BLI are also presented.
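BLI kinetics are commonly interpreted with a 1:1 (Langmuir) binding model: during association the response rises as a single exponential with observed rate k_obs = kon·C + koff, during dissociation it decays with rate koff, and KD = koff/kon. The protocol performs its fitting in graphing software such as GraphPad Prism; the plain-Python sketch below only illustrates these relationships, and the function names, rate constants and concentrations are hypothetical. It uses one standard shortcut: fit each association curve for k_obs, then regress k_obs against protein concentration to get kon (slope) and koff (intercept).

```python
import math


def association_response(t, conc, kon, koff, rmax):
    """1:1 (Langmuir) association-phase response at time t (s), analyte concentration conc (M)."""
    kd = koff / kon                    # equilibrium dissociation constant (M)
    r_eq = rmax * conc / (conc + kd)   # plateau response reached at this concentration
    k_obs = kon * conc + koff          # observed association rate constant (1/s)
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, koff):
    """1:1 dissociation-phase response, starting from r0 when buffer replaces the analyte."""
    return r0 * math.exp(-koff * t)


def fit_kon_koff(concs, k_obs_values):
    """Ordinary least-squares line k_obs = kon * C + koff; returns (kon, koff)."""
    n = len(concs)
    mean_c = sum(concs) / n
    mean_k = sum(k_obs_values) / n
    sxx = sum((c - mean_c) ** 2 for c in concs)
    sxy = sum((c - mean_c) * (k - mean_k) for c, k in zip(concs, k_obs_values))
    kon = sxy / sxx
    koff = mean_k - kon * mean_c
    return kon, koff


# Hypothetical k_obs values, as an exponential fit of each association curve
# might return them, generated here from kon = 1e5 1/(M*s) and koff = 1e-3 1/s.
concs = [10e-9, 25e-9, 50e-9, 100e-9]           # M
k_obs_values = [1e5 * c + 1e-3 for c in concs]  # 1/s
kon, koff = fit_kon_koff(concs, k_obs_values)
print(f"kon = {kon:.3g} 1/(M*s), koff = {koff:.3g} 1/s, KD = {koff / kon:.3g} M")
```

Because koff taken from the intercept is often imprecise with real data, it is frequently estimated from the dissociation phase instead (the `dissociation_response` model), with global fitting of all curves as a common alternative.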
  3. Protein–DNA interactions play an important role in various biological processes such as gene expression, replication, and transcription. Understanding the features that dictate the binding affinity of protein–DNA complexes, and predicting these affinities, is important for elucidating their recognition mechanisms. In this work, we have collected the experimental binding free energy (ΔG) for a set of 391 protein–DNA complexes and derived several structure-based features such as interaction energy, contact potentials, volume and surface area of binding-site residues, base-step parameters of the DNA and contacts between different types of atoms. Our analysis of the relationship between binding affinity and structural features revealed that the important factors depend mainly on the number of DNA strands as well as on the functional and structural classes of proteins. Specifically, binding-site properties such as the number of atom contacts between the DNA and protein and the volume of protein binding sites, and interaction-based features such as interaction energies and contact potentials, are important for understanding binding affinity. Further, we developed multiple regression equations for predicting the binding affinity of protein–DNA complexes belonging to different structural and functional classes. On a jack-knife test, our method showed an average correlation of 0.78 and a mean absolute error of 0.98 kcal/mol between the experimental and predicted binding affinities. We have developed a webserver, PDA-PreD (Protein-DNA Binding affinity predictor), for predicting the affinity of protein–DNA complexes; it is freely available at https://web.iitm.ac.in/bioinfo2/pdapred/
  4. Abstract

    Motivation

    Accurate predictions of protein-binding residues (PBRs) enhance understanding of the molecular-level rules governing protein–protein interactions, help protein–protein docking and facilitate annotation of protein functions. Recent studies show that current sequence-based predictors of PBRs severely cross-predict residues that interact with other types of partners (e.g. RNA and DNA) as PBRs. Moreover, these methods are relatively slow, which prohibits genome-scale use.

    Results

    We propose a novel, accurate and fast sequence-based predictor of PBRs that minimizes cross-predictions. Our SCRIBER (SeleCtive pRoteIn-Binding rEsidue pRedictor) method takes advantage of three innovations: a comprehensive dataset that covers multiple types of binding residues, novel types of inputs that are relevant to the prediction of PBRs, and an architecture tailored to reduce cross-predictions. The dataset includes complete protein chains and offers improved coverage of binding annotations that are transferred from multiple protein–protein complexes. We use an innovative two-layer architecture in which the first layer predicts protein-binding, RNA-binding, DNA-binding and small-ligand-binding residues. The second layer re-predicts PBRs by reducing the overlap between PBRs and the other types of binding residues produced in the first layer. Empirical tests on an independent test dataset reveal that SCRIBER significantly outperforms current predictors and that all three innovations contribute to its high predictive performance. SCRIBER reduces cross-predictions by between 41% and 69%, and our conservative estimates show that it is at least 3 times faster. We provide putative PBRs produced by SCRIBER for the entire human proteome and use these results to hypothesize that about 14% of currently known human protein domains bind proteins.
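The abstract does not spell out how cross-predictions are counted. One simple reading, assumed here, is the fraction of residues annotated as binding only non-protein partners (DNA, RNA or small ligands) that a predictor nonetheless labels as protein-binding; a relative reduction then compares that rate between two predictors. The sketch below computes both. The labels, predictions and function names are hypothetical, each residue carries a single label for simplicity, and this is not SCRIBER's evaluation code.

```python
NON_PROTEIN_PARTNERS = {"DNA", "RNA", "ligand"}


def cross_prediction_rate(predicted_pbr, annotations):
    """Fraction of non-protein-binding residues that are predicted as PBRs.

    predicted_pbr: per-residue booleans from a PBR predictor.
    annotations:   per-residue labels such as "protein", "DNA", "RNA", "ligand" or "none".
    """
    flags = [pred for pred, label in zip(predicted_pbr, annotations)
             if label in NON_PROTEIN_PARTNERS]
    return sum(flags) / len(flags) if flags else 0.0


def relative_reduction(baseline_rate, new_rate):
    """Relative reduction of a rate, e.g. 0.40 -> 0.20 is a 50% reduction."""
    return (baseline_rate - new_rate) / baseline_rate


labels = ["protein", "DNA", "RNA", "ligand", "none", "DNA"]
baseline = [True, True, True, False, False, True]     # hypothetical predictor A
selective = [True, False, True, False, False, False]  # hypothetical predictor B
rate_a = cross_prediction_rate(baseline, labels)   # 3 of 4 non-protein binders flagged
rate_b = cross_prediction_rate(selective, labels)  # 1 of 4 non-protein binders flagged
print(f"{relative_reduction(rate_a, rate_b):.0%} fewer cross-predictions")
```

Note that the true protein-binding residue (index 0) is still predicted by both hypothetical predictors, so in this toy case the reduction in cross-predictions does not come at the cost of missing real PBRs.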

    Availability and implementation

    SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. The BEN domain is a recently recognized DNA binding module that is present in diverse metazoans and certain viruses. Several BEN domain factors are known as transcriptional repressors, but, overall, relatively little is known of how BEN factors identify their targets in humans. In particular, X-ray structures of BEN domain:DNA complexes are only known for Drosophila factors bearing a single BEN domain, which lack direct vertebrate orthologs. Here, we characterize several mammalian BEN domain (BD) factors, including BDs from two NACC family BTB-BEN proteins and from BEND3, which has four BDs. In vitro selection data revealed sequence-specific binding activities of isolated BEN domains from all of these factors. We conducted detailed functional, genomic, and structural studies of BEND3. We show that BD4 is a major determinant for in vivo association and repression of endogenous BEND3 targets. We obtained a high-resolution structure of BEND3-BD4 bound to its preferred binding site, which reveals how BEND3 identifies cognate DNA targets and shows differences from one of its non-DNA-binding BEN domains (BD1). Finally, comparison with our previous invertebrate BEN structures, along with additional structural predictions using AlphaFold2 and RoseTTAFold, reveals distinct strategies for target DNA recognition by different types of BEN domain proteins. Together, these studies expand the DNA recognition activities of BEN factors and provide structural insights into sequence-specific DNA binding by mammalian BEN proteins.