Title: PDA-Pred: Predicting the binding affinity of protein-DNA complexes using machine learning techniques and structural features
Protein–DNA interactions play an important role in various biological processes such as gene expression, replication, and transcription. Understanding the features that dictate the binding affinity of protein–DNA complexes and predicting their affinities is important for elucidating their recognition mechanisms. In this work, we collected the experimental binding free energy (ΔG) for a set of 391 protein–DNA complexes and derived several structure-based features, such as interaction energy, contact potentials, volume and surface area of binding site residues, base step parameters of the DNA, and contacts between different types of atoms. Our analysis of the relationship between binding affinity and structural features revealed that the important factors depend mainly on the number of DNA strands as well as the functional and structural classes of the proteins. Specifically, binding site properties, such as the number of atom contacts between the DNA and protein and the volume of protein binding sites, and interaction-based features, such as interaction energies and contact potentials, are important for understanding the binding affinity. Further, we developed multiple regression equations for predicting the binding affinity of protein–DNA complexes belonging to different structural and functional classes. Our method showed an average correlation of 0.78 and a mean absolute error of 0.98 kcal/mol between the experimental and predicted binding affinities in a jack-knife test. We have developed a web server, PDA-Pred (Protein-DNA Binding Affinity Predictor), for predicting the affinity of protein–DNA complexes; it is freely available at https://web.iitm.ac.in/bioinfo2/pdapred/
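The evaluation protocol named in the abstract (multiple regression assessed with a jack-knife test, reported as a correlation and a mean absolute error) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the three feature columns are hypothetical stand-ins for structure-based descriptors, and the function names `jackknife_predictions` and `evaluate` are invented here. It is not the PDA-Pred implementation, and it does not reproduce the published equations or results.

```python
import numpy as np


def jackknife_predictions(X, y):
    """Leave-one-out (jack-knife) predictions from an ordinary least-squares fit.

    Each complex is predicted by a model trained on all the other complexes.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # hold out complex i
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        preds[i] = A[i] @ coef
    return preds


def evaluate(y_true, y_pred):
    """Return the Pearson correlation and the mean absolute error."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    mae = np.mean(np.abs(y_true - y_pred))
    return r, mae


# Synthetic stand-in data (assumption): 40 complexes, 3 hypothetical features,
# and a made-up linear relationship to the binding free energy in kcal/mol.
rng = np.random.default_rng(0)
n_complexes = 40
X = rng.normal(size=(n_complexes, 3))
true_w = np.array([-1.2, 0.8, -0.5])
y = -10.0 + X @ true_w + rng.normal(scale=0.5, size=n_complexes)

preds = jackknife_predictions(X, y)
r, mae = evaluate(y, preds)
print(f"jack-knife correlation: {r:.2f}, MAE: {mae:.2f} kcal/mol")
```

Because each prediction comes from a model that never saw the held-out complex, the resulting correlation and error estimate performance on unseen data rather than goodness of fit. Fitting separate equations per structural or functional class, as the abstract describes, would amount to running the same procedure on each subset.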
Award ID(s):
1925643
PAR ID:
10475943
Author(s) / Creator(s):
; ;
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
Methods
Volume:
213
Issue:
C
ISSN:
1046-2023
Page Range / eLocation ID:
10 to 17
Subject(s) / Keyword(s):
protein-dna interaction binding affinity prediction of interaction
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The DNA-binding protein from starved cells (Dps) plays a crucial role in maintaining bacterial cell viability during periods of stress. Dps is a nucleoid-associated protein that interacts with DNA to create biomolecular condensates in live bacteria. Purified Dps protein can also rapidly form large complexes when combined with DNA in vitro. However, the mechanism that allows these complexes to nucleate on DNA remains unclear. Here, we examine how DNA topology influences the formation of Dps–DNA complexes. We find that DNA supercoils offer the most preferred template for the nucleation of condensed Dps structures. More generally, bridging contacts between different regions of DNA can facilitate the nucleation of condensed Dps structures. In contrast, Dps shows little affinity for stretched linear DNA before it is relaxed. Once DNA is condensed, Dps forms a stable complex that can form inter-strand contacts with nearby DNA, even without free Dps present in solution. Taken together, our results establish the important role played by bridging contacts between DNA strands in nucleating and stabilizing Dps complexes. 
  2. A growing number of computational tools have been developed to accurately and rapidly predict the impact of amino acid mutations on protein-protein relative binding affinities. Such tools have many applications, for example, designing new drugs and studying evolutionary mechanisms. In the search for accuracy, many of these methods employ expensive yet rigorous molecular dynamics simulations. By contrast, non-rigorous methods use less exhaustive statistical mechanics, allowing for more efficient calculations. However, it is unclear if such methods retain enough accuracy to replace rigorous methods in binding affinity calculations. This trade-off between accuracy and computational expense makes it difficult to determine the best method for a particular system or study. Here, eight non-rigorous computational methods were assessed using eight antibody-antigen and eight non-antibody-antigen complexes for their ability to accurately predict relative binding affinities (ΔΔG) for 654 single mutations. In addition to assessing accuracy, we analyzed the CPU cost and performance for each method using a variety of physico-chemical structural features. This allowed us to posit scenarios in which each method may be best utilized. Most methods performed worse when applied to antibody-antigen complexes compared to non-antibody-antigen complexes. Rosetta-based JayZ and EasyE methods classified mutations as destabilizing (ΔΔG < -0.5 kcal/mol) with high (83–98%) accuracy and a relatively low computational cost for non-antibody-antigen complexes. Some of the most accurate results for antibody-antigen systems came from combining molecular dynamics with FoldX with a correlation coefficient (r) of 0.46, but this was also the most computationally expensive method. Overall, our results suggest these methods can be used to quickly and accurately predict stabilizing versus destabilizing mutations but are less accurate at predicting actual binding affinities. This study highlights the need for continued development of reliable, accessible, and reproducible methods for predicting binding affinities in antibody-antigen proteins and provides a recipe for using current methods.
  3. Abstract Structural, regulatory and enzymatic proteins interact with DNA to maintain a healthy and functional genome. Yet, our structural understanding of how proteins interact with DNA is limited. We present MELD-DNA, a novel computational approach to predict the structures of protein–DNA complexes. The method combines molecular dynamics simulations with general knowledge or experimental information through Bayesian inference. The physical model is sensitive to sequence-dependent properties and conformational changes required for binding, while information accelerates sampling of bound conformations. MELD-DNA can: (i) sample multiple binding modes; (ii) identify the preferred binding mode from the ensembles; and (iii) provide qualitative binding preferences between DNA sequences. We first assess performance on a dataset of 15 protein–DNA complexes and compare it with state-of-the-art methodologies. Furthermore, for three selected complexes, we show sequence dependence effects of binding in MELD predictions. We expect that the results presented herein, together with the freely available software, will impact structural biology (by complementing DNA structural databases) and molecular recognition (by bringing new insights into aspects governing protein–DNA interactions). 
  4. Abstract Motivation: Computational methods for compound–protein affinity and contact (CPAC) prediction aim at facilitating rational drug discovery by simultaneous prediction of the strength and the pattern of compound–protein interactions. Although the desired outputs are highly structure-dependent, the lack of protein structures often makes structure-free methods rely on protein sequence inputs alone. The scarcity of compound–protein pairs with affinity and contact labels further limits the accuracy and the generalizability of CPAC models. Results: To overcome the aforementioned challenges of structure naivety and labeled-data scarcity, we introduce cross-modality and self-supervised learning, respectively, for structure-aware and task-relevant protein embedding. Specifically, protein data are available in both modalities of 1D amino-acid sequences and predicted 2D contact maps that are separately embedded with recurrent and graph neural networks, respectively, as well as jointly embedded with two cross-modality schemes. Furthermore, both protein modalities are pre-trained under various self-supervised learning strategies, by leveraging massive amount of unlabeled protein data. Our results indicate that individual protein modalities differ in their strengths of predicting affinities or contacts. Proper cross-modality protein embedding combined with self-supervised learning improves model generalizability when predicting both affinities and contacts for unseen proteins. Availability and implementation: Data and source codes are available at https://github.com/Shen-Lab/CPAC. Supplementary information: Supplementary data are available at Bioinformatics online.
  5. The emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has triggered a global COVID-19 pandemic, challenging healthcare systems worldwide. Effective therapeutic strategies against this novel coronavirus remain limited, underscoring the urgent need for innovative approaches. The present research investigates the potential of cannabis compounds as therapeutic agents against SARS-CoV-2 through their interaction with the virus’s papain-like protease (PLpro) protein, a crucial element in viral replication and immune evasion. Computational methods, including molecular docking and molecular dynamics (MD) simulations, were employed to screen cannabis compounds against PLpro and analyze their binding mechanisms and interaction patterns. The results showed cannabinoids with binding affinities ranging from −6.1 kcal/mol to −4.6 kcal/mol, forming interactions with PLpro. Notably, Cannabigerolic and Cannabidiolic acids exhibited strong binding contacts with critical residues in PLpro’s active region, indicating their potential as viral replication inhibitors. MD simulations revealed the dynamic behavior of cannabinoid–PLpro complexes, highlighting stable binding conformations and conformational changes over time. These findings shed light on the mechanisms underlying cannabis interaction with SARS-CoV-2 PLpro, aiding in the rational design of antiviral therapies. Future research will focus on experimental validation, optimizing binding affinity and selectivity, and preclinical assessments to develop effective treatments against COVID-19. 