Title: Protein docking model evaluation by 3D deep convolutional neural networks
Abstract
Motivation: Many important cellular processes involve physical interactions of proteins. Therefore, determining protein quaternary structures provides critical insight into the molecular mechanisms underlying the functions of these complexes. To complement experimental methods, many computational methods have been developed to predict structures of protein complexes. One of the challenges in computational protein complex structure prediction is to identify near-native models from a large pool of generated models.
Results: We developed a convolutional deep neural network-based approach named DOcking decoy selection with Voxel-based deep neural nEtwork (DOVE) for evaluating protein docking models. To evaluate a protein docking model, DOVE scans the protein–protein interface of the model with a 3D voxel and considers atomic interaction types and their energetic contributions as input features applied to the neural network. The deep learning models were trained and validated on docking models available in the ZDOCK and DockGround databases. Among the different combinations of features tested, almost all outperformed existing scoring functions.
Availability and implementation: Code is available at http://github.com/kiharalab/DOVE and http://kiharalab.org/dove/.
Supplementary information: Supplementary data are available at Bioinformatics online.
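The voxel-based input described in the abstract can be illustrated with a minimal sketch. This is not the DOVE implementation: the atom tuples, the 10 Å interface cutoff, the 20-cell grid, the 2 Å spacing, and the use of element counts as channels are all assumptions chosen for illustration, whereas DOVE itself uses atomic interaction types and energetic contributions as features.

```python
from math import dist, floor

# Hypothetical atom format: (chain_id, element, (x, y, z)) with coordinates in angstroms.

def interface_atoms(atoms, cutoff=10.0):
    """Keep atoms that lie within `cutoff` of any atom in a different chain."""
    kept = []
    for chain, elem, xyz in atoms:
        if any(c != chain and dist(xyz, p) <= cutoff for c, _, p in atoms):
            kept.append((chain, elem, xyz))
    return kept

def voxelize(atoms, channels=("C", "N", "O", "S"), grid=20, spacing=2.0):
    """Count atoms per voxel in a cube centered on the atoms' centroid.

    Returns a sparse dict mapping (channel, i, j, k) to an atom count.
    Element channels are a stand-in for the richer features a real
    scoring network would use.
    """
    if not atoms:
        return {}
    n = len(atoms)
    center = tuple(sum(a[2][i] for a in atoms) / n for i in range(3))
    half = grid * spacing / 2.0  # half the cube edge length
    vox = {}
    for _, elem, xyz in atoms:
        if elem not in channels:
            continue
        idx = tuple(floor((xyz[i] - center[i] + half) / spacing) for i in range(3))
        if all(0 <= v < grid for v in idx):  # drop atoms outside the cube
            key = (channels.index(elem), *idx)
            vox[key] = vox.get(key, 0) + 1
    return vox

# Toy example: two contacting atoms and one far from the interface.
toy = [("A", "C", (0.0, 0.0, 0.0)),
       ("B", "N", (3.0, 0.0, 0.0)),
       ("A", "O", (50.0, 0.0, 0.0))]
iface = interface_atoms(toy)
grid_counts = voxelize(iface)
```

A dense 4D array built from such counts is the kind of tensor a 3D convolutional network could take as input; the sparse dictionary is used here only to keep the sketch dependency-free.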
Award ID(s):
1922883 1825941 1925643
PAR ID:
10188399
Journal Name:
Bioinformatics
Volume:
36
Issue:
7
ISSN:
1367-4803
Page Range / eLocation ID:
2113 to 2118
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Physical interactions of proteins play key functional roles in many important cellular processes. To understand molecular mechanisms of such functions, it is crucial to determine the structure of protein complexes. To complement experimental approaches, which usually take a considerable amount of time and resources, various computational methods have been developed for predicting the structures of protein complexes. In computational modeling, one of the challenges is to identify near-native structures from a large pool of generated models. Here, we developed a deep learning–based approach named Graph Neural Network–based DOcking decoy eValuation scorE (GNN-DOVE). To evaluate a protein docking model, GNN-DOVE extracts the interface area and represents it as a graph. The chemical properties of atoms and the inter-atom distances are used as features of nodes and edges in the graph, respectively. GNN-DOVE was trained, validated, and tested on docking models in the Dockground database and further tested on a combined dataset of Dockground and ZDOCK benchmark as well as a CAPRI scoring dataset. GNN-DOVE performed better than existing methods, including DOVE, which is our previous development that uses a convolutional neural network on voxelized structure models. 
  2. Abstract. Motivation: Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of deep learning end-to-end protein structure prediction techniques for generating highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models predicted by them, since these models have better quality and different properties than the models predicted by traditional tertiary structure prediction methods. Results: We develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method, AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for the proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method, AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA. Availability and implementation: The source code is available at https://github.com/BioinfoMachineLearning/EnQA. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Soares, Claudio M. (Ed.)
    Membrane proteins are significantly underrepresented in the Protein Data Bank despite their essential role in cellular mechanisms and the major progress in experimental protein structure determination. Thus, computational approaches are especially valuable in the case of membrane proteins and their assemblies. The main focus in developing structure prediction techniques has been on soluble proteins, in part due to the much greater availability of structural data. Currently, structure prediction of protein complexes (protein docking) is a well-developed field of study. However, generic protein docking approaches are not optimal for membrane proteins because of the differences in physicochemical environment and the spatial constraints imposed by the membranes. Thus, docking of membrane proteins requires specialized computational methods. Development and benchmarking of membrane protein docking approaches has to be based on high-quality sets of membrane protein complexes. In this study we present a new dataset of 456 non-redundant alpha helical binary interfaces. The set is significantly larger and more representative than the previously developed sets. In the future, it will become the basis for the development of docking and scoring benchmarks, similar to the ones for soluble proteins in the Dockground resource (http://dockground.compbio.ku.edu).
  4. Abstract: Computational modeling of protein–DNA complex structures has important implications in biomedical applications such as structure‐based, computer‐aided drug design. A key step in developing methods for accurate modeling of protein–DNA complexes is similarity assessment between models and their reference complex structures. Existing methods primarily rely on distance‐based metrics and generally do not consider important functional features of the complexes, such as interface hydrogen bonds that are critical to specific protein–DNA interactions. Here, we present a new scoring function, ComparePD, which takes interface hydrogen bond energy and strength into account, in addition to the distance‐based metrics, for accurate similarity measurement of protein–DNA complexes. ComparePD was tested on two datasets of computational models of protein–DNA complexes generated using docking (classified as easy, intermediate, and difficult cases) and homology modeling methods. The results were compared with PDDockQ, a modified version of DockQ tailored for protein–DNA complexes, as well as the metrics employed by the community‐wide experiment CAPRI (Critical Assessment of PRedicted Interactions). We demonstrated that ComparePD provides an improved similarity measure over PDDockQ and the CAPRI classification method by considering both conformational similarity and functional importance of the complex interface. Among the cases in which ComparePD and PDDockQ selected different top models, ComparePD identified the more meaningful model in all but one intermediate docking case.
  5. Abstract: Structural information of protein–protein interactions is essential for characterization of life processes at the molecular level. While a small fraction of known protein interactions has experimentally determined structures, computational modeling of protein complexes (protein docking) has to fill the gap. The Dockground resource (http://dockground.compbio.ku.edu) provides a collection of datasets for the development and testing of protein docking techniques. Currently, Dockground contains datasets for the bound and the unbound (experimentally determined and simulated) protein structures, model–model complexes, docking decoys of experimentally determined and modeled proteins, and templates for comparative docking. The Dockground bound proteins dataset is a core set, from which other Dockground datasets are generated. It is devised as a relational PostgreSQL database containing information on experimentally determined protein–protein complexes. This report on the Dockground resource describes current status of the datasets, new automated update procedures and further development of the core datasets. We also present a new Dockground interactive web interface, which allows search by various parameters, such as release date, multimeric state, complex type, structure resolution, and so on, visualization of the search results with a number of customizable parameters, as well as downloadable datasets with predefined levels of sequence and structure redundancy.
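The graph representation described in the first related record (GNN-DOVE), where atom properties become node features and inter-atom distances become edge features, can be sketched as follows. This is an illustrative sketch only, not the GNN-DOVE code: the atom tuple format, the one-hot element encoding, and the 10 Å edge cutoff are assumptions.

```python
from math import dist

# Hypothetical atom format: (chain_id, element, (x, y, z)) with coordinates in angstroms.

def build_interface_graph(atoms, edge_cutoff=10.0, elements=("C", "N", "O", "S")):
    """Build a simple graph from interface atoms.

    Nodes carry a one-hot element vector (a stand-in for chemical
    properties); an undirected edge (i, j) with i < j stores the
    distance between atoms i and j when it is within `edge_cutoff`.
    """
    nodes = [[1.0 if elem == e else 0.0 for e in elements] for _, elem, _ in atoms]
    edges = {}
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = dist(atoms[i][2], atoms[j][2])
            if d <= edge_cutoff:
                edges[(i, j)] = d
    return nodes, edges

# Toy example: two atoms 5 angstroms apart and one distant atom.
toy_atoms = [("A", "C", (0.0, 0.0, 0.0)),
             ("B", "N", (3.0, 4.0, 0.0)),
             ("B", "O", (30.0, 0.0, 0.0))]
nodes, edges = build_interface_graph(toy_atoms)
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a small interface region; a spatial index would be the usual choice for larger inputs.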