Title: PepPro: A Nonredundant Structure Data Set for Benchmarking Peptide–Protein Computational Docking

We present a nonredundant benchmark, coined PepPro, for testing peptide–protein docking algorithms. Currently, PepPro contains 89 nonredundant experimentally determined peptide–protein complex structures, with peptide sequence lengths ranging from 5 to 30 amino acids. The benchmark covers peptides with distinct secondary structures, including helix, partial helix, a mixture of helix and β‐sheet, β‐sheet formed through binding, β‐sheet formed through self‐folding, and coil. In addition, unbound protein structures are provided for 58 complexes and can be used to test the ability of a docking algorithm to handle the conformational changes of proteins during the binding process. PepPro should benefit the docking community in the development and improvement of peptide docking algorithms. The benchmark is available at http://zoulab.dalton.missouri.edu/PepPro_benchmark. © 2019 Wiley Periodicals, Inc.

 
NSF-PAR ID:
10460479
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Journal of Computational Chemistry
Volume:
41
Issue:
4
ISSN:
0192-8651
Page Range / eLocation ID:
p. 362-369
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Characterization of life processes at the molecular level requires structural details of protein interactions. The number of experimentally determined structures of protein–protein complexes accounts only for a fraction of known protein interactions. This gap in structural description of the interactome has to be bridged by modeling. An essential part of the development of structural modeling/docking techniques for protein interactions is databases of protein–protein complexes. They are necessary for studying protein interfaces, providing a knowledge base for docking algorithms, and developing intermolecular potentials, search procedures, and scoring functions. Development of protein–protein docking techniques requires thorough benchmarking of different parts of the docking protocols on carefully curated sets of protein–protein complexes. We present a comprehensive description of the Dockground resource (http://dockground.compbio.ku.edu) for structural modeling of protein interactions, including previously unpublished unbound docking benchmark set 4, and the X‐ray docking decoy set 2. The resource offers a variety of interconnected datasets of protein–protein complexes and other data for the development and testing of different aspects of protein docking methodologies. Based on protein–protein complexes extracted from the PDB biounit files, Dockground offers sets of X‐ray unbound, simulated unbound, model, and docking decoy structures. All datasets are freely available for download, as a whole or selecting specific structures, through a user‐friendly interface on one integrated website.

     
  2. Abstract

    Significant efforts have been devoted in the last decade to improving molecular docking techniques to predict both accurate binding poses and ranking affinities. Some shortcomings in the field are the limited number of standard methods for measuring docking success and the limited availability of widely accepted standard data sets for use as benchmarks in comparing different docking algorithms throughout the field. In order to address these issues, we have created a Cross‐Docking Benchmark server. The server is a versatile cross‐docking data set containing 4,399 protein‐ligand complexes across 95 protein targets intended to serve as a benchmark set and gold standard for state‐of‐the‐art pose and ranking prediction in easy, medium, hard, or very hard docking targets. The benchmark, along with a customizable cross‐docking data set generation tool, is available at http://disco.csb.pitt.edu. We further demonstrate the potential uses of the server in questions outside of basic benchmarking, such as the selection of the ideal docking reference structure.

     
  3. Abstract

    Determination of the bound pose of a ligand is a critical first step in many in silico drug discovery tasks. Molecular docking is the main tool for the prediction of non-covalent binding of a protein and ligand system. Molecular docking pipelines often only utilize the information of one ligand binding to the protein despite the commonly held hypothesis that different ligands share binding interactions when bound to the same receptor. Here we describe Open-ComBind, an easy-to-use, open-source version of the ComBind molecular docking pipeline that leverages information from multiple ligands without known bound structures to enhance pose selection. We first create distributions of feature similarities between ligand pose pairs, comparing near-native poses with all sampled docked poses. These distributions capture the likelihood of observing similar features, such as hydrogen bonds or hydrophobic contacts, in different pose configurations. These similarity distributions are then combined with a per-ligand docking score to enhance overall pose selection by 5% and 4.5% for high-affinity and congeneric series helper ligands, respectively. Open-ComBind reduces the average RMSD of ligands in our benchmark dataset by 9.0%. We provide Open-ComBind as an easy-to-use command line and Python API to increase pose prediction performance at www.github.com/drewnutt/open_combind.

     
  4. Abstract

    Structural information of protein–protein interactions is essential for characterization of life processes at the molecular level. While a small fraction of known protein interactions has experimentally determined structures, computational modeling of protein complexes (protein docking) has to fill the gap. The Dockground resource (http://dockground.compbio.ku.edu) provides a collection of datasets for the development and testing of protein docking techniques. Currently, Dockground contains datasets for the bound and the unbound (experimentally determined and simulated) protein structures, model–model complexes, docking decoys of experimentally determined and modeled proteins, and templates for comparative docking. The Dockground bound proteins dataset is a core set, from which other Dockground datasets are generated. It is devised as a relational PostgreSQL database containing information on experimentally determined protein–protein complexes. This report on the Dockground resource describes current status of the datasets, new automated update procedures and further development of the core datasets. We also present a new Dockground interactive web interface, which allows search by various parameters, such as release date, multimeric state, complex type, structure resolution, and so on, visualization of the search results with a number of customizable parameters, as well as downloadable datasets with predefined levels of sequence and structure redundancy.

     
  5. LINKED ARTICLES

    This article is part of a themed issue on Structure Guided Pharmacology of Membrane Proteins (BJP 75th Anniversary). To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v179.14/issuetoc

     