Title: Accurate ligand–protein docking in CASP15 using the ClusPro LigTBM server
Abstract In the ligand prediction category of CASP15, the challenge was to predict the positions and conformations of small molecules binding to proteins that were provided as amino acid sequences or as models generated by the AlphaFold2 program. For most targets, we used our template‐based ligand docking program ClusPro ligTBM, also implemented as a public server available at https://ligtbm.cluspro.org/. Since many targets had multiple chains and a number of ligands, several templates and some manual interventions were required. In a few cases, no templates were found, and we had to resort to direct docking with the Glide program. Nevertheless, ligTBM proved to be a very useful tool, and by any ranking criteria, our group was ranked among the top five teams. In fact, all the best groups used template‐based docking methods. Thus, it appears that the AlphaFold2‐generated models, despite the high accuracy of the predicted backbone, have local differences from the x‐ray structure that make the use of direct docking methods more challenging. The results of CASP15 confirm that this limitation can frequently be overcome by homology‐based docking.
Award ID(s):
2054251
PAR ID:
10566041
Author(s) / Creator(s):
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Proteins: Structure, Function, and Bioinformatics
Volume:
91
Issue:
12
ISSN:
0887-3585
Page Range / eLocation ID:
1822 to 1828
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We report the results of the “UM‐TBM” and “Zheng” groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D‐I‐TASSER and DMFold‐Multimer algorithms, respectively. For monomer structure prediction, D‐I‐TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi‐source MSA searching and a structural modeling‐based MSA ranker; (ii) attention‐network based spatial restraints; (iii) a multi‐domain module containing domain partition and arrangement for domain‐level templates and spatial restraints; (iv) an optimized I‐TASSER‐based folding simulation system for full‐length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge‐based potentials. For 47 free modeling targets in CASP15, the final models predicted by D‐I‐TASSER showed average TM‐score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo‐based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end‐to‐end deep learning methods alone. For protein complex structure prediction, DMFold‐Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end‐to‐end modeling module from AlphaFold2‐Multimer. For the 38 complex targets, DMFold‐Multimer generated models with an average TM‐score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking. 
  2. In recent years, the field of structural biology has seen remarkable advancements, particularly in modeling of protein tertiary and quaternary structures. The AlphaFold deep learning approach revolutionized protein structure prediction by achieving near‐experimental accuracy on many targets. This paper presents a detailed account of structural modeling of oligomeric targets in Round 55 of CAPRI by combining deep learning‐based predictions (AlphaFold2 multimer pipeline) with traditional docking techniques in a hybrid approach to protein–protein docking. To complement the AlphaFold models generated for the given oligomeric state of the targets, we built docking predictions by combining models generated for lower‐oligomeric states—dimers for trimeric targets and trimers/dimers for tetrameric targets. In addition, we used a template‐based docking procedure applied to AlphaFold predicted structures of the monomers. We analyzed the clustering of the generated AlphaFold models, the confidence in the prediction of intra‐ and inter‐chain residue‐residue contacts, and the correlation of the AlphaFold predictions stability with the quality of the submitted models. 
  3. Abstract In recent years, significant advancements have been made in deep learning‐based computational modeling of proteins, with DeepMind's AlphaFold2 standing out as a landmark achievement. These computationally modeled protein structures not only provide atomic coordinates but also include self‐confidence metrics to assess the relative quality of the modeling, either for individual residues or the entire protein. However, these self‐confidence scores are not always reliable; for instance, poorly modeled regions of a protein may sometimes be assigned high confidence. To address this limitation, we introduce Equivariant Quality Assessment Folding (EQAFold), an enhanced framework that refines the Local Distance Difference Test prediction head of AlphaFold to generate more accurate self‐confidence scores. Our results demonstrate that EQAFold outperforms the standard AlphaFold architecture and recent model quality assessment protocols in providing more reliable confidence metrics. Source code for EQAFold is available at https://github.com/kiharalab/EQAFold_public.
  4. Since the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14), AlphaFold2 has become the standard method for protein tertiary structure prediction. One remaining challenge is to further improve its predictions. We developed a new version of the MULTICOM system to sample diverse multiple sequence alignments (MSAs) and structural templates to improve the input for AlphaFold2 to generate structural models. The models are then ranked by both the pairwise model similarity and the AlphaFold2 self-reported model quality score. The top-ranked models are refined by a novel structure alignment-based refinement method powered by Foldseek. Moreover, for a monomer target that is a subunit of a protein assembly (complex), MULTICOM integrates tertiary and quaternary structure predictions to account for tertiary structural changes induced by protein-protein interaction. The system participated in tertiary structure prediction in the 2022 CASP15 experiment. Our server predictor MULTICOM_refine ranked 3rd among 47 CASP15 server predictors, and our human predictor MULTICOM ranked 7th among all 132 human and server predictors. The average GDT-TS score and TM-score of the first structural models that MULTICOM_refine predicted for 94 CASP15 domains are ~0.80 and ~0.92, respectively, 9.6% and 8.2% higher than the ~0.73 and 0.85 of the standard AlphaFold2 predictor.
  5. Abstract SNAPSHOT USA is a multicontributor, long‐term camera trap survey designed to survey mammals across the United States. Participants are recruited through community networks and directly through a website application (https://www.snapshot-usa.org/). The growing Snapshot dataset is useful, for example, for tracking wildlife population responses to land use, land cover, and climate changes across spatial and temporal scales. Here we present the SNAPSHOT USA 2021 dataset, the third national camera trap survey across the US. Data were collected across 109 camera trap arrays and included 1711 camera sites. The total effort equaled 71,519 camera trap nights and resulted in 172,507 sequences of animal observations. Sampling effort varied among camera trap arrays, with a minimum of 126 camera trap nights, a maximum of 3355 nights, a median of 546 nights, and a mean of 656 ± 431 nights. This third dataset comprises 51 camera trap arrays that were surveyed during 2019, 2020, and 2021, along with 71 camera trap arrays that were surveyed in 2020 and 2021. All raw data and accompanying metadata are stored on Wildlife Insights (https://www.wildlifeinsights.org/), and are publicly available upon acceptance of the data papers. SNAPSHOT USA aims to sample multiple ecoregions in the United States with adequate representation of each ecoregion according to its relative size. Currently, the relative density of camera trap arrays varies by an order of magnitude for the various ecoregions (0.22–5.9 arrays per 100,000 km²), emphasizing the need to increase sampling effort by further recruiting and retaining contributors. There are no copyright restrictions on these data. We request that authors cite this paper when using these data, or a subset of these data, for publication. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government.