

Title: Impact of AlphaFold on structure prediction of protein complexes: The CASP15‐CAPRI experiment
Abstract

We present the results for CAPRI Round 54, the 5th joint CASP‐CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homo‐trimers, 13 heterodimers including 3 antibody–antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High‐quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2‐Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integrating advanced modeling tools enabled top performing groups to exceed the performance of a standard AlphaFold2‐Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.

 
Award ID(s):
1943008
PAR ID:
10510416
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Proteins: Structure, Function, and Bioinformatics
Volume:
91
Issue:
12
ISSN:
0887-3585
Page Range / eLocation ID:
1658-1683
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We report the results of the “UM‐TBM” and “Zheng” groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D‐I‐TASSER and DMFold‐Multimer algorithms, respectively. For monomer structure prediction, D‐I‐TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi‐source MSA searching and a structural modeling‐based MSA ranker; (ii) attention‐network based spatial restraints; (iii) a multi‐domain module containing domain partition and arrangement for domain‐level templates and spatial restraints; (iv) an optimized I‐TASSER‐based folding simulation system for full‐length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge‐based potentials. For 47 free modeling targets in CASP15, the final models predicted by D‐I‐TASSER showed average TM‐score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo‐based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end‐to‐end deep learning methods alone. For protein complex structure prediction, DMFold‐Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end‐to‐end modeling module from AlphaFold2‐Multimer. For the 38 complex targets, DMFold‐Multimer generated models with an average TM‐score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.

     
  2. Abstract

    Critical Assessment of PRediction of Interactions (CAPRI) rounds 37 through 45 introduced larger complexes, new macromolecules, and multistage assemblies. For these rounds, we used and expanded docking methods in Rosetta to model 23 target complexes. We successfully predicted 14 target complexes and recognized and refined near‐native models generated by other groups for two further targets. Notably, for targets T110 and T136, we achieved the closest prediction of any CAPRI participant. We created several innovative approaches during these rounds. Since round 39 (target 122), we have used the new RosettaDock 4.0, which has a revamped coarse‐grained energy function and the ability to perform conformer selection during docking with hundreds of pregenerated protein backbones. Ten of the complexes had some degree of symmetry in their interactions, so we tested Rosetta SymDock, realized its shortcomings, and developed the next‐generation symmetric docking protocol, SymDock2, which includes docking of multiple backbones and induced‐fit refinement. Since the last CAPRI assessment, we also developed methods for modeling and designing carbohydrates in Rosetta, and we used them to successfully model oligosaccharide‐protein complexes in round 41. Although the results were broadly encouraging, they also highlighted the pressing need to invest in (a) flexible docking algorithms with the ability to model loop and linker motions and in (b) new sampling and scoring methods for oligosaccharide‐protein interactions.

     
  3. Abstract

    Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS‐CoV‐2 genome. Forty‐seven research groups submitted over 3000 three‐dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure‐based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).

     
  4. Abstract

    CAPRI challenges offer a variety of blind tests for protein‐protein interaction prediction. In CAPRI Rounds 38‐45, we generated a set of putative binding modes for each target with an FFT‐based docking algorithm, and then scored and ranked these binding modes with a proprietary scoring function, ITScorePP. We have also developed a novel web server, Rebipp. The algorithm utilizes information retrieval to identify relevant biological information to significantly reduce the search space for a particular protein. In parallel, we have also constructed a GPU‐based docking server, MDockPP, for protein‐protein complex structure prediction. Here, we report the performance of our protocol in CAPRI Rounds 38‐45, which included 16 docking and scoring targets. Among them, three targets contain multiple interfaces: Targets 124, 125, and 136 have 2, 4, and 3 interfaces, respectively. In the predictor experiments, we predicted correct binding modes for nine targets, including one high‐accuracy interface, six medium‐accuracy binding modes, and six acceptable‐accuracy binding modes. For the docking server prediction experiments, we predicted correct binding modes for eight targets, including one high‐accuracy, three medium‐accuracy, and five acceptable‐accuracy binding modes.

     
  5. Abstract

    Small angle X‐ray scattering (SAXS) measures comprehensive distance information on a protein's structure, which can constrain and guide computational structure prediction algorithms. Here, we evaluate structure predictions of 11 monomeric and oligomeric proteins for which SAXS data were collected and provided to predictors in the 13th round of the Critical Assessment of protein Structure Prediction (CASP13). The category for SAXS‐assisted predictions made gains in certain areas for CASP13 compared to CASP12. Improvements included higher quality data with size exclusion chromatography‐SAXS (SEC‐SAXS) and better selection of targets and communication of results by CASP organizers. In several cases, we can track improvements in model accuracy with use of SAXS data. For hard multimeric targets where regular folding algorithms were unsuccessful, SAXS data helped predictors to build models better resembling the global shape of the target. For most models, however, no significant improvement in model accuracy at the domain level was registered from use of SAXS data, when rigorously comparing SAXS‐assisted models to the best regular server predictions. To promote future progress in this category, we identify successes, challenges, and opportunities for improved strategies in prediction, assessment, and communication of SAXS data to predictors. An important observation is that, for many targets, SAXS data were inconsistent with crystal structures, suggesting that these proteins adopt different conformation(s) in solution. This CASP13 result, if representative of PDB structures and future CASP targets, may have substantive implications for the structure training databases used for machine learning, CASP, and use of prediction models for biology.

     