Abstract
Motivation: Proteins interact to form complexes that carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted complex structures without any knowledge of the corresponding native structures. Such estimates can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery.
Results: In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. Rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures.
Availability and implementation: The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
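The ranking loss used to score DProQA in CASP15 can be made concrete with a short sketch. It assumes the usual CASP definition: for one target, the true TM-score of the best model in the pool minus the true TM-score of the model that the quality assessment method ranks first, so a loss of 0 means a perfect top pick. The function and model names below are illustrative and are not part of DProQA.

```python
from typing import Dict


def ranking_loss(true_tm: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Ranking loss of TM-score for a single target (assumed CASP-style definition).

    true_tm:   model name -> TM-score of that model against the native structure
    predicted: model name -> quality score assigned by the QA method
    """
    if set(true_tm) != set(predicted):
        raise ValueError("both mappings must cover the same set of models")
    best_true = max(true_tm.values())               # best achievable TM-score in the pool
    top_pick = max(predicted, key=predicted.get)    # model the QA method ranks first
    return best_true - true_tm[top_pick]            # 0.0 when the top pick is the best model
```

For example, if the QA method ranks a model with TM-score 0.65 first while the best model in the pool scores 0.80, the loss for that target is 0.15; a per-method figure would typically average this over all targets.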
This content will become publicly available on March 1, 2026
Integrating CORUM and co-fractionation mass spectrometry to create enhanced benchmarks for protein complex predictions
Abstract Co-fractionation mass spectrometry (CFMS) enables the discovery of protein complexes and the systems-level analysis of multimer dynamics that facilitate responses to environmental and developmental conditions. A major challenge in CFMS data analysis, and omics approaches in general, is the development of reliable benchmarks for accurate evaluation of prediction methods. CORUM is commonly used as a source of benchmark complexes for protein complex composition predictions; however, its assumption of fully assembled subunit pools often conflicts with size exclusion chromatography (SEC) and interaction predictions from CFMS experiments. To address this, we developed an integrative analysis method that leverages cross-kingdom evolutionary conservation among specific CORUM complexes and high-resolution SEC profile data from cell extracts. The resulting benchmark complexes are supported by statistical significance and consistent sizes between calculated and measured apparent masses. The approach was robust, revealing both conserved and species-specific complexes. Designed specifically for benchmark identification, this method can be applied to any species and used to evaluate protein complex predictions from other studies.
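As a rough illustration of the size-consistency criterion, the sketch below compares a complex's calculated mass (the sum of its subunit masses) with the apparent mass measured by SEC. It assumes one copy of each subunit and a hypothetical two-fold tolerance; neither the stoichiometry rule nor the threshold comes from the study, and the function names are made up for the example.

```python
from typing import Dict, Iterable


def calculated_mass_kda(subunits: Iterable[str], monomer_kda: Dict[str, float]) -> float:
    """Sum of subunit masses, assuming one copy of each subunit (an assumption)."""
    return sum(monomer_kda[s] for s in subunits)


def is_size_consistent(subunits: Iterable[str], monomer_kda: Dict[str, float],
                       measured_kda: float, max_fold: float = 2.0) -> bool:
    """True if the SEC apparent mass lies within max_fold of the calculated mass.

    max_fold = 2.0 is a hypothetical tolerance chosen only for illustration.
    """
    calc = calculated_mass_kda(subunits, monomer_kda)
    if calc <= 0 or measured_kda <= 0:
        raise ValueError("masses must be positive")
    ratio = measured_kda / calc  # apparent mass relative to calculated mass
    return 1.0 / max_fold <= ratio <= max_fold
```

With subunits of 50, 30 and 20 kDa, an apparent mass of 150 kDa would pass this check, while 450 kDa or 40 kDa would not.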
- Award ID(s): 1951819
- PAR ID: 10586621
- Publisher / Repository: Oxford
- Date Published:
- Journal Name: Briefings in Bioinformatics
- Volume: 26
- Issue: 2
- ISSN: 1467-5463
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More like this
-
Abstract
Summary: Computational methods to predict protein–protein interaction (PPI) typically fall into sequence-based ‘bottom-up’ methods that infer properties from the characteristics of individual protein sequences, or global ‘top-down’ methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during training it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show that it achieves state-of-the-art performance, offering genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model that integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components and other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, so these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms.
Availability and implementation: https://topsyturvy.csail.mit.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
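To make the bottom-up/top-down distinction concrete, here is a hypothetical sketch that blends a sequence-based interaction probability with a simple network-only score: the number of known partners two proteins share (common neighbors). This is a stand-in for illustration only; it is not the network model or the integration scheme used in TT-Hybrid, and the weight and the squashing transform are assumptions.

```python
from typing import Dict, Set


def common_neighbors(graph: Dict[str, Set[str]], u: str, v: str) -> int:
    """Top-down score: number of known interaction partners shared by u and v."""
    return len(graph.get(u, set()) & graph.get(v, set()))


def hybrid_score(seq_prob: float, graph: Dict[str, Set[str]], u: str, v: str,
                 weight: float = 0.5) -> float:
    """Hypothetical blend of a sequence-based probability and a network score.

    weight = 0.5 and the cn / (cn + 1) transform are illustrative choices.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    cn = common_neighbors(graph, u, v)
    net = cn / (cn + 1)  # squash the neighbor count into [0, 1)
    return weight * seq_prob + (1.0 - weight) * net
```

One consequence of this kind of blend is that a protein with no known partners gets a network score of 0, so its prediction falls back on the sequence term, which mirrors why sequence-only prediction matters for sparsely characterized proteins.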
-
Abstract
Motivation: Deep learning has recently revolutionized protein tertiary structure prediction. Cutting-edge deep learning methods such as AlphaFold can predict high-accuracy tertiary structures for most individual protein chains. However, the accuracy of predicting quaternary structures of protein complexes consisting of multiple chains is still relatively low due to the lack of advanced deep learning methods in the field. Because interchain residue–residue contacts can be used as distance restraints to guide quaternary structure modeling, here we develop a deep dilated convolutional residual network method (DRCon) to predict interchain residue–residue contacts in homodimers from residue–residue co-evolutionary signals derived from multiple sequence alignments of monomers, intrachain residue–residue contacts of monomers extracted from true/predicted tertiary structures or predicted by deep learning, and other sequence and structural features.
Results: Tested on three homodimer datasets (Homo_std, DeepHomo and CASP-CAPRI), DRCon achieves a precision of 43.46%, 47.10% and 33.50%, respectively, for top L/5 interchain contact predictions (L: length of the monomer in a homodimer) at a 6 Å contact threshold, which is substantially better than DeepHomo and DNCON2_inter and similar to Glinter. Moreover, our experiments demonstrate that DRCon still performs well when predicted tertiary structures or intrachain contacts of monomers in the unbound state are used as input, although its accuracy is lower than when true tertiary structures in the bound state are used. Finally, our case study shows that good interchain contact predictions can be used to build high-accuracy quaternary structure models of homodimers.
Availability and implementation: The source code of DRCon is available at https://github.com/jianlin-cheng/DRCon. The datasets are available at https://zenodo.org/record/5998532#.YgF70vXMKsB.
Supplementary information: Supplementary data are available at Bioinformatics online.
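The top L/5 precision figures reported for DRCon can be illustrated with a small sketch. It assumes a predicted L × L interchain contact-probability map and a matrix of true interchain residue distances for a homodimer with monomer length L. Counting a pair as a true contact when its distance is at most 6 Å, treating (i, j) and (j, i) as distinct pairs, and rounding L/5 down (with a minimum of 1) are assumptions about conventions the abstract does not spell out; the function name is hypothetical.

```python
from typing import Sequence


def top_l5_precision(prob: Sequence[Sequence[float]],
                     dist: Sequence[Sequence[float]],
                     threshold: float = 6.0) -> float:
    """Precision of the top L/5 interchain contact predictions (sketch).

    prob[i][j]: predicted probability that residue i of chain A contacts residue j of chain B
    dist[i][j]: true interchain distance in angstroms (distance definition assumed precomputed)
    """
    L = len(prob)
    if L == 0:
        raise ValueError("empty contact map")
    k = max(1, L // 5)  # number of top-ranked pairs to evaluate
    pairs = [(prob[i][j], i, j) for i in range(L) for j in range(L)]
    pairs.sort(key=lambda t: t[0], reverse=True)  # highest probability first
    hits = sum(1 for _, i, j in pairs[:k] if dist[i][j] <= threshold)
    return hits / k
```

For a 10-residue monomer, only the two highest-probability pairs are scored; if one of them lies within 6 Å in the native dimer, the precision is 0.5.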
-
Kelso, Janet (Ed.)
Abstract
Motivation: Native top-down proteomics (nTDP) integrates native mass spectrometry (nMS) with top-down proteomics (TDP) to provide comprehensive analysis of protein complexes together with proteoform identification and characterization. Despite significant advances in nMS and TDP software development, a unified and user-friendly software package for analyzing nTDP data is still lacking.
Results: We have developed MASH Native to provide a unified solution for nTDP that processes complex datasets with database searching capabilities in a user-friendly interface. MASH Native supports various data formats and incorporates multiple options for deconvolution, database searching, and spectral summing to provide a “one-stop shop” for characterizing both native protein complexes and proteoforms.
Availability and implementation: The MASH Native app, video tutorials, written tutorials, and additional documentation are freely available for download at https://labs.wisc.edu/gelab/MASH_Explorer/MASHSoftware.php. All data files shown in the user tutorials are included with the MASH Native software in the download .zip file.
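Deconvolution in native MS rests on the relation between an ion's m/z and its neutral mass. The sketch below applies the standard arithmetic for protonated positive ions, M = z * (m/z - m_p), where m_p is the proton mass, and checks that a charge-state series agrees on a single mass. It is a minimal illustration under that assumption (no salt or solvent adducts, charge states already assigned), not MASH Native's deconvolution algorithm; the function names and the 1 Da tolerance are hypothetical.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple

PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M of an [M + zH]^z+ ion observed at the given m/z."""
    if z <= 0:
        raise ValueError("charge state must be a positive integer")
    return z * (mz - PROTON_MASS_DA)


def consensus_mass(peaks: Iterable[Tuple[float, int]],
                   tol_da: float = 1.0) -> Optional[float]:
    """Mean neutral mass implied by (m/z, z) peaks.

    Returns None if there are no peaks or if the implied masses spread
    by more than tol_da (a hypothetical tolerance).
    """
    masses = [neutral_mass(mz, z) for mz, z in peaks]
    if not masses or max(masses) - min(masses) > tol_da:
        return None
    return mean(masses)
```

Peaks generated for a hypothetical 10 kDa species at charge states 8+, 10+ and 12+ all map back to the same neutral mass, whereas assigning the wrong charge to one peak makes the series disagree and the function returns None.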
-
Rationale: Tandem‐ion mobility spectrometry/mass spectrometry methods have recently gained traction for the structural characterization of proteins and protein complexes. However, the ion activation techniques currently coupled with tandem‐ion mobility spectrometry/mass spectrometry are limited in their ability to characterize the structures of proteins and protein complexes.
Methods: Here, we describe coupling the separation capabilities of tandem‐trapped ion mobility spectrometry/mass spectrometry (tTIMS/MS) with the dissociation capabilities of ultraviolet photodissociation (UVPD) for protein structure analysis.
Results: We establish the feasibility of dissociating intact proteins by UV irradiation at 213 nm between the two TIMS devices in tTIMS/MS and at pressures compatible with ion mobility spectrometry (2–3 mbar). We validate that the fragments produced by UVPD under these conditions result from a radical‐based mechanism, in accordance with prior literature on UVPD. The data suggest that collisional cooling at the elevated pressures used here (“UVnoD2”) stabilizes fragment ions produced by UVPD that otherwise would not survive to detection. The data yield sequence coverage for the protein ubiquitin comparable to recent reports, demonstrating the analytical utility of our instrument in mobility‐separating fragment ions produced by UVPD.
Conclusions: The data demonstrate that UVPD carried out at elevated pressures of 2–3 mbar yields extensive fragment ions rich in information about the protein, and that their exhaustive analysis requires IMS separation post‐UVPD. Because UVPD and tTIMS/MS have each been shown to be valuable techniques in proteomics on their own merit, our contribution underscores the potential of combining tTIMS/MS with UVPD for structural proteomics.
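Sequence coverage in top-down fragmentation experiments is often reported as the fraction of inter-residue bonds explained by at least one observed fragment. The sketch below assumes that definition, which the abstract does not state, and maps N-terminal ions (a, b, c, including radical a•) of length k to bond k, and C-terminal ions (x, y, z) of length k to bond n - k, where n is the number of residues. The function name and the (ion type, length) input format are hypothetical.

```python
from typing import Iterable, Tuple

N_TERMINAL = {"a", "b", "c"}
C_TERMINAL = {"x", "y", "z"}


def sequence_coverage(n_residues: int, fragments: Iterable[Tuple[str, int]]) -> float:
    """Fraction of the n - 1 inter-residue bonds explained by observed fragments.

    fragments: (ion_type, length) pairs such as ("c", 12), ("y", 30) or ("a•", 5).
    Bond k is the bond between residues k and k + 1 (1-based numbering).
    """
    if n_residues < 2:
        raise ValueError("need at least two residues")
    cleaved = set()
    for ion, k in fragments:
        if not 1 <= k < n_residues:
            continue  # ignore lengths that do not correspond to a backbone cleavage
        ion_type = ion[0].lower()  # "a•" -> "a", "z•" -> "z"
        if ion_type in N_TERMINAL:
            cleaved.add(k)
        elif ion_type in C_TERMINAL:
            cleaved.add(n_residues - k)
    return len(cleaved) / (n_residues - 1)
```

Because complementary N- and C-terminal fragments can explain the same bond (for instance c2 and y8 in a 10-residue peptide), the bonds are collected in a set so that each one is counted only once.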