skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Complex portal 2025: predicted human complexes and enhanced visualisation tools for the comparison of orthologous and paralogous complexes
Abstract The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated reference database for molecular complexes. It is a unifying web resource linking aggregated data on composition, topology and the function of macromolecular complexes from 28 species. In addition to significantly extending the number of manually curated complexes, we have massively extended the coverage of the human complexome through the incorporation of high confidence assemblies predicted by machine-learning algorithms trained on large-scale experimental data. The current content of the portal comprising 2150 human complexes has been augmented by 14 964 machine-learning (ML) predicted complexes from hu.MAP3.0. We have refactored the website to enable easy search and filtering of these different classes of protein complexes and have implemented the Complex Navigator, a visualisation tool to facilitate comparison of related complexes in the context of orthology or paralogy. We have embedded the Rhea reaction visualisation tool into the website to enable users to view the catalytic activity of enzyme complexes.  more » « less
Award ID(s):
2314278
PAR ID:
10555807
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Nucleic Acids Research
Volume:
53
Issue:
D1
ISSN:
0305-1048
Format(s):
Medium: X Size: p. D644-D650
Size(s):
p. D644-D650
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Macromolecular protein complexes carry out most functions in the cell including essential functions required for cell survival. Unfortunately, we lack the subunit composition for all human protein complexes. To address this gap we integrated >25,000 mass spectrometry experiments using a machine learning approach to identify > 15,000 human protein complexes. We show our map of protein complexes is highly accurate and more comprehensive than previous maps, placing ∼75% of human proteins into their physical contexts. We globally characterize our complexes using protein co-variation data (ProteomeHD.2) and identify co-varying complexes suggesting common functional associations. Our map also generates testable functional hypotheses for 472 uncharacterized proteins which we support using AlphaFold modeling. Additionally, we use AlphaFold modeling to identify 511 mutually exclusive protein pairs in hu.MAP3.0 complexes suggesting complexes serve different functional roles depending on their subunit composition. We identify expression as the primary way cells and organisms relieve the conflict of mutually exclusive subunits. Finally, we import our complexes to EMBL-EBI’s Complex Portal (https://www.ebi.ac.uk/complexportal/home) as well as provide complexes through our hu.MAP3.0 web interface (https://humap3.proteincomplexes.org/). We expect our resource to be highly impactful to the broader research community. 
    more » « less
  2. Liver fatty acid binding protein 1 (FABP1) binds diverse endogenous lipids and is highly expressed in the human liver. Binding to FABP1 alters the metabolism and homeostasis of endogenous lipids in the liver. Drugs have also been shown to bind to rat FABP1, but limited data are available for human FABP1 (hFABP1). FABP1 has a large binding pocket, and up to two fatty acids can bind to FABP1 simultaneously. We hypothesized that drug binding to hFABP1 results in formation of ternary complexes and that FABP1 binding alters drug metabolism. To test these hypotheses, native protein mass spectrometry (MS) and fluorescent 11-(dansylamino)undecanoic acid (DAUDA) displacement assays were used to characterize drug binding to hFABP1, and diclofenac oxidation by cytochrome P450 2C9 (CYP2C9) was studied in the presence and absence of hFABP1. DAUDA binding to hFABP1 involved high (Kd,1 = 0.2 μM) and low (Kd,2 > 10 μM) affinity binding sites. Nine drugs bound to hFABP1 with equilibrium dissociation constant (Kd) values ranging from 1 to 20 μM. None of the tested drugs completely displaced DAUDA from hFABP1, and fluorescence spectra showed evidence of ternary complex formation. Formation of DAUDA-hFABP1-diclofenac ternary complex was verified with native MS. Docking predicted diclofenac binding in the portal region of FABP1 with DAUDA in the binding cavity. The catalytic rate constant of diclofenac hydroxylation by CYP2C9 was decreased by ∼50% (P < 0.01) in the presence of FABP1. Together, these results suggest that drugs form ternary complexes with hFABP1 and that hFABP1 binding in the liver will alter drug metabolism and clearance. 
    more » « less
  3. Abstract As machine learning (ML) has matured, it has opened a new frontier in theoretical and computational chemistry by offering the promise of simultaneous paradigm shifts in accuracy and efficiency. Nowhere is this advance more needed, but also more challenging to achieve, than in the discovery of open‐shell transition metal complexes. Here, localizeddorfelectrons exhibit variable bonding that is challenging to capture even with the most computationally demanding methods. Thus, despite great promise, clear obstacles remain in constructing ML models that can supplement or even replace explicit electronic structure calculations. In this article, I outline the recent advances in building ML models in transition metal chemistry, including the ability to approach sub‐kcal/mol accuracy on a range of properties with tailored representations, to discover and enumerate complexes in large chemical spaces, and to reveal opportunities for design through analysis of feature importance. I discuss unique considerations that have been essential to enabling ML in open‐shell transition metal chemistry, including (a) the relationship of data set size/diversity, model complexity, and representation choice, (b) the importance of quantitative assessments of both theory and model domain of applicability, and (c) the need to enable autonomous generation of reliable, large data sets both for ML model training and in active learning or discovery contexts. Finally, I summarize the next steps toward making ML a mainstream tool in the accelerated discovery of transition metal complexes. This article is categorized under: Electronic Structure Theory > Density Functional Theory Software > Molecular Modeling Computer and Information Science > Chemoinformatics 
    more » « less
  4. Abstract Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. In recognizing that multiple sequence alignments of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using multiple sequence alignment (MSA) and other co-evolutionary features of a single monomer followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions of 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins. 
    more » « less
  5. Abstract MotivationProteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. ResultsIn this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. Availability and implementationThe source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA. 
    more » « less