Title: Computed structures of core eukaryotic protein complexes
Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take advantage of advances in proteome-wide amino acid coevolution analysis and deep-learning–based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes within the Saccharomyces cerevisiae proteome. We use a combination of RoseTTAFold and AlphaFold to screen through paired multiple sequence alignments for 8.3 million pairs of yeast proteins, identify 1505 likely to interact, and build structure models for 106 previously unidentified assemblies and 806 that have not been structurally characterized. These complexes, which have as many as five subunits, play roles in almost all key processes in eukaryotic cells and provide broad insights into biological function.
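The coevolution signal that such a screen relies on can be illustrated with a much simpler statistic than the ones named in the abstract. The sketch below is not the authors' pipeline (which uses RoseTTAFold and AlphaFold); it substitutes plain mutual information between alignment columns, and the function names `column_mutual_information` and `interprotein_mi` and the toy alignments are hypothetical. It assumes that row k of each alignment comes from the same species, which is what makes the alignment "paired".

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b must list residues for the same species in the same order.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same paired rows")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def interprotein_mi(msa_a, msa_b):
    """MI for every (column of protein A, column of protein B) pair."""
    cols_a = list(zip(*msa_a))
    cols_b = list(zip(*msa_b))
    return [[column_mutual_information(ca, cb) for cb in cols_b] for ca in cols_a]


# Toy paired alignment (hypothetical data): four species, two columns each.
msa_a = ["AK", "AK", "GE", "GE"]
msa_b = ["DL", "DL", "RL", "RL"]
mi = interprotein_mi(msa_a, msa_b)
```

In this toy case the first column of protein B changes together with both columns of protein A, so those pairs score 1 bit, while the invariant second column of protein B scores 0. Raw mutual information is sensitive to phylogenetic bias and alignment depth, so it is only a stand-in for the more careful statistics a real screen would need.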
Sponsoring Org: National Science Foundation
More Like this
  1. Physical interactions of proteins play key functional roles in many important cellular processes. To understand molecular mechanisms of such functions, it is crucial to determine the structure of protein complexes. To complement experimental approaches, which usually take a considerable amount of time and resources, various computational methods have been developed for predicting the structures of protein complexes. In computational modeling, one of the challenges is to identify near-native structures from a large pool of generated models. Here, we developed a deep learning–based approach named Graph Neural Network–based DOcking decoy eValuation scorE (GNN-DOVE). To evaluate a protein docking model, GNN-DOVE extracts the interface area and represents it as a graph. The chemical properties of atoms and the inter-atom distances are used as features of nodes and edges in the graph, respectively. GNN-DOVE was trained, validated, and tested on docking models in the Dockground database and further tested on a combined dataset of Dockground and ZDOCK benchmark as well as a CAPRI scoring dataset. GNN-DOVE performed better than existing methods, including DOVE, which is our previous development that uses a convolutional neural network on voxelized structure models.
  2. Abstract

    Eukaryotic microalgae play critical roles in the structure and function of marine food webs. The contribution of microalgae to food webs can be tracked using compound‐specific isotope analysis of amino acids (CSIA‐AA). Previous CSIA‐AA studies have defined eukaryotic microalgae as a single functional group in food web mixing models, despite their vast taxonomic and ecological diversity. Using controlled cultures, this work characterizes the amino acid δ13C (δ13CAA) fingerprints—a multivariate metric of amino acid carbon isotope values—of four major groups of eukaryotic microalgae: diatoms, dinoflagellates, raphidophytes, and prasinophytes. We found excellent separation of essential amino acid δ13C (δ13CEAA) fingerprints among four microalgal groups (mean posterior probability reclassification of 99.2 ± 2.9%). We also quantified temperature effects, a primary driver of microalgal bulk carbon isotope variability, on the fidelity of δ13CAA fingerprints. A 10°C range in temperature conditions did not have significant impacts on variance in δ13CAA values or the diagnostic microalgal δ13CEAA fingerprints. These δ13CEAA fingerprints were used to identify primary producers at the base of food webs supporting consumers in two contrasting systems: (1) penguins feeding in a diatom‐based food web and (2) mixotrophic corals receiving amino acids directly from autotrophic endosymbiotic dinoflagellates and indirectly from water column diatoms, prasinophytes, and cyanobacteria, likely via heterotrophic feeding on zooplankton. The increased taxonomic specificity of CSIA‐AA fingerprints developed here will greatly improve future efforts to reconstruct the contribution of diverse eukaryotic microalgae to the sources and cycling of organic matter in food web dynamics and biogeochemical cycling studies.
  3. Abstract

    To identify protein–protein interactions and phosphorylated amino acid sites in eukaryotic mRNA translation, replicate TAP‐MudPIT and control experiments are performed targeting Saccharomyces cerevisiae genes previously implicated in eukaryotic mRNA translation by their genetic and/or functional roles in translation initiation, elongation, termination, or interactions with ribosomal complexes. Replicate tandem affinity purifications of each targeted yeast TAP‐tagged mRNA translation protein coupled with multidimensional liquid chromatography and tandem mass spectrometry analysis are used to identify and quantify copurifying proteins. To improve sensitivity and minimize spurious, nonspecific interactions, a novel cross‐validation approach is employed to identify the most statistically significant protein–protein interactions. Using experimental and computational strategies discussed herein, the previously described protein composition of the canonical eukaryotic mRNA translation initiation, elongation, and termination complexes is calculated. In addition, statistically significant unpublished protein interactions and phosphorylation sites for S. cerevisiae’s mRNA translation proteins and complexes are identified.

  4. Abstract

    Background: Protein kinases are a large family of druggable proteins that are genomically and proteomically altered in many human cancers. Kinase-targeted drugs are emerging as promising avenues for personalized medicine because of the differential response shown by altered kinases to drug treatment in patients and cell-based assays. However, an incomplete understanding of the relationships connecting genome, proteome, and drug sensitivity profiles presents a major bottleneck in targeting kinases for personalized medicine. Results: In this study, we propose a multi-component Quantitative Structure–Mutation–Activity Relationship Tests (QSMART) model and neural networks framework for providing explainable models of protein kinase inhibition and drug response (IC50) profiles in cell lines. Using non-small cell lung cancer as a case study, we show that interaction terms that capture associations between drugs, pathways, and mutant kinases quantitatively contribute to the response of two EGFR inhibitors (afatinib and lapatinib). In particular, protein–protein interactions associated with the JNK apoptotic pathway, associations between lung development and axon extension, and interaction terms connecting drug substructures and the volume/charge of mutant residues at specific structural locations contribute significantly to the observed IC50 values in cell-based assays. Conclusions: By integrating multi-omics data in the QSMART model, we not only predict drug responses in cancer cell lines with high accuracy but also identify features and explainable interaction terms contributing to the accuracy. Although we have tested our multi-component explainable framework on protein kinase inhibitors, it can be extended across the proteome to investigate the complex relationships connecting genotypes and drug sensitivity profiles.
  5. Abstract

    RNA‐protein interactions play essential roles in regulating gene expression. While some RNA‐protein interactions are “specific”, that is, the RNA‐binding proteins preferentially bind to particular RNA sequence or structural motifs, others are “non‐RNA specific.” Deciphering the protein‐RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein‐RNA interfaces, there is a need for computational methods to identify RNA‐binding residues in proteins. While most of the existing computational methods for predicting RNA‐binding residues in RNA‐binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner‐specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner‐specific protein‐RNA interface prediction tools, PS‐PRIP and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, RNA‐specificity metric (RSM), for quantifying the RNA‐specificity of the RNA binding residues predicted by such tools. Our results show that the RNA‐binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner‐agnostic metrics, RNA partner‐specific methods are outperformed by the state‐of‐the‐art partner‐agnostic methods. We conjecture that either (a) the protein‐RNA complexes in PDB are not representative of the protein‐RNA interactions in nature, or (b) the current methods for partner‐specific prediction of RNA‐binding residues in proteins fail to account for the differences in RNA partner‐specific versus partner‐agnostic protein‐RNA interactions, or both.
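The graph representation described for GNN-DOVE in item 1 (interface atoms as nodes, inter-atom distances as edge features) can be sketched roughly as follows. This is a minimal illustration rather than the published implementation: the function name `build_interface_graph`, the 4.0 Å cutoffs, the use of the element symbol as the only node feature, and the toy coordinates are all assumptions made for the example.

```python
import math


def build_interface_graph(receptor, ligand, interface_cutoff=4.0, edge_cutoff=4.0):
    """Build a toy interface graph from two chains.

    receptor, ligand: lists of (element, (x, y, z)) tuples.
    Both cutoffs (in angstroms) are illustrative values, not published ones.
    Returns (nodes, edges), where edges maps (i, j) with i < j to a distance.
    """
    # Keep only atoms that lie within the cutoff of some atom in the other chain.
    rec_if = [a for a in receptor
              if any(math.dist(a[1], b[1]) <= interface_cutoff for b in ligand)]
    lig_if = [b for b in ligand
              if any(math.dist(a[1], b[1]) <= interface_cutoff for a in receptor)]

    # Node feature here is just the element symbol plus the chain of origin.
    nodes = [{"element": e, "chain": "R", "xyz": xyz} for e, xyz in rec_if]
    nodes += [{"element": e, "chain": "L", "xyz": xyz} for e, xyz in lig_if]

    # Edge feature is the inter-atom distance for pairs closer than edge_cutoff.
    edges = {}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i]["xyz"], nodes[j]["xyz"])
            if d <= edge_cutoff:
                edges[(i, j)] = d
    return nodes, edges


# Toy docking model (hypothetical coordinates).
receptor = [("C", (0.0, 0.0, 0.0)), ("N", (20.0, 0.0, 0.0))]
ligand = [("O", (3.0, 0.0, 0.0)), ("C", (3.0, 2.0, 0.0))]
nodes, edges = build_interface_graph(receptor, ligand)
```

Here the distant receptor nitrogen is dropped because it is not near the ligand, leaving three nodes connected by three distance-labelled edges. A scoring network would then consume richer chemical features than the element symbol used in this sketch.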