Title: Unraveling the role of physicochemical differences in predicting protein–protein interactions
The ability to accurately predict protein–protein interactions is critically important for understanding major cellular processes. However, current experimental and computational approaches for identifying them are technically very challenging and still have limited success. We propose a new computational method for predicting protein–protein interactions using only primary sequence information. It utilizes the concept of physicochemical similarity to determine which interactions will most likely occur. In our approach, the physicochemical features of proteins are extracted using bioinformatics tools for different organisms. These features are then used in a machine-learning method to identify successful protein–protein interactions via correlation analysis. We found that the property that correlates most strongly with protein–protein interactions for all studied organisms is dipeptide amino acid composition (the frequency of specific amino acid pairs in a protein sequence). While current approaches often overlook the organism-specific nature of protein–protein interactions, our method yields context-specific features that determine protein–protein interactions. The analysis is specifically applied to the bacterial two-component system that includes histidine kinase and transcriptional response regulators, as well as to the barnase–barstar complex, demonstrating the method’s versatility across different biological systems. Our approach can be applied to predict protein–protein interactions in any biological system, providing an important tool for investigating the mechanisms of complex biological processes.
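As an illustration of the dipeptide amino acid composition feature named in the abstract, the following is a minimal sketch of how the frequency of each ordered amino acid pair might be computed from a primary sequence. It is not the authors' implementation: the function name `dipeptide_composition`, the restriction to the 20 standard residues, and the choice to skip pairs containing any other character are all assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence):
    """Return the relative frequency of each of the 400 ordered residue pairs.

    Hypothetical helper: pairs containing a non-standard character are skipped,
    and frequencies are normalized by the number of pairs actually counted.
    """
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    # Slide a window of width 2 along the sequence (overlapping pairs).
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or no valid pairs: return an all-zero vector.
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}
```

Calling `dipeptide_composition` on two candidate partner sequences gives two 400-element feature vectors that could then be passed to a correlation analysis or a machine-learning model of the reader's choice.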
Award ID(s):
2246878
PAR ID:
10596399
Author(s) / Creator(s):
; ;
Publisher / Repository:
AIP
Date Published:
Journal Name:
The Journal of Chemical Physics
Volume:
161
Issue:
4
ISSN:
0021-9606
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Some proteins, including yeast translation termination factor Sup35 (eRF3), are capable of both stress-induced liquid-liquid phase separation (LLPS) and formation of solid fibrous aggregates (amyloids). Fragmentation and propagation of amyloid fibrils generate transmissible (in yeast, heritable) self-perpetuating protein agents, termed prions. Relationships between these processes are still poorly understood. Previous literature data suggested that the ability of Sup35 orthologs to form a prion is sporadically distributed in fungal evolution and depends on the amino acid composition of the Sup35 prion domain (PrD), rather than on an evolutionarily variable specific sequence. We have studied two groups of proteins: 1) fungal Sup35 PrDs of various evolutionary origins, and 2) artificially synthesized “scrambled” variants of Saccharomyces cerevisiae Sup35 PrD that possess identical amino acid composition but different sequences. These proteins were fused to fluorophores and expressed in S. cerevisiae cells. LLPS and amyloid/prion formation were assessed by fluorescence microscopy and biochemical approaches. Amino acid sequences were analyzed by various computational algorithms. Our data indicate that propagation of the prion state strongly depends on the evolutionary distance from the host. In contrast, the majority of proteins studied are capable of both LLPS and amyloid fibril formation. These capabilities are associated with specific patterns of PrD amino acid distribution that are broadly conserved among fungi. Notably, PrDs of different sequences differ from each other in their ability to convert from liquid condensates to amyloids, and the relationship between these processes is apparently optimized in evolution. Moreover, heterotypic PrDs can colocalize with each other within liquid condensates and influence each other's amyloid conversion.
To conclude, LLPS and amyloid properties depend on specific evolutionarily conserved sequence patterns, indicating possible important biological roles for these processes. These patterns could potentially be used to predict LLPS and prion potential in other sequence contexts. This work was supported by NSF grant 2345660.
  2. Introduction: Some proteins, including yeast prion protein Sup35 (eRF3), are capable of both stress-induced liquid-liquid phase separation (LLPS) and formation of a prion state, propagated via solid fibrous aggregates (amyloids). Relationships between these processes are still poorly understood. Previous literature data suggested that prion formation by Sup35 is sporadically distributed in fungal evolution and depends on the amino acid composition of its prion domain (PrD), rather than on a specific sequence, which is highly variable. Objectives: Identify sequence patterns that control LLPS and amyloid formation by Sup35 PrD, and trace their conservation in fungal evolution. Methods: Fungal Sup35 PrDs of various evolutionary origins, as well as artificially synthesized “scrambled” variants of Saccharomyces cerevisiae Sup35 PrD, having identical amino acid composition but different sequences, were fused to fluorophores and expressed in S. cerevisiae cells. LLPS and amyloid/prion formation were assessed by fluorescence microscopy and biochemical approaches. Amino acid sequences were analyzed by various computational algorithms. Results/Discussion: While propagation of the prion state depends on evolutionary distance from the host, both LLPS and the ability to form an amyloid are associated with specific patterns of PrD amino acid distribution that are broadly conserved among fungi. PrDs of different origins are capable of colocalizing within liquid condensates and influencing each other's amyloid conversion. Conclusion: LLPS and amyloid properties depend on specific evolutionarily conserved sequence patterns, indicating possible important biological roles for these processes. These patterns could potentially be used to predict LLPS and prion potential in other sequence contexts. Funding: NSF grant 2345660
  3. We have long known that characterizing protein structure is key to understanding protein function. Computational approaches have largely addressed a narrow formulation of the problem, seeking to compute one native structure from an amino-acid sequence. Now AlphaFold2 promises to reveal a high-quality native structure for possibly many proteins. However, researchers over the years have argued for broadening our view to account for the multiplicity of native structures. We now know that many protein molecules switch between different structures to regulate interactions with molecular partners in the cell. Elucidating such structures de novo is exceptionally difficult, as it requires exploration of a possibly very large structure space in search of competing, near-optimal structures. Here we report on a novel stochastic optimization method capable of revealing very different structures for a given protein from knowledge of its amino-acid sequence. The method leverages evolutionary search techniques and adapts its search to balance exploration and exploitation under a fixed computational budget. In addition to demonstrating the utility of this method for identifying multiple native structures, we provide a benchmark dataset for researchers to continue work on this problem.
  4. Motivation: Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses. Results: We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated, with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and an optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities. Availability and implementation: Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe.
Supplementary information: Supplementary data are available at Bioinformatics online.
  5. We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and by the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.