

Title: Extracting phylogenetic dimensions of coevolution reveals hidden functional signals
Abstract

Despite the structural and functional information contained in the statistical coupling between pairs of residues in a protein, coevolution associated with function is often obscured by artifactual signals such as genetic drift, which shapes a protein’s phylogenetic history and gives rise to concurrent variation between protein sequences that is not driven by selection for function. Here, we introduce a background model for phylogenetic contributions to statistical coupling that separates the coevolution signals arising from inter-clade and intra-clade sequence comparisons, and we demonstrate that coevolution can be measured on multiple phylogenetic timescales within a single protein. Our method, nested coevolution (NC), can be applied as an extension to any coevolution metric. We use NC to demonstrate that poorly conserved residues can nonetheless have important roles in protein function. Moreover, NC improved the structural-contact predictions of several coevolution-based methods, particularly in subsampled alignments with fewer sequences. NC also reduced noise in the detection of functional sectors of collectively coevolving residues. Sectors identified after applying NC were more spatially compact and phylogenetically distinct from the rest of the protein, and were strongly enriched for mutations that disrupt protein activity. Thus, our conceptualization of the phylogenetic separation of coevolution has the potential to further elucidate relationships among protein evolution, function, and genetic diseases.
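The general idea of splitting a pairwise coupling score into within-clade and between-clade parts can be illustrated with a small sketch. This is not the authors' NC algorithm: it assumes mutual information as the coevolution metric, clade labels supplied in advance (for example, from cutting a phylogenetic tree), and natural logarithms, and the function names `mutual_information` and `nested_split` are hypothetical. With mutual information, the clade-size-weighted mean of the within-clade scores equals the conditional mutual information I(X;Y|C), so the remainder attributed to between-clade comparisons is I(X;Y) - I(X;Y|C), which can be negative in general.

```python
import math
from collections import Counter, defaultdict


def mutual_information(col_i, col_j):
    """Mutual information (natural log) between two equal-length alignment columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def nested_split(alignment, clades, i, j, metric=mutual_information):
    """Hypothetical split of the coupling between columns i and j.

    alignment: list of equal-length sequences (strings)
    clades:    one clade label per sequence (assumed to be given)
    Returns (total, intra, inter):
      total = metric on the full alignment
      intra = clade-size-weighted mean of the metric within each clade
      inter = total - intra, attributed to between-clade comparisons
    """
    col_i = [s[i] for s in alignment]
    col_j = [s[j] for s in alignment]
    total = metric(col_i, col_j)

    # Group sequences by their clade label.
    groups = defaultdict(list)
    for seq, label in zip(alignment, clades):
        groups[label].append(seq)

    # Weight each clade's score by its share of the sequences.
    n = len(alignment)
    intra = 0.0
    for members in groups.values():
        weight = len(members) / n
        intra += weight * metric([s[i] for s in members], [s[j] for s in members])
    return total, intra, total - intra
```

On the toy alignment `["AA", "AA", "CC", "CC"]` with clades `["x", "x", "y", "y"]`, each clade is invariant at both positions, so `nested_split(..., 0, 1)` assigns the full ln 2 of coupling to the between-clade term and nothing to the within-clade term. This is covariation produced purely by clade membership, the kind of signal that shared phylogenetic history can create without selection for function.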

 
NSF-PAR ID:
10361796
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Scientific Reports
Volume:
12
Issue:
1
ISSN:
2045-2322
Sponsoring Org:
National Science Foundation
More Like this
  1. The ribosome translates the genetic code into proteins in all domains of life. Its size and complexity demand long-range interactions that regulate ribosome function. These interactions are largely unknown. Here, we apply a global coevolution method, statistical coupling analysis (SCA), to identify coevolving residue networks (sectors) within the 23S ribosomal RNA (rRNA) of the large ribosomal subunit. As in proteins, SCA reveals a hierarchical organization of evolutionary constraints with near-independent groups of nucleotides forming physically contiguous networks within the three-dimensional structure. Using a quantitative, continuous-culture-with-deep-sequencing assay, we confirm that the top two SCA-predicted sectors contribute to ribosome function. These sectors map to distinct ribosome activities, and their origins trace to phylogenetic divergences across all domains of life. These findings provide a foundation to map ribosome allostery, explore ribosome biogenesis, and engineer ribosomes for new functions. Despite differences in chemical structure, protein and RNA enzymes appear to share a common internal logic of interaction and assembly. 
  2. Abstract
    The dual specificity phosphatase (DUSP) family has catalytically inactive members, called pseudophosphatases, whose catalytic motifs carry mutations that render them enzymatically inactive. This study analyzes the significance of two pseudophosphatases, MK-STYX [MAPK (mitogen-activated protein kinase) phosphoserine/threonine/tyrosine-binding protein] and STYX (serine/threonine/tyrosine-interacting protein), throughout their evolution, and measures and compares their evolutionary conservation. Phylogenetic trees were constructed to show any deviation from the evolutionary paths of the various species. Data were collected on a large set of proteins that have either of the two domains of MK-STYX: the DUSP domain or the cdc-25 homology (CH2)/rhodanese-like domain. The distances between species pairs for MK-STYX or STYX, and their Ka/Ks ratios, were calculated. In addition, both pseudophosphatases were ranked among a large set of related proteins, including the active homologs of MK-STYX, MKP (MAPK phosphatase)-1 and MKP-3. MK-STYX had one of the highest species-species protein distances and was under weaker purifying selection pressure than most proteins with its domains. In contrast, the protein distances of STYX were lower than those of 82% of the DUSP-containing proteins, and STYX was under one of the strongest purifying selection pressures. However, the N-terminal sequences of MK-STYX, STYX, MKP-1, and MKP-3 were under similar selection pressure. We next performed statistical coupling analysis, a process that reveals interconnected regions within proteins. We found that while MKP-1, MKP-3, and STYX each have two functional units (sectors), MK-STYX has only one, and that MK-STYX is similar to MKP-3 in the evolutionary coupling of the active site and KIM domain. Within those two domains, the mean coupling is also most similar for MK-STYX and MKP-3.
    This study reveals striking distinctions between the evolutionary patterns of MK-STYX and STYX, suggesting a very specific role for each pseudophosphatase and further highlighting the relevance of these atypical DUSP family members as signaling regulators. Therefore, our study provides computational evidence and evolutionary reasons to further explore the properties of pseudophosphatases, in particular MK-STYX and STYX.
  3. Abstract

    Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Recently, sequence coevolution-based approaches have led to a breakthrough in predicting monomer protein structures and protein interaction interfaces. Here, we address the challenges of large-scale interaction prediction at residue resolution with a fast alignment concatenation method and a probabilistic score for the interaction of residues. Importantly, this method (EVcomplex2) is able to assess the likelihood of a protein interaction, as we show here by applying it to large-scale experimental datasets in which the pairwise interactions are unknown. We predict 504 interactions de novo in the E. coli membrane proteome, including 243 that are newly discovered. While EVcomplex2 does not require available structures, coevolving residue pairs can be used to produce structural models of protein interactions, as done here for membrane complexes including the Flagellar Hook-Filament Junction and the Tol/Pal complex.

     
  4. Abstract

    Through evolutionary selection, proteins are optimized for fitness‐related properties such as foldability and function. However, classical studies have found that evolutionarily designed protein sequences alone cannot guarantee foldability, or at least not without considering local contacts associated with the initial folding steps. We previously showed that foldability and function can be restored by removing frustration in the folding energy landscape of a model WW domain protein, CC16, which was designed based on Statistical Coupling Analysis (SCA). Substitutions ensuring the formation of five local contacts identified as “on‐path” were selected using N21, the closest native folded homolog sequence. Surprisingly, the resulting sequence, CC16‐N21, bound to Group I peptides, while N21 did not. Here, we identified single‐point mutations that enable N21 to bind a Group I peptide ligand through structure‐ and dynamics‐based computational design. Comparison of the docked position of the CC16‐N21/ligand complex with the N21 structure showed that residues at positions 9 and 19 are important for peptide binding, whereas the dynamic profiles identified position 10 as allosterically coupled to the binding site and exhibiting different dynamics between N21 and CC16‐N21. We found that swapping these positions in N21 with the matching residues from CC16‐N21 gives N21 nature‐like binding affinity. This study validates the use of dynamic profiles as guiding principles for modulating the binding affinity of small proteins.

     
  5. dos Reis, Mario (Ed.)
    Abstract
    Ancestral sequence reconstruction (ASR) uses an alignment of extant protein sequences, a phylogeny describing the history of the protein family, and a model of the molecular-evolutionary process to infer the sequences of ancient proteins, allowing researchers to directly investigate the impact of sequence evolution on protein structure and function. Like all statistical inferences, ASR can be sensitive to violations of its underlying assumptions. Previous studies have shown that, whereas phylogenetic uncertainty has only a very weak impact on ASR accuracy, uncertainty in the protein sequence alignment can more strongly affect inferred ancestral sequences. Here, we show that errors in sequence alignment can produce errors in ASR across a range of realistic and simplified evolutionary scenarios. Importantly, sequence reconstruction errors can lead to errors in estimates of structural and functional properties of ancestral proteins, potentially undermining the reliability of analyses relying on ASR. We introduce an alignment-integrated ASR approach that combines information from many different sequence alignments. We show that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment. Given the growing evidence that sequence alignment errors can impact the reliability of ASR studies, we recommend that future studies incorporate approaches to mitigate the impact of alignment uncertainty. Probabilistic modeling of insertion and deletion events has the potential to radically improve ASR accuracy when the model reflects the true underlying evolutionary history, but further studies are required to thoroughly evaluate the reliability of these approaches under realistic conditions.
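Statistical coupling analysis, used in items 1 and 2 above, begins by scoring how strongly each alignment position departs from background expectation. A minimal sketch of that conservation measure, the relative entropy D_i = Σ_a f_i^a ln(f_i^a / q^a) of the observed frequencies f at position i against background frequencies q, follows. Assumptions: the function name `positional_conservation` is hypothetical, the default background is uniform over the RNA alphabet (published SCA analyses typically estimate background frequencies from sequence databases instead), and gap characters are simply skipped. This is not the pySCA implementation.

```python
import math
from collections import Counter


def positional_conservation(alignment, alphabet="ACGU", background=None):
    """Relative entropy D_i = sum_a f_i^a * ln(f_i^a / q^a) for every column.

    alignment:  list of equal-length sequences (strings)
    alphabet:   allowed symbols; anything else (e.g. '-') is treated as a gap and skipped
    background: dict symbol -> background frequency q^a (uniform if None; an assumption)
    """
    if background is None:
        background = {a: 1.0 / len(alphabet) for a in alphabet}
    scores = []
    for i in range(len(alignment[0])):
        # Count non-gap symbols in column i.
        counts = Counter(s[i] for s in alignment if s[i] in alphabet)
        total = sum(counts.values())
        if total == 0:
            # An all-gap column carries no conservation signal here.
            scores.append(0.0)
            continue
        d = 0.0
        for symbol, c in counts.items():
            f = c / total
            d += f * math.log(f / background[symbol])
        scores.append(d)
    return scores
```

For the alignment `["AC", "AG", "AU", "AA"]`, the invariant first column scores ln 4 (about 1.386), and the second column, where all four nucleotides appear equally often, scores 0.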