

Title: Epistasis Creates Invariant Sites and Modulates the Rate of Molecular Evolution
Abstract: Invariant sites are a common feature of amino acid sequence evolution. Their presence is frequently attributed to the need to preserve function through site-specific conservation of amino acid residues. Amino acid substitution models without a provision for invariant sites often fit the data significantly worse than models that allow for an excess of invariant sites beyond the number predicted by rate variation among sites alone (e.g., a Gamma distribution). An alternative explanation is that epistasis between sites, which acts to preserve residue interactions, can itself create invariant sites. Through computer-simulated sequence evolution, we evaluated the relative effects of site-specific preferences and site-site couplings in the generation of invariant sites and the modulation of the rate of molecular evolution. In an analysis of ten major families of protein domains with diverse sequence and functional properties, we find that the negative selection imposed by epistasis creates many more invariant sites than site-specific residue preferences alone. Further, epistasis plays an increasingly large role in creating invariant sites over longer evolutionary periods. Epistasis also dictates rates of domain evolution over time by exerting significant additional purifying selection to preserve site couplings. These patterns illuminate the mechanistic role of epistasis in the processes underlying observed site invariance and evolutionary rates.
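As a minimal illustration of what "invariant site" means operationally, the sketch below counts the alignment columns in which every sequence carries the same residue. It is not the authors' pipeline: the function name `invariant_site_fraction`, the toy alignment, and the choice to treat gap characters like any other symbol are all assumptions made for this example.

```python
from typing import Sequence


def invariant_site_fraction(alignment: Sequence[str]) -> float:
    """Return the fraction of alignment columns occupied by a single symbol.

    Assumption: the alignment is a sequence of equal-length strings, and a
    gap character (e.g. '-') is treated like any other symbol.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    if length == 0:
        return 0.0
    # zip(*alignment) iterates over columns; a column is invariant when
    # the set of its symbols has exactly one member.
    invariant = sum(1 for column in zip(*alignment) if len(set(column)) == 1)
    return invariant / length


# Hypothetical toy alignment: columns 1, 4 and 5 are invariant.
toy_alignment = ["MKVLA", "MKILA", "MRVLA"]
print(invariant_site_fraction(toy_alignment))
```

Comparing this fraction between sequences simulated with and without site-site couplings is one simple way to picture the contrast the abstract describes.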
Award ID(s):
1934848
NSF-PAR ID:
10354816
Editor(s):
Ozkan, Banu
Journal Name:
Molecular Biology and Evolution
Volume:
39
Issue:
5
ISSN:
0737-4038
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background

    Sequence‐specific binding by transcription factors (TFs) plays a significant role in the selection and regulation of target genes. At the protein:DNA interface, amino acid side‐chains construct a diverse physicochemical network of specific and non‐specific interactions, and seemingly subtle changes in amino acid identity at certain positions may dramatically impact TF:DNA binding. Variation of these specificity‐determining residues (SDRs) is a major mechanism of functional divergence between TFs with strong structural or sequence homology.

    Methods

    In this study, we employed a combination of high‐throughput specificity profiling by SELEX and Spec‐seq, structural modeling, and evolutionary analysis to probe the binding preferences of winged helix‐turn‐helix TFs belonging to the OmpR sub‐family in Escherichia coli.

    Results

    We found that E. coli OmpR paralogs recognize tandem, variably spaced repeats composed of “GT‐A” or “GCT”‐containing half‐sites. Some divergent sequence preferences observed within the “GT‐A” mode correlate with amino acid similarity; conversely, “GCT”‐based motifs were observed for a subset of paralogs with low sequence homology. Direct specificity profiling of a subset of OmpR homologues (CpxR, RstA, and OmpR) as well as predicted “SDR‐swap” variants revealed that individual SDRs may impact sequence preferences locally through direct contact with DNA bases or distally via the DNA backbone.

    Conclusions

    Overall, our work provides evidence for a common structural “code” for sequence‐specific wHTH‐DNA interactions, and demonstrates that surprisingly modest residue changes can enable recognition of highly divergent sequence motifs. Further examination of SDR predictions will likely reveal additional mechanisms controlling the evolutionary divergence of this important class of transcriptional regulators.

     
  2. Abstract

    Improvements in the description of amino acid substitution are required to develop better pseudo‐energy‐based protein structure‐aware models for use in phylogenetic studies. These models are used to characterize the probabilities of amino acid substitution and enable better simulation of protein sequences over a phylogeny. A better characterization of amino acid substitution probabilities in turn enables numerous downstream applications, like detecting positive selection, ancestral sequence reconstruction, and evolutionarily‐motivated protein engineering. Many existing Markov models for amino acid substitution in molecular evolution disregard molecular structure and describe the amino acid substitution process over longer evolutionary periods poorly. Here, we present a new model upgraded with a site‐specific parameterization of pseudo‐energy terms in a coarse‐grained force field, which describes local heterogeneity in physical constraints on amino acid substitution better than a previous pseudo‐energy‐based model, at minimal cost in runtime. The importance of each weight term parameterization in characterizing underlying features of the site, including contact number, solvent accessibility, and secondary structural elements, was evaluated, returning both expected and biologically reasonable relationships between model parameters. This results in the acceptance of proposed amino acid substitutions that more closely resemble the site‐specific frequencies observed in gene family alignments. The modular site‐specific pseudo‐energy function is made available for download through the following website: https://liberles.cst.temple.edu/Software/CASS/index.html.

     
  3. We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed by using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while being consistent with neutral evolution theory, thereby unifying observations from previously disjointed evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have undergone evolution under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. Perspectives on the implementation of our model in the context of the molecular clock are discussed.
  4. Most aspects of the molecular biology of cells involve tightly coordinated intermolecular interactions requiring specific recognition at the nucleotide and/or amino acid levels. This has led to long-standing interest in the degree to which constraints on interacting molecules result in conserved vs. accelerated rates of sequence evolution, with arguments commonly being made that molecular coevolution can proceed at rates exceeding the neutral expectation. Here, a fairly general model is introduced to evaluate the degree to which the rate of evolution at functionally interacting sites is influenced by effective population sizes (N_e), mutation rates, strength of selection, and the magnitude of recombination between sites. This theory is of particular relevance to matters associated with interactions between organelle- and nuclear-encoded proteins, as the two genomic environments often exhibit dramatic differences in the power of mutation and drift. Although genes within low N_e environments can drive the rate of evolution of partner genes experiencing higher N_e, rates exceeding the neutral expectation require that the former also have an elevated mutation rate. Testable predictions, some counterintuitive, are presented on how patterns of coevolutionary rates should depend on the relative intensities of drift, selection, and mutation.
  5. Abstract We explore sequence determinants of enzyme activity and specificity in a major enzyme family of terpene synthases. Most enzymes in this family catalyze reactions that produce cyclic terpenes—complex hydrocarbons widely used by plants and insects in diverse biological processes such as defense, communication, and symbiosis. To analyze the molecular mechanisms of emergence of terpene cyclization, we have carried out in-depth examination of mutational space around (E)-β-farnesene synthase, an Artemisia annua enzyme which catalyzes production of a linear hydrocarbon chain. Each mutant enzyme in our synthetic libraries was characterized biochemically, and the resulting reaction rate data were used as input to the Michaelis–Menten model of enzyme kinetics, in which free energies were represented as sums of one-amino-acid contributions and two-amino-acid couplings. Our model predicts measured reaction rates with high accuracy and yields free energy landscapes characterized by relatively few coupling terms. As a result, the Michaelis–Menten free energy landscapes have simple, interpretable structure and exhibit little epistasis. We have also developed biophysical fitness models based on the assumption that highly fit enzymes have evolved to maximize the output of correct products, such as cyclic products or a specific product of interest, while minimizing the output of byproducts. This approach results in nonlinear fitness landscapes that are considerably more epistatic. Overall, our experimental and computational framework provides focused characterization of evolutionary emergence of novel enzymatic functions in the context of microevolutionary exploration of sequence space around naturally occurring enzymes. 
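The decomposition mentioned in item 5, in which free energies are represented as sums of one-amino-acid contributions and two-amino-acid couplings, can be written schematically as below. The symbols $h_i$, $J_{ij}$, $s_i$, and $L$ are notation assumed here for illustration and are not taken from the cited work.

```latex
\Delta G(s_1, \dots, s_L) \;=\; \sum_{i=1}^{L} h_i(s_i) \;+\; \sum_{1 \le i < j \le L} J_{ij}(s_i, s_j)
```

Here $s_i$ is the amino acid at position $i$ of a sequence of length $L$, $h_i$ is the single-site contribution, and $J_{ij}$ is the pairwise coupling. Under this reading, a landscape with "relatively few coupling terms" is one in which most $J_{ij}$ are zero, so epistasis enters only through the second sum.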