This content will become publicly available on May 9, 2024

Title: Evolution and diversification of the ACT-like domain associated with plant basic helix–loop–helix transcription factors
Basic helix–loop–helix (bHLH) proteins are one of the largest families of transcription factors (TFs) in eukaryotes, and ~30% of all flowering plants’ bHLH TFs contain the aspartate kinase, chorismate mutase, and TyrA (ACT)-like domain at variable distances C-terminal from the bHLH. However, the evolutionary history and functional consequences of the bHLH/ACT-like domain association remain unknown. Here, we show that this domain association is unique to the Plantae kingdom, with green algae (chlorophytes) harboring a small number of bHLH genes in which the ACT-like domain is present at variable frequency. bHLH-associated ACT-like domains form a monophyletic group, indicating a common origin. Indeed, phylogenetic analysis suggests that the association of ACT-like and bHLH domains occurred early in Plantae through recruitment, by an ancestral bHLH gene, of an ACT-like domain in a common ancestor with the widely distributed ACT DOMAIN REPEAT (ACR) genes. We determined the functional significance of this association by showing that Chlamydomonas reinhardtii ACT-like domains mediate homodimer formation and negatively affect DNA binding of the associated bHLH domains. We show that, while ACT-like domains have experienced faster selection than the associated bHLH domain, their rates of evolution are strongly and positively correlated, suggesting that the evolution of the ACT-like domains was constrained by the bHLH domains. This study proposes an evolutionary trajectory for the association of ACT-like and bHLH domains, with experimental characterization of its functional consequence in the regulation of plant-specific processes, highlighting the impacts of functional domain coevolution.
Award ID(s):
2210431 2218206 2107215
NSF-PAR ID:
10432967
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
120
Issue:
19
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background

    Sequence‐specific binding by transcription factors (TFs) plays a significant role in the selection and regulation of target genes. At the protein:DNA interface, amino acid side‐chains construct a diverse physicochemical network of specific and non‐specific interactions, and seemingly subtle changes in amino acid identity at certain positions may dramatically impact TF:DNA binding. Variation of these specificity‐determining residues (SDRs) is a major mechanism of functional divergence between TFs with strong structural or sequence homology.

    Methods

    In this study, we employed a combination of high‐throughput specificity profiling by SELEX and Spec‐seq, structural modeling, and evolutionary analysis to probe the binding preferences of winged helix‐turn‐helix TFs belonging to the OmpR sub‐family in Escherichia coli.

    Results

    We found that E. coli OmpR paralogs recognize tandem, variably spaced repeats composed of “GT‐A” or “GCT”‐containing half‐sites. Some divergent sequence preferences observed within the “GT‐A” mode correlate with amino acid similarity; conversely, “GCT”‐based motifs were observed for a subset of paralogs with low sequence homology. Direct specificity profiling of a subset of OmpR homologues (CpxR, RstA, and OmpR) as well as predicted “SDR‐swap” variants revealed that individual SDRs may impact sequence preferences locally through direct contact with DNA bases or distally via the DNA backbone.

    Conclusions

    Overall, our work provides evidence for a common structural “code” for sequence‐specific wHTH‐DNA interactions, and demonstrates that surprisingly modest residue changes can enable recognition of highly divergent sequence motifs. Further examination of SDR predictions will likely reveal additional mechanisms controlling the evolutionary divergence of this important class of transcriptional regulators.

     
  2. Introduction

    Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge.

    Rationale

    We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx).

    Results

    Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85 versus 3.26% expected by chance, P < 2.2 × 10^−308). Pathogenic ClinVar variants are more constrained than benign variants (P < 2.2 × 10^−16). The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold). The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base. Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes.

    Conclusion

    Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease.

    Figure. Using evolutionary constraint in genomic studies of human diseases. (A) Constraint was calculated across 240 mammal species, including 43 primates (teal line). (B) Pathogenic ClinVar variants (N = 73,885) are more constrained across mammals than benign variants (N = 231,642; P < 2.2 × 10^−16). (C) More-constrained bases are more enriched for trait-associated variants (63 GWASs). (D) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). (E) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability.
  3.
    Chloroviruses are large, plaque-forming, dsDNA viruses that infect chlorella-like green algae that live in a symbiotic relationship with protists. Chloroviruses have genomes from 290 to 370 kb, and they encode as many as 400 proteins. One interesting feature of chloroviruses is that they encode a potassium ion (K+) channel protein named Kcv. The Kcv protein encoded by SAG chlorovirus ATCV-1 is one of the smallest known functional K+ channel proteins, consisting of 82 amino acids. The KcvATCV-1 protein has similarities to the family of two transmembrane domain K+ channel proteins; it consists of two transmembrane α-helices with a pore region in the middle, making it an ideal model for studying K+ channels. To assess their genetic diversity, kcv genes were sequenced from 103 geographically distinct SAG chlorovirus isolates. Of the 103 kcv genes, there were 42 unique DNA sequences that translated into 26 new Kcv channels. The new predicted Kcv proteins differed from KcvATCV-1 by 1 to 55 amino acids. The most conserved region of the Kcv protein was the filter; the turret and the pore helix were fairly well conserved, and the outer and the inner transmembrane domains of the protein were the most variable. Two of the new predicted channels were shown to be functional K+ channels.
  4. Sara Osman Carolina Perdigoto (Ed.)
    Gene expression in all eukaryotes depends critically on the function of transcriptional activation domains of gene activator proteins. The conventional model for activation domain (AD) function is the direct physical recruitment of specific coactivators and transcriptional machinery components. However, ADs are short and astronomically variable sequences, with up to 10^24 possible interchangeable sequence variants for a single gene activator; each variant is intrinsically disordered in structure and interacts with its targets with low specificity and affinity. How these peptides recruit their targets is becoming increasingly difficult to explain, exposing a massive knowledge gap in molecular biology. Here, we show that the single required characteristic of ADs, consistent with their extreme variability, intrinsic structural disorder, and near-stochastic interaction mode, is an amphiphilic aromatic–acidic surfactant-like property. We propose that the AD surfactant, by triggering the local gene-promoter chromatin phase transition, catalyzes the formation of “transcription factory” condensates. We demonstrate that the presence of tryptophan and aspartic acid residues in the AD sequence is sufficient for in vivo functionality, even when present only as a single pair of residues within a 20-amino-acid sequence containing nothing more than 18 additional glycine residues. We demonstrate that the amphipathic α-helix structure, suggested previously as beneficial for AD function, is actually detrimental, and breaking this helix by inserting prolines significantly increases activation domain functionality. The proposed surfactant action mechanism based on near-stochastic interactions implied by the minimalistic activation domains changes not only the paradigm for the explanation of gene activation but also the fundamental biochemistry paradigm based on the specificity of sequence-to-structure-to-functional-interaction. The mechanism of activity regulation by near-stochastic allosteric interactions could easily be applied to other biological processes.
  5. Vertically transmitted (VT) microbial symbionts play a vital role in the evolution of their insect hosts. A longstanding question in symbiont research is which genes help promote long-term stability of vertically transmitted lifestyles. Symbiont success in insect hosts is due in part to expression of beneficial or manipulative phenotypes that favor symbiont persistence in host populations. In Spiroplasma, these phenotypes have been linked to toxin and virulence domains among a few related strains. However, these domains also appear frequently in phylogenetically distant Spiroplasma, and little is known about their distribution across the Spiroplasma genus. In this study, we present the complete genome sequence of the Spiroplasma symbiont of Drosophila atripex, a non-manipulating member of the Ixodetis clade of Spiroplasma, for which genomic data are still limited. We perform a genus-wide comparative analysis of toxin domains implicated in defensive and reproductive phenotypes. From 12 VT and 31 non-VT Spiroplasma genomes, ribosome-inactivating proteins (RIPs), OTU-like cysteine proteases (OTUs), ankyrins, and ETX/MTX2 domains show high propensity for VT Spiroplasma compared to non-VT Spiroplasma. Specifically, OTU and ankyrin domains can be found only in VT Spiroplasma, and RIP domains are found in all VT Spiroplasma and three non-VT Spiroplasma. These domains are frequently associated with Spiroplasma plasmids, suggesting a possible mechanism for dispersal and maintenance among heritable strains. Searching insect genome assemblies available on public databases uncovered uncharacterized Spiroplasma genomes from which we identified several spaid-like genes encoding RIP, OTU, and ankyrin domains, suggesting functional interactions among those domain types. Our results suggest that a conserved core of symbiont domains plays an important role in the evolution and persistence of VT Spiroplasma in insects.