Title: Evolution of a Cytoplasmic Determinant: Evidence for the Biochemical Basis of Functional Evolution of the Novel Germ Line Regulator Oskar
Abstract: Germ line specification is essential in sexually reproducing organisms. Despite their critical role, the evolutionary history of the genes that specify animal germ cells is heterogeneous and dynamic. In many insects, the gene oskar is required for the specification of the germ line. However, the germ line role of oskar is thought to be a derived role resulting from co-option from an ancestral somatic role. To address how evolutionary changes in protein sequence could have led to changes in the function of Oskar protein that enabled it to regulate germ line specification, we searched for oskar orthologs in 1,565 publicly available insect genomic and transcriptomic data sets. The earliest-diverging lineage in which we identified an oskar ortholog was the order Zygentoma (silverfish and firebrats), suggesting that oskar originated before the origin of winged insects. We noted some order-specific trends in oskar sequence evolution, including whole gene duplications, clade-specific losses, and rapid divergence. An alignment of all 379 known Oskar sequences revealed new highly conserved residues as candidates that promote dimerization of the LOTUS domain. Moreover, we identified regions of the OSK domain with conserved predicted RNA binding potential. Furthermore, we show that despite a low overall amino acid conservation, the LOTUS domain shows higher conservation of predicted secondary structure than the OSK domain. Finally, we suggest new key amino acids in the LOTUS domain that may be involved in the previously reported Oskar–Vasa physical interaction that is required for its germ line role.
Award ID(s):
1764269
NSF-PAR ID:
10328919
Editor(s):
Malik, Harmit
Journal Name:
Molecular Biology and Evolution
Volume:
38
Issue:
12
ISSN:
1537-1719
Page Range / eLocation ID:
5491 to 5513
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons.
    RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types.
    RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated.
    CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution.
    Figure caption: Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]
  2. Abstract

    Alphaherpesviruses are a subfamily of herpesviruses that include the significant human pathogens herpes simplex viruses (HSV) and varicella zoster virus (VZV). Glycoprotein K (gK), conserved in all alphaherpesviruses, is a multi-membrane spanning virion glycoprotein essential for virus entry into neuronal axons, virion assembly, and pathogenesis. Despite these critical functions, little is known about which gK domains and residues are most important for maintaining these functions across all alphaherpesviruses. Herein, we employed phylogenetic and structural analyses including the use of a novel model for evolutionary rate variation across residues to predict conserved gK functional domains. We found marked heterogeneity in the evolutionary rate at the level of both individual residues and domains, presumably as a result of varying selective constraints. To clarify the potential role of conserved sequence features, we predicted the structures of several gK orthologs. Congruent with our phylogenetic analysis, slowly evolving residues were identified at potentially structurally significant positions across domains. We found that using a quantitative measure of amino acid rate variation combined with molecular modeling we were able to identify amino acids predicted to be critical for gK protein structure/function. This analysis yields targets for the design of anti-herpesvirus therapeutic strategies across all alphaherpesvirus species that would be absent from more traditional analyses of conservation.

     
  3. Ozkan, Banu (Ed.)
    Abstract Invariant sites are a common feature of amino acid sequence evolution. The presence of invariant sites is frequently attributed to the need to preserve function through site-specific conservation of amino acid residues. Amino acid substitution models without a provision for invariant sites often fit the data significantly worse than those that allow for an excess of invariant sites beyond those predicted by models that only incorporate rate variation among sites (e.g., a Gamma distribution). An alternative is epistasis between sites to preserve residue interactions that can create invariant sites. Through computer-simulated sequence evolution, we evaluated the relative effects of site-specific preferences and site-site couplings in the generation of invariant sites and the modulation of the rate of molecular evolution. In an analysis of ten major families of protein domains with diverse sequence and functional properties, we find that the negative selection imposed by epistasis creates many more invariant sites than site-specific residue preferences alone. Further, epistasis plays an increasingly larger role in creating invariant sites over longer evolutionary periods. Epistasis also dictates rates of domain evolution over time by exerting significant additional purifying selection to preserve site couplings. These patterns illuminate the mechanistic role of epistasis in the processes underlying observed site invariance and evolutionary rates. 
  4. INTRODUCTION During the independent process of cereal evolution, many trait shifts appear to have been under convergent selection to meet the specific needs of humans. Identification of convergently selected genes across cereals could help to clarify the evolution of crop species and to accelerate breeding programs. In the past several decades, researchers have debated whether convergent phenotypic selection in distinct lineages is driven by conserved molecular changes or by diverse molecular pathways. Two of the most economically important crops, maize and rice, display some conserved phenotypic shifts, including loss of seed dispersal, decreased seed dormancy, and increased grain number during evolution, even though they experienced independent selection. Hence, maize and rice can serve as an excellent system for understanding the extent of convergent selection among cereals.
    RATIONALE Despite the identification of a few convergently selected genes, our understanding of the extent of molecular convergence on a genome-wide scale between maize and rice is very limited. To learn how often selection acts on orthologous genes, we investigated the functions and molecular evolution of the grain yield quantitative trait locus KRN2 in maize and its rice ortholog OsKRN2. We also identified convergently selected genes on a genome-wide scale in maize and rice, using two large datasets.
    RESULTS We identified a selected gene, KRN2 (kernel row number2), that differs between domesticated maize and its wild ancestor, teosinte. This gene underlies a major quantitative trait locus for kernel row number in maize. Selection in the noncoding upstream regions resulted in a reduction of KRN2 expression and an increased grain number through an increase in kernel rows. The rice ortholog, OsKRN2, also underwent selection and negatively regulates grain number via control of secondary panicle branches. These orthologs encode WD40 proteins and function synergistically with a gene of unknown function, DUF1644, which suggests that a conserved protein interaction controls grain number in maize and rice. Field tests show that knockout of KRN2 in maize or OsKRN2 in rice increased grain yield by ~10% and ~8%, respectively, with no apparent trade-off in other agronomic traits. This suggests potential applications of KRN2 and its orthologs for crop improvement. On a genome-wide scale, we identified a set of 490 orthologous genes that underwent convergent selection during maize and rice evolution, including KRN2/OsKRN2. We found that the convergently selected orthologous genes appear to be significantly enriched in two specific pathways in both maize and rice: starch and sucrose metabolism, and biosynthesis of cofactors. A deep analysis of convergently selected genes in the starch metabolic pathway indicates that the degree of genetic convergence via convergent selection is related to the conservation and complexity of the gene network for a given selection.
    CONCLUSION Our findings show that common phenotypic shifts during maize and rice evolution acting on conserved genes are driven at least in part by convergent selection, which in maize and rice likely occurred both during and after domestication. We provide evolutionary and functional evidence on the convergent selection of KRN2/OsKRN2 for grain number between maize and rice. We further found that a complete loss-of-function allele of KRN2/OsKRN2 increased grain yield without an apparent negative impact on other agronomic traits. Exploring the role of KRN2/OsKRN2 and other convergently selected genes across the cereals could provide new opportunities to enhance the production of other global crops.
    Figure caption: Shared selected orthologous genes in maize and rice for convergent phenotypic shifts during domestication and improvement. By comparing 3163 selected genes in maize and 18,755 selected genes in rice, we identified 490 orthologous gene pairs, including KRN2 and its rice ortholog OsKRN2, as having been convergently selected. Knockout of KRN2 in maize or OsKRN2 in rice increased grain yield by increasing kernel rows and secondary panicle branches, respectively.
  5. Storz, Gisela (Ed.)
    ABSTRACT Mutations in regulatory mechanisms that control gene expression contribute to phenotypic diversity and thus facilitate the adaptation of microbes and other organisms to new niches. Comparative genomics can be used to infer rewiring of regulatory architecture based on large effect mutations like loss or acquisition of transcription factors but may be insufficient to identify small changes in noncoding, intergenic DNA sequence of regulatory elements that drive phenotypic divergence. In human-derived Vibrio cholerae, the response to distinct chemical cues triggers production of multiple transcription factors that can regulate the type VI secretion system (T6), a broadly distributed weapon for interbacterial competition. However, to date, the signaling network remains poorly understood because no regulatory element has been identified for the major T6 locus. Here we identify a conserved cis-acting single nucleotide polymorphism (SNP) controlling T6 transcription and activity. Sequence alignment of the T6 regulatory region from diverse V. cholerae strains revealed conservation of the SNP that we rewired to interconvert V. cholerae T6 activity between chitin-inducible and constitutive states. This study supports a model of pathogen evolution through a noncoding cis-regulatory mutation and preexisting, active transcription factors that confers a different fitness advantage to tightly regulated strains inside a human host and unfettered strains adapted to environmental niches.
    IMPORTANCE Organisms sense external cues with regulatory circuits that trigger the production of transcription factors, which bind specific DNA sequences at promoters (“cis” regulatory elements) to activate target genes. Mutations of transcription factors or their regulatory elements create phenotypic diversity, allowing exploitation of new niches. Waterborne pathogen Vibrio cholerae encodes the type VI secretion system “nanoweapon” to kill competitor cells when activated. Despite identification of several transcription factors, no regulatory element has been identified in the promoter of the major type VI locus, to date. Combining phenotypic, genetic, and genomic analysis of diverse V. cholerae strains, we discovered a single nucleotide polymorphism in the type VI promoter that switches its killing activity between a constitutive state beneficial outside hosts and an inducible state for constraint in a host. Our results support a role for noncoding DNA in adaptation of this pathogen.