Title: Molecular Evolution across Mouse Spermatogenesis
Abstract Genes involved in spermatogenesis tend to evolve rapidly, but we lack a clear understanding of how protein sequences and patterns of gene expression evolve across this complex developmental process. We used fluorescence-activated cell sorting (FACS) to generate expression data for early (meiotic) and late (postmeiotic) cell types across 13 inbred strains of mice (Mus) spanning ∼7 My of evolution. We used these comparative developmental data to investigate the evolution of lineage-specific expression, protein-coding sequences, and expression levels. We found increased lineage specificity and more rapid protein-coding and expression divergence during late spermatogenesis, suggesting that signatures of rapid testis molecular evolution are punctuated across sperm development. Despite strong overall developmental parallels in these components of molecular evolution, protein and expression divergences were only weakly correlated across genes. We detected more rapid protein evolution on the X chromosome relative to the autosomes, whereas X-linked gene expression tended to be relatively more conserved, likely reflecting chromosome-specific regulatory constraints. Using allele-specific FACS expression data from crosses between four strains, we found that the relative contributions of different regulatory mechanisms also differed between cell types. Genes showing cis-regulatory changes were more common late in spermatogenesis, and tended to be associated with larger differences in expression levels and greater expression divergence between species. In contrast, genes with trans-acting changes were more common early and tended to be more conserved across species. Our findings advance understanding of gene evolution across spermatogenesis and underscore the fundamental importance of developmental context in molecular evolutionary studies.
Award ID(s):
2012041
NSF-PAR ID:
10327203
Editor(s):
Wittkopp, Patricia
Date Published:
Journal Name:
Molecular Biology and Evolution
Volume:
39
Issue:
2
ISSN:
0737-4038
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. John Davey ; Lisa Nagy ; Elizabeth Jockusch ; Julia Bowsher (Ed.)
    Clade-specific (a.k.a. lineage-specific) genes are very common and found at all taxonomic levels and in all clades examined. They can arise by duplication of previously existing genes, which can involve partial truncations or combinations with other protein domains or regulatory sequences. They can also evolve de novo from non-coding sequences, leading to potentially truly novel protein domains. Finally, since clade-specific genes are generally defined by lack of sequence homology with other proteins, they can also arise by sequence evolution that is rapid enough that previous sequence homology can no longer be detected. In such cases, where the rapid evolution is followed by constraint, we consider them to be ontologically non-novel but likely novel at a functional level. In general, clade-specific genes have received less attention from biologists but there are increasing numbers of fascinating examples of their roles in important traits. Here we review some selected recent examples, and argue that attention to clade-specific genes is an important corrective to the focus on the conserved developmental regulatory toolkit that has been the habit of evo-devo as a field. Finally, we discuss questions that arise about the evolution of clade-specific genes, and how these might be addressed by future studies. We highlight the hypothesis that clade-specific genes are more likely to be involved in synapomorphies that arose in the stem group where they appeared, compared to other genes.
  2. Malik, Harmit S. (Ed.)
    Comparative genomics has enabled the identification of genes that potentially evolved de novo from non-coding sequences. Many such genes are expressed in male reproductive tissues, but their functions remain poorly understood. To address this, we conducted a functional genetic screen of over 40 putative de novo genes with testis-enriched expression in Drosophila melanogaster and identified one gene, atlas, required for male fertility. Detailed genetic and cytological analyses showed that atlas is required for proper chromatin condensation during the final stages of spermatogenesis. Atlas protein is expressed in spermatid nuclei and facilitates the transition from histone- to protamine-based chromatin packaging. Complementary evolutionary analyses revealed the complex evolutionary history of atlas. The protein-coding portion of the gene likely arose at the base of the Drosophila genus on the X chromosome but was unlikely to be essential, as it was then lost in several independent lineages. Within the last ~15 million years, however, the gene moved to an autosome, where it fused with a conserved non-coding RNA and evolved a non-redundant role in male fertility. Altogether, this study provides insight into the integration of novel genes into biological processes, the links between genomic innovation and functional evolution, and the genetic control of a fundamental developmental process, gametogenesis.
  3. INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons. RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types. RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. 
More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated. CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. 
TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution.
[Figure caption] Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin, and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]
  4. Abstract

    Incompatibilities on the sex chromosomes are important in the evolution of hybrid male sterility, but the evolutionary forces underlying this phenomenon are unclear. House mouse (Mus musculus) lineages have provided powerful models for understanding the genetic basis of hybrid male sterility. X chromosome–autosome interactions cause strong incompatibilities in M. musculus F1 hybrids, but variation in sterility phenotypes suggests a more complex genetic basis. In addition, XY chromosome conflict has resulted in rapid expansions of ampliconic genes with dosage-dependent expression that is essential to spermatogenesis. Here, we evaluated the contribution of XY lineage mismatch to male fertility and stage-specific gene expression in hybrid mice. We performed backcrosses between two house mouse subspecies to generate reciprocal Y-introgression strains and used these strains to test the effects of XY mismatch in hybrids. Our transcriptome analyses of sorted spermatid cells revealed widespread overexpression of the X chromosome in sterile F1 hybrids independent of Y chromosome subspecies origin. Thus, postmeiotic overexpression of the X chromosome in sterile F1 mouse hybrids is likely a downstream consequence of disrupted meiotic X-inactivation rather than XY gene copy number imbalance. Y chromosome introgression did result in subfertility phenotypes and disrupted expression of several autosomal genes in mice with an otherwise nonhybrid genomic background, suggesting that Y-linked incompatibilities contribute to reproductive barriers, but likely not as a direct consequence of XY conflict. Collectively, these findings suggest that rapid sex chromosome gene family evolution driven by genomic conflict has not resulted in strong male reproductive barriers between these subspecies of house mice.

     
  5. Understanding how regulatory mechanisms evolve is critical for understanding the processes that give rise to novel phenotypes. Snake venom systems represent a valuable and tractable model for testing hypotheses related to the evolution of novel regulatory networks, yet the regulatory mechanisms underlying venom production remain poorly understood. Here, we use functional genomics approaches to investigate venom regulatory architecture in the prairie rattlesnake and identify cis-regulatory sequences (enhancers and promoters), trans-regulatory transcription factors, and integrated signaling cascades involved in the regulation of snake venom genes. We find evidence that two conserved vertebrate pathways, the extracellular signal-regulated kinase and unfolded protein response pathways, were co-opted to regulate snake venom. In one large venom gene family (snake venom serine proteases), this co-option was likely facilitated by the activity of transposable elements. Patterns of snake venom gene enhancer conservation, in some cases spanning 50 million yr of lineage divergence, highlight early origins and subsequent lineage-specific adaptations that have accompanied the evolution of venom regulatory architecture. We also identify features of chromatin structure involved in venom regulation, including topologically associated domains and CTCF loops that underscore the potential importance of novel chromatin structure to coevolve when duplicated genes evolve new regulatory control. Our findings provide a model for understanding how novel regulatory systems may evolve through a combination of genomic processes, including tandem duplication of genes and regulatory sequences, cis-regulatory sequence seeding by transposable elements, and diverse transcriptional regulatory proteins controlled by a co-opted regulatory cascade.