skip to main content

Title: Rapid macrosatellite evolution promotes X-Linked hybrid male sterility in a feline interspecies cross
Abstract The sterility or inviability of hybrid offspring produced from an interspecific mating results from incompatibilities between parental genotypes that are thought to result from divergence of loci involved in epistatic interactions. However, attributes contributing to the rapid evolution of these regions also complicates their assembly, thus discovery of candidate hybrid sterility loci is difficult and has been restricted to a small number of model systems. Here we reported rapid interspecific divergence at the DXZ4 macrosatellite locus in an interspecific cross between two closely related mammalian species: the domestic cat (Felis silvestris catus) and the Jungle cat (Felis chaus). DXZ4 is an interesting candidate due to its structural complexity, copy number variability, and described role in the critical yet complex biological process of X-chromosome inactivation. However, the full structure of DXZ4 was absent or incomplete in nearly every available mammalian genome assembly given its repetitive complexity. We compared highly continuous genomes for three cat species, each containing a complete DXZ4 locus, and discovered that the felid DXZ4 locus differs substantially from the human ortholog, and that it varies in copy number between cat species. Additionally, we reported expression, methylation, and structural conformation profiles of DXZ4 and the X chromosome during stages of spermatogenesis that have been previously associated with hybrid male sterility. Collectively, these findings suggest a new role for DXZ4 in male meiosis and a proposed mechanism of feline interspecific incompatibility through rapid satellite divergence.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ; ;
Gojobori, Jun
Date Published:
Journal Name:
Molecular Biology and Evolution
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat– Alu ; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx. 
    more » « less
  2. Shapiro, Beth (Ed.)
    Abstract In addition to including one of the most popular companion animals, species from the cat family Felidae serve as a powerful system for genetic analysis of inherited and infectious disease, as well as for the study of phenotypic evolution and speciation. Previous diploid-based genome assemblies for the domestic cat have served as the primary reference for genomic studies within the cat family. However, these versions suffered from poor resolution of complex and highly repetitive regions, with substantial amounts of unplaced sequence that is polymorphic or copy number variable. We sequenced the genome of a female F1 Bengal hybrid cat, the offspring of a domestic cat (Felis catus) x Asian leopard cat (Prionailurus bengalensis) cross, with PacBio long sequence reads and used Illumina sequence reads from the parents to phase >99.9% of the reads into the 2 species’ haplotypes. De novo assembly of the phased reads produced highly continuous haploid genome assemblies for the domestic cat and Asian leopard cat, with contig N50 statistics exceeding 83 Mb for both genomes. Whole-genome alignments reveal the Felis and Prionailurus genomes are colinear, and the cytogenetic differences between the homologous F1 and E4 chromosomes represent a case of centromere repositioning in the absence of a chromosomal inversion. Both assemblies offer significant improvements over the previous domestic cat reference genome, with a 100% increase in contiguity and the capture of the vast majority of chromosome arms in 1 or 2 large contigs. We further demonstrated that comparably accurate F1 haplotype phasing can be achieved with members of the same species when one or both parents of the trio are not available. These novel genome resources will empower studies of feline precision medicine, adaptation, and speciation. 
    more » « less
  3. INTRODUCTION Resolving the role that different environmental forces may have played in the apparent explosive diversification of modern placental mammals is crucial to understanding the evolutionary context of their living and extinct morphological and genomic diversity. RATIONALE Limited access to whole-genome sequence alignments that sample living mammalian biodiversity has hampered phylogenomic inference, which until now has been limited to relatively small, highly constrained sequence matrices often representing <2% of a typical mammalian genome. To eliminate this sampling bias, we used an alignment of 241 whole genomes to comprehensively identify and rigorously analyze noncoding, neutrally evolving sequence variation in coalescent and concatenation-based phylogenetic frameworks. These analyses were followed by validation with multiple classes of phylogenetically informative structural variation. This approach enabled the generation of a robust time tree for placental mammals that evaluated age variation across hundreds of genomic loci that are not restricted by protein coding annotations. RESULTS Coalescent and concatenation phylogenies inferred from multiple treatments of the data were highly congruent, including support for higher-level taxonomic groupings that unite primates+colugos with treeshrews (Euarchonta), bats+cetartiodactyls+perissodactyls+carnivorans+pangolins (Scrotifera), all scrotiferans excluding bats (Fereuungulata), and carnivorans+pangolins with perissodactyls (Zooamata). However, because these approaches infer a single best tree, they mask signatures of phylogenetic conflict that result from incomplete lineage sorting and historical hybridization. Accordingly, we also inferred phylogenies from thousands of noncoding loci distributed across chromosomes with historically contrasting recombination rates. Throughout the radiation of modern orders (such as rodents, primates, bats, and carnivores), we observed notable differences between locus trees inferred from the autosomes and the X chromosome, a pattern typical of speciation with gene flow. We show that in many cases, previously controversial phylogenetic relationships can be reconciled by examining the distribution of conflicting phylogenetic signals along chromosomes with variable historical recombination rates. Lineage divergence time estimates were notably uniform across genomic loci and robust to extensive sensitivity analyses in which the underlying data, fossil constraints, and clock models were varied. The earliest branching events in the placental phylogeny coincide with the breakup of continental landmasses and rising sea levels in the Late Cretaceous. This signature of allopatric speciation is congruent with the low genomic conflict inferred for most superordinal relationships. By contrast, we observed a second pulse of diversification immediately after the Cretaceous-Paleogene (K-Pg) extinction event superimposed on an episode of rapid land emergence. Greater geographic continuity coupled with tumultuous climatic changes and increased ecological landscape at this time provided enhanced opportunities for mammalian diversification, as depicted in the fossil record. These observations dovetail with increased phylogenetic conflict observed within clades that diversified in the Cenozoic. CONCLUSION Our genome-wide analysis of multiple classes of sequence variation provides the most comprehensive assessment of placental mammal phylogeny, resolves controversial relationships, and clarifies the timing of mammalian diversification. We propose that the combination of Cretaceous continental fragmentation and lineage isolation, followed by the direct and indirect effects of the K-Pg extinction at a time of rapid land emergence, synergistically contributed to the accelerated diversification rate of placental mammals during the early Cenozoic. The timing of placental mammal evolution. Superordinal mammalian diversification took place in the Cretaceous during periods of continental fragmentation and sea level rise with little phylogenomic discordance (pie charts: left, autosomes; right, X chromosome), which is consistent with allopatric speciation. By contrast, the Paleogene hosted intraordinal diversification in the aftermath of the K-Pg mass extinction event, when clades exhibited higher phylogenomic discordance consistent with speciation with gene flow and incomplete lineage sorting. 
    more » « less
  4. Abstract

    Incompatibilities on the sex chromosomes are important in the evolution of hybrid male sterility, but the evolutionary forces underlying this phenomenon are unclear. House mice (Mus musculus) lineages have provided powerful models for understanding the genetic basis of hybrid male sterility. X chromosome–autosome interactions cause strong incompatibilities in M. musculus F1 hybrids, but variation in sterility phenotypes suggests a more complex genetic basis. In addition, XY chromosome conflict has resulted in rapid expansions of ampliconic genes with dosage-dependent expression that is essential to spermatogenesis. Here, we evaluated the contribution of XY lineage mismatch to male fertility and stage-specific gene expression in hybrid mice. We performed backcrosses between two house mouse subspecies to generate reciprocal Y-introgression strains and used these strains to test the effects of XY mismatch in hybrids. Our transcriptome analyses of sorted spermatid cells revealed widespread overexpression of the X chromosome in sterile F1 hybrids independent of Y chromosome subspecies origin. Thus, postmeiotic overexpression of the X chromosome in sterile F1 mouse hybrids is likely a downstream consequence of disrupted meiotic X-inactivation rather than XY gene copy number imbalance. Y chromosome introgression did result in subfertility phenotypes and disrupted expression of several autosomal genes in mice with an otherwise nonhybrid genomic background, suggesting that Y-linked incompatibilities contribute to reproductive barriers, but likely not as a direct consequence of XY conflict. Collectively, these findings suggest that rapid sex chromosome gene family evolution driven by genomic conflict has not resulted in strong male reproductive barriers between these subspecies of house mice.

    more » « less
  5. Katz, Laura A (Ed.)
    Abstract Sterility among hybrids is one of the most prevalent forms of reproductive isolation delineating species boundaries and is expressed disproportionately in heterogametic XY males. While hybrid male sterility (HMS) due to the “large X effect” is a well-recognized mechanism of reproductive isolation, it is less clear how HMS manifests in species that lack heteromorphic sex chromosomes. We evaluated differences in allele frequencies at approximately 460,000 SNPs between fertile and sterile F2 interpopulation male hybrids to characterize the genomic architecture of HMS in a species without sex chromosomes (Tigriopus californicus). We tested associations between HMS and mitochondrial-nuclear and/or nuclear-nuclear signatures of incompatibility. Genomic regions associated with HMS were concentrated on a single chromosome with the same primary 2-Mbp regions identified in one pair of reciprocal crosses. Gene Ontology analysis revealed that annotations associated with spermatogenesis were the most overrepresented within the implicated region, with nine protein-coding genes connected with this process found in the quantitative trait locus of chromosome 2. Our results indicate that a narrow genomic region was associated with the sterility of male hybrids in T. californicus and suggest that incompatibilities among select nuclear loci may replace the large X effect when sex chromosomes are absent. 
    more » « less