

Title: Insights into mammalian TE diversity through the curation of 248 genome assemblies

We examined transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. We found that although mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation. This includes multiple recent expansion and quiescence events across the mammalian tree. Young TEs, particularly long interspersed elements, drive increases in genome size, whereas DNA transposons are associated with smaller genomes. Mammals tend to accumulate only a few types of TEs at any given time, with one TE type dominating. We also found an association between dietary habit and the presence of DNA transposon invasions. These detailed annotations will serve as a benchmark for future comparative TE analyses among placental mammals.

 
Award ID(s):
2032063
NSF-PAR ID:
10487964
Publisher / Repository:
AAAS
Date Published:
Journal Name:
Science
Volume:
380
Issue:
6643
ISSN:
0036-8075
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Abstract Transposable elements (TEs) pervade most eukaryotic genomes. The repetitive nature of TEs complicates the analysis of their expression. Evaluation of the expression of both TE families (using unique and multi-mapping reads) and specific elements (using uniquely mapping reads) in leaf tissue of three maize (Zea mays) inbred lines subjected to heat or cold stress reveals no evidence for genome-wide activation of TEs; however, some specific TE families generate transcripts only in stress conditions. There is substantial variation in which TE families exhibit stress-responsive expression in the different genotypes. In order to understand the factors that drive expression of TEs, we focused on a subset of families in which we could monitor expression of individual elements. The stress-responsive activation of a TE family can often be attributed to a small number of elements in the family that contain regions lacking DNA methylation. Comparisons of the expression of TEs in different genotypes revealed both genetic and epigenetic variation. Many of the specific TEs that are activated in stress in one inbred are not present in the other inbred, explaining the lack of activation. Among the elements that are shared in both genomes but only expressed in one genotype, we found that many exhibit differences in DNA methylation such that the genotype without expression is fully methylated. This study provides insights into the regulation of expression of TEs in normal and stress conditions and highlights the role of chromatin variation between elements in a family or between genotypes for contributing to expression variation. The highly repetitive nature of many TEs complicates the analysis of their expression. Although most TEs are not expressed, some exhibit expression in certain tissues or conditions.
    We monitored the expression of both TE families (using unique and multi-mapping reads) and specific elements (using uniquely mapping reads) in leaf tissue of three maize (Zea mays) inbred lines subjected to heat or cold stress. While genome-wide activation of TEs did not occur, some TE families generated transcripts only in stress conditions, with variation by genotype. To better understand the factors that drive expression of TEs, we focused on a subset of families in which we could monitor expression of individual elements. In most cases, stress-responsive activation of a TE family was attributed to a small number of elements in the family. The elements that contained small regions lacking DNA methylation showed enriched expression, while fully methylated elements were rarely expressed in control or stress conditions. Varied expression in the different genotypes was due to both genetic and epigenetic variation. Many specific TEs activated by stress in one inbred were not present in the other inbred. Among the elements shared in both genomes, full methylation inhibited expression in one of the genotypes. This study provides insights into the regulation of TE expression in normal and stress conditions and highlights the role of chromatin variation between elements in a family or between genotypes for contributing to expression.
  2. Co-option of transposable elements (TEs) to become part of existing or new enhancers is an important mechanism for evolution of gene regulation. However, contributions of lineage-specific TE insertions to recent regulatory adaptations remain poorly understood. Gibbons present a suitable model to study these contributions as they have evolved a lineage-specific TE called LAVA (LINE-AluSz-VNTR-AluLIKE), which is still active in the gibbon genome. The LAVA retrotransposon is thought to have played a role in the emergence of the highly rearranged structure of the gibbon genome by disrupting transcription of cell cycle genes. In this study, we investigated whether LAVA may have also contributed to the evolution of gene regulation by adopting enhancer function. We characterized fixed and polymorphic LAVA insertions across multiple gibbons and found 96 LAVA elements overlapping enhancer chromatin states. Moreover, LAVA was enriched in multiple transcription factor binding motifs, was bound by an important transcription factor (PU.1), and was associated with higher levels of gene expression in cis. We found gibbon-specific signatures of purifying/positive selection at 27 LAVA insertions. Two of these insertions were fixed in the gibbon lineage and overlapped with enhancer chromatin states, representing putative co-opted LAVA enhancers. These putative enhancers were located within genes encoding SETD2 and RAD9A, two proteins that facilitate accurate repair of DNA double-strand breaks and prevent chromosomal rearrangement mutations. Co-option of LAVA in these genes may have influenced regulation of processes that preserve genome integrity. Our findings highlight the importance of considering lineage-specific TEs in studying evolution of gene regulatory elements.

     
  3. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. 
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  4. SUMMARY

    The DOMAINS REARRANGED METHYLTRANSFERASEs (DRMs) are crucial for RNA‐directed DNA methylation (RdDM) in plant species. Setaria viridis is a model monocot species with a relatively compact genome that has limited transposable element (TE) content. CRISPR‐based genome editing approaches were used to create loss‐of‐function alleles for the two putative functional DRM genes in S. viridis to probe the role of RdDM. Double mutant (drm1ab) plants exhibit some morphological abnormalities but are fully viable. Whole‐genome methylation profiling provided evidence for the widespread loss of methylation in CHH sequence contexts, particularly in regions with high CHH methylation in wild‐type plants. Evidence was also found for the locus‐specific loss of CG and CHG methylation, even in some regions that lack CHH methylation. Transcriptome profiling identified genes with altered expression in the drm1ab mutants. However, the majority of genes with high levels of CHH methylation directly surrounding the transcription start site or in nearby promoter regions in wild‐type plants do not have altered expression in the drm1ab mutant, even when this methylation is lost, suggesting limited regulation of gene expression by RdDM. Detailed analysis of the expression of TEs identified several transposons that are transcriptionally activated in drm1ab mutants. These transposons are likely to require active RdDM for the maintenance of transcriptional repression.

     
  5. Arkhipova, Irina (Ed.)
    Abstract Genome size has been measurable since the 1940s but we still do not understand genome size variation. Caenorhabditis nematodes show strong conservation of chromosome number but vary in genome size between closely related species. Androdioecy, where populations are composed of males and self-fertile hermaphrodites, evolved from outcrossing, female-male dioecy, three times in this group. In Caenorhabditis, androdioecious genomes are 10–30% smaller than dioecious species, but in the nematode Pristionchus, androdioecy evolved six times and does not correlate with genome size. Previous hypotheses include genome size evolution through: 1) Deletions and “genome shrinkage” in androdioecious species; 2) Transposable element (TE) expansion and DNA loss through large deletions (the “accordion model”); and 3) Differing TE dynamics in androdioecious and dioecious species. We analyzed nematode genomes and found no evidence for these hypotheses. Instead, nematode genome sizes had strong phylogenetic inertia with increases in a few dioecious species, contradicting the “genome shrinkage” hypothesis. TEs did not explain genome size variation with the exception of the DNA transposon Mutator which was twice as abundant in dioecious genomes. Across short and long evolutionary distances Caenorhabditis genomes evolved through small structural mutations including gene-associated duplications and insertions. Seventy-one protein families had significant, parallel decreases across androdioecious Caenorhabditis including genes involved in the sensory system, regulatory proteins and membrane-associated immune responses. Our results suggest that within a dynamic landscape of frequent small rearrangements in Caenorhabditis, reproductive mode mediates genome evolution by altering the precise fates of individual genes, proteins, and the phenotypes they underlie. 