- NSF-PAR ID:
- 10335387
- Author(s) / Creator(s):
- ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more »
- Date Published:
- Journal Name:
- Science
- Volume:
- 376
- Issue:
- 6588
- ISSN:
- 0036-8075
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat– Alu ; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.more » « less
-
Centromeres attach chromosomes to spindle microtubules during cell division and, despite this conserved role, show paradoxically rapid evolution and are typified by complex repeats. We used long-read sequencing to generate the Col-CEN Arabidopsis thaliana genome assembly that resolves all five centromeres. The centromeres consist of megabase-scale tandemly repeated satellite arrays, which support CENTROMERE SPECIFIC HISTONE H3 (CENH3) occupancy and are densely DNA methylated, with satellite variants private to each chromosome. CENH3 preferentially occupies satellites that show the least amount of divergence and occur in higher-order repeats. The centromeres are invaded by ATHILA retrotransposons, which disrupt genetic and epigenetic organization. Centromeric crossover recombination is suppressed, yet low levels of meiotic DNA double-strand breaks occur that are regulated by DNA methylation. We propose that Arabidopsis centromeres are evolving through cycles of satellite homogenization and retrotransposon-driven diversification.more » « less
-
Abstract The spatiotemporal organization of DNA replication produces a highly robust and reproducible replication timing profile. Sequencing-based methods for assaying replication timing genome-wide have become commonplace, but regions of high repeat content in the human genome have remained refractory to analysis. Here, we report the first nearly-gapless telomere-to-telomere replication timing profiles in human, using the T2T-CHM13 genome assembly and sequencing data for five cell lines. We find that replication timing can be successfully assayed in centromeres and large blocks of heterochromatin. Centromeric regions replicate in mid-to-late S-phase and contain replication-timing peaks at a similar density to other genomic regions, while distinct families of heterochromatic satellite DNA differ in their bias for replicating in late S-phase. The high degree of consistency in centromeric replication timing across chromosomes within each cell line prompts further investigation into the mechanisms dictating that some cell lines replicate their centromeres earlier than others, and what the consequences of this variation are.
-
null (Ed.)Abstract Centromeres are epigenetically determined nuclear domains strictly required for chromosome segregation and genome stability. However, the mechanisms regulating centromere and kinetochore chromatin modifications are not known. Here, we demonstrate that LSH is enriched at meiotic kinetochores and its targeted deletion induces centromere instability and abnormal chromosome segregation. Superresolution chromatin analysis resolves LSH at the inner centromere and kinetochores during oocyte meiosis. LSH knockout pachytene oocytes exhibit reduced HDAC2 and DNMT-1. Notably, mutant oocytes show a striking increase in histone H3 phosphorylation at threonine 3 (H3T3ph) and accumulation of major satellite transcripts in both prophase-I and metaphase-I chromosomes. Moreover, knockout oocytes exhibit centromere fusions, ectopic kinetochore formation and abnormal exchange of chromatin fibers between paired bivalents and asynapsed chromosomes. Our results indicate that loss of LSH affects the levels and chromosomal localization of H3T3ph and provide evidence that, by maintaining transcriptionally repressive heterochromatin, LSH may be essential to prevent deleterious meiotic recombination events at repetitive centromeric sequences.more » « less
-
Abstract The sequence and assembly of human genomes using long‐read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high‐fidelity (HiFi) or continuous long‐read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5‐fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.