Title: Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies
Award ID(s):
1732253 1350041
Date Published:
Journal Name:
Nature Methods
Page Range / eLocation ID:
687 to 695
Sponsoring Org:
National Science Foundation
More Like this
  1. The telomere protein assemblies in different fungal lineages manifest quite profound structural and functional divergence, implying a high degree of flexibility and adaptability. Previous comparative analyses of fungal telomeres have focused on the role of telomere sequence alterations in promoting the evolution of corresponding proteins, particularly in budding and fission yeast. However, emerging evidence suggests that even in fungi with the canonical 6-bp telomere repeat unit, there is significant remodeling of the telomere assembly. Indeed, a new protein family can be recruited to serve dedicated telomere functions, and then experience subsequent loss in sub-branches of the clade. An especially interesting example is the Tay1 family of proteins, which emerged in fungi prior to the divergence of basidiomycetes from ascomycetes. This relatively recent protein family appears to have acquired its telomere DNA-binding activity through the modification of another Myb-containing protein. Members of the Tay1 family evidently underwent rather dramatic functional diversification, serving, e.g., as transcription factors in fission yeast while acting to promote telomere maintenance in basidiomycetes and some hemi-ascomycetes. Remarkably, despite its distinct structural organization and evolutionary origin, a basidiomycete Tay1 appears to promote telomere replication using the same mechanism as mammalian TRF1, i.e., by recruiting and regulating Blm helicase activity. This apparent example of convergent evolution at the molecular level highlights the ability of telomere proteins to acquire new interaction targets. The remarkable evolutionary history of Tay1 illustrates the power of protein modularity and the facile acquisition of nucleic acid/protein-binding activity to promote telomere flexibility.
  2. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. 
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  3. Abstract With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights. 
  4. Abstract

    Duplex telomere binding proteins exhibit considerable structural and functional diversity in fungi. Herein we interrogate the activities and functions of two Myb-containing, duplex telomere repeat-binding factors in Ustilago maydis, a basidiomycete that is evolutionarily distant from the standard fungi. These two telomere-binding proteins, UmTay1 and UmTrf2, despite having distinct domain structures, exhibit comparable affinities and sequence specificity for the canonical telomere repeats. UmTay1 specializes in promoting telomere replication and an ALT-like pathway, most likely by modulating the helicase activity of Blm. UmTrf2, in contrast, is critical for telomere protection; transcriptional repression of Umtrf2 leads to severe growth defects and profound telomere aberrations. Comparative analysis of UmTay1 homologs in different phyla reveals broad functional diversity for this protein family and provides a case study for how DNA-binding proteins can acquire and lose functions at various chromosomal locations. Our findings also point to a stimulatory effect of telomere proteins on ALT in Ustilago maydis that may be conserved in other systems.
