

This content will become publicly available on September 28, 2024

Title: High levels of intra-strain structural variation in Drosophila simulans X pericentric heterochromatin
Abstract

Large genome structural variations can impact genome regulation and integrity. Repeat-rich regions like pericentric heterochromatin are vulnerable to structural rearrangements although we know little about how often these rearrangements occur over evolutionary time. Repetitive genome regions are particularly difficult to study with genomic approaches, as they are missing from most genome assemblies. However, cytogenetic approaches offer a direct way to detect large rearrangements involving pericentric heterochromatin. Here, we use a cytogenetic approach to reveal large structural rearrangements associated with the X pericentromeric region of Drosophila simulans. These rearrangements involve large blocks of satellite DNA—the 500-bp and Rsp-like satellites—which colocalize in the X pericentromeric heterochromatin. We find that this region is polymorphic not only among different strains, but between isolates of the same strain from different labs, and even within individual isolates. On the one hand, our observations raise questions regarding the potential impact of such variation at the phenotypic level and our ability to control for such genetic variability. On the other hand, this highlights the very rapid turnover of the pericentric heterochromatin most likely associated with genomic instability of the X pericentromere. It represents a unique opportunity to study the dynamics of pericentric heterochromatin, the evolution of associated satellites on a very short time scale, and to better understand how structural variation arises.

 
Award ID(s):
1844693
NSF-PAR ID:
10476787
Author(s) / Creator(s):
;
Editor(s):
Bateman, J
Publisher / Repository:
Oxford
Date Published:
Journal Name:
GENETICS
ISSN:
1943-2631
Subject(s) / Keyword(s):
["structural variation","pericentromeric heterochromatin","satellite repeats","Drosophila"]
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION To faithfully distribute genetic material to daughter cells during cell division, spindle fibers must couple to DNA by means of a structure called the kinetochore, which assembles at each chromosome’s centromere. Human centromeres are located within large arrays of tandemly repeated DNA sequences known as alpha satellite (αSat), which often span millions of base pairs on each chromosome. Arrays of αSat are frequently surrounded by other types of tandem satellite repeats, which have poorly understood functions, along with nonrepetitive sequences, including transcribed genes. Previous genome sequencing efforts have been unable to generate complete assemblies of satellite-rich regions because of their scale and repetitive nature, limiting the ability to study their organization, variation, and function.

    RATIONALE Pericentromeric and centromeric (peri/centromeric) satellite DNA sequences have remained almost entirely missing from the assembled human reference genome for the past 20 years. Using a complete, telomere-to-telomere (T2T) assembly of a human genome, we developed and deployed tailored computational approaches to reveal the organization and evolutionary patterns of these satellite arrays at both large and small length scales. We also performed experiments to map precisely which αSat repeats interact with kinetochore proteins. Last, we compared peri/centromeric regions among multiple individuals to understand how these sequences vary across diverse genetic backgrounds.

    RESULTS Satellite repeats constitute 6.2% of the T2T-CHM13 genome assembly, with αSat representing the single largest component (2.8% of the genome). By studying the sequence relationships of αSat repeats in detail across each centromere, we found genome-wide evidence that human centromeres evolve through “layered expansions.” Specifically, distinct repetitive variants arise within each centromeric region and expand through mechanisms that resemble successive tandem duplications, whereas older flanking sequences shrink and diverge over time. We also revealed that the most recently expanded repeats within each αSat array are more likely to interact with the inner kinetochore protein Centromere Protein A (CENP-A), which coincides with regions of reduced CpG methylation. This suggests a strong relationship between local satellite repeat expansion, kinetochore positioning, and DNA hypomethylation. Furthermore, we uncovered large and unexpected structural rearrangements that affect multiple satellite repeat types, including active centromeric αSat arrays. Last, by comparing sequence information from nearly 1600 individuals’ X chromosomes, we observed that individuals with recent African ancestry possess the greatest genetic diversity in the region surrounding the centromere, which sometimes contains a predominantly African αSat sequence variant.

    CONCLUSION The genetic and epigenetic properties of centromeres are closely interwoven through evolution. These findings raise important questions about the specific molecular mechanisms responsible for the relationship between inner kinetochore proteins, DNA hypomethylation, and layered αSat expansions. Even more questions remain about the function and evolution of non-αSat repeats. To begin answering these questions, we have produced a comprehensive encyclopedia of peri/centromeric sequences in a human genome, and we demonstrated how these regions can be studied with modern genomic tools. Our work also illuminates the rich genetic variation hidden within these formerly missing regions of the genome, which may contribute to health and disease. This unexplored variation underlines the need for more T2T human genome assemblies from genetically diverse individuals.

    Gapless assemblies illuminate centromere evolution. (Top) The organization of peri/centromeric satellite repeats. (Bottom left) A schematic portraying (i) evidence for centromere evolution through layered expansions and (ii) the localization of inner-kinetochore proteins in the youngest, most recently expanded repeats, which coincide with a region of DNA hypomethylation. (Bottom right) An illustration of the global distribution of chrX centromere haplotypes, showing increased diversity in populations with recent African ancestry.
  2. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments.

    RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution.

    RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events.

    CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans.

    Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  3.
    Pericentromeric heterochromatin in Drosophila generally consists of repetitive DNA, forming the environment associated with gene silencing. Despite the expanding knowledge of the impact of transposable elements (TEs) on the host genome, little is known about the evolution of pericentromeric heterochromatin, its structural composition, and age. During the evolution of the Drosophilidae, hundreds of genes have become embedded within pericentromeric regions yet retained activity. We investigated a pericentromeric heterochromatin fragment found in D. virilis and related species, describing the evolution of genes in this region and the age of TE invasion. Regardless of the heterochromatic environment, the amino acid composition of the genes is under purifying selection. However, the selective pressure affects parts of genes to varying degrees, resulting in expansion of gene introns due to TE invasion. According to the divergence of TEs, the pericentromeric heterochromatin of species of the virilis group began to form more than 20 million years ago by invasions of retroelements, miniature inverted repeat transposable elements (MITEs), and Helitrons. Importantly, invasions into the heterochromatin continue to occur by TEs that fall under the scope of piRNA silencing. Thus, the pericentromeric heterochromatin, in spite of its ability to induce silencing, has the means for being dynamic, incorporating the regions of active transcription.
  4. Abstract

    Background

    Carbapenem-resistant Enterobacterales (CRE) are highly concerning MDR pathogens. Horizontal transfer of broad-host-range IncN plasmids may contribute to the dissemination of the Klebsiella pneumoniae carbapenemase (KPC), spreading carbapenem resistance among unrelated bacteria. However, the population structure and genetic diversity of IncN plasmids have not been fully elucidated.

    Objectives

    We reconstructed blaKPC-harbouring IncN plasmid genomes to characterize shared gene content, structural variability, and putative horizontal transfer within and across patients and diverse bacterial clones.

    Methods

    We performed short- and long-read sequencing and hybrid assembly on 45 CRE isolates with blaKPC-harbouring IncN plasmids. Eight serial isolates from two patients were included to assess intra-patient plasmid dynamics. Comparative genomic analysis was performed to assess structural and sequence similarity across plasmids. Within IncN sublineages defined by plasmid MLST and kmer-based clustering, phylogenetic analysis was used to identify closely related plasmids.

    Results

    Comparative analysis of IncN plasmid genomes revealed substantial heterogeneity including large rearrangements in serial patient plasmids and differences in structure and content across plasmid clusters. Within plasmid sublineages, core genome content and resistance gene regions were largely conserved. Closely related plasmids (≤1 SNP) were found in highly diverse isolates, including ten pST6 plasmids found in eight bacterial clones from three different species.

    Conclusions

    Genomic analysis of blaKPC-harbouring IncN plasmids revealed the presence of several distinct sublineages as well as substantial host diversity within plasmid clusters suggestive of frequent mobilization. This study reveals complex plasmid dynamics within a single plasmid family, highlighting the challenge of tracking plasmid-mediated transmission of blaKPC in clinical settings.

     
  5. Mitchell, Aaron P. (Ed.)
    ABSTRACT Candida albicans is an opportunistic fungal pathogen of humans that is typically diploid yet has a highly labile genome tolerant of large-scale perturbations including chromosomal aneuploidy and loss-of-heterozygosity events. The ability to rapidly generate genetic variation is crucial for C. albicans to adapt to changing or stressful environments, like those encountered in the host. Genetic variation occurs via stress-induced mutagenesis or can be generated through its parasexual cycle, in which tetraploids arise via diploid mating or stress-induced mitotic defects and undergo nonmeiotic ploidy reduction. However, it remains largely unknown how genetic background contributes to C. albicans genome instability in vitro or in the host environment. Here, we tested how genetic background, ploidy, and the host environment impact C. albicans genome stability. We found that host association induced both loss-of-heterozygosity events and genome size changes, regardless of genetic background or ploidy. However, the magnitude and types of genome changes varied across C. albicans strain background and ploidy state. We then assessed if host-induced genomic changes resulted in fitness consequences on growth rate and nonlethal virulence phenotypes and found that many host-derived isolates significantly changed relative to their parental strain. Interestingly, diploid host-associated C. albicans predominantly decreased host reproductive fitness, whereas tetraploid host-associated C. albicans increased host reproductive fitness. Together, these results are important for understanding how host-induced genomic changes in C. albicans alter its relationship with the host.

    IMPORTANCE Candida albicans is an opportunistic fungal pathogen of humans. The ability to generate genetic variation is essential for adaptation and is a strategy that C. albicans and other fungal pathogens use to change their genome size. Stressful environments, including the host, induce C. albicans genome instability. Here, we investigated how C. albicans genetic background and ploidy state impact genome instability, both in vitro and in a host environment. We show that the host environment induces genome instability, but the magnitude depends on C. albicans genetic background. Furthermore, we show that tetraploid C. albicans is highly unstable in host environments and rapidly reduces in genome size. These reductions in genome size often resulted in reduced virulence. In contrast, diploid C. albicans displayed modest host-induced genome size changes, yet these frequently resulted in increased virulence. Such studies are essential for understanding how opportunistic pathogens respond and potentially adapt to the host environment.