

Title: Hotspot of de novo telomere addition stabilizes linear amplicons in yeast grown in sulfate-limiting conditions
Abstract

Evolution is driven by the accumulation of competing mutations that influence survival. A broad form of genetic variation is the amplification or deletion of DNA (≥50 bp) referred to as copy number variation (CNV). In humans, CNV may be inconsequential, contribute to minor phenotypic differences, or cause conditions such as birth defects, neurodevelopmental disorders, and cancers. To identify mechanisms that drive CNV, we monitored the experimental evolution of Saccharomyces cerevisiae populations grown under sulfate-limiting conditions. Cells with increased copy number of the gene SUL1, which encodes a primary sulfate transporter, exhibit a fitness advantage. Previously, we reported interstitial inverted triplications of SUL1 as the dominant rearrangement in a haploid population. Here, in a diploid population, we find instead that small linear fragments containing SUL1 form and are sustained over several generations. Many of the linear fragments are stabilized by de novo telomere addition within a telomere-like sequence near SUL1 (within the SNF5 gene). Using an assay that monitors telomerase action following an induced chromosome break, we show that this region acts as a hotspot of de novo telomere addition and that required sequences map to a region of <250 base pairs. Consistent with previous work showing that association of the telomere-binding protein Cdc13 with internal sequences stimulates telomerase recruitment, mutation of a four-nucleotide motif predicted to associate with Cdc13 abolishes de novo telomere addition. Our study suggests that internal telomere-like sequences that stimulate de novo telomere addition can contribute to adaptation by promoting genomic plasticity.
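The abstract places the telomere-addition hotspot within a telomere-like sequence in SNF5. As a rough illustration only, such telomere-like tracts can be located with a regular expression over the irregular TG(1-3) repeat that makes up S. cerevisiae telomeres. The pattern, the 8-nt minimum length, and the function name `find_tg_tracts` are hypothetical choices for this sketch. The study mapped the required sequence experimentally, and its criteria (including the four-nucleotide Cdc13 motif) are not specified here.

```python
import re
from typing import List, Tuple

# Yeast telomeric DNA is an irregular TG(1-3) repeat on the G-rich strand.
# This pattern is a simplification; min_len below is a hypothetical threshold.
TG_TRACT = re.compile(r"(?:TG{1,3})+")


def find_tg_tracts(seq: str, min_len: int = 8) -> List[Tuple[int, int, str]]:
    """Return (start, end, tract) for TG(1-3) runs of at least min_len bases.

    Coordinates are 0-based and half-open; only the given strand is scanned.
    """
    hits = []
    for m in TG_TRACT.finditer(seq.upper()):
        if m.end() - m.start() >= min_len:
            hits.append((m.start(), m.end(), m.group()))
    return hits
```

Running the same function on the reverse complement of a region also covers tracts on the C-rich (C(1-3)A) strand.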

 
NSF-PAR ID:
10416190
Publisher / Repository:
Oxford University Press
Journal Name:
GENETICS
Volume:
224
Issue:
2
ISSN:
1943-2631
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions were relegated to gaps and collapsed regions in the human reference genome GRCh38 owing to technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, made it impossible to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments.
RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome makes it possible to explore the epigenetic and transcriptional profiles of these elements, which are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution.
RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations.
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and methylated CpG sites called from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we show that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and nonhuman primates while also providing high-confidence annotations of retroelement transduction events.
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans.
Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
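The RESULTS correlate nascent transcription with CpG density but do not define the metric. As an assumption, the sketch below computes two widely used quantities for a single sequence: CpG dinucleotides per base pair, and the observed/expected CpG ratio, cpg / (c * g / n). The function name `cpg_stats` is hypothetical, and the study's exact definitions and windowing may differ.

```python
def cpg_stats(seq: str) -> dict:
    """CpG count, CpGs per bp, and observed/expected CpG ratio for one strand."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # "CG" occurrences cannot overlap, so str.count finds every CpG.
    cpg = seq.count("CG")
    density = cpg / n if n else 0.0
    # Observed/expected: CpG count relative to the count expected from the
    # C and G frequencies alone, i.e. cpg / (c * g / n).
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"cpg": cpg, "density": density, "obs_exp": obs_exp}
```

For "ACGTCGAT" this gives 2 CpGs, 0.25 CpGs per bp, and an observed/expected ratio of 4.0.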
  2. Abstract

    Telomere healing occurs when telomerase, normally restricted to chromosome ends, acts upon a double-strand break to create a new, functional telomere. De novo telomere addition (dnTA) on the centromere-proximal side of a break truncates the chromosome but, by blocking resection, may allow the cell to survive an otherwise lethal event. We previously identified several sequences in the baker's yeast, Saccharomyces cerevisiae, that act as hotspots of dnTA [termed Sites of Repair-associated Telomere Addition (SiRTAs)], but the distribution and functional relevance of SiRTAs are unclear. Here, we describe a high-throughput sequencing method to measure the frequency and location of telomere addition within sequences of interest. Combining this methodology with a computational algorithm that identifies SiRTA sequence motifs, we generate the first comprehensive map of telomere-addition hotspots in yeast. Putative SiRTAs are strongly enriched in subtelomeric regions where they may facilitate formation of a new telomere following catastrophic telomere loss. In contrast, outside of subtelomeres, the distribution and orientation of SiRTAs appear random. Since truncating the chromosome at most SiRTAs would be lethal, this observation argues against selection for these sequences as sites of telomere addition per se. We find, however, that sequences predicted to function as SiRTAs are significantly more prevalent across the genome than expected by chance. Sequences identified by the algorithm bind the telomeric protein Cdc13, raising the possibility that association of Cdc13 with single-stranded regions generated during the response to DNA damage may facilitate DNA repair more generally.
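The statement that predicted SiRTAs are more prevalent "than expected by chance" implies a comparison against a null model. The sketch below shows one such comparison under loudly stated assumptions: a hypothetical TG(1-3) tract of at least 11 nt stands in for the authors' SiRTA algorithm (whose rules are not given here), and the null is a simple shuffle that preserves only mononucleotide composition. Dinucleotide-preserving shuffles are often preferred in practice. The names `count_motifs` and `shuffle_enrichment` are hypothetical.

```python
import random
import re
from typing import Tuple

# Hypothetical stand-in for a SiRTA-like motif: a TG(1-3) tract of >= min_len nt.
TG_TRACT = re.compile(r"(?:TG{1,3})+")


def count_motifs(seq: str, min_len: int = 11) -> int:
    """Count non-overlapping TG(1-3) tracts of at least min_len bases."""
    return sum(1 for m in TG_TRACT.finditer(seq.upper())
               if m.end() - m.start() >= min_len)


def shuffle_enrichment(seq: str, n_shuffles: int = 200, min_len: int = 11,
                       seed: int = 0) -> Tuple[int, float, float]:
    """Compare the observed motif count with composition-matched shuffles.

    Returns (observed count, mean shuffled count, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    observed = count_motifs(seq, min_len)
    bases = list(seq.upper())
    null_counts = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # preserves mononucleotide composition only
        null_counts.append(count_motifs("".join(bases), min_len))
    n_at_least = sum(1 for c in null_counts if c >= observed)
    # Add-one correction keeps the empirical p-value from being exactly zero.
    p_value = (1 + n_at_least) / (1 + n_shuffles)
    return observed, sum(null_counts) / n_shuffles, p_value
```

Genome-scale tests would also need to account for local base composition (subtelomeres differ from the rest of the genome), which a single global shuffle ignores.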

     
  3. de Visser, J. Arjan (Ed.)
    The rate of adaptive evolution depends on the rate at which beneficial mutations are introduced into a population and the fitness effects of those mutations. The rate of beneficial mutations and their expected fitness effects are often difficult to quantify empirically. As these 2 parameters determine the pace of evolutionary change in a population, the dynamics of adaptive evolution may enable inference of their values. Copy number variants (CNVs) are a pervasive source of heritable variation that can facilitate rapid adaptive evolution. Previously, we developed a locus-specific fluorescent CNV reporter to quantify CNV dynamics in evolving populations maintained in nutrient-limiting conditions using chemostats. Here, we use CNV adaptation dynamics to estimate the rate at which beneficial CNVs are introduced through de novo mutation and their fitness effects using simulation-based likelihood-free inference approaches. We tested the suitability of 2 evolutionary models: a standard Wright–Fisher model and a chemostat model. We evaluated 2 likelihood-free inference algorithms: the well-established Approximate Bayesian Computation with Sequential Monte Carlo (ABC-SMC) algorithm, and the recently developed Neural Posterior Estimation (NPE) algorithm, which applies an artificial neural network to directly estimate the posterior distribution. By systematically evaluating the suitability of different inference methods and models, we show that NPE has several advantages over ABC-SMC and that a Wright–Fisher evolutionary model suffices in most cases. Using our validated inference framework, we estimate the CNV formation rate at the GAP1 locus in the yeast Saccharomyces cerevisiae to be 10^−4.7 to 10^−4 CNVs per cell division and a fitness coefficient of 0.04 to 0.1 per generation for GAP1 CNVs in glutamine-limited chemostats.
We experimentally validated our inference-based estimates using 2 distinct experimental methods (barcode lineage tracking and pairwise fitness assays), which provide independent confirmation of the accuracy of our approach. Our results are consistent with a beneficial CNV supply rate that is 10-fold greater than the estimated rates of beneficial single-nucleotide mutations, explaining the outsized importance of CNVs in rapid adaptive evolution. More generally, our study demonstrates the utility of novel neural network–based likelihood-free inference methods for inferring the rates and effects of evolutionary processes from empirical data, with possible applications ranging from tumor to viral evolution.
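To make the two inferred parameters concrete, here is a minimal haploid Wright–Fisher sketch in which CNVs form at rate mu per cell division and CNV carriers have relative fitness 1 + s. It is an illustrative assumption, not the authors' simulator: it ignores chemostat dynamics, CNV reversion, and competing beneficial mutations, and the name `wright_fisher_cnv` is hypothetical. Likelihood-free methods such as ABC-SMC or NPE would run a simulator of this general kind many times and compare simulated trajectories with observed CNV frequencies.

```python
import random
from typing import List


def wright_fisher_cnv(pop_size: int, mu: float, s: float,
                      generations: int, seed: int = 0) -> List[float]:
    """Haploid Wright-Fisher model: CNV frequency after each generation.

    mu: CNV formation rate per cell division (no reversion assumed).
    s:  selection coefficient; CNV carriers have relative fitness 1 + s.
    """
    rng = random.Random(seed)
    p = 0.0  # start with no CNV carriers
    trajectory = []
    for _ in range(generations):
        # 1. De novo CNV formation in cells that do not yet carry one.
        p_mut = p + (1.0 - p) * mu
        # 2. Selection: reweight by relative fitness and renormalize.
        p_sel = p_mut * (1.0 + s) / (1.0 + s * p_mut)
        # 3. Drift: each of pop_size offspring independently carries the
        #    CNV with probability p_sel (a binomial draw done cell by cell).
        carriers = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        p = carriers / pop_size
        trajectory.append(p)
    return trajectory
```

For example, `wright_fisher_cnv(10_000, 10**-4.35, 0.07, 250)` uses values inside the reported ranges. Chemostat populations are far larger than 10,000 cells; at that scale, the per-cell loop would typically be replaced by a single binomial draw such as `numpy.random.Generator.binomial`.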
  4. Abstract

    Copy number variation (CNV) is a major part of the genetic diversity segregating within populations, but remains poorly understood relative to single nucleotide variation. Here, we report on a tRNA ligase gene (Migut.N02091; RLG1a) exhibiting unprecedented, and fitness-relevant, CNV within an annual population of the yellow monkeyflower Mimulus guttatus. RLG1a variation was associated with multiple traits in pooled population sequencing (PoolSeq) scans of phenotypic and phenological cohorts. Resequencing of inbred lines revealed intermediate-frequency three-copy variants of RLG1a (trip+; 5/35 = 14%), and trip+ lines exhibited elevated RLG1a expression under multiple conditions. trip+ carriers, in addition to being over-represented in late-flowering and large-flowered PoolSeq populations, flowered later under stressful conditions in a greenhouse experiment (p < 0.05). In wild population samples, we discovered an additional rare RLG1a variant (high+) that carries 250–300 copies of RLG1a totalling ~5.7 Mb (20–40% of a chromosome). In the progeny of a high+ carrier, Mendelian segregation of diagnostic alleles and qPCR-based copy counts indicate that high+ is a single tandem array unlinked to the single-copy RLG1a locus. In the wild, high+ carriers had the highest fitness in two particularly dry and/or hot years (2015 and 2017; both p < 0.01), while single-copy individuals were twice as fecund as either CNV type in a lush year (2016: p < 0.005). Our results demonstrate fluctuating selection on CNVs affecting phenological traits in a wild population, suggest that plant tRNA ligases mediate stress-responsive life-history traits, and introduce a novel system for investigating the molecular mechanisms of gene amplification.

     
  5. The rapid evolution of repetitive DNA sequences, including satellite DNA, tandem duplications, and transposable elements, underlies phenotypic evolution and contributes to hybrid incompatibilities between species. However, repetitive genomic regions are fragmented and misassembled in most contemporary genome assemblies. We generated highly contiguous de novo reference genomes for the Drosophila simulans species complex (D. simulans, D. mauritiana, and D. sechellia), which speciated ∼250,000 yr ago. Our assemblies are comparable in contiguity and accuracy to the current D. melanogaster genome, allowing us to directly compare repetitive sequences between these four species. We find that at least 15% of the D. simulans complex species genomes fail to align uniquely to D. melanogaster owing to structural divergence, twice the number of base pairs that differ by single-nucleotide substitutions. We also find rapid turnover of satellite DNA and extensive structural divergence in heterochromatic regions, whereas the euchromatic gene content is mostly conserved. Despite the overall preservation of gene synteny, euchromatin in each species has been shaped by clade- and species-specific inversions, transposable elements, expansions and contractions of satellite and tRNA tandem arrays, and gene duplications. We also find rapid divergence among Y-linked genes, including copy number variation and recent gene duplications from autosomes. Our assemblies provide a valuable resource for studying genome evolution and its consequences for phenotypic evolution in these genetic model species.