Title: Zinc Finger Readers of Methylated DNA
DNA methylation is a prevalent epigenetic modification involved in regulating a number of essential cellular processes, including genomic accessibility and transcriptional outcomes. As such, aberrant alterations in global DNA methylation patterns have been associated with a growing number of disease conditions. Nevertheless, the mechanisms by which DNA methylation information is interpreted and translated into genomic responses are not yet fully understood. Methyl-CpG binding proteins (MBPs) function as important mediators of this essential process by selectively reading DNA methylation signals and translating this information into downstream cellular outcomes. The Cys2His2 zinc finger scaffold is one of the most abundant DNA binding motifs found within human transcription factors, yet only a few zinc finger-containing proteins capable of conferring selectivity for mCpG over CpG sites have been characterized. This review summarizes our current structural understanding of the mechanisms by which the zinc finger MBPs evaluated to date read this essential epigenetic mark. Further, some of the biological implications of mCpG readout elicited by this family of MBPs are discussed.
National Science Foundation
  1. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations.
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  2. INTRODUCTION To faithfully distribute genetic material to daughter cells during cell division, spindle fibers must couple to DNA by means of a structure called the kinetochore, which assembles at each chromosome’s centromere. Human centromeres are located within large arrays of tandemly repeated DNA sequences known as alpha satellite (αSat), which often span millions of base pairs on each chromosome. Arrays of αSat are frequently surrounded by other types of tandem satellite repeats, which have poorly understood functions, along with nonrepetitive sequences, including transcribed genes. Previous genome sequencing efforts have been unable to generate complete assemblies of satellite-rich regions because of their scale and repetitive nature, limiting the ability to study their organization, variation, and function. RATIONALE Pericentromeric and centromeric (peri/centromeric) satellite DNA sequences have remained almost entirely missing from the assembled human reference genome for the past 20 years. Using a complete, telomere-to-telomere (T2T) assembly of a human genome, we developed and deployed tailored computational approaches to reveal the organization and evolutionary patterns of these satellite arrays at both large and small length scales. We also performed experiments to map precisely which αSat repeats interact with kinetochore proteins. Last, we compared peri/centromeric regions among multiple individuals to understand how these sequences vary across diverse genetic backgrounds. RESULTS Satellite repeats constitute 6.2% of the T2T-CHM13 genome assembly, with αSat representing the single largest component (2.8% of the genome).
By studying the sequence relationships of αSat repeats in detail across each centromere, we found genome-wide evidence that human centromeres evolve through “layered expansions.” Specifically, distinct repetitive variants arise within each centromeric region and expand through mechanisms that resemble successive tandem duplications, whereas older flanking sequences shrink and diverge over time. We also revealed that the most recently expanded repeats within each αSat array are more likely to interact with the inner kinetochore protein Centromere Protein A (CENP-A), which coincides with regions of reduced CpG methylation. This suggests a strong relationship between local satellite repeat expansion, kinetochore positioning, and DNA hypomethylation. Furthermore, we uncovered large and unexpected structural rearrangements that affect multiple satellite repeat types, including active centromeric αSat arrays. Last, by comparing sequence information from nearly 1600 individuals’ X chromosomes, we observed that individuals with recent African ancestry possess the greatest genetic diversity in the region surrounding the centromere, which sometimes contains a predominantly African αSat sequence variant. CONCLUSION The genetic and epigenetic properties of centromeres are closely interwoven through evolution. These findings raise important questions about the specific molecular mechanisms responsible for the relationship between inner kinetochore proteins, DNA hypomethylation, and layered αSat expansions. Even more questions remain about the function and evolution of non-αSat repeats. To begin answering these questions, we have produced a comprehensive encyclopedia of peri/centromeric sequences in a human genome, and we demonstrated how these regions can be studied with modern genomic tools. Our work also illuminates the rich genetic variation hidden within these formerly missing regions of the genome, which may contribute to health and disease. 
This unexplored variation underlines the need for more T2T human genome assemblies from genetically diverse individuals. Gapless assemblies illuminate centromere evolution. (Top) The organization of peri/centromeric satellite repeats. (Bottom left) A schematic portraying (i) evidence for centromere evolution through layered expansions and (ii) the localization of inner-kinetochore proteins in the youngest, most recently expanded repeats, which coincide with a region of DNA hypomethylation. (Bottom right) An illustration of the global distribution of chrX centromere haplotypes, showing increased diversity in populations with recent African ancestry.
  3. Abstract Polycomb repressive complex 2 (PRC2) is a histone methyltransferase that methylates histone H3 at Lysine 27. PRC2 is critical for epigenetic gene silencing, cellular differentiation and the formation of facultative heterochromatin. It can also promote or inhibit oncogenesis. Despite this importance, the molecular mechanisms by which PRC2 compacts chromatin are relatively understudied. Here, we visualized the binding of PRC2 to naked DNA in liquid at the single-molecule level using atomic force microscopy. Analysis of the resulting images showed PRC2, consisting of five subunits (EZH2, EED, SUZ12, AEBP2 and RBBP4), bound to a 2.5-kb DNA with an apparent dissociation constant ($K_{\rm{D}}^{{\rm{app}}}$) of 150 ± 12 nM. PRC2 did not show sequence-specific binding to a region of high GC content (76%) derived from a CpG island embedded in such a long DNA substrate. At higher concentrations, PRC2 compacted DNA by forming DNA loops typically anchored by two or more PRC2 molecules. Additionally, PRC2 binding led to a 3-fold increase in the local bending of DNA’s helical backbone without evidence of DNA wrapping around the protein. We suggest that the bending and looping of DNA by PRC2, independent of PRC2’s methylation activity, may contribute to heterochromatin formation and therefore epigenetic gene silencing.
    ABSTRACT D-block metal cations are essential for most biological processes; however, excessive metal exposure can be deleterious to the survival of microorganisms. To tightly control heavy metal regulation, prokaryotic organisms have developed several mechanisms to sense and adapt to changes in intracellular and extracellular metal concentrations. The ferric uptake regulator superfamily of transcription factors associates with DNA when complexed with a regulatory metal cofactor and often represses the transcription of genes involved in metal transport, thus providing a genomic response to an environmental stressor. Although extensively studied in mesothermic organisms, there is little information describing ferric uptake regulator homologs in thermophiles. In this study, we biochemically characterize the ferric uptake regulator homolog TTHA1292 in the extreme thermophile Thermus thermophilus HB8. We identify the preferred DNA-binding sequence of TTHA1292 using the combinatorial approach, restriction endonuclease, protection, selection, and amplification (REPSA). We map this sequence to the Thermus thermophilus HB8 genome and identify the TTHA1292 transcription regulatory network, which includes the zinc ABC transporter subunit genes TTHA0596 and TTHA0453/4. We formally implicate TTHA1292 as a zinc uptake regulator and show that zinc coordination is critical for the multimerization of TTHA1292 dimers on DNA in vitro and transcription repression in vivo. IMPORTANCE Discovering how organisms sense and adapt to their environments is paramount to understanding biology. Thermophilic organisms have adapted to survive at elevated temperatures (>50°C); however, our understanding of how these organisms adapt to changes in their environment is limited. In this study, we identify a zinc uptake regulator in the extreme thermophile Thermus thermophilus HB8 that provides a genomic response to fluctuations in zinc availability.
These results provide insights into thermophile biology, as well as the zinc uptake regulator family of proteins.
    Abstract The methyltransferase like (METTL) proteins constitute a family of seven-beta-strand methyltransferases with S-adenosyl methionine binding domains that modify DNA, RNA, and proteins. Methylation by METTL proteins contributes to the epigenetic, and in the case of RNA modifications, epitranscriptomic regulation of a variety of biological processes. Despite their functional importance, most investigations of the substrates and functions of METTLs within metazoans have been restricted to model vertebrate taxa. In the present work, we explore the evolutionary mechanisms driving the diversification and functional differentiation of 33 individual METTL proteins across Metazoa. Our results show that METTLs are nearly ubiquitous across the animal kingdom, with most having arisen early in metazoan evolution (i.e., occur in basal metazoan phyla). Individual METTL lineages each originated from single independent ancestors, constituting monophyletic clades, which suggests that each METTL was subject to strong selective constraints driving its structural and/or functional specialization. Interestingly, a similar process did not extend to the differentiation of nucleoside-modifying and protein-modifying METTLs (i.e., each METTL type did not form a unique monophyletic clade). The members of these two types of METTLs also exhibited differences in their rates of evolution. Overall, we provide evidence that the long-term evolution of METTL family members was driven by strong purifying selection, which in combination with adaptive selection episodes, led to the functional specialization of individual METTL lineages. This work contributes useful information regarding the evolution of a gene family that fulfills a variety of epigenetic functions, and can have profound influences on molecular processes and phenotypic traits.