Title: Chromosome-Level Assembly of the Atlantic Silverside Genome Reveals Extreme Levels of Sequence Diversity and Structural Genetic Variation
Abstract The levels and distribution of standing genetic variation in a genome can provide a wealth of insights about the adaptive potential, demographic history, and genome structure of a population or species. As structural variants are increasingly associated with traits important for adaptation and speciation, investigating both sequence and structural variation is essential for fully tapping this potential. Using a combination of shotgun sequencing, 10x Genomics linked reads and proximity-ligation data (Chicago and Hi-C), we produced and annotated a chromosome-level genome assembly for the Atlantic silverside (Menidia menidia), an established ecological model for studying the phenotypic effects of natural and artificial selection, and examined patterns of genomic variation across two individuals sampled from different populations with divergent local adaptations. Levels of diversity varied substantially across each chromosome, consistently being highly elevated near the ends (presumably near telomeric regions) and dipping to near zero around putative centromeres. Overall, our estimate of the genome-wide average heterozygosity in the Atlantic silverside is among the highest reported for a fish, or any vertebrate (1.32–1.76% depending on inference method and sample). Furthermore, we also found extreme levels of structural variation, affecting ∼23% of the total genome sequence, including multiple large inversions (>1 Mb and up to 12.6 Mb) associated with previously identified haploblocks showing strong differentiation between locally adapted populations. These extreme levels of standing genetic variation are likely associated with large effective population sizes and may help explain the remarkable adaptive divergence among populations of the Atlantic silverside.
Award ID(s):
1756316
NSF-PAR ID:
10317275
Editor(s):
Lohmueller, Kirk
Journal Name:
Genome Biology and Evolution
Volume:
13
Issue:
6
ISSN:
1759-6653
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The study of local adaptation in the presence of ongoing gene flow is the study of natural selection in action, revealing the functional genetic diversity most relevant to contemporary pressures. In addition to individual genes, genome-wide architecture can itself evolve to enable adaptation. Distributed across a steep thermal gradient along the east coast of North America, Atlantic silversides (Menidia menidia) exhibit an extraordinary degree of local adaptation in a suite of traits, and the capacity for rapid adaptation from standing genetic variation, but we know little about the patterns of genomic variation across the species range that enable this remarkable adaptability. Here, we use low-coverage, whole-transcriptome sequencing of Atlantic silversides sampled along an environmental cline to show marked signatures of divergent selection across a gradient of neutral differentiation. Atlantic silversides sampled across 1371 km of the southern section of its distribution have very low genome-wide differentiation (median FST = 0.006 across 1.9 million variants), consistent with historical connectivity and observations of recent migrants. Yet almost 14,000 single nucleotide polymorphisms (SNPs) are nearly fixed (FST > 0.95) for alternate alleles. Highly differentiated SNPs cluster into four tight linkage disequilibrium (LD) blocks that span hundreds of genes and several megabases. Variants in these LD blocks are disproportionately nonsynonymous and concentrated in genes enriched for multiple functions related to known adaptations in silversides, including variation in lipid storage, metabolic rate, and spawning behavior. Elevated levels of absolute divergence and demographic modeling suggest selection maintaining divergence across these blocks under gene flow. These findings represent an extreme case of heterogeneity in levels of differentiation across the genome, and highlight how gene flow shapes genomic architecture in continuous populations. 
    Locally adapted alleles may be common features of populations distributed along environmental gradients, and will likely be key to conserving variation to enable future responses to environmental change.

     
  2. By investigating evolutionary adaptations that change physiological functions, we can enhance our understanding of how organisms work, the importance of physiological traits, and the genes that influence these traits. This approach of investigating the evolution of physiological adaptation has been used with the teleost fish Fundulus heteroclitus and has produced insights into (i) how protein polymorphisms enhance swimming and development; (ii) the role of equilibrium enzymes in modulating metabolic flux; (iii) how variation in DNA sequences and mRNA expression patterns mitigates changes in temperature, pollution, and salinity; and (iv) the importance of nuclear-mitochondrial genome interactions for energy metabolism. Fundulus heteroclitus provides so many examples of adaptive evolution because its local population sizes are large, it has significant standing genetic variation, and it experiences large ranges of environmental conditions that enhance the likelihood that adaptive evolution will occur. Thus, F. heteroclitus research takes advantage of evolutionary changes associated with exposure to diverse environments, both across the North American Atlantic coast and within local habitats, to contrast neutral versus adaptive divergence. Based on evolutionary analyses contrasting neutral and adaptive evolution in F. heteroclitus populations, we conclude that adaptive evolution can occur readily and rapidly, at least in part because it depends on large amounts of standing genetic variation among many genes that can alter physiological traits. These observations of polygenic adaptation enhance our understanding of how evolution and physiological adaptation progress, thus informing both biological and medical scientists about genotype-phenotype relationships.
  3. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. 
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  4. Whitehead, A (Ed.)
    Abstract Many species that are extensively studied in the laboratory are less well characterized in their natural habitat, and laboratory strains represent only a small fraction of the variation in a species’ genome. Here we investigate genomic variation in 3 natural North American populations of an agricultural pest and a model insect for many scientific disciplines, the tobacco hornworm (Manduca sexta). Using Illumina whole-genome resequencing, we show that hornworms from Arizona, Kansas, and North Carolina are genetically distinct, with Arizona being particularly differentiated from the other 2 populations. Peaks of differentiation exist across the genome, but here we focus on the most striking regions. In particular, we identify 2 likely segregating inversions found in the Arizona population. One inversion on the Z chromosome may enhance adaptive evolution of the sex chromosome. The larger, 8 Mb inversion on chromosome 12 contains a pseudogene which may be involved in the exploitation of a novel hostplant in Arizona, but functional genetic assays will be required to support this hypothesis. Nevertheless, our results reveal undiscovered natural variation and provide useful genomic data for both pest management and evolutionary genetics of this insect species.
  5. Enard, David (Ed.)
    Abstract High-quality reference genomes are fundamental tools for understanding population history, and can provide estimates of genetic and demographic parameters relevant to the conservation of biodiversity. The federally endangered Pacific pocket mouse (PPM), which persists in three small, isolated populations in southern California, is a promising model for studying how demographic history shapes genetic diversity, and how diversity in turn may influence extinction risk. To facilitate these studies in PPM, we combined PacBio HiFi long reads with Omni-C and Hi-C data to generate a de novo genome assembly, and annotated the genome using RNAseq. The assembly comprised 28 chromosome-length scaffolds (N50 = 72.6 Mb) and the complete mitochondrial genome, and included a long heterochromatic region on chromosome 18 not represented in the previously available short-read assembly. Heterozygosity was highly variable across the genome of the reference individual, with 18% of windows falling in runs of homozygosity (ROH) >1 Mb, and nearly 9% in tracts spanning >5 Mb. Yet outside of ROH, heterozygosity was relatively high (0.0027), and historical Ne estimates were large. These patterns of genetic variation suggest recent inbreeding in a formerly large population. Currently the most contiguous assembly for a heteromyid rodent, this reference genome provides insight into the past and recent demographic history of the population, and will be a critical tool for management and future studies of outbreeding depression, inbreeding depression, and genetic load.