Title: Genomic structural variation: A complex but important driver of human evolution
Abstract: Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single‐nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well‐documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single‐nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by examining (1) how they have shaped great ape genomes, resulting in sensitized regions associated with traits and diseases, (2) their impact on gene function and regulation, which has subsequently played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever‐expanding SV compendium propelled by biotechnology advancements.
Award ID(s): 2145885
PAR ID: 10484238
Publisher / Repository: Wiley
Journal Name: American Journal of Biological Anthropology
Volume: 181
Issue: S76
ISSN: 2692-7691
Page Range / eLocation ID: 118 to 144
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Mérot, Claire; Connallon, Tim (Ed.)
    Abstract The search for the genetic basis of phenotypes has primarily focused on single nucleotide polymorphisms, often overlooking structural variants (SVs). SVs can significantly affect gene function, but detecting and characterizing them is challenging, even with long-read sequencing. Moreover, traditional single-reference methods can fail to capture many genetic variants. Using long reads, we generated a Capuchino Seedeater (Sporophila) pangenome, including 16 individuals from 7 species, to investigate how SVs contribute to species and coloration differences. Leveraging this pangenome, we mapped short-read data from 127 individuals, genotyped variants identified in the pangenome graph, and subsequently performed FST scans and genome-wide association studies. Species divergence primarily arises from SNPs and indels (< 50 bp) in non-coding regions of melanin-related genes, as larger SVs rarely overlap with divergence peaks. One exception was a 55 bp deletion near the OCA2 and HERC2 genes, associated with feather pheomelanin content. These findings support the hypothesis that the reshuffling of small regulatory alleles, rather than larger species-specific mutations, accelerated plumage evolution leading to prezygotic isolation in Capuchinos. 
  2. ABSTRACT Comprehensively identifying the loci shaping trait variation has been challenging, in part because standard approaches often miss many types of genetic variants. Structural variants (SVs), especially transposable elements (TEs), are likely to affect phenotypic variation but we lack methods that can detect polymorphic SVs and TEs using short‐read sequencing data. Here, we used a whole genome alignment between two maize genotypes to identify polymorphic SVs and then genotyped a large maize diversity panel for these variants using short‐read sequencing data. After characterising SV variation in the panel, we identified SV polymorphisms that are associated with life history traits and genotype‐by‐environment (GxE) interactions. While most of the SVs associated with traits contained TEs, only two of the SVs had boundaries that clearly matched TE breakpoints indicative of a TE insertion, while the other polymorphisms were likely caused by deletions. One of the SVs that appeared to be caused by a TE insertion had the most associations with gene expression compared to other trait‐associated SVs. All of the SVs associated with traits were in linkage disequilibrium with nearby single nucleotide polymorphisms (SNPs), suggesting that the approach used here did not identify unique associations that would have been missed in a SNP association study. Overall, we have (1) created a technique to genotype SV polymorphisms across a large diversity panel using support from genomic short‐read sequencing alignments and (2) connected this presence/absence SV variation to diverse traits and GxE interactions. 
  3. Despite insertions and deletions being the most common structural variants (SVs) found across genomes, not much is known about how much these SVs vary within populations and between closely related species, nor their significance in evolution. To address these questions, we characterized the evolution of indel SVs using genome assemblies of three closely related Heliconius butterfly species. Over the relatively short evolutionary timescales investigated, up to 18.0% of the genome was composed of indels between two haplotypes of an individual Heliconius charithonia butterfly and up to 62.7% included lineage-specific SVs between the genomes of the most distant species (11 Mya). Lineage-specific sequences were mostly characterized as transposable elements (TEs) inserted at random throughout the genome and their overall distribution was similarly affected by linked selection as single nucleotide substitutions. Using chromatin accessibility profiles (i.e., ATAC-seq) of head tissue in caterpillars to identify sequences with potential cis-regulatory function, we found that out of the 31,066 identified differences in chromatin accessibility between species, 30.4% were within lineage-specific SVs and 9.4% were characterized as TE insertions. These TE insertions were localized closer to gene transcription start sites than expected at random and were enriched for sites with significant resemblance to several transcription factor binding sites with known function in neuron development in Drosophila. We also identified 24 TE insertions with head-specific chromatin accessibility. Our results show high rates of structural genome evolution that were previously overlooked in comparative genomic studies and suggest a high potential for structural variation to serve as raw material for adaptive evolution.
  4. Abstract Genomic clusters of immune genes, including those encoding nucleotide-binding leucine-rich repeat (NLR) proteins, are a model for exploring the dynamics of genomic regions in flux. Rapid sequence evolution of immune genes, including NLRs, and variation in their gene content, may enable long-lived plants, which lack adaptive immune systems, to keep pace with the fast evolution of pathogens. To explore the patterns and processes shaping the evolution of NLR gene content in a genus of long-lived tree species, we unified the annotation of NLR genes across 11 accessions (or 15 haplotypes) from the genus Citrus and its relatives, including three new diploid genome assemblies. A majority of NLRs were arranged in genomic clusters composed of paralogous genes, typically from a single gene family. Even larger clusters, with 10 or more NLRs, were limited to genes derived from one or few gene families. These patterns suggested that genomic clustering of NLRs arose through local expansion of phylogenetically related NLRs, but the mechanistic processes driving these patterns are not clear. Local gene duplication can be mediated by multiple processes, including transposon-mediated gene capture and subsequent proliferation, and non-allelic repair of double stranded breaks, including unequal recombination. Examples of retrotransposon-mediated duplication of NLRs were identified, but these were not sufficient to explain massive regional expansions. Signatures of unequal recombination are challenging to identify. Focusing on recent lineage-specific sequence duplications, at least one case of unequal recombination was identified, supporting a role for unequal recombination in shaping genomic variation in these regions.
  5. Large genomic insertions and deletions are a potent source of functional variation, but are challenging to resolve with short-read sequencing, limiting knowledge of the role of such structural variants (SVs) in human evolution. Here, we used a graph-based method to genotype long-read-discovered SVs in short-read data from diverse human genomes. We then applied an admixture-aware method to identify 220 SVs exhibiting extreme patterns of frequency differentiation – a signature of local adaptation. The top two variants traced to the immunoglobulin heavy chain locus, tagging a haplotype that swept to near fixation in certain southeast Asian populations, but is rare in other global populations. Further investigation revealed evidence that the haplotype traces to gene flow from Neanderthals, corroborating the role of immune-related genes as prominent targets of adaptive introgression. Our study demonstrates how recent technical advances can help resolve signatures of key evolutionary events that remained obscured within technically challenging regions of the genome. 