
Title: Genomic structural variants constrain and facilitate adaptation in natural populations of Theobroma cacao, the chocolate tree

Genomic structural variants (SVs) can play important roles in adaptation and speciation. Yet the overall fitness effects of SVs are poorly understood, partly because accurate population-level identification of SVs requires multiple high-quality genome assemblies. Here, we use 31 chromosome-scale, haplotype-resolved genome assemblies of Theobroma cacao (an outcrossing, long-lived tree species that is the source of chocolate) to investigate the fitness consequences of SVs in natural populations. Among the 31 accessions, we find over 160,000 SVs, which together cover eight times more of the genome than single-nucleotide polymorphisms and short indels (125 versus 15 Mb). Our results indicate that a vast majority of these SVs are deleterious: they segregate at low frequencies and are depleted from functional regions of the genome. We show that SVs influence gene expression, which likely impairs gene function and contributes to the detrimental effects of SVs. We also provide empirical support for a theoretical prediction that SVs, particularly inversions, increase genetic load through the accumulation of deleterious nucleotide variants as a result of suppressed recombination. Despite the overall detrimental effects, we identify individual SVs bearing signatures of local adaptation, several of which are associated with genes differentially expressed between populations. Genes involved in pathogen resistance are strongly enriched among these candidates, highlighting the contribution of SVs to this important local adaptation trait. Beyond revealing empirical evidence for the evolutionary importance of SVs, these 31 de novo assemblies provide a valuable resource for genetic and breeding studies in T. cacao.

Publisher / Repository:
Proceedings of the National Academy of Sciences
Journal Name:
Proceedings of the National Academy of Sciences
Page Range / eLocation ID:
Article No. e2102914118
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Structural variants (SVs) are a major source of genetic variation; and descriptions in natural populations and connections with phenotypic traits are beginning to accumulate in the literature. We integrated advances in genomic sequencing and animal tracking to begin filling this knowledge gap in the Eurasian blackcap. Specifically, we (a) characterized the genome-wide distribution, frequency, and overall fitness effects of SVs using haplotype-resolved assemblies for 79 birds, and (b) used these SVs to study the genetics of seasonal migration. We detected >15 K SVs. Many SVs overlapped repetitive regions and exhibited evidence of purifying selection suggesting they have overall deleterious effects on fitness. We used estimates of genomic differentiation to identify SVs exhibiting evidence of selection in blackcaps with different migratory strategies. Insertions and deletions dominated the SVs we identified and were associated with genes that are either directly (e.g., regulatory motifs that maintain circadian rhythms) or indirectly (e.g., through immune response) related to migration. We also broke migration down into individual traits (direction, distance, and timing) using existing tracking data and tested if genetic variation at the SVs we identified could account for phenotypic variation at these traits. This was only the case for 1 trait—direction—and 1 specific SV (a deletion on chromosome 27) accounted for much of this variation. Our results highlight the evolutionary importance of SVs in natural populations and provide insight into the genetic basis of seasonal migration.

  2. Purugganan, Michael (Ed.)
    Abstract Structural variants (SVs) are a largely unstudied feature of plant genome evolution, despite the fact that SVs contribute substantially to phenotypes. In this study, we discovered SVs across a population sample of 347 high-coverage, resequenced genomes of Asian rice (Oryza sativa) and its wild ancestor (O. rufipogon). In addition to this short-read data set, we also inferred SVs from whole-genome assemblies and long-read data. Comparisons among data sets revealed different features of genome variability. For example, genome alignment identified a large (∼4.3 Mb) inversion in indica rice varieties relative to japonica varieties, and long-read analyses suggest that ∼9% of genes from the outgroup (O. longistaminata) are hemizygous. We focused, however, on the resequencing sample to investigate the population genomics of SVs. Clustering analyses with SVs recapitulated the rice cultivar groups that were also inferred from SNPs. However, the site-frequency spectrum of each SV type—which included inversions, duplications, deletions, translocations, and mobile element insertions—was skewed toward lower frequency variants than synonymous SNPs, suggesting that SVs may be predominantly deleterious. Among transposable elements, SINE and mariner insertions were found at especially low frequency. We also used SVs to study domestication by contrasting between rice and O. rufipogon. Cultivated genomes contained ∼25% more derived SVs and mobile element insertions than O. rufipogon, indicating that SVs contribute to the cost of domestication in rice. Peaks of SV divergence were enriched for known domestication genes, but we also detected hundreds of genes gained and lost during domestication, some of which were enriched for traits of agronomic interest. 
  3. Abstract

    Introduced and invasive species make excellent natural experiments for investigating rapid evolution. Here, we describe the effects of genetic drift and rapid genetic adaptation in pink salmon (Oncorhynchus gorbuscha) that were accidentally introduced to the Great Lakes via a single introduction event 31 generations ago. Using whole‐genome resequencing for 134 fish spanning five sample groups across the native and introduced range, we estimate that the source population's effective population size was 146,886 at the time of introduction, whereas the founding population's effective population size was just 72—a 2040‐fold decrease. As expected with a severe founder event, we show reductions in genome‐wide measures of genetic diversity, specifically a 37.7% reduction in the number of SNPs and an 8.2% reduction in observed heterozygosity. Despite this decline in genetic diversity, we provide evidence for putative selection at 47 loci across multiple chromosomes in the introduced populations, including missense variants in genes associated with circadian rhythm, immunological response and maturation, which match expected or known phenotypic changes in the Great Lakes. For one of these genes, we use a species‐specific agent‐based model to rule out genetic drift and conclude our results support a strong response to selection occurring in a period gene (per2) that plays a predominant role in determining an organism's daily clock, matching large day length differences experienced by introduced salmon during important phenological periods. Together, these results inform how populations might evolve rapidly to new environments, even with a small pool of standing genetic variation.

  4. Abstract

    Structural variants (SVs) can promote speciation by directly causing reproductive isolation or by suppressing recombination across large genomic regions. Whereas examples of each mechanism have been documented, systematic tests of the role of SVs in speciation are lacking. Here, we take advantage of long‐read (Oxford nanopore) whole‐genome sequencing and a hybrid zone between two Lycaeides butterfly taxa (L. melissa and Jackson Hole Lycaeides) to comprehensively evaluate genome‐wide patterns of introgression for SVs and relate these patterns to hypotheses about speciation. We found >100,000 SVs segregating within or between the two hybridizing species. SVs and SNPs exhibited similar levels of genetic differentiation between species, with the exception of inversions, which were more differentiated. We detected credible variation in patterns of introgression among SV loci in the hybrid zone, with 562 of 1419 ancestry‐informative SVs exhibiting genomic clines that deviated from null expectations based on genome‐average ancestry. Overall, hybrids exhibited a directional shift towards Jackson Hole Lycaeides ancestry at SV loci, consistent with the hypothesis that these loci experienced more selection on average than SNP loci. Surprisingly, we found that deletions, rather than inversions, showed the highest skew towards excess ancestry from Jackson Hole Lycaeides. Excess Jackson Hole Lycaeides ancestry in hybrids was also especially pronounced for Z‐linked SVs and inversions containing many genes. In conclusion, our results show that SVs are ubiquitous and suggest that SVs in general, but especially deletions, might disproportionately affect hybrid fitness and thus contribute to reproductive isolation.

  5. Abstract

    Viruses persist in nature owing to their extreme genetic heterogeneity and large population sizes, which enable them to evade host immune defenses, escape anti-viral drugs, and adapt to new hosts. The persistence of viruses is challenging to study because mutations affect multiple virus genes, interactions among genes in their impacts on virus growth are seldom known, and measures of viral fitness have yet to be standardized. To address these challenges, we employed a data-driven computational model of cell infection by a virus. The infection model accounted for the kinetics of viral gene expression, functional gene-gene interactions, genome replication, and allocation of host cellular resources to produce progeny of vesicular stomatitis virus (VSV), a prototype RNA virus. We used this model to computationally probe how interactions among genes carrying up to 11 deleterious mutations affect different measures of virus fitness: single-cycle growth yields and multi-cycle rates of infection spread. Individual mutations were implemented by perturbing biophysical parameters associated with individual gene functions of the wild-type model. Our analysis revealed synergistic epistasis among deleterious mutations in their effects on virus yield, so adverse effects of single deleterious mutations were amplified by interaction. For the same mutations, multi-cycle infection spread indicated weak or negligible epistasis, where single mutations act alone in their effects on infection spread. These results were robust to simulation under high and low host resource environments. Our work highlights how different types and magnitudes of epistasis can arise for genetically identical virus variants, depending on the fitness measure. More broadly, gene-gene interactions can differently affect how viruses grow and spread.
