Summary White oak (Quercus alba) is an abundant forest tree species across eastern North America that is ecologically, culturally, and economically important. We report the first haplotype‐resolved chromosome‐scale genome assembly of Q. alba and conduct comparative analyses of genome structure and gene content against other published Fagaceae genomes. We investigate the genetic diversity of this widespread species and the phylogenetic relationships among oaks using whole genome data. Despite strongly conserved chromosome synteny and genome size across Quercus, certain gene families have undergone rapid changes in size, including defense genes. Unbiased annotation of resistance (R) genes across oaks revealed that the overall number of R genes is similar across species, as are the chromosomal locations of R gene clusters, but gene number within clusters is more labile. We found that Q. alba has high genetic diversity, much of which predates its divergence from other oaks and likely impacts divergence time estimations. Our phylogenetic results highlight widespread phylogenetic discordance across the genus. The white oak genome represents a major new resource for studying genome diversity and evolution in Quercus. Additionally, we show that unbiased gene annotation is key to accurately assessing R gene evolution in Quercus.
This content will become publicly available on September 28, 2026
Structure and sequence evolution in the pennycress (Thlaspi arvense) pangenome
Summary Eukaryotic genomes harbor many forms of variation, including nucleotide diversity and structural polymorphisms, which experience natural selection and contribute to genome evolution and biodiversity. However, harnessing this variation for agriculture hinges on our ability to detect, quantify, catalog, and utilize genetic diversity. Here, we explore seven complete genomes of the emerging biofuel crop pennycress (Thlaspi arvense) drawn from across the species’s current genetic diversity to catalog variation in genome structure and content. Across this new pangenome resource, we find contrasting evolutionary modes in different genomic regions. Gene-poor, repeat-rich pericentromeric regions experience frequent rearrangements, including repeated centromere repositioning. In contrast, conserved gene-dense chromosome arms maintain large-scale synteny across accessions, even in fast-evolving immune genes where microsynteny breaks down across species but the macrosynteny of gene cluster positioning is maintained. Our findings highlight that multiple elements of the genome experience dynamic evolution that conserves functional content on the chromosome scale but allows rearrangement and presence-absence variation on a local scale. This diversity is invisible to classical reference-based approaches and highlights the strength and utility of pangenomic resources. These results provide a valuable case study of rapid genomic structural evolution within a species and powerful resources for crop development in an emerging biofuel crop.
- Award ID(s):
- 1906486
- PAR ID:
- 10651096
- Publisher / Repository:
- bioRxiv
- Date Published:
- Format(s):
- Medium: X
- Institution:
- University of California, Davis
- Sponsoring Org:
- National Science Foundation
More Like this
-
Birchler, James (Ed.) Abstract Ancient whole-genome duplications (WGDs) are believed to facilitate novelty and adaptation by providing the raw fuel for new genes. However, it is unclear how recent WGDs may contribute to evolvability within recent polyploids. Hybridization accompanying some WGDs may combine divergent gene content among diploid species. Some theory and evidence suggest that polyploids have a greater accumulation and tolerance of gene presence-absence and genomic structural variation, but it is unclear to what extent either is true. To test how recent polyploidy may influence pangenomic variation, we sequenced, assembled, and annotated twelve complete, chromosome-scale genomes of Camelina sativa, an allohexaploid biofuel crop with three distinct subgenomes. Using pangenomic comparative analyses, we characterized gene presence-absence and genomic structural variation both within and between the subgenomes. We found over 75% of ortholog gene clusters are core in Camelina sativa and <10% of sequence space was affected by genomic structural rearrangements. In contrast, 19% of gene clusters were unique to one subgenome, and the majority of these were Camelina-specific (no ortholog in Arabidopsis). We identified an inversion that may contribute to vernalization requirements in winter-type Camelina, and an enrichment of Camelina-specific genes with enzymatic processes related to seed oil quality and Camelina’s unique glucosinolate profile. Genes related to these traits exhibited little presence-absence variation. Our results reveal minimal pangenomic variation in this species, and instead show how hybridization accompanied by WGD may benefit polyploids by merging diverged gene content of different species.
-
ABSTRACT Yellow monkeyflowers (Mimulus guttatus complex, Phrymaceae) are a powerful system for studying ecological adaptation, reproductive variation, and genome evolution. To initiate pan‐genomics in this group, we present four chromosome‐scale assemblies and annotations of accessions spanning a broad evolutionary spectrum: two from a single M. guttatus population, one from the closely related selfing species M. nasutus, and one from a more divergent species M. tilingii. All assemblies are highly complete and resolve centromeric and repetitive regions. Comparative analyses reveal such extensive structural variation in repeat‐rich, gene‐poor regions that large portions of the genome are unalignable across accessions. As a result, this Mimulus pan‐genome is primarily informative in genic regions, underscoring limitations of resequencing approaches in such polymorphic taxa. We document gene presence–absence, investigate the recombination landscape using high‐resolution linkage data, and quantify nucleotide diversity. Surprisingly, pairwise differences at fourfold synonymous sites are exceptionally high, even in regions of very low recombination, reaching ~3.2% within a single M. guttatus population, ~7% within the interfertile M. guttatus species complex (approximately equal to SNP divergence between great apes and Old World monkeys), and ~7.4% between that complex and the reproductively isolated M. tilingii. Genome‐wide patterns of nucleotide variation show little evidence of linked selection, and instead suggest that the concentration of genes (and likely selected sites) in high‐recombination regions may buffer diversity loss. These assemblies, annotations, and comparative analyses provide a robust genomic foundation for Mimulus research and offer new insights into the interplay of recombination, structural variation, and molecular evolution in highly diverse plant genomes.
-
Wittkopp, Patricia (Ed.) Abstract Recent pangenome studies have revealed that a large fraction of the gene content within a species exhibits presence-absence variation (PAV). However, coding regions alone provide an incomplete assessment of functional genomic sequence variation at the species level. Little to no attention has been paid to noncoding regulatory regions in pangenome studies, though these sequences directly modulate gene expression and phenotype. To uncover regulatory genetic variation, we generated chromosome-scale genome assemblies for thirty Arabidopsis thaliana accessions from multiple distinct habitats and characterized species-level variation in Conserved Noncoding Sequences (CNS). Our analyses uncovered not only PAV and positional variation (PosV) but also that diversity in CNS is non-random, with variants shared across different accessions. Using evolutionary analyses and chromatin accessibility data, we provide further evidence supporting roles for conserved and variable CNS in gene regulation. Additionally, our data suggest transposable elements contribute to CNS variation. Characterizing species-level diversity in all functional genomic sequences may later uncover previously unknown mechanistic links between genotype and phenotype.
-
Abstract Effective utilization of wild relatives is key to overcoming challenges in genetic improvement of cultivated tomato, which has a narrow genetic basis; however, current efforts to decipher high-quality genomes for tomato wild species are insufficient. Here, we report chromosome-scale tomato genomes from nine wild species and two cultivated accessions, representative of Solanum section Lycopersicon, the tomato clade. Together with two previously released genomes, we elucidate the phylogeny of Lycopersicon and construct a section-wide gene repertoire. We reveal the landscape of structural variants and provide entry to the genomic diversity among tomato wild relatives, enabling the discovery of a wild tomato gene with the potential to increase yields of modern cultivated tomatoes. Construction of a graph-based genome enables structural-variant-based genome-wide association studies, identifying numerous signals associated with tomato flavor-related traits and fruit metabolites. The tomato super-pangenome resources will expedite biological studies and breeding of this globally important crop.
