<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Introgression, Phylogeography, and Genomic Species Cohesion in the Eastern North American White Oak Syngameon</title></titleStmt>
			<publicationStmt>
				<publisher>Molecular Ecology</publisher>
				<date>06/09/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10639156</idno>
					<idno type="doi">10.1111/mec.17822</idno>
					<title level='j'>Molecular Ecology</title>
<idno>0962-1083</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Gabe Ribicoff</author><author>Mira Garner</author><author>Kasey Pham</author><author>Kieran N Althaus</author><author>Jeannine Cavender‐Bares</author><author>Andrew A Crowl</author><author>Samantha Gray</author><author>Paul Gugger</author><author>Marlene Hahn</author><author>Shuai Liao</author><author>Paul S Manos</author><author>Rebekah A Mohn</author><author>Ian S Pearse</author><author>Nicholas R Steichmann</author><author>Ashley L Tuffin</author><author>Alan T Whittemore</author><author>Andrew L Hipp</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>ABSTRACT</title> <p>Hybridization and interspecific gene flow play a substantial role in the evolution of plant taxa. The eastern North American white oak syngameon, a group of approximately 15 ecologically, morphologically and genomically distinguishable species, has long been recognised as a model system for studying introgressive hybridization in temperate trees. However, the prevalence, genomic context and environmental correlates of introgression in this system remain largely unknown. To assess introgression in the eastern North American white oak syngameon and population structure within the widespread<styled-content style='fixed-case'><italic>Quercus macrocarpa</italic></styled-content>, we conducted a rangewide survey of<styled-content style='fixed-case'><italic>Q.macrocarpa</italic></styled-content>and four sympatric eastern North American white oak species. Using a Hyb‐Seq approach, we assembled a dataset of 3412 thinned single‐nucleotide polymorphisms (SNPs) in 445 enriched target loci including 62 genes putatively associated with various ecological functions, as well as associated intronic regions and some off‐target intergenic regions (not associated with the exons). Admixture analysis and hybrid class inference demonstrated species coherence despite hybridization and introgressive gene flow (due to backcrossing of F1s to one or both parents). Additionally, we recovered a genetic structure within<styled-content style='fixed-case'><italic>Q.macrocarpa</italic></styled-content>associated with latitude. Generalised linear mixed models (GLMMs) indicate that proximity to range edge predicts interspecific admixture, but rates of genetic differentiation do not appear to vary between putative functional gene classes. Our study suggests that gene flow between eastern North American white oak species may not be as rampant as previously assumed and that hybridization is most strongly predicted by proximity to a species' range margin.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>putative functional gene classes. Our study suggests that gene flow between eastern North American white oak species may not be as rampant as previously assumed and that hybridization is most strongly predicted by proximity to a species' range margin.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">| Introduction</head><p>Interspecific gene flow is common and evolutionarily significant across the tree of life, with up to 10% of animal species and 25% of plant species estimated to engage in hybridization <ref type="bibr">(Mallet 2005)</ref>. Botanists in particular have long recognised the evolutionary importance of hybridization and subsequent introgressive gene flow due to backcrossing <ref type="bibr">(Anderson 1953;</ref><ref type="bibr">Barton 2001;</ref><ref type="bibr">Grant 1981;</ref><ref type="bibr">Stebbins et al. 1947</ref>). Hybridization has the potential to generate novel genotypic combinations more rapidly than mutation <ref type="bibr">(Anderson 1949, 62, 102;</ref><ref type="bibr">De Carvalho et al. 2010;</ref><ref type="bibr">Gompert et al. 2014)</ref>, particularly within syngameons, multispecies networks of partially interfertile species <ref type="bibr">(Buck and Flores-Renter&#237;a 2022;</ref><ref type="bibr">Cannon and Lerdau 2022;</ref><ref type="bibr">Grant 1981;</ref><ref type="bibr">Lotsy 1917)</ref>. Syngameons have been documented in reef-building corals <ref type="bibr">(Mao 2020)</ref>, African lake cichlids <ref type="bibr">(Olave and Meyer 2020;</ref><ref type="bibr">Schliewen and Klee 2004)</ref>, Heliconius butterflies <ref type="bibr">(Mallet et al. 2007</ref>) and numerous plant clades <ref type="bibr">(Buck and Flores-Renter&#237;a 2022;</ref><ref type="bibr">Ellstrand et al. 1996)</ref>, including many distantly related and ecologically dominant boreal, temperate and tropical trees (e.g., <ref type="bibr">Buck et al. 2023;</ref><ref type="bibr">Chhatre et al. 2018;</ref><ref type="bibr">Gardner et al. 2023;</ref><ref type="bibr">Larson et al. 2021;</ref><ref type="bibr">Linan et al. 2022;</ref><ref type="bibr">Percy et al. 2014;</ref><ref type="bibr">Sun et al. 2018;</ref><ref type="bibr">Tsuda et al. 2017)</ref>. Understanding landscape, climate, habitat, and demographic contexts of interspecific gene flow is key to understanding the evolutionary importance of these complex, multispecies systems.</p><p>The oak genus (Quercus L.) comprises one of the best-known syngameons <ref type="bibr">(Kremer and Hipp 2020;</ref><ref type="bibr">Lazic et al. 2021)</ref>. This northern hemisphere clade of ca. 425 species, mostly trees, has served as a model of plant hybridization for at least 150 years <ref type="bibr">(Engelmann 1878;</ref><ref type="bibr">MacDougal 1907;</ref><ref type="bibr">Palmer 1948;</ref><ref type="bibr">Wiegand 1935)</ref>, syngameon dynamics for more than 75 <ref type="bibr">(Dodd and Afzal-Rafii 2004;</ref><ref type="bibr">Muller 1952;</ref><ref type="bibr">Stebbins et al. 1947;</ref><ref type="bibr">Tucker 1961)</ref> and genomic patterns of adaptive introgression for the past several years <ref type="bibr">(Fu et al. 2022;</ref><ref type="bibr">Leroy et al. 2020;</ref><ref type="bibr">O'Donnell et al. 2021;</ref><ref type="bibr">Zhou, Yuan, et al. 2022)</ref>. The eastern North American white oaks (Quercus sect. Quercus) have been particularly important in shaping botanists' understanding of introgressive gene flow and species concepts <ref type="bibr">(Burger 1975;</ref><ref type="bibr">Coyne and Orr 2004;</ref><ref type="bibr">Van Valen 1976)</ref>. Nearly all eastern North American white oak species can hybridize, but they still remain ecologically, morphologically and genomically distinct <ref type="bibr">(Hardin 1975;</ref><ref type="bibr">Hipp et al. 2019)</ref>.</p><p>Bur oak (Quercus macrocarpa Michx.) has the longest latitudinal range of any eastern North American white oak, ranging from north of Winnipeg, Manitoba to south of Houston, Texas, and from Nova Scotia west to Montana (Figure <ref type="figure">1</ref>). Bur oak is a dominant of dry, fire-prone upper Midwestern oak savannas; mesic, closed-canopy forests around the Great Lakes and Northeast; and bottomland forests along the floodplains of the southcentral United States <ref type="bibr">(Johnson 1990)</ref>. In most of these habitats, bur oak grows in close association with at least one other eastern North American white oak species. Bur oak is sympatric with more than 12 white oak species across its range, with hybridization described between it and at least 8 naturally co-occurring species <ref type="bibr">(Palmer 1948)</ref>. Prior population genetic studies of bur oak have demonstrated high stand-level diversity and weak isolation-by-distance, both at smaller scales among fragmented populations <ref type="bibr">(Craft and Ashley 2007;</ref><ref type="bibr">Dow and Ashley 1998)</ref> and at larger scales among rangewide samples <ref type="bibr">(Hipp et al. 2019;</ref><ref type="bibr">Schnabel and Hamrick 1990;</ref><ref type="bibr">Whittemore and Schaal 1991)</ref>. However, previous studies have relied upon a limited number of loci to assess nuclear patterns of population structure in the species, and none have included sufficient sampling of other species to estimate introgression across the range of the species.</p><p>To evaluate patterns and determinants of interspecific gene flow in the eastern North American white oak syngameon, we sampled bur oak and the four most prevalent white oak species sympatric with it: eastern white oak (Q. alba L.), a dry-mesic forest species that ranges from near the Canada-U.S. border to northern Florida and Texas; swamp white oak (Q. bicolor Willd.), a wet forest species mostly restricted to north of 37&#176;; post oak (Q. stellata Wangenh.), common in drier locales south of approximately 40&#176;; and chinkapin oak (Q. muehlenbergii Engelm.), an upland limestone specialist mostly contained within the range of Q. alba, but extending a bit further into eastern Kansas, Oklahoma and Texas (Figures <ref type="figure">1</ref> and <ref type="figure">2</ref>). We sequenced 465 target genes in 517 individuals across 56 sites in eastern North America. We addressed the following questions in our study: (1) How prevalent is interspecific gene flow in the syngameon, and what are the spatial and ecological drivers of introgression? (2) How do levels of introgression and/or genetic differentiation vary over candidate gene classes potentially implicated in environmental adaptation? (3) How is genetic variation geographically partitioned in Q. macrocarpa, and does genetic variation in the species follow the ecological gradient observed from predominantly upland sites in the north to increasingly bottomland sites in the south?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">| Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">| Sample Collection</head><p>Sampling was designed to cover the majority of the Q. macrocarpa range and include co-occurring white oaks at as many sites as possible. A focused field sampling of wild populations was conducted in the summer of 2017; these wild populations were supplemented with botanical garden material from Starhill Forest Arboretum (Petersburg, IL, USA) grown from acorns of known wild provenance (thus representing wild populations our group did not visit in the field) (Figure <ref type="figure">1</ref>). At each wild population, leaf samples were collected from up to three individuals of Q. macrocarpa and of each co-occurring white oak species found at the site (Figure <ref type="figure">2</ref>). Conspecific samples were taken at least 30 m apart from each other in wild populations, when possible, to avoid genotyping close relatives. Botanical garden material was selected based on wild provenance, so spatial distance within gardens was disregarded. All trees collected from wild populations were georeferenced in the field using a hand-held GPS unit. Botanical garden samples from acorns of known wild provenance could not be georeferenced precisely and were consequently excluded from spatial analyses (Table <ref type="table">S1</ref>).</p><p>Morphological intermediates with potential hybrid ancestry were neither intentionally targeted nor avoided in the course of sampling; clearly intermediate individuals were determined to the closest morphospecies. Species determinations made in the field were checked and, if necessary, updated by examining leaf vestiture, gross leaf morphology, and twig and bud characters according to descriptions provided in the Flora of North America <ref type="bibr">(Nixon and Muller 1997)</ref>. After harvesting, leaf specimens were placed on ice in the field, then frozen at -80&#176;C within 1 day to 1 week of collecting for later genomic analysis. Voucher samples were collected for all individuals and deposited at the Morton Arboretum Herbarium (MOR). Duplicate vouchers were deposited at the Bell Museum at the University of Minnesota (MIN) (herbarium acronyms from Thiers, Updated continuously [accessed 2025-02-02] n.d.).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">| Sequence Capture Design and DNA Preparation</head><p>We selected target loci and designed sequence capture exon baits based on gene annotations of coding domain sequences in the Q. robur <ref type="bibr">(Plomion et al. 2016</ref><ref type="bibr">(Plomion et al. , 2018) )</ref> and Q. lobata <ref type="bibr">(Sork, Fitz-Gibbon, et al. 2016</ref>) draft assemblies, as well as leaf transcriptomes of Q. robur, Q. alba and Q. rubra <ref type="bibr">(Lesur et al. 2015</ref>) (WO454_v2, Hardwood Genomics Project, <ref type="url">https:// doi</ref>. org/ 10. 25504/ FAIRs haring. srgkaf; RO454_v2, Hardwood Genomics Project, <ref type="url">https:// doi</ref>. org/ 10. 25504/ FAIRs haring. srgkaf). We used MarkerMiner v1.0 <ref type="bibr">(Chamala et al. 2015)</ref> to identify 403 longexon, single-copy nuclear genes shared across Quercus references that were expected to provide high-quality phylogenetic resolution. 30 candidate genes for bud-break phenology and waterlogging response, 30 for drought tolerance and 2 for freezing tolerance were included as targets <ref type="bibr">(Lesur et al. 2015;</ref><ref type="bibr">Meireles et al. 2017;</ref><ref type="bibr">Oney-Birol et al. 2018;</ref><ref type="bibr">Ueno et al. 2013)</ref>. Target loci are provided as a fasta file in Appendix S1 (Table <ref type="table">S2</ref>).</p><p>DNA was extracted from frozen leaf tissue using a DNeasy Plant Mini Kit (Qiagen) with a modified DNeasy protocol as described in <ref type="bibr">Hipp and Weber (2008)</ref>. DNA was quantified using a Qubit Fluorometer and visualised on an agarose gel to confirm DNA quality. DNA was fragmented to approximately 550 bp using a Covaris M220 (Covaris, Woburn, MA, USA) or NEBNExt dsDNA fragmentase (New England Biolabs, Ipswich, MA, USA). Library preparation used the Adapterama iTru system and KAPA Hyper Prep Kit (Roche Diagnostics Corporation, Wilmington, MA, USA), with KAPA Pure Beads, KAPA HyperPure Beads, or Speedbeads <ref type="bibr">(Gardner et al. 2021;</ref><ref type="bibr">Glenn et al. 2016;</ref><ref type="bibr">Hale et al. 2020;</ref><ref type="bibr">Rohland and Reich 2012)</ref>. Pools of 12-16 libraries were enriched using our 465-gene custom Quercus MYbaits kit <ref type="bibr">(Morales-Salda&#241;a et al. 2024</ref>) and either the Mybaits v3 or v4 reagent kit (Arbor Biosciences, Ann Arbor, MI, USA). Hybridization was done at 65&#176;C for 16 h with &#189; volume of baits used per enrichment. QIAquick PCR purification kit (Qiagen) was used to purify PCR-amplified enriched libraries. Pooled libraries were quantified using a Qubit Fluorometer and run on a Bioanalyzer (Agilent, Santa Clara, CA, USA) to ascertain fragment size distributions before these were combined for sequencing. Samples were subsequently sequenced using the Illumina MiSeq (2 &#215; 300 bp) or HiSeq 4000 (2 &#215; 150 bp) platforms. Two-sample t-tests with sequencing platform as the predictor of latitude were used on all samples as well as Q. macrocarpa samples to test whether there was a latitudinal bias in the sequencing platform used.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">| Bioinformatics Workflow</head><p>Raw FASTQ files were cleaned with Trimmomatic 0.39 <ref type="bibr">(Bolger et al. 2014)</ref> to remove ILLUMINA adapters, leading and trailing bases with Phred score &lt; 15, portions of reads with an average Phred score &lt; 20 across a 4 bp sliding window, and cleaned reads shorter than 40 bp. Subsequently, reads were mapped to the Q. lobata genome v3.2 <ref type="bibr">(Sork et al. 2022</ref>) using BWA-MEM 0.7.17 <ref type="bibr">(Li and Durbin 2009)</ref>. Duplicate reads were marked using Samtools 1.17 <ref type="bibr">(Danecek et al. 2021)</ref>. Variants were called with FreeBayes 1.3.6 (Garrison and Marth 2012) using default settings. After decomposing multinucleotide polymorphisms and retaining only biallelic SNPs, we implemented an iterative filtering pipeline adapted from O'Leary et al. <ref type="bibr">(2018, pipeline FS5)</ref>. Site genotyping rates and missing data per individual filters were repeatedly applied within the same pipeline according to progressively more stringent thresholds, removing low-quality variants while maximising the number of retained samples and SNPs.</p><p>Filtering was performed through a custom pipeline of vcflib 1.0.3 <ref type="bibr">(Garrison et al. 2022</ref>), VCFtools 0.1.16 <ref type="bibr">(Danecek et al. 2011)</ref> and BCFtools <ref type="bibr">(Danecek et al. 2021)</ref>; see details in methods in Appendix S1. After filtering, SNPs were thinned to a minimum distance of 500 bp using PLINK 1.9 <ref type="bibr">(Purcell et al. 2007)</ref>, as prior studies in oaks have demonstrated rapid decay of linkage disequilibrium, both within and across individual genes <ref type="bibr">(Kremer et al. 2012;</ref><ref type="bibr">Nocchi et al. 2022;</ref><ref type="bibr">Sork, Squire, et al. 2016)</ref>. Scripts to conduct variant-calling and SNP-filtering are archived on Zenodo (v 0.9-3, <ref type="url">https:// doi</ref>. org/ 10. 5281/ zenodo. 15216009). Sequence reads are archived on NCBI (SRA Bioproject #1223965; <ref type="url">http:// www</ref>. ncbi. nlm. nih. gov/ biopr oject/ 1223965).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">| Admixture and Spatial Genetic Analysis</head><p>Ancestry estimation was performed using the maximumlikelihood software ADMIXTURE <ref type="bibr">(Alexander et al. 2009)</ref>, which jointly optimises the allele frequencies of a predefined number of ancestral clusters (K) and the proportional membership ("Q-value") of each individual in each cluster based on SNP genotype data. Fifty replicate runs were carried out for each value of K from K = 2 to K = 8, and the run with the highest log-likelihood per K was selected for visualisation and downstream analysis, as performed by <ref type="bibr">Marcus et al. (2021)</ref>. Evaluation of the optimal value of K was performed using the default ADMIXTURE cross-validation procedure enabled by the "--cv" flag. We visualised ADMIXTURE ancestry estimation and cross-validation results with ggplot2 <ref type="bibr">(Wickham 2016)</ref> in R version 4.3.0 (R Core Team 2023). We also ran ADMIXTURE on downsampled datasets with 20 randomly sampled individuals per species and only putatively "pure" Q. macrocarpa individuals (Q &gt; 0.95) to assess the robustness of our dataset to detect population structure within this species.</p><p>Ancestral clusters in the K = 5 and K = 6 runs corresponded almost exactly to the 5 morphospecies included in our study, with the exception of Q. macrocarpa, which was separated into two admixed subpopulations at K = 6. For subsequent analyses using ADMIXTURE Q-values, we defined the conspecific Q-value for non-Q. macrocarpa individuals as the maximum ancestry proportion in the K = 6 cluster, which almost invariably matched a sample's morphospecies determination. For samples identified as Q. macrocarpa, the conspecific Q-value was calculated as the sum of the Q-values for the two Q. macrocarpa clusters estimated in the K = 6 run. Heterospecific Q-values were calculated as 1-conspecific Q-value.</p><p>We tested for spatial autocorrelation of the northern Q. macrocarpa ancestral cluster (hereafter termed MAC N , in contrast to the southern genetic cluster subsequently referred to as MAC S ) using global Moran's I <ref type="bibr">(Moran 1950)</ref>. "Pure" (conspecific Qvalue &gt; 0.95) Q. macrocarpa individuals were grouped by sampling site. An inverse distance matrix for spatial weighting was calculated using each site's centroid coordinate. Pairwise geographic distances between sites were calculated using the Haversine formula, assuming a spherical globe (Sinnott 1984). Moran's I was calculated with the MAC N Q-value for each Q. macrocarpa individual using adegenet 2.1.10 and spdep 1.3_3 in R <ref type="bibr">(Bivand 2022;</ref><ref type="bibr">Jombart 2008;</ref><ref type="bibr">Jombart and Ahmed 2011)</ref>. We evaluated the significance of Moran's I via permutation test with 99,999 resamples to obtain a one-tailed p-value, corresponding to an alternative hypothesis of positive spatial autocorrelation.</p><p>To further clarify whether genetic clustering within Q. macrocarpa reflected either continuous genetic variation or genetic discontinuity, we examined spatial genetic variation using the Python package Fast Estimation of Effective Migration Surfaces (FEEMS) <ref type="bibr">(Marcus et al. 2021)</ref>. FEEMS uses deme-aggregated allele frequencies at spatial nodes to fit edge-specific effective migration estimates, inferring landscape-wide patterns of heterogeneous isolation-by-distance via a penalised likelihood approach. We subsetted the thinned SNP dataset to only precisely georeferenced Q. macrocarpa samples with conspecific Q-value &gt; 0.95 and removed any resulting invariant sites ("-maf" PLINK flag) from the PLINK binary files. Demes were constructed by aggregating individuals to their nearest node in a 100 km-resolution spatial grid. We performed leave-oneout cross-validation to determine the optimal value of the model tuning hyperparameter &#955;, evaluating 20 values of &#955; from 10 -6 (low smoothing) to 10 2 (high smoothing). All analyses were conducted using modified FEEMS tutorial scripts published by the package authors (<ref type="url">https:// github</ref>. com/ Novem brelab/ feems ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5">| Hybrid Class Analysis</head><p>To more precisely characterise hybrids, we used NewHybrids, a Bayesian ancestry inference algorithm that uses patterns of observed genotype frequencies and inferred population allele frequencies to estimate the posterior probability of assignment to a predefined set of hybrid classes <ref type="bibr">(Anderson and Thompson 2002)</ref>, specifying expected genotype frequencies for multiple backcross generations (Table <ref type="table">S3</ref>). Identifying hybrids based on genotype frequency is more tractable with loci that approach fixation between populations <ref type="bibr">(Wringe et al. 2017)</ref>. For this reason, and because NewHybrids can only consider pairwise species comparisons, we selected the 200 most highly differentiated loci per species pair using the getTopLoc function in the R package HybridDetective 4.3.1 (Wringe et al. 2017) based on putatively pure individuals identified in ADMIXTURE runs (conspecific Q-value &gt; 0.95). The choice of 200 loci for analysis was based on computational limits of NewHybrids and supported by a previous finding that including more than 10 markers per linkage group (more than 120 loci in the case of oaks) provides minimally increased power to detect F1s and first-generation backcrosses <ref type="bibr">(Chakraborty and Rannala 2023)</ref>. All individuals with cumulative two-species Q-values &gt; 0.95 across pairwise parental ancestral clusters were considered for each run of NewHybrids; individuals with substantial hybrid parentage from more than two species were therefore excluded, as NewHybrids does not model 3-way hybrids. For each species pair, four independent MCMC chains were run with an uninformative Jeffreys prior for a burn-in period of 300,000 iterations prior to 600,000 sampling sweeps. Traceplots and effective sample sizes, computed across all chains per pairwise analysis for the population-wide hybrid class membership proportion parameters via the R package coda 0.19-4 <ref type="bibr">(Plummer et al. 2006)</ref>, were inspected to evaluate MCMC convergence. As a nonparametric check on the hybrid class assignment inferred by NewHybrids, we plotted ancestry proportion against interclass heterozygosity for all individuals in the Q. alba-Q. macrocarpa and Q. bicolor-Q. macrocarpa comparisons, using triangulaR v.0.0.1 <ref type="bibr">(Wiens and Colella 2024)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.6">| Environmental Predictors</head><p>BIOCLIM rasters were downloaded at 30 arcsecond resolution from the WorldClim 2 database <ref type="bibr">(Fick and Hijmans 2017)</ref>. Moisture index I m was calculated as 100 &#215; MAP -PET PET <ref type="bibr">(Thornthwaite 1948)</ref>, where MAP is mean annual precipitation (BIO12) and PET is potential evapotranspiration extracted from the Global Aridity Index and Potential Evapotranspiration Database v3 <ref type="bibr">(Zomer et al. 2022</ref>). <ref type="bibr">Little's (1971)</ref> species range maps, digitised and compiled by <ref type="bibr">Prasad and Iverson (2003)</ref>, were downloaded as shapefiles (<ref type="url">https:// github</ref>. com/ andre w-hipp/ white -oak-synga meon/ tree/ master/ data/ littl e-maps; doi: 10.5281/zenodo.13150341). For all spatial analyses, coordinates were transformed to the North American Albers equal area conic projection to ensure accurate raster grid sizing. Projections and all subsequent spatial calculations, analyses and raster extractions were performed using the R packages sf v1.0-14 and terra v1.7-39 <ref type="bibr">(Hijmans 2019</ref><ref type="bibr">(Hijmans -2025;;</ref><ref type="bibr">Pebesma 2018)</ref>.</p><p>Given range-margin uncertainty and long-distance pollen dispersal <ref type="bibr">(Ashley 2021)</ref>, we extended all range limits by 20 km for analyses involving range calculations. Distance from range edge was calculated as the distance between a sample's coordinate and the nearest buffered range edge, with lake and ocean shorelines masked to avoid artificial range truncations. Number of sympatric species was calculated as the number of eastern North American white oak ranges (of the 5 species analysed in our study), aside from an individual's species determination, overlapping with the coordinates of a sample. Soil data were obtained from the USDA-NRCS Soil Survey Geographic Database (SSURGO; Soil Survey Staff, Natural Resources Conservation Service, United States Department of Agriculture, n.d.) accessed through the Web Soil Survey API using a custom R script.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.7">| Species Distribution Modelling (SDM) and Species Co-Occurrence</head><p>Species occurrence records were obtained from the Global Biodiversity Information Facility (GBIF) database and previously curated USDA Forest Inventory and Analysis (FIA) data (Cavender- <ref type="bibr">Bares et al. 2018</ref>) (<ref type="url">https:// gitlab</ref>. com/ meire les/ compi le_ fia_ data/ ). GBIF records were filtered to remove erroneous coordinates and data points corresponding to county or state centroids, fossils and planted specimens, urban occurrences and coordinates over bodies of water using R packages rgbif 3.7.7 and CoordinateCleaner 2.0.20 <ref type="bibr">(Chamberlain et al. 2024;</ref><ref type="bibr">Chamberlain and Boettiger 2017;</ref><ref type="bibr">Zizka et al. 2019)</ref>. FIA data as analysed in Cavender- <ref type="bibr">Bares et al. (2018)</ref> were used to summarise species co-occurrence, treating the presence of two species within any of the subplots of a single FIA plot as co-occurrence.</p><p>Preliminarily cleaned occurrence records from both sources were merged, downloaded and manually curated to remove spatial outliers in QGIS 3.28 (QGIS Development Team 2023) and thinned to a minimum distance of 20 km via spThin 0.2.0 <ref type="bibr">(Aiello-Lammens et al. 2015)</ref>. Study extent was defined by the 250 km-buffered convex hull of occurrence points for each respective species, with 20,000 unique, randomly sampled raster grid cells serving as a background dataset. We used all 19 BIOCLIM variables as environmental predictors in our SDMs, as Maxent's regularisation algorithm is robust to predictor multicollinearity <ref type="bibr">(Merow et al. 2013)</ref>. SDMs were fit using MAXENT 3.4.3 as implemented in the R package ENMeval 2.0.4 <ref type="bibr">(Kass et al. 2021;</ref><ref type="bibr">Phillips et al. 2017)</ref>. See methods in Appendix S1 for model-fitting details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.8">| Modelling Predictors of Admixture</head><p>We employed Bayesian generalised linear mixed models (GLMMs) in brms v2.20.1 <ref type="bibr">(B&#252;rkner 2017;</ref><ref type="bibr">Carpenter et al. 2017)</ref> to examine associations between environmental and ecological factors and introgression and Q. macrocarpa population structure. Beta regression with a logit link and default flat priors on all parameters was used for all models, and sampling site was incorporated as a random effect to control for spatial autocorrelation and reduce the possible risk of pseudoreplication <ref type="bibr">(Douma and Weedon 2019)</ref>. See methods in Appendix S1 for details of model-fitting and MCMC diagnostics.</p><p>To test the association of genetic differentiation in Q. macrocarpa with the gradient from more southern bottomlands to more northern uplands, we included as fixed effects climate and soil predictors expected to vary along moisture and temperature gradients, with expectations as follows: temperature seasonality (BIO4) and moisture index (I m ), based on our previous work demonstrating that both vary strongly among North American oak species <ref type="bibr">(Hipp et al. 2018)</ref>; maximum temperature of the warmest month (BIO5), as a proxy for drought severity; mean annual precipitation (BIO12), as a predictor of overall plant water requirements; and categorical soil predictors for ponding frequency, flooding frequency and drainage class, as predictors of wetland status (which we expect would differentiate Q. alba and Q. stellata as upland species most strongly from Q. bicolor, Q. lyrata and bottomland ecotypes of Q. macrocarpa). For modelling population structure within Q. macrocarpa, we analysed only individuals with conspecific Q-value &gt; 0.95, including site as a random effect. Whole-dataset admixture was modelled by regressing heterospecific Q-value against latitude, number of sympatric species, Maxent-inferred habitat suitability, and distance from range edge, with species as a fixed categorical predictor and a random effect for site. Because we were interested in understanding the drivers of gene flow between species, rather than merely first-generation hybridization, F1s identified by NewHybrids were excluded from the admixture GLMM. See methods in Appendix S1 for modelling details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.9">| Admixture at Candidate Versus Background Loci</head><p>Single-nucleotide polymorphisms were annotated to indicate which candidate genes they fell within, and genes with fewer than three SNPs were excluded from jackknife analysis. To test whether genetic differentiation varied significantly between candidate gene classes, we obtained per-class F ST distributions by jackknife resampling (10,000 replicates) of 15 genes per class (bud break phenology, drought tolerance, and other) and three SNPs per gene. Pairwise F ST estimates (calculated per Weir and Goudet 2017) for each combination of two species were computed using the R package hierfstat v. 0.5-11 <ref type="bibr">(Goudet 2005)</ref> for each jackknife replicate. We compared the means of the F ST distributions for the BBP and DT classes to that of the 'Other' genes to calculate two-tailed pvalues. See methods in Appendix S1 for details on establishing gene orthology, categorising genes, and analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">| Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">| Sequencing Results and SNP Recovery</head><p>Our post-filtering SNP dataset consisted of 3412 biallelic SNPs (Figure <ref type="figure">3</ref>, black dots) with an average SNP genotyping rate of 87% across all individuals and a mean read depth of 22.0 &#177; 12.1 SD across all SNPs, averaged over all samples. Variants were recovered from 445 of the 465 target genes (Figure <ref type="figure">3</ref>, red bars); nine of the targets initially identified in Q. robur failed to BLAST to the Q. lobata genome and 11 lacked any associated SNPs. Target loci and SNPs were distributed across all (as mapped to Q. lobata genome v3.2). Of filtered and thinned SNPs, 22% fell outside of the target gene regions. Given that Hyb-Seq can recover high-copy, off-target nuclear regions <ref type="bibr">(Weitemier et al. 2014)</ref>, and because we applied stringent filtering parameters to remove erroneous and poorly resolved paralogous calls, we chose to retain off-target SNPs for subsequent analyses. ADMIXTURE results below are those from analyses conducted with the full dataset; results using only SNPs confined to target loci produced highly similar ancestry estimates (not shown</p><p>). Of the 517 individuals sequenced, 364 (224 of 319 Q. macrocarpa [70.2%], 64 of 78 Q. alba [82.1%], 30 of 60 Q. bicolor [50.0%], 25 of 31 Q. stellata [80.6%] and 21 of 29 Q. muehlenbergii [72.4%]) passed the iterative SNP filtering pipeline (Table <ref type="table">S1</ref>). 318 individuals with associated GPS coordinates or manual georeferences were retained for downstream spatial analyses, including GLMMs (Table <ref type="table">1</ref>). Sequencing platform was unbiased with respect to sample latitude across species (t = -0.99667, df = 184.63, p = 0.3202) as well as within just Q. macrocarpa (t = -0.34215, df = 230.86, p = 0.7325).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">| Genetic Structure Across the Eastern North American White Oak Syngameon</head><p>Non-metric multidimensional scaling (NMDS) ordination of the SNP distance matrix revealed well-defined, discontinuous clusters corresponding to the 5 species sampled, with a much smaller number of admixed individuals found at varying distances between parental clusters (Figure <ref type="figure">4a</ref>,<ref type="figure">b</ref>). ADMIXTURE's cross-validation procedure for determining the optimal number of ancestral clusters (K) returned a minimum prediction error at K = 6 (0.17007) and slightly higher errors for K = 5 (0.17232) and K = 7 (0.17214) (Figure <ref type="figure">5</ref>). Cross-validation errors increased considerably for values of K smaller than 5 and larger than 7, supporting both specieslevel genomic delineations (K = 5; Figure <ref type="figure">5a</ref>) and a population genetic divide within Q. macrocarpa (which appears only at K &#8805; 6; Figure <ref type="figure">5b</ref>, Figure <ref type="figure">S1</ref>). As these genomic clusters within Q. macrocarpa are strongly geographically structured between northern and southern populations (see results below, Spatial and environmental predictors of admixture and  population structure), we refer to these clusters as MAC N and MAC S throughout the paper.</p><p>Of the 364 individuals included in the total-dataset ADMIXTURE runs, 19 (5.2%) were estimated as being F1s (17) or three-way hybrids (2) based on having their maximum Qvalue &lt; 0.60; all but one has at least 0.30 assignment to Q. macrocarpa (the exception being an inferred F1 between Q. alba and Q. muehlenbergii). The two three-way hybrids inferred include Q. alba as the dominant genotype (Q = 0.52, 0.57) and Q. macrocarpa and Q. bicolor as the only other genomic components of Q &gt; 0.05 (Q = 0.14-0.32). Some of these might also be F2, F3 or other generation hybrids, but NewHybrid analyses demonstrate that we have minimal power to distinguish among these classes. Of the remaining samples, 6 (1.6%) have a maximum Q-value &lt; 0.9, 18 (4.9%) have maximum Q-value &lt; 0.95 and 56 (15.4%) have maximum Q-value &lt; 0.99 (Table <ref type="table">S4</ref>).</p><p>The distribution of introgressed (not hybrid; i.e., mixed ancestry Site State Latitude Longitude Sample size Sakatah Lake MN 44.6243 -93.3961 3 Q. macrocarpa Twin Cities MN 44.9557 -93.1620 4 Q. macrocarpa Winona MN 43.8992 -91.6427 1 Q. alba; 1 Q. macrocarpa; 1 Q. muehlenbergii Eminence MO 37.1573 -91.3650 1 Q. alba; 3 Q. macrocarpa; 1 Q. muehlenbergii Eureka MO 38.5122 -90.5657 1 Q. alba; 1 Q. macrocarpa; 1 Q. muehlenbergii; 2 Q. stellata Ha Ha Tonka MO 37.9732 -92.7623 2 Q. alba; 1 Q. muehlenbergii; 1 Q. stellata Shaw MO 38.4639 -90.8100 3 Q. macrocarpa Sullivan MO 38.2226 -91.0855 3 Q. alba; 1 Q. bicolor; 2 Q. macrocarpa; 3 Q. stellata Bienville MS 32.3582 -89.5472 8 Q. stellata Daughmer OH 40.7299 -83.0936 3 Q. macrocarpa Goll Woods OH 41.5522 -84.3574 3 Q. alba; 2 Q. macrocarpa; 3 Q. muehlenbergii Pearl King OH 40.0440 -83.4794 2 Q. macrocarpa Hinton OK 35.4453 -98.3543 2 Q. macrocarpa; 2 Q. muehlenbergii Norman OK 35.1771 -97.4496 3 Q. macrocarpa Pawhuska OK 36.8455 -96.4232 1 Q. macrocarpa; 3 Q. stellata Tulsa OK 36.2177 -95.8987 1 Q. alba; 3 Q. macrocarpa Custer SD 43.7887 -103.3634 2 Q. macrocarpa Spearfish SD 44.4913 -103.8090 1 Q. macrocarpa Buescher TX 30.0764 -97.1817 2 Q. stellata Addison VT 43.9580 -73.1881 3 Q. macrocarpa Bennington VT 42.9102 -73.2193 3 Q. macrocarpa Burlington VT 44.4682 -73.2109 2 Q. alba; 2 Q. macrocarpa Ferrisburgh VT 44.2398 -73.2444 1 Q. alba; 3 Q. macrocarpa Grand Isle VT 44.6925 -73.3354 2 Q. macrocarpa Rutland VT 43.6341 -73.1529 1 Q. alba; 2 Q. macrocarpa Dodgeville WI 43.0187 -90.1162 3 Q. alba; 7 Q. macrocarpa Gotham WI 43.2451 -90.3034 5 Q. bicolor; 3 Q. macrocarpa Madison WI 43.0391 -89.4394 3 Q. alba; 3 Q. macrocarpa Ottawa WI 43.0168 -88.4384 3 Q. alba; 3 Q. macrocarpa Pleasant Valley WI 43.1068 -89.8063 1 Q. alba; 1 Q. macrocarpa Round Lake WI 46.0247 -91.1393 3 Q. macrocarpa Note: Sites represent the closest named locality to the population; state/province abbreviations are standard. Latitude and longitude are computed as the centroid of the coordinates for all individually georeferenced samples at a given site and rounded to 4 decimal places.</p><p>TABLE 1 | (Continued) 1365294x, 0, Downloaded from <ref type="url">https://onlinelibrary.wiley.com/doi/10.1111/mec.17822</ref> by University Of Chicago Library, Wiley Online Library on [29/09/2025]. See the Terms and Conditions (<ref type="url">https://onlinelibrary.wiley.com/terms-and-conditions</ref>) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License</p><p>other than inferred F1) individuals within each species at the 0.60 &lt; Q &lt; 0.95 threshold was 14 out of 217 samples (6.5%) in Q. macrocarpa, 1 out of 58 (1.7%) in Q. alba, 2 out of 29 (6.9%) in Q. bicolor, 1 out of 22 (4.5%) in Q. stellata, and 0 out of 19 (0%) in Q. muehlenbergii or 18 out of 345 (5.2%) for the whole dataset (Table <ref type="table">S4</ref>). Within Q. macrocarpa, samples were much more likely to have assignment to both MAC N and MAC S than to exhibit admixture with different species: of the 203 samples with MAC N + MAC S &gt; 0.95, 119 (58.6%) had Q &lt; 0.95 membership in one of the two Q. macrocarpa genetic clusters. However, genetic differentiation between MAC N and MAC S is lower than interspecific variation: pairwise F ST from ADMIXTURE between inferred Q. macrocarpa genetic clusters was 0.103, compared to 0.292-0.468 for interspecific comparisons (Figure <ref type="figure">5</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">| Hybrid Class Assignment</head><p>362 out of 364 individuals had a summed Q &gt; 0.95 across at most two species clusters and were included in pairwise NewHybrids analyses. Using a 0.9 posterior probability threshold of hybrid class assignment, 314 individuals (86.7%) were confidently classified as "pure" (unadmixed) members of their determined species, 16 (4.4%) were classified as F1s, 7 (1.9%) were classified as backcrosses (BC1, BC3, or BC of undetermined generation), and 2 (0.55%) were classified as F2s (Table <ref type="table">S5</ref>; Data Summary S1). Only three of the seven backcrosses (0.8%) could be confidently assigned to a specific backcross generation. Additionally, discrimination between putatively pure individuals and thirdgeneration backcrosses proved difficult for NewHybrids; 23 (6.3%) individuals could not be unambiguously placed into either class (Table <ref type="table">S5</ref>). The percentage of admixed individuals in each pairwise comparison ranged from 0% (Q. stellata-Q. muehlenbergii, Q. bicolor-Q. stellata, Q. bicolor-Q. muehlenbergii, Q. alba-Q. stellata, Q. alba-Q. bicolor) to 11.6% (29 out of 248)</p><p>for the Q. macrocarpa-Q. bicolor pair (Table <ref type="table">S6</ref>; Data Summary S1). NewHybrids analyses converged across all MCMC chains, with effective sample sizes of &gt; 100,000 for all parameters across species pairs (Table <ref type="table">S7</ref>). Triangle plots for Q. macrocarpa &#215; bicolor and Q. macrocarpa &#215; alba (Figure <ref type="figure">6</ref>) demonstrate that F1s FIGURE 6 | Triangle plots. Interancestry heterozygosity plotted against hybrid index (i.e., ancestry proportion) for the (a) Q. macrocarpa-Q. alba and (b) Q. macrocarpa-Q. bicolor species pairs. Samples are coloured according to their NewHybrids consensus hybrid class assignments. Grey plus signs indicate the expected position of specified hybrid classes on the triangle plots. Expected heterozygosity as a function of ancestry fraction under Hardy-Weinberg conditions is given by the dashed parabola.</p><p>and F2s are strongly concordant with evidence from heterozygosity for both crosses. However, the inferred backcrosses beyond BC1 are not readily distinguishable from pure Q. macrocarpa in the cross with Q. bicolor, whereas the BC of undetermined generation toward Q. alba appears to be a plausible BC1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">| Co-Occurrence and Predictors of Admixture and Population Structure</head><p>Of the FIA plots subsetted to just those that had at least one oak, 39.5% had Q. alba, 14.3% had Q. stellata, 12.7% had Q. macrocarpa, 5.6% had Q. muehlenbergii and 2.2% had Q. bicolor. The most common associate of Q. macrocarpa was Q. alba, at 19.4%, followed by Q. muehlenbergii at 6.0%, then Q. stellata and Q. bicolor, both at 3.5%. The most common associate of Q. alba was Q. stellata (23.2%), followed by Q. muehlenbergii (7.5%), Q. macrocarpa (6.3%) and Q. bicolor (2.6%).</p><p>Proportional membership in ADMIXTURE-inferred Q. macrocarpa genetic clusters displayed significant spatial autocorrelation (Moran's I = 0.088391, p = 0.009), broadly corresponding to northern (MAC N ) and southern (MAC S ) genetic clusters (Figure <ref type="figure">7</ref>). Cross-validation tuning of the FEEMS hyperparameter &#955; suggested that a modest degree of smoothing (&#955; = 2.069) yielded the best model fit (Figure <ref type="figure">S2</ref>). Congruent with ADMIXTURE results, analysis of spatial genetic connectivity via FEEMS revealed a 300-500 km latitudinal band of reduced effective migration stretching from southern Ohio in the east to northern Kansas and southern Iowa in the west (Figure <ref type="figure">8</ref>). Effective migration rates to the north and south were approximately 1-2 orders of magnitude larger than rates within the band of reduced effective migration.</p><p>Model selection via AICc favoured species distribution models (SDMs) with small regularisation multipliers and many feature classes (Table <ref type="table">S8</ref>). SDMs projected across eastern North America corresponded closely to published range maps and known occurrence data for each species in our dataset <ref type="bibr">(Little 1971</ref>) (Figure <ref type="figure">S3</ref>). Range edges displayed steep habitat suitability FIGURE 8 | Bur oak effective migration landscape estimated by FEEMS. Black points represent sampled populations of Quercus macrocarpa. Line segments forming the lattice are coloured (and thickened) according to log-transformed migration rates (w). Migration estimates are reported as the log 10 -transformed edge weights (w ik ) standardised against a scaling parameter w 0 estimated under a pure isolation-bydistance model, i.e., a constant value of w ik across all edges.</p><p>gradients for most species, especially toward northern and western margins (Figure <ref type="figure">S3</ref>).</p><p>In the all-species admixture GLMM, distance from range edge exhibited a strong negative association with heterospecific Qvalue ([-0.48, -0.13], 95% CI) (Figure <ref type="figure">9a</ref>). No other environmental predictors were significantly correlated with admixture rates, nor were any of the species fixed effects, though the small number of hybrids found may limit our power to detect any such effects. Conversely, the Q. macrocarpa population structure GLMM revealed little predictive power for any individual environmental covariate (Figure <ref type="figure">9b</ref>). As assessed by traceplots, posterior predictive checks and visual examination of parameter posterior distributions, our GLMMs exhibited convergence across all MCMC chains, with R-hat = 1.00 and cumulative Bulk/Tail ESS &gt; 82,000 across all model parameters for all models (Tables <ref type="table">S9</ref> and <ref type="table">S10</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">| Genetic Differentiation Across Functional Classes</head><p>Rates of genetic differentiation across candidate gene classes for bud-break phenology and drought tolerance did not diverge significantly from the null distributions constructed from background (non-candidate) genes (Figure <ref type="figure">S4</ref>, Table <ref type="table">S11</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">| Discussion</head><p>Our study demonstrates that the five most prevalent species of the eastern North American white oak syngameon are genomic cluster species <ref type="bibr">(Mallet 1995</ref><ref type="bibr">(Mallet , 2020))</ref>, largely predictable by their ecology and morphology. All, however, are connected by gene flow with the widespread bur oak or directly with one another. Although the species form clear, discontinuous genomic clusters, ca. 13% of the individuals we sampled are hybrids or introgressed. The eastern North American white oaks are a syngameon.</p><p>Our study also shows that genetic variation in Q. macrocarpa is due in part to introgression and in part to a substantial phylogeographic split separating the northern and southern populations. Our work thus supports the analysis of the eastern North American white oak syngameon made by <ref type="bibr">Hardin (1975, 336)</ref> on the basis of morphology alone, but here supported by genomic data: while "a significant, although relatively minor, component of the variation is due to hybridization and localized introgression," most of the variation in the species is due to intraspecific variation that apparently has nothing to do with hybridization.</p><p>Understanding the relative importance of these sources of genetic diversity will be key to predicting how bur oak, one of eastern North America's foundational oak species, will adapt to evolving environmental conditions. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">| Genomic Coherence of Eastern North American White Oaks</head><p>Both the NewHybrids and ADMIXTURE analyses demonstrated that the eastern North American white oaks in our study are genomically distinct species that nonetheless hybridize and, in some cases, introgress. Of 362 samples, the NewHybrids analyses identified 16 (4.4%) as F1 hybrids and 9 (2.5%) as backcrosses or F2, for a total of 6.9% confirmed hybrids. In addition, a further 23 (6%) are of undetermined ancestry (meaning they could be either unadmixed or backcross), making 315 samples (87%) confirmed "pure" (unadmixed).</p><p>The estimated frequency of hybrids in our study is much higher than estimates from historical, morphology-based studies, which generally estimated the frequency of hybrids in eastern North American oaks as less than 1% <ref type="bibr">(Jensen 2002;</ref><ref type="bibr">Palmer 1948</ref>). This discrepancy may be due to both sampling bias-in at least one study, oak hybrids were found to be undersampled in herbaria relative to their prevalence in natural populations (Wu et al. 2023)-and the increased power of genomic data to detect admixture. Nonetheless, our finding recalls the claim made 50 years ago based on morphology alone that "the term 'syngameon' [in the eastern North American white oaks] does not imply unlimited and widespread gene exchange between the species. To the contrary, the gene 'flow' is but a 'trickle' in most cases" <ref type="bibr">(Hardin 1975, 360)</ref>.</p><p>Our admixture findings are in line with other DNA-based studies of North American oaks (e.g., Cavender- <ref type="bibr">Bares et al. 2015;</ref><ref type="bibr">Craft et al. 2002;</ref><ref type="bibr">Kim et al. 2018;</ref><ref type="bibr">Ortego et al. 2017;</ref><ref type="bibr">Sullivan et al. 2016</ref>). Higher hybridization rates have been reported for oaks of Europe (e.g., <ref type="bibr">Degen et al. 2023;</ref><ref type="bibr">Jensen et al. 2009;</ref><ref type="bibr">Reutimann et al. 2020;</ref><ref type="bibr">Valbuena-Carabana et al. 2005)</ref>, East Asia (e.g., Liu et al. 2018; Pei et al. 2022; Qi et al. 2024; Shi et al. 2023) and Mexico (e.g., Albarr&#225;n-Lara et al. 2019; Castillo-Mendoza et al. 2019; Gonz&#225;lez-Rodr&#237;guez et al. 2004; Morales-Salda&#241;a et al. 2022; Pe&#241;aloza-Ram&#237;rez et al. 2010), but this may be influenced in part by human interplanting of species that would not grow together naturally in Europe (Stace et al. 2015) and substantially younger clade ages in Mexico <ref type="bibr">(Hipp et al. 2018)</ref>.</p><p>21 apparent backcrosses from Q. bicolor to Q. macrocarpa are scored as ambiguous in the NewHybrids analysis, but most make sense geographically: all Q. macrocarpa with &#8805; 3.0% admixture from Q. bicolor are located within the range of Q. bicolor except for two individuals (QUE001882 from Oklahoma; QUE001635, cultivated from acorn collected in New Mexico); and four of the five Q. macrocarpa individuals with &#8805; 2.0% admixture from Q. muehlenbergii are in the area of sympatry with that species (the exception being QUE002581, from South Dakota). However, ancestry estimation methods can fail to recover signals of ancient introgression, particularly when the magnitude of gene flow is low and occurred long enough in the past for drift, selection and recombination to fix variants and reshuffle species-specific combinations of allele frequencies <ref type="bibr">(Kong and Kubatko 2021)</ref>. The rates and spatial patterns of admixture observed in our dataset should be interpreted as a snapshot of one point in the history of the eastern North American white oak syngameon and possibly an under-representation of the importance of admixture.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">| Spatial Patterns of Admixture</head><p>Proximity to species' range edge most strongly predicted admixture in our dataset, concordant with prior rangewide findings on introgression in Q. alba based on morphological data <ref type="bibr">(Hardin 1975)</ref>. Similar geographic patterns have been observed in poplar, pinyon pine and other temperate forest trees <ref type="bibr">(Buck et al. 2023;</ref><ref type="bibr">Chhatre et al. 2018;</ref><ref type="bibr">Mimura et al. 2024)</ref> as well as clades of butterflies, toads, grasshoppers and other taxa <ref type="bibr">(Barton and</ref><ref type="bibr">Hewitt 1985, 1989;</ref><ref type="bibr">Kawakami et al. 2009;</ref><ref type="bibr">Shepherd et al. 2022)</ref>. These patterns align well with predictions and observations of environmental intermediacy favouring genetic admixture <ref type="bibr">(Anderson 1948;</ref><ref type="bibr">Muller 1952</ref>). However, unlike these systems, which are largely characterised by parapatric hybridization, the eastern North American white oak syngameon contains many species occurring in widespread sympatry.</p><p>Range-marginal hybridization may be due to pollen swamping from larger heterospecific populations <ref type="bibr">(Lepais et al. 2009)</ref>, as populations reach their geographic limits, or reproductive failure due to increased stress in ecologically marginal populations, as demonstrated in Quercus gambelii and Q. grisea, where hybridization was driven in part by pollen abortion due to heat and drought stress <ref type="bibr">(Williams et al. 2001)</ref>. A disproportionate number of admixed individuals in our study were identified in the Hudson Valley-Lake Champlain corridor in New York and Vermont, where the eastern North American white oak species present consist of disjunct or range-marginal populations confined to lower elevation river valleys between climatically unsuitable mountain ranges. Under these circumstances, compressed distributions along topographically diverse river valleys might also facilitate higher rates of gene flow between species (Vallejo-Mar&#237;n and Hiscock 2016). However, increased gene flow in the north also raises the question of whether adaptive introgression might contribute to the large geographic range and ecological variation in Q. macrocarpa. Adaptive introgression can facilitate species migration and persistence in novel environments <ref type="bibr">(Burge et al. 2019;</ref><ref type="bibr">Cronk and Suarez-Gonzalez 2018;</ref><ref type="bibr">Dodd and Afzal-Rafii 2004;</ref><ref type="bibr">Nagamitsu et al. 2020;</ref><ref type="bibr">O'Donnell et al. 2021;</ref><ref type="bibr">Pfennig et al. 2016</ref>) and has been argued to play a key role in the northward extension of the widespread European species Q. petraea <ref type="bibr">(Leroy et al. 2020)</ref>. Climate suitability was not a significant partial predictor of admixture in our study (Figure <ref type="figure">9a</ref>, "habitat suitability"), but edaphic factors at a finer scale than we were able to measure using the soil layers we had may be more important, as has been suggested, for example, in eastern North American Q. ellipsoidalis and Q. rubra <ref type="bibr">(Khodwekar and Gailing 2017)</ref>.</p><p>Even along range margins, hybrids were largely confined to a small subset of sites. Since early-generation hybridization appears rare, this heterogeneity could simply be explained by a low probability of a hybrid occurring at any given site. However, heterogeneity in local factors, such as habitat structure, disturbance <ref type="bibr">(Anderson 1948;</ref><ref type="bibr">Bray 1960)</ref> or community composition, may exert strong effects on the generation and persistence of F1s and their backcrossed progeny. More detailed data on micro-scale habitat variation could help clarify whether habitat intermediacy promotes the survival of F1s and facilitates introgressive gene flow back into one or more parent populations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">| Admixture Between Q. bicolor and Q. macrocarpa</head><p>The two most closely related species in our study, bur oak (Q. macrocarpa) and swamp white oak (Q. bicolor) <ref type="bibr">(Hipp et al. 2020;</ref><ref type="bibr">Larson et al. 2024)</ref>, have long been thought to exchange genes relatively freely <ref type="bibr">(Bray 1960;</ref><ref type="bibr">Burger 1975;</ref><ref type="bibr">Stebbins 1950, 63-64;</ref><ref type="bibr">Van Valen 1976)</ref>. While our linear mixed models (GLMMs) suggest that the partial effect of species identity on admixture is non-significant (Figure <ref type="figure">9</ref>), this may be partially due to low within-population sampling. By contrast, ADMIXTURE and NewHybrids analyses both show exceptionally high interspecific gene flow between Q. bicolor and Q. macrocarpa: well-supported F2s, 4 of the 7 demonstrated backcrosses, and 21 of the 23 samples of undetermined ancestry (unadmixed or backcross) were derived from these two species. This is particularly remarkable in light of the fact that in U.S. Forest Inventory and Analysis data, 19.4% of Q. macrocarpa plots also include Q. alba, while only 3.5% of Q. macrocarpa plots include Q. bicolor. This suggests that contemporary gene flow is much more frequent between Q. macrocarpa and Q. bicolor than expected by co-occurrence alone.</p><p>Given the close evolutionary relationship between the two species and high gene tree discordance observed among white oaks in general <ref type="bibr">(Crowl et al. 2020;</ref><ref type="bibr">Larson et al. 2024;</ref><ref type="bibr">McVay, Hauser, et al. 2017;</ref><ref type="bibr">McVay, Hipp, et al. 2017)</ref>, it is conceivable that our inferred admixture estimates could represent either gene flow or unsorted ancestral polymorphism. Incomplete lineage sorting between recently diverged species can masquerade as admixture when phylogeny is not accounted for <ref type="bibr">(Lexer et al. 2006;</ref><ref type="bibr">Muir and</ref><ref type="bibr">Schl&#246;tterer 2005, 2006;</ref><ref type="bibr">Thomson et al. 2015)</ref>. However, in our dataset, admixed individuals of Q. macrocarpa and Q. bicolor were distributed unevenly across the landscape, clustering at specific sites and extending no further than the zone of sympatry between parent species. This pattern supports interspecific gene flow over incomplete lineage sorting: under the latter, inferred admixture is expected to be widespread across the ranges of parental populations <ref type="bibr">(Muir and Schl&#246;tterer 2005)</ref>.</p><p>The causes of high gene flow between Q. bicolor and Q. macrocarpa as well as the strong asymmetry in introgression, almost exclusively from Q. bicolor to Q. macrocarpa, are unknown. The two recently diverged species may have lower reproductive barriers due to insufficient time for intrinsic incompatibilities to evolve. Moreover, Q. macrocarpa and Q. bicolor co-occur closely in mixed stands throughout the Midwest and portions of the Northeast, though relatively infrequently (3.5% of Q. macrocarpa plots). Close co-occurrence could provide ample opportunity for heterospecific pollen flow and potentially relax postzygotic selection against hybrids with intermediate ecological niches. However, most of the eastern North American white oak species in our study commonly grow in codominant mixed stands, particularly Q. macrocarpa and Q. alba (Cavender-Bares et al. 2018; Cho and Boerner 1991; Hardin 1975); and as noted above, more than 5 times as many Q. macrocarpa plots in the U.S. FIA data include Q. alba as include Q. bicolor. Disentangling the mechanisms maintaining genomic coherence of Q. macrocarpa and Q. bicolor would require controlled crossing experiments to tease apart the relative contributions of pollen competition, gametic incompatibility, hybrid viability and ecological selection on hybrids, among other factors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">| Homogeneity of Differentiation Across the Genome</head><p>Genetic differentiation did not vary significantly across candidate functional gene classes for any of the species pairs in our study (Figure <ref type="figure">S4</ref>). We may have lacked adequate sampling of Quercus individuals and target loci-our post-filtering dataset includes only 445 of the ca. 31,000-42,000 gene models inferred in a variety of annotated oak genomes to date (e.g., <ref type="bibr">Ai et al. 2022;</ref><ref type="bibr">Fu et al. 2022;</ref><ref type="bibr">Kapoor et al. 2023;</ref><ref type="bibr">Larson et al. 2024;</ref><ref type="bibr">Sork et al. 2022;</ref><ref type="bibr">Zhou, Liu, et al. 2022)</ref>-to detect differential rates of divergence or gene flow across classes: adaptive differentiation or introgression may occur at a small number of loci, yielding insignificant results when averaging over an entire functional gene class consisting mostly of loci not under positive selection. Additionally, the drought-tolerance candidate genes in our marker set were selected based on orthology and literature search for California oaks (Oney-Birol et al. 2018); the bud-break phenology candidate genes were selected based on a transcriptomic study in European white oaks (Lesur et al. 2015; Ueno et al. 2013). It is unknown whether these loci are all under selection, particularly in eastern North America. Alternatively, because we combined all samples of a given species for pairwise analyses, spatially heterogeneous patterns of gene flow or differentiation could obscure regionally significant divergence, gene flow or local adaptation. For example, adaptive introgression of drought tolerance alleles from Q. stellata into southwestern populations of Q. macrocarpa could be counterbalanced by introgression of flooding-tolerance alleles from Q. bicolor at northern sites. Further studies using larger marker sets and denser population sampling could provide the power to geographically partition jackknife analyses or conduct genotype-environment association tests to examine finer-scale, locus-or environment-specific patterns of (potentially) adaptive genetic differentiation across the eastern North American white oak syngameon. Finally, adaptive divergence and/or gene flow may be focused around non-coding regions, as has been demonstrated in east Asian Quercus (Fu et al. 2022), potentially producing null results when testing for adaptive introgression using target-capture methods for protein-coding genes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">| Phylogeography and Species Coherence of Q. macrocarpa</head><p>Genomic coherence of Q. macrocarpa is strongly supported by NMDS and ADMIXTURE analyses. Divergence between the northern and southern populations of the species is supported by ADMIXTURE and FEEMS, the latter showing a reduction in gene flow near the middle of the species' range, thus suggesting that the phylogeographic structure within Q. macrocarpa is not simply an outcome of isolation-by-distance. Moreover, our inclusion of a phylogenetically diverse set of widespread eastern North American white oak species <ref type="bibr">(Hipp et al. 2018</ref>) suggests that population structure within Q. macrocarpa, beyond evidence of admixture and hybridization with the included species, represents intraspecific variation rather than gene flow from an unsampled eastern North American white oak species. Are the bur oak north and south genomic clusters distinct enough to be good species? This seems unlikely, as genetic differentiation between inferred "pure" Q. macrocarpa north and south clusters (F ST = 0.103) is approximately one-third as great as that between the most closely related species in our dataset, Q. bicolor and Q. macrocarpa (F ST = 0.292-0.304). In combination, these analyses support our inference that, despite population structure, Q. macrocarpa is a single genomic cluster species, distinct from sympatric congeners.</p><p>The phylogeographic signal within Q. macrocarpa is not significantly predicted by climate or soil, leaving open the question of whether selection along an environmental gradient from northern upland savannas to southern bottomlands might explain genetic divergence within the species. Genetic clustering within Q. macrocarpa also does not correspond to known natural barriers to reproductive connectivity in eastern North America, nor to any widely documented phylogeographic patterns in the region <ref type="bibr">(Soltis et al. 2006</ref>). It also does not align with a previous phylogenetic study of chloroplast haplotypes across the eastern North American white oak syngameon <ref type="bibr">(Pham et al. 2017)</ref>, which in fact showed weak geographic clustering, in contrast with the strong geographic structure in European oak chloroplast haplotypes <ref type="bibr">(Petit et al. 2002)</ref>. These findings are consistent with the previous inferences that North American oaks occupied more and smaller northern refugia and underwent less dramatic late-Pleistocene population bottlenecks than European oaks <ref type="bibr">(Lumibao et al. 2017;</ref><ref type="bibr">McLachlan et al. 2005)</ref>.</p><p>The genetic structure within bur oak does not correspond to any known morphological discontinuities. While latitudinal variation exists for bur oak acorn size and leaf lobing, these characters vary continuously <ref type="bibr">(Desmond et al. 2021;</ref><ref type="bibr">Koenig et al. 2009)</ref>. Additionally, none of the varieties previously diagnosed within Q. macrocarpa, which consist mostly of regional morphs throughout the Great Plains <ref type="bibr">(Nixon and Muller 1997)</ref>, align with the pattern of population structure we obtained. Rangewide common garden approaches hold the promise of evaluating what, if any, implications rangewide population structure holds for functional phenotypic diversity in Q. macrocarpa.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6">| Conclusions</head><p>Our study is the first genomic investigation of hybridization, introgression and population genetic structure in bur oak and sympatric species of the eastern North American white oak syngameon, a model system for examining hybridization and species coherence in temperate trees. Our study confirms that the traditional species form largely discontinuous genomic clusters, despite the presence of interspecific hybrids. Several interesting trends are suggested by our results. Levels of hybridization vary across the bur oak range, with hybrids establishing at a higher level in range-marginal populations. Other trends suggested in this dataset require further testing, since the small percentage of hybrids prevented strong statistical confirmation. Patterns of hybridization vary among species pairs. For instance, Q. macrocarpa showed little backcrossing with Q. stellata and Q. alba, although F1 hybrids with these species were found. On the other hand, backcrosses and F2s are much commoner for Q. bicolor &#215; Q. macrocarpa. Backcrossing in this combination is asymmetrical, however, with Q. bicolor genes becoming incorporated into Q. macrocarpa but little evidence for the reverse. The demonstration of strong phylogeographic structure within the species suggests the possibility of either secondary contact from multiple refugia or strong divergent selection. The high amount of introgression into Q. macrocarpa raises the question of whether the large ecological, geographic, and morphological range of the species may be due in part to adaptive gene flow.</p><p>Combined with the species distribution models presented, these data are a first step toward understanding how the interplay between genetic diversity within and among populations, genetic diversity among species and interspecific gene flow has shaped one of eastern North America's most widespread oak species.</p><p>and hybridization to managers of participating natural areas where we collected tissue (see Acknowledgments) and will inform conservation decisions in at least one state. Additional benefits accrue from the sharing of our data and results on public databases as described above.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>1365294x, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/mec.17822 by University Of Chicago Library, Wiley Online Library on [29/09/2025]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License</p></note>
		</body>
		</text>
</TEI>
