<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Genetic Diversity and Population Structure in Cities Is Not Consistent Among Cosmopolitan Plant Species</title></titleStmt>
			<publicationStmt>
				<publisher>Wiley</publisher>
				<date>02/01/2026</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10664681</idno>
					<idno type="doi">10.1111/mec.70261</idno>
					<title level='j'>Molecular Ecology</title>
<idno>0962-1083</idno>
<biblScope unit="volume">35</biblScope>
<biblScope unit="issue">3</biblScope>					

					<author>Ava M Hoffman</author><author>Jennifer M Cocciardi</author><author>Prothama Manna</author><author>Diego F Alvarado‐Serrano</author><author>Jeannine Cavender‐Bares</author><author>Peter M Groffman</author><author>Sharon J Hall</author><author>Sarah E Hobbie</author><author>Susannah B Lerman</author><author>Josep Padullés Cubino</author><author>Diane E Pataki</author><author>Tara L E Trammell</author><author>Meghan L Avolio</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Urbanisation has led to increasing homogenization of plant communities across cities. However, it is unclear whether these patterns extend to cosmopolitan plant species at the genetic level. We examined genome‐wide genetic patterns in six widespread plant species (three Poaceae and three Asteraceae) across five cities in the USA (Boston, Baltimore, Minneapolis‐St. Paul, Phoenix, and Los Angeles) using reduced‐representation sequencing. We assessed genetic structure, differentiation, and patterns of isolation by distance (IBD) and by environment (IBE) to determine whether species were genetically homogeneous or differentiated by city, by percentage of impervious surface, or by both. Most species exhibited limited population structure overall, with Poa annua (annual bluegrass), Taraxacum officinale (dandelion), and Cynodon dactylon (Bermuda grass) showing no significant genetic differentiation among cities, a pattern consistent with high gene flow mediated by human activity. Notable exceptions included city‐level differences in Erigeron canadensis (horseweed) and Lactuca serriola (prickly lettuce), especially in Phoenix. We also observed low genetic diversity in Digitaria sanguinalis (crabgrass) from Phoenix, suggesting recent founder effects or selection via environmental filtering. Erigeron canadensis, the only native species studied, displayed stronger differentiation by city, along with significant isolation by temperature and distance. 
Among all species, we found no evidence for population structure by impervious surface. Our findings indicate that widespread population genetic structure patterns of cosmopolitan plants are likely to depend more on species attributes (e.g., self‐compatibility) and human‐mediated dispersal than on urbanisation per se.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Statements relating to Molecular Ecology ethics and integrity policies</head><p>Data availability statement: see Data Availability below. Funding statement: see Acknowledgements below. Conflict of interest disclosure: the authors declare no conflicts of interest. Ethics approval statement: not applicable. Patient consent statement: not applicable. Permission to reproduce material from other sources: not applicable. Clinical trial registration: not applicable.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Global ecosystems have been transformed by human activity. Urban environments in particular have been dramatically altered, with more impervious surface, higher temperatures, and novel combinations of species <ref type="bibr">(Ruas et al., 2022)</ref>. For example, in the USA, regional policy, residential income, and personal aesthetic values have led to the homogenization of urban plant communities <ref type="bibr">(Groffman et al., 2014;</ref><ref type="bibr">Wheeler et al., 2017)</ref>. This pattern is reinforced by cosmopolitan plant species, which are widespread and distributed globally across many cities <ref type="bibr">(Aronson et al., 2014;</ref><ref type="bibr">Del Tredici, 2020)</ref>. These species are often historically associated with agricultural or ruderal areas <ref type="bibr">(La Sorte et al., 2007)</ref> and are likely to be introduced (non-native) and spontaneous (not planted by humans) in urban environments <ref type="bibr">(Cavender-Bares et al., 2020;</ref><ref type="bibr">Huang et al., 2025;</ref><ref type="bibr">Knapp et al., 2012;</ref><ref type="bibr">Padull&#233;s Cubino et al., 2019;</ref><ref type="bibr">Wittig &amp; Becker, 2010)</ref>. Yet despite the clear contribution of these common plants to homogenization across urban environments, less is known about genetic patterns within the species themselves, which limits our understanding of their origin and spread.</p><p>Humans modify plant dispersal, which directly affects how closely plant populations are related to one another. Specifically, relatedness can change through gene flow (exchange of alleles) and founder effects (populations established by a limited number of individuals). When human activity increases rates of plant dispersal, the resulting gene flow can produce high genetic similarity among populations. 
For example, historical trade routes are thought to be pathways through which urban genotypes of Plantago (plantain) species dispersed, maintaining genetic homogeneity among populations <ref type="bibr">(Iwanycki Ahlstrand et al., 2022;</ref><ref type="bibr">Smith et al., 2020)</ref>. For introduced plant species, including most North American cosmopolitan species, which are native to Eurasia, colonization history also shapes how populations are related to one another.</p><p>Plant species dispersed outside their native range can be subjected to founder effects or genetic bottlenecks, which reduce genetic diversity and can increase the prevalence of traits such as selfing and clonality <ref type="bibr">(Estoup et al., 2016;</ref><ref type="bibr">Hern&#225;ndez-Espinosa et al., 2022)</ref>. For instance, the invasive forb Impatiens glandulifera (Himalayan balsam) has lower genetic diversity in its introduced range than in its native range, suggesting limited gene flow or founder effects. However, several studies have found that high gene flow through repeated introductions can quickly weaken founder effects in introduced plant species <ref type="bibr">(Gioria et al., 2023;</ref><ref type="bibr">Lambertini, 2019;</ref><ref type="bibr">Shirk et al., 2014;</ref><ref type="bibr">Vandepitte et al., 2017;</ref><ref type="bibr">Vicente et al., 2021)</ref>. For cosmopolitan plant species specifically, repeated human-mediated introductions and intentional planting could mean that high gene flow outweighs other processes, leading to low genetic differentiation <ref type="bibr">(Caizergues et al., 2024;</ref><ref type="bibr">Smith et al., 2020)</ref>.</p><p>In addition to active human dispersal, genetic similarity can also be influenced by environmental differences and by phenotypic attributes of the species themselves. 
Following invasions, plant species can adapt to local environments <ref type="bibr">(Oduor et al., 2016)</ref>, producing a signal of isolation by environment (IBE). For example, genetic variation in invasive Tanacetum vulgare (common tansy) was found to be driven largely by land use and soil properties rather than by distance <ref type="bibr">(Briscoe Runquist &amp; Moeller, 2024)</ref>. Genetic differences can also be correlated with geographic distance, producing a signal of isolation by distance (IBD). Native species may be more strongly affected by geographic and environmental distance than introduced species <ref type="bibr">(Long et al., 2009)</ref>, as introduced species are likely to have greater dispersal success, more rapid germination, and/or faster reproduction <ref type="bibr">(Flores-Moreno et al., 2013;</ref><ref type="bibr">Van Etten et al., 2017)</ref>. The strength of IBE and IBD can also depend on whether the native or the introduced range is sampled. For example, Lycium ferocissimum (African boxthorn) was found to be isolated primarily by distance in its native range but by environment in its introduced range <ref type="bibr">(McCulloch et al., 2023)</ref>. Phragmites australis (common reed) showed a similar pattern, with environment more important than distance in the introduced range <ref type="bibr">(Guo et al., 2018)</ref>. Evolutionary studies frequently use macroclimate and geographic distance to understand IBE and IBD processes. However, urban environments are especially heterogeneous, with human-modified habitats acting as both ecological filters and dispersal barriers. Features such as percent impervious surface may therefore be key to explaining genetic patterns in urban environments.</p><p>Urban environmental conditions, including heat islands, altered soil conditions, and/or habitat fragmentation, are assumed to play a major role in genetic differentiation <ref type="bibr">(Johnson &amp; Munshi-South, 2017)</ref>. 
Specifically, genetic differentiation can occur through direct environmental selection or indirectly via reduced gene flow, as impervious surfaces and fragmented green spaces act as barriers to dispersal <ref type="bibr">(Alberti et al., 2020;</ref><ref type="bibr">Johnson &amp; Munshi-South, 2017;</ref><ref type="bibr">Rivkin et al., 2019;</ref><ref type="bibr">Santangelo et al., 2018;</ref><ref type="bibr">Wood et al., 2021)</ref>. Where urban selective pressures are similar, plants in different cities might converge on shared phenotypes despite large geographic separation. For example, across global urban environments, Trifolium repens (white clover) has been shown to lose herbivore defense in favor of tolerance to urban drought <ref type="bibr">(Johnson et al., 2018;</ref><ref type="bibr">Santangelo et al., 2022)</ref>.</p><p>Lepidium virginicum (Virginia pepperweed) plants from urban environments showed more similar growth phenotypes and were also more genetically related overall than rural plants <ref type="bibr">(Yakub &amp; Tiffin, 2017)</ref>. However, gene flow among populations in different cities is still poorly understood <ref type="bibr">(Rivkin et al., 2019)</ref>. Despite these emerging patterns, few studies have used multiple species in the same study design to compare genetic patterns within and across cities. Studies comparing multiple species and cities simultaneously are key to revealing common patterns and processes in urban evolution.</p><p>We examined the genetic composition of six cosmopolitan weedy plant species within and across five metropolitan areas (hereafter "cities") in the USA. Our study included three species in the Asteraceae and three in the Poaceae. The three Poaceae studied were Cynodon dactylon (Bermuda grass), Digitaria sanguinalis (large crabgrass), and Poa annua (annual bluegrass). The three Asteraceae studied were Erigeron canadensis (horseweed), Lactuca serriola (prickly lettuce), and Taraxacum officinale (dandelion). 
The selected cities (Baltimore, Boston, Los Angeles, Minneapolis-Saint Paul, and Phoenix) span diverse climates and geographic regions across the United States. Within each city, we selected sites with different percentages of impervious surface as a proxy for different urban environments. For each plant species, we anticipated several possible genetic patterns. We might observe no genetic differentiation (Figure <ref type="figure">1a</ref>), where all individuals are genetically indistinct due to human-mediated dispersal and/or migration. Alternatively, we might observe genetic differentiation by city (Figure <ref type="figure">1b</ref>), where geographic and/or environmental distance are key drivers of differentiation. We might also observe genetic differentiation by urban environment (Figure <ref type="figure">1c</ref>), where percent impervious surface is correlated with genetic differentiation among individuals, for example through common urban genotypes shared across cities. Finally, we might observe genetic differentiation by both city and urban environment (Figure <ref type="figure">1d</ref>). Ultimately, this work will help us better understand how human activities and the environment shape the genetic relatedness of cosmopolitan species.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Materials and Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Study species</head><p>Six species (three Poaceae, three Asteraceae) were examined in this study (Table <ref type="table">1</ref>). With the exception of E. canadensis, all species were introduced to the continental USA from Eurasia. These species encompass a variety of ploidy levels: E. canadensis and L. serriola are diploid, D. sanguinalis is hexaploid, P. annua is tetraploid, and T. officinale in North America is largely triploid. Cynodon dactylon ranges from diploid to hexaploid, with tetraploidy being the most common. Poa annua and C. dactylon are commonly sold as turf grasses, either in seed mixes or as sod, whereas T. officinale seeds are often contaminants in seed mixes, and the species is also grown for food (Table <ref type="table">1</ref>). Life form, seasonal life cycle, reproductive strategy, pollination strategy, capacity for vegetative growth, and phenology also differ among these species <ref type="bibr">(Jones et al., 2021;</ref><ref type="bibr">Natural Resources Conservation Service, 2025;</ref><ref type="bibr">Rita et al., 2012;</ref><ref type="bibr">Stewart-Wade et al., 2002;</ref><ref type="bibr">Warwick, 1979;</ref><ref type="bibr">Weaver, 2001;</ref><ref type="bibr">Weaver &amp; Downs, 2003)</ref> (summarized in Table <ref type="table">1</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Field collection and DNA extraction</head><p>We collected leaf tissue from five metropolitan areas in the USA: Baltimore, Boston, Los Angeles, Minneapolis-Saint Paul, and Phoenix (Figure <ref type="figure">2a</ref>). These cities vary in climate, including temperature and precipitation normals (Figure <ref type="figure">2b-c</ref>). Minneapolis-St. Paul has the lowest temperatures, routinely below 0&#176;C in the winter (Figure <ref type="figure">2b</ref>). Boston and Minneapolis-St. Paul receive the most winter snowfall, whereas Los Angeles and Phoenix receive none (Figure <ref type="figure">2c</ref>). Phoenix has the most extreme climate, with maximum temperature normals exceeding 40&#176;C during the summer (Figure <ref type="figure">2b</ref>) and only 183 mm of annual precipitation (Figure <ref type="figure">2c</ref>).</p><p>We selected collection sites within each city, including yards, parks, vacant lots, roadsides, and natural or unmanaged areas. These sites varied in their percentage of impervious surface (Figure <ref type="figure">3a</ref>; defined in more detail below). At each site, we collected leaves from 1-5 individuals of at least one species. We successfully obtained genotype data (methodology described below) from individuals at 20 to 47 collection sites per city (Figure <ref type="figure">2a</ref>). Figure <ref type="figure">3b</ref> provides an overview of the sampling scheme. Collection took place between April and September 2018. We placed samples immediately in silica gel prior to shipment to Baltimore, Maryland, where DNA extraction began in 2020. Some species could not be collected from all cities: (1) C. dactylon was not collected in Minneapolis-St. Paul or Boston; (2) D. sanguinalis was not collected in Los Angeles; and (3) E. canadensis was not collected in Minneapolis-St. Paul.</p><p>DNA was extracted from leaf tissue using the Omega Bio-Tek E.Z.N.A. Plant DNA DS Mini Kit, which we found to produce greater yields than basic plant tissue kits. We measured DNA concentration using a Qubit 4.0 fluorometer (BR dsDNA assay), which gave a mean concentration of 66.78 ng/&#181;L. We also checked a subset of samples for DNA quality using gel electrophoresis. Prior to library preparation, we aliquoted 200 ng of DNA from each sample, air-dried the aliquots in tubes covered with sterile rayon sealing film (Excel Scientific, AeraSeal), and reconstituted the DNA in 10 &#181;L of elution buffer. For the 46 samples with DNA concentrations under 10 ng/&#181;L, we used a vacuum concentrator to accelerate drying before reconstitution.</p></div>
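<div xmlns="http://www.tei-c.org/ns/1.0"><p>The normalization step above reduces to simple arithmetic: the volume of extract that holds 200 ng is 200 divided by the concentration. At the reported mean of 66.78 ng/µL that is about 3.0 µL, but any extract below 10 ng/µL requires more than 20 µL, twice the 10 µL resuspension volume, which is why drying those samples was slow. The Python sketch below is illustrative only; the function names and the 5.0 ng/µL example are hypothetical, and it is not the authors' laboratory code.</p><ab><![CDATA[
```python
"""Input-volume arithmetic for a fixed DNA mass per sample (sketch).

Assumptions: concentrations are Qubit readings in ng/uL; every sample
supplies 200 ng and is resuspended in 10 uL, as stated in the Methods.
The helper names are hypothetical.
"""

TARGET_NG = 200.0          # DNA input per sample (from the Methods)
ELUTION_UL = 10.0          # resuspension volume (from the Methods)
LOW_CONC_NG_PER_UL = 10.0  # below this, drying was sped up by vacuum


def input_volume_ul(conc_ng_per_ul: float, target_ng: float = TARGET_NG) -> float:
    """Volume of extract (uL) that contains `target_ng` of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


def plan_sample(sample_id: str, conc_ng_per_ul: float) -> dict:
    """Aliquot volume, post-resuspension concentration, and vacuum flag."""
    volume = input_volume_ul(conc_ng_per_ul)
    return {
        "sample": sample_id,
        "volume_ul": round(volume, 2),
        # 200 ng resuspended in 10 uL gives 20 ng/uL
        "final_ng_per_ul": TARGET_NG / ELUTION_UL,
        "vacuum_concentrate": conc_ng_per_ul < LOW_CONC_NG_PER_UL,
    }


if __name__ == "__main__":
    # 66.78 ng/uL is the reported mean; 5.0 ng/uL is an illustrative dilute extract.
    for sid, conc in [("mean_extract", 66.78), ("dilute_extract", 5.0)]:
        print(plan_sample(sid, conc))
```
]]></ab></div>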
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Library preparation</head><p>We prepared libraries for sequencing following the quaddRAD method <ref type="bibr">(Franchini et al., 2017)</ref>. We chose this double-digest RAD method for its capacity for large-scale multiplexing (four barcode sequences per sample) and its efficient detection of PCR duplicates. Briefly, we used PstI (rare-cutting) and MspI (frequent-cutting) in a combined restriction enzyme digestion and adapter ligation step. We used modified forms of the Illumina i5 and i7 adapters, each incorporating a six-base "inner" barcode and a random four-base stretch to identify PCR duplicates. Following digestion-ligation, we pooled either twelve or eight samples and performed a double-sided size selection with magnetic beads (0.5X and 0.8X, Omega Bio-Tek Mag-Bind TotalPure NGS) to remove adapter dimers and fragments larger than 1,000 bp. Products were quantified using a Qubit HS dsDNA assay, and 200 ng of each product was reserved for amplification.</p><p>Amplification consisted of 11 cycles of PCR using Phusion high-fidelity DNA polymerase. During amplification, we used modified forms of the Illumina i5nn and i7nn TruSeq primers, each introducing an eight-base "outer" barcode. We then used a magnetic bead cleanup (0.8X) to remove primer dimers and small DNA fragments. Products were quantified on a Qubit (BR dsDNA assay), and the fragment size distribution was assessed using a 2100 Bioanalyzer (Agilent Technologies, DNA 1000 kit). Approximately 20% of products were re-amplified with 12 PCR cycles because of low concentrations of fragments in the 600-700 bp range. All 193 sub-libraries were then pooled equimolarly based on their DNA concentration in the 600-700 bp range. 
Finally, we used a BluePippin (Sage Science, 1.5% agarose cassettes) to size-select 600-700 bp fragments.</p><p>Illumina sequencing was performed at the Johns Hopkins University Genomics Resources Core Facility on a NovaSeq 6000 S4 flow cell (paired-end, 2 &#215; 150 cycles). Because our samples were highly multiplexed, we ensured that every sample had a unique combination of at least two of its four barcodes to minimize the effects of index hopping <ref type="bibr">(Costello et al., 2018)</ref>. The sequencing facility demultiplexed samples according to the outer barcodes.</p></div>
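<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equimolar pooling requires converting each sub-library's mass concentration into molarity. The Python sketch below uses the standard conversion for double-stranded DNA, molarity (nM) = concentration (ng/µL) × 10^6 / (660 × fragment length in bp), and then computes the volume of each sub-library that supplies the same number of femtomoles. The 650 bp fragment length (midpoint of the 600-700 bp window), the 20 fmol target, the example concentrations, and the function names are assumptions for illustration, not values from this study.</p><ab><![CDATA[
```python
"""Equimolar pooling arithmetic for sub-libraries (sketch).

Assumptions (not from the paper): concentrations are ng/uL of DNA inside
the 600-700 bp window, fragment length is approximated by the window
midpoint (650 bp), each sub-library contributes the same number of
femtomoles, and 660 g/mol is used as the average mass of one base pair
of double-stranded DNA. Function names are hypothetical.
"""

BP_MASS_G_PER_MOL = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_length_bp: float = 650.0) -> float:
    """Convert a dsDNA mass concentration to molarity in nmol/L."""
    # ng/uL equals mg/L; mg/L divided by g/mol gives mmol/L; 1 mmol/L = 1e6 nmol/L
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_length_bp)


def pooling_volumes(concs: dict, fmol_each: float,
                    mean_length_bp: float = 650.0) -> dict:
    """Volume (uL) of each sub-library that supplies `fmol_each` femtomoles."""
    volumes = {}
    for name, conc in concs.items():
        molarity = ng_per_ul_to_nM(conc, mean_length_bp)  # 1 nM = 1 fmol/uL
        volumes[name] = fmol_each / molarity
    return volumes


if __name__ == "__main__":
    # Illustrative concentrations only.
    example = {"sublib_001": 4.29, "sublib_002": 8.58, "sublib_003": 2.145}
    for name, vol in pooling_volumes(example, fmol_each=20.0).items():
        print(f"{name}: {vol:.2f} uL")
```
]]></ab><p>Because every sub-library was quantified within the same narrow size window, pooling by equal molarity here is close to pooling by equal mass within that window; the conversion matters most when fragment lengths differ among pools.</p></div>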
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sequence data pre-processing</head><p>Removing PCR duplicates from RAD datasets using the degenerate base sequences inside adapters filters out redundant data <ref type="bibr">(Andrews et al., 2014;</ref><ref type="bibr">Euclide et al., 2020)</ref> and can improve genotype-calling accuracy <ref type="bibr">(Tin et al., 2015)</ref>, although see <ref type="bibr">(Euclide et al., 2020)</ref>. We removed a small proportion of PCR duplicates (Figure <ref type="figure">S1</ref>) using the clone_filter module of the Stacks software <ref type="bibr">(Rochette et al., 2019)</ref>. We then demultiplexed inner barcodes using the process_radtags module, with options to discard reads with low quality scores (Phred score below 10 in a sliding window) or uncalled bases. This resulted in an average of 6,974,119 reads per sample (Figure <ref type="figure">S2</ref>). We also manually discarded samples with fewer than 1,000,000 reads or that made up less than 1% of their sequenced sub-library, as these corresponded to low-coverage samples (Figure <ref type="figure">S3</ref>). See the Supplemental Information for details on the code and options used.</p></div>
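<div xmlns="http://www.tei-c.org/ns/1.0"><p>The duplicate-removal idea can be illustrated with a small Python sketch. Because each quaddRAD adapter carries a random four-base stretch, two read pairs that share both random tags and both read sequences most likely derive from one PCR-amplified molecule, whereas identical reads with different tags are treated as independent molecules. The record layout, example sequences, and function names are hypothetical, and this is not the clone_filter source; the 1,000,000-read and 1% sub-library thresholds are the ones stated above.</p><ab><![CDATA[
```python
"""Degenerate-base PCR duplicate removal and sample read filters (sketch).

Each read pair is represented by its two random 4-base tags (already
clipped from the reads) and its two read sequences. This mirrors the idea
behind Stacks' clone_filter but is not its implementation; the record
layout and function names are hypothetical.
"""
from collections import namedtuple

ReadPair = namedtuple("ReadPair", ["tag1", "tag2", "seq1", "seq2"])


def remove_pcr_duplicates(pairs):
    """Keep the first read pair for each (tags, sequences) combination.

    Identical sequences with different random tags are retained, because
    they most likely derive from distinct template molecules; only pairs
    identical in both tags and both reads are treated as PCR copies.
    """
    seen = set()
    kept = []
    for pair in pairs:
        key = (pair.tag1, pair.tag2, pair.seq1, pair.seq2)
        if key not in seen:
            seen.add(key)
            kept.append(pair)
    return kept


def passes_sample_filters(n_reads, sublibrary_reads,
                          min_reads=1_000_000, min_fraction=0.01):
    """Keep a sample only if it has at least 1,000,000 reads AND at least
    1% of the reads in its sub-library (thresholds from the text)."""
    return n_reads >= min_reads and n_reads / sublibrary_reads >= min_fraction


if __name__ == "__main__":
    reads = [
        ReadPair("ACGT", "TTGA", "TGCAGGATC", "CGGAATTC"),
        ReadPair("ACGT", "TTGA", "TGCAGGATC", "CGGAATTC"),  # PCR duplicate
        ReadPair("GGCA", "TTGA", "TGCAGGATC", "CGGAATTC"),  # same locus, new molecule
    ]
    print(len(remove_pcr_duplicates(reads)))                 # prints 2
    print(passes_sample_filters(1_500_000, 100_000_000))     # True (1.5%)
    print(passes_sample_filters(1_200_000, 200_000_000))     # False (0.6%)
```
]]></ab></div>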
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Catalog creation and genotype calling</head><p>Prior to running the Stacks pipeline, we ran a parameter search for each species to optimize locus calling, using the iterative denovo_map method included with the software (final parameters listed in Table <ref type="table">S1</ref>).</p><p>Using these parameters, we ran the `ustacks` module, at which stage 13 samples were excluded because no loci could be identified, potentially because they had lower read counts and made up a smaller share of their sub-library (Table <ref type="table">S2</ref>). We next used the `cstacks` module to build the locus catalog for each species from a subset of 276 samples (Table <ref type="table">S3</ref>). This was followed by the `sstacks` module, which matches individual samples against the locus catalog of their respective species. We used the polyRAD v2.0.0 package <ref type="bibr">(Clark et al., 2019)</ref> in R 4.3.0 to call genotypes because many of our species are polyploid or have historical genome duplications. Using polyRAD, we discarded any loci not found in at least 20% of samples. We then calculated overdispersion of loci for each species and filtered loci by comparing observed Hind/He values with their expected values <ref type="bibr">(Clark et al., 2022)</ref>; Hind/He is a read-depth-based statistic for assessing whether loci show Mendelian behavior. We removed an additional 24 samples at this stage due to low coverage (Table <ref type="table">S4</ref>).</p><p>This left the following sample counts: 185 C. dactylon, 224 D. sanguinalis, 107 E. canadensis, 184 L. serriola, 178 P. annua, and 238 T. officinale (Table <ref type="table">2</ref>). All sites from which polymorphic markers were recovered are shown in Figure <ref type="figure">2a</ref>. We used the whole marker sequence to distinguish alleles (haplotypes) rather than randomly selecting a single nucleotide polymorphism (SNP) per locus.</p></div>
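<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the two locus filters above, the Python sketch below computes the fraction of samples with reads at a locus (the 20% rule) and a per-locus Hind/He value from allelic read depths. Hind is the probability that two reads drawn without replacement from one sample carry different alleles, and He is expected heterozygosity. For a locus that behaves like a single Mendelian locus, Hind/He is expected to be near (1 − F)(k − 1)/k for ploidy k and inbreeding coefficient F (0.5 for an outcrossing diploid), whereas merged paralogs, in which every sample looks heterozygous, push it higher. The fixed tolerance, the allele-frequency estimator, and all function names are simplifying assumptions; polyRAD's own filtering also accounts for overdispersion, so this sketch will not reproduce the loci retained in the study.</p><ab><![CDATA[
```python
"""Locus filtering on sample presence and Hind/He (simplified sketch).

Each locus is a (n_samples, n_alleles) array of allelic read depths.
The fixed `tolerance` is a hypothetical stand-in for polyRAD's
overdispersion-aware expectations.
"""
import numpy as np


def present_fraction(depth):
    """Fraction of samples with at least one read at the locus."""
    return float(np.mean(np.asarray(depth).sum(axis=1) > 0))


def hind_he(depth):
    """Mean Hind across samples divided by He for one locus."""
    depth = np.asarray(depth, dtype=float)
    total = depth.sum(axis=1)
    usable = total >= 2                      # two reads are needed to draw a pair
    if not usable.any():
        return float("nan")
    d, n = depth[usable], total[usable]
    freqs = (d / n[:, None]).mean(axis=0)    # allele frequencies from read proportions
    he = 1.0 - np.sum(freqs ** 2)
    if he == 0:
        return float("nan")                  # monomorphic locus
    # probability that two reads drawn without replacement differ, per sample
    hind = 1.0 - np.sum(d * (d - 1), axis=1) / (n * (n - 1))
    return float(np.mean(hind) / he)


def expected_hind_he(ploidy, inbreeding=0.0):
    """Expectation for a Mendelian locus: (1 - F)(k - 1)/k."""
    return (1.0 - inbreeding) * (ploidy - 1) / ploidy


def filter_loci(loci, ploidy, min_present=0.2, tolerance=0.1, inbreeding=0.0):
    """Names of loci found in at least `min_present` of samples whose mean
    Hind/He does not exceed the expectation by more than `tolerance`."""
    ceiling = expected_hind_he(ploidy, inbreeding) + tolerance
    kept = []
    for name, depth in loci.items():
        if present_fraction(depth) < min_present:
            continue
        ratio = hind_he(depth)
        if np.isnan(ratio) or ratio > ceiling:
            continue
        kept.append(name)
    return kept


if __name__ == "__main__":
    loci = {
        # Diploid locus in Hardy-Weinberg proportions (AA, AB, AB, BB)
        "hwe": np.array([[20, 0], [10, 10], [10, 10], [0, 20]]),
        # Every sample "heterozygous": typical of two merged paralogs
        "paralog": np.array([[10, 10]] * 4),
    }
    for name, depth in loci.items():
        print(name, round(hind_he(depth), 3))
    print(filter_loci(loci, ploidy=2))
```
]]></ab><p>In this example, the Hardy-Weinberg locus gives a ratio of about 0.53, close to the diploid expectation of 0.5, while the paralog-like locus gives about 1.05 and is removed.</p></div>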
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sample genotyping and distribution</head><p>The Stacks pipeline, sample filtering, and genotype calling produced between 107 and 238 individual plants per species and between 559 and 2,665 polymorphic markers per species (Table <ref type="table">2</ref>). We genotyped C. dactylon from three cities (Baltimore, Los Angeles, and Phoenix), D. sanguinalis from four cities (Baltimore, Boston, Minneapolis-Saint Paul, and Phoenix), and E. canadensis from three cities (Baltimore, Los Angeles, and Phoenix). Lactuca serriola, P. annua, and T. officinale were genotyped from all five cities. In total, this gave 20-43 sites per city. For each city and species combination, we had 8-22 sites, with two exceptions. Only one E. canadensis individual could be genotyped for Boston; it was excluded from all subsequent analyses except PCA. Only two P. annua individuals were genotyped from Minneapolis-Saint Paul; these were excluded from pairwise rho calculations and from the within-city statistics below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Genetic structure</head><p>To explore genetic clustering of individuals, we conducted an iterative principal components analysis (PCA) using the `IteratePopStruct` function in the polyRAD package. We also inferred genetic structure (i.e., genetic variation within and among cities) using Structure v2.3.4 <ref type="bibr">(Falush et al., 2003;</ref><ref type="bibr">Pritchard et al., 2000)</ref>. For each species, we ran five replicates at each value of K from 1 to 5 (matching the five cities), discarding the first 10,000 iterations as burn-in and retaining the following 20,000 iterations. We set USEPOPINFO to zero to take a geographically agnostic clustering approach. To determine the optimal K for each species, we used the Delta-K method implemented in Structure Harvester <ref type="bibr">(Earl &amp; vonHoldt, 2012)</ref>. With this geographically agnostic approach, the optimal number of clusters was K=3 for C. dactylon, D. sanguinalis, L. serriola, and T. officinale, K=2 for E. canadensis, and K=4 for P. annua. Optimal K was lower than the number of cities sampled for every species except C. dactylon, for which it equaled the three cities sampled. After determining the optimal K, we re-ran Structure with 100,000 retained iterations for each species. We validated these results with a different algorithm, sparse non-negative matrix factorization (sNMF, <ref type="bibr">Frichot et al., 2014)</ref>, run in R 4.4.2. R `sessionInfo()` output can be found in the Supplemental Information.</p></div>
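<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Delta-K criterion that Structure Harvester reports (introduced by Evanno and colleagues in 2005) selects the K at which the mean log-likelihood changes slope most sharply relative to the spread among replicate runs. The Python sketch below computes it from replicate ln P(D) values; the example numbers are invented, and details such as the standard-deviation convention are our assumptions rather than Structure Harvester's documented behavior. Because Delta-K needs values at both K − 1 and K + 1, it cannot select the smallest or largest K tested, so runs over K = 1 to 5 can only return K = 2, 3, or 4.</p><ab><![CDATA[
```python
"""Evanno Delta-K from replicate Structure runs (sketch).

lnp maps each K to the ln P(D) values of its replicate runs. L'(K) and
L''(K) are taken from the mean ln P(D) across replicates, and
Delta-K = |L''(K)| / sd(ln P(D) at K). Function names are hypothetical.
"""
import statistics


def evanno_delta_k(lnp):
    ks = sorted(lnp)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("K values must be consecutive")
    mean = {k: statistics.mean(lnp[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        # L''(K) = L'(K+1) - L'(K) = L(K+1) - 2 L(K) + L(K-1)
        second = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
        sd = statistics.stdev(lnp[k])        # sample SD over replicates (assumed)
        delta[k] = second / sd if sd > 0 else float("inf")
    return delta


def best_k(lnp):
    delta = evanno_delta_k(lnp)
    return max(delta, key=delta.get)


if __name__ == "__main__":
    # Invented ln P(D) values for five replicates at K = 1-5.
    lnp = {
        1: [-5000, -5001, -4999, -5002, -4998],
        2: [-4500, -4502, -4498, -4501, -4499],
        3: [-4000, -4005, -3995, -4003, -3997],
        4: [-3990, -4000, -3980, -3995, -3985],
        5: [-3985, -4000, -3970, -3990, -3980],
    }
    print(evanno_delta_k(lnp))
    print(best_k(lnp))   # prints 3
```
]]></ab></div>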
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Genetic differentiation and variation</head><p>We calculated differentiation among cities for each species using Jost's D <ref type="bibr">(Jost, 2008)</ref>, GST <ref type="bibr">(Nei &amp; Chesser, 1983)</ref>, and FST <ref type="bibr">(Nei, 1973)</ref>. These statistics rest on different assumptions and can therefore offer a more complete picture when reported together. Jost's D is useful for assessing differentiation in polyploid species because it is based on the effective number of alleles rather than on expected heterozygosity, making it independent of ploidy level and sample size <ref type="bibr">(Meirmans et al., 2018)</ref>. However, this statistic takes longer to reach mutation-drift equilibrium <ref type="bibr">(Meirmans &amp; Hedrick, 2011)</ref>. GST is robust to polyploidy but can be sensitive to rare alleles. FST, while more straightforward to interpret as the proportion of variance attributable to population differentiation, is underestimated in polyploids.</p><p>These statistics were calculated using the polysat package <ref type="bibr">(Clark &amp; Jasieniuk, 2011;</ref><ref type="bibr">Clark &amp; Schreier, 2017)</ref> in R. We also used GenoDive to calculate pairwise &#961; (rho) between cities, a metric analogous to FST.</p><p>To investigate patterns within each city, we calculated the inbreeding coefficient (FIS) using the Hardy-Weinberg permutation test in GenoDive, which accommodates biases that affect polyploid species. A positive value indicates an excess of homozygotes relative to Hardy-Weinberg expectations, whereas a negative value indicates an excess of heterozygotes. To test for linkage disequilibrium in each species and city combination, we calculated the standardized index of association (rd) <ref type="bibr">(Agapow &amp; Burt, 2001)</ref>. 
Linkage disequilibrium can reflect selection, inbreeding, or other deviations from Hardy-Weinberg equilibrium, although rd alone cannot distinguish which process is responsible. We used 999 permutations in the poppr package in R <ref type="bibr">(Kamvar et al., 2014)</ref> to generate a null distribution of rd for a one-sided permutation test. We calculated the percentage of private alleles as the percentage of all alleles that were unique to a particular city, using a custom script (see Supplemental Information).</p><p>We also conducted a hierarchical analysis of molecular variance (AMOVA) using GenoDive v3.06 <ref type="bibr">(Meirmans, 2020)</ref> based on the rho statistic, which is ploidy-independent. The AMOVA partitioned genetic variance among cities, among sites within cities, and within sites. These methods differ in sensitivity: PCA is more likely to highlight small differences, whereas AMOVA summarizes overall variance.</p></div>
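<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the differences between the statistics concrete, the Python sketch below computes single-locus GST and Jost's D from population allele frequencies, writing Jost's D in its heterozygosity form, D = [(HT − HS)/(1 − HS)] × [n/(n − 1)] for n populations, alongside the percentage of private alleles per city. These are uncorrected textbook estimators: they omit the Nei and Chesser sample-size corrections and the multilocus averaging applied by polysat, so they illustrate the definitions rather than reproduce the values reported here. City codes and function names are hypothetical.</p><ab><![CDATA[
```python
"""Single-locus GST, Jost's D, and percent private alleles (sketch).

Uncorrected estimators from population allele frequencies; no
sample-size correction and no averaging across loci.
"""
import numpy as np


def hs_ht(freqs):
    """freqs: (n_pops, n_alleles) allele frequencies at one locus.
    Returns mean within-population (HS) and total (HT) expected heterozygosity."""
    freqs = np.asarray(freqs, dtype=float)
    hs = float(np.mean(1.0 - np.sum(freqs ** 2, axis=1)))
    pooled = freqs.mean(axis=0)              # populations weighted equally
    ht = float(1.0 - np.sum(pooled ** 2))
    return hs, ht


def gst(freqs):
    hs, ht = hs_ht(freqs)
    return (ht - hs) / ht if ht > 0 else 0.0


def jost_d(freqs):
    n = len(freqs)
    hs, ht = hs_ht(freqs)
    return (ht - hs) / (1.0 - hs) * n / (n - 1)


def percent_private(alleles_by_city):
    """Per city: alleles seen only in that city, as % of all alleles observed."""
    all_alleles = set().union(*alleles_by_city.values())
    result = {}
    for city, alleles in alleles_by_city.items():
        elsewhere = set().union(*(a for c, a in alleles_by_city.items() if c != city))
        result[city] = 100.0 * len(alleles - elsewhere) / len(all_alleles)
    return result


if __name__ == "__main__":
    # Two cities fixed for different alleles: both statistics equal 1.
    print(gst([[1, 0], [0, 1]]), jost_d([[1, 0], [0, 1]]))
    # Moderately different cities: the two statistics diverge.
    print(gst([[0.9, 0.1], [0.7, 0.3]]), jost_d([[0.9, 0.1], [0.7, 0.3]]))
    print(percent_private({"PHX": {"a1", "a2", "a3"}, "LA": {"a2"}, "BAL": {"a2", "a4"}}))
```
]]></ab><p>With allele frequencies of 0.9/0.1 and 0.7/0.3 in two cities, GST is 0.0625 while D is about 0.057, so the two statistics are not interchangeable even at a single biallelic locus.</p></div>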
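<div xmlns="http://www.tei-c.org/ns/1.0"><p>The index of association compares the observed variance in the number of loci at which pairs of individuals differ with the variance expected if loci were independent; rd standardizes this difference by the pairwise covariances among loci, so values near 0 indicate no association and values near 1 indicate loci that behave as a single block. The Python sketch below implements that formula with a simplified 0/1 distance per locus (match versus mismatch of genotype codes) and a one-sided test that permutes each locus independently across individuals. The genotype coding, the permutation scheme, and the function names are assumptions for illustration; poppr's distance scoring and resampling options differ in detail.</p><ab><![CDATA[
```python
"""Standardized index of association (rd) with a permutation test (sketch).

Genotypes are coded as one integer per individual and locus; the distance
between two individuals at a locus is 0 if their codes match, else 1.
This coding is a simplification of poppr's distances.
"""
import numpy as np


def rbar_d(genotypes):
    g = np.asarray(genotypes)
    i, j = np.triu_indices(g.shape[0], k=1)
    d = (g[i] != g[j]).astype(float)       # (n_pairs, n_loci) distances
    var_loci = d.var(axis=0)               # variance at each locus over pairs
    v_obs = d.sum(axis=1).var()            # variance of summed distances
    v_exp = var_loci.sum()                 # expectation if loci are independent
    sd = np.sqrt(var_loci)
    # sum over locus pairs j < k of sqrt(var_j * var_k)
    cross = (sd.sum() ** 2 - var_loci.sum()) / 2.0
    if cross == 0:
        return float("nan")                # fewer than two variable loci
    return float((v_obs - v_exp) / (2.0 * cross))


def rbar_d_test(genotypes, n_perm=999, seed=1):
    """One-sided test: permute each locus independently across individuals,
    which removes associations among loci while keeping allele frequencies."""
    rng = np.random.default_rng(seed)
    g = np.asarray(genotypes)
    observed = rbar_d(g)
    hits = 0
    for _ in range(n_perm):
        shuffled = np.column_stack([rng.permutation(g[:, k]) for k in range(g.shape[1])])
        if rbar_d(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Four loci with identical genotype patterns in 30 individuals: fully linked.
    linked = np.array([[k % 3] * 4 for k in range(30)])
    print(rbar_d_test(linked))
```
]]></ab><p>For a dataset in which every locus carries the same genotype pattern, rbar_d returns 1 and the permutation p-value falls close to its minimum of 1/(n_perm + 1).</p></div>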
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Isolation by distance and environment</head><p>We investigated isolation by distance (IBD) and isolation by environment (IBE) using multiple matrix regression with randomization (MMRR) <ref type="bibr">(Wang, 2013)</ref>, modeled separately for each species across all cities. Briefly, MMRR regresses a matrix of pairwise genetic distances on one or more predictor distance matrices and assesses significance by permutation, which accounts for the non-independence of pairwise distances.</p><p>First, we calculated genetic dissimilarity using GenoDive v3.06 <ref type="bibr">(Meirmans, 2020)</ref> based on the rho statistic. Next, to assess IBD, we calculated geodesic distances between sites with the geodist package in R. Finally, to assess IBE, we created environmental dissimilarity matrices for three variables: (1) percent impervious surface, (2) April soil temperature, and (3) July soil temperature. These variables were selected as proxies for the multivariate urban environment because they are expected to drive selection and/or isolation in cities <ref type="bibr">(Johnson &amp; Munshi-South, 2017)</ref>. We also included (4) distance from the city center as a predictor that could indicate both IBD and IBE, because plants closer to the city center may experience greater exposure to urban conditions.</p><p>Percent impervious surface values were taken from the US Geological Survey National Land Cover Database (NLCD) 2016 release <ref type="bibr">(Yang et al., 2018)</ref>. These values represent urban impervious surface as a percentage of the developed surface in each 30-m pixel. Distance from the city center was calculated as the Euclidean distance (m) from each site to the city-center coordinates listed on each city's official government (.gov) website. Soil temperature was estimated using the micro_global model in the NicheMapR package, release v3.3.2 <ref type="bibr">(Kearney &amp; Porter, 2017)</ref>, which models climate at 10 &#215; 10 km resolution. 
Values represent the average soil temperature for each month (on a typical day) at 2.5 cm below the surface at midday (12:00). We chose April and July soil temperatures to address potential differences in seasonal phenology and to account for seasonal changes in soil ecosystems. Specifically, we might expect spring-flowering species to be more sensitive to April temperatures and summer-flowering species to be more sensitive to July temperatures.</p><p>Environmental matrices were generated using Euclidean distance. We performed MMRR with genetic distance as the response and geographic distance plus the four environmental distances as predictors. Each species was modeled with 9,999 permutations using the algatr package in R <ref type="bibr">(Chambers et al., 2023)</ref>. We then subset all matrices by city and re-ran the same MMRR model to determine whether distance and/or environmental effects were present within cities. Within species, p-values were corrected for multiple testing.</p></div>
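<div xmlns="http://www.tei-c.org/ns/1.0"><p>The MMRR procedure can be sketched in a few dozen lines of Python with NumPy. Each distance matrix is unfolded to its upper triangle and standardized, genetic distance is regressed on all predictors at once by ordinary least squares, and each coefficient's t-value is compared with its distribution when the site labels of the genetic matrix are permuted (rows and columns together), following our reading of Wang (2013). This is not the algatr implementation: the haversine great-circle formula stands in for the ellipsoidal geodesic distance from geodist, the predictor names and synthetic data are hypothetical, and no multiple-testing correction is shown because the text does not state which correction was applied.</p><ab><![CDATA[
```python
"""Multiple matrix regression with randomization (MMRR), after Wang (2013).

A minimal NumPy sketch, not the algatr implementation. Geographic distance
uses a spherical haversine formula as a stand-in for geodesic distance.
"""
import numpy as np


def haversine_km(lat, lon, radius_km=6371.0):
    """Pairwise great-circle distances (km) between sites."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def env_distance(values):
    """Euclidean distance for one variable, i.e. |x_i - x_j|."""
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])


def _zscore(x):
    return (x - x.mean()) / x.std()


def _ols_t(y, X):
    """OLS coefficients and their t-values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * sigma2)
    return beta, beta / se


def mmrr(gen, predictors, n_perm=999, seed=1):
    """gen: (n, n) genetic distances; predictors: dict of name -> (n, n).
    Returns {name: (standardized coefficient, permutation p-value)}."""
    n = gen.shape[0]
    iu = np.triu_indices(n, k=1)
    names = list(predictors)
    X = np.column_stack([np.ones(len(iu[0]))] + [_zscore(predictors[k][iu]) for k in names])
    beta, t_obs = _ols_t(_zscore(gen[iu]), X)
    rng = np.random.default_rng(seed)
    exceed = np.zeros(len(names))
    for _ in range(n_perm):
        p = rng.permutation(n)
        # permute site labels of the genetic matrix: rows and columns together
        _, t_perm = _ols_t(_zscore(gen[p][:, p][iu]), X)
        exceed += np.abs(t_perm[1:]) >= np.abs(t_obs[1:])
    pvals = (exceed + 1) / (n_perm + 1)
    return {k: (float(beta[i + 1]), float(pvals[i])) for i, k in enumerate(names)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sites = 20
    lat, lon = rng.uniform(30, 45, n_sites), rng.uniform(-120, -70, n_sites)
    soil_temp = rng.uniform(5, 30, n_sites)          # hypothetical July soil temperature
    temp_d = env_distance(soil_temp)
    noise = rng.normal(0, 0.1, (n_sites, n_sites))
    gen = 0.05 * temp_d + (noise + noise.T) / 2      # synthetic genetic distances
    np.fill_diagonal(gen, 0.0)
    print(mmrr(gen, {"geo_km": haversine_km(lat, lon), "july_temp": temp_d}))
```
]]></ab><p>Running the model once on all sites, and again on matrices subset to the sites of a single city, mirrors the across-city and within-city analyses described above.</p></div>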
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Genetic structure across cities</head><p>We observed species-specific patterns of genetic clustering. In the PCA, the most conspicuous clustering by city occurred in E. canadensis and L. serriola (Figure <ref type="figure">4</ref>). Erigeron canadensis from Phoenix was genetically distinct along PC1 (22% of the variation). Lactuca serriola from Phoenix similarly differed along PC1 (14% of the variation). Erigeron canadensis and L. serriola from other cities differed along PC2.</p><p>Results from the PCA were supported by the Structure analyses (Figure <ref type="figure">5</ref>). Patterns of genetic structure were species-specific, with T. officinale and P. annua showing the least structure and E. canadensis the most. Genetic structure varied little with percent impervious surface (Figure <ref type="figure">5</ref>, Figure <ref type="figure">S4</ref>).</p><p>We found a high degree of admixture (i.e., mixed ancestry) in C. dactylon, D. sanguinalis, P. annua, and T. officinale (Figure <ref type="figure">4</ref>), although D. sanguinalis from Phoenix appeared distinct, with low admixture.</p><p>Lactuca serriola from Boston was admixed, whereas plants from Los Angeles, from Phoenix, and from Baltimore together with Minneapolis-Saint Paul formed largely distinct clusters. Erigeron canadensis showed almost no admixture, with Phoenix plants genetically distinct from Baltimore and Los Angeles plants. With the sNMF approach, the optimal K was generally greater and more admixture was detected (Figure <ref type="figure">S4</ref>). However, E. canadensis and L. serriola again showed the most population structure by city. Under the sNMF approach, Phoenix D. sanguinalis again lacked admixture, and E. canadensis and L. serriola from Phoenix and Los Angeles were distinct from those of other cities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Genetic differentiation</head><p>For all three measures of among-city differentiation (Jost's D, GST, FST), we found the lowest values for T. officinale and P. annua, indicating lower differentiation in these species (Table 2). We found the greatest values for E. canadensis, indicating that this species has greater relative differentiation by city. Lactuca serriola, followed by C. dactylon and D. sanguinalis, had intermediate differentiation by city. Using pairwise ρ comparisons, we found E. canadensis and L. serriola from Phoenix to be more distinct (i.e., larger ρ statistics) (Table <ref type="table">S5</ref>). In contrast, D. sanguinalis from Phoenix was less distinct from other cities.</p><p>Next, we compared cities using allelic richness, homozygosity, linkage disequilibrium, and private alleles.</p><p>Using effective number of alleles (AE), we found that D. sanguinalis in Phoenix had low allelic richness compared to other cities (Table <ref type="table">3</ref>). Cynodon dactylon in Baltimore and L. serriola in Boston were also less diverse. In all cities, C. dactylon, D. sanguinalis, P. annua, and T. officinale had more homozygotes than expected (positive FIS), while E. canadensis and L. serriola had more heterozygotes than expected (negative FIS) (Table <ref type="table">3</ref>). We observed greater values of linkage disequilibrium (rd) in Phoenix for D. sanguinalis, E. canadensis, and T. officinale (Table <ref type="table">3</ref>). Greater relative linkage disequilibrium was also observed in Boston for D. sanguinalis, L. serriola, and P. annua. Across cities, linkage disequilibrium was greater for C. dactylon and D. sanguinalis than for E. canadensis and L. serriola. Linkage disequilibrium can be indicative of selective sweeps (i.e., rapid evolution) and/or founder effects, though here we cannot distinguish between these. 
Finally, we investigated the percentage of private alleles (alleles unique to a single city). Within species, we found this percentage to be highest in the Phoenix populations of C. dactylon, E. canadensis, and L. serriola. The percentage of private alleles was lower for P. annua and T. officinale in Baltimore, for D. sanguinalis in Phoenix, and for L. serriola in Boston (Table <ref type="table">3</ref>).</p><p>We used AMOVA to determine the proportions of variance among cities, within cities, and within sites. In contrast to PCA and Structure analysis, AMOVA is less sensitive to differences caused by individual loci.</p><p>For all species, most of the variation was found within cities. Specifically, variation among sites within each city was proportionally greater than variation among cities. Differences among individuals within a specific site and city accounted for 22.4% to 31.4% of the total variation. Genetic variation among cities made up the smallest share of the variation, ranging from 3.1% (T. officinale) to 21.5% (D. sanguinalis) (Figure <ref type="figure">6</ref>, Table <ref type="table">S6</ref>). Specifically, C. dactylon and T. officinale were the most similar among cities (8.7% and 3.1%, respectively). In contrast, E. canadensis, L. serriola, and P. annua showed more variation among cities (15.2%, 14.8%, and 12.9%, respectively). Digitaria sanguinalis had the greatest among-city genetic variation (21.5%), but, as in the other species, a greater proportion of variation was explained by within-city differences (56.1%).</p></div>
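<div xmlns="http://www.tei-c.org/ns/1.0"><p>The among-city and within-city statistics named in this section have simple textbook forms at a single locus. The Python sketch below is illustrative only and is not the code used in the study. It uses the uncorrected (nominal) estimators with cities weighted equally, whereas software packages usually apply sample-size corrections and average over loci, so values will differ. FST is omitted. Percent private alleles is defined here as the share of a city's observed alleles found in no other city, which is our assumption. All function names are hypothetical.</p>

```python
from typing import Dict, List, Sequence, Set, Tuple


def gene_diversity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity, 1 - sum(p_i^2), at one locus in one population."""
    return 1.0 - sum(p * p for p in freqs)


def effective_alleles(freqs: Sequence[float]) -> float:
    """Effective number of alleles, A_E = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def f_is(h_obs: float, h_exp: float) -> float:
    """F_IS = 1 - H_O / H_E; positive means more homozygotes than expected."""
    return 1.0 - h_obs / h_exp


def hs_ht(pops: List[Sequence[float]]) -> Tuple[float, float]:
    """Mean within-city (H_S) and pooled (H_T) gene diversity, cities weighted equally."""
    k = len(pops)
    h_s = sum(gene_diversity(p) for p in pops) / k
    pooled = [sum(col) / k for col in zip(*pops)]
    return h_s, gene_diversity(pooled)


def nei_gst(pops: List[Sequence[float]]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T."""
    h_s, h_t = hs_ht(pops)
    return (h_t - h_s) / h_t


def jost_d(pops: List[Sequence[float]]) -> float:
    """Jost's D = [(H_T - H_S) / (1 - H_S)] * [k / (k - 1)] for k cities."""
    k = len(pops)
    h_s, h_t = hs_ht(pops)
    return (h_t - h_s) / (1.0 - h_s) * k / (k - 1)


def pct_private(alleles: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percent of each city's observed alleles that occur in no other city."""
    out = {}
    for city, own in alleles.items():
        others = set().union(*(a for c, a in alleles.items() if c != city))
        out[city] = 100.0 * len(own - others) / len(own)
    return out


if __name__ == "__main__":
    a, b = [0.8, 0.2], [0.2, 0.8]  # hypothetical biallelic frequencies in two cities
    print(round(nei_gst([a, b]), 3))  # 0.36
    print(round(jost_d([a, b]), 3))  # 0.529
```

<p>In the toy example, the two cities share both alleles but at mirrored frequencies. GST stays moderate, while Jost's D, which is scaled by within-city diversity, is larger. This is one reason the two measures are reported side by side.</p></div>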
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Isolation by distance and environment</head><p>When considering all cities together, geographic and environmental distances explained between 1% and 27% of the genetic variation in the six species (Figure <ref type="figure">7</ref>, Table <ref type="table">S7</ref>). Only E. canadensis had a significant overall MMRR model (p &lt; 0.001, Table <ref type="table">S7</ref>), although L. serriola was near our significance cutoff (p = 0.051, Table <ref type="table">S7</ref>). Both geographic and environmental distance were significant predictors for E. canadensis (Figure <ref type="figure">6</ref>, Table <ref type="table">S8</ref>). July soil temperature was the significant environmental predictor: differences in temperature were positively correlated with genetic differences (coefficient = 0.56, Figure <ref type="figure">7</ref>, Table <ref type="table">S8</ref>). In contrast, geographic distance was negatively correlated with genetic distance for E. canadensis. Although the overall model was not significant, geographic distance was also negatively correlated with genetic distance in L. serriola. Aspects of the urban environment, such as % impervious surface and distance to the city center, were not significant predictors of genetic distance for any species.</p><p>Effects of IBD and IBE largely disappeared within cities. Only two of the 24 species-city combinations had a significant overall MMRR model: C. dactylon in Phoenix and T. officinale in Los Angeles (Table <ref type="table">S9</ref>).</p><p>For C. dactylon in Phoenix, genetic distance was positively correlated with distance to the city center (Table <ref type="table">S10</ref>). For T. officinale in Los Angeles, genetic distance was negatively correlated with environmental distance (April soil temperature, Table <ref type="table">S10</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>Here, we studied the genetic patterns of six cosmopolitan plant species sampled from five cities in the USA with varying environmental conditions. For these widespread species, we investigated the degree to which city populations were genetically homogeneous or differentiated by city, degree of urbanization, or a combination of these drivers. Overall, we found a few generalizable patterns across species. Most genetic variation was found within cities, with less variation explained by city. However, Phoenix populations tended to be distinct for some species. Additionally, we found little evidence for a relationship between genetic differences and impervious surface area as a proxy for urban environments. In general, we found that homogenization at the genetic level depended on the species in question, suggesting that human activities specific to individual plant species play a role in genetic differentiation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Species-dependent differentiation</head><p>Human activity has contributed to species homogenization across cities <ref type="bibr">(Groffman et al., 2014;</ref><ref type="bibr">Padull&#233;s Cubino et al., 2020)</ref>. However, at the genetic level, we found mixed patterns. We found the strongest evidence for genetic differentiation by city (i.e., Figure <ref type="figure">1b</ref>) for E. canadensis, the only native species in this study. To a lesser extent, L. serriola was also distinct by city, except for Boston. We found D. sanguinalis and C. dactylon to be more genetically homogeneous and less distinct (i.e., Figure <ref type="figure">1a</ref>). Although some group assignments were unique to Baltimore, Los Angeles, and Phoenix, many C. dactylon individuals showed mixed ancestry. We found little evidence for genetic differentiation in P. annua or T. officinale, either by city or by degree of urbanization. Both species are likely to be dispersed by humans and present in seed mixes <ref type="bibr">(Buddenhagen et al., 2023;</ref><ref type="bibr">Conn, 2012)</ref>.</p><p>We found no evidence for differentiation by urbanization, nor by urbanization and city, for any species (Figure <ref type="figure">1c</ref>, <ref type="figure">1d</ref>). Despite the city patterns we observed, the overall signal of differentiation among cities was weak, with low variation overall and most variation found among sites within individual cities.</p><p>Low levels of differentiation agree with evidence suggesting that cosmopolitan plants experience high gene flow in urban habitats <ref type="bibr">(Caizergues et al., 2024;</ref><ref type="bibr">Smith et al., 2020)</ref>. 
Five out of six species lacked evidence for IBE and IBD, also indicating potentially high levels of gene flow facilitated by human activity.</p><p>Importantly, we used a 10 × 10 km resolution for our environmental measures; patterns of IBE not detected here may be present at smaller scales and/or with different environmental variables.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Differentiation of Phoenix populations</head><p>Among the cities included in this study, Phoenix has a climate that is both hot and dry, with water resources modified through irrigation <ref type="bibr">(Hope et al., 2006)</ref>. We observed two different genetic patterns among our species in Phoenix. First, D. sanguinalis had a lower effective number of alleles, higher linkage disequilibrium, and a lower percentage of private alleles in Phoenix relative to other cities. Structure and sNMF showed strong single-group assignment, suggesting a largely genetically uniform sample. These results could indicate a founder effect, a bottleneck, and/or environmental filtering due to the hot, dry climate (Markert et al., 2010). Additionally, D. sanguinalis in Phoenix might have experienced city-specific management practices. In contrast, E. canadensis and L. serriola individuals collected in Phoenix were distinguished by a greater percentage of private alleles and somewhat greater linkage disequilibrium. Structure and sNMF results also indicated unique group assignments for Phoenix relative to the other cities. While this might have occurred due to barriers to gene flow, local adaptation might also explain this finding. For E. canadensis, this is consistent with research by Bhattacharya et al. (2022), who found two distinct yet diverse subgroups of E. canadensis. Plants sampled from cooler areas in northern Alabama, USA, formed a genetic cluster distinct from that of plants found in southern Alabama. Greater heterozygosity than expected might also contribute to more rapid environmental response in E. canadensis <ref type="bibr">(Aparecida et al., 2012)</ref>. Lactuca serriola in the western USA (California, Arizona) has also been found to be genetically distinct relative to the rest of the country <ref type="bibr">(Lebeda et al., 2011</ref><ref type="bibr">, 2012)</ref>. There was also some evidence that T. officinale individuals from Phoenix might be less diverse, as suggested by a comparatively low effective number of alleles and higher linkage disequilibrium, though this could be due to the smaller number of individuals sampled. Aside from Phoenix's hot and dry climate, several factors could prevent genetic homogenization. These include the relatively young age of the city <ref type="bibr">(Potgieter et al., 2024)</ref>, distance from the point of introduction, and "island effects" in arid cities caused by irrigation <ref type="bibr">(Grijseels et al., 2023)</ref>. These factors would mean fewer introductions, greater isolation by distance from the East Coast of the US, and greater resistance to colonization, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Human behavior and genetic patterns</head><p>Human behaviors are a key component of community assembly and the distribution of species, especially in cities <ref type="bibr">(Avolio et al., 2021)</ref>. For example, urbanization leads to homogenization of C3 grass species, some homogenization of C4 grass species, and greater abundance of some C4 species in warmer cities <ref type="bibr">(Trammell et al., 2019)</ref>. This is likely driven by increased, often commercial, dispersal <ref type="bibr">(Beard, 2012;</ref><ref type="bibr">Busey, 2003)</ref>.</p><p>Poa annua (a C3 grass), C. dactylon, and T. officinale exhibited low overall differentiation among cities, with little population structure (i.e., Figure <ref type="figure">1a</ref>). Other studies have also found a lack of population structure, high gene flow, and/or panmixia for P. annua and T. officinale <ref type="bibr">(Androsiuk et al., 2019;</ref><ref type="bibr">Chen et al., 2003;</ref><ref type="bibr">Mazumder &amp; Kesseli, 2021)</ref>. Again, the lack of structure may be due to both P. annua and T. officinale being common contaminants of commercially distributed turfgrass seed mixes <ref type="bibr">(Conn, 2012)</ref>. Additionally, T. officinale can be grown for medicinal uses, and its seeds can be purchased <ref type="bibr">(Stewart-Wade et al., 2002)</ref>. While C. dactylon is genetically differentiated at the global scale, many North American individuals have mixed ancestry <ref type="bibr">(Singh et al., 2023;</ref><ref type="bibr">Zhang et al., 2019</ref><ref type="bibr">, 2021)</ref>. Cynodon dactylon is also sold as a turfgrass in the US, which might lead to a lack of differentiation <ref type="bibr">(Taliaferro, 1995)</ref>. 
While our genome-wide approach suggests genetic homogenization in these species and cities, phenotypically relevant genetic differentiation may still be occurring in genes not captured by our loci.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Native status and genetic structure</head><p>High gene flow resulting from frequent introductions might lead to reduced genetic structure and limited local adaptation within species <ref type="bibr">(Smith et al., 2020)</ref>. Erigeron canadensis, the only native species in this study, had the strongest separation among cities based on the population summary statistics, PCA, and pairwise rho. Among cities where it was present, E. canadensis had among the highest percentages of private alleles. We also observed a significant MMRR model, with July soil temperature and geographic distance being linked to genetic distance in E. canadensis.</p><p>These results could indicate that E. canadensis responds to July soil temperatures, especially in Phoenix. Previous work suggests that genetic differentiation of E. canadensis populations is correlated with climate, particularly aridity (Rosche et al., 2019). Despite the genetic differences observed, a significant negative geographic distance coefficient suggests greater similarity than expected across distances, indicating that gene flow is likely to still be occurring. Interestingly, previous work has shown low diversity and high rates of selfing in E. canadensis (Rosche et al., 2019), which could reinforce small amounts of genetic variation across distance and environment. Overall, these results suggest that observing genetic structure in cosmopolitan species' native ranges can provide valuable information about idiosyncratic responses. More work is needed to understand if this pattern is generalizable or unique to E. canadensis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Importance of local environments</head><p>Local environments and life history traits are likely important in shaping the genetic structure and differentiation of the species in this study. For all species, most of the genetic variation could be attributed to differences within cities. Individuals also varied considerably within sites. This could indicate that, instead of responding to landscape-level environmental variables, species may be responding to microenvironments within cities <ref type="bibr">(Calfapietra et al., 2015)</ref>. Examples include soil depth/quality, soil contaminants, local temperature or soil moisture variation, and the presence of competing species. In four of our species, we also found a higher proportion of homozygotes and evidence for linkage disequilibrium, potentially indicating high rates of asexual reproduction. Baker's rule states that for plants colonizing a new area outside their natural range (e.g., most spontaneous urban plants), self-compatibility is advantageous given the potential lack of pollinators <ref type="bibr">(Baker, 1974;</ref><ref type="bibr">Kalisz &amp; Vogler, 2003)</ref>. Low genetic diversity and differentiation could be due to self-compatibility or clonal reproduction in the species in this study. Finally, we observed the most city-level population structure in E. canadensis and L. serriola, the two diploid species. Diploids are expected to lose alleles more quickly than polyploids, leading to more rapid differentiation. Alternatively, polyploids could be more successful at colonizing and re-colonizing different cities <ref type="bibr">(te Beest et al., 2012)</ref>, helping them maintain higher genetic diversity. Ultimately, local environments and functional traits could be more influential for genetic patterns in cosmopolitan species than the regional variables used here.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>Our results suggest that genetic patterns of urban plants might be driven less by urban environments (i.e., percent impervious surface) and more by human actions and species' life history. We detected no genetic patterns associated with percent impervious surface across the six species in our study. However, other proxies of urbanization, such as those that focus on fine-scale local environments, might reveal different results. Our work supports the finding that urban evolution is complex, with colonization events and gene flow playing especially important roles for cosmopolitan species <ref type="bibr">(Johnson &amp; Munshi-South, 2017;</ref><ref type="bibr">Miles et al., 2019)</ref>. Urban evolution studies should give careful consideration to each species in question. This includes its reproductive/dispersal strategy and any life history traits that might increase the genetic similarity of individuals across cities. In particular, urban evolution studies should consider human activity. Accounting for these differences will enable better prediction of plant evolution and adaptation to rapidly changing urban landscapes.</p><p>Buyarski, Sophia Hahn, Ben Huber, Hannah Stellrecht, Kyle TePoel, Sara Nelson and Hannah Weisner for field assistance. For sampling in Phoenix, we thank Darin Jenke, Erik Nelson, Hannah Heavenrich, Alyssa Bailey, Caitlin Ribeiro, Christal Beauclaire-Reyes, Matthew Minjares, Randy Fulford, Amy Smeester, Manas Subberaman, Jack Oberhaus, and Laura Steger. We thank Mary Phillips and Erin Sweeney from the National Wildlife Federation for help accessing Wildlife Certified&#169; yards. This research was supported by the National Science Foundation Macrosystems Biology program, grants DEB-1638519 (Minneapolis-St. 
Paul to SEH and JCB), DEB-163872, DEB-1637590, and DEB-1832016 (Phoenix to SJH), DEB-163856 (Boston to Christopher Neill), DEB-1638648 (Baltimore to PMG), DEB-1638606 and EF-1638676 (Los Angeles to TLET), and DEB-1836034 (for the genetic work to MLA). Additional support came from a US NSF Division of Environmental Biology grant on evolutionary processes at Long-Term Ecological Research sites to MLA and AMH (DEB-2110351). Computational steps were carried out at the Advanced Research Computing at Hopkins (ARCH) core facility (rockfish.jhu.edu), which is supported by the National Science Foundation (NSF) grant number OAC 1920103.</p><p>Tables <ref type="table">Table 1:</ref> Species-level attributes of the plant species in this study, including life history traits and native status in the USA. Note that pollination tracks functional type among these species: grasses are wind-pollinated, while forbs are insect-pollinated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>In this experiment, we used quaddRAD library prep to prepare the sample DNA. This means that each sample (over 1,700 in total) was identified by a unique combination of two outer barcodes (typical Illumina barcodes) AND two inner barcodes (random barcode bases inside the adapters).</p><p>The sequencing facility demultiplexes samples based on the outer barcodes (typically called i5nn and i7nn). Once this is done, each file still contains a mix of the inner barcodes. We will refer to these as "sublibraries" because they are only halfway demultiplexed. We separate them out bioinformatically later.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="0.1">Raw Data File Naming: Sublibraries</head><p>Here's a bit of information on the file name convention. The typical raw file looks like this:</p><p>AMH_macro_1_1_12px_S1_L001_R1_001.fastq.gz</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>AMH_macro</head><p>&#8226; These are author initials, and "macro" stands for "Macrosystems". These are on every file.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1_1</head><p>&#8226; The first number is the i5nn barcode for the given sublibrary. We know all these samples have an i5nn barcode of "1", so that narrows down what they can be. The second number is the i7nn barcode for the given sublibrary. We know all these samples have an i7nn barcode of "1", so that further narrows down what they can be.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>12px</head><p>&#8226; This refers to how many samples are in the sublibrary. "12px" means 12-plexed, or 12 samples. In other words, we will use the inner barcodes to further distinguish 12 unique samples in this sublibrary.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>S1</head><p>&#8226; This is a unique sublibrary name. S1 = i5nn 1 and i7nn 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>L001</head><p>&#8226; This means this particular file came from lane 1 of the NovaSeq. There are four lanes, and all samples should appear across all four lanes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>R1</head><p>&#8226; This is the first (R1) of two paired-end reads (R1 and R2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>001.fastq.gz</head><p>&#8226; The last part doesn't mean anything; it was just added automatically before the file suffix (fastq.gz).</p></div>
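<div xmlns="http://www.tei-c.org/ns/1.0"><p>Because every raw file follows the convention above, its fields can be recovered mechanically. This is convenient for grouping files by sublibrary or by lane. The Python sketch below assumes that all raw files match the example name exactly: the AMH_macro prefix, a plex field ending in "px", a three-digit lane, and the trailing 001. The function name parse_raw_name and the dictionary keys are hypothetical and are not part of the project's scripts.</p>

```python
import re
from typing import Dict, Union

# Captured fields, in order: i5nn, i7nn, plex level, S number, lane, read.
RAW_NAME = re.compile(r"AMH_macro_(\d+)_(\d+)_(\d+)px_S(\d+)_L(\d{3})_R([12])_001\.fastq\.gz")


def parse_raw_name(name: str) -> Dict[str, Union[int, str]]:
    """Split a raw sublibrary file name into its fields."""
    m = RAW_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected raw file name: {name}")
    i5, i7, plex, s_num, lane, read = m.groups()
    return {
        "sublibrary": f"{i5}_{i7}",  # e.g. "1_1"
        "i5nn": int(i5),
        "i7nn": int(i7),
        "n_samples": int(plex),      # 12 for "12px"
        "s_name": f"S{s_num}",
        "lane": int(lane),           # 1-4 on the NovaSeq flow cell
        "read": int(read),           # 1 = R1, 2 = R2
    }


if __name__ == "__main__":
    print(parse_raw_name("AMH_macro_1_1_12px_S1_L001_R1_001.fastq.gz"))
```

<p>Raising an error on unexpected names, rather than silently skipping them, makes mistyped or stray files visible before they are concatenated.</p></div>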
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="0.2">A Note on File Transfers</head><p>There are three main systems at play for file transfer: the local machine, the sequencing facility's (GRCF) Aspera server, and MARCC. The Aspera server is where the data were/are stored immediately after sequencing.</p><p>MARCC is where we planned to do preprocessing and analysis. Scripts and text files are easiest to edit on a local machine, so we used Globus to transfer these small files from the local machine to MARCC.</p><p>Midway through these analyses, we transitioned to another cluster, JHU's Rockfish. The scripts below, with the exception of the file transfer from the Aspera server, should reflect the new filesystem, though you will have to adjust the file paths accordingly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="0.3">A Note on Species Names</head><p>Throughout this study, we examined six species. We sometimes used abbreviations for easier file naming. These are:</p><p>1. Cynodon dactylon, "CD", or Bermuda grass</p><p>2. Digitaria sanguinalis, "DS", or crabgrass</p><p>3. Erigeron canadensis, "EC", or horseweed</p><p>4. Lactuca serriola, "LS", or prickly lettuce</p><p>5. Poa annua, "PA", or annual bluegrass</p><p>6. Taraxacum officinale, "TO", or dandelion</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">File Transfer</head><p>Referred to throughout the files as "Step 1". Files can be found in the 01_transfer_files/ directory.</p><p>This directory contains files named using the convention 01-aspera_transfer_n.txt. These are text files containing the names of the fastq.gz files that we wanted to transfer from the sequencing facility's Aspera server to the computing cluster (MARCC). Because transfers could take a long time, this made it easy to transfer only certain files at once, and we did this piecemeal. Possible file names are shown in Aspera Transfer File Names. There are several of these files so that transfers could be parallelized (replace n with the correct number in the command below). Each text file will need to be uploaded to your scratch directory on MARCC.</p><p>Files were then transferred using the following commands. Before starting, make sure you are on a data transfer node. Then, load the aspera module. Alternatively, you can install the Aspera transfer software and use that.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>module load aspera</p><p>Initiate the transfer from within your scratch directory:</p><p>ascp -T -l8G -i /software/apps/aspera/3.9.1/etc/asperaweb_id_dsa.openssh --file-list=01-aspera_transfer_n.txt --mode=recv --user=&lt;aspera-user&gt; --host=&lt;aspera-IP&gt; /scratch/users/&lt;me&gt;@jhu.edu</p></div>
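<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-transfer file lists used with --file-list can also be generated rather than written by hand. The Python sketch below is a hypothetical helper, not one of the project's scripts. It spreads file names over n lists named 01-aspera_transfer_n.txt, one name per line, and keeps all lane and read files from the same sublibrary in the same list so that each sublibrary arrives together. It assumes the names are written exactly as ascp should see them on the Aspera server.</p>

```python
import re
from pathlib import Path
from typing import Dict, List, Sequence

# Strips the lane/read/suffix part, leaving e.g. "AMH_macro_1_1_12px_S1".
LANE_READ = re.compile(r"_L\d{3}_R[12]_001\.fastq\.gz$")


def write_transfer_lists(names: Sequence[str], n_lists: int, out_dir: str = ".") -> List[Path]:
    """Write 01-aspera_transfer_1.txt ... 01-aspera_transfer_N.txt, one file name
    per line, keeping every file of a sublibrary in the same list."""
    groups: Dict[str, List[str]] = {}
    for name in sorted(names):
        groups.setdefault(LANE_READ.sub("", name), []).append(name)
    buckets: List[List[str]] = [[] for _ in range(n_lists)]
    for i, key in enumerate(sorted(groups)):
        buckets[i % n_lists].extend(groups[key])  # round-robin by sublibrary
    paths = []
    for i, bucket in enumerate(buckets, start=1):
        path = Path(out_dir) / f"01-aspera_transfer_{i}.txt"
        path.write_text("".join(line + "\n" for line in bucket))
        paths.append(path)
    return paths
```

<p>Each resulting list would then be passed to its own ascp run through --file-list, as in the command above.</p></div>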
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">File Concatenation and Stacks Installation</head><p>Referred to throughout the files as "Step 2". Files can be found in the 02_concatenate_and_check/ directory.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Concatenate Files for each Sublibrary</head><p>Step 2a. We ran our samples across the whole flow cell of the NovaSeq, so results came in 8 files for each demultiplexed sublibrary (4 lanes × 2 paired-end reads). For example, for sublibrary 1_1, we'd see the following 8 files: AMH_macro_1_1_12px_S1_L001_R1_001.fastq.gz AMH_macro_1_1_12px_S1_L001_R2_001.fastq.gz AMH_macro_1_1_12px_S1_L002_R1_001.fastq.gz AMH_macro_1_1_12px_S1_L002_R2_001.fastq.gz AMH_macro_1_1_12px_S1_L003_R1_001.fastq.gz AMH_macro_1_1_12px_S1_L003_R2_001.fastq.gz AMH_macro_1_1_12px_S1_L004_R1_001.fastq.gz AMH_macro_1_1_12px_S1_L004_R2_001.fastq.gz</p><p>The 02_concatenate_and_check/02-concat_files_across4lanes.sh script finds all files in the working directory with the name pattern *_L001_*.fastq.gz and then concatenates across lanes 001, 002, 003, and 004 so they can be managed further. The "L001" part of the filename is then eliminated. For example, the 8 files above would become:</p><p>AMH_macro_1_1_12px_S1_R1.fastq.gz AMH_macro_1_1_12px_S1_R2.fastq.gz</p><p>Rockfish uses Slurm to manage jobs. To run the script, use the sbatch command. For example:</p><p>sbatch ~/code/02-concat_files_across4lanes.sh</p><p>This command runs the script from within the current directory but looks for and pulls the script from the code directory. It concatenates all files within the current directory that match the loop pattern.</p><p>Each sub-pooled library also has a demultiplexing file (04-demux/ directory) that contains the sample names and inner (i5 and i7) barcodes. For example, for sublibrary 1_1, we'd see the following barcode file:</p><p>The process_radtags command will demultiplex the data by separating each sublibrary into individual samples. It also cleans the data, removing low-quality reads and discarding reads where a barcode was not found.</p></div>
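<div xmlns="http://www.tei-c.org/ns/1.0"><p>The lane concatenation in Step 2a is done by 02-concat_files_across4lanes.sh, which is not reproduced here. As an illustration of the same idea, the Python sketch below groups files that differ only in their lane field and appends them in lane order. It relies on the fact that concatenated gzip files form a valid multi-member gzip stream, so no decompression is needed. The function name concat_lanes is hypothetical, and the function overwrites any existing output file with the same name.</p>

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Matches the lane, read, and trailing-001 part of a raw file name.
LANE = re.compile(r"_L00[1-4]_(R[12])_001\.fastq\.gz$")


def concat_lanes(directory: str) -> List[Path]:
    """Merge lane files into one file per sublibrary and read, e.g.
    AMH_macro_1_1_12px_S1_L00{1..4}_R1_001.fastq.gz -> AMH_macro_1_1_12px_S1_R1.fastq.gz."""
    groups: Dict[Path, List[Path]] = defaultdict(list)
    for f in sorted(Path(directory).glob("*_L00*_R*_001.fastq.gz")):
        if not LANE.search(f.name):
            continue  # unexpected lane number; never overwrite an input with itself
        groups[f.with_name(LANE.sub(r"_\1.fastq.gz", f.name))].append(f)
    for out, parts in groups.items():
        with open(out, "wb") as dst:
            for part in parts:  # sorted above, so lanes are written in order
                with open(part, "rb") as src:
                    # Byte-level append: concatenated gzip members stay valid gzip.
                    shutil.copyfileobj(src, dst)
    return sorted(groups)
```

<p>Byte-level appending keeps the step fast and memory-light. Tools that read gzip, including process_radtags, should see the merged file as a single stream of reads.</p></div>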
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Organize files</head><p>Step 4b. In a new directory, make sure the files are organized by species. In the process_radtags script, we specified that files be sent to ~/scratch/demux/*sublibrary_name* (the reasoning for this is in Step 4c). After process_radtags is run, the files should then be manually organized into species folders (i.e., ~/scratch/demux/*SPP*). For example, the file "DS.MN.L01-DS.M.1.1.fq.gz" should be moved to the ~/scratch/demux/DS directory.</p><p>Note: this step is not automated at this point, but automating the file moves would help ensure that it is not forgotten.</p></div>
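<div xmlns="http://www.tei-c.org/ns/1.0"><p>Since the species-sorting step is currently manual, the Python sketch below shows one possible way to automate it. It assumes that demultiplexed files sit in per-sublibrary folders directly under the demux directory. It also assumes that each file name begins with one of the six species codes followed by a period, as in DS.MN.L01-DS.M.1.1.fq.gz. The function name organize_by_species is hypothetical. Files whose prefix is not a known species code, including logs, are left where they are.</p>

```python
import shutil
from pathlib import Path
from typing import List

# Species codes from "A Note on Species Names".
SPECIES = {"CD", "DS", "EC", "LS", "PA", "TO"}


def organize_by_species(demux_root: str) -> List[Path]:
    """Move reads from demux/SUBLIBRARY/ into demux/SPP/, e.g.
    demux/1_1/DS.MN.L01-DS.M.1.1.fq.gz -> demux/DS/DS.MN.L01-DS.M.1.1.fq.gz."""
    root = Path(demux_root)
    moved = []
    for f in sorted(root.glob("*/*.fq.gz")):  # list is built before any move
        spp = f.name.split(".")[0]
        if spp not in SPECIES or f.parent.name == spp:
            continue  # unknown prefix, or already in its species folder
        dest = root / spp
        dest.mkdir(exist_ok=True)
        moved.append(Path(shutil.move(str(f), str(dest / f.name))))
    return moved
```

<p>Running it a second time is harmless, because files already inside their species folder are skipped.</p></div>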
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Assess the raw, processed, and cleaned data</head><p>Step 4c. In the script for Step 4a, we specified that a new output folder be created for each sublibrary. The output folder is where all sample files and the log file are written for each sublibrary. If you have multiple sublibraries, specify a different output folder for each one, because we will assess the output log for each sublibrary individually. Otherwise, the log is overwritten when the script loops to a new sublibrary.</p><p>The utility stacks-dist-extract can be used to extract data from the log file. First, we examined the library-wide statistics to identify sublibraries where barcodes may have been misentered or where sequencing errors may have occurred. We used:</p><p>stacks-dist-extract process_radtags.log total_raw_read_counts</p><p>This command pulls out the total number of sequences, the number of low-quality reads, whether barcodes were found, and the total number of retained reads per sublibrary. Look over these to make sure there are no outliers or sublibraries that need to be checked and rerun.</p><p>Next, we used:</p><p>stacks-dist-extract process_radtags.log per_barcode_raw_read_counts</p><p>This command shows how well each sample performed. There are three important statistics to consider for each sample.</p><p>1. The proportion of reads per sample in each sublibrary shows each individual's share of the reads processed and sequenced for that sublibrary. This is important to consider, because a single sample dominating a sublibrary may indicate contamination. (Field prop_sample_per_library).</p><p>2. The number of reads retained for each sample can be an indicator of coverage. It is usually a good idea to remove samples with a very low number of reads. Where to place the cutoff for low-coverage samples depends on your dataset. 
For example, a threshold of 1 million reads is often used, but this is not universal. (Field retained_reads).</p><p>3. The proportion of reads retained for each sample can also indicate low-quality samples and gives an idea of the variation in coverage across samples. (Field prop_reads_retained_per_sample).</p><p>Output for sublibraries from this step is summarized in process_radtags-library_output.csv.</p><p>Output for individual samples from this step is summarized in process_radtags-sample_output.csv.</p><p>The script 04c-process_radtags_stats.R was used to create plots for assessing each statistic.</p><p>Output from this step can be found in figures/process_radtags/, where figures are organized by species.</p><p>The script 04c-radtags_filter_summary.R summarizes the filtering results from all samples.</p><p>source("04_demux_filter/04c-radtags_filter_summary.R") make_filterplot()</p></div>
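<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three per-sample statistics can be recomputed from a per-sample read-count table. This Python sketch assumes a table with hypothetical column names (sublibrary, sample, total_reads, retained_reads) rather than the exact stacks-dist-extract output. It computes each sample's share of its sublibrary from total reads; whether the project's R scripts use total or retained reads for that share is an assumption not settled here.</p>

```python
from collections import defaultdict
from typing import Dict, List


def per_sample_stats(rows: List[dict]) -> List[dict]:
    """Add prop_sample_per_library and prop_reads_retained_per_sample to each row.

    Each row needs the (hypothetical) keys sublibrary, total_reads and
    retained_reads; retained_reads itself is the coverage indicator.
    """
    lib_totals: Dict[str, int] = defaultdict(int)
    for r in rows:
        lib_totals[r["sublibrary"]] += r["total_reads"]
    return [
        {
            **r,
            # Share of the sublibrary's reads that belong to this sample.
            "prop_sample_per_library": r["total_reads"] / lib_totals[r["sublibrary"]],
            # Fraction of this sample's reads that survived cleaning.
            "prop_reads_retained_per_sample": r["retained_reads"] / r["total_reads"],
        }
        for r in rows
    ]
```

<p>For scale, an even split across a 12-plexed sublibrary would give each sample about 8.3% of its reads. Values far above that flag a dominating sample, and values below 1% are the ones removed in Step 4d.</p></div>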
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Raw data availability</head><p>This is the point at which raw data are available, since it is where they were demultiplexed. Originally, our inventory had 1736 envelopes/samples indexed. Four envelopes had no leaves in them, and five samples were the wrong species (Taraxacum erythrospermum); these were excluded. We also had one sample that failed to yield any reads (LS.LA.MAR.U.1). It is unclear what happened with this sample: DNA concentration and amplification looked fine, so perhaps it was accidentally given the wrong barcodes. This leaves 1726 samples (1736 − 4 − 5 − 1) available on SRA. Note that many of these are excluded in subsequent steps due to low coverage and other filters.</p><p>Raw data are available here: <ref type="url">https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1359434</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Identify low-coverage and low-quality samples</head><p>Step 4d. Using the same output log and the above statistics, we removed low-coverage and low-quality samples that may skew downstream analyses.</p><p>Samples were identified and removed via the following procedure:</p><p>1. First, samples that represented less than 1% of the sequenced sublibrary were identified and removed. These samples correspond to low-read, low-coverage samples.</p><p>2. Next, a threshold of 1 million retained reads per sample was used to remove any remaining low-read samples. Low-read samples have low coverage and lack enough raw reads to contribute to downstream analyses.</p><p>Good/kept samples are listed in process_radtags-kept_samples.csv.</p><p>Discarded samples are listed in process_radtags-discarded_samples.csv.</p><p>source("04_demux_filter/04c-radtags_filter_summary.R") make_manual_discard_plot()</p></div>
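<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two-step filter can be expressed as a small function. This Python sketch assumes that each row already carries prop_sample_per_library (as a fraction) and retained_reads. The function name and the reason strings are hypothetical. Because the 1% rule is applied first, a sample that fails both checks is recorded under the first reason, mirroring the order of the procedure above.</p>

```python
from typing import Iterable, List, Tuple

MIN_SHARE = 0.01          # rule 1: at least 1% of the sequenced sublibrary
MIN_RETAINED = 1_000_000  # rule 2: at least 1 million retained reads


def split_kept_discarded(rows: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Apply the two filtering rules in order; discarded rows get a reason."""
    kept, discarded = [], []
    for r in rows:
        if not r["prop_sample_per_library"] >= MIN_SHARE:  # i.e. below 1%
            discarded.append({**r, "reason": "less than 1% of sublibrary"})
        elif not r["retained_reads"] >= MIN_RETAINED:
            discarded.append({**r, "reason": "fewer than 1 million retained reads"})
        else:
            kept.append(r)
    return kept, discarded
```

<p>The two returned lists correspond to the kept and discarded sample tables and could be written out with the csv module.</p></div>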
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Figure: samples kept versus samples discarded (less than 1% of sublibrary).</p><p>Note: At this point, we started using Stacks 2.62 for its multi-threading capabilities. The functionality of the previous steps should be the same, however.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Stacks: Metapopulation Catalog Building and Parameter Search</head><p>Files can be found in the 05_ustacks_and_params/ directory.</p><p>Going forward, the term metapopulation refers to the collection of all samples of a species across all cities where the species was present.</p><p>It is important to conduct preliminary analyses to identify an optimal set of parameters for the dataset (see Step 5a). Following parameter optimization, ustacks can be run to build loci in each sample (Step 5b); these loci are then merged into a catalog with cstacks (Step 6).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Run denovo_map</head><p>Step 5a. Stack assembly depends on several aspects of the dataset (such as the study species, the RAD-seq method used, and the quality and quantity of DNA), so it is important to use parameters that maximize the amount of biological data obtained from Stacks.</p><p>There are three main parameters to consider:</p><p>1. m: the minimum number of raw reads required to form a stack (implemented in ustacks)</p><p>2. M: the number of mismatches allowed between stacks to merge them into a putative locus (implemented in ustacks)</p><p>3. n: the number of mismatches allowed between stacks to merge them into the catalog (implemented in cstacks)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>There are two main ways to optimize these parameters:</p><p>1. an iterative method where you change each parameter in turn while keeping the others fixed (described in Paris et al. 2017), or</p><p>2. an iterative method where you change the values of M and n together (keeping M = n) while fixing m = 3, and then test m = 2 and m = 4 once the optimal M = n is determined (described in Rochette and Catchen 2017, Catchen 2020).</p><p>We used the second method, running denovo_map.pl through the denovo_map.sh script for each iteration. This script requires that we first choose a subset of samples to run the iterations on. These samples should be representative of the overall dataset, meaning they should include all populations and have similar read coverage. Read coverage can be assessed from the descriptive statistics produced in Step 4c.</p><p>Place these samples in a text file (popmap_test_samples.txt), listing each sample name and assigning all samples to the same population. For example, popmap_test_samples.txt should look like this:</p><p>DS.BA.GA.U.1 A
DS.PX.BUF.M.5 A
DS.BO.HC4.M.1 A
...</p><p>It is important to treat all representative samples as one population because you will assess the loci found across 80% of the individuals. The script passes this file to denovo_map.pl through the --popmap argument.</p><p>The script also requires an output directory, specified after -o. This should be unique to the parameter value you are testing; for example, if you are testing M = 3, you could create a subdirectory named stacks.M3 to hold all outputs from denovo_map.sh. Otherwise, the outputs will be overwritten at each iteration and you will lose the log from the previous iteration. 
The denovo_map.sh script also requires the location of your samples, which is the directory built in Step 4b. Make sure to include the --min-samples-per-pop 0.80 argument.</p><p>To decide which parameters to use, examine the following from each iteration:</p><p>1. The average sample coverage, obtained from the summary log in the ustacks section of denovo_map.log. If samples have a coverage below 10x, you will have to rethink the parameters used here.</p><p>2. The number of assembled loci shared by 80% of samples. This can be found by counting the loci in populations.haplotypes.tsv:</p><p>cat populations.haplotypes.tsv | grep -v "^#" | wc -l</p><p>3. The number of polymorphic loci shared by 80% of samples. This can be found in populations.sumstats.tsv or by counting the loci in populations.hapstats.tsv:</p><p>cat populations.hapstats.tsv | grep -v "^#" | wc -l</p><p>4. The number of SNPs per locus shared by 80% of samples. This can be found in denovo_map.log or by counting the SNPs in populations.sumstats.tsv:</p><p>cat populations.sumstats.tsv | grep -v "^#" | wc -l</p><p>The script 05a-param_opt-figures_script.R was used to create plots for assessing the change in shared loci across parameter iterations.</p><p>Based on this optimization step, we used the following parameters:</p><p>Table S1: Final parameter optimization values for the Stacks pipeline.</p><table><row role="label"><cell>Species</cell><cell>M (locus mismatches)</cell><cell>n (catalog mismatches)</cell><cell>m (minimum reads)</cell></row><row><cell>CD</cell><cell>8</cell><cell>8</cell><cell>3</cell></row><row><cell>DS</cell><cell>10</cell><cell>10</cell><cell>3</cell></row><row><cell>EC</cell><cell>8</cell><cell>8</cell><cell>3</cell></row><row><cell>LS</cell><cell>7</cell><cell>7</cell><cell>3</cell></row><row><cell>PA</cell><cell>5</cell><cell>5</cell><cell>3</cell></row><row><cell>TO</cell><cell>6</cell><cell>6</cell><cell>3</cell></row></table></div>
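<div xmlns="http://www.tei-c.org/ns/1.0"><p>The M = n sweep and the line counts above can be sketched in Python. This is an illustration, not the project's denovo_map.sh: the range of M = n values (1 to 10), the ./samples path and the helper names are hypothetical, and the flags follow our reading of the Stacks 2 denovo_map.pl usage, so check denovo_map.pl --help for your installed version before running anything. The sketch only builds the command lines, giving each iteration its own output directory (stacks.M1, stacks.M2, ...) so that logs are not overwritten.</p><p>
```python
import os
import tempfile


def sweep_commands(samples_dir, popmap="popmap_test_samples.txt", mn_values=range(1, 11), m=3):
    """One denovo_map.pl argument list per M = n value, each with its own output directory."""
    commands = []
    for value in mn_values:
        commands.append([
            "denovo_map.pl",
            "--samples", samples_dir,
            "--popmap", popmap,            # all test samples assigned to one population
            "-o", f"stacks.M{value}",      # unique directory per iteration
            "-M", str(value),
            "-n", str(value),              # keep M = n
            "-m", str(m),                  # m fixed at 3 during the M = n sweep
            "--min-samples-per-pop", "0.80",
        ])
    return commands


def count_records(path):
    """Count lines that do not start with '#', like: cat FILE | grep -v "^#" | wc -l"""
    with open(path) as handle:
        return sum(1 for line in handle if not line.startswith("#"))


# Usage: build the sweep, then count records in a small made-up TSV.
cmds = sweep_commands("./samples")
print(" ".join(cmds[2]))  # the M = n = 3 iteration

with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("# header line\nlocus_1\tdata\nlocus_2\tdata\n")
n_records = count_records(tmp.name)
os.remove(tmp.name)
print(n_records)  # 2
```
</p><p>Once m = 3 and the best M = n are known, the same function can be called with mn_values fixed to that value and m = 2 or m = 4 to finish the search described above.</p></div>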
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Run ustacks</head><p>Step 5b. ustacks builds de novo loci in each individual sample. We designed the ustacks step so that it requires three files:</p><p>&#8226; 05-ustacks_n.sh: the shell script that executes ustacks &#8226; 05-ustacks_id_n.txt: the sample ID numbers &#8226; 05-ustacks_samples_n.txt: the sample names that correspond to the sample IDs</p><p>The sample ID should be derived from the order_id column (first column) of the master sample spreadsheet. It is unique (1-1736) across all samples.</p><p>The sample name is the name corresponding to each sample ID in the spreadsheet; e.g., sample ID "9" corresponds to sample name "DS.BA.DHI.U.4". The sample naming convention is:</p><p>species.city.site.management_type.replicate_plant</p><p>05-ustacks_n.sh should have an out_directory (-o option) that is used for all samples (e.g., stacks/ustacks). Files can be processed piecemeal into this directory. There should be three files for every sample in the output directory:</p><p>&#8226; &lt;samplename&gt;.alleles.tsv.gz &#8226; &lt;samplename&gt;.snps.tsv.gz &#8226; &lt;samplename&gt;.tags.tsv.gz</p><p>Multiple versions of the 05-ustacks_n.sh script can be run in parallel (replace n in the three file names above with the appropriate number).</p><p>A small number of samples (13) were discarded at this stage because ustacks was unable to form any primary stacks corresponding to loci. See output/ustacks-discarded_samples.csv.</p><p>Table S2: Summary of samples discarded at the ustacks step of the Stacks pipeline. Numbers reflect the mean per sample.</p><table><row role="label"><cell>ustacks discarded</cell><cell>Retained reads</cell><cell>Proportion of sub-library</cell></row><row><cell>no</cell><cell>10075192</cell><cell>0.1608982</cell></row><row><cell>yes</cell><cell>7490510</cell><cell>0.1230658</cell></row></table></div>
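<div xmlns="http://www.tei-c.org/ns/1.0"><p>A small Python sketch of the naming convention can help check names before they go into 05-ustacks_samples_n.txt. The helper names are hypothetical; the only assumption is that each name has exactly five dot-separated fields, which holds for the complete names shown in this document (site codes such as LH-1 or L02-DS contain hyphens, not dots).</p><p>
```python
from collections import namedtuple

# Field names follow the convention species.city.site.management_type.replicate_plant
SampleName = namedtuple(
    "SampleName", ["species", "city", "site", "management_type", "replicate_plant"]
)


def parse_sample_name(name):
    """Split a sample name such as DS.BA.DHI.U.4 into its five dot-separated fields."""
    fields = name.split(".")
    if len(fields) != 5:
        raise ValueError(f"expected 5 dot-separated fields, got {len(fields)}: {name!r}")
    return SampleName(*fields)


def ustacks_outputs(name):
    """The three files ustacks should leave in the output directory for one sample."""
    return [f"{name}.{kind}.tsv.gz" for kind in ("alleles", "snps", "tags")]


# Usage: site codes may contain hyphens but never dots.
print(parse_sample_name("DS.BA.DHI.U.4"))
print(parse_sample_name("DS.MN.L02-DS.M.3").site)  # L02-DS
print(ustacks_outputs("DS.BA.DHI.U.4"))
```
</p><p>Rejecting names with a sixth field also flags the spurious ".1" suffix that Stacks adds later in the pipeline.</p></div>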
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Correct File Names</head><p>Step 5c. This step uses the script 05b-fix_filenames.sh, which applies some simple regular expressions to fix filenames output in previous steps. At some point, Stacks appends an extra ".1" to the end of the sample name, which is not meaningful. For example, the following files:</p><p>&#8226; DS.MN.L02-DS.M.3.1.alleles.tsv.gz &#8226; DS.MN.L03-DS.U.2.1.tags.tsv.gz &#8226; DS.MN.L09-DS.U.1.1.snps.tsv.gz</p><p>become:</p><p>&#8226; DS.MN.L02-DS.M.3.alleles.tsv.gz &#8226; DS.MN.L03-DS.U.2.tags.tsv.gz &#8226; DS.MN.L09-DS.U.1.snps.tsv.gz</p><p>The script currently produces some unusual log output, so it could probably be improved. It should be run from the directory where the changes need to be made. Files that have already been fixed will not be changed.</p></div>
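<div xmlns="http://www.tei-c.org/ns/1.0"><p>The renaming rule can be sketched in Python. This is not the project's 05b-fix_filenames.sh; it assumes sample names always have exactly five dot-separated fields (the convention in Step 5b). That assumption is what lets it leave already-fixed files alone, including names such as DS.MN.L09-DS.U.1.snps.tsv.gz whose replicate number is itself 1, where a naive "strip a trailing .1" rule would damage the name.</p><p>
```python
import os
import re

# Sample name, then one of the three ustacks output kinds, then .tsv.gz
PATTERN = re.compile(r"^(.+)\.(alleles|snps|tags)\.tsv\.gz$")


def fixed_name(filename):
    """Drop the spurious trailing '.1' that Stacks appends to a five-field sample name."""
    match = PATTERN.match(filename)
    if match is None:
        return filename  # not a per-sample ustacks file; leave it alone
    fields = match.group(1).split(".")
    # A correct name has five fields (species.city.site.management_type.replicate_plant).
    # Six fields ending in "1" means Stacks added the suffix; five fields means the file
    # is already fixed, even when its replicate number happens to be 1.
    if len(fields) == 6 and fields[-1] == "1":
        fields = fields[:-1]
    return ".".join(fields) + "." + match.group(2) + ".tsv.gz"


def fix_directory(path, dry_run=True):
    """Return the (old, new) renames for files in path; apply them only if dry_run is False."""
    changes = []
    for name in sorted(os.listdir(path)):
        new = fixed_name(name)
        if new != name:
            changes.append((name, new))
            if not dry_run:
                os.rename(os.path.join(path, name), os.path.join(path, new))
    return changes


for name in ["DS.MN.L02-DS.M.3.1.alleles.tsv.gz", "DS.MN.L09-DS.U.1.1.snps.tsv.gz",
             "DS.MN.L09-DS.U.1.snps.tsv.gz"]:
    print(name, "->", fixed_name(name))
```
</p><p>Running fix_directory with the default dry_run=True first lists the planned renames without touching any files.</p></div>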
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Choose catalog samples/files</head><p>Step 5d. In this step, we choose the files that will go into the catalog. This involves a few steps:</p><p>1. Create a meaningful directory name. This could be the date (e.g., stacks_22_01_25).</p><p>2. Copy the ustacks output from Step 5b for all of the samples you want to use in the reference catalog into this directory. Remember that this includes three files per sample, so if you want to include 20 samples in the reference catalog, you will copy 3 x 20 = 60 files. The three files per sample should follow this convention: &#8226; &lt;samplename&gt;.alleles.tsv.gz &#8226; &lt;samplename&gt;.snps.tsv.gz &#8226; &lt;samplename&gt;.tags.tsv.gz</p><p>3. Remember the meaningful directory name. You will need it in Step 6.</p></div>
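<div xmlns="http://www.tei-c.org/ns/1.0"><p>Copying the catalog files can be scripted. The Python sketch below copies the three ustacks files for each chosen sample into the dated catalog directory and stops before copying anything if a file is missing, so that cstacks is not run on an incomplete set. The function name, directory names and sample choice are hypothetical, and the usage runs in a temporary directory with empty placeholder files.</p><p>
```python
import os
import shutil
import tempfile

SUFFIXES = (".alleles.tsv.gz", ".snps.tsv.gz", ".tags.tsv.gz")


def stage_catalog_files(ustacks_dir, catalog_dir, samples):
    """Copy the three ustacks files per sample into catalog_dir; return how many were copied."""
    sources = [os.path.join(ustacks_dir, s + suffix) for s in samples for suffix in SUFFIXES]
    missing = [path for path in sources if not os.path.exists(path)]
    if missing:
        # Fail before copying anything, so the catalog directory is never half-filled.
        raise FileNotFoundError(f"missing ustacks outputs: {missing}")
    os.makedirs(catalog_dir, exist_ok=True)
    for path in sources:
        shutil.copy2(path, catalog_dir)
    return len(sources)


# Usage in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    ustacks_dir = os.path.join(root, "ustacks")
    catalog_dir = os.path.join(root, "stacks_22_01_25")
    os.makedirs(ustacks_dir)
    chosen = ["DS.BA.GA.U.4", "DS.BO.HC4.M.5"]
    for s in chosen:
        for suffix in SUFFIXES:
            open(os.path.join(ustacks_dir, s + suffix), "w").close()
    n_copied = stage_catalog_files(ustacks_dir, catalog_dir, chosen)
    n_in_catalog_dir = len(os.listdir(catalog_dir))
print(n_copied, n_in_catalog_dir)  # 6 6
```
</p><p>The returned count should always equal three times the number of samples, which matches the 3 x 20 = 60 arithmetic above.</p></div>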
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Stacks: Metapopulation catalog with cstacks</head><p>Files can be found in the 06_cstacks/ directory.</p><p>cstacks builds the locus catalog from all the specified samples. The accompanying script, cstacks_SPECIES.sh, is relatively simple because it points to the directory containing all the sample files, using this format:</p><p>cstacks -P ~/directory ...</p><p>Make sure that you use the meaningful directory from Step 5d and that you have copied all the relevant files into it; otherwise, this causes problems downstream. For example, you might edit the code to point to ~/scratch/stacks/stacks_22_01_25:</p><p>cstacks -P ~/scratch/stacks/stacks_22_01_25 ...</p><p>The main difficulty is ensuring enough memory to run the entire process successfully; there is probably room to optimize this process.</p><p>cstacks uses a population map file, which in this project is cstacks_popmap_SPECIES.txt. This file specifies which samples to build the catalog from and assigns them to 'populations' (in this case, cities) using two tab-delimited columns, e.g.:</p><p>DS.BA.GA.U.1 Baltimore
DS.BA.GA.U.2 Baltimore
DS.BA.GA.U.3 Baltimore
DS.BA.GA.U.4 Baltimore
DS.BA.GA.U.5 Baltimore
...</p><p>Make sure the samples in this file correspond to the input files located in, e.g., ~/scratch/stacks/stacks_22_01_25. cstacks builds three catalog files, used for all samples in this pipeline run, that mirror the sample files output by ustacks:</p><p>&#8226; catalog.alleles.tsv.gz &#8226; catalog.snps.tsv.gz &#8226; catalog.tags.tsv.gz</p><p>Table S3: Subset of samples used in SNP catalog creation. 
Sample Species City DS.BA.PIK.U.1 DS BA DS.BA.GA.U.4 DS BA DS.BA.LH-1.M.4 DS BA DS.BA.LH-3.M.1 DS BA DS.BA.WB.U.2 DS BA DS.BA.LL-4.M.5 DS BA DS.BA.LH-2.M.5 DS BA DS.BA.TRC.U.3 DS BA DS.BA.W3.M.2 DS BA DS.BA.RG-1.M.1 DS BA DS.BA.LL-3.M.3 DS BA DS.BA.RG-2.M.4 DS BA DS.BO.HC1.M.3 DS BO DS.BO.HC4.M.5 DS BO DS.BO.LC1.M.3 DS BO DS.BO.LC2.M.2 DS BO DS.BO.LC3.M.5 DS BO DS.BO.WL1.M.2 DS BO DS.BO.WL2.M.1 DS BO DS.BO.WL3.M.5 DS BO DS.BO.I4.U.1 DS BO DS.BO.R1.U.4 DS BO DS.BO.R2.U.2 DS BO DS.BO.R4.U.4 DS BO DS.MN.L05-DS.M.3 DS MN DS.MN.L09-DS.M.3 DS MN DS.MN.L11-DS.M.1 DS MN DS.MN.L02-DS.U.1 DS MN DS.MN.L02-DS.M.4 DS MN DS.MN.L03-DS.U.3 DS MN DS.MN.L04-DS.U.5 DS MN DS.MN.L06-DS.U.3 DS MN DS.MN.L07-DS.U.3 DS MN DS.MN.L09-DS.U.3 DS MN DS.MN.L11-DS.U.1 DS MN DS.MN.L11-DS.U.5 DS MN DS.PX.BUF.M.1 DS PX DS.PX.PIE.M.2 DS PX DS.PX.ALA.M.1 DS PX DS.PX.MTN.M.6 DS PX DS.PX.LAP.M.3 DS PX DS.PX.NUE.M.4 DS PX DS.PX.WES.M.2 DS PX DS.PX.DF1.M.1 DS PX DS.PX.ENC.M.1 DS PX DS.PX.DOW.M.1 DS PX DS.PX.DOW.M.4 DS PX DS.PX.DF2.M.3 DS PX CD.BA.LA.U.2 CD BA CD.BA.TRC.U.3 CD BA CD.BA.WGP.M.2 CD BA CD.BA.LH-2.M.2 CD BA CD.BA.LL-4.M.1 CD BA CD.BA.PIK.U.2 CD BA CD.BA.WB.U.2 CD BA CD.BA.CP.U.4 CD BA CD.BA.FH.U.1 CD BA CD.BA.PSP.M.4 CD BA CD.BA.AA.U.4 CD BA CD.BA.RG-1.M.2 CD BA CD.BA.W3.M.3 CD BA CD.BA.GA.U.3 CD BA CD.BA.WBO.U.5 CD BA CD.LA.WHI.M.3 CD LA CD.LA.SEP.M.3 CD LA CD.LA.SEP.M.4 CD LA CD.LA.ROS.M.5 CD LA CD.LA.MR2.M.2 CD LA CD.LA.ALL.M.2 CD LA CD.LA.ALL.M.5 CD LA CD.LA.VAL.M.5 CD LA CD.LA.HAR.M.4 CD LA CD.LA.LUB.M.3 CD LA CD.LA.GLO.M.4 CD LA CD.LA.ZOO.M.3 CD LA CD.LA.NWH.M.5 CD LA CD.LA.KIN.M.3 CD LA CD.LA.KIN.M.5 CD LA CD.PX.CAM.U.5 CD PX CD.PX.MON.U.5 CD PX CD.PX.PKW.U.5 CD PX CD.PX.LAP.M.4 CD PX CD.PX.NES.U.4 CD PX CD.PX.PAL.M.3 CD PX CD.PX.ASU.M.1 CD PX CD.PX.NUE.M.5 CD PX CD.PX.WES.M.3 CD PX CD.PX.MAN.M.4 CD PX CD.PX.CLA.M.3 CD PX CD.PX.DF1.M.5 CD PX CD.PX.COY.M.5 CD PX CD.PX.RPC.M.3 CD PX CD.PX.ENC.M.2 CD PX EC.BA.LH-2.M.2 EC BA EC.BA.WBO.U.4 EC BA 
EC.BA.WB.U.5 EC BA EC.BA.FH.U.3 EC BA EC.BA.CP.U.2 EC BA EC.BA.TRC.U.3 EC BA EC.BA.LL-4.M.4 EC BA EC.BA.WB.U.1 EC BA EC.BA.PIK.U.5 EC BA EC.BA.PSP.M.4 EC BA EC.BA.GA.U.2 EC BA EC.BA.LL-3.M.3 EC BA EC.BA.ML.U.1 EC BA EC.BA.TRC.U.5 EC BA EC.BA.ML.U.3 EC BA EC.LA.SGB.U.2 EC LA EC.LA.SGB.U.5 EC LA EC.LA.DUR.U.2 EC LA EC.LA.HOW.U.2 EC LA EC.LA.SAN.U.2 EC LA EC.LA.VER.U.1 EC LA EC.LA.VER.U.4 EC LA EC.LA.VB2.U.4 EC LA EC.LA.AC2.U.2 EC LA EC.LA.AC1.U.1 EC LA EC.LA.VB1.U.1 EC LA EC.LA.VB1.U.3 EC LA EC.LA.SGR.U.4 EC LA EC.LA.SGR.U.5 EC LA EC.LA.HOW.U.3 EC LA EC.PX.BUF.M.1 EC PX EC.PX.BUF.M.3 EC PX EC.PX.ALA.M.3 EC PX EC.PX.MTN.M.2 EC PX EC.PX.WES.M.1 EC PX EC.PX.WES.M.2 EC PX EC.PX.MAN.M.1 EC PX EC.PX.CLA.M.1 EC PX EC.PX.PSC.M.1 EC PX EC.PX.DF1.M.1 EC PX EC.PX.DOW.M.1 EC PX EC.PX.DOW.M.2 EC PX EC.PX.COY.M.2 EC PX EC.PX.COY.M.3 EC PX EC.PX.ALA.M.5 EC PX LS.BA.WB.U.1 LS BA LS.BA.WB.U.2 LS BA LS.BA.DHI.U.2 LS BA LS.BA.GA.U.1 LS BA LS.BA.PIK.U.3 LS BA LS.BA.PIK.U.5 LS BA LS.BA.CP.U.2 LS BA LS.BA.ML.U.2 LS BA LS.BA.WBO.U.3 LS BA LS.BO.WL3.M.4 LS BO LS.BO.I1.U.1 LS BO LS.BO.I2.U.1 LS BO LS.BO.WL2.M.2 LS BO LS.BO.R1.U.2 LS BO LS.BO.R2.U.4 LS BO LS.BO.R3.U.3 LS BO LS.BO.HC4.M.3 LS BO LS.BO.LC4.M.2 LS BO LS.LA.VET.M.4 LS LA LS.LA.SSV.M.1 LS LA LS.LA.NAV.M.4 LS LA LS.LA.SHO.M.2 LS LA LS.LA.WES.M.3 LS LA LS.LA.GLO.M.3 LS LA LS.LA.HOW.U.5 LS LA LS.LA.SAN.U.2 LS LA LS.LA.ARR.U.2 LS LA LS.MN.L06-LS.U.2 LS MN LS.MN.L06-LS.U.5 LS MN LS.MN.L07-LS.U.4 LS MN LS.MN.L08-LS.U.5 LS MN LS.MN.L09-LS.U.3 LS MN LS.MN.L01-LS.M.4 LS MN LS.MN.L01-LS.U.3 LS MN LS.MN.L02-LS.U.1 LS MN LS.MN.L05-LS.U.2 LS MN LS.PX.MON.U.2 LS PX LS.PX.PKW.U.5 LS PX LS.PX.PIE.M.4 LS PX LS.PX.ALA.M.3 LS PX LS.PX.PAL.M.3 LS PX LS.PX.MAN.M.2 LS PX LS.PX.NUE.M.1 LS PX LS.PX.ENC.M.4 LS PX LS.PX.COY.M.3 LS PX PA.BA.PIK.U.1 PA BA PA.BA.LH-3.M.2 PA BA PA.BA.LH-3.M.3 PA BA PA.BA.WB.U.1 PA BA PA.BA.AA.U.1 PA BA PA.BA.WGP.M.3 PA BA PA.BA.LL-4.M.3 PA BA PA.BA.LA.U.2 PA BA PA.BA.LH-2.M.2 PA BA 
PA.BA.W3.M.3 PA BA PA.BA.RG-1.M.2 PA BA PA.BA.LL-3.M.5 PA BA PA.BO.I2.U.3 PA BO PA.BO.HC1.M.4 PA BO PA.BO.R3.U.2 PA BO PA.BO.HC4.M.5 PA BO PA.BO.R4.U.2 PA BO PA.BO.WL2.M.5 PA BO PA.BO.WL4.M.4 PA BO PA.BO.LC4.M.4 PA BO PA.BO.HC2.M.1 PA BO PA.BO.R1.U.2 PA BO PA.BO.WL1.M.1 PA BO PA.BO.I1.U.5 PA BO PA.LA.ALL.M.5 PA LA PA.LA.SEP.M.1 PA LA PA.LA.SEP.M.5 PA LA PA.LA.WHI.M.2 PA LA PA.LA.ROS.M.5 PA LA PA.LA.LUB.M.2 PA LA PA.LA.GLO.M.2 PA LA PA.LA.ZOO.M.4 PA LA PA.LA.ZOO.M.5 PA LA PA.LA.NWH.M.2 PA LA PA.LA.KIN.M.4 PA LA PA.LA.POP.M.4 PA LA PA.PX.BUF.M.3 PA PX PA.PX.PIE.M.4 PA PX PA.PX.LAP.M.5 PA PX PA.PX.ALA.M.1 PA PX PA.PX.PAP.M.2 PA PX PA.PX.PAP.M.5 PA PX PA.PX.DF1.M.2 PA PX PA.PX.RPP.U.3 PA PX PA.PX.ENC.M.4 PA PX PA.PX.ENC.M.5 PA PX PA.PX.COY.M.1 PA PX PA.PX.BUF.M.2 PA PX TO.BA.WBO.U.4 TO BA TO.BA.CP.U.1 TO BA TO.BA.FH.U.1 TO BA TO.BA.LH-3.M.4 TO BA TO.BA.WGP.M.3 TO BA TO.BA.GA.U.4 TO BA TO.BA.PIK.U.4 TO BA TO.BA.PSP.M.1 TO BA TO.BA.RG-2.M.2 TO BA TO.BO.HC1.M.4 TO BO TO.BO.HC2.M.5 TO BO TO.BO.HC3.M.1 TO BO TO.BO.HC4.M.5 TO BO TO.BO.LC1.M.1 TO BO TO.BO.LC2.M.5 TO BO TO.BO.LC3.M.1 TO BO TO.BO.WL2.M.1 TO BO TO.BO.I2.U.</p><p>Files can be found in the 07_sstacks/ directory.</p><p>All samples in the population (or all samples you want to include in the analysis) are matched against the catalog produced by cstacks using sstacks, which is run in the scripts stacks_SPECIES.sh and stacks_SPECIES_additional.sh. These scripts use the samples in the output directory that are listed in sstacks_samples_SPECIES.txt and sstacks_samples_SPECIES_additional.txt, respectively, so make sure all your files (sample, catalog, etc.) are present and match. 
sstacks_samples_SPECIES.txt takes the form:</p><p>DS.BA.GA.U.1
DS.BA.GA.U.2
DS.BA.GA.U.3
DS.BA.GA.U.4
DS.BA.GA.U.5
...</p><p>This step produces a new file for every sample in the output directory:</p><p>&#8226; &lt;samplename&gt;.matches.tsv.gz</p><p>A small number of samples generated very few matches to the catalog (e.g., only 4 matching loci, far too few to draw any conclusions) and were therefore not used in the next step. See output/sstacks-discarded_samples.csv.</p></div>
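<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two sample lists used in Sections 6 and 7 can be generated from the sample names rather than typed by hand. This Python sketch is illustrative: the function names are hypothetical, and the city labels other than Baltimore (the only one shown in the cstacks popmap example) are assumptions. Note that each popmap line needs a real tab character between the two columns.</p><p>
```python
# City codes used in the sample names. Only "Baltimore" appears in the popmap
# example in Section 6; the other labels are assumptions.
CITY_NAMES = {
    "BA": "Baltimore",
    "BO": "Boston",
    "LA": "Los Angeles",
    "MN": "Minneapolis",
    "PX": "Phoenix",
}


def cstacks_popmap_lines(samples):
    """Two tab-delimited columns, sample name and city, as in cstacks_popmap_SPECIES.txt."""
    return [f"{s}\t{CITY_NAMES[s.split('.')[1]]}" for s in samples]


def sstacks_sample_lines(samples):
    """One sample name per line, as in sstacks_samples_SPECIES.txt."""
    return [s for s in samples]


samples = ["DS.BA.GA.U.1", "DS.BA.GA.U.2", "DS.PX.BUF.M.5"]
print("\n".join(cstacks_popmap_lines(samples)))
print("\n".join(sstacks_sample_lines(samples)))
```
</p><p>Generating both files from one list of names keeps the catalog popmap and the sstacks sample list consistent with each other.</p></div>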
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">Genotype probabilities with polyRAD</head><p>Files can be found in the 08_polyRAD/ directory.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.1">Make RADdata object</head><p>We used the polyRAD package to call genotypes because many of our species are polyploid or have a history of genome duplication. polyRAD takes the catalog output (catalog.alleles.tsv.gz) and the accompanying matches to the catalog (e.g., CD.BA.AA.U.1.matches.tsv.gz) to estimate genotype likelihoods for diploid and polyploid species.</p><p>We used the catalog and match files to create a RADdata object in R for each species. We ran this on the Rockfish HPC at Johns Hopkins University, with the make_polyRAD_&lt;spp&gt;.R script doing most of the work. The R script was wrapped by polyrad_make_&lt;spp&gt;.sh to submit it to the SLURM scheduler.</p><p>Relevant parameters:</p><p>&#8226; min.ind.with.reads was set to 20% of samples. This means we discarded any loci not found in at least 20% of samples for each species. &#8226; min.ind.with.minor.allele was set to 2. This means a locus must have at least two samples with reads for the minor allele to be retained.</p><p>Requires:</p><p>&#8226; popmap_&lt;spp&gt;_polyrad.txt, a list of samples and populations &#8226; output from sstacks</p><p>Outputs:</p><p>&#8226; &lt;spp&gt;_polyRADdata.rds, RDS object (the RADdata object)</p></div>
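<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the two retention rules concrete, the Python sketch below re-implements them for a single biallelic locus. It is not polyRAD's code (polyRAD applies these filters itself when it reads the Stacks output); rounding 20% up to a whole number of samples and treating the allele with fewer total reads as the minor allele are our simplifying assumptions.</p><p>
```python
def min_ind_with_reads(n_samples, percent=20):
    """20% of samples as a whole number, rounded up (the rounding is an assumption)."""
    return (n_samples * percent + 99) // 100


def keep_locus(depths, percent=20, min_ind_with_minor_allele=2):
    """depths: one (allele_1_reads, allele_2_reads) pair per sample, biallelic locus."""
    with_reads = sum(1 for a, b in depths if a + b > 0)
    if min_ind_with_reads(len(depths), percent) > with_reads:
        return False
    # Treat the allele with fewer total reads as the minor allele (simplification).
    total_1 = sum(a for a, _ in depths)
    total_2 = sum(b for _, b in depths)
    minor = 0 if total_2 > total_1 else 1
    with_minor = sum(1 for pair in depths if pair[minor] > 0)
    return with_minor >= min_ind_with_minor_allele


# Ten samples; three have reads, two of them carry the minor allele.
locus = [(10, 0), (5, 3), (8, 2)] + [(0, 0)] * 7
print(min_ind_with_reads(10), keep_locus(locus))  # 2 True
```
</p><p>Integer arithmetic is used for the 20% threshold so that floating-point rounding cannot shift the cutoff by one sample.</p></div>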
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.2">Calculate overdispersion</head><p>Next, we calculated overdispersion using the polyRAD_overdispersion_&lt;spp&gt;.R script, wrapped by polyrad_overd_&lt;spp&gt;.sh to submit it to the SLURM scheduler.</p><p>Requires:</p><p>&#8226; popmap_&lt;spp&gt;_polyrad.txt, a list of samples and populations &#8226; &lt;spp&gt;_polyRADdata.rds, RDS object (the RADdata object) output from the previous step</p><p>Outputs:</p><p>&#8226; &lt;spp&gt;_overdispersion.rds, RDS object (the overdispersion test output)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.3">Estimate genotypes</head><p>Next, we filtered loci based on the Hind/He statistic and estimated population structure and genotypes using the polyRAD_filter_&lt;spp&gt;.R script, wrapped by polyrad_filt_&lt;spp&gt;.sh to submit it to the SLURM scheduler.</p><p>We used the table in this tutorial to estimate inbreeding from the ploidy, the optimal overdispersion value, and the mean Hind/He. These values are hardcoded in polyRAD_filter_&lt;spp&gt;.R.</p><p>Requires:</p><p>&#8226; popmap_&lt;spp&gt;_polyrad.txt, a list of samples and populations &#8226; &lt;spp&gt;_polyRADdata.rds, RDS object (the RADdata object) from Section 8.1 &#8226; &lt;spp&gt;_overdispersion.rds, RDS object (the overdispersion test output) output from the previous step</p><p>Outputs:</p><p>&#8226; &lt;spp&gt;_filtered_RADdata.rds, RDS object (RADdata object filtered for appropriate Hind/He) &#8226; &lt;spp&gt;_IteratePopStructPCA.csv, data output from the genotype estimate PCA, suitable for plotting &#8226; &lt;spp&gt;_estimatedgeno_RADdata.rds, RDS object (RADdata object with genotype estimates)</p></div>
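<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Hind/He statistic can be illustrated with a short Python sketch, assuming the definition given in the polyRAD documentation: for one individual at one locus, Hind is the probability that two reads drawn without replacement carry different alleles, and dividing by the expected heterozygosity He gives Hind/He. Here the allele frequencies are supplied by hand, which is a simplification; polyRAD estimates them from the data. If each read is equally likely to come from either allele, a diploid heterozygote's expected Hind is exactly 1/2 at any depth of two or more reads, and heterozygotes occur at frequency He under Hardy-Weinberg equilibrium, so the population mean of Hind/He is expected to be about 0.5 for a well-behaved diploid locus.</p><p>
```python
from math import comb


def hind(allele_depths):
    """P(two reads drawn without replacement differ) for one individual at one locus."""
    total = sum(allele_depths)
    if 2 > total:
        return None  # undefined with fewer than two reads
    same = sum(d * (d - 1) for d in allele_depths)
    return 1 - same / (total * (total - 1))


def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1 - sum(p * p for p in freqs)


def hind_he(allele_depths, freqs):
    h = hind(allele_depths)
    he = expected_heterozygosity(freqs)
    if h is None or he == 0:
        return None
    return h / he


def expected_het_hind(n):
    """Expected Hind for a diploid heterozygote whose n reads are Binomial(n, 1/2)."""
    return sum(comb(n, k) * 0.5 ** n * hind([k, n - k]) for k in range(n + 1))


freqs = [0.5, 0.5]
print(hind([5, 5]))            # 0.555..., a heterozygote with 5 reads of each allele
print(hind([10, 0]))           # 0.0, a homozygote
print(hind_he([5, 5], freqs))  # 1.111...
print(expected_het_hind(10))   # 0.5 (up to rounding)
```
</p><p>A single individual's value can sit well above or below 0.5; the expectation applies to the average over individuals at a locus.</p></div>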
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.4">Final filter and file cleanup</head><p>The output &lt;spp&gt;_estimatedgeno_RADdata.rds needs to be converted to genind and Structure formats for further analysis. This involves a little cleanup so that population information is retained; for example, Structure needs the population identity to be an integer, not a string. This set of functions can be run on a laptop.</p><p>At this stage, we also visually assessed the Hind/He statistic versus locus depth (see check_coverage in the convert_genomics.R script). We removed the following samples from further analysis:</p><p>Table <ref type="table">S4</ref>: Subset of samples discarded after genotype estimation using polyRAD.</p><p>Within each species, we compressed the result files for all K and replicates and submitted them to Structure Harvester to choose the optimal K using the Delta-K method (see this article). Once the optimal K was selected for each species, we re-ran Structure with a greater number of iterations (100,000) for the final output and plotting.</p></div>
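<div xmlns="http://www.tei-c.org/ns/1.0"><p>The string-to-integer conversion that Structure needs can be sketched in Python. The function name is hypothetical, and numbering populations from 1 in order of first appearance is an assumption; returning the mapping alongside the codes is what preserves the city labels for later plotting.</p><p>
```python
def structure_population_codes(populations):
    """Map population labels to integers starting at 1, in order of first appearance."""
    codes = {}
    for label in populations:
        if label not in codes:
            codes[label] = len(codes) + 1
    return [codes[label] for label in populations], codes


pops = ["Baltimore", "Phoenix", "Baltimore", "Los Angeles"]
ints, mapping = structure_population_codes(pops)
print(ints)     # [1, 2, 1, 3]
print(mapping)  # {'Baltimore': 1, 'Phoenix': 2, 'Los Angeles': 3}
```
</p></div>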
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="10">Conceptual Figure</head><p>We made a conceptual figure to help readers understand the PCA patterns we might expect.</p><p>source("R/10-Fig1-conceptual_fig.R")</p><p>make_fig1() # Plot Fig 1</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="11">NLCD Data and Site Plots</head><p>NLCD data are used in Fig. <ref type="figure">2</ref> and in the IBE analysis described below.</p><p>From the USGS:</p><p>The U.S. Geological Survey (USGS), in partnership with several federal agencies, has developed and released four National Land Cover Database (NLCD) products over the past two decades: <ref type="bibr">NLCD 1992</ref>, <ref type="bibr">NLCD 2001</ref>, <ref type="bibr">NLCD 2006</ref>, and <ref type="bibr">NLCD 2011</ref>.</p><p>The product we used is for 2016 and describes urban imperviousness: <ref type="url">https://www.mrlc.gov/data/type/urban-imperviousness</ref></p><p>NLCD imperviousness products represent urban impervious surfaces as a percentage of developed surface over every 30-meter pixel in the United States. NLCD 2016 updates all previously released versions of impervious products for CONUS (<ref type="bibr">NLCD 2001</ref>, <ref type="bibr">NLCD 2006</ref>, <ref type="bibr">NLCD 2011</ref>) along with a new date of impervious surface for 2016. New for NLCD 2016 is an impervious surface descriptor layer. This descriptor layer identifies types of roads, core urban areas, and energy production sites for each impervious pixel to allow deeper analysis of developed features.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="11.1">Preparing NLCD Data</head><p>First, we trimmed the large NLCD dataset, creating a smaller .rds file for each city.</p><p>source("R/10-trim_NLCD_spatial_data.R")</p><p>create_spatial_rds_files()</p><p>create_spatial_rds_files(spp = "CD")</p></div>
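<div xmlns="http://www.tei-c.org/ns/1.0"><p>Looking up the percent impervious surface at a sampling site amounts to finding the 30 m pixel that contains it. The Python sketch below shows that indexing for a north-up raster, assuming site coordinates have already been projected into the raster's coordinate system (meters; NLCD CONUS products use an Albers equal-area projection) and that origin_x and origin_y are the raster's upper-left corner. The tiny grid is made up and the function names are hypothetical; in practice a GIS library would read the NLCD file and its geotransform.</p><p>
```python
import math


def pixel_index(x, y, origin_x, origin_y, pixel_size=30.0):
    """Row and column of the pixel containing projected point (x, y) in a north-up raster."""
    col = math.floor((x - origin_x) / pixel_size)
    row = math.floor((origin_y - y) / pixel_size)  # rows increase southward
    return row, col


def percent_impervious(grid, x, y, origin_x, origin_y, pixel_size=30.0):
    row, col = pixel_index(x, y, origin_x, origin_y, pixel_size)
    if row >= 0 and col >= 0 and len(grid) > row and len(grid[0]) > col:
        return grid[row][col]
    raise IndexError("point falls outside the raster")


# A made-up 2 x 3 raster of percent impervious values with its upper-left corner at (1000, 2000).
grid = [[0, 10, 20],
        [30, 40, 50]]
print(percent_impervious(grid, 1045, 1985, 1000, 2000))  # 10
print(percent_impervious(grid, 1075, 1950, 1000, 2000))  # 50
```
</p></div>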
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="11.2">Climate normals</head><p>We obtained climate normals data for plotting from <ref type="url">https://www.ncei.noaa.gov/access/us-climate-normals</ref>. We used the latest 30-year period (1991-2020), the most recent standard climatological period (2021 release), which is recommended for most purposes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="11.3">Maps of sampling locations</head><p>Next, we made plots of each city's sampling locations. Note that these include only sites that had viable polymorphic loci.</p><p>source("R/10-Fig2-plot_map_of_samples.R")</p><p>make_all_urban_site_plots_with_clim_normals() # Plot Fig 2</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="12">Principal components analysis &amp; plots</head><p>The following creates PCA plots from the polyRAD data.</p><p>source("R/11-plot_pca.R")</p><p>make_pca_city_and_pctimp_all() # Plot Fig 4</p><p>In addition to coloring points by city in the main manuscript, we also colored points by % impervious surface, derived from the NLCD data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="13">Genetic structure using Structure and sNMF</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="13.1">Structure optimal K</head><p>Within each species, we compressed the result files for all K and replicates and submitted them to Structure Harvester to choose the optimal K using the Delta-K method.</p><p>The results were:</p><p># This file contains output from various K from Structure.</p><p>read_csv("output/structure/structure_k_Pr.csv")</p><p>The code below plots likelihood against K (e.g., K = 1-5); these plots were not used in the manuscript.</p><p>source("R/12-structure_k.R")</p></div>
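<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Delta-K method (Evanno et al. 2005) picks the K with the largest second-order change in the log probability of the data, LnP(D), scaled by its variability across replicate runs. The Python sketch below illustrates the calculation with made-up likelihoods; it computes the second difference from the per-K mean LnP(D) values (our assumption about how replicates are combined) and uses the sample standard deviation across replicates. Delta-K is undefined for the smallest and largest K tested, so K = 1 can never be chosen this way.</p><p>
```python
import statistics


def evanno_delta_k(lnp_by_k):
    """lnp_by_k maps consecutive K values to lists of LnP(D) from replicate runs."""
    ks = sorted(lnp_by_k)
    means = {k: statistics.mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:  # needs both K - 1 and K + 1
        second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
        delta[k] = abs(second_diff) / statistics.stdev(lnp_by_k[k])
    return delta


lnp = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -804.0],
    3: [-790.0, -792.0],
    4: [-788.0, -792.0],
}
delta = evanno_delta_k(lnp)
best_k = max(delta, key=delta.get)
print(delta)   # K = 2 about 66.5, K = 3 about 7.1
print(best_k)  # 2
```
</p></div>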
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="13.2">Plotting Structure output</head><p>The code below generates plots of the Structure results.</p><p>source("R/12-plot_structure.R")</p><p>make_structure_multi_plot() # Plot Fig 5</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="13.3">Validation of Structure results with sNMF</head><p>We ran sNMF as an alternative to Structure to validate the results. We coerced all polyploid data to diploid data to make the file types compatible with the snmf() function in R. The snmf() function computes a cross-entropy criterion that uses cross-validation to evaluate how well the statistical model fits the data. We plotted the cross-entropy criterion for K = 2 to 10 for all species. Using the best K, we then selected the best of the 10 runs for that K using the which.min() function.</p><p>source("R/12-sNMF.R")</p><p>The project function do_all_sNMF() runs sNMF and generates the figure.</p></div>
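<div xmlns="http://www.tei-c.org/ns/1.0"><p>The run-selection logic for sNMF can be sketched in Python with made-up cross-entropy values. Choosing K by the lowest mean cross-entropy is our assumption (the best K here was judged from the plotted criterion); within that K, the run with the smallest cross-entropy is kept, which is what which.min() does in R, except that Python indices start at 0.</p><p>
```python
import statistics


def pick_k_and_run(cross_entropy):
    """cross_entropy maps K to the criterion from each repeated run at that K."""
    best_k = min(cross_entropy, key=lambda k: statistics.mean(cross_entropy[k]))
    runs = cross_entropy[best_k]
    best_run = min(range(len(runs)), key=runs.__getitem__)  # 0-based, unlike which.min()
    return best_k, best_run


ce = {
    2: [0.52, 0.50, 0.51],
    3: [0.47, 0.45, 0.49],
    4: [0.48, 0.47, 0.49],
}
print(pick_k_and_run(ce))  # (3, 1)
```
</p></div>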
<div xmlns="http://www.tei-c.org/ns/1.0"><p>do_all_sNMF()</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="14">Genetic Differentiation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="14.1">Sample size, sites per city per species</head><p>We used the following scripts to obtain the sample size and the number of sites per city for each species.</p><p>source("R/13-n.R") # sample size</p><p>source("R/13-get_unique_site_post_genotyping.R") # sites per city per species</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="14.2">Among city: Jost's D, GST, FST</head><p>We used polyRAD::calcPopDiff() to calculate population differentiation statistics for each species.</p><p>source("R/13-calc_popdiff_stats.R")</p><p>do_all_continental_stats()</p><p># CD as an example</p><p>read.csv("output/population_stats/popdiff_stats_CD.csv")</p><p>## X statistic value
## 1 1 JostD 0.30579677
## 2 2 Gst 0.02735719
## 3 3 Fst 0.02812163</p><p>Figure S5: Ancestry coefficients obtained using snmf(). Panels: C. dactylon (K = 3; Baltimore, Los Angeles, Phoenix), D. sanguinalis (K = 3; Baltimore, Boston, Minneapolis, Phoenix), E. canadensis (K = 4; Baltimore, Los Angeles, Phoenix), L. serriola (K = 3), P. annua (K = 4), and T. officinale (K = 4), the last three each covering Baltimore, Boston, Los Angeles, Minneapolis, and Phoenix. As with the Structure analysis, E. canadensis (horseweed) and L. serriola (prickly lettuce) appear to have the most population structure. D. sanguinalis, E. canadensis, and L. serriola from Phoenix appear distinct. In general, sNMF produced larger K for most species, which creates more sensitivity to admixture.</p></div>
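<div xmlns="http://www.tei-c.org/ns/1.0"><p>For intuition about the among-city statistics, the Python sketch below computes Nei's GST and Jost's D for a single locus from subpopulation allele frequencies, using the textbook formulas GST = (HT - HS)/HT and D = [(HT - HS)/(1 - HS)] x [n/(n - 1)] for n equally weighted subpopulations. It omits sample-size corrections and multilocus averaging, so it is not a re-implementation of polyRAD::calcPopDiff() and will not reproduce the CD values above.</p><p>
```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p^2) for one set of allele frequencies."""
    return 1 - sum(p * p for p in freqs)


def gst_and_jost_d(subpop_freqs):
    """subpop_freqs: one allele-frequency list per subpopulation (same allele order)."""
    n = len(subpop_freqs)
    hs = sum(heterozygosity(f) for f in subpop_freqs) / n   # mean within-subpopulation
    pooled = [sum(column) / n for column in zip(*subpop_freqs)]
    ht = heterozygosity(pooled)                              # total, from pooled frequencies
    gst = (ht - hs) / ht if ht else 0.0
    jost_d = (ht - hs) / (1 - hs) * n / (n - 1)
    return gst, jost_d


print(gst_and_jost_d([[1.0, 0.0], [0.0, 1.0]]))  # (1.0, 1.0): fixed differences
print(gst_and_jost_d([[0.8, 0.2], [0.4, 0.6]]))  # about (0.167, 0.267)
```
</p><p>The second example shows why the two statistics diverge: with moderately heterozygous subpopulations, GST is capped well below 1, whereas D rescales by the within-subpopulation diversity.</p></div>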
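<div xmlns="http://www.tei-c.org/ns/1.0"><head n="14.3">Among city: pairwise &#961;</head><p>We used GenoDive v.3.0.6 to calculate pairwise &#961; (rho) among cities within species. P-values were corrected for multiple comparisons among cities within each species; species were treated as independent.</p><p>This can be run in GenoDive by selecting Analysis &gt; Pairwise Differentiation and choosing the "rho" statistic from the drop-down menu.</p><p>We used the following script to clean up the results.</p><p>source("R/13-rho.R")</p><p>compile_rho_table()</p><p>Table S5: Rho statistics for pairwise comparisons between cities. Adjusted p-values below 0.05 are significant.</p><table><row role="label"><cell>Species</cell><cell>City 1</cell><cell>City 2</cell><cell>rho</cell><cell>p-value</cell><cell>adjusted p-value</cell></row><row><cell>CD</cell><cell>PX</cell><cell>BA</cell><cell>0.050</cell><cell>0.001</cell><cell>0.001</cell></row><row><cell>CD</cell><cell>LA</cell><cell>BA</cell><cell>0.046</cell><cell>0.001</cell><cell>0.001</cell></row><row><cell>CD</cell><cell>PX</cell><cell>LA</cell><cell>0.015</cell><cell>0.001</cell><cell>0.001</cell></row><row><cell>DS</cell><cell>MN</cell><cell>BA</cell><cell>0.031</cell><cell>0.001</cell><cell>0.0015</cell></row><row><cell>DS</cell><cell>BO</cell><cell>BA</cell><cell>0.018</cell><cell>0.001</cell><cell>0.0015</cell></row><row><cell>DS</cell><cell>PX</cell><cell>BA</cell><cell>0.012</cell><cell>0.001</cell><cell>0.0015</cell></row><row><cell>DS</cell><cell>MN</cell><cell>BO</cell><cell>0.007</cell><cell>0.001</cell><cell>0.0015</cell></row><row><cell>DS</cell><cell>PX</cell><cell>BO</cell><cell>-0.002</cell><cell>0.875</cell><cell>0.955</cell></row><row><cell>DS</cell><cell>PX</cell><cell>MN</cell><cell>-0.002</cell><cell>0.955</cell><cell>0.955</cell></row><row><cell>EC</cell><cell>PX</cell><cell>BA</cell><cell>0.098</cell><cell>0.001</cell><cell>0.001</cell></row><row><cell>EC</cell><cell>PX</cell><cell>LA</cell><cell>0.087</cell><cell>0.001</cell><cell>0.001</cell></row><row><cell>EC</cell><cell>LA</cell><cell>BA</cell><cell>0.038</cell><cell>0.001</cell><cell>0.001</cell></row><row><cell>LS</cell><cell>PX</cell><cell>BA</cell><cell>0.077</cell><cell>0.001</cell><cell>0.0011</cell></row><row><cell>LS</cell><cell>PX</cell><cell>MN</cell><cell>0.069</cell><cell>0.001</cell><cell>0.0011</cell></row><row><cell>LS</cell><cell>PX</cell><cell>LA</cell><cell>0.061</cell><cell>0.001</cell><cell>0.0011</cell></row><row><cell>LS</cell><cell>PX</cell><cell>BO</cell><cell>0.056</cell><cell>0.001</cell><cell>0.0011</cell></row><row><cell>LS</cell><cell>MN</cell><cell>LA</cell><cell>0.039</cell><cell>0.001</cell><cell>0.0011</cell></row><row><cell>LS</cell><cell>LA</cell><cell>BA</cell><cell>0.038</cell><cell>0.001</cell><cell>0.0011</cell></row><row><cell>LS</cell><cell>BO</cell><cell>BA</cell><cell>0.032</cell><cell>0.001</cell><cell>0.0011</cell></row><row><cell>LS</cell><cell>MN</cell><cell>BO</cell><cell>0.021</cell><cell>0.001</cell><cell>0.0011</cell></row><row><cell>LS</cell><cell>LA</cell><cell>BO</cell><cell>0.010</cell><cell>0.001</cell><cell>0.0011</cell></row><row><cell>LS</cell><cell>MN</cell><cell>BA</cell><cell>0.009</cell><cell>0.002</cell><cell>0.002</cell></row><row><cell>PA</cell><cell>PX</cell><cell>BO</cell><cell>0.028</cell><cell>0.001</cell><cell>0.0015</cell></row><row><cell>PA</cell><cell>LA</cell><cell>BO</cell><cell>0.024</cell><cell>0.001</cell><cell>0.0015</cell></row><row><cell>PA</cell><cell>PX</cell><cell>BA</cell><cell>0.015</cell><cell>0.001</cell><cell>0.0015</cell></row><row><cell>PA</cell><cell>LA</cell><cell>BA</cell><cell>0.011</cell><cell>0.001</cell><cell>0.0015</cell></row><row><cell>PA</cell><cell>BO</cell><cell>BA</cell><cell>0.008</cell><cell>0.002</cell><cell>0.0024</cell></row><row><cell>PA</cell><cell>PX</cell><cell>LA</cell><cell>-0.002</cell><cell>0.972</cell><cell>0.972</cell></row><row><cell>TO</cell><cell>PX</cell><cell>BA</cell><cell>0.023</cell><cell>0.001</cell><cell>0.0014</cell></row><row><cell>TO</cell><cell>PX</cell><cell>MN</cell><cell>0.015</cell><cell>0.001</cell><cell>0.0014</cell></row><row><cell>TO</cell><cell>PX</cell><cell>BO</cell><cell>0.013</cell><cell>0.002</cell><cell>0.0025</cell></row><row><cell>TO</cell><cell>LA</cell><cell>BA</cell><cell>0.011</cell><cell>0.001</cell><cell>0.0014</cell></row><row><cell>TO</cell><cell>LA</cell><cell>BO</cell><cell>0.009</cell><cell>0.001</cell><cell>0.0014</cell></row><row><cell>TO</cell><cell>MN</cell><cell>LA</cell><cell>0.009</cell><cell>0.001</cell><cell>0.0014</cell></row><row><cell>TO</cell><cell>PX</cell><cell>LA</cell><cell>0.009</cell><cell>0.027</cell><cell>0.03</cell></row><row><cell>TO</cell><cell>BO</cell><cell>BA</cell><cell>0.008</cell><cell>0.001</cell><cell>0.0014</cell></row><row><cell>TO</cell><cell>MN</cell><cell>BO</cell><cell>0.008</cell><cell>0.001</cell><cell>0.0014</cell></row><row><cell>TO</cell><cell>MN</cell><cell>BA</cell><cell>0.001</cell><cell>0.098</cell><cell>0.098</cell></row></table></div>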
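<div xmlns="http://www.tei-c.org/ns/1.0"><p>The adjusted p-values in Table S5 are consistent with a Benjamini-Hochberg (false discovery rate) adjustment applied separately within each species: the Python sketch below reproduces the DS and TO rows from their raw p-values. The correction method is not named in the text, so treat this as our inference rather than a documented setting.</p><p>
```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending raw p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Raw p-values from Table S5, in table order.
ds = [0.001, 0.001, 0.001, 0.001, 0.875, 0.955]
to = [0.001, 0.001, 0.002, 0.001, 0.001, 0.001, 0.027, 0.001, 0.001, 0.098]
print([round(p, 4) for p in bh_adjust(ds)])  # [0.0015, 0.0015, 0.0015, 0.0015, 0.955, 0.955]
print([round(p, 4) for p in bh_adjust(to)])
```
</p><p>Because the adjustment depends on the number of comparisons, the same raw p-value of 0.001 becomes 0.001 for the three CD comparisons but 0.0015 for the six DS comparisons.</p></div>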
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="14.4">Within city: allelic richness</head><p>We used GenoDive v.3.0.6 to calculate several additional statistics.</p><p>These can be calculated in GenoDive by selecting Analysis &gt; Genetic Diversity, then selecting "Calculate indices separately for every population" and, for the polyploid species, "Correct for unknown dosage of alleles".</p><p>&#8226; Num: Number of alleles &#8226; Eff_num: Effective number of alleles &#8226; Ho: Observed heterozygosity &#8226; Hs: Heterozygosity within populations &#8226; Gis: Inbreeding coefficient</p><p>head(read.csv("output/population_stats/genodive_genetic_diversity.csv"))</p><p>## spp city Num Eff_num Ho Hs Gis
## 1 CD BA 8.242 4.530 0.627 0.748 0.163
## 2 CD LA 12.664 6.470 0.683 0.827 0.174
## 3 CD PX 14.192 6.978 0.672 0.830 0.191
## 4 DS BA 10.611 5.069 0.637 0.768 0.171
## 5 DS BO 10.588 5.230 0.612 0.778 0.214
## 6 DS MN 11.696 5.233 0.623 0.771 0.192</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="14.5">Within city: FIS (homozygosity within population)</head><p>We used GenoDive v.3.0.6 to calculate FIS, which indicates whether there are more homozygotes than expected (positive values) or more heterozygotes than expected (negative values). Notably, GenoDive accommodates polyploids and tests FIS with a permutation test (999 permutations by default).</p><p>This can be run in GenoDive by selecting Analysis &gt; Hardy-Weinberg &gt; Heterozygosity-based (Nei) method.</p><p>head(read.csv("output/population_stats/genodive_output_Fis.csv"))</p><p>## Species Population n Fis
## 1 CD BA 55 0.166
## 2 CD LA 48 0.186
## 3 CD PX 82 0.200
## 4 CD Overall NA 0.187
## 5 DS BA 55 0.208
## 6 DS BO 52 0.252</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="14.6">Within city: rd (linkage disequilibrium)</head><p>We used poppr::ia() to calculate the standardized index of association of loci in the dataset (rd, or rbarD). We used the standardized index to avoid the influence of the number of loci, as described by Agapow and Burt (2001). 
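When p.rD is small (&lt;0.05) and rbarD is relatively high, the population may be in linkage disequilibrium.</p><table><row role="label"><cell>Species</cell><cell>Source of variation</cell><cell>Nested in</cell><cell>SSD</cell><cell>d.f.</cell><cell>MS</cell><cell>Var-comp</cell><cell>%Var</cell><cell>F-value</cell><cell>P-value</cell></row><row><cell>EC</cell><cell>Among Population</cell><cell>City</cell><cell>44957.413</cell><cell>31</cell><cell>1450.239</cell><cell>391.507</cell><cell>0.546</cell><cell>0.644</cell><cell>0.001</cell></row><row><cell>EC</cell><cell>Among City</cell><cell>-</cell><cell>10368.036</cell><cell>2</cell><cell>5184.018</cell><cell>108.704</cell><cell>0.152</cell><cell>0.152</cell><cell>0.001</cell></row><row><cell>LS</cell><cell>Within Population</cell><cell>-</cell><cell>10889.549</cell><cell>120</cell><cell>90.746</cell><cell>90.746</cell><cell>0.308</cell><cell>0.692</cell><cell>-</cell></row><row><cell>LS</cell><cell>Among Population</cell><cell>City</cell><cell>32081.032</cell><cell>59</cell><cell>543.746</cell><cell>159.914</cell><cell>0.544</cell><cell>0.638</cell><cell>0.001</cell></row><row><cell>LS</cell><cell>Among City</cell><cell>-</cell><cell>8721.721</cell><cell>4</cell><cell>2180.430</cell><cell>43.569</cell><cell>0.148</cell><cell>0.148</cell><cell>0.001</cell></row><row><cell>PA (no MN)</cell><cell>Within Population</cell><cell>-</cell><cell>9129.284</cell><cell>128</cell><cell>71.323</cell><cell>71.323</cell><cell>0.230</cell><cell>0.770</cell><cell>-</cell></row><row><cell>PA (no MN)</cell><cell>Among Population</cell><cell>City</cell><cell>34474.530</cell><cell>44</cell><cell>783.512</cell><cell>194.429</cell><cell>0.628</cell><cell>0.732</cell><cell>0.001</cell></row><row><cell>PA (no MN)</cell><cell>Among City</cell><cell>-</cell><cell>7968.939</cell><cell>3</cell><cell>2656.313</cell><cell>43.730</cell><cell>0.141</cell><cell>0.141</cell><cell>0.002</cell></row><row><cell>PA (w/ MN)</cell><cell>Within Population</cell><cell>-</cell><cell>10167.015</cell><cell>129</cell><cell>78.814</cell><cell>78.814</cell><cell>0.278</cell><cell>0.722</cell><cell>-</cell></row><row><cell>PA (w/ MN)</cell><cell>Among Population</cell><cell>City</cell><cell>30922.681</cell><cell>44</cell><cell>702.788</cell><cell>168.366</cell><cell>0.593</cell><cell>0.681</cell><cell>0.001</cell></row><row><cell>PA (w/ MN)</cell><cell>Among City</cell><cell>-</cell><cell>6999.878</cell><cell>4</cell><cell>1749.970</cell><cell>36.750</cell><cell>0.129</cell><cell>0.129</cell><cell>0.002</cell></row><row><cell>TO</cell><cell>Within Population</cell><cell>-</cell><cell>15727.071</cell><cell>162</cell><cell>97.081</cell><cell>97.081</cell><cell>0.314</cell><cell>0.686</cell><cell>-</cell></row><row><cell>TO</cell><cell>Among Population</cell><cell>City</cell><cell>51744.430</cell><cell>71</cell><cell>728.795</cell><cell>202.502</cell><cell>0.655</cell><cell>0.676</cell><cell>0.001</cell></row><row><cell>TO</cell><cell>Among City</cell><cell>-</cell><cell>4740.183</cell><cell>4</cell><cell>1185.046</cell><cell>9.469</cell><cell>0.031</cell><cell>0.031</cell><cell>0.001</cell></row></table><p>The following code plots the figure in the main manuscript.</p><p>source("R/14-AMOVA.R")</p><p>make_amova_plot() # Plot Fig 6</p></div>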
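<div xmlns="http://www.tei-c.org/ns/1.0"><p>Two relationships in this section can be checked by hand, as in the Python sketch below. First, GenoDive's Gis column is consistent with Nei's Gis = 1 - Ho/Hs (it reproduces 0.174 for CD in Los Angeles; other rows can differ in the third decimal because Ho and Hs are shown rounded). Second, in the AMOVA table, %Var is each variance component divided by their sum, and the F-values are the standard hierarchical ratios (often labeled FCT, FSC and FST). We inferred the second relationship from the LS and TO rows; the F labels are not taken from the table.</p><p>
```python
def gis(ho, hs):
    """Nei's inbreeding coefficient from observed (Ho) and within-population (Hs) heterozygosity."""
    return 1 - ho / hs


def amova_summary(var_city, var_pop_in_city, var_within):
    """Percent variance and hierarchical F-statistics from three variance components."""
    total = var_city + var_pop_in_city + var_within
    return {
        "pct_among_city": var_city / total,
        "pct_among_pop_in_city": var_pop_in_city / total,
        "pct_within_pop": var_within / total,
        "F_CT": var_city / total,                                  # "Among City" F-value
        "F_SC": var_pop_in_city / (var_pop_in_city + var_within),  # "Among Population" F-value
        "F_ST": (var_city + var_pop_in_city) / total,              # "Within Population" F-value
    }


print(round(gis(0.683, 0.827), 3))  # 0.174 (CD, LA)
ls = amova_summary(var_city=43.569, var_pop_in_city=159.914, var_within=90.746)
print({k: round(v, 3) for k, v in ls.items()})  # compare with the LS rows of the AMOVA table
```
</p></div>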
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="15">Isolation by distance and environment</head><p>We used multiple matrix regression with randomization (MMRR) to determine the relative contributions of isolation by distance (i.e., an association between genetic and geographic distances) and isolation by environment (i.e., an association between genetic and environmental distances).</p></div>
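<div xmlns="http://www.tei-c.org/ns/1.0"><p>MMRR (Wang 2013) regresses pairwise genetic distances on one or more predictor distance matrices and judges each coefficient by permuting site labels in the genetic matrix, which accounts for the non-independence of pairwise distances. The NumPy sketch below illustrates the method on simulated data; it is not the authors' script, and standardizing all distances, the 999 default permutations, the t-statistic as the test statistic, and the simulated inputs are assumptions.</p><p>
```python
import numpy as np


def _lower(m):
    """Off-diagonal lower-triangle values of a square distance matrix."""
    rows, cols = np.tril_indices(m.shape[0], k=-1)
    return m[rows, cols]


def _zscore(v):
    return (v - v.mean()) / v.std()


def _ols_t(y, X):
    """OLS coefficients and their t-statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se


def mmrr(genetic, predictors, n_perm=999, seed=0):
    """Standardized coefficients and permutation p-values, one per predictor matrix."""
    n = genetic.shape[0]
    X = np.column_stack([np.ones(n * (n - 1) // 2)] + [_zscore(_lower(P)) for P in predictors])
    beta, t_obs = _ols_t(_zscore(_lower(genetic)), X)
    rng = np.random.default_rng(seed)
    exceed = np.zeros_like(t_obs)
    for _ in range(n_perm):
        perm = rng.permutation(n)  # relabel sites in the genetic matrix only
        _, t_perm = _ols_t(_zscore(_lower(genetic[np.ix_(perm, perm)])), X)
        exceed += np.abs(t_perm) >= np.abs(t_obs)
    p = (exceed + 1) / (n_perm + 1)
    return beta[1:], p[1:]  # drop the intercept


# Simulated example: genetic distance tracks geography, not the environment.
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(12, 2))
geo = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
env_site = rng.uniform(20, 40, size=12)
env = np.abs(env_site[:, None] - env_site[None, :])
noise = rng.normal(size=(12, 12))
genetic = 0.05 * geo + 0.5 * (noise + noise.T)
coef, pvals = mmrr(genetic, [geo, env], n_perm=199)
print(coef, pvals)
```
</p><p>Permuting whole rows and columns together, rather than individual pairwise values, keeps each site's set of distances intact, which is the key design choice that distinguishes this test from an ordinary regression p-value.</p></div>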
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="15.1">Genetic distance</head><p>We calculated the genetic dissimilarity matrix among sites using the Cavalli-Sforza (chord) distance in GenoDive. Although adegenet can also compute this distance, we used GenoDive to avoid making assumptions about ploidy. We used the "*_estimatedgeno_sitesaspops.structure" files so that sites (not cities) were treated as populations.</p></div>
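<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, a minimal sketch of one common formulation of the Cavalli-Sforza and Edwards chord distance: for each locus, D = (2/π)√(2(1 - Σ√(p<hi rend="subscript">i</hi>q<hi rend="subscript">i</hi>))), where p<hi rend="subscript">i</hi> and q<hi rend="subscript">i</hi> are the frequencies of allele i in the two populations, and D is then averaged across loci. GenoDive's implementation may differ in scaling or in how it handles missing data and allele dosage in polyploids, so chord_distance() (a hypothetical helper) and the toy allele frequencies are illustrative only.</p><p>
```r
# Hypothetical helper: Cavalli-Sforza and Edwards chord distance between two
# populations, averaged over loci. Each population is a list with one numeric
# vector of allele frequencies (summing to 1) per locus, alleles in the same order.
chord_distance = function(pop_a, pop_b) {
  per_locus = mapply(function(p, q) {
    overlap = sum(sqrt(p * q))               # 1 when frequencies are identical
    (2 / pi) * sqrt(2 * max(0, 1 - overlap)) # max() guards against rounding below 0
  }, pop_a, pop_b)
  mean(per_locus)
}

# Toy frequencies for two loci (hypothetical)
pop1 = list(c(0.5, 0.5), c(1, 0, 0))
pop2 = list(c(0.5, 0.5), c(0, 0.5, 0.5))
chord_distance(pop1, pop1)   # identical populations give 0
chord_distance(pop1, pop2)   # one identical locus, one with no shared alleles
```
</p><p>In the second call, the first locus contributes 0 and the second contributes its maximum of (2/π)√2, so the average is √2/π, about 0.45.</p></div>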
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="15.2">Geographic distance</head><p>We took the traditional approach of creating a geographic dissimilarity matrix from site latitude and longitude using Euclidean distance.</p></div>
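<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of this step with base R's dist(), assuming a data frame with one row per site and numeric lat and long columns. The site names and coordinates are hypothetical and chosen so that the distances form a 3-4-5 triangle. Note that Euclidean distance on raw degrees treats latitude and longitude as planar coordinates, an approximation that becomes less accurate over large areas.</p><p>
```r
# Hypothetical sites; coordinates chosen to form a 3-4-5 right triangle
sites = data.frame(
  site = c("A", "B", "C"),
  lat  = c(0, 3, 0),
  long = c(0, 4, 4)
)

# Pairwise Euclidean distances between sites, as a full symmetric matrix
geo_dist = as.matrix(dist(sites[, c("lat", "long")], method = "euclidean"))
dimnames(geo_dist) = list(sites$site, sites$site)
geo_dist
```
</p></div>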
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="15.3">Environmental data and distance</head><p>Environmental variables include the monthly averages in the middle of the day for:</p><p>&#8226; air temperature at 5 cm above ground</p><p>&#8226; air temperature at 1.2 m above ground</p><p>&#8226; soil temperature at 2.5 cm below ground</p><p>&#8226; relative humidity (RH) at 5 cm above ground</p><p>&#8226; RH at 1.2 m above ground</p><p>Variables were extracted from historical datasets and modeled using a microclimate model. More information is available on the NicheMapR website, including how the model works, which variables can be manipulated, what can be modeled, and vignettes for running models in R.</p><p>We chose this method because it draws on global datasets (historical or current data, or specific years) and then accounts for site-specific variables: we can change the percentage of shade and the slope or aspect of the landscape, and the model also considers elevation, average cloud cover, and similar factors. NicheMapR can draw on a range of different models and datasets and is designed for mechanistic niche modeling.</p><p>All variables in the file site_data_DUC_environvars.csv are monthly averages at noon (12 pm, the hottest part of the day) and therefore represent extremes; in other words, they approximate maximums.</p><p>A Stack Overflow post is helpful for installing NicheMapR. Environmental distance was calculated the same way as geographic distance (Euclidean distance). Although the raw data contain additional environmental variables, the main manuscript uses four: % urban cover, distance to city center, April soil temperature, and July soil temperature. In the MMRR output (Tables S8 and S10), these appear as nlcd_urban_pct, distance_to_city_center, soiltemp_Apr, and soiltemp_Jul, respectively.</p></div>
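<div xmlns="http://www.tei-c.org/ns/1.0"><p>The MMRR tables report a separate coefficient for each environmental variable, which suggests one distance matrix per variable. A minimal sketch of building such matrices with base R's dist(), assuming a data frame with one row per site; the values are hypothetical and only the column names follow the MMRR output. For a single variable, Euclidean distance reduces to the absolute difference between two sites.</p><p>
```r
# Hypothetical site values; column names follow the MMRR output
env = data.frame(
  nlcd_urban_pct = c(10, 55, 90),
  soiltemp_Jul   = c(30.5, 34.0, 41.5)
)

# One Euclidean distance matrix per environmental variable (a named list)
env_dists = lapply(env, function(v) as.matrix(dist(v)))
env_dists$soiltemp_Jul
```
</p></div>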
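<div xmlns="http://www.tei-c.org/ns/1.0"><head n="15.4">Overall MMRR models</head><p>For this analysis, we treated each sampling site as a distinct location, because a distance matrix among only three to five cities would not provide enough statistical power. Code for generating the matrices, running the MMRR, and generating figures is in the source code below.</p><p>source("R/15-IBD_IBE_MMRR.R")<lb/>make_mmrr_plot() # Plot Fig 7</p><p>Below are the results of the MMRR with all cities included in the same model. Each species was analyzed in a separate, independent model.</p><p>Table S7: Overall model p-values from MMRR.</p><table>
<row role="label"><cell>Species</cell><cell>R²</cell><cell>F-statistic</cell><cell>F p-value</cell></row>
<row><cell>CD</cell><cell>0.0295554</cell><cell>7.126583</cell><cell>0.1852</cell></row>
<row><cell>DS</cell><cell>0.0442241</cell><cell>21.515710</cell><cell>0.0817</cell></row>
<row><cell>EC</cell><cell>0.2707911</cell><cell>40.996947</cell><cell>0.0001</cell></row>
<row><cell>LS</cell><cell>0.0465956</cell><cell>19.109304</cell><cell>0.0513</cell></row>
<row><cell>PA</cell><cell>0.0051090</cell><cell>1.200614</cell><cell>0.8889</cell></row>
<row><cell>TO</cell><cell>0.0283096</cell><cell>16.449250</cell><cell>0.2327</cell></row>
</table><p>Table S8: Full parameter estimates and statistics by species from running 9999 permutations ('Reps') via MMRR.</p><table>
<row role="label"><cell>Species</cell><cell>Variable</cell><cell>Estimate</cell><cell>p</cell><cell>95% Lower</cell><cell>95% Upper</cell></row>
<row><cell>CD</cell><cell>distance_to_city_center</cell><cell>0.12</cell><cell>0.1375</cell><cell>0.0694968</cell><cell>0.1664468</cell></row>
<row><cell>CD</cell><cell>geodist</cell><cell>-0.02</cell><cell>0.9155</cell><cell>-0.2050334</cell><cell>0.1583669</cell></row>
<row><cell>CD</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.9433</cell><cell>-0.0473892</cell><cell>0.0473892</cell></row>
<row><cell>CD</cell><cell>nlcd_urban_pct</cell><cell>-0.11</cell><cell>0.0614</cell><cell>-0.1590538</cell><cell>-0.0639585</cell></row>
<row><cell>CD</cell><cell>soiltemp_Apr</cell><cell>-0.13</cell><cell>0.7102</cell><cell>-0.4542546</cell><cell>0.1920574</cell></row>
<row><cell>CD</cell><cell>soiltemp_Jul</cell><cell>0.15</cell><cell>0.4652</cell><cell>-0.0440647</cell><cell>0.3406560</cell></row>
<row><cell>DS</cell><cell>distance_to_city_center</cell><cell>-0.02</cell><cell>0.7678</cell><cell>-0.0534529</cell><cell>0.0157892</cell></row>
<row><cell>DS</cell><cell>geodist</cell><cell>0.04</cell><cell>0.1746</cell><cell>-0.0126394</cell><cell>0.0981550</cell></row>
<row><cell>DS</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.2102</cell><cell>-0.0340591</cell><cell>0.0326544</cell></row>
<row><cell>DS</cell><cell>nlcd_urban_pct</cell><cell>-0.04</cell><cell>0.4238</cell><cell>-0.0739547</cell><cell>-0.0072012</cell></row>
<row><cell>DS</cell><cell>soiltemp_Apr</cell><cell>0.16</cell><cell>0.614</cell><cell>-0.0547196</cell><cell>0.3687584</cell></row>
<row><cell>DS</cell><cell>soiltemp_Jul</cell><cell>-0.39</cell><cell>0.2492</cell><cell>-0.6148273</cell><cell>-0.1702235</cell></row>
<row><cell>EC</cell><cell>distance_to_city_center</cell><cell>-0.01</cell><cell>0.8312</cell><cell>-0.0774980</cell><cell>0.0490001</cell></row>
<row><cell>EC</cell><cell>geodist</cell><cell>-0.25</cell><cell>0.0051</cell><cell>-0.3654204</cell><cell>-0.1282000</cell></row>
<row><cell>EC</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.0007</cell><cell>-0.0578968</cell><cell>0.0617663</cell></row>
<row><cell>EC</cell><cell>nlcd_urban_pct</cell><cell>0.04</cell><cell>0.5146</cell><cell>-0.0242274</cell><cell>0.1006413</cell></row>
<row><cell>EC</cell><cell>soiltemp_Apr</cell><cell>0.10</cell><cell>0.3515</cell><cell>-0.1124231</cell><cell>0.3032273</cell></row>
<row><cell>EC</cell><cell>soiltemp_Jul</cell><cell>0.56</cell><cell>0.0002</cell><cell>0.4082284</cell><cell>0.7097679</cell></row>
<row><cell>LS</cell><cell>distance_to_city_center</cell><cell>-0.14</cell><cell>0.1218</cell><cell>-0.1757714</cell><cell>-0.0999720</cell></row>
<row><cell>LS</cell><cell>geodist</cell><cell>-0.20</cell><cell>0.0014</cell><cell>-0.2598413</cell><cell>-0.1496999</cell></row>
<row><cell>LS</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.0561</cell><cell>-0.0389433</cell><cell>0.0337495</cell></row>
<row><cell>LS</cell><cell>nlcd_urban_pct</cell><cell>-0.01</cell><cell>0.8394</cell><cell>-0.0467749</cell><cell>0.0276001</cell></row>
<row><cell>LS</cell><cell>soiltemp_Apr</cell><cell>0.08</cell><cell>0.4509</cell><cell>-0.0127788</cell><cell>0.1821182</cell></row>
<row><cell>LS</cell><cell>soiltemp_Jul</cell><cell>0.05</cell><cell>0.6937</cell><cell>-0.0276419</cell><cell>0.1341635</cell></row>
<row><cell>PA</cell><cell>distance_to_city_center</cell><cell>0.05</cell><cell>0.5062</cell><cell>-0.0008095</cell><cell>0.0955394</cell></row>
<row><cell>PA</cell><cell>geodist</cell><cell>0.02</cell><cell>0.8645</cell><cell>-0.1193973</cell><cell>0.1664705</cell></row>
<row><cell>PA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.9891</cell><cell>-0.0480046</cell><cell>0.0480012</cell></row>
<row><cell>PA</cell><cell>nlcd_urban_pct</cell><cell>0.04</cell><cell>0.3853</cell><cell>-0.0061648</cell><cell>0.0899176</cell></row>
<row><cell>PA</cell><cell>soiltemp_Apr</cell><cell>-0.07</cell><cell>0.7756</cell><cell>-0.3417075</cell><cell>0.2071037</cell></row>
<row><cell>PA</cell><cell>soiltemp_Jul</cell><cell>0.08</cell><cell>0.7054</cell><cell>-0.1025172</cell><cell>0.2564145</cell></row>
<row><cell>TO</cell><cell>distance_to_city_center</cell><cell>0.01</cell><cell>0.856</cell><cell>-0.0165192</cell><cell>0.0450751</cell></row>
<row><cell>TO</cell><cell>geodist</cell><cell>-0.11</cell><cell>0.044</cell><cell>-0.1579717</cell><cell>-0.0673055</cell></row>
<row><cell>TO</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.6465</cell><cell>-0.0308818</cell><cell>0.0301618</cell></row>
<row><cell>TO</cell><cell>nlcd_urban_pct</cell><cell>0.01</cell><cell>0.7993</cell><cell>-0.0163475</cell><cell>0.0452622</cell></row>
<row><cell>TO</cell><cell>soiltemp_Apr</cell><cell>0.27</cell><cell>0.1202</cell><cell>0.1743641</cell><cell>0.3667842</cell></row>
<row><cell>TO</cell><cell>soiltemp_Jul</cell><cell>-0.32</cell><cell>0.1184</cell><cell>-0.4043787</cell><cell>-0.2342648</cell></row>
</table></div>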
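<div xmlns="http://www.tei-c.org/ns/1.0"><p>The within-city models in Table S9 (subsection 15.5) report p-values adjusted with the Benjamini and Hochberg method within each species. As a quick illustration, base R's p.adjust() reproduces the adjusted values reported for Cynodon dactylon (CD) from its three unadjusted F p-values. This is only a check on the arithmetic and does not imply that the study's scripts call p.adjust() in exactly this way.</p><p>
```r
# Unadjusted within-city F p-values for CD, copied from Table S9
p_cd = c(BA = 0.1124, LA = 0.5615, PX = 0.0141)

# Benjamini-Hochberg adjustment across the three cities for this species
cd_adj = p.adjust(p_cd, method = "BH")
cd_adj   # BA 0.1686, LA 0.5615, PX 0.0423
```
</p><p>Because the adjustment is applied separately within each species, a species sampled in more cities (for example, TO in five) receives a larger correction than one sampled in three.</p></div>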
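<div xmlns="http://www.tei-c.org/ns/1.0"><head n="15.5">MMRR models within city</head><p>We also repeated the MMRR analysis within each city.</p><p>Table S9: Overall model p-values from MMRR, subset by city. Adjusted p-values are corrected using the Benjamini and Hochberg method within each species' results.</p><table>
<row role="label"><cell>Species</cell><cell>City</cell><cell>R²</cell><cell>F-statistic</cell><cell>F p-value</cell><cell>Adjusted p</cell></row>
<row><cell>CD</cell><cell>BA</cell><cell>0.2161593</cell><cell>5.4602354</cell><cell>0.1124</cell><cell>0.1686</cell></row>
<row><cell>CD</cell><cell>LA</cell><cell>0.1221912</cell><cell>1.6704036</cell><cell>0.5615</cell><cell>0.5615</cell></row>
<row><cell>CD</cell><cell>PX</cell><cell>0.2671775</cell><cell>16.4064136</cell><cell>0.0141</cell><cell>0.0423</cell></row>
<row><cell>DS</cell><cell>BA</cell><cell>0.0730573</cell><cell>2.0491996</cell><cell>0.6366</cell><cell>0.9613</cell></row>
<row><cell>DS</cell><cell>BO</cell><cell>0.0428518</cell><cell>1.1550721</cell><cell>0.7293</cell><cell>0.9613</cell></row>
<row><cell>DS</cell><cell>MN</cell><cell>0.0239167</cell><cell>1.2104360</cell><cell>0.8725</cell><cell>0.9613</cell></row>
<row><cell>DS</cell><cell>PX</cell><cell>0.0301349</cell><cell>0.3355696</cell><cell>0.9613</cell><cell>0.9613</cell></row>
<row><cell>EC</cell><cell>BA</cell><cell>0.0801989</cell><cell>1.2555586</cell><cell>0.5154</cell><cell>0.5154</cell></row>
<row><cell>EC</cell><cell>LA</cell><cell>0.1542581</cell><cell>1.4226717</cell><cell>0.4961</cell><cell>0.5154</cell></row>
<row><cell>EC</cell><cell>PX</cell><cell>0.1802413</cell><cell>2.1107626</cell><cell>0.3254</cell><cell>0.5154</cell></row>
<row><cell>LS</cell><cell>BA</cell><cell>0.6200687</cell><cell>6.8546312</cell><cell>0.1892</cell><cell>0.473</cell></row>
<row><cell>LS</cell><cell>BO</cell><cell>0.0447270</cell><cell>0.7397752</cell><cell>0.8838</cell><cell>0.8838</cell></row>
<row><cell>LS</cell><cell>LA</cell><cell>0.1902178</cell><cell>6.9060584</cell><cell>0.1242</cell><cell>0.473</cell></row>
<row><cell>LS</cell><cell>MN</cell><cell>0.1245043</cell><cell>1.5358687</cell><cell>0.4772</cell><cell>0.7953</cell></row>
<row><cell>LS</cell><cell>PX</cell><cell>0.0551478</cell><cell>0.7003986</cell><cell>0.8244</cell><cell>0.8838</cell></row>
<row><cell>PA</cell><cell>BA</cell><cell>0.1429473</cell><cell>2.0014720</cell><cell>0.2072</cell><cell>0.4144</cell></row>
<row><cell>PA</cell><cell>BO</cell><cell>0.0405967</cell><cell>1.1001781</cell><cell>0.7435</cell><cell>0.8569</cell></row>
<row><cell>PA</cell><cell>LA</cell><cell>0.0752459</cell><cell>0.6346749</cell><cell>0.8569</cell><cell>0.8569</cell></row>
<row><cell>PA</cell><cell>PX</cell><cell>0.5738665</cell><cell>8.0800948</cell><cell>0.0427</cell><cell>0.1708</cell></row>
<row><cell>TO</cell><cell>BA</cell><cell>0.0340749</cell><cell>1.0371416</cell><cell>0.8999</cell><cell>0.8999</cell></row>
<row><cell>TO</cell><cell>BO</cell><cell>0.0833431</cell><cell>2.3639386</cell><cell>0.4830</cell><cell>0.7145</cell></row>
<row><cell>TO</cell><cell>LA</cell><cell>0.8796290</cell><cell>83.3072026</cell><cell>0.0003</cell><cell>0.0015</cell></row>
<row><cell>TO</cell><cell>MN</cell><cell>0.0887157</cell><cell>3.9719783</cell><cell>0.3268</cell><cell>0.7145</cell></row>
<row><cell>TO</cell><cell>PX</cell><cell>0.2290994</cell><cell>1.1292997</cell><cell>0.5716</cell><cell>0.7145</cell></row>
</table><p>Table S10: Parameter estimates and statistics from running 9999 permutations ('Reps') via MMRR, subset by city. Note that p-values are not adjusted for multiple testing. 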
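</p><table>
<row role="label"><cell>Species</cell><cell>City</cell><cell>Variable</cell><cell>Estimate</cell><cell>p</cell><cell>95% Lower</cell><cell>95% Upper</cell></row>
<row><cell>CD</cell><cell>BA</cell><cell>distance_to_city_center</cell><cell>-0.10</cell><cell>0.4185</cell><cell>-0.2684666</cell><cell>0.0632947</cell></row>
<row><cell>CD</cell><cell>BA</cell><cell>geodist</cell><cell>0.18</cell><cell>0.2432</cell><cell>0.0183554</cell><cell>0.3382114</cell></row>
<row><cell>CD</cell><cell>BA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.0051</cell><cell>-0.1470377</cell><cell>0.1470377</cell></row>
<row><cell>CD</cell><cell>BA</cell><cell>nlcd_urban_pct</cell><cell>-0.42</cell><cell>0.001</cell><cell>-0.5689671</cell><cell>-0.2638461</cell></row>
<row><cell>CD</cell><cell>BA</cell><cell>soiltemp_Apr</cell><cell>0.02</cell><cell>0.9443</cell><cell>-0.2536769</cell><cell>0.2879531</cell></row>
<row><cell>CD</cell><cell>BA</cell><cell>soiltemp_Jul</cell><cell>0.13</cell><cell>0.5869</cell><cell>-0.1372739</cell><cell>0.4066275</cell></row>
<row><cell>CD</cell><cell>LA</cell><cell>distance_to_city_center</cell><cell>0.29</cell><cell>0.1362</cell><cell>0.0584736</cell><cell>0.5275481</cell></row>
<row><cell>CD</cell><cell>LA</cell><cell>geodist</cell><cell>-0.33</cell><cell>0.2694</cell><cell>-0.6122966</cell><cell>-0.0485802</cell></row>
<row><cell>CD</cell><cell>LA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.3495</cell><cell>-0.2005369</cell><cell>0.2005369</cell></row>
<row><cell>CD</cell><cell>LA</cell><cell>nlcd_urban_pct</cell><cell>-0.03</cell><cell>0.8776</cell><cell>-0.2309849</cell><cell>0.1796022</cell></row>
<row><cell>CD</cell><cell>LA</cell><cell>soiltemp_Apr</cell><cell>-0.07</cell><cell>0.7776</cell><cell>-0.3389143</cell><cell>0.2056573</cell></row>
<row><cell>CD</cell><cell>LA</cell><cell>soiltemp_Jul</cell><cell>0.10</cell><cell>0.6473</cell><cell>-0.1130946</cell><cell>0.3099422</cell></row>
<row><cell>CD</cell><cell>PX</cell><cell>distance_to_city_center</cell><cell>0.54</cell><cell>0.0112</cell><cell>0.4048935</cell><cell>0.6809183</cell></row>
<row><cell>CD</cell><cell>PX</cell><cell>geodist</cell><cell>0.24</cell><cell>0.1288</cell><cell>0.1176737</cell><cell>0.3580622</cell></row>
<row><cell>CD</cell><cell>PX</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.8859</cell><cell>-0.0940557</cell><cell>0.0940557</cell></row>
<row><cell>CD</cell><cell>PX</cell><cell>nlcd_urban_pct</cell><cell>-0.02</cell><cell>0.8422</cell><cell>-0.1204062</cell><cell>0.0794245</cell></row>
<row><cell>CD</cell><cell>PX</cell><cell>soiltemp_Apr</cell><cell>-0.34</cell><cell>0.1006</cell><cell>-0.5503081</cell><cell>-0.1239513</cell></row>
<row><cell>CD</cell><cell>PX</cell><cell>soiltemp_Jul</cell><cell>-0.07</cell><cell>0.6137</cell><cell>-0.2703666</cell><cell>0.1311920</cell></row>
<row><cell>DS</cell><cell>BA</cell><cell>distance_to_city_center</cell><cell>-0.08</cell><cell>0.6566</cell><cell>-0.2519155</cell><cell>0.0908257</cell></row>
<row><cell>DS</cell><cell>BA</cell><cell>geodist</cell><cell>0.21</cell><cell>0.1108</cell><cell>0.0335974</cell><cell>0.3810348</cell></row>
<row><cell>DS</cell><cell>BA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.761</cell><cell>-0.1393752</cell><cell>0.1393752</cell></row>
<row><cell>DS</cell><cell>BA</cell><cell>nlcd_urban_pct</cell><cell>-0.15</cell><cell>0.2505</cell><cell>-0.2857403</cell><cell>-0.0045114</cell></row>
<row><cell>DS</cell><cell>BA</cell><cell>soiltemp_Apr</cell><cell>-0.22</cell><cell>0.4722</cell><cell>-0.4847959</cell><cell>0.0528128</cell></row>
<row><cell>DS</cell><cell>BA</cell><cell>soiltemp_Jul</cell><cell>0.29</cell><cell>0.3347</cell><cell>0.0227011</cell><cell>0.5523010</cell></row>
<row><cell>DS</cell><cell>BO</cell><cell>distance_to_city_center</cell><cell>0.47</cell><cell>0.1593</cell><cell>0.1137323</cell><cell>0.8172063</cell></row>
<row><cell>DS</cell><cell>BO</cell><cell>geodist</cell><cell>-0.45</cell><cell>0.0645</cell><cell>-0.7938183</cell><cell>-0.0989208</cell></row>
<row><cell>DS</cell><cell>BO</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.8498</cell><cell>-0.1425615</cell><cell>0.1418102</cell></row>
<row><cell>DS</cell><cell>BO</cell><cell>nlcd_urban_pct</cell><cell>0.05</cell><cell>0.7466</cell><cell>-0.1048400</cell><cell>0.1967690</cell></row>
<row><cell>DS</cell><cell>BO</cell><cell>soiltemp_Apr</cell><cell>-0.12</cell><cell>0.6561</cell><cell>-0.3734660</cell><cell>0.1401180</cell></row>
<row><cell>DS</cell><cell>BO</cell><cell>soiltemp_Jul</cell><cell>0.14</cell><cell>0.5493</cell><cell>-0.1201335</cell><cell>0.3927905</cell></row>
<row><cell>DS</cell><cell>MN</cell><cell>distance_to_city_center</cell><cell>-0.04</cell><cell>0.7746</cell><cell>-0.1840544</cell><cell>0.1117149</cell></row>
<row><cell>DS</cell><cell>MN</cell><cell>geodist</cell><cell>-0.04</cell><cell>0.7876</cell><cell>-0.1818746</cell><cell>0.1105641</cell></row>
<row><cell>DS</cell><cell>MN</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.4679</cell><cell>-0.1035844</cell><cell>0.1035844</cell></row>
<row><cell>DS</cell><cell>MN</cell><cell>nlcd_urban_pct</cell><cell>0.04</cell><cell>0.699</cell><cell>-0.0727871</cell><cell>0.1502068</cell></row>
<row><cell>DS</cell><cell>MN</cell><cell>soiltemp_Apr</cell><cell>-0.18</cell><cell>0.4033</cell><cell>-0.3555524</cell><cell>-0.0143666</cell></row>
<row><cell>DS</cell><cell>MN</cell><cell>soiltemp_Jul</cell><cell>0.06</cell><cell>0.7516</cell><cell>-0.1097516</cell><cell>0.2385213</cell></row>
<row><cell>DS</cell><cell>PX</cell><cell>distance_to_city_center</cell><cell>-0.03</cell><cell>0.9108</cell><cell>-0.3758396</cell><cell>0.3139799</cell></row>
<row><cell>DS</cell><cell>PX</cell><cell>geodist</cell><cell>-0.08</cell><cell>0.7542</cell><cell>-0.3420557</cell><cell>0.1911879</cell></row>
<row><cell>DS</cell><cell>PX</cell><cell>Intercept</cell><cell>0.01</cell><cell>0.6325</cell><cell>-0.2137801</cell><cell>0.2332494</cell></row>
<row><cell>DS</cell><cell>PX</cell><cell>nlcd_urban_pct</cell><cell>0.08</cell><cell>0.6269</cell><cell>-0.1509445</cell><cell>0.3132693</cell></row>
<row><cell>DS</cell><cell>PX</cell><cell>soiltemp_Apr</cell><cell>-0.08</cell><cell>0.8003</cell><cell>-0.4400966</cell><cell>0.2847726</cell></row>
<row><cell>DS</cell><cell>PX</cell><cell>soiltemp_Jul</cell><cell>-0.06</cell><cell>0.7774</cell><cell>-0.3749012</cell><cell>0.2644566</cell></row>
<row><cell>EC</cell><cell>BA</cell><cell>distance_to_city_center</cell><cell>0.10</cell><cell>0.5104</cell><cell>-0.1352494</cell><cell>0.3274849</cell></row>
<row><cell>EC</cell><cell>BA</cell><cell>geodist</cell><cell>0.17</cell><cell>0.2716</cell><cell>-0.0622310</cell><cell>0.3983733</cell></row>
<row><cell>EC</cell><cell>BA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.6783</cell><cell>-0.1871244</cell><cell>0.1871244</cell></row>
<row><cell>EC</cell><cell>BA</cell><cell>nlcd_urban_pct</cell><cell>0.12</cell><cell>0.3852</cell><cell>-0.0792998</cell><cell>0.3107074</cell></row>
<row><cell>EC</cell><cell>BA</cell><cell>soiltemp_Apr</cell><cell>0.05</cell><cell>0.8657</cell><cell>-0.2985420</cell><cell>0.3896308</cell></row>
<row><cell>EC</cell><cell>BA</cell><cell>soiltemp_Jul</cell><cell>0.06</cell><cell>0.8204</cell><cell>-0.2871230</cell><cell>0.3984088</cell></row>
<row><cell>EC</cell><cell>LA</cell><cell>distance_to_city_center</cell><cell>-0.98</cell><cell>0.0737</cell><cell>-1.6450191</cell><cell>-0.3099034</cell></row>
<row><cell>EC</cell><cell>LA</cell><cell>geodist</cell><cell>0.24</cell><cell>0.2494</cell><cell>-0.0685768</cell><cell>0.5511398</cell></row>
<row><cell>EC</cell><cell>LA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.3457</cell><cell>-0.2453434</cell><cell>0.2453434</cell></row>
<row><cell>EC</cell><cell>LA</cell><cell>nlcd_urban_pct</cell><cell>0.55</cell><cell>0.1414</cell><cell>0.0320693</cell><cell>1.0648316</cell></row>
<row><cell>EC</cell><cell>LA</cell><cell>soiltemp_Apr</cell><cell>-0.02</cell><cell>0.9446</cell><cell>-0.2807172</cell><cell>0.2483734</cell></row>
<row><cell>EC</cell><cell>LA</cell><cell>soiltemp_Jul</cell><cell>0.42</cell><cell>0.232</cell><cell>0.0302705</cell><cell>0.8043884</cell></row>
<row><cell>EC</cell><cell>PX</cell><cell>distance_to_city_center</cell><cell>0.01</cell><cell>0.9703</cell><cell>-0.2834277</cell><cell>0.3034585</cell></row>
<row><cell>EC</cell><cell>PX</cell><cell>geodist</cell><cell>-0.39</cell><cell>0.0395</cell><cell>-0.6389291</cell><cell>-0.1363801</cell></row>
<row><cell>EC</cell><cell>PX</cell><cell>Intercept</cell><cell>0.01</cell><cell>0.3856</cell><cell>-0.2107459</cell><cell>0.2237152</cell></row>
<row><cell>EC</cell><cell>PX</cell><cell>nlcd_urban_pct</cell><cell>0.00</cell><cell>0.9772</cell><cell>-0.2250492</cell><cell>0.2151403</cell></row>
<row><cell>EC</cell><cell>PX</cell><cell>soiltemp_Apr</cell><cell>-0.01</cell><cell>0.9779</cell><cell>-0.4668132</cell><cell>0.4495769</cell></row>
<row><cell>EC</cell><cell>PX</cell><cell>soiltemp_Jul</cell><cell>-0.09</cell><cell>0.7174</cell><cell>-0.5232281</cell><cell>0.3458628</cell></row>
<row><cell>LS</cell><cell>BA</cell><cell>distance_to_city_center</cell><cell>1.19</cell><cell>0.106</cell><cell>0.7662079</cell><cell>1.6094443</cell></row>
<row><cell>LS</cell><cell>BA</cell><cell>geodist</cell><cell>-1.04</cell><cell>0.1159</cell><cell>-1.4829287</cell><cell>-0.6016138</cell></row>
<row><cell>LS</cell><cell>BA</cell><cell>Intercept</cell><cell>-0.03</cell><cell>0.1484</cell><cell>-0.2622639</cell><cell>0.1947604</cell></row>
<row><cell>LS</cell><cell>BA</cell><cell>nlcd_urban_pct</cell><cell>0.27</cell><cell>0.2101</cell><cell>0.0266277</cell><cell>0.5221911</cell></row>
<row><cell>LS</cell><cell>BA</cell><cell>soiltemp_Apr</cell><cell>0.64</cell><cell>0.1682</cell><cell>0.2034475</cell><cell>1.0693071</cell></row>
<row><cell>LS</cell><cell>BA</cell><cell>soiltemp_Jul</cell><cell>-0.51</cell><cell>0.1641</cell><cell>-0.9457492</cell><cell>-0.0759057</cell></row>
<row><cell>LS</cell><cell>BO</cell><cell>distance_to_city_center</cell><cell>-0.26</cell><cell>0.5926</cell><cell>-0.7986157</cell><cell>0.2851344</cell></row>
<row><cell>LS</cell><cell>BO</cell><cell>geodist</cell><cell>0.35</cell><cell>0.3781</cell><cell>-0.1877151</cell><cell>0.8820930</cell></row>
<row><cell>LS</cell><cell>BO</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.7776</cell><cell>-0.1855937</cell><cell>0.1791482</cell></row>
<row><cell>LS</cell><cell>BO</cell><cell>nlcd_urban_pct</cell><cell>-0.05</cell><cell>0.7507</cell><cell>-0.2336700</cell><cell>0.1374681</cell></row>
<row><cell>LS</cell><cell>BO</cell><cell>soiltemp_Apr</cell><cell>0.31</cell><cell>0.3858</cell><cell>-0.0610043</cell><cell>0.6888627</cell></row>
<row><cell>LS</cell><cell>BO</cell><cell>soiltemp_Jul</cell><cell>-0.35</cell><cell>0.3456</cell><cell>-0.7122678</cell><cell>0.0129835</cell></row>
<row><cell>LS</cell><cell>LA</cell><cell>distance_to_city_center</cell><cell>0.00</cell><cell>0.9992</cell><cell>-0.2315808</cell><cell>0.2320620</cell></row>
<row><cell>LS</cell><cell>LA</cell><cell>geodist</cell><cell>-0.35</cell><cell>0.1233</cell><cell>-0.5764863</cell><cell>-0.1159599</cell></row>
<row><cell>LS</cell><cell>LA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.0865</cell><cell>-0.1224544</cell><cell>0.1224544</cell></row>
<row><cell>LS</cell><cell>LA</cell><cell>nlcd_urban_pct</cell><cell>-0.09</cell><cell>0.3924</cell><cell>-0.2277126</cell><cell>0.0497195</cell></row>
<row><cell>LS</cell><cell>LA</cell><cell>soiltemp_Apr</cell><cell>0.13</cell><cell>0.2848</cell><cell>-0.0039581</cell><cell>0.2738997</cell></row>
<row><cell>LS</cell><cell>LA</cell><cell>soiltemp_Jul</cell><cell>0.33</cell><cell>0.0401</cell><cell>0.2024289</cell><cell>0.4634342</cell></row>
<row><cell>LS</cell><cell>MN</cell><cell>distance_to_city_center</cell><cell>-0.22</cell><cell>0.2576</cell><cell>-0.4522034</cell><cell>0.0138133</cell></row>
<row><cell>LS</cell><cell>MN</cell><cell>geodist</cell><cell>0.22</cell><cell>0.2056</cell><cell>-0.0135483</cell><cell>0.4484999</cell></row>
<row><cell>LS</cell><cell>MN</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.7834</cell><cell>-0.2081530</cell><cell>0.2173090</cell></row>
<row><cell>LS</cell><cell>MN</cell><cell>nlcd_urban_pct</cell><cell>0.20</cell><cell>0.2129</cell><cell>-0.0369112</cell><cell>0.4346262</cell></row>
<row><cell>LS</cell><cell>MN</cell><cell>soiltemp_Apr</cell><cell>-0.07</cell><cell>0.828</cell><cell>-0.4214409</cell><cell>0.2856407</cell></row>
<row><cell>LS</cell><cell>MN</cell><cell>soiltemp_Jul</cell><cell>0.05</cell><cell>0.8648</cell><cell>-0.3132217</cell><cell>0.4139720</cell></row>
<row><cell>LS</cell><cell>PX</cell><cell>distance_to_city_center</cell><cell>0.08</cell><cell>0.6914</cell><cell>-0.2026496</cell><cell>0.3675307</cell></row>
<row><cell>LS</cell><cell>PX</cell><cell>geodist</cell><cell>-0.10</cell><cell>0.7047</cell><cell>-0.4513560</cell><cell>0.2458115</cell></row>
<row><cell>LS</cell><cell>PX</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.8133</cell><cell>-0.2080541</cell><cell>0.2080541</cell></row>
<row><cell>LS</cell><cell>PX</cell><cell>nlcd_urban_pct</cell><cell>0.16</cell><cell>0.2157</cell><cell>-0.0485440</cell><cell>0.3744609</cell></row>
<row><cell>LS</cell><cell>PX</cell><cell>soiltemp_Apr</cell><cell>-0.30</cell><cell>0.664</cell><cell>-1.1674789</cell><cell>0.5580362</cell></row>
<row><cell>LS</cell><cell>PX</cell><cell>soiltemp_Jul</cell><cell>0.21</cell><cell>0.7654</cell><cell>-0.5359827</cell><cell>0.9536103</cell></row>
<row><cell>PA</cell><cell>BA</cell><cell>distance_to_city_center</cell><cell>-0.24</cell><cell>0.1406</cell><cell>-0.4569014</cell><cell>-0.0211339</cell></row>
<row><cell>PA</cell><cell>BA</cell><cell>geodist</cell><cell>0.04</cell><cell>0.7215</cell><cell>-0.1864189</cell><cell>0.2760527</cell></row>
<row><cell>PA</cell><cell>BA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.3402</cell><cell>-0.1981518</cell><cell>0.1981518</cell></row>
<row><cell>PA</cell><cell>BA</cell><cell>nlcd_urban_pct</cell><cell>0.34</cell><cell>0.0292</cell><cell>0.1245412</cell><cell>0.5455189</cell></row>
<row><cell>PA</cell><cell>BA</cell><cell>soiltemp_Apr</cell><cell>0.12</cell><cell>0.4757</cell><cell>-0.1805703</cell><cell>0.4123389</cell></row>
<row><cell>PA</cell><cell>BA</cell><cell>soiltemp_Jul</cell><cell>-0.09</cell><cell>0.7229</cell><cell>-0.3920738</cell><cell>0.2124508</cell></row>
<row><cell>PA</cell><cell>BO</cell><cell>distance_to_city_center</cell><cell>0.30</cell><cell>0.3375</cell><cell>-0.0653382</cell><cell>0.6739094</cell></row>
<row><cell>PA</cell><cell>BO</cell><cell>geodist</cell><cell>-0.26</cell><cell>0.3184</cell><cell>-0.6268734</cell><cell>0.1128175</cell></row>
<row><cell>PA</cell><cell>BO</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.476</cell><cell>-0.1417946</cell><cell>0.1417946</cell></row>
<row><cell>PA</cell><cell>BO</cell><cell>nlcd_urban_pct</cell><cell>0.11</cell><cell>0.396</cell><cell>-0.0432134</cell><cell>0.2625454</cell></row>
<row><cell>PA</cell><cell>BO</cell><cell>soiltemp_Apr</cell><cell>0.06</cell><cell>0.8052</cell><cell>-0.1923722</cell><cell>0.3159761</cell></row>
<row><cell>PA</cell><cell>BO</cell><cell>soiltemp_Jul</cell><cell>0.08</cell><cell>0.7166</cell><cell>-0.1751357</cell><cell>0.3318766</cell></row>
<row><cell>PA</cell><cell>LA</cell><cell>distance_to_city_center</cell><cell>0.05</cell><cell>0.8015</cell><cell>-0.2385409</cell><cell>0.3440297</cell></row>
<row><cell>PA</cell><cell>LA</cell><cell>geodist</cell><cell>-0.01</cell><cell>0.9725</cell><cell>-0.3525149</cell><cell>0.3279088</cell></row>
<row><cell>PA</cell><cell>LA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.2965</cell><cell>-0.2565480</cell><cell>0.2565480</cell></row>
<row><cell>PA</cell><cell>LA</cell><cell>nlcd_urban_pct</cell><cell>-0.23</cell><cell>0.1389</cell><cell>-0.4996017</cell><cell>0.0309663</cell></row>
<row><cell>PA</cell><cell>LA</cell><cell>soiltemp_Apr</cell><cell>-0.16</cell><cell>0.5729</cell><cell>-0.4959573</cell><cell>0.1684456</cell></row>
<row><cell>PA</cell><cell>LA</cell><cell>soiltemp_Jul</cell><cell>-0.02</cell><cell>0.9582</cell><cell>-0.2837329</cell><cell>0.2532441</cell></row>
<row><cell>PA</cell><cell>PX</cell><cell>distance_to_city_center</cell><cell>-0.34</cell><cell>0.1264</cell><cell>-0.5736448</cell><cell>-0.1102836</cell></row>
<row><cell>PA</cell><cell>PX</cell><cell>geodist</cell><cell>0.30</cell><cell>0.0395</cell><cell>0.0831452</cell><cell>0.5173315</cell></row>
<row><cell>PA</cell><cell>PX</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.3987</cell><cell>-0.1994544</cell><cell>0.1994544</cell></row>
<row><cell>PA</cell><cell>PX</cell><cell>nlcd_urban_pct</cell><cell>0.08</cell><cell>0.4762</cell><cell>-0.1282009</cell><cell>0.2914797</cell></row>
<row><cell>PA</cell><cell>PX</cell><cell>soiltemp_Apr</cell><cell>0.57</cell><cell>0.0869</cell><cell>0.1810453</cell><cell>0.9640457</cell></row>
<row><cell>PA</cell><cell>PX</cell><cell>soiltemp_Jul</cell><cell>0.18</cell><cell>0.4031</cell><cell>-0.2095945</cell><cell>0.5604381</cell></row>
<row><cell>TO</cell><cell>BA</cell><cell>distance_to_city_center</cell><cell>-0.06</cell><cell>0.7548</cell><cell>-0.2527562</cell><cell>0.1278394</cell></row>
<row><cell>TO</cell><cell>BA</cell><cell>geodist</cell><cell>0.17</cell><cell>0.4582</cell><cell>-0.0175679</cell><cell>0.3484622</cell></row>
<row><cell>TO</cell><cell>BA</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.9428</cell><cell>-0.1337402</cell><cell>0.1337402</cell></row>
<row><cell>TO</cell><cell>BA</cell><cell>nlcd_urban_pct</cell><cell>0.02</cell><cell>0.8871</cell><cell>-0.1201015</cell><cell>0.1545664</cell></row>
<row><cell>TO</cell><cell>BA</cell><cell>soiltemp_Apr</cell><cell>0.02</cell><cell>0.925</cell><cell>-0.2456204</cell><cell>0.2892900</cell></row>
<row><cell>TO</cell><cell>BA</cell><cell>soiltemp_Jul</cell><cell>-0.12</cell><cell>0.6149</cell><cell>-0.3955572</cell><cell>0.1534940</cell></row>
<row><cell>TO</cell><cell>BO</cell><cell>distance_to_city_center</cell><cell>0.25</cell><cell>0.4583</cell><cell>-0.0866987</cell><cell>0.5906567</cell></row>
<row><cell>TO</cell><cell>BO</cell><cell>geodist</cell><cell>-0.32</cell><cell>0.1915</cell><cell>-0.6505818</cell><cell>0.0188440</cell></row>
<row><cell>TO</cell><cell>BO</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.9424</cell><cell>-0.1385997</cell><cell>0.1385997</cell></row>
<row><cell>TO</cell><cell>BO</cell><cell>nlcd_urban_pct</cell><cell>0.22</cell><cell>0.056</cell><cell>0.0797738</cell><cell>0.3683782</cell></row>
<row><cell>TO</cell><cell>BO</cell><cell>soiltemp_Apr</cell><cell>0.02</cell><cell>0.9385</cell><cell>-0.2101130</cell><cell>0.2471239</cell></row>
<row><cell>TO</cell><cell>BO</cell><cell>soiltemp_Jul</cell><cell>-0.14</cell><cell>0.5419</cell><cell>-0.3690613</cell><cell>0.0898358</cell></row>
<row><cell>TO</cell><cell>LA</cell><cell>distance_to_city_center</cell><cell>-0.06</cell><cell>0.5889</cell><cell>-0.2085607</cell><cell>0.0944370</cell></row>
<row><cell>TO</cell><cell>LA</cell><cell>geodist</cell><cell>0.01</cell><cell>0.9256</cell><cell>-0.1584939</cell><cell>0.1794819</cell></row>
<row><cell>TO</cell><cell>LA</cell><cell>Intercept</cell><cell>-0.03</cell><cell>0.11</cell><cell>-0.1062405</cell><cell>0.0465560</cell></row>
<row><cell>TO</cell><cell>LA</cell><cell>nlcd_urban_pct</cell><cell>-0.03</cell><cell>0.7391</cell><cell>-0.1111975</cell><cell>0.0553448</cell></row>
<row><cell>TO</cell><cell>LA</cell><cell>soiltemp_Apr</cell><cell>-0.94</cell><cell>0.0002</cell><cell>-1.0183148</cell><cell>-0.8521554</cell></row>
<row><cell>TO</cell><cell>LA</cell><cell>soiltemp_Jul</cell><cell>0.06</cell><cell>0.5799</cell><cell>-0.0386642</cell><cell>0.1617719</cell></row>
<row><cell>TO</cell><cell>MN</cell><cell>distance_to_city_center</cell><cell>-0.03</cell><cell>0.8513</cell><cell>-0.1973255</cell><cell>0.1299167</cell></row>
<row><cell>TO</cell><cell>MN</cell><cell>geodist</cell><cell>0.02</cell><cell>0.9075</cell><cell>-0.1406409</cell><cell>0.1746362</cell></row>
<row><cell>TO</cell><cell>MN</cell><cell>Intercept</cell><cell>0.00</cell><cell>0.6183</cell><cell>-0.1101741</cell><cell>0.1101741</cell></row>
<row><cell>TO</cell><cell>MN</cell><cell>nlcd_urban_pct</cell><cell>0.04</cell><cell>0.7863</cell><cell>-0.0699047</cell><cell>0.1518388</cell></row>
<row><cell>TO</cell><cell>MN</cell><cell>soiltemp_Apr</cell><cell>0.48</cell><cell>0.0338</cell><cell>0.2780902</cell><cell>0.6849983</cell></row>
<row><cell>TO</cell><cell>MN</cell><cell>soiltemp_Jul</cell><cell>-0.32</cell><cell>0.1707</cell><cell>-0.5199557</cell><cell>-0.1188595</cell></row>
<row><cell>TO</cell><cell>PX</cell><cell>distance_to_city_center</cell><cell>0.27</cell><cell>0.3084</cell><cell>-0.1389457</cell><cell>0.6732755</cell></row>
<row><cell>TO</cell><cell>PX</cell><cell>geodist</cell><cell>0.03</cell><cell>0.9349</cell><cell>-0.4735969</cell><cell>0.5318689</cell></row>
<row><cell>TO</cell><cell>PX</cell><cell>Intercept</cell><cell>-0.01</cell><cell>0.8142</cell><cell>-0.3570472</cell><cell>0.3342439</cell></row>
<row><cell>TO</cell><cell>PX</cell><cell>nlcd_urban_pct</cell><cell>0.16</cell><cell>0.5206</cell><cell>-0.2774734</cell><cell>0.6011279</cell></row>
<row><cell>TO</cell><cell>PX</cell><cell>soiltemp_Apr</cell><cell>-0.50</cell><cell>0.3107</cell><cell>-1.1136111</cell><cell>0.1065639</cell></row>
<row><cell>TO</cell><cell>PX</cell><cell>soiltemp_Jul</cell><cell>0.09</cell><cell>0.8399</cell><cell>-0.5216801</cell><cell>0.7115915</cell></row>
</table></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="16">SessionInfo()</head><p>sessionInfo()</p><p>## R version 4.4.2 (2024-10-31)<lb/>
## Platform: aarch64-apple-darwin20<lb/>
## Running under: macOS Sonoma 14.4.1<lb/>
##<lb/>
## Matrix products: default<lb/>
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib<lb/>
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK ve<lb/>
##<lb/>
## locale:<lb/>
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8<lb/>
##<lb/>
## time zone: America/New_York<lb/>
## tzcode source: internal<lb/>
##<lb/>
## attached base packages:<lb/>
## [1] stats graphics grDevices utils datasets methods base<lb/>
##<lb/>
## other attached packages:<lb/>
## [1] polyRAD_2.0.0 polysat_1.7-7 ggh4x_0.2.8 LEA_3.16.0<lb/>
## [5] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 purrr_1.0.2<lb/>
## [9] tibble_3.2.1 tidyverse_2.0.0 cowplot_1.2.0 readr_2.1.5<lb/>
## [13] here_1.0.1 dplyr_1.1.4 magrittr_2.0.3 tidyr_1.3.1<lb/>
## [17] ggplot2_4.0.0<lb/>
##<lb/>
## loaded via a namespace (and not attached):<lb/>
## [1] fastmatch_1.1-4 gtable_0.3.6 xfun_0.52 tzdb_0.4.0<lb/>
## [5] vctrs_0.6.5 tools_4.4.2 generics_0.1.3 parallel_4.4.2<lb/>
## [9] fansi_1.0.6 highr_0.11 pkgconfig_2.0.3 RColorBrewer_1.1-3<lb/>
## [13] S7_0.2.0 lifecycle_1.0.4 compiler_4.4.2 farver_2.1.2<lb/>
## [17] textshaping_0.4.0 tinytex_0.51 htmltools_0.5.8.1 yaml_2.3.10<lb/>
## [21] pillar_1.9.0 crayon_1.5.2 viridis_0.6.5 commonmark_1.9.1<lb/>
## [25] tidyselect_1.2.1 digest_0.6.35 stringi_1.8.4 labeling_0.4.3<lb/>
## [29] rprojroot_2.0.4 fastmap_1.2.0 grid_4.4.2 cli_3.6.3<lb/>
## [33] dichromat_2.0-0.1 utf8_1.2.4 withr_3.0.2 scales_1.4.0<lb/>
## [37] bit64_4.0.5 timechange_0.3.0 rmarkdown_2.27 bit_4.0.5<lb/>
## [41] ggtext_0.1.2 gridExtra_2.3 ragg_1.3.2 hms_1.1.3<lb/>
## [45] kableExtra_1.4.0 evaluate_1.0.5 knitr_1.47 viridisLite_0.4.2<lb/>
## [49] markdown_1.13 rlang_1.1.4 gridtext_0.1.5 Rcpp_1.0.12<lb/>
## [53] glue_1.8.0 formatR_1.14 xml2_1.3.6 svglite_2.1.3<lb/>
## [57] rstudioapi_0.16.0 vroom_1.6.5 R6_2.5.1 systemfonts_1.1.0</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>This is supplementary material intended to be paired with files in the GitHub repository at https://github.com/avahoffman/urban-weed-genomics.</p></note>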
		</body>
		</text>
</TEI>
