<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Chromosome‐scale reference genome of Pectocarya recurvata, the species with the smallest reported genome size in Boraginaceae</title></titleStmt>
			<publicationStmt>
				<publisher>Wiley Periodicals LLC on behalf of Botanical Society of America</publisher>
				<date>05/01/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10632939</idno>
					<idno type="doi">10.1002/aps3.70008</idno>
					<title level='j'>Applications in Plant Sciences</title>
<idno>2168-0450</idno>
<biblScope unit="volume">13</biblScope>
<biblScope unit="issue">3</biblScope>					

					<author>Poppy C Northing</author><author>Jessie A Pelosi</author><author>D Lawrence Venable</author><author>Katrina M Dlugosch</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <sec><title>Premise</title><p><italic>Pectocarya recurvata</italic> (Boraginaceae, subfamily Cynoglossoideae), a species native to the Sonoran Desert (North America), has served as a model system for a suite of ecological and evolutionary studies. However, no reference genomes are currently available in Cynoglossoideae. A high‐quality reference genome for <italic>P. recurvata</italic> would be valuable for addressing questions in this system and across broader taxonomic scales.</p></sec> <sec><title>Methods</title><p>Using PacBio HiFi sequencing, we assembled a reference genome for <italic>P. recurvata</italic> and annotated coding regions with full‐length transcripts from an Iso‐Seq library. We assessed genome completeness with BUSCO and <italic>k</italic>‐mer analysis, and estimated the genome size of six individuals using flow cytometry.</p></sec> <sec><title>Results</title><p>The chromosome‐scale genome assembly for <italic>P. recurvata</italic> was 216.0 Mbp long (N50 = 12.1 Mbp). Previous observations indicated <italic>P. recurvata</italic> is 2<italic>n</italic> = 24. Our assembly included 12 primary contigs (158.3 Mbp) containing 30,655 genes with telomeres at 23 out of 24 ends. Flow cytometry measurements from the same population included two plants with 1C = 196.9 Mbp, the smallest measured for Boraginaceae, and four with 1C = 385.8 Mbp, which is consistent with tetraploidy in this population.</p></sec> <sec><title>Discussion</title><p>The <italic>P. recurvata</italic> genome assembly and annotation provide a high‐quality genomic resource in a sparsely represented area of the angiosperm phylogeny. This new reference genome will facilitate answering open questions in ecophysiology, biogeography, and systematics.</p></sec>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Results: Our chromosome-scale reference assembly for P. recurvata is 216.0 Mbp long (N50 = 12.1 Mbp). Previous observations indicate that P. recurvata is 2n = 24. Our assembly includes 12 primary contigs (158.3 Mbp) containing 30,655 genes, with telomeres at 23 of 24 ends. Flow cytometry measurements from the same population included two plants with 1C = 196.9 Mbp, the smallest value measured for Boraginaceae, and four plants with 1C = 385.8 Mbp, which is consistent with tetraploidy in this population. Discussion: The P. recurvata genome assembly and annotation provide a high-quality genomic resource in a sparsely represented area of the angiosperm phylogeny. This new reference genome will facilitate the study of open questions in ecophysiology, biogeography, and systematics.</p><p>Pectocarya recurvata I.M. Johnst. (curvenut combseed) is a winter annual forb in the Cynoglossoideae subfamily and Amsinckiinae subtribe of Boraginaceae (Boraginales) (Figure <ref type="figure">1</ref>) <ref type="bibr">(Chac&#243;n et al., 2016;</ref><ref type="bibr">Luebert et al., 2016;</ref><ref type="bibr">Simpson et al., 2017a)</ref>. Pectocarya recurvata is distributed across the Sonoran, Chihuahuan, and Mojave deserts of northern Mexico and the American Southwest, regions known for unpredictable winter precipitation regimes <ref type="bibr">(Noy-Meir, 1973;</ref><ref type="bibr">Huxman et al., 2004)</ref>. As a winter annual plant in this region, P. recurvata relies on cues from late fall and early winter rains to appropriately time its germination, growth, and reproduction to take advantage of the cooler, wetter conditions that occur immediately after rain events <ref type="bibr">(Mulroy and Rundel, 1977;</ref><ref type="bibr">Huxman et al., 2008)</ref>. 
The Sonoran Desert winter annual plant community, including P. recurvata, has served as an important model system in understanding how species adapt to variable environments and subsequently how these environments promote species coexistence <ref type="bibr">(Pake and </ref><ref type="bibr">Venable, 1995, 1996;</ref><ref type="bibr">Chesson et al., 2004;</ref><ref type="bibr">Angert et al., 2009;</ref><ref type="bibr">Huxman et al., 2013)</ref>. These investigations have been largely facilitated by the long-term vegetation plots at the Desert Laboratory at Tumamoc Hill (Tucson, Arizona, USA), where detailed vital rates of the most abundant species (including P. recurvata) have been recorded since 1982 <ref type="bibr">(Venable, 2007;</ref><ref type="bibr">Huxman et al., 2013)</ref>. Despite its importance to long-term eco-evolutionary studies, there is no published genome or transcriptome for P. recurvata or any species in the Cynoglossoideae subfamily.</p><p>Stable coexistence within the Sonoran Desert winter annual plant community is maintained through temporal variability in precipitation and temperature acting on variation in a fundamental physiological trade-off between growth and reproduction <ref type="bibr">(Angert et al., 2007,</ref><ref type="bibr"> 2009;</ref><ref type="bibr">Huxman et al., 2008)</ref>. Species are separated along a specific trade-off between water use efficiency (the amount of carbon fixed per unit of water lost) and relative growth rate <ref type="bibr">(Angert et al., 2007;</ref><ref type="bibr">Huxman et al., 2008)</ref>. As a more resource-conservative member of this community, P. recurvata exhibits a slow growth rate and highly efficient water use, the latter of which is associated with the maintenance of photosynthesis at lower temperatures to optimize growth after winter rains <ref type="bibr">(Huxman et al., 2008;</ref><ref type="bibr">Barron-Gafford et al., 2013)</ref>. 
Two areas of active interest in this system are the genetic architecture underlying the physiological trade-off between relative growth rate and water use efficiency, and the potential for these traits to be involved in adaptive responses to climate change <ref type="bibr">(Kimball et al., 2010;</ref><ref type="bibr">Angert et al., 2014)</ref>.</p><p>More broadly, evolutionary relationships among taxa in Boraginaceae have served as models to investigate the biogeographical processes underlying commonly observed disjunctions in plant distributions <ref type="bibr">(Gottschling et al., 2004;</ref><ref type="bibr">Moore et al., 2006;</ref><ref type="bibr">Guilliams et al., 2017;</ref><ref type="bibr">Mabry and Simpson, 2018)</ref>. Several genera in the Amsinckiinae subtribe, including Pectocarya, contain species distributed across the subtropical and temperate regions of North and South America (but not within the tropics themselves), a distribution pattern referred to as the American amphitropical disjunction (AAD) <ref type="bibr">(Raven, 1963;</ref><ref type="bibr">Guilliams et al., 2017;</ref><ref type="bibr">Simpson et al., 2017b)</ref>. It is inferred that long-distance dispersal events cause AAD distributions <ref type="bibr">(Raven, 1963;</ref><ref type="bibr">Wen and Ickert-Bond, 2009</ref>), yet the types of propagule morphology that might promote long-distance dispersal and the features associated with successful establishment after colonization are less clear <ref type="bibr">(Nathan, 2006;</ref><ref type="bibr">Chac&#243;n et al., 2017;</ref><ref type="bibr">Harris et al., 2018)</ref>. 
In Amsinckiinae alone, at least 18 distinct long-distance dispersal events across the American tropics have occurred in the relatively recent past <ref type="bibr">(Chac&#243;n et al., 2017;</ref><ref type="bibr">Guilliams et al., 2017)</ref>.</p><p>In Boraginaceae, there are currently (as of October 2024) three publicly available genome assemblies: Lithospermum erythrorhizon Siebold &amp; Zucc. <ref type="bibr">(Auber et al., 2020)</ref>, Echium plantagineum L. <ref type="bibr">(Tang et al., 2020)</ref>, and Pentaglottis sempervirens (L.) Tausch ex L.H.Bailey (Darwin Tree of Life Project Consortium, 2022), which are all taxa placed within the Boraginoideae subfamily. There are no genome assemblies currently available within the remaining subfamilies of Echiochiloideae and Cynoglossoideae, despite the Cynoglossoideae being the largest subfamily, with over 1000 species across 50 genera <ref type="bibr">(Chac&#243;n et al., 2016)</ref>. Moreover, the relationships of the major orders that comprise the core Lamiidae phylogeny (Boraginales, Gentianales, Solanales, Lamiales) remain unresolved <ref type="bibr">(Bremer et al., 2002;</ref><ref type="bibr">Soltis et al., 2011;</ref><ref type="bibr">Zhang et al., 2020)</ref>. Therefore, expanding the availability of nuclear genomic resources into additional areas of the Lamiidae phylogeny could also help clarify the relationships between these orders.</p><p>Here, we present a chromosome-scale genome assembly for P. recurvata, the first reference genome in the Cynoglossoideae subfamily of Boraginaceae. The P. recurvata genome assembly is a near telomere-to-telomere chromosome-level assembly accompanied by structural and functional gene annotation as well as a complete chloroplast genome assembly and annotation. We also report the smallest genome size to date for a species in Boraginaceae (reported 1C-values range from 270-16,000 Mbp), as well as putative ploidy variation in P. recurvata. 
Additionally, we find evidence of a whole genome duplication (WGD) in the evolutionary history of P. recurvata congruent with previously described WGDs in Boraginaceae <ref type="bibr">(Ren et al., 2018;</ref><ref type="bibr">Tang et al., 2020)</ref>. We analyze gene synteny to compare genome structure between E. plantagineum and our assembled P. recurvata chromosomes. Our annotated P. recurvata genome and plastome assemblies provide essential genomic tools for future opportunities to investigate outstanding questions in physiological ecology, biogeography, and phylogenetics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>METHODS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Species description</head><p>Pectocarya recurvata is a small herbaceous forb with basally branching, strigose stems arranged with alternate, strigose to bristle-haired linear leaves (Figure <ref type="figure">1A</ref>) <ref type="bibr">(Veno, 1979;</ref><ref type="bibr">Guilliams and Kelley, 2021)</ref>. It has racemose inflorescences consisting of self-pollinating, inconspicuous flowers with small, white, funnelform corollas and bilateral calyces (Figure <ref type="figure">1C</ref>). Pectocarya recurvata is distinguished from other species in the genus by its characteristic fruit, which are four fused, recurved nutlets with toothed margins that enable epizoochoric dispersal (Figure <ref type="figure">1A</ref>, <ref type="figure">B</ref>) <ref type="bibr">(Veno, 1979;</ref><ref type="bibr">Guilliams and Kelley, 2021)</ref>. Within its range spanning the Chihuahuan, Mojave, and Sonoran deserts of the Southwestern United States and northwestern Mexico, P. recurvata commonly occurs in patches at low elevations (10-1600 m), often growing abundantly in the shelter of rocks (Figure <ref type="figure">1A</ref>) and larger plants such as Larrea tridentata (Sess&#233; &amp; Moc. ex DC.) Coville (creosote) <ref type="bibr">(Guilliams and Kelley, 2021)</ref>. The chromosome number of P. recurvata is 2n = 24, as counted by <ref type="bibr">Veno (1979)</ref> in individuals from five widely distributed populations in Arizona and California.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Genome size estimation</head><p>The genome size of P. recurvata was estimated using flow cytometry with a FACSCanto II flow cytometer (BD Biosciences, San Jose, California, USA). The 2C DNA content was measured from six individuals grown from seed sourced from the Desert Laboratory at Tumamoc Hill (32&#176;13&#8242;N, 111&#176;01&#8242;W; Tucson, Arizona, USA) in 2016 and 2019.</p><p>Sample preparation and analysis by flow cytometry was conducted following a modified two-step protocol described in <ref type="bibr">Dole&#382;el et al. (2007)</ref> and <ref type="bibr">Cang et al. (2024)</ref>. In brief, nuclei were isolated from approximately 20 mg of freshly collected leaf tissue from each P. recurvata sample. Raphanus sativus 'Saxa' (2C DNA content = 1.11 pg, provided by the Institute of Experimental Botany, Prague, Czech Republic; <ref type="bibr">Dole&#382;el et al., 2007)</ref> was used as an internal reference standard. Leaf tissue from both species was chopped simultaneously on an ice-cold glass Petri dish in 200 &#956;L of ice-cold Otto I buffer, using a new razor blade for each plant. The nuclear suspension was filtered through 18-&#956;m nylon mesh into a flow cytometry tube and kept on ice. Subsequently, 280 &#956;L of Otto II buffer was added to the suspension, followed by 20 &#956;L of room-temperature 1 mg/mL RNase and 23 &#956;L of ice-cold 1 mg/mL propidium iodide (PI) stain. Each sample was gently vortexed and incubated for 5 min before analysis on the FACSCanto II with a medium flow rate. Measurements of the six samples took place across four different days (Table <ref type="table">S9</ref> in Appendix S1; see Supporting Information with this article). The haploid (1C) genome size was calculated using the G1 peak mean PI fluorescence of the sample and standard, following the formula described by <ref type="bibr">Dole&#382;el and Barto&#353; (2005)</ref>.</p></div>
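<div xmlns="http://www.tei-c.org/ns/1.0"><p>The genome size calculation described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the standard ratio form of the calculation (sample 2C = sample G1 peak mean divided by standard G1 peak mean, multiplied by the 2C content of the standard) and the commonly used conversion of 1 pg = 978 Mbp, which the text does not state explicitly. The function names and the peak values in the example call are hypothetical.</p>
<p><![CDATA[
```python
# Assumption: 1 pg of DNA corresponds to 978 Mbp, a commonly used conversion
# factor; the text does not state which factor was applied.
MBP_PER_PG = 978.0


def sample_2c_pg(sample_g1_mean, standard_g1_mean, standard_2c_pg=1.11):
    """Return the sample 2C DNA content (pg) from the ratio of G1 peak means.

    The default standard_2c_pg is the value given in the text for
    Raphanus sativus 'Saxa'.
    """
    if sample_g1_mean <= 0 or standard_g1_mean <= 0:
        raise ValueError("G1 peak means must be positive")
    return (sample_g1_mean / standard_g1_mean) * standard_2c_pg


def haploid_size_mbp(sample_g1_mean, standard_g1_mean, standard_2c_pg=1.11):
    """Return the 1C genome size in Mbp (half of 2C, converted from pg)."""
    two_c_pg = sample_2c_pg(sample_g1_mean, standard_g1_mean, standard_2c_pg)
    return (two_c_pg / 2.0) * MBP_PER_PG


# Hypothetical peak means, chosen only to make the arithmetic easy to follow.
print(round(haploid_size_mbp(200.0, 1110.0), 1))
```
]]></p>
<p>Under the same assumed conversion factor, a 1C value of 0.201 pg corresponds to roughly 197 Mbp, in line with the values reported in the Results; small differences are expected because the published picogram values are rounded.</p></div>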
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Reference tissue</head><p>Plant tissue for the reference genome was sourced from a P. recurvata seed collected in spring 2019 at the Desert Laboratory. The seed was germinated in soil in a D25L deepot (Stuewe and Sons, Tangent, Oregon, USA) at 10&#176;C and grown in a greenhouse at the University of Arizona (Tucson, Arizona, USA) in ambient light conditions and hand-watered twice daily. After 77 days, 0.21 g of stem and leaf tissue was collected from the individual (Figure <ref type="figure">1D</ref>), flash frozen in liquid nitrogen, and stored at -80&#176;C at the Arizona Genomics Institute (AGI); this tissue was used to prepare genomic sequencing libraries. After 66 days of additional growth, 0.52 g of leaf, stem, and reproductive tissue was flash frozen in liquid nitrogen and stored at -80&#176;C at the AGI; this tissue was used to prepare the transcriptome sequencing library. A reference voucher of the remaining plant tissue from this individual is archived in the University of Arizona herbarium (ARIZ 444648).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Genome and transcriptome sequencing</head><p>Genome sequencing was conducted by the AGI following PacBio's HiFi protocols for library preparation and sequencing as follows. After high-molecular-weight DNA was extracted following the <ref type="bibr">Doyle and Doyle (1987)</ref> protocol with minor modifications and assessed for quality control with a Femto Pulse System (Agilent Technologies, Santa Clara, California, USA), DNA was sheared to 10-20 kbp using a Megaruptor 3 sonicator (Diagenode, Denville, New Jersey, USA) and treated with SMRTbell cleanup beads (PacBio, Menlo Park, California, USA). The final HiFi libraries were constructed using a SMRTbell Prep Kit v3.0 (PacBio), size selected to 10-25 kbp using a PippinHT size selector (Sage Science, Beverly, Massachusetts, USA), and validated with the Femto Pulse System (Agilent Technologies). This final, size-selected library was prepared for sequencing using the PacBio SequelII Sequencing Kit v3.1 for HiFi libraries, loaded onto one PacBio Revio SMRT cell, and sequenced for 24 h in circular consensus sequencing mode, yielding 65.6 Gbp of HiFi sequencing data.</p><p>An Iso-Seq library was also constructed by the AGI to enable gene identification and annotation of the P. recurvata genome assembly. Total RNA was isolated from the flash frozen sample of pooled leaf, stem, and reproductive tissue using the PureLink Plant RNA Reagent kit (Thermo Fisher Scientific, Waltham, Massachusetts, USA), and the RNA integrity number (RIN) was measured for quality control using an Agilent 2100 Bioanalyzer (sample RIN = 7.8). The Kinnex Iso-Seq library was prepared by first synthesizing cDNA from 300 ng of total RNA using the Iso-Seq express 2.0 kit (PacBio), and the size distribution of the resulting cDNA was checked with a Bioanalyzer (Agilent Technologies). 
The final Iso-Seq library was constructed following the PacBio Kinnex PCR protocol and sequenced on one Revio SMRT cell, generating 75.7 Gbp of HiFi reads comprising 85 million transcripts that reduced to 1.65 million isoform transcripts after clustering was performed using the IsoSeq software toolkit (<ref type="url">https://isoseq.how/</ref>, PacBio).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Assembly</head><p>A k-mer analysis of the HiFi reads was used to estimate genome size, abundance of repetitive elements, heterozygosity, and ploidy level using KMC v3.1.0 <ref type="bibr">(Kokot et al., 2017)</ref>, GenomeScope v1.0 <ref type="bibr">(Vurture et al., 2017)</ref>, and SmudgePlot v1.0 (Ranallo-Benavidez et al., 2020) with a maximum k-mer coverage of 10,000 to count and analyze 17-mers, respectively. Values of k = 21, 25, 31, 41, and 71 were investigated in the same manner, and the results of these analyses are reported in Table <ref type="table">S1</ref> in Appendix S1 and in Figures <ref type="figure">S2</ref> and <ref type="figure">S8</ref> in Appendix S2. The initial genome assembly was constructed de novo from the PacBio HiFi reads using Hifiasm v0.16.1 <ref type="bibr">(Cheng et al., 2021)</ref>. Based on low estimated heterozygosity from the k-mer analysis (see Results) and the selfing mating system of P. recurvata, purging of haplotypic duplications was skipped in the assembly (using the -l0 flag). The resulting assembly graph was visualized using Bandage v0.8.1 <ref type="bibr">(Wick et al., 2015)</ref> and checked for assembly errors using Inspector with the --pacbio-hifi flag <ref type="bibr">(Chen et al., 2021)</ref>. Foreign contaminants were identified in the assembly using BlobToolKit v4.3.5 <ref type="bibr">(Challis et al., 2020)</ref>, which identifies potential contaminants from raw read coverage, GC content, and BLAST hits. The HiFi reads were aligned to the initial assembly to calculate coverage using Minimap2 v2.28 with the -map-hifi flag <ref type="bibr">(Li, 2018)</ref>, and BLAST hits from the nucleotide database were generated using BLASTN with BLAST v2.13.0 <ref type="bibr">(Altschul et al., 1990)</ref>. 
This initial assembly (i.e., every contig &gt;10 kbp) was assessed for completeness using 2326 benchmarking universal single-copy orthologs (BUSCOs) from the eudicots dataset eudicots_odb10 (1 August 2024) using BUSCO v5.6.1 <ref type="bibr">(Manni et al., 2021)</ref>.</p><p>The initial assembly consisted of 12 large contigs and 6449 smaller, trailing contigs (see Results; Table <ref type="table">1</ref>). Three approaches were taken to identify the sequence content of these contigs. First, the trailing contigs were mapped to our assembled P. recurvata reference plastome (see below) using Geneious Prime 2024.0.5 (<ref type="url">https://www.geneious.com</ref>); any contigs aligned to our reference plastome were removed from the assembly. Second, publicly available mitochondrial DNA sequences from species in Boraginaceae in the National Center for Biotechnology Information (NCBI) nucleotide database (accessed June 2024) were mapped to the trailing contigs to identify contigs containing mitochondrial DNA. Finally, the trailing contigs were mapped to the 12 large contigs to identify any repetitive genomic sequences in the contigs. While the contigs that mapped to the chloroplast were removed from the initial assembly (generating the final assembly), the contigs that mapped to mitochondrial sequences and those that mapped to the genome were retained due to ambiguity in their placements. Upon uploading this final assembly to NCBI, the NCBI contamination screen was used to detect and remove any remaining foreign DNA <ref type="bibr">(Astashyn et al., 2024)</ref>.</p><p>The chloroplast genome was constructed de novo using the PacBio HiFi reads. Potential chloroplast reads were identified by aligning the raw HiFi reads to the E. plantagineum (Boraginaceae) reference plastome (GenBank accession: OL335188.1; Carvalho Leonardo et al., 2022) using Minimap2 v2.28 with default settings <ref type="bibr">(Li, 2018)</ref>. 
Reads that did not align to the reference were removed using SAMtools v1.10 <ref type="bibr">(Li et al., 2009)</ref>. The remaining, mapped reads had an exceptionally high estimated coverage (159,683&#215;; see Results); accordingly, reads were randomly subsampled with SeqKit v2.8.1 <ref type="bibr">(Shen et al., 2016)</ref> to approximately 150&#215; coverage to mitigate assembly errors. These subsampled reads were then used to generate a circular P. recurvata chloroplast genome assembly using Hifiasm v0.16.1 with default settings <ref type="bibr">(Cheng et al., 2021)</ref>. The assembly was evaluated using Bandage v0.8.1 <ref type="bibr">(Wick et al., 2015)</ref> and Geneious Prime 2024.0.5.</p></div>
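<div xmlns="http://www.tei-c.org/ns/1.0"><p>The subsampling step above amounts to keeping a fixed proportion of the mapped reads. The sketch below shows how that proportion follows from the observed and target coverage, and how a seeded random subsample can be drawn. It is an illustration under stated assumptions, not the authors' command: the exact SeqKit invocation is not given in the text, and the function names and seed are hypothetical.</p>
<p><![CDATA[
```python
import random


def subsample_fraction(observed_coverage, target_coverage):
    """Proportion of reads to keep so that coverage drops to the target."""
    if observed_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    # Never request more than all of the reads.
    return min(1.0, target_coverage / observed_coverage)


def subsample_reads(reads, fraction, seed=11):
    """Keep each read independently with probability `fraction`.

    A fixed seed makes the subsample reproducible.
    """
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


# Coverage values from the text: about 159,683x observed, about 150x wanted.
fraction = subsample_fraction(159683, 150)
print(f"keep about {fraction:.6f} of the mapped reads")
```
]]></p>
<p>With the coverage values reported here, fewer than one read in a thousand is retained, which is why the subsample size is set from the coverage ratio rather than chosen by hand.</p></div>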
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Annotation</head><p>Repetitive elements were identified, classified, and soft-masked in the final assembly. First, a custom species-specific library of identified and classified repetitive element families was generated with RepeatModeler v2.0.3 using default parameters <ref type="bibr">(Flynn et al., 2020)</ref>, which identifies known and unknown repetitive elements de novo using RECON v1.08 <ref type="bibr">(Bao and Eddy, 2002)</ref> and RepeatScout v1.0.6 <ref type="bibr">(Price et al., 2005)</ref>. Distinct repeats were identified and soft-masked (using the -xsmall flag) in the assembly iteratively with RepeatMasker v4.1.3 (<ref type="url">https://www.repeatmasker.org/RepeatMasker/</ref>), starting with simple repeats (specifying the -noint flag), then repeats identified from the nrTEplants2020 curated repetitive sequence library (Contreras-Moreira et al., 2021), followed by known and unknown repeats from the P. recurvata repeat library. After repeats were masked in the final assembly, telomeric regions of the chromosomes were predicted using TeloExplorer from QuarTeT v1.1.6 with -m 85 <ref type="bibr">(Lin et al., 2023)</ref>.</p><p>Gene structural annotation of the P. recurvata genome assembly was performed on the soft-masked assembly using the BRAKER3 annotation pipeline <ref type="bibr">(Hoff et al., 2016,</ref><ref type="bibr"> 2019;</ref><ref type="bibr">Br&#367;na et al., 2021;</ref><ref type="bibr">Gabriel et al., 2021,</ref><ref type="bibr"> 2024)</ref>, which uses evidence from homologous proteins and transcripts to train AUGUSTUS <ref type="bibr">(Stanke et al., 2006)</ref> and GeneMark-EP+ <ref type="bibr">(Lomsadze et al., 2005;</ref><ref type="bibr">Br&#367;na et al., 2020)</ref> to produce high-confidence gene models. Whole transcripts from the Iso-Seq library were aligned to the P. 
recurvata reference assembly using Minimap2 v2.28 with -ax splice:hq specified <ref type="bibr">(Li, 2018)</ref>. A custom protein database containing 456,224 peptides was constructed from the proteomes of 10 lamiid species with well-annotated genomes (Table <ref type="table">S7</ref> in Appendix S1). We used the braker3_lr singularity container to accommodate the use of the P. recurvata Iso-Seq transcriptome as evidence (alongside the custom protein database). The resulting set of predicted proteins was assessed for completeness against the eudicots_odb10 (8 January 2024) benchmarking ortholog database with BUSCO v5.6.1 in euk_tran mode (specifying -m 'tran'; <ref type="bibr">Manni et al., 2021)</ref>. Summary statistics describing the protein annotation were generated using the agat_sp_statistics.pl script from AGAT v1.0.0 <ref type="bibr">(Dainat, 2022)</ref>. The final protein set was functionally annotated using InterProScan v5.66-98.0 <ref type="bibr">(Jones et al., 2014)</ref> with the --goterms flag to include gene ontology terms.</p><p>The chloroplast assembly was annotated using GeSeq v2.03 <ref type="bibr">(Tillich et al., 2017)</ref>. GeSeq utilizes HMMER v3.4 (<ref type="url">http://hmmer.org</ref>), Chlo&#235; <ref type="bibr">(Zhong, 2020)</ref>, and BLAST <ref type="bibr">(Altschul et al., 1990)</ref> to annotate the inverted repeat regions, protein-coding sequences, rRNAs, and tRNAs in plastid genomes. The map of the annotated P. recurvata chloroplast genome (Figure <ref type="figure">S7</ref> in Appendix S2) was generated using OGDRAW v1.3.1 <ref type="bibr">(Greiner et al., 2019)</ref>.</p></div>
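<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the telomere criterion concrete, the sketch below counts the longest run of exact tandem copies of the plant telomeric unit CCCTAAA (or its reverse complement, TTTAGGG) in a window at each end of a sequence. It is a simplified stand-in and not TeloExplorer's algorithm. Reading the -m 85 setting as a minimum repeat count is an assumption based on the threshold described in the Results, and the window size and function names are hypothetical.</p>
<p><![CDATA[
```python
import re

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def max_tandem_repeats(seq, unit="CCCTAAA"):
    """Length (in units) of the longest exact tandem run of `unit` in `seq`."""
    pattern = re.compile("(?:%s)+" % re.escape(unit))
    best = 0
    for match in pattern.finditer(seq.upper()):
        best = max(best, len(match.group(0)) // len(unit))
    return best


def telomere_ends(seq, unit="CCCTAAA", window=10000, min_repeats=85):
    """Repeat counts and pass/fail flags for the left and right sequence ends.

    Both the unit and its reverse complement are checked at each end.
    Assumption: min_repeats mirrors the threshold of 85 used in the text.
    """
    other = revcomp(unit)
    ends = (seq[:window], seq[-window:])
    counts = tuple(
        max(max_tandem_repeats(end, unit), max_tandem_repeats(end, other))
        for end in ends
    )
    flags = tuple(count >= min_repeats for count in counts)
    return counts, flags
```
]]></p>
<p>Because only exact tandem copies are counted, this sketch will undercount telomeres that contain sequencing errors or variant repeats; a dedicated tool is preferable for real assemblies.</p></div>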
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Synteny analysis</head><p>Regions of conserved gene synteny were identified among the assembled P. recurvata chromosomes and with the E. plantagineum chromosomes (GCA_003412495.2), the only other chromosome-level assembly in Boraginaceae that has been annotated <ref type="bibr">(Tang et al., 2020)</ref>. While the E. plantagineum annotation described by Tang et al. (<ref type="bibr">2020</ref>) is not publicly available, the raw transcripts used to annotate the reference genome are available on NCBI. Three of these available transcriptome libraries (SRR4034891, SRR4034890, and SRR7076848) were used to generate a draft gene structural annotation of the E. plantagineum assembly. Using the standard braker3 singularity container, the same annotation protocol described above was followed to incorporate short-read RNA-Seq transcriptome data as evidence of gene structure <ref type="bibr">(Gabriel et al., 2024)</ref>. Groups of orthologous genes from the resulting proteomes from each genome annotation were identified using OrthoFinder v2.5.4 with default settings <ref type="bibr">(Emms and Kelly, 2019)</ref>. The groups of shared orthologs identified by OrthoFinder were used to identify blocks of conserved gene order that cluster into regions of synteny using the GENESPACE v1.4 R package with default settings <ref type="bibr">(Lovell et al., 2022)</ref>. The syntenic depth ratio of each syntenic block (collinear sets of genes &gt;5) was calculated using MCscan (Python version), which depends on LAST <ref type="bibr">(Tang et al., 2008;</ref><ref type="bibr">Kie&#322;basa et al., 2011)</ref>. Additionally, evidence for ancient WGDs in the P. recurvata and E. plantagineum genomes was identified by analyzing the frequency distribution of synonymous divergence (Ks) between paralogs for each genome individually. Ks was calculated for each pair of duplicate genes using wgd v2 <ref type="bibr">(Chen et al., 2024</ref>). The Ks plot was filtered by only retaining syntenic duplicates, i.e., those duplicates that most represent duplication from a genome-wide multiplication rather than small-scale multiplications such as tandem duplicates, within wgd using i-ADHoRe v3.0 <ref type="bibr">(Proost et al., 2012)</ref>.</p>
<table><head>TABLE 1 Pectocarya recurvata assembly and annotation statistics.</head>
<row role="label"><cell>Metric</cell><cell>Assembly (initial)</cell><cell>Assembly (final)</cell><cell>12 Putative chromosomes</cell></row>
<row><cell>Coverage</cell><cell>264&#215;</cell><cell>-</cell><cell>-</cell></row>
<row><cell>GC content (%)</cell><cell>37.8%</cell><cell>38.4%</cell><cell>35.1%</cell></row>
<row role="label"><cell cols="4">Contiguity and completeness</cell></row>
<row><cell>Total length (bp)</cell><cell>393,226,675</cell><cell>215,950,562</cell><cell>158,303,007</cell></row>
<row><cell>Total length of contigs &gt; 1 Mbp</cell><cell>158,303,007</cell><cell>158,303,007</cell><cell>158,303,007</cell></row>
<row><cell>Total number of contigs</cell><cell>6461</cell><cell>1531</cell><cell>12</cell></row>
<row><cell>Number of contigs &gt; 1 Mbp</cell><cell>12</cell><cell>12</cell><cell>12</cell></row>
<row><cell>Length of longest contig (bp)</cell><cell>18,090,075</cell><cell>18,090,075</cell><cell>18,090,075</cell></row>
<row><cell>N50 (bp)</cell><cell>61,471</cell><cell>12,148,414</cell><cell>-</cell></row>
<row><cell>L50</cell><cell>451</cell><cell>8</cell><cell>-</cell></row>
<row><cell>N90 (bp)</cell><cell>27,701</cell><cell>36,944</cell><cell>-</cell></row>
<row><cell>L90</cell><cell>4656</cell><cell>561</cell><cell>-</cell></row>
<row><cell>Eudicots complete BUSCOs</cell><cell>2208 (94.9%)</cell><cell>2208 (94.9%)</cell><cell>2208 (94.9%)</cell></row>
<row><cell>Eudicots complete and single-copy BUSCOs</cell><cell>2051 (88.2%)</cell><cell>2052 (88.2%)</cell><cell>2054 (88.3%)</cell></row>
<row><cell>Eudicots duplicated BUSCOs</cell><cell>157 (6.7%)</cell><cell>156 (6.7%)</cell><cell>154 (6.6%)</cell></row>
<row><cell>Eudicots fragmented BUSCOs</cell><cell>25 (1.1%)</cell><cell>25 (1.1%)</cell><cell>25 (1.1%)</cell></row>
<row role="label"><cell cols="4">Annotation</cell></row>
<row><cell>Total repetitive DNA</cell><cell>277,362,488 (70.54%)</cell><cell>99,482,072 (46.07%)</cell><cell>41,832,491 (26.43%)</cell></row>
<row><cell>Transposable elements</cell><cell>224,364,318 (57.06%)</cell><cell>75,748,802 (35.08%)</cell><cell>38,159,365 (24.11%)</cell></row>
<row><cell>Simple repeats</cell><cell>3,299,783 (0.84%)</cell><cell>2,223,469 (1.03%)</cell><cell>2,137,440 (1.35%)</cell></row>
<row><cell>Total number of gene models</cell><cell>-</cell><cell>47,030</cell><cell>30,655</cell></row>
<row><cell>Total number of proteins</cell><cell>-</cell><cell>49,558</cell><cell>33,020</cell></row>
<row><cell>Total coding region (bp)</cell><cell>-</cell><cell>51,374,723</cell><cell>40,679,859</cell></row>
<row><cell>Average gene length (bp)</cell><cell>-</cell><cell>1987</cell><cell>2538</cell></row>
<row><cell>Eudicots complete BUSCOs</cell><cell>-</cell><cell>2234 (96.0%)</cell><cell>2232 (95.9%)</cell></row>
<row><cell>Eudicots complete and single-copy BUSCOs</cell><cell>-</cell><cell>1920 (82.5%)</cell><cell>1913 (82.2%)</cell></row>
<row><cell>Eudicots duplicated BUSCOs</cell><cell>-</cell><cell>314 (13.5%)</cell><cell>319 (13.7%)</cell></row>
<row><cell>Eudicots fragmented BUSCOs</cell><cell>-</cell><cell>10 (0.4%)</cell><cell>10 (0.4%)</cell></row>
<row role="label"><cell cols="4">De novo TE library</cell></row>
<row><cell>Known repeats</cell><cell>118</cell><cell>113</cell><cell>103</cell></row>
<row><cell>Unknown repeats</cell><cell>538</cell><cell>473</cell><cell>438</cell></row>
<row><cell>Total repeats</cell><cell>656</cell><cell>686</cell><cell>541</cell></row>
</table>
<p>Note: BUSCOs = benchmarking universal single-copy orthologs; TE = transposable element.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RESULTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Assembly</head><p>The k-mer analysis using the HiFi reads predicted a haploid genome size of 152,502,236 bp, genome-wide heterozygosity of 0.0475%, and a repeat content length of 34,877,869 bp (comprising 22.9% of the genome) for the sequenced P. recurvata individual. The Hifiasm assembly contained 6573 contigs covering 394,090,231 bp with no structural or small-scale errors detected by Inspector <ref type="bibr">(Chen et al., 2021)</ref>. After removing contigs smaller than 10 kbp, 6461 contigs covering 393,226,675 bp remained in the initial assembly (Table <ref type="table">1</ref>). Twelve of these contigs were greater than 1 Mbp (total length = 158,303,007 bp), corresponding to the expected number of haploid chromosomes (n = 12) in P. recurvata <ref type="bibr">(Veno, 1979)</ref>. Of the 6449 trailing contigs in the initial assembly, 4930 (76.4%) aligned to our P. recurvata plastome assembly (with 94.1% identical sequence) and were subsequently removed from the initial assembly, creating the final assembly. The Boraginaceae mitochondrial sequences from NCBI mapped to 678 unique contigs (10.5% of the trailing contigs), and 268 other contigs (4.4% of the trailing contigs) mapped to the 12 large contigs, all to the end of chromosome 12. No external contamination was detected in our final assembly through our BlobToolKit screen (Figure <ref type="figure">S9</ref> in Appendix S2). The NCBI contamination screen detected a small section (47 bp) of foreign DNA sequence at the end of one trailing contig, which was subsequently trimmed.</p><p>The final assembly contains 1531 contigs covering 215,950,562 bp and has an N50 of 12,148,414 bp and an L50 of 8, indicating a high level of contiguity (Table <ref type="table">1</ref>). 
The total length of the 12 leading contigs is 158,303,007 bp, which is similar to the estimated genome size (152.5 Mbp) from the k-mer analysis and accounts for 73.3% of the final assembly. A total of 94.9% of complete eudicot BUSCOs were found in these 12 contigs, and none of the BUSCOs that were duplicated, fragmented, or missing from either BUSCO dataset were identified in the trailing contigs of the assembly (Table <ref type="table">1</ref>). Hereafter, we refer to these 12 leading contigs as the putative P. recurvata chromosomes, numbered 1-12 by length (Figure <ref type="figure">2A</ref>, Table <ref type="table">S2</ref> in Appendix S1).</p><p>Telomeric regions consisting of more than 85 tandem repeats of CCCTAAA, the typical plant telomeric repeat sequence <ref type="bibr">(Peska and Garcia, 2020)</ref>, were identified at both ends of the putative chromosomes 1-11, while 64 repeats were identified on one end of chromosome 12 and none were found on the other end. No other telomeric regions were identified in the assembly.</p><p>The chromosomes (158.3 Mbp) consist of 26.43% repetitive content identified by RepeatMasker (<ref type="url">https://www.repeatmasker.org/RepeatMasker/</ref>), where 24.11% is attributed to transposable elements, 1.66% to tandem repeats, and 0.17% to small RNAs. The most common transposable elements are long terminal repeats (LTRs), which make up 6.29% of the chromosomes, while long interspersed nuclear elements (LINEs) make up 0.81%, short interspersed nuclear elements (SINEs) make up 0.22%, and 15.38% are unclassified. DNA transposons comprise 1.40% of the chromosomes (Table <ref type="table">S5</ref> in Appendix S1). Notably, the repetitive content of the P. recurvata chromosomes (26.43%) is much lower than that reported in either of the described genome assemblies from Boraginaceae: the L. erythrorhizon assembly is 366.7 Mbp, of which 51.78% is repetitive content, and similarly the E. 
plantagineum assembly is 348.9 Mbp and 43.30% repetitive content (Table <ref type="table">S6</ref> in Appendix S1) <ref type="bibr">(Auber et al., 2020;</ref><ref type="bibr">Tang et al., 2020)</ref>.</p></div>
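<div xmlns="http://www.tei-c.org/ns/1.0"><p>The contiguity statistics reported above (N50 and L50) can be illustrated with a minimal sketch. The function name and the toy contig lengths below are hypothetical and are not taken from the assembly; only the two totals used for the leading-contig fraction (158,303,007 bp and 215,950,562 bp) come from the text. N50 is taken here as the length of the contig at which the running sum of lengths, sorted from longest to shortest, first reaches half of the assembly length, and L50 as the number of contigs needed to reach that point.</p>

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that carries the running sum to half the
        # assembly length defines N50; its rank defines L50.
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy lengths, not the real contigs.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50, toy_l50 = n50_l50(toy_lengths)

# Share of the final assembly held by the 12 leading contigs,
# using the two totals reported in the text.
leading_bp = 158_303_007
final_assembly_bp = 215_950_562
leading_percent = round(100 * leading_bp / final_assembly_bp, 1)

print(toy_n50, toy_l50, leading_percent)
```

<p>Applied to the real contig lengths, the same calculation would be expected to give the values in Table 1, although the assembler or statistics tool used by the authors may handle ties or filtering differently.</p></div>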
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Genome size estimation: Flow cytometry, k-mer analysis, and inference of ploidy level</head><p>Our genome size measurements of six P. recurvata individuals by flow cytometry revealed putative cytotype variation in our sampled population. Two of the individuals had an estimated genome size of 1C = 0.201 &#177; 0.008 pg (196.9 &#177; 7.4 Mbp; Table <ref type="table">S9</ref> in Appendix S1), which is somewhat larger than (but generally consistent with) our k-mer-based genome size estimate (152.5 Mbp) and the length of the 12 chromosomal contigs in our assembly (158.3 Mbp). However, the four other samples analyzed had a mean genome size of 1C = 0.395 &#177; 0.001 pg (385.8 &#177; 1.3 Mbp; Table <ref type="table">S9</ref> in Appendix S1), which is approximately twice the smaller estimate. We ran SmudgePlot <ref type="bibr">(Ranallo-Benavidez et al., 2020)</ref> to detect ploidy level in our sequenced genome from relative k-mer coverages, and this indicated that the individual sequenced for the genome assembly was likely tetraploid (2n = 48) (Figures <ref type="figure">S2</ref> and <ref type="figure">S8</ref> in Appendix S2). Because the k-mer analysis also estimated a very low level of heterozygosity, consistent with high levels of self-fertilization, this raises the possibility that our assembly represents a haploid assembly of P. recurvata that collapsed two, putatively highly similar, subgenomes of a recent autotetraploid.</p></div>
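<div xmlns="http://www.tei-c.org/ns/1.0"><p>The relationship between the flow cytometry values in picograms and in Mbp, and the roughly twofold difference between the two groups of plants, can be checked with a short sketch. The conversion factor of 978 Mbp per pg is the commonly used value and is an assumption here, since this section does not state which factor was applied; the function and variable names are hypothetical.</p>

```python
# Assumption: 1 pg of DNA corresponds to 978 Mbp (widely used factor;
# not stated in this section of the paper).
MBP_PER_PG = 978.0


def pg_to_mbp(picograms):
    """Convert a 1C DNA content in pg to Mbp."""
    return picograms * MBP_PER_PG


# 1C values reported in the text.
small_1c_pg, small_1c_mbp = 0.201, 196.9
large_1c_pg, large_1c_mbp = 0.395, 385.8

# Ratio of the larger to the smaller cytotype estimate.
cytotype_ratio = large_1c_mbp / small_1c_mbp

print(round(pg_to_mbp(small_1c_pg), 1), round(pg_to_mbp(large_1c_pg), 1))
print(round(cytotype_ratio, 2))
```

<p>Converting the rounded pg values gives results within about 0.5 Mbp of the reported Mbp values, which is presumably a rounding effect; the ratio close to 2 is what motivates the interpretation of the larger measurements as tetraploid.</p></div>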
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Annotation</head><p>The structural gene annotation of the P. recurvata reference chromosomes contains 30,655 predicted gene models that code for 33,020 proteins. On average, the predicted genes are 2538 bp long and contain five exons. In total, the predicted genes encompass 77,831,286 bp, of which 40,679,859 bp is attributed to coding sequence. The comparison of the P. recurvata protein annotation against the eudicot_odb10 BUSCO dataset found 95.5% complete BUSCOs (n = 2326), of which 1913 (85.7%) are single copy (Table <ref type="table">1</ref>). Of the 33,020 predicted proteins, 93.8% (30,979) were functionally annotated with InterProScan <ref type="bibr">(Jones et al., 2014)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Gene synteny</head><p>Self-self comparison of gene order and content in the P. recurvata chromosomes revealed several duplicated collinear gene segments (Figure <ref type="figure">2B</ref>). The Ks between paralogs in the P. recurvata genome revealed a distinctive peak in the number of syntenic paralogs at Ks = 0.4-0.5, reflective of an ancient WGD in the evolutionary history of P. recurvata (Figure <ref type="figure">2C</ref>). There is also a distinctive peak at Ks = 0.4-0.5 in the Ks distribution for E. plantagineum, as well as a distinctive peak in syntenic paralogs at Ks = 0.1, reflective of an additional WGD event in the evolutionary history of E. plantagineum (Figure <ref type="figure">S3</ref> in Appendix S2).</p><p>By comparing the gene order of the P. recurvata and E. plantagineum genomes, we identified a 2:3 syntenic relationship between the genomes of these species (Figure <ref type="figure">3</ref>, Figure <ref type="figure">S6</ref> in Appendix S2). Together, the P. recurvata and E. plantagineum reference chromosomes have 70,183 annotated genes. Nearly all of these genes were placed in 19,150 orthogroups (89.0% of all genes), of which 15,906 orthogroups were shared between the two species. We identified 6339 single-copy orthologs (accounting for 40% of the P. recurvata genome). The shared orthogroups form 1252 syntenic blocks (collinear sets of genes &gt;5) that cluster into 690 syntenic regions (Figure <ref type="figure">3</ref>); nearly half (47%) of the syntenic blocks in the P. recurvata genome are found in duplicate in the E. plantagineum genome. In the E. plantagineum genome, 33% of syntenic blocks are found in duplicate in the P. recurvata genome and 28% are found in triplicate (Figures <ref type="figure">S5</ref> and <ref type="figure">S6</ref> in Appendix S2). The excess number of duplicate genes shared in both genomes suggests that these species share a WGD in their evolutionary history. The excess of triplicate genes in the E. plantagineum genome indicates that an additional, independent WGD has occurred in E. plantagineum.</p>
<p>Figure 2. (A) [...] <ref type="bibr">(Krzywinski et al., 2009)</ref>. (B) A dot plot of self-self synteny in the P. recurvata chromosomes, colored according to the synonymous divergence (Ks) between each pair of duplicated genes. (C) A frequency distribution of the Ks between paralogs in the P. recurvata chromosomes. The gray background distribution includes every paralog, while the green distribution only includes syntenic paralogs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Chloroplast assembly and annotation</head><p>Across 1,869,123 reads (mean length = 12,814 bp), 23,952,585,969 bp successfully mapped to the E. plantagineum reference plastome (149,776 bp) with an estimated 159,683&#215; coverage. The initial Hifiasm assembly from these reads, subsampled to 150&#215; per-base coverage, yielded three contigs totaling 187,628 bp, consisting of one 150,172 bp circular contig and two smaller linear contigs (24,314 bp and 13,142 bp). To investigate these linear contigs, both were aligned to the large circular contig using Geneious Prime 2024.0.5. The linear contigs each share 100% sequence identity with regions on the circular contig and were subsequently removed from the assembly. Thus, the final P. recurvata chloroplast genome (hereafter "plastome") assembly solely consists of the large circular contig from the initial assembly.</p><p>The P. recurvata plastome has 37.4% GC content and a quadripartite structure typical of chloroplast genomes, consisting of a small single-copy region (17,143 bp; 31.1% GC), two inverted repeats (IRa: 25,446 bp; IRb: 25,467 bp; 43.1% GC), and a large single-copy region (82,096 bp; 35.3% GC). GeSeq annotation of this circular contig predicted 130 unique genes in the plastome including eight rRNAs, 37 tRNAs, and 85 protein-coding genes (Figure <ref type="figure">S7</ref> in Appendix S2). The length and gene content of the P. recurvata plastid genome assembly are comparable to that of the E. plantagineum plastome (Table <ref type="table">S10</ref> in Appendix S1) <ref type="bibr">(Carvalho Leonardo et al., 2022)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>DISCUSSION</head><p>We assembled and annotated the first chromosome-level reference genome of P. recurvata. The final genome assembly (216.0 Mbp) is highly contiguous (N50 = 12.1 Mbp) and complete (94.9% eudicot BUSCOs identified). This completeness is reflected in the 12 reference chromosomes, which make up 158.3 Mbp of the assembly. These chromosomes have 23 out of 24 putative telomeres and contain 30,655 predicted genes that code for 33,020 proteins that comprise highly complete structural and functional annotations (95.9% eudicot BUSCOs were identified; 93.8% were assigned functions by InterProScan). Many large syntenic regions were identified between P. recurvata and E. plantagineum, further validating the quality of our gene annotation.</p><p>The Ks of putative paralogs in the P. recurvata and E. plantagineum genomes suggests that a WGD event occurred in recent evolutionary history at Ks &#8776; 0.4-0.5 (Figure <ref type="figure">2C</ref>). Our finding is congruent with a previously described WGD in the core Boraginales (from analyses of the E. plantagineum and Cordia subcordata Lam. genomes) at Ks &#8776; 0.417, which putatively occurred before the divergence of the Boraginoideae and Cynoglossoideae subfamilies in Boraginaceae <ref type="bibr">(Tang et al., 2020;</ref><ref type="bibr">Chen et al., 2023)</ref>. This WGD is likely the source of the large number of genes found reciprocally in duplicate between the P. recurvata and E. plantagineum genomes. While <ref type="bibr">Ren et al. (2018)</ref> and <ref type="bibr">Tang et al. (2020)</ref> also found evidence of another, older WGD in the core Boraginales at Ks &#8776; 0.939, there is no distinctive peak at Ks = 0.9 in our analyses. Interestingly, some recent analyses using a combination of available genomes and transcriptomes do not identify the more recent WGD at Ks &#8776; 0.417 that is evidenced by the P. 
recurvata genome and other analyses <ref type="bibr">(Ren et al., 2018;</ref><ref type="bibr">Auber et al., 2020;</ref><ref type="bibr">Tang et al., 2020;</ref><ref type="bibr">Zhang et al., 2020)</ref>. We also find evidence that E. plantagineum has undergone a more recent WGD that is independent of P. recurvata (Figures <ref type="figure">S5</ref> and <ref type="figure">S6</ref> in Appendix S2). Whole genome duplication has played an instrumental role in the evolution of the angiosperms; on average, every angiosperm has experienced 3.5 rounds of WGD (and subsequent rediploidization) in its evolutionary history <ref type="bibr">(One Thousand Plant Transcriptomes Initiative, 2019;</ref><ref type="bibr">McKibben et al., 2024)</ref>. It remains a prominent goal in plant biology to determine the phylogenetic placement of WGDs and their role in plant evolution <ref type="bibr">(Van de Peer et al., 2017;</ref><ref type="bibr">Barker et al., 2024)</ref>. Additional genomic resources in Boraginales will be helpful for confidently placing these disputed WGDs and understanding their impact on the evolution of this clade.</p><p>In estimating the genome size and sequencing the P. recurvata genome, we discovered putative cytotype variation in P. recurvata that we infer is the result of very recent autopolyploidy. Our genome size estimate from flow cytometry was 1C = 196.9 Mbp for two individuals, which is larger than, but consistent with, the size of our highly complete assembly of n = 12 chromosomes (158.3 Mbp), yet we also observed 1C = 385.8 Mbp for four different individuals from the same population. SmudgePlot analysis of our genomic data indicated that the sequenced individual was likely a very recent tetraploid with very low divergence among duplicated k-mers. If this is true, we infer that the assembly collapsed the two highly similar subgenomes, resulting in a reference assembly similar in size to the observed diploids. 
Prior karyological work by <ref type="bibr">Veno (1979)</ref> observed only 2n = 24 in individuals of P. recurvata from five populations across Arizona and Southern California (also see <ref type="bibr">Ward, 1983)</ref>. Species in the genus Pectocarya follow a polyploid series of 2n = 24 (e.g., P. recurvata, P. heterocarpa I.M. Johnst.), 2n = 48 (e.g., P. platycarpa Munz &amp; I.M. Johnst., P. anisocarpa Veno), and 2n = 72 (P. linearis (Ruiz &amp; Pav.) DC.), suggesting that recent WGD has played a role in the evolutionary history of this genus <ref type="bibr">(Veno, 1979)</ref>. Autopolyploidy is common among plants, yet autopolyploid individuals and species are often undistinguished because they appear morphologically very similar to their diploid progenitors <ref type="bibr">(Soltis and Soltis, 2000;</ref><ref type="bibr">Soltis et al., 2007;</ref><ref type="bibr">Barker et al., 2016;</ref><ref type="bibr">Lv et al., 2024)</ref>. Moreover, the short-term ecological consequences of autopolyploidy remain elusive <ref type="bibr">(Stebbins, 1940;</ref><ref type="bibr">Soltis et al., 2014;</ref><ref type="bibr">Assour et al., 2024)</ref>. Quantifying the geographic extent of cytotype variation in P. recurvata would lend insight into the distribution of genetic variation in this species and also provide the opportunity to explore associations between autopolyploidy and relevant ecological factors <ref type="bibr">(Servick et al., 2015;</ref><ref type="bibr">Visger et al., 2016;</ref><ref type="bibr">Barker et al., 2024)</ref>.</p><p>In addition to our discovery of putative ploidy variation in P. recurvata, we uncovered one of the smallest recorded genome sizes in Boraginaceae to date. 
There are currently 57 primary genome size estimates for taxa in Boraginaceae recorded in the Kew Plant DNA C-values Database (Release 7.1; <ref type="url">https://cvalues.science.kew.org/</ref> [accessed September 2024]), ranging from 1C = 313.0 Mbp (Echium bonnetii Coincy) to 1C = 10,878.0 Mbp (Alkanna leiocarpa Rech.f.) (mean 1C = 1599.07 Mbp). A more recent comprehensive survey of Boraginaceae genome size estimates from species occurring in the Czech Republic <ref type="bibr">(Kobrlov&#225; and Hrone&#353;, 2019)</ref> recorded genome sizes that range from approximately 1C = 270 Mbp for Myosotis sylvatica Ehrh. to 1C = 16,000 Mbp for Lycopsis arvensis L. At 1C = 196.9 Mbp, our flow cytometry measurement is smaller than all current observations in the family. Genome size can vary greatly between related species, in part due to WGD and subsequent re-diploidization <ref type="bibr">(Li et al., 2021)</ref> as well as transposable element activity <ref type="bibr">(Vitte and Panaud, 2005;</ref><ref type="bibr">El Baidouri and Panaud, 2013)</ref>. In Boraginaceae, taxa in the Boraginoideae subfamily are more often diploid than taxa in Cynoglossoideae, yet have larger chromosomes and genome sizes, suggesting that transposable elements might have played an important role in generating the 35-fold variation in genome sizes across clades within this group <ref type="bibr">(Kobrlov&#225; and Hrone&#353;, 2019)</ref>. Notably, we found that the repetitive content of the P. recurvata chromosomes (26.43%) is much lower than that reported in either of the described genome assemblies from Boraginaceae, both members of the Boraginoideae subfamily: the L. erythrorhizon assembly is 366.7 Mbp, of which 51.78% is repetitive content, and similarly the E. 
plantagineum assembly is 348.9 Mbp, with 43.30% repetitive content (Table <ref type="table">S6</ref> in Appendix S1) <ref type="bibr">(Auber et al., 2020;</ref><ref type="bibr">Tang et al., 2020)</ref>.</p><p>The adaptive significance of plants' genome sizes remains under investigation <ref type="bibr">(Gregory, 2001;</ref><ref type="bibr">Knight and Beaulieu, 2008;</ref><ref type="bibr">Vesel&#253; et al., 2012;</ref><ref type="bibr">Bure&#353; et al., 2024)</ref>. The large genome constraint hypothesis asserts that larger genome sizes are maladaptive due to the physiological consequences associated with larger cell sizes <ref type="bibr">(Knight et al., 2005;</ref><ref type="bibr">Vesel&#253; et al., 2020)</ref>; for example, larger stomatal guard cells open and close more slowly, resulting in lower water use efficiency <ref type="bibr">(Drake et al., 2013)</ref>. Such small changes in physiology can compound and decrease fitness in different climates <ref type="bibr">(Bure&#353; et al., 2024)</ref>. Herbaceous angiosperms with larger genomes are associated with heightened extinction risk, regardless of climate <ref type="bibr">(Soto Gomez et al., 2024)</ref>. Moreover, smaller genome sizes are frequently associated with faster rates of cell division that lead to faster development and flowering times <ref type="bibr">(Grime and Mowforth, 1982;</ref><ref type="bibr">Bilinski et al., 2018;</ref><ref type="bibr">Cacho et al., 2021;</ref><ref type="bibr">Cang et al., 2024)</ref>. Desert annual plants, such as P. recurvata, are under strong selective pressure to rapidly grow and reproduce after germinating to take advantage of ephemeral moisture following precipitation events <ref type="bibr">(Huxman et al., 2008;</ref><ref type="bibr">Barron-Gafford et al., 2013)</ref>. Past selection has favored a P. 
recurvata phenotype that is highly water use efficient relative to other members of the Sonoran Desert winter annual community <ref type="bibr">(Angert et al., 2007;</ref><ref type="bibr">Huxman et al., 2008)</ref>. Our finding of a small genome size in this species raises the question of whether selection might also favor a faster development time through genome size reduction.</p><p>While the Sonoran Desert winter annual plants are a well-developed study system in ecology and evolutionary biology, only a handful of reference genomes have been assembled for these plants. Of the 51 most common Sonoran Desert winter annuals found at Tumamoc Hill (Tucson, Arizona, USA; see <ref type="bibr">Ge et al., 2019)</ref>, only Eschscholzia californica Cham. <ref type="bibr">(Hori et al., 2018)</ref>, Erodium texanum A. Gray (NCBI accession: GCA_036897725.1), Sisymbrium irio L. <ref type="bibr">(Haudry et al., 2013)</ref>, and now P. recurvata, have reference genomes. Much is known about how species coexistence in this community is shaped by variation in species' positions along a physiological trade-off between water use efficiency and relative growth rate <ref type="bibr">(Chesson, 2000;</ref><ref type="bibr">Angert et al., 2009)</ref>; however, little is known about the genetic architecture underlying these traits and how genetic differences may combine to manifest these ecological dynamics <ref type="bibr">(Kimball et al., 2013;</ref><ref type="bibr">Angert et al., 2014)</ref>. In the context of anthropogenic climate change, it is critical to understand the genetic basis of relevant ecological traits being selected on by climate change in order to assess species' extinction risk and downstream ecological consequences <ref type="bibr">(Scheffers et al., 2016;</ref><ref type="bibr">Whiting et al., 2024)</ref>. 
The Sonoran Desert winter annuals have experienced overall declines in abundance due to recent anthropogenic change, and the community is simultaneously shifting to favor species with higher water use efficiency <ref type="bibr">(Kimball et al., 2010;</ref><ref type="bibr">Huxman et al., 2013)</ref>. The addition of a P. recurvata genome will facilitate investigations into the genetic mechanisms driving responses to climate in these plants.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusions</head><p>Pectocarya recurvata is an important model system for a suite of ecological and evolutionary biology studies. Our generation of a new chromosome-level reference genome assembly and annotation for P. recurvata presents the possibility of new lines of inquiry, including exploring the effect of polyploidy and genome size variation on ecological interactions and climate adaptation in this system, as well as elucidating the genetic basis of the physiological trade-off between water use efficiency and relative growth rate exhibited by the Sonoran Desert winter annuals. Establishing such mechanistic links between genetic changes and ecology is not only a key goal in evolutionary ecology but an urgent task to evaluate responses to global climate change.</p></div>
		</body>
		</text>
</TEI>
