<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Quantitative trait loci concentrate in specific regions of the Mexican cavefish genome and reveal key candidate genes for cave-associated evolution</title></titleStmt>
			<publicationStmt>
				<publisher>Oxford Academic</publisher>
				<date>07/30/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10535044</idno>
					<idno type="doi">10.1093/jhered/esae040</idno>
					<title level='j'>Journal of Heredity</title>
<idno>0022-1503</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Jonathan Wiese</author><author>Emilie Richards</author><author>Johanna E Kowalko</author><author>Suzanne E McGaugh</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>A major goal of modern biology is connecting phenotype with its underlying genetic basis. The Mexican cavefish (Astyanax mexicanus), a characin fish species comprised of a surface ecotype and a cave-derived ecotype, is well suited as a model to study the genetic mechanisms underlying adaptation to extreme environments. Here we map 206 previously published quantitative trait loci (QTL) for cave-derived traits in A. mexicanus to the newest version of the surface fish genome assembly, AstMex3. These analyses revealed that QTL cluster in the genome more than expected by chance, and this clustering is not explained by the distribution of genes in the genome. To investigate whether certain characteristics of the genome facilitate phenotypic evolution, we tested whether genomic characteristics associated with increased opportunities for mutation, such as highly mutagenic CpG sites, are reliable predictors of the sites of trait evolution but did not find any significant trends. Finally, we combined the QTL map with previously collected expression and selection data to identify 36 candidate genes that may underlie the repeated evolution of cave phenotypes, including rgrb, which is predicted to be involved in phototransduction. We found this gene has disrupted exons in all non-hybrid cave populations but intact reading frames in surface fish. Overall, our results suggest specific regions of the genome may play significant roles in driving adaptation to the cave environment in Astyanax mexicanus and demonstrate how this compiled dataset can facilitate our understanding of the genetic basis of repeated evolution in the Mexican cavefish.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>A major goal of modern evolutionary biology is connecting phenotypic evolution with its underlying genetic basis. Understanding the genetic basis of phenotypes can help reveal whether shared or different genetic mechanisms contribute to repeated trait evolution <ref type="bibr">(Arendt and Reznick, 2008;</ref><ref type="bibr">Manceau, et al., 2010;</ref><ref type="bibr">Martin and Orgogozo, 2013;</ref><ref type="bibr">Waters and McCulloch, 2021)</ref> and how often repeated evolution occurs through the same or divergent genetic variants <ref type="bibr">(Courtier-Orgogozo, et al., 2020;</ref><ref type="bibr">Lee and Coop, 2017;</ref><ref type="bibr">Lee and Coop, 2019)</ref>.</p><p>Further, while studies have revealed that loci contributing to trait evolution can cluster in specific genomic regions, the extent to which specific regions in the genome are commonly used in trait evolution and what features of these regions make them more likely to harbor genetic variants contributing to phenotypic evolution remains understudied <ref type="bibr">(Storz, 2016;</ref><ref type="bibr">Woodhouse and Hufford, 2019;</ref><ref type="bibr">Wortel, et al., 2023;</ref><ref type="bibr">Xie, et al., 2019)</ref>.</p><p>Linking genotypes to specific phenotypes remains the first step in any query of the repeatability of molecular evolution, and quantitative trait locus (QTL) mapping is a powerful technique for discovering statistical associations between genotypes and phenotypes that has been utilized since the 1980's <ref type="bibr">(Paterson, et al., 1988)</ref> for a wide variety of organisms and traits <ref type="bibr">(Nuzhdin, Khazaeli and Curtsinger, 2005;</ref><ref type="bibr">Shaw, Parsons and Lesnick, 2007;</ref><ref type="bibr">Yano, et al., 1997)</ref>. QTL mapping is often the first step to understanding proximate mechanisms for how traits evolve, however, any single QTL study will underestimate the number of loci that influence a given trait and often overestimate their effect sizes <ref type="bibr">(Beavis, 1994;</ref><ref type="bibr">Kearsey and Farquhar, 1998;</ref><ref type="bibr">Xu, 2003)</ref>. Combining the results of several QTL studies for the same trait can identify loci that were missed by a single study and give a more comprehensive estimate of the number of loci underlying the trait <ref type="bibr">(Peichel and Marques, 2017)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A c c e p t e d M a n u s c r i p t</head><p>The Mexican cavefish, Astyanax mexicanus, is an ideal model system to investigate questions regarding the genetic basis of repeated adaptation to novel environments. Astyanax mexicanus is a species of freshwater characin fish that exists in two distinct ecotypes: a surface form found in streams across southern Texas and northeastern Mexico, and a cavedwelling form comprised of populations in at least 35 different caves <ref type="bibr">(Elliott, 2018;</ref><ref type="bibr">Espinasa, et al., 2020;</ref><ref type="bibr">Miranda-Gamboa, et al., 2023)</ref>. Phylogeographic analyses suggest that cave populations arose from at least two distinct colonization events from two distinct surface ancestral lineages within the last 200,000 years, and since the initial colonization most cave populations have evolved relatively independently from one another, although there is evidence of gene flow between some cave populations <ref type="bibr">(Bradic, et al., 2012;</ref><ref type="bibr">Coghill, et al., 2014;</ref><ref type="bibr">Herman, et al., 2018;</ref><ref type="bibr">Moran, et al., 2023)</ref>. To varying degrees, cave populations have convergently evolved several morphological traits, most notably degenerated eyes and reduced pigmentation <ref type="bibr">(Jeffery, 2001;</ref><ref type="bibr">Jeffery, 2020)</ref>, as well as behavioral characteristics such as decreased sleep, schooling, and stress behaviors <ref type="bibr">(Chin, et al., 2019;</ref><ref type="bibr">Duboue, Keene and Borowsky, 2011;</ref><ref type="bibr">Kowalko, et al., 2013)</ref>. The most frequently studied cave populations include the Pach&#243;n and Tinaja cave populations, originated from caves located in the Sierra de El Abra region of Mexico, and the Molino cave population, from the nearby Sierra de la Guatemala (Jeffery, 2020). Importantly, fish from cave-dwelling A. mexicanus populations can interbreed with fish from surface populations <ref type="bibr">(&#350;ado&#287;lu, 1957; &#350;ado&#287;lu, 1957)</ref>. The ability to cross cave and surface-dwelling fish to generate viable hybrid offspring allows researchers to utilize QTL mapping to identify statistical associations between genotypes and phenotypes, facilitating the study of the genetic basis of phenotypic evolution in <ref type="bibr">A. mexicanus (Borowsky and Wilkens, 2002;</ref><ref type="bibr">O'Quin and McGaugh, 2015;</ref><ref type="bibr">Wilkens, 1988)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A c c e p t e d M a n u s c r i p t</head><p>In the last two decades, 20 QTL studies have been conducted to elucidate the genetic bases of traits in A. mexicanus (Table <ref type="table">1</ref>). The first QTL study in A. mexicanus was published by <ref type="bibr">Borowsky and Wilkens (2002)</ref> who identified QTL for three quantitative traits: eye size, pigmentation, and condition factor (a measure of weight scaled by length). Intriguingly, this study identified two clusters of QTL, one consisting of overlapping QTL for condition factor and pigmentation and another consisting of overlapping QTL for condition factor and eye size. In their discussion, Borowsky and <ref type="bibr">Wilkens (2002)</ref> highlighted the overlapping QTL and raise two possible explanations: (1) that the same genes underlie both traits due to pleiotropic effects, or (2) that different closely linked genes underlie the traits. A series of QTL studies by Protas and colleagues <ref type="bibr">(Protas, et al., 2007;</ref><ref type="bibr">Protas, et al., 2008;</ref><ref type="bibr">Protas, et al., 2006)</ref> corroborated the findings of Borowsky and Wilkens and showed that many QTL for cavederived traits form clusters in the A. mexicanus genome. Since these initial studies, <ref type="bibr">(O'Quin and McGaugh, 2015)</ref> mapped all published QTL through 2014 to the first draft of the A. mexicanus Pach&#243;n cavefish genome <ref type="bibr">(McGaugh, et al., 2014)</ref>, and found 13 significant QTL clusters across eight linkage groups.</p><p>Since this 2015 analysis <ref type="bibr">(O'Quin and McGaugh, 2015)</ref>, the number of published QTL studies for A. mexicanus has nearly doubled, with dozens of new QTL for traits ranging from scleral ossification to heart regeneration (Table <ref type="table">1</ref>). Further, a new surface fish chromosome level genome assembly, AstMex3, was made available in July 2022. Mapping QTL to a more complete surface fish genome, instead of the cavefish genome, will yield more comprehensive results if important loci were missing from the fragmented cavefish assembly.</p><p>The first draft cavefish genome was a scaffold level assembly, comprised of 10,735 scaffolds with a contig N50 of 14.7 kb and a scaffold N50 of 1.775 Mb <ref type="bibr">(McGaugh, et al., 2014)</ref>. In contrast, the AstMex3 surface fish assembly is chromosome level, containing 25 A c c e p t e d M a n u s c r i p t chromosomes, 109 unplaced scaffolds, and greatly improved N50 values of 47.1 Mb for contigs and 51.9 Mb for scaffolds <ref type="bibr">(Warren, et al., 2024)</ref>. Thus, linkage between traits as well as additional candidate genes may be uncovered since the new genome is much more contiguous.</p><p>Here we present an updated analysis of the distribution of QTL across the A. mexicanus genome, incorporating most published QTL studies in this species to date. Genetic markers used in prior studies were re-mapped to the AstMex3 genome to create an integrated genomic map, which was utilized to anchor QTL intervals to the new genome. Putative QTL clusters were then identified. To understand if certain mutational biases predispose regions to contribute to phenotypic evolution, we tested several genomic variables for correlations with the observed QTL. Next, while acknowledging the bias of QTL to identify regions of moderate to large effect <ref type="bibr">(Rockman, 2012)</ref>, we analyzed the percent variance explained (PVE) values of cave-derived QTL for different phenotypic classes. Lastly, we demonstrate the utility of our compiled QTL analysis for the evolutionary genetics community by crossreferencing QTL for cave-derived phenotypes with other sources of genomic data to identify candidate genes for repeated trait evolution in cavefish. In sum, this study contributes to our understanding of adaptation to an extreme environment and provides a resource for expedited candidate gene discovery and understanding potential links between multiple phenotypes.</p><p>A c c e p t e d M a n u s c r i p t</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Literature review</head><p>Quantitative trait locus (QTL) mapping studies in Astyanax mexicanus were identified by querying the online databases PubMed and Web of Science with the following search terms: "astyanax" AND "qtl"; "cavefish" AND "qtl"; "astyanax" AND "quantitative trait loc*"; "cavefish" AND "quantitative trait loc*". The resulting publications were manually filtered to exclude studies that did not map de novo QTL, as well as studies for which genetic marker sequences were not available. Only studies published before 6/6/22 were included in the analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mapping QTL to the AstMex3 genome</head><p>To map previously published QTL to the most recent version of the A. mexicanus genome, AstMex3, all available marker sequences used in prior QTL studies were collected and aligned to the AstMex3 surface genome with BLAST+ version 2.8.1 <ref type="bibr">(Camacho, et al., 2009)</ref> (Table <ref type="table">S1</ref>, S2, S3). For each marker sequence, the best quality BLAST alignment was identified by filtering according to the following criteria: 1. Highest bitscore, 2. Smallest evalue, 3. Highest percent identity. We refer to this match as the "best hit." Markers with multiple BLAST hits that could not be ranked using these criteria were discarded from the analysis. Additionally, markers for which the best BLAST hit mapped to a chromosome discordant with other markers on its original linkage group were discarded. The best BLAST hits for each of the remaining markers were collated into an integrated genomic map, anchored to the AstMex3 surface genome (Table <ref type="table">S4</ref>). Three genes (igf1, shhb also known as twhh, and pax6) that were used as markers in previous QTL studies but did not have A c c e p t e d M a n u s c r i p t sequences included in our marker database were added manually to the integrated map using the known genomic coordinates given in the AstMex3 genome assembly.</p><p>The integrated genomic map was used to identify the location of each QTL in the AstMex3 surface genome assembly. We caution that this is a rough estimate of QTL location, as we were comparing across over 20 years of studies, many of which report different levels of information. Nevertheless, compiling QTL peaks, even if rough approximation across studies, may be useful for future candidate gene identification and to understand traits with potentially co-localizing genomic regions. When available, the genetic markers designating the 95% confidence interval for each QTL (as defined by the study in which the QTL was originally mapped) were used to designate the section of the genome in which loci underlying the trait of interest may be found (see Table <ref type="table">S5</ref> for available metadata for each QTL). If a confidence interval was not available, only the marker representing the QTL peak was used to estimate the location of the QTL in the genome. If the markers designating both the 95% confidence interval and the peak of a given QTL were absent from the integrated genomic map, then that QTL was discarded from the rest of the analysis. We manually assigned the phenotypic category of each QTL that was mapped to the AstMex3 genome into one of eight categories (Eye, Pigment, Skeletal, Non-Eye Sensory, Metabolic, Behavior, Cardiac, Other morphological (this category consisted of traits that did not fit clearly into one of the other seven categories, e.g. length); Tables <ref type="table">S5</ref>, <ref type="table">S6</ref>), and the QTL intervals and peaks mapped to each chromosome were visualized using the karyoploteR package <ref type="bibr">(v. 1.24.0;</ref><ref type="bibr">Gel and Serra, 2017)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A c c e p t e d M a n u s c r i p t</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Identifying QTL clusters</head><p>To analyze QTL distribution across the genome, QTL were filtered to include only the QTL that were mapped to intervals (not just peaks) in the integrated genomic map. Further, since QTL that were coarsely mapped to vast stretches of the genome may mask any potential trends in genomic QTL distribution, QTL with total interval lengths that were statistical outliers were excluded according to the outlier formula: Upper limit = Upper quartile + (1.5*IQR), where the upper quartile is the 75 th percentile of QTL lengths and the IQR is the interquartile range of QTL lengths (75 th percentile -25 th percentile). Next, the 25 chromosomes of the AstMex3 surface genome were divided into 118 non-overlapping 10 Mb windows using the tileGenome function from the GenomicRanges R package (v. 1.49.0; <ref type="bibr">Lawrence, et al., 2013)</ref>. A window size of 10 Mb was chosen because it approximates the typical length of a QTL interval in the filtered dataset (mean length = 11.4 Mb, median length = 9.64 Mb). Windows were "centered" by finding the largest whole number of 10 Mb windows that could fit in each chromosome, subtracting the total length of these windows from the total length of the chromosome to find the "remaining" chromosome length, and shifting the starting position of the first window to a value equal to one half of the remaining length.</p><p>The number of QTL overlapping each window was then identified using the bed_intersect tool in the R package valr (v. 0.6.6; <ref type="bibr">Riemondy, et al., 2017)</ref>. For the purposes of counting QTL within each window, all QTL for phenotypes relating to eye size (e.g., eye size, lens size, pupil area, etc.) were counted as one "distinct" QTL, as were all QTL for a melanophore phenotype (e.g., eye melanophore, dorsal melanophore, anal melanophore, etc.). The number A c c e p t e d M a n u s c r i p t of QTL overlapping each window after adjusting for these similar phenotypic traits is hereafter referred to as the number of "distinct QTL".</p><p>To test whether the number of observed distinct QTL in each window matches a distribution expected by chance, a Monte-Carlo Chi-square test was conducted with the xmonte function from the XNomial R package (v 1.0.4; <ref type="bibr">Engels, 2015)</ref>. Two expected distributions were tested: (1) a null expectation that QTL are distributed uniformly across the genome, and (2) the expectation that QTL are distributed across the genome proportional to protein-coding gene density. Following the tests, 10 Mb windows with high standardized residual values (&gt;3) against either the uniform distribution or the gene density distribution were grouped together and designated as putative QTL clusters (Table <ref type="table">S7</ref>) (as in Peichel and Marques, 2017). The observed distribution of distinct QTL across the A. mexicanus genome relative to the expected number of QTL under each of the two null expectations was visualized using the karyoploteR package (v. 1.24.0; <ref type="bibr">Gel and Serra, 2017)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Testing potential molecular factors underlying QTL distribution</head><p>Mutational biases or opportunities may predispose regions to contribute to phenotypic evolution <ref type="bibr">(Storz, et al., 2019;</ref><ref type="bibr">Wortel, et al., 2023;</ref><ref type="bibr">Xie, et al., 2019)</ref>. To investigate whether any molecular factors may explain the distribution of cave-associated QTL across the A. mexicanus genome, data was generated for each of the 118 10Mb windows of the AstMex3 surface genome assembly (Table <ref type="table">S8</ref>). For each factor listed below, we conducted a Wilcoxon rank-sum test in R to test for differences between the set of 10 Mb windows that were designated as putative QTL clusters and the set of 10 Mb windows that contain no mapped</p><p>A c c e p t e d M a n u s c r i p t The following variables were investigated:</p><p>&#61623; Repetitive DNA: Repetitive DNA was quantified for each window as the proportion of all bases in the window that were reported as an "N" base in the hard-masked version of the AstMex3 surface fish genome, which was downloaded from the Ensembl Rapid Release FTP site <ref type="bibr">(Cunningham, et al., 2022)</ref>. This method quantifies repetitive DNA in a broad sense, as it captures any sequence considered "repetitive" in the original masking procedures of the AstMex3 genome assembly.</p><p>&#61623; GC content: GC content was calculated for each window as the proportion of all bases in the window that are a "C" or "G" in the unmasked version of the AstMex3 surface fish genome.</p><p>&#61623; CpG sites: CpG sites were quantified for each window as the proportion of all dinucleotides in the window that are "CG" in the unmasked genome. Dinucleotide distributions were calculated using the oligonucleotideFrequency function from the Biostrings R package (v 2.66.0; <ref type="bibr">Pages, et al., 2016)</ref>.</p><p>&#61623; TGTGTG repeats: Similarly, TGTGTG repeats are linked to replication fragile sites <ref type="bibr">(Xie, et al., 2019)</ref> and were quantified for each window as the proportion of all oligonucleotides of length 6 that are "TGTGTG" or "CACACA" in the unmasked genome with the oligonucleotideFrequency function.</p><p>&#61623; Gene number: The total number of genes, both protein-coding and non-proteincoding, overlapping each window was obtained with the bed_intersect function in the A c c e p t e d M a n u s c r i p t valr package <ref type="bibr">(Riemondy, et al., 2017)</ref>, with the genome annotation file from AstMex3 (Ensembl rapid release, GTF format; <ref type="bibr">(Cunningham, et al., 2022)</ref>) providing the genomic coordinates of each gene. Gene counts were also obtained separately for genes of the following biotypes as reported in the GTF file: protein-coding, miRNA, snRNA, lncRNA.</p><p>&#61623; Gene length: The average and median protein-coding gene length in bp was calculated for each window using the start and end points given in the GTF file (i.e., the lengths included the entire length of the gene, not just the coding regions).</p><p>&#61623; Transcription factor gene number: Changes in the coding sequences of transcription factors (TF) may be involved in the evolution of gene regulatory networks, potentially facilitating phenotypic change <ref type="bibr">(Cheatle Jarvela and Hinman, 2015)</ref>. The list of all genes annotated as transcription factors in A. mexicanus was downloaded from the Animal TFDB 4.0 <ref type="bibr">(Shen, et al., 2023)</ref>. The bed_intersect function was then used to identify the number of TF genes that overlap with each window.</p><p>&#61623; Recombination rate: Linkage maps were generated for Pach&#243;n x Surface, Tinaja x Surface, and Molino x Surface F2 crosses as part of a separate ongoing study. For each linkage map, a recombination rate between every consecutive pair of markers was estimated by dividing the distance between the markers in centimorgans (cM) by the distance in Mb, giving a recombination rate in units of cM/Mb. These values were then used to find the average recombination rate across each 10 Mb window using a weighted average, where the weight applied to each individual rate was equal to the number of bp by which the interval overlapped the 10 Mb window.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A c c e p t e d M a n u s c r i p t</head><p>Analyzing effect sizes of cave-derived QTL</p><p>The distributions of reported PVE values for all QTL, as well as QTL in the Eye, Pigment, Skeletal, Non-Eye Sensory, Metabolic, and Behavior categories separately, were visualized as a histogram using ggplot2. These distributions were tested for skewness (a measure of the symmetry of a distribution) and kurtosis (a measure of whether the distribution is light-tailed or heavy-tailed) using the respective functions in the R package moments (v. 0.14.1; <ref type="bibr">Komsta and Novomestky, 2015)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Identifying candidate genes underlying repeated cave-derived trait evolution</head><p>Lastly, we demonstrate the utility of our compiled QTL analysis for identifying candidate genes for future functional analysis. By cross-referencing QTL with other sources of genomic data, we identified candidate genes for cave-derived phenotypes in cavefish. First, the set of all genes overlapping with a QTL was obtained using the valr package in R (v. 0.6.6; <ref type="bibr">Riemondy, et al., 2017)</ref>. Second, the set of all genes exhibiting evidence of selective sweeps in three independent cave populations (Pach&#243;n, Tinaja, and Molino) yet no evidence of selection in two surface populations (i.e., R&#237;o Choy, Rasc&#243;n) was defined as in <ref type="bibr">Moran, et al. (2023)</ref>. The Ensembl gene IDs published by <ref type="bibr">Moran, et al. (2023)</ref> were given for version 2 of the A. mexicanus genome. These IDs were converted to the version 3 equivalents using the AstMex3 homology table available on the Ensembl Rapid Release FTP site. Third, RNAseq data, originally collected by <ref type="bibr">(Mack, et al., 2021)</ref> from whole body samples of larval Pach&#243;n, Tinaja, and Molino cavefish as well as R&#237;o Choy surface fish, were reanalyzed. Raw reads were downloaded from the NCBI Sequence Read Archive (accession numbers given in Table S9 <ref type="bibr">(Leinonen, et al., 2010)</ref>), aligned to the AstMex3 surface fish genome using STAR (v. 2.7.1; <ref type="bibr">Dobin, et al., 2013)</ref>, and aligned reads were counted using HTSeq (v. 0.11.0; Anders,</p><p>A c c e p t e d M a n u s c r i p t  <ref type="table">S10</ref>, <ref type="table">S11</ref>, S12), and differentially expressed genes were identified using a cutoff of a Benjamini-Hochberg adjusted p-value &lt; 0.1. The set of genes that were differentially expressed in all three cave populations relative to the surface population, and for which the change in expression followed the same pattern in each population (i.e., upregulated in all three caves or downregulated in all three caves), were designated as genes with shared differential expression. Candidate genes for cave-derived phenotypes (Table <ref type="table">S13</ref>) were identified by finding the subset of genes that 1) Have undergone shared selective sweeps in three separate cave populations and not in surface populations as defined by Moran, et al.</p><p>(2023), 2) Have shared differential expression between three cave populations relative to a surface population, and 3) Are found within a QTL for cave-derived traits.</p><p>While the whole-body RNAseq reads from <ref type="bibr">(Mack, et al., 2021)</ref> provided a starting point to identify candidate genes, we were also interested in investigating gene expression specifically in eye tissue, as eye loss is a major evolutionary change that has occurred in cavefish. To this end, we accessed RNAseq reads that were originally collected from eye tissue of developing embryos at 54 hpf (Table <ref type="table">S9</ref>) <ref type="bibr">(Gore, et al., 2018)</ref>, and re-mapped these reads to the AstMex3 genome assembly as described above for the whole-body RNAseq experiment. Aligned reads were counted using HTSeq and analyzed for differential expression using DESeq2 as described above for the whole-body RNAseq experiment. From these data, genes were identified that were significantly differentially expressed in the developing eye tissue of Pach&#243;n cavefish relative to surface fish (Table <ref type="table">S14</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A c c e p t e d M a n u s c r i p t</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A map of cave-derived QTL across the Astyanax mexicanus genome</head><p>The first goal of this study was to provide a comprehensive map of previously published QTL for cave-derived traits in A. mexicanus to a new high-quality genome assembly, AstMex3, to facilitate analyses of trait evolution in this species. To this end, a total of 9023 genetic marker sequences from prior QTL studies (Tables 1, S1, S2) were collected and aligned to the AstMex3 surface fish genome using BLASTn (Table <ref type="table">S3</ref>). Two-hundred markers did not have any BLAST hits, 126 markers had a BLAST hit that mapped to a chromosome that was discordant with other markers on the original linkage group, and 96 markers did not have a best BLAST hit that could be unambiguously resolved. The final integrated genomic map consisted of the best BLAST hit for the remaining 8486 genetic markers (representing 94% of the collected markers) (Table <ref type="table">S4</ref>).</p><p>This genomic map was used to associate 206 previously published QTL for cave-derived traits to the AstMex3 surface genome (Tables <ref type="table">S4</ref>, <ref type="table">S5</ref>). For 168 of the QTL, genetic markers marking the ends of the LOD confidence interval were part of the integrated genomic map, and the genomic range spanning the beginning of the first marker to the end of the second marker was denoted as a QTL interval. For the remaining 38 QTL, only a genetic marker associated with the peak of the LOD curve was present in the integrated genomic map, and the genomic range of this individual marker was denoted as a QTL peak. Table <ref type="table">S6</ref> summarizes the phenotypic category of each QTL that was mapped to the AstMex3 genome in this analysis. The resulting QTL map shows a general tendency for QTL for cave-derived traits to overlap with each other across the A. mexicanus genome (Figure <ref type="figure">S1</ref>, Tables <ref type="table">S4</ref>, <ref type="table">S7</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A c c e p t e d M a n u s c r i p t</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>QTLs cluster in the Astyanax mexicanus genome</head><p>To test whether QTLs cluster in the A. mexicanus genome more than expected by chance, the observed QTL distribution across 10 Mb genomic windows was tested for fit against two expected distributions. First, we found that the observed QTL distribution does not fit the expected distribution under a null model which assumes a uniform distribution of QTL across the genome (chi-square = 192.9, p = 0.00002; Figure <ref type="figure">1</ref>). Second, the observed QTL distribution does not fit the expected distribution under a model in which QTL density is proportional to protein-coding gene density (chi-square = 216.1, p &lt; 0.00001; Figure <ref type="figure">1</ref>).</p><p>Together, these results support the hypothesis that QTL for cave-derived traits cluster in the genome. Six 10 Mb windows across four different chromosomes were identified as putative QTL clusters due to having significantly more QTL than expected under either distribution (Table <ref type="table">S7</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Distribution of putative QTL clusters is not explained by molecular factors</head><p>Increased mutation rates lead to increased opportunity for evolution <ref type="bibr">(Bromham, 2009;</ref><ref type="bibr">Kimura and Ohta, 1971)</ref>. To investigate whether the distribution of putative QTL clusters across the A. mexicanus genome are associated with known characteristics of the genome that are potentially associated with increased opportunity for mutation, 13 variables including recombination rate, gene length, gene number, transcription factor number, CpG sites, TGTGTG repeats, and repetitive DNA were tested for significant differences between putative QTL clusters (n=6) compared to 10 Mb windows that contain no known QTL (n=22; Table <ref type="table">S8</ref>). None of the variables tested were significantly different between the two groups</p><p>Downloaded from <ref type="url">https://academic.oup.com/jhered/advance-article/doi/10.1093/jhered/esae040/7724259</ref> by Applied Materials user on 22 August 2024 (Wilcoxon rank-sum test, p&gt;0.05 for all tests), suggesting that none of these factors are associated with cave-derived evolution at this coarse scale (Figure <ref type="figure">S2</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Distribution of effect sizes depends on type of trait</head><p>While acknowledging that we will miss many small and medium effect loci, the distribution of PVE values reported for all previously mapped cave-derived QTL in A. mexicanus was compiled to understand more about the distributions of effect size across different trait classifications (Figure <ref type="figure">2</ref>). The distribution is extremely positively skewed (Skewness coefficient = 2.04) and light-tailed (kurtosis coefficient = 8.02). This indicates that most alleles detected to be associated with cave-derived phenotypes explain a relatively small proportion of phenotypic variation, with just a few outlying mutations that contribute more substantially to cave-derived phenotypes.</p><p>This trend generally holds true when cave-derived QTL are divided into trait categories -all distributions are positively skewed and light-tailed, but there is variation in the degrees of skewness and kurtosis. The most highly positive skewed trait categories are Eye (skewness coefficient = 2.08) and Skeletal (skewness coefficient = 2.55), suggesting that a strong majority of mutations affecting these traits are of small effect with a few mutations having larger effects. The least strongly positively skewed trait categories are Non-Eye Sensory (skewness coefficient = 0.77) and Behavior (skewness coefficient = 0.73), suggesting that the distribution of effect sizes for these trait categories are somewhat larger than other traits.</p><p>Notably, traits in the Pigment category have some of the largest effect sizes across all traits Downloaded from <ref type="url">https://academic.oup.com/jhered/advance-article/doi/10.1093/jhered/esae040/7724259</ref> by Applied Materials user on 22 August 2024 A c c e p t e d M a n u s c r i p t (Figure <ref type="figure">2</ref>), and this may be because multiple studies mapped albinism which is a Mendelian trait.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A robust list of candidate genes underlying repeated cave-derived trait evolution</head><p>By utilizing multiple avenues of genomic data, the pool of candidate genes underlying traits of interest can be narrowed. We identified 36 candidate genes (Table <ref type="table">S13</ref>) underlying the repeated evolution of cave-derived traits in A. mexicanus by finding the subset of proteincoding genes that are present in at least one cave-derived QTL, have evidence of repeated selective sweeps across three populations of cavefish <ref type="bibr">(Moran, et al., 2023)</ref>, and are differentially expressed between cavefish and surface fish in all of the same three cave populations <ref type="bibr">(Mack, et al., 2021)</ref> (Tables <ref type="table">S10-S12</ref>). In addition, 14 of the 36 candidate genes (Table <ref type="table">S13</ref>) also exhibit down regulation in Pach&#243;n cavefish eyes relative to surface fish at 54hpf (Tables <ref type="table">S13</ref>, <ref type="table">S14</ref>). Notably, none of the 36 candidate genes were significantly upregulated in Pach&#243;n cavefish eyes at 54hpf relative to surface fish.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>The genetic architecture underlying adaptive evolution is a central topic in evolutionary biology. QTL studies can identify regions of the genome important for trait evolution and, thus, are a powerful method for investigating genetic mechanisms of adaptation. In this study, 206 quantitative trait loci (QTL) from 16 previously published studies <ref type="bibr">(Carlson, et al., 2018;</ref><ref type="bibr">Gross, Borowsky and Tabin, 2009;</ref><ref type="bibr">Gross, Krutzler and Carlson, 2014;</ref><ref type="bibr">Kowalko, et al., 2013;</ref><ref type="bibr">Kowalko, et al., 2013;</ref><ref type="bibr">Lyon, et al., 2017;</ref><ref type="bibr">O'Quin, et al., 2015;</ref><ref type="bibr">O'Quin, et al., 2013;</ref><ref type="bibr">Powers, Boggs and Gross, 2020;</ref><ref type="bibr">Protas, et al., 2007;</ref><ref type="bibr">Protas, et al., 2008;</ref><ref type="bibr">Protas, et al., 2006;</ref><ref type="bibr"/> A c c e p t e d M a n u s c r i p t <ref type="bibr">Stockdale, et al., 2018;</ref><ref type="bibr">Warren, et al., 2021;</ref><ref type="bibr">Yoshizawa, et al., 2015;</ref><ref type="bibr">Yoshizawa, et al., 2012)</ref> were mapped to the newly published A. mexicanus surface genome assembly, AstMex3. Collectively, these QTL represent a variety of morphological, physiological, and behavioral traits of interest that differentiate Mexican cavefish from their surface-dwelling counterparts. Our study organizes the disparate maps onto one high quality genome assembly, allowing for easy comparison and analysis of QTL features across studies. Further, this work is an improvement on a previous QTL review in A. mexicanus because it maps QTL to a more contiguous surface fish genome, capturing loci in QTL intervals that may have been fragmented or may not have been present in the cavefish genome <ref type="bibr">(O'Quin and McGaugh, 2015)</ref>. This integrated QTL map showed that the observed distribution of QTL for cave-derived traits across the A. mexicanus genome is non-uniform and is not proportional to proteincoding gene density. As a result of this analysis, six putative QTL clusters that contain more unique QTL than expected by chance were identified on four of the 25 chromosomes in the A. mexicanus genome. The most striking of these clusters occurs on Chromosome 1 from base pairs ~2Mb-42Mb. This 40 Mb region contains QTL for 10 distinct phenotypic traits including eye size, vibration attraction behavior, relative condition, tastebuds, melanophore, and several others, suggesting that Chromosome 1 is particularly enriched for cave-derived trait evolution in A. mexicanus. This result is very similar to the conclusion of a similar analysis in three-spined stickleback, which found that five out of 21 chromosomes contained more QTL than expected by chance <ref type="bibr">(Peichel and Marques, 2017)</ref>, suggesting that specific genomic regions drive adaptation in both systems. Clustering of QTL for seemingly distinct traits, such as what is observed on Chromosome 1, could be due to pleiotropic loci that influence multiple phenotypic traits of interest <ref type="bibr">(O'Quin and McGaugh, 2015;</ref><ref type="bibr">Peichel and Marques, 2017)</ref>. Pleiotropy could help explain adaptation to the cave environment that has independently occurred in multiple populations of this species, as selective pressure on one trait could drive coordinated change in other traits and bring about extensive phenotypic change <ref type="bibr">(Jeffery, 2005;</ref><ref type="bibr">Yamamoto, et al., 2009)</ref>. The putative QTL clusters identified in this study provide an ideal starting point to search for genes that may have pleiotropic effects <ref type="bibr">(Jeffery, 2010;</ref><ref type="bibr">Protas, et al., 2007;</ref><ref type="bibr">Yoshizawa, Kelly and Jeffery, 2013)</ref>. One such pleiotropic gene, oca2, has already been identified in the literature and is known to affect both albinism and sleep duration <ref type="bibr">(Klaassen, et al., 2018;</ref><ref type="bibr">O'Gorman, et al., 2021;</ref><ref type="bibr">Protas, et al., 2006)</ref>. Additionally, tightly linked genes could underlie the traits implicated in QTL clusters <ref type="bibr">(Borowsky, 2013)</ref>. Tight linkage of locally adapted alleles is predicted under theoretical scenarios of adaptation in the face of gene flow <ref type="bibr">(Yeaman, Aeschbacher and B&#252;rger, 2016;</ref><ref type="bibr">Yeaman and Whitlock, 2011)</ref> and appears to be a common theme as more evolutionary models are empirically interrogated <ref type="bibr">(Wilder, et al., 2020;</ref><ref type="bibr">Yeaman, 2022)</ref>. Indeed, loci associated with suites of traits in other systems have been found to be in tight physical linkage or genomic inversions <ref type="bibr">(Hess, et al., 2020;</ref><ref type="bibr">Noor, et al., 2001;</ref><ref type="bibr">Wellenreuther and Bernatchez, 2018;</ref><ref type="bibr">Wilder, et al., 2020)</ref> and future work will explore the potential for inversions to contribute to ecotype differences in this system. Notably, these two mechanisms for the clustering of phenotypes in the genome, pleiotropy and linkage of locally adapted alleles, are not mutually exclusive and can work synergistically <ref type="bibr">(Archambeault, et al., 2020;</ref><ref type="bibr">Ferris, et al., 2017)</ref> and could also work independently, but it is beyond the scope of this manuscript to distinguish if one or both of these mechanisms are impacting the statistical clustering of QTL.</p><p>A c c e p t e d M a n u s c r i p t Functional interrogation, for example with CRISPR-Cas9 gene editing <ref type="bibr">(Klaassen, et al., 2018)</ref>, of genes contained in QTL clusters also represents a promising next step toward understanding the extent pleiotropy and linkage may be involved in shaping traits in this system. For instance, if a CRISPR knockout of a single gene is found to affect multiple traits of interest, as was the case for oca2, then this experimental result would further support a role for pleiotropy <ref type="bibr">(O'Gorman, et al., 2021)</ref>. Mutating tightly linked genes within a QTL cluster and finding multiple genes that impact the traits of interest would support roles for linkage.</p><p>More broadly, the integrated QTL map is useful in analyzing how the distribution of cavederived QTL correlates with other factors on a genomic scale. Certain molecular attributes may influence which regions of the genome are more likely to mutate and contribute to phenotypic evolution, including recombination rate <ref type="bibr">(Halldorsson, et al., 2019;</ref><ref type="bibr">Lercher and Hurst, 2002)</ref> abundance of CpG sites <ref type="bibr">(Storz, et al., 2019)</ref>, TGTGTG repeats <ref type="bibr">(Xie, et al., 2019)</ref>, and gene length <ref type="bibr">(Mei, et al., 2018;</ref><ref type="bibr">Moran, et al., 2023;</ref><ref type="bibr">Woodhouse and Hufford, 2019)</ref>. If supported, these hypotheses suggest mutational biases predispose certain regions of the genome to phenotypic evolution. In our study, however, none of the 13 variables that were tested were significantly correlated with the distribution of QTL on a genomic scale.</p><p>While these variables exhibit no significant correlations to QTL clusters, the corresponding QTL intervals in the integrated map presented here are quite large and overlap several hundred genes. Thus, only a subsection of the QTL region may contribute in a meaningful way to the phenotype and the scale at which we examined the correlations may be too broad to accurately address correlations between genomic features and cave-derived evolution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A c c e p t e d M a n u s c r i p t</head><p>The collection of all previously published QTL into one dataset facilitates the analysis of effect sizes of cave-derived QTL across a range of traits. While most adaptative walks occur with predominantly small effect mutations, large-effect mutations may be utilized when responding to large environmental shifts <ref type="bibr">(Orr, 2006)</ref>, especially when, as in the cavefish system, migration with non-adapted populations occurs <ref type="bibr">(Yeaman and Whitlock, 2011)</ref>. To assess whether large-effect mutations may play an outsized role in the evolution of certain cave-derived phenotypes, the PVE reported for each QTL was recorded for all studies identified in the literature review. The QTL with the highest PVE value is a QTL for albinism, which is a Mendelian trait attributed to oca2 <ref type="bibr">(Protas, et al., 2006)</ref>, and pigmentation related traits seem to have several large-effect outliers compared to other trait classes. The phenomenon of a single gene explaining a large amount of variation in a trait, like oca2 does for albinism, is the exception, similar to the QTL review of stickleback fish (Peichel and Marques, 2017). Additionally, QTL mapping experiments tend to overestimate PVE <ref type="bibr">(Beavis, 1994)</ref> and be biased in detecting mostly larger-effect alleles <ref type="bibr">(Rockman, 2012)</ref>, therefore, some of the outlying QTL with high PVE values may not have a true effect size as great as their position in the distribution suggests. A final caveat is that the bias in overestimating PVE values increases with decreasing sample size, so comparing PVE effects of studies with different sample sizes is inexact <ref type="bibr">(Beavis, 1994;</ref><ref type="bibr">Xu, 2003)</ref>. Despite the bias in detecting large effect alleles, the shape of the distribution of PVE values across several traits and studies suggests that adaptation is generally attributable to many mutations of small effect rather than few mutations of large effect in A. mexicanus.</p><p>Likely most important for evolutionary geneticists, the integrated QTL map presented in this study provides an ideal starting point for identifying candidate genes underlying cave-derived traits. As proof of principle, here we utilized three sources of data -QTL, differential A c c e p t e d M a n u s c r i p t expression, and population genomics -to curate a list of just 36 genes that are likely to be involved in the repeated evolution of cave-derived traits in A. mexicanus (Table <ref type="table">S13</ref>). One exciting candidate gene is rgrb, which encodes retinal G protein-coupled receptor b. The protein encoded by this gene is predicted to be an opsin and the gene is widely expressed across zebrafish tissues <ref type="bibr">(Davies, et al., 2015)</ref>. We find that rgrb maps to a QTL for retinal thickness (specifically the outer plexiform layer (OPL)), and that rgrb is downregulated in cavefish populations relative to surface populations in whole body samples from 30dpf fry. Furthermore, rgrb is downregulated in developing eye tissue of 54 hpf Pach&#243;n cavefish relative to surface fish <ref type="bibr">(Gore, et al., 2018)</ref>. As with all candidates on our list, rgrb exhibits evidence of selective sweeps across three cave populations, suggesting that rgrb is a target of adaptation to the cave environment. Interestingly, exon 4 and exon 5 are nearly completely deleted across multiple cave populations (Lineage 1: Escondido, Molino, Jineo; Lineage 2: Jos, Monticellos, Pach&#243;n, Palma Seca, Tinaja, Yerbaniz), while reading frames are intact for all surface populations (see supplementary alignment). Together, these lines of evidence strongly suggest that rgrb has been a target of repeated selection in A. mexicanus, and that exon deletions and downregulation of rgrb may be involved in convergent abnormal retinal development in <ref type="bibr">A. mexicanus cavefish (Emam, et al., 2020)</ref>.</p><p>Other intriguing candidate genes from our list span a variety of traits. For instance, we find that the retinoic acid receptor stra6 is downregulated in cavefish and falls under a QTL for maxillary bone length, which tends to be longer in cavefish than surface fish <ref type="bibr">(Atukorala, et al., 2013)</ref>. Retinoic acid signaling is known to play a role in skull and tooth development in fish <ref type="bibr">(Draut, Liebenstein and Begemann, 2019)</ref>, so it is possible that stra6 downregulation may influence maxillary bone morphology in A. mexicanus through this avenue. As another example, the rab-GTPase rab8a has been implicated in the maintenance of glucose homeostasis through its regulation of glucose transporters <ref type="bibr">(Sun, et al., 2010)</ref>, which could  Downloaded from <ref type="url">https://academic.oup.com/jhered/advance-article/doi/10.1093/jhered/esae040/7724259</ref> by Applied Materials user on 22 August 2024 A c c e p t e d M a n u s c r i p t A c c e p t e d M a n u s c r i p t A c c e p t e d M a n u s c r i p t   </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Downloaded from https://academic.oup.com/jhered/advance-article/doi/10.1093/jhered/esae040/7724259 by Applied Materials user on 22 August 2024</p></note>
		</body>
		</text>
</TEI>
