Abstract Background: Estimating and accounting for hidden variables is widely practiced as an important step in molecular quantitative trait locus (molecular QTL, henceforth “QTL”) analysis for improving the power of QTL identification. However, few benchmark studies have been performed to evaluate the efficacy of the various methods developed for this purpose. Results: Here we benchmark popular hidden variable inference methods including surrogate variable analysis (SVA), probabilistic estimation of expression residuals (PEER), and hidden covariates with prior (HCP) against principal component analysis (PCA), a well-established dimension reduction and factor discovery method, via 362 synthetic and 110 real data sets. We show that PCA not only underlies the statistical methodology behind the popular methods but is also orders of magnitude faster, better-performing, and much easier to interpret and use. Conclusions: To help researchers use PCA in their QTL analysis, we provide an R package along with a detailed guide, both of which are freely available at https://github.com/heatherjzhou/PCAForQTL. We believe that using PCA rather than SVA, PEER, or HCP will substantially improve and simplify hidden variable inference in QTL mapping as well as increase the transparency and reproducibility of QTL research.
This content will become publicly available on September 4, 2026
Evaluating the Effectiveness of Data Reduction Techniques in QTL Mapping
Abstract Data reduction methods are frequently employed in large genomics and phenomics studies to extract core patterns, reduce dimensionality, and alleviate multiple testing effects. Principal component analysis (PCA), in particular, identifies the components that capture the most variance within omics datasets. While data reduction can simplify complex datasets, it remains unclear how the use of PCA impacts downstream analyses such as quantitative trait loci (QTL) or genome-wide association (GWA) approaches and their biological interpretation. In QTL studies, an alternative to data reduction is the use of post-hoc data summarization approaches, such as hotspot analysis, which involves mapping individual traits and consolidating results based on shared genomic locations. To evaluate how different analytical approaches may alter the biological insights derived from multi-dimensional QTL datasets, we compared individual trait hotspots with PCA-based QTL mapping using transcriptomic and metabolomic data from a structured recombinant inbred line population. Interestingly, these two approaches identified different genomic regions and genetic architectures. These findings suggest that mapping PCA-reduced data does not merely streamline analyses but may generate a fundamentally different view of the underlying genetic architecture compared to individual trait mapping and hotspot analysis. Thus, the use of PCA and other data reduction techniques prior to QTL or GWAS mapping should be carefully considered to ensure alignment with the specific biological question being addressed.
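As a rough illustration of the data reduction step the abstract refers to, the sketch below computes principal component scores from a lines-by-traits matrix; in a PCA-based workflow, each score column would then be mapped as if it were a single trait. This is a minimal sketch on simulated data, not the authors' pipeline: the function name `pca_scores`, the matrix dimensions, and the choice to standardize each trait are assumptions made for illustration.

```python
import numpy as np

def pca_scores(traits, n_components):
    """Return PC scores and explained-variance fractions for a lines x traits matrix."""
    # Center and scale each trait so that high-variance traits do not dominate.
    centered = traits - traits.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)
    # SVD of the standardized matrix: the columns of u * s are the PC scores.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of total variance captured by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Simulated data (hypothetical): 100 lines, 20 traits sharing one latent factor.
latent = rng.normal(size=(100, 1))
traits = latent @ rng.normal(size=(1, 20)) + 0.5 * rng.normal(size=(100, 20))
scores, explained = pca_scores(traits, n_components=3)
```

Each column of `scores` is a weighted combination of all traits, which is one way to see why mapping components can highlight different genomic regions than mapping traits individually and summarizing them as hotspots.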
- PAR ID: 10651090
- Publisher / Repository: bioRxiv
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract Populations may adapt to similar environments via parallel or non‐parallel genetic changes, but the frequency of these alternative mechanisms and underlying contributing factors are still poorly understood outside model systems. We used QTL mapping to investigate the genetic basis of highly divergent craniofacial traits between the scale‐eater (Cyprinodon desquamator) and molluscivore (C. brontotheroides) pupfish adapting to two different hypersaline lake environments on San Salvador Island, Bahamas. We lab‐reared F2 scale‐eater x molluscivore intercrosses from two different lake populations, estimated linkage maps, scanned for significant QTL for 29 skeletal and craniofacial traits, female mate preference, and sex. We compared the location of QTL between lakes to quantify parallel and non‐parallel genetic changes. We detected significant QTL for six craniofacial traits in at least one lake. However, nearly all shared QTL loci were associated with a different craniofacial trait within each lake. Therefore, our estimate of parallel evolution of craniofacial genetic architecture could range from one out of six identical trait QTL (low parallelism) to five out of six integrated trait QTL (high parallelism). We suggest that pleiotropy and trait integration can affect estimates of parallel evolution, particularly within rapid radiations. We also observed increased adaptive introgression in shared QTL regions, suggesting that gene flow contributed to parallel evolution. Overall, our results suggest that the same genomic regions may contribute to parallel adaptation across integrated suites of craniofacial traits, rather than specific traits, and highlight the need for a more expansive definition of parallel evolution.
- Abstract The genetic basis of traits shapes and constrains how adaptation proceeds in nature; rapid adaptation can proceed using stores of polygenic standing genetic variation or hard selective sweeps, and increasing polygenicity fuels genetic redundancy, reducing gene re-use (genetic convergence). Guppy life history traits evolve rapidly and convergently among natural high- and low-predation environments in northern Trinidad. This system has been studied extensively at the phenotypic level, but little is known about the underlying genetic architecture. Here, we use four independent F2 QTL crosses to examine the genetic basis of seven (five female, two male) guppy life history phenotypes and discuss how these genetic architectures may facilitate or constrain rapid adaptation and convergence. We use RAD-sequencing data (16,539 SNPs) from 370 male and 267 female F2 individuals. We perform linkage mapping, estimates of genome-wide and per-chromosome heritability (multi-locus associations), and QTL mapping (single-locus associations). Our results are consistent with architectures of many loci of small effect for male age and size at maturity and female interbrood period. Male trait associations are clustered on specific chromosomes, but female interbrood period exhibits a weak genome-wide signal suggesting a potentially highly polygenic component. Offspring weight and female size at maturity are also associated with a single significant QTL each. These results suggest rapid, repeatable phenotypic evolution of guppies may be facilitated by polygenic trait architectures, but subsequent genetic redundancy may limit gene re-use across populations, in agreement with an absence of strong signatures of genetic convergence from recent analyses of wild guppies.
- Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits.
- Wisser, R J (Ed.) Abstract Ionomics measures elemental concentrations in biological organisms and provides a snapshot of physiology under different conditions. In this study, we evaluate genetic variation of the ionome in outbred, perennial switchgrass in three environments across the species’ native range, and explore patterns of genotype-by-environment interactions. We grew 725 clonally replicated genotypes of a large full sib family from a four-way linkage mapping population, created from deeply diverged upland and lowland switchgrass ecotypes, at three common gardens. Concentrations of 18 mineral elements were determined in whole post-anthesis tillers using inductively coupled plasma mass spectrometry (ICP-MS). These measurements were used to identify quantitative trait loci (QTL) with and without QTL-by-environment interactions (QTLxE) using a multi-environment QTL mapping approach. We found that element concentrations varied significantly both within and between switchgrass ecotypes, and GxE was present at both the trait and QTL level. Concentrations of 14 of the 18 elements were under some genetic control, and 77 QTL were detected for these elements. Seventy-four percent of QTL colocalized multiple elements, half of QTL exhibited significant QTLxE, and roughly equal numbers of QTL had significant differences in magnitude and sign of their effects across environments. The switchgrass ionome is under moderate genetic control by loci with highly variable effects across environments.
