Title: Genome-Wide Association Study in Two Cohorts from a Multi-generational Mouse Advanced Intercross Line Highlights the Difficulty of Replication Due to Study-Specific Heterogeneity
Abstract: There has been extensive discussion of the “Replication Crisis” in many fields, including genome-wide association studies (GWAS). We explored replication in a mouse model using an advanced intercross line (AIL), which is a multigenerational intercross between two inbred strains. We re-genotyped a previously published cohort of LG/J x SM/J AIL mice (F34; n = 428) using a denser marker set and genotyped a new cohort of AIL mice (F39-43; n = 600) for the first time. We identified 36 novel genome-wide significant loci in the F34 and 25 novel loci in the F39-43 cohort. The subset of traits that were measured in both cohorts (locomotor activity, body weight, and coat color) showed high genetic correlations, although the SNP heritabilities were slightly lower in the F39-43 cohort. For this subset of traits, we attempted to replicate loci identified in either F34 or F39-43 in the other cohort. Coat color was robustly replicated; locomotor activity and body weight were only partially replicated, which was inconsistent with our power simulations. We used a random effects model to show that the partial replications could not be explained by Winner’s Curse but could be explained by study-specific heterogeneity. Despite this heterogeneity, we performed a mega-analysis by combining F34 and F39-43 cohorts (n = 1,028), which identified four novel loci associated with locomotor activity and body weight. These results illustrate that even with the high degree of genetic and environmental control possible in our experimental system, replication was hindered by study-specific heterogeneity, which has broad implications for ongoing concerns about reproducibility.
Award ID(s):
1910885
NSF-PAR ID:
10283142
Journal Name:
G3 Genes|Genomes|Genetics
Volume:
10
Issue:
3
ISSN:
2160-1836
Page Range / eLocation ID:
951 to 965
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Matise, T (Ed.)
    Abstract Combining samples for genetic association is standard practice in human genetic analysis of complex traits, but is rarely undertaken in rodent genetics. Here, using 23 phenotypes and genotypes from two independent laboratories, we obtained a sample size of 3076 commercially available outbred mice and identified 70 loci, more than double the number of loci identified in the component studies. Fine-mapping in the combined sample reduced the number of likely causal variants, with a median reduction in set size of 51%, and indicated novel gene associations, including Pnpo, Ttll6, and GM11545 with bone mineral density, and Psmb9 with weight. However, replication at a nominal threshold of 0.05 between the two component studies was low, with less than one-third of loci identified in one study replicated in the second. In addition to overestimates in the effect size in the discovery sample (Winner’s Curse), we also found that heterogeneity between studies explained the poor replication, but the contribution of these two factors varied among traits. Leveraging these observations, we integrated information about replication rates, study-specific heterogeneity, and Winner’s Curse corrected estimates of power to assign variants to one of four confidence levels. Our approach addresses concerns about reproducibility and demonstrates how to obtain robust results from mapping complex traits in any genome-wide association study. 
  2. High rates of dispersal can break down coadapted gene complexes. However, concentrated genomic architecture (i.e., genomic islands of divergence) can suppress recombination to allow evolution of local adaptations despite high gene flow. Pacific lamprey (Entosphenus tridentatus) is a highly dispersive anadromous fish. Observed trait diversity and evidence for a genetic basis of traits suggest it may be locally adapted. We addressed whether concentrated genomic architecture could influence local adaptation for Pacific lamprey. Using two new whole genome assemblies and genotypes from 7,716 single nucleotide polymorphism (SNP) loci in 518 individuals from across the species range, we identified four genomic islands of divergence (on chromosomes 01, 02, 04, and 22). We determined robust phenotype-by-genotype relationships by testing multiple traits across geographic sites. These trait associations probably explain genomic divergence across the species’ range. We genotyped a subset of 302 broadly distributed SNPs in 2,145 individuals for association testing for adult body size, sexual maturity, migration distance and timing, adult swimming ability, and larval growth. Body size traits were strongly associated with SNPs on chromosomes 02 and 04. Moderate associations also implicated SNPs on chromosome 01 as being associated with variation in female maturity. Finally, we used candidate SNPs to extrapolate a heterogeneous spatiotemporal distribution of these predicted phenotypes based on independent data sets of larval and adult collections. These maturity and body size results guide future elucidation of factors driving regional optimization of these traits for fitness. Pacific lamprey is culturally important and imperiled. This research addresses biological uncertainties that challenge restoration efforts.
  3. Abstract

    Selection that acts in a sex-specific manner causes the evolution of sexual dimorphism. Sex-specific phenotypic selection has been demonstrated in many taxa and can be in the same direction in the two sexes (differing only in magnitude), limited to one sex, or in opposing directions (antagonistic). Attempts to detect the signal of sex-specific selection from genomic data have confronted numerous difficulties. These challenges highlight the utility of “direct approaches,” in which fitness is predicted from individual genotype within each sex. Here, we directly measured selection on Single Nucleotide Polymorphisms (SNPs) in a natural population of the sexually dimorphic, dioecious plant, Silene latifolia. We measured flowering phenotypes, estimated fitness over one reproductive season, as well as survival to the next year, and genotyped all adults and a subset of their offspring for SNPs across the genome. We found that while phenotypic selection was congruent (fitness covaried similarly with flowering traits in both sexes), SNPs showed clear evidence for sex-specific selection. SNP-level selection was particularly strong in males and may involve an important gametic component (e.g., pollen competition). While the most significant SNPs under selection in males differed from those under selection in females, paternity selection showed a highly polygenic tradeoff with female survival. Alleles that increased male mating success tended to reduce female survival, indicating sexual antagonism at the genomic level. Perhaps most importantly, this experiment demonstrates that selection within natural populations can be strong enough to measure sex-specific fitness effects of individual loci.

    Males and females typically differ phenotypically, a phenomenon known as sexual dimorphism. These differences arise when selection on males differs from selection on females, either in magnitude or direction. Estimated relationships between traits and fitness indicate that sex-specific selection is widespread, occurring in both plants and animals, and explains why so many species exhibit sexual dimorphism. Finding the specific loci experiencing sex-specific selection is a challenging prospect but one worth undertaking given the extensive evolutionary consequences. Flowering plants with separate sexes are ideal organisms for such studies, given that the fitness of females can be estimated by counting the number of seeds they produce. Determination of fitness for males has been made easier as thousands of genetic markers can now be used to assign paternity to seeds. We undertook just such a study in S. latifolia, a short-lived, herbaceous plant. We identified loci under sex-specific selection in this species and found more loci affecting fitness in males than females. Importantly, loci with major effects on male fitness were distinct from the loci with major effects on females. We detected sexual antagonism only when considering the aggregate effect of many loci. Hence, even though males and females share the same genome, this does not necessarily impose a constraint on their independent evolution.

     
  4. Abstract

    Urbanization is the dominant trend of global land use change. The replicated nature of environmental change associated with urbanization should drive parallel evolution, yet insight into the repeatability of evolutionary processes in urban areas has been limited by a lack of multi-city studies. Here we leverage community science data on coat color in > 60,000 eastern gray squirrels (Sciurus carolinensis) across 43 North American cities to test for parallel clines in melanism, a genetically based trait associated with thermoregulation and crypsis. We show the prevalence of melanism was positively associated with urbanization as measured by impervious cover. Urban–rural clines in melanism were strongest in the largest cities with extensive forest cover and weakest or absent in cities with warmer winter temperatures, where thermal selection likely limits the prevalence of melanism. Our results suggest that novel traits can evolve in a highly repeatable manner among urban areas, modified by factors intrinsic to individual cities, including their size, land cover, and climate.

     
  5. INTRODUCTION: Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge.

    RATIONALE: We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx).

    RESULTS: Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85 versus 3.26% expected by chance, P < 2.2 × 10^−308). Pathogenic ClinVar variants are more constrained than benign variants (P < 2.2 × 10^−16). The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold). The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base. Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes.

    CONCLUSION: Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease.

    Figure: Using evolutionary constraint in genomic studies of human diseases. (A) Constraint was calculated across 240 mammal species, including 43 primates (teal line). (B) Pathogenic ClinVar variants (N = 73,885) are more constrained across mammals than benign variants (N = 231,642; P < 2.2 × 10^−16). (C) More-constrained bases are more enriched for trait-associated variants (63 GWASs). (D) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). (E) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability.