Title: Multiple Loci Control Eyespot Number Variation on the Hindwings of Bicyclus anynana Butterflies
The underlying genetic changes that regulate the appearance and disappearance of repeated traits, or serial homologs, remain poorly understood. One hypothesis is that variation in genomic regions flanking master regulatory genes, also known as input–output genes, controls variation in trait number, making the locus of evolution almost predictable. Another hypothesis implicates genetic variation in up- or downstream loci of master control genes. Here, we use the butterfly Bicyclus anynana, a species that exhibits natural variation in eyespot number on the dorsal hindwing, to test these two hypotheses. We first estimated the heritability of dorsal hindwing eyespot number by breeding multiple butterfly families differing in eyespot number and regressing eyespot numbers of offspring on midparent values. We then estimated the number and identity of independent genetic loci contributing to eyespot number variation by performing a genome-wide association study with restriction site-associated DNA sequencing from multiple individuals varying in number of eyespots sampled across a freely breeding laboratory population. We found that dorsal hindwing eyespot number has a moderately high heritability of ∼0.50 and is characterized by a polygenic architecture. Previously identified genomic regions involved in eyespot development, and novel ones, display high association with dorsal hindwing eyespot number, suggesting that homolog number variation is likely determined by regulatory changes at multiple loci that build the trait, and not by variation at single master regulators or input–output genes.
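The abstract estimates heritability by regressing offspring eyespot number on midparent values. As a minimal sketch of that kind of calculation, assuming one data point per family (the mean of the two parental values against the mean of the offspring values), the slope of an ordinary least-squares fit serves as the heritability estimate. The function name and the toy family data below are hypothetical illustrations, not the study's code or data.

```python
def midparent_heritability(families):
    """Estimate heritability as the slope of offspring mean on midparent value.

    `families` is a list of (mother_value, father_value, offspring_values)
    tuples. This layout is an assumption made for illustration only.
    """
    # Midparent value: average of the two parental phenotypes.
    midparents = [(mother + father) / 2 for mother, father, _ in families]
    # Family mean of the offspring phenotypes.
    offspring_means = [sum(kids) / len(kids) for _, _, kids in families]

    n = len(families)
    mean_x = sum(midparents) / n
    mean_y = sum(offspring_means) / n

    # Ordinary least-squares slope: cov(x, y) / var(x).
    sxx = sum((x - mean_x) ** 2 for x in midparents)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(midparents, offspring_means))
    if sxx == 0:
        raise ValueError("midparent values must vary across families")
    return sxy / sxx


# Hypothetical toy data (not from the study): eyespot counts per family.
toy_families = [
    (0, 0, [1, 1]),
    (2, 2, [2, 2]),
    (4, 4, [3, 3]),
]
slope = midparent_heritability(toy_families)
```

In this toy data the offspring mean rises by half an eyespot for every one-eyespot increase in the midparent value, so the slope is 0.5. Real data would also call for a standard error on the slope, which this sketch omits.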
Award ID(s):
1656389
NSF-PAR ID:
10171233
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Genetics
Volume:
214
Issue:
4
ISSN:
0016-6731
Page Range / eLocation ID:
1059 to 1078
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Understanding the genetic basis of natural phenotypic variation is of great importance, particularly since selection can act on this variation to cause evolution. We examined expression and allelic variation in candidate flowering time loci in Brassica rapa plants derived from a natural population and showing a broad range in the timing of first flowering. The loci of interest were orthologs of the Arabidopsis genes FLC and SOC1 (BrFLC and BrSOC1, respectively), which in Arabidopsis play a central role in the flowering time regulatory network, with FLC repressing and SOC1 promoting flowering. In B. rapa, there are four copies of FLC and three of SOC1. Plants were grown in controlled conditions in the lab. Comparisons were made between plants that flowered the earliest and latest, with the difference in average flowering time between these groups ∼30 days. As expected, we found that total expression of BrSOC1 paralogs was significantly greater in early than in late flowering plants. Paralog-specific primers showed that expression was greater in early flowering plants in the BrSOC1 paralogs Br004928, Br00393 and Br009324, although the difference was not significant in Br009324. Thus expression of at least 2 of the 3 BrSOC1 orthologs is consistent with their predicted role in flowering time in this natural population. Sequences of the promoter regions of the BrSOC1 orthologs were variable, but there was no association between allelic variation at these loci and flowering time variation. For the BrFLC orthologs, expression varied over time, but did not differ between the early and late flowering plants. The coding regions, promoter regions and introns of these genes were generally invariant. Thus the BrFLC orthologs do not appear to influence flowering time in this population.
Overall, the results suggest that even for a trait like flowering time that is controlled by a very well described genetic regulatory network, understanding the underlying genetic basis of natural variation in such a quantitative trait is challenging.

     
  2. Abstract

    Accelerating biomass improvement is a major goal of Miscanthus breeding. The development and implementation of genomic‐enabled breeding tools, like marker‐assisted selection (MAS) and genomic selection, have the potential to improve the efficiency of Miscanthus breeding. The present study conducted genome‐wide association (GWA) and genomic prediction of biomass yield and 14 yield‐component traits in Miscanthus sacchariflorus. We evaluated a diversity panel with 590 accessions of M. sacchariflorus grown across 4 years in one subtropical and three temperate locations and genotyped with 268,109 single‐nucleotide polymorphisms (SNPs). The GWA study identified a total of 835 significant SNPs and 674 candidate genes across all traits and locations. Of the significant SNPs identified, 280 were localized in mapped quantitative trait loci intervals and proximal to SNPs identified for similar traits in previously reported Miscanthus studies, providing additional support for the importance of these genomic regions for biomass yield. Our study gave insights into the genetic basis for yield‐component traits in M. sacchariflorus that may facilitate marker‐assisted breeding for biomass yield. Genomic prediction accuracy for the yield‐related traits ranged from 0.15 to 0.52 across all locations and genetic groups. Prediction accuracies within the six genetic groupings of M. sacchariflorus were limited due to low sample sizes. Nevertheless, the Korea/NE China/Russia (N = 237) genetic group had the highest prediction accuracy of all genetic groups (ranging 0.26–0.71), suggesting that with adequate sample sizes, there is strong potential for genomic selection within the genetic groupings of M. sacchariflorus. This study indicated that MAS and genomic prediction will likely be beneficial for conducting population‐improvement of M. sacchariflorus.

     
  3. Abstract Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits¹,². The solution to this problem is to identify all causal genetic variants and to measure their individual contributions³,⁴. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
  4. INTRODUCTION: Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge.

    RATIONALE: We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx).

    RESULTS: Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85 versus 3.26% expected by chance, P < 2.2 × 10⁻³⁰⁸). Pathogenic ClinVar variants are more constrained than benign variants (P < 2.2 × 10⁻¹⁶). The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold). The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base. Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes.

    CONCLUSION: Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease.

    Using evolutionary constraint in genomic studies of human diseases. (A) Constraint was calculated across 240 mammal species, including 43 primates (teal line). (B) Pathogenic ClinVar variants (N = 73,885) are more constrained across mammals than benign variants (N = 231,642; P < 2.2 × 10⁻¹⁶). (C) More-constrained bases are more enriched for trait-associated variants (63 GWASs). (D) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). (E) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability.
  5. Plant growth, development, and nutritional quality depend upon amino acid homeostasis, especially in seeds. However, our understanding of the underlying genetics influencing amino acid content and composition remains limited, with only a few candidate genes and quantitative trait loci identified to date. Improved knowledge of the genetics and biological processes that determine amino acid levels will enable researchers to use this information for plant breeding and biological discovery. Toward this goal, we used genomic prediction to identify biological processes that are associated with, and therefore potentially influence, free amino acid (FAA) composition in seeds of the model plant Arabidopsis thaliana. Markers were split into categories based on metabolic pathway annotations and fit using a genomic partitioning model to evaluate the influence of each pathway on heritability explained, model fit, and predictive ability. Selected pathways included processes known to influence FAA composition, albeit to an unknown degree, and spanned four categories: amino acid, core, specialized, and protein metabolism. Using this approach, we identified associations for pathways containing known variants for FAA traits, in addition to finding new trait-pathway associations. Markers related to amino acid metabolism, which are directly involved in FAA regulation, improved predictive ability for branched chain amino acids and histidine. The use of genomic partitioning also revealed patterns across biochemical families, in which serine-derived FAAs were associated with protein related annotations and aromatic FAAs were associated with specialized metabolic pathways. Taken together, these findings provide evidence that genomic partitioning is a viable strategy to uncover the relative contributions of biological processes to FAA traits in seeds, offering a promising framework to guide hypothesis testing and narrow the search space for candidate genes.