Understanding the genetic basis of natural phenotypic variation is of great importance, particularly since selection can act on this variation to cause evolution. We examined expression and allelic variation in candidate flowering time loci in Brassica rapa plants derived from a natural population and showing a broad range in the timing of first flowering. The loci of interest were orthologs of the Arabidopsis genes FLC and SOC1 (BrFLC and BrSOC1, respectively), which in Arabidopsis play a central role in the flowering time regulatory network, with FLC repressing and SOC1 promoting flowering. In B. rapa, there are four copies of FLC and three of SOC1. Plants were grown in controlled conditions in the lab. Comparisons were made between plants that flowered the earliest and latest, with the difference in average flowering time between these groups ∼30 days. As expected, we found that total expression of BrSOC1 paralogs was significantly greater in early than in late flowering plants. Paralog-specific primers showed that expression was greater in early flowering plants in the BrSOC1 paralogs Br004928, Br00393 and Br009324, although the difference was not significant in Br009324. Thus expression of at least 2 of the 3 BrSOC1 orthologs is consistent with their predicted role in flowering time in this natural population. Sequences of the promoter regions of the BrSOC1 orthologs were variable, but there was no association between allelic variation at these loci and flowering time variation. For the BrFLC orthologs, expression varied over time, but did not differ between the early and late flowering plants. The coding regions, promoter regions and introns of these genes were generally invariant. Thus the BrFLC orthologs do not appear to influence flowering time in this population. Overall, the results suggest that even for a trait like flowering time that is controlled by a very well described genetic regulatory network, understanding the underlying genetic basis of natural variation in such a quantitative trait is challenging.
- Award ID(s): 1656389
- NSF-PAR ID: 10171233
- Date Published:
- Journal Name: Genetics
- Volume: 214
- Issue: 4
- ISSN: 0016-6731
- Page Range / eLocation ID: 1059 to 1078
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Accelerating biomass improvement is a major goal of Miscanthus breeding. The development and implementation of genomic‐enabled breeding tools, like marker‐assisted selection (MAS) and genomic selection, has the potential to improve the efficiency of Miscanthus breeding. The present study conducted genome‐wide association (GWA) and genomic prediction of biomass yield and 14 yield‐component traits in Miscanthus sacchariflorus. We evaluated a diversity panel with 590 accessions of M. sacchariflorus grown across 4 years in one subtropical and three temperate locations and genotyped with 268,109 single‐nucleotide polymorphisms (SNPs). The GWA study identified a total of 835 significant SNPs and 674 candidate genes across all traits and locations. Of the significant SNPs identified, 280 were localized in mapped quantitative trait loci intervals and proximal to SNPs identified for similar traits in previously reported Miscanthus studies, providing additional support for the importance of these genomic regions for biomass yield. Our study gave insights into the genetic basis for yield‐component traits in M. sacchariflorus that may facilitate marker‐assisted breeding for biomass yield. Genomic prediction accuracy for the yield‐related traits ranged from 0.15 to 0.52 across all locations and genetic groups. Prediction accuracies within the six genetic groupings of M. sacchariflorus were limited due to low sample sizes. Nevertheless, the Korea/NE China/Russia (N = 237) genetic group had the highest prediction accuracy of all genetic groups (ranging 0.26–0.71), suggesting that with adequate sample sizes, there is strong potential for genomic selection within the genetic groupings of M. sacchariflorus. This study indicated that MAS and genomic prediction will likely be beneficial for conducting population‐improvement of M. sacchariflorus.
-
Abstract Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits 1,2. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions 3,4. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
-
INTRODUCTION Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge. RATIONALE We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx). RESULTS Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85 versus 3.26% expected by chance, P < 2.2 × 10^−308). Pathogenic ClinVar variants are more constrained than benign variants (P < 2.2 × 10^−16). The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold).
The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base. Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes. CONCLUSION Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease.
Using evolutionary constraint in genomic studies of human diseases. (A) Constraint was calculated across 240 mammal species, including 43 primates (teal line). (B) Pathogenic ClinVar variants (N = 73,885) are more constrained across mammals than benign variants (N = 231,642; P < 2.2 × 10^−16). (C) More-constrained bases are more enriched for trait-associated variants (63 GWASs). (D) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). (E) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability.
-
Plant growth, development, and nutritional quality depend upon amino acid homeostasis, especially in seeds. However, our understanding of the underlying genetics influencing amino acid content and composition remains limited, with only a few candidate genes and quantitative trait loci identified to date. Improved knowledge of the genetics and biological processes that determine amino acid levels will enable researchers to use this information for plant breeding and biological discovery. Toward this goal, we used genomic prediction to identify biological processes that are associated with, and therefore potentially influence, free amino acid (FAA) composition in seeds of the model plant Arabidopsis thaliana. Markers were split into categories based on metabolic pathway annotations and fit using a genomic partitioning model to evaluate the influence of each pathway on heritability explained, model fit, and predictive ability. Selected pathways included processes known to influence FAA composition, albeit to an unknown degree, and spanned four categories: amino acid, core, specialized, and protein metabolism. Using this approach, we identified associations for pathways containing known variants for FAA traits, in addition to finding new trait-pathway associations. Markers related to amino acid metabolism, which are directly involved in FAA regulation, improved predictive ability for branched chain amino acids and histidine. The use of genomic partitioning also revealed patterns across biochemical families, in which serine-derived FAAs were associated with protein-related annotations and aromatic FAAs were associated with specialized metabolic pathways. Taken together, these findings provide evidence that genomic partitioning is a viable strategy to uncover the relative contributions of biological processes to FAA traits in seeds, offering a promising framework to guide hypothesis testing and narrow the search space for candidate genes.