Title: Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize
Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes, and it has generated little insight into the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype-associated RNA expression (HARE), study its transferability across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis-acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. As a result, HARE estimates were more transferable across tissues and populations than measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models for 26 complex traits, evaluated within and across two diverse maize panels: a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel). HARE yielded up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carries functional information beyond haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy than measured expression values. The accuracy achieved with measured expression varied across tissues, whereas the accuracy achieved with HARE was more stable. Therefore, imputing gene RNA expression from haplotypes is stable, cost-effective, and transferable across populations.
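To make the core idea concrete, the Python sketch below treats HARE at a single gene as the mean measured expression of each haplotype in a training population, imputes expression for other lines from their haplotype IDs alone, and reports the variance explained (R²). This is an illustrative one-way fixed-effect model (equivalent to regressing expression on one-hot haplotype indicators), not the authors' pipeline; the function names, string haplotype IDs, and the fallback value for haplotypes unseen in training are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def fit_haplotype_means(haplotypes, expression):
    """Learn mean expression per haplotype ID at one gene (hypothetical helper).

    haplotypes: haplotype ID carried by each training line at this gene
    expression: measured expression for the same lines, in the same order
    """
    groups = defaultdict(list)
    for hap, expr in zip(haplotypes, expression):
        groups[hap].append(expr)
    return {hap: mean(vals) for hap, vals in groups.items()}


def impute_hare(haplotypes, hap_means, fallback):
    """Impute expression for (possibly untested) lines from haplotype IDs only.

    Haplotypes absent from training get `fallback` (assumed: the training mean).
    """
    return [hap_means.get(hap, fallback) for hap in haplotypes]


def variance_explained(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total."""
    mu = mean(observed)
    ss_tot = sum((y - mu) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


train_haps = ["h1", "h1", "h2", "h2", "h3"]
train_expr = [10.0, 12.0, 4.0, 6.0, 8.0]
means = fit_haplotype_means(train_haps, train_expr)    # {'h1': 11.0, 'h2': 5.0, 'h3': 8.0}
print(impute_hare(["h2", "h1", "h4"], means, mean(train_expr)))  # [5.0, 11.0, 8.0]
print(variance_explained(train_expr, impute_hare(train_haps, means, 8.0)))
```

Repeating this for every gene would give a matrix of imputed expression values that could be used as predictors in a genomic prediction model instead of, or alongside, SNP genotypes.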
Award ID(s): 1822330
PAR ID: 10489600
Editor(s): Hake, Sarah
Publisher / Repository: PLOS
Journal Name: PLOS Genetics
Volume: 17
Issue: 10
ISSN: 1553-7404
Page Range / eLocation ID: e1009568
Sponsoring Org: National Science Foundation
More like this
  1. Qu, Li-Jia (Ed.)
    Pleiotropy—when a single gene controls two or more seemingly unrelated traits—has been shown to impact genes with effects on flowering time, leaf architecture, and inflorescence morphology in maize. However, the genome-wide impact of biological pleiotropy across all maize phenotypes is largely unknown. Here, we investigate the extent to which biological pleiotropy impacts phenotypes within maize using GWAS summary statistics reanalyzed from previously published metabolite, field, and expression phenotypes across the Nested Association Mapping population and Goodman Association Panel. Through phenotypic saturation of 120,597 traits, we obtain over 480 million significant quantitative trait nucleotides. We estimate that only 1.56–32.3% of intervals show some degree of pleiotropy. We then assess the relationship between pleiotropy and various biological features such as gene expression, chromatin accessibility, sequence conservation, and enrichment for gene ontology terms. We find very little relationship between pleiotropy and these variables when compared to permuted pleiotropy. We hypothesize that biological pleiotropy of common alleles is not widespread in maize and is highly impacted by nuisance terms such as population structure and linkage disequilibrium. Natural selection on large standing natural variation in maize populations may target wide and large effect variants, leaving the prevalence of detectable pleiotropy relatively low. 
  2. Betancourt, Andrea (Ed.)
    Evolutionary processes driving physiological trait variation depend on the underlying genomic mechanisms. Evolution of these mechanisms depends on the genetic complexity (involving many genes) and how gene expression impacting the traits is converted to phenotype. Yet, genomic mechanisms that impact physiological traits are diverse and context dependent (e.g., vary by environment and tissues), making them difficult to discern. We examine the relationships between genotype, mRNA expression, and physiological traits to discern the genetic complexity and whether the gene expression affecting the physiological traits is primarily cis- or trans-acting. We use low-coverage whole genome sequencing and heart- or brain-specific mRNA expression to identify polymorphisms directly associated with physiological traits and expressed quantitative trait loci (eQTL) indirectly associated with variation in six temperature specific physiological traits (standard metabolic rate, thermal tolerance, and four substrate specific cardiac metabolic rates). Focusing on a select set of mRNAs belonging to co-expression modules that explain up to 82% of temperature specific traits, we identified hundreds of significant eQTL for mRNA whose expression affects physiological traits. Surprisingly, most eQTL (97.4% for heart and 96.7% for brain) were trans-acting. This could be due to higher effect size of trans- versus cis-acting eQTL for mRNAs that are central to co-expression modules. That is, we may have enhanced the identification of trans-acting factors by looking for single nucleotide polymorphisms associated with mRNAs in co-expression modules that broadly influence gene expression patterns. Overall, these data indicate that the genomic mechanism driving physiological variation across environments is driven by trans-acting heart- or brain-specific mRNA expression. 
  3. Expression quantitative trait loci (eQTLs), or single-nucleotide polymorphisms that affect average gene expression levels, provide important insights into context-specific gene regulation. Classic eQTL analyses use one-to-one association tests, which test gene–variant pairs individually and ignore correlations induced by gene regulatory networks and linkage disequilibrium. Probabilistic topic models, such as latent Dirichlet allocation, estimate latent topics for a collection of count observations. Prior multimodal frameworks that bridge genotype and expression data assume matched sample numbers between modalities. However, many data sets have a nested structure where one individual has several associated gene expression samples and a single germline genotype vector. Here, we build a telescoping bimodal latent Dirichlet allocation (TBLDA) framework to learn shared topics across gene expression and genotype data that allows multiple RNA sequencing samples to correspond to a single individual’s genotype. By using raw count data, our model avoids possible adulteration via normalization procedures. Ancestral structure is captured in a genotype-specific latent space, effectively removing it from shared components. Using GTEx v8 expression data across 10 tissues and genotype data, we show that the estimated topics capture meaningful and robust biological signal in both modalities and identify associations within and across tissue types. We identify 4,645 cis-eQTLs and 995 trans-eQTLs by conducting eQTL mapping between the most informative features in each topic. Our TBLDA model is able to identify associations using raw sequencing count data when the samples in two separate data modalities are matched one-to-many, as is often the case in biological data. Our code is freely available at https://github.com/gewirtz/TBLDA . 
  4. Developments in genomics and phenomics have provided valuable tools for use in cultivar development. Genomic prediction (GP) has been used in commercial soybean [Glycine max (L.) Merr.] breeding programs to predict grain yield and seed composition traits. Phenomic prediction (PP) is a rapidly developing field that holds the potential to be used for the selection of genotypes early in the growing season. The objectives of this study were to compare the performance of GP and PP for predicting soybean seed yield, protein, and oil. We additionally conducted genome‐wide association studies (GWAS) to identify significant single‐nucleotide polymorphisms (SNPs) associated with the traits of interest. The GWAS panel of 292 diverse accessions was grown in six environments in replicated trials. Spectral data were collected at two time points during the growing season. A genomic best linear unbiased prediction (GBLUP) model was trained on 269 accessions, while three separate machine learning (ML) models were trained on vegetation indices (VIs) and canopy traits. We observed that PP had a higher correlation coefficient than GP for seed yield, while GP had higher correlation coefficients for seed protein and oil contents. VIs with high feature importance were used as covariates in a new GBLUP model, and a new random forest model was trained with the inclusion of selected SNPs. These models did not outperform the original GP and PP models. These results show the capability of using ML for in‐season predictions for specific traits in soybean breeding and provide insights on PP and GP inclusions in breeding programs. 
  5. Background: Many plant species exhibit genetic variation for coping with environmental stress. However, there are still few approaches that effectively uncover the genomic regions regulating distinct gene response patterns across multiple varieties of the same species under abiotic stress.
     Results: By analyzing the transcriptomes of more than 100 maize inbreds, we reveal many cis- and trans-acting eQTLs that influence the expression response to heat stress. The cis-acting eQTLs in response to heat stress are identified in genes with differential responses to heat stress between genotypes as well as in genes that are only expressed under heat stress. The cis-acting variants for heat stress-responsive expression likely result from distinct promoter activities, and the differential heat responses of the alleles are confirmed for selected genes using transient expression assays. Global footprinting of transcription factor binding is performed in control and heat stress conditions to document regions with heat-enriched transcription factor binding occupancy.
     Conclusions: Footprints enriched near proximal regions of characterized heat-responsive genes in a large association panel can be used to prioritize functional genomic regions that regulate genotype-specific responses under heat stress.
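Items 2, 3, and 5 all distinguish cis-acting from trans-acting eQTLs. A common operational rule labels an association cis when the variant lies on the same chromosome as the gene and within a fixed distance of it, and trans otherwise. The sketch below assumes a 1 Mb window purely for illustration; the studies above may define cis differently, and the function names and tuple layout are hypothetical.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=1_000_000):
    """Return 'cis' if the SNP is on the gene's chromosome and within `window` bp
    of the gene body [gene_start, gene_end]; otherwise return 'trans'.

    The 1 Mb default window is an assumed convention, not taken from the studies.
    """
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start <= snp_pos <= gene_end:
        distance = 0  # SNP falls inside the gene body
    else:
        distance = min(abs(snp_pos - gene_start), abs(snp_pos - gene_end))
    return "cis" if distance <= window else "trans"


def fraction_trans(eqtls, window=1_000_000):
    """eqtls: iterable of (snp_chrom, snp_pos, gene_chrom, gene_start, gene_end)."""
    labels = [classify_eqtl(*e, window=window) for e in eqtls]
    return labels.count("trans") / len(labels)


gene = ("chr1", 2_000_000, 2_010_000)
eqtls = [
    ("chr1", 1_500_000, *gene),  # 500 kb upstream       -> cis
    ("chr1", 5_000_000, *gene),  # ~3 Mb away            -> trans
    ("chr2", 2_005_000, *gene),  # different chromosome  -> trans
    ("chr1", 2_005_000, *gene),  # inside the gene body  -> cis
]
print(fraction_trans(eqtls))  # 0.5
```

A figure such as "97.4% trans-acting" in item 2 is this kind of proportion, so it depends on where the cis window is drawn.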