Title: A comparative approach for selecting orthologous candidate genes underlying signal in genome-wide association studies across multiple species
Advances in quantitative genetics have enabled researchers to identify genomic regions associated with changes in phenotype. However, genomic regions can contain hundreds to thousands of genes, and progressing from genomic regions to candidate genes is still challenging. In genome-wide association studies (GWAS) measuring elemental accumulation (ionomic) traits, a mere 5% of loci are associated with a known ionomic gene - indicating that many causal genes are still unknown. To select candidates for the remaining 95% of loci, we developed a method to identify conserved genes underlying GWAS loci in multiple species. For 19 ionomic traits, we identified 14,336 candidates across Arabidopsis, soybean, rice, maize, and sorghum. We calculated the likelihood of candidates with random permutations of the data and determined that most of the top 10% of candidates were orthologous genes linked to GWAS loci across all five species. The candidate list also includes orthologous genes with previously established ionomic functions in Arabidopsis and rice. Our methods highlight the conserved nature of ionomic genetic regulators and enable the identification of previously unknown ionomic genes.
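The permutation idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the toy gene and orthogroup identifiers, and the choice of "an orthogroup hit in every species" as the test statistic are all assumptions made for illustration. It counts orthogroups that contain a GWAS-linked gene in every species, then compares that count with counts obtained from randomly drawn gene sets of the same size.

```python
import random


def conserved_groups(hits_by_species, orthogroup_of):
    """Return orthogroups that contain a GWAS-linked gene in every species."""
    per_species = [
        {orthogroup_of[g] for g in genes if g in orthogroup_of}
        for genes in hits_by_species.values()
    ]
    return set.intersection(*per_species) if per_species else set()


def permutation_p_value(hits_by_species, genes_by_species, orthogroup_of,
                        n_perm=1000, seed=0):
    """Empirical p-value for the number of conserved orthogroups.

    Each permutation draws, per species, a random gene set of the same
    size as the observed GWAS-linked set (a simplifying assumption).
    """
    rng = random.Random(seed)
    observed = len(conserved_groups(hits_by_species, orthogroup_of))
    at_least = 0
    for _ in range(n_perm):
        shuffled = {
            sp: rng.sample(sorted(genes_by_species[sp]), len(hits_by_species[sp]))
            for sp in hits_by_species
        }
        if len(conserved_groups(shuffled, orthogroup_of)) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy data (hypothetical): three species, 50 genes each, one gene per orthogroup.
species = ["A", "B", "C"]
genes_by_species = {sp: {f"{sp}{i}" for i in range(50)} for sp in species}
orthogroup_of = {f"{sp}{i}": f"OG{i}" for sp in species for i in range(50)}
hits_by_species = {sp: {f"{sp}{i}" for i in range(5)} for sp in species}

observed, p = permutation_p_value(
    hits_by_species, genes_by_species, orthogroup_of, n_perm=200
)
print(observed, p)
```

In the toy data the same five orthogroups are hit in all three species, so random draws rarely match the observed count and the empirical p-value is small. A real analysis would also need to account for locus size, linkage, and gene density, which this sketch ignores.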
Award ID(s):
2309932
PAR ID:
10512498
Publisher / Repository:
bioRxiv
Date Published:
Journal Name:
bioRxiv
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT Genome-wide association studies (GWAS) can identify genetic variants responsible for naturally occurring and quantitative phenotypic variation. Association studies therefore provide a powerful complement to approaches that rely on de novo mutations for characterizing gene function. Although bacteria should be amenable to GWAS, few GWAS have been conducted on bacteria, and the extent to which nonindependence among genomic variants (e.g., linkage disequilibrium [LD]) and the genetic architecture of phenotypic traits will affect GWAS performance is unclear. We apply association analyses to identify candidate genes underlying variation in 20 biochemical, growth, and symbiotic phenotypes among 153 strains of Ensifer meliloti. For 11 traits, we find genotype-phenotype associations that are stronger than expected by chance, with the candidates in relatively small linkage groups, indicating that LD does not preclude resolving association candidates to relatively small genomic regions. The significant candidates show an enrichment for nucleotide polymorphisms (SNPs) over gene presence-absence variation (PAV), and for five traits, candidates are enriched in large linkage groups, a possible signature of epistasis. Many of the variants most strongly associated with symbiosis phenotypes were in genes previously identified as being involved in nitrogen fixation or nodulation. For other traits, apparently strong associations were not stronger than the range of associations detected in permuted data. In sum, our data show that GWAS in bacteria may be a powerful tool for characterizing genetic architecture and identifying genes responsible for phenotypic variation. However, careful evaluation of candidates is necessary to avoid false signals of association. IMPORTANCE Genome-wide association analyses are a powerful approach for identifying gene function. These analyses are becoming commonplace in studies of humans, domesticated animals, and crop plants but have rarely been conducted in bacteria. We applied association analyses to 20 traits measured in Ensifer meliloti, an agriculturally and ecologically important bacterium because it fixes nitrogen when in symbiosis with leguminous plants. We identified candidate alleles and gene presence-absence variants underlying variation in symbiosis traits, antibiotic resistance, and use of various carbon sources; some of these candidates are in genes previously known to affect these traits whereas others were in genes that have not been well characterized. Our results point to the potential power of association analyses in bacteria, but also to the need to carefully evaluate the potential for false associations.
  2. Inferring phenotypic outcomes from genomic features is both a promise and challenge for systems biology. Using gene expression data to predict phenotypic outcomes, and functionally validating the genes with predictive powers are two challenges we address in this study. We applied an evolutionarily informed machine learning approach to predict phenotypes based on transcriptome responses shared both within and across species. Specifically, we exploited the phenotypic diversity in nitrogen use efficiency and evolutionarily conserved transcriptome responses to nitrogen treatments across Arabidopsis accessions and maize varieties. We demonstrate that using evolutionarily conserved nitrogen responsive genes is a biologically principled approach to reduce the feature dimensionality in machine learning that ultimately improved the predictive power of our gene-to-trait models. Further, we functionally validated seven candidate transcription factors with predictive power for NUE outcomes in Arabidopsis and one in maize. Moreover, application of our evolutionarily informed pipeline to other species including rice and mice models underscores its potential to uncover genes affecting any physiological or clinical traits of interest across biology, agriculture, or medicine.
  3. INTRODUCTION Genome-wide association studies (GWASs) have identified thousands of human genetic variants associated with diverse diseases and traits, and most of these variants map to noncoding loci with unknown target genes and function. Current approaches to understand which GWAS loci harbor causal variants and to map these noncoding regulators to target genes suffer from low throughput. With newer multiancestry GWASs from individuals of diverse ancestries, there is a pressing and growing need to scale experimental assays to connect GWAS variants with molecular mechanisms. Here, we combined biobank-scale GWASs, massively parallel CRISPR screens, and single-cell sequencing to discover target genes of noncoding variants for blood trait loci with systematic targeting and inhibition of noncoding GWAS loci with single-cell sequencing (STING-seq). RATIONALE Blood traits are highly polygenic, and GWASs have identified thousands of noncoding loci that map to candidate cis-regulatory elements (CREs). By combining CRE-silencing CRISPR perturbations and single-cell readouts, we targeted hundreds of GWAS loci in a single assay, revealing target genes in cis and in trans. For select CREs that regulate target genes, we performed direct variant insertion. Although silencing the CRE can identify the target gene, direct variant insertion can identify magnitude and direction of effect on gene expression for the GWAS variant. In select cases in which the target gene was a transcription factor or microRNA, we also investigated the gene-regulatory networks altered upon CRE perturbation and how these networks differ across blood cell types. RESULTS We inhibited candidate CREs from fine-mapped blood trait GWAS variants (from ~750,000 individuals of diverse ancestries) in human erythroid progenitors. In total, we targeted 543 variants (254 loci) mapping to candidate CREs, generating multimodal single-cell data including transcriptome, direct CRISPR gRNA capture, and cell surface proteins. We identified target genes in cis (within 500 kb) for 134 CREs. In most cases, we found that the target gene was the closest gene and that specific enhancer-associated biochemical hallmarks (H3K27ac and accessible chromatin) are essential for CRE function. Using multiple perturbations at the same locus, we were able to distinguish causal variants from noncausal variants in linkage disequilibrium. For a subset of validated CREs, we also inserted specific GWAS variants using base-editing STING-seq (beeSTING-seq) and quantified the effect size and direction of GWAS variants on gene expression. Given our transcriptome-wide data, we examined dosage effects in cis and trans in cases in which the cis target is a transcription factor or microRNA. We found that trans target genes are also enriched for GWAS loci, and identified gene clusters within trans gene networks with distinct biological functions and expression patterns in primary human blood cells. CONCLUSION In this work, we investigated noncoding GWAS variants at scale, identifying target genes in single cells. These methods can help to address the variant-to-function challenges that are a barrier for translation of GWAS findings (e.g., drug targets for diseases with a genetic basis) and greatly expand our ability to understand mechanisms underlying GWAS loci. Identifying causal variants and their target genes with STING-seq. Uncovering causal variants and their target genes or function is a major challenge for GWASs. STING-seq combines perturbation of noncoding loci with multimodal single-cell sequencing to profile hundreds of GWAS loci in parallel. This approach can identify target genes in cis and trans, measure dosage effects, and decipher gene-regulatory networks.
  4. Abstract The aus (Oryza sativa L.) varietal group comprises aus, boro, ashina and rayada seasonal and/or field ecotypes, and exhibits unique stress tolerance traits, making it valuable for rice breeding. Despite its importance, the agro-morphological diversity and genetic control of yield traits in aus rice remain poorly understood. To address this knowledge gap, we investigated the genetic structure of 181 aus accessions using 399,115 SNP markers and evaluated them for 11 morpho-agronomic traits. Through genome-wide association studies (GWAS), we aimed to identify key loci controlling yield and plant architectural traits. Our population genetic analysis unveiled six subpopulations with strong geographical patterns. Subpopulation-specific differences were observed in most phenotypic traits. Principal component analysis (PCA) of agronomic traits showed that principal component 1 (PC1) was primarily associated with panicle traits, plant height, and heading date, while PC2 and PC3 were linked to primary grain yield traits. GWAS using PC1 identified OsSAC1 on Chromosome 7 as a significant gene influencing multiple agronomic traits. PC2-based GWAS highlighted the importance of OsGLT1 and OsPUP4/Big Grain 3 in determining grain yield. Haplotype analysis of these genes in the 3,000 Rice Genome Panel revealed distinct genetic variations in aus rice. In summary, this study offers valuable insights into the genetic structure and phenotypic diversity of aus rice accessions. We have identified significant loci associated with essential agronomic traits, with GLT1, PUP4, and SAC1 genes emerging as key players in yield determination.
  5. INTRODUCTION During the independent process of cereal evolution, many trait shifts appear to have been under convergent selection to meet the specific needs of humans. Identification of convergently selected genes across cereals could help to clarify the evolution of crop species and to accelerate breeding programs. In the past several decades, researchers have debated whether convergent phenotypic selection in distinct lineages is driven by conserved molecular changes or by diverse molecular pathways. Two of the most economically important crops, maize and rice, display some conserved phenotypic shifts, including loss of seed dispersal, decreased seed dormancy, and increased grain number during evolution, even though they experienced independent selection. Hence, maize and rice can serve as an excellent system for understanding the extent of convergent selection among cereals. RATIONALE Despite the identification of a few convergently selected genes, our understanding of the extent of molecular convergence on a genome-wide scale between maize and rice is very limited. To learn how often selection acts on orthologous genes, we investigated the functions and molecular evolution of the grain yield quantitative trait locus KRN2 in maize and its rice ortholog OsKRN2. We also identified convergently selected genes on a genome-wide scale in maize and rice, using two large datasets. RESULTS We identified a selected gene, KRN2 (kernel row number2), that differs between domesticated maize and its wild ancestor, teosinte. This gene underlies a major quantitative trait locus for kernel row number in maize. Selection in the noncoding upstream regions resulted in a reduction of KRN2 expression and an increased grain number through an increase in kernel rows. The rice ortholog, OsKRN2, also underwent selection and negatively regulates grain number via control of secondary panicle branches. These orthologs encode WD40 proteins and function synergistically with a gene of unknown function, DUF1644, which suggests that a conserved protein interaction controls grain number in maize and rice. Field tests show that knockout of KRN2 in maize or OsKRN2 in rice increased grain yield by ~10% and ~8%, respectively, with no apparent trade-off in other agronomic traits. This suggests potential applications of KRN2 and its orthologs for crop improvement. On a genome-wide scale, we identified a set of 490 orthologous genes that underwent convergent selection during maize and rice evolution, including KRN2/OsKRN2. We found that the convergently selected orthologous genes appear to be significantly enriched in two specific pathways in both maize and rice: starch and sucrose metabolism, and biosynthesis of cofactors. A deep analysis of convergently selected genes in the starch metabolic pathway indicates that the degree of genetic convergence via convergent selection is related to the conservation and complexity of the gene network for a given selection. CONCLUSION Our findings show that common phenotypic shifts during maize and rice evolution acting on conserved genes are driven at least in part by convergent selection, which in maize and rice likely occurred both during and after domestication. We provide evolutionary and functional evidence on the convergent selection of KRN2/OsKRN2 for grain number between maize and rice. We further found that a complete loss-of-function allele of KRN2/OsKRN2 increased grain yield without an apparent negative impact on other agronomic traits. Exploring the role of KRN2/OsKRN2 and other convergently selected genes across the cereals could provide new opportunities to enhance the production of other global crops. Shared selected orthologous genes in maize and rice for convergent phenotypic shifts during domestication and improvement. By comparing 3163 selected genes in maize and 18,755 selected genes in rice, we identified 490 orthologous gene pairs, including KRN2 and its rice ortholog OsKRN2, as having been convergently selected. Knockout of KRN2 in maize or OsKRN2 in rice increased grain yield by increasing kernel rows and secondary panicle branches, respectively.