This study investigated the generalizability of Arabidopsis thaliana immune responses across diverse pathogens, including Botrytis cinerea, Sclerotinia sclerotiorum, and Pseudomonas syringae, using a data-driven, machine learning approach. Machine learning models were trained to predict disease development from early transcriptional responses. Feature selection techniques based on network science and topology were used to train models employing only a fraction of the transcriptome. Machine learning models trained on one pathosystem were then validated by predicting disease development in new pathosystems. The identified feature selection gene sets were enriched for pathways related to biotic, abiotic, and stress responses, though the specific genes involved differed between feature sets. This suggests common immune responses to diverse pathogens that operate via different gene sets. The study demonstrates that machine learning can uncover both established and novel components of the plant's immune response, offering insights into disease resistance mechanisms. These predictive models highlight the potential to advance our understanding of multigenic outcomes in plant immunity and can be further refined for applications in disease prediction.
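The train-on-one-pathosystem, validate-on-another design described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the model used, so a nearest-centroid classifier stands in for it, and the expression values, labels, and function names are hypothetical toy data, not results from the study.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def fit_centroids(X: Sequence[Vector], y: Sequence[str]) -> Dict[str, List[float]]:
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], row: Vector) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(center, row))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids: Dict[str, List[float]], X: Sequence[Vector], y: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


# Hypothetical early expression of two selected genes in pathosystem A (training).
train_X = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_y = ["resistant", "resistant", "susceptible", "susceptible"]

# Hypothetical samples from a different pathosystem B (held out entirely).
test_X = [[0.8, 0.3], [0.3, 0.7]]
test_y = ["resistant", "susceptible"]

model = fit_centroids(train_X, train_y)
cross_acc = accuracy(model, test_X, test_y)
print(f"cross-pathosystem accuracy: {cross_acc:.2f}")
```

The point of the design is that no sample from the second pathosystem is seen during training, so the accuracy reflects transfer between pathosystems and not within-system fit.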
Evolutionarily informed machine learning enhances the power of predictive gene-to-phenotype relationships
Inferring phenotypic outcomes from genomic features is both a promise and a challenge for systems biology. Using gene expression data to predict phenotypic outcomes and functionally validating the genes with predictive power are two challenges we address in this study. We applied an evolutionarily informed machine learning approach to predict phenotypes based on transcriptome responses shared both within and across species. Specifically, we exploited the phenotypic diversity in nitrogen use efficiency (NUE) and evolutionarily conserved transcriptome responses to nitrogen treatments across Arabidopsis accessions and maize varieties. We demonstrate that using evolutionarily conserved nitrogen-responsive genes is a biologically principled approach to reducing feature dimensionality in machine learning, one that ultimately improved the predictive power of our gene-to-trait models. Further, we functionally validated seven candidate transcription factors with predictive power for NUE outcomes in Arabidopsis and one in maize. Moreover, application of our evolutionarily informed pipeline to other species, including rice and mouse models, underscores its potential to uncover genes affecting any physiological or clinical traits of interest across biology, agriculture, or medicine.
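The dimensionality-reduction idea in the abstract, keeping only genes whose treatment response is conserved across species, can be sketched roughly as below. The gene identifiers, the ortholog map, the expression values, and both function names are hypothetical placeholders; the abstract does not describe how the pipeline itself is implemented.

```python
from typing import Dict, List, Set


def conserved_responsive_genes(
    responsive_a: Set[str],
    responsive_b: Set[str],
    orthologs: Dict[str, str],
) -> List[str]:
    """Species-A genes that respond to treatment and whose species-B ortholog also responds."""
    # orthologs.get returns None for genes without an ortholog, which is never in the set.
    return sorted(g for g in responsive_a if orthologs.get(g) in responsive_b)


def subset_features(
    expression: Dict[str, Dict[str, float]],
    genes: List[str],
) -> Dict[str, List[float]]:
    """Keep only the selected genes for each sample, in a fixed gene order."""
    return {sample: [values[g] for g in genes] for sample, values in expression.items()}


# Hypothetical nitrogen-responsive gene sets in two species.
responsive_arabidopsis = {"At_gene1", "At_gene2", "At_gene3"}
responsive_maize = {"Zm_geneA", "Zm_geneC"}

# Hypothetical one-to-one ortholog map (Arabidopsis -> maize).
orthologs = {"At_gene1": "Zm_geneA", "At_gene2": "Zm_geneB", "At_gene3": "Zm_geneC"}

conserved = conserved_responsive_genes(responsive_arabidopsis, responsive_maize, orthologs)

# Hypothetical expression values per accession; only conserved genes become model features.
expression = {
    "accession_1": {"At_gene1": 2.0, "At_gene2": 0.5, "At_gene3": 1.5},
    "accession_2": {"At_gene1": 0.4, "At_gene2": 1.1, "At_gene3": 0.9},
}
features = subset_features(expression, conserved)
print(conserved)
print(features)
```

The reduced feature table would then be passed to whichever learner is used for the gene-to-trait model; the filter is independent of that choice.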
- Award ID(s): 1812235
- PAR ID: 10302299
- Date Published:
- Journal Name: Nature Communications
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
A plant's response to external and internal nitrogen signals/status relies on sensing and signaling mechanisms that operate across spatial and temporal dimensions. From a comprehensive systems biology perspective, this involves integrating nitrogen responses in different cell types and over long distances to ensure organ coordination in real time and yield practical applications. In this prospective review, we focus on novel aspects of nitrogen (N) sensing/signaling uncovered using temporal and spatial systems biology approaches, largely in the model Arabidopsis. The temporal aspects span: transcriptional responses to N-dose mediated by Michaelis-Menten kinetics, the role of the master NLP7 transcription factor as a nitrate sensor, its nitrate-dependent TF nuclear retention, its “hit-and-run” mode of target gene regulation, and temporal transcriptional cascade identified by “network walking.” Spatial aspects of N-sensing/signaling have been uncovered in cell type-specific studies in roots and in root-to-shoot communication. We explore new approaches using single-cell sequencing data, trajectory inference, and pseudotime analysis as well as machine learning and artificial intelligence approaches. Finally, unveiling the mechanisms underlying the spatial dynamics of nitrogen sensing/signaling networks across species from model to crop could pave the way for translational studies to improve nitrogen-use efficiency in crops. Such outcomes could potentially reduce the detrimental effects of excessive fertilizer usage on groundwater pollution and greenhouse gas emissions.
-
Changes in gene expression are important for responses to abiotic stress. Transcriptome profiling of heat- or cold-stressed maize genotypes identifies many changes in transcript abundance. We used comparisons of expression responses in multiple genotypes to identify alleles with variable responses to heat or cold stress and to distinguish examples of cis- or trans-regulatory variation for stress-responsive expression changes. We used motifs enriched near the transcription start sites (TSSs) for thermal stress-responsive genes to develop predictive models of gene expression responses. Prediction accuracies can be improved by focusing only on motifs within unmethylated regions near the TSS and vary for genes with different dynamic responses to stress. Models trained on expression responses in a single genotype and promoter sequences provided lower performance when applied to other genotypes but this could be improved by using models trained on data from all three genotypes tested. The analysis of genes with cis-regulatory variation provides evidence for structural variants that result in presence/absence of transcription factor binding sites in creating variable responses. This study provides insights into cis-regulatory motifs for heat- and cold-responsive gene expression and defines a framework for developing models to predict expression responses across multiple genotypes.
-
Advances in quantitative genetics have enabled researchers to identify genomic regions associated with changes in phenotype. However, genomic regions can contain hundreds to thousands of genes, and progressing from genomic regions to candidate genes is still challenging. In genome-wide association studies (GWAS) measuring elemental accumulation (ionomic) traits, a mere 5% of loci are associated with a known ionomic gene, indicating that many causal genes are still unknown. To select candidates for the remaining 95% of loci, we developed a method to identify conserved genes underlying GWAS loci in multiple species. For 19 ionomic traits, we identified 14,336 candidates across Arabidopsis, soybean, rice, maize, and sorghum. We calculated the likelihood of candidates with random permutations of the data and determined that most of the top 10% of candidates were orthologous genes linked to GWAS loci across all five species. The candidate list also includes orthologous genes with previously established ionomic functions in Arabidopsis and rice. Our methods highlight the conserved nature of ionomic genetic regulators and enable the identification of previously unknown ionomic genes.
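A permutation test of the kind mentioned above could look roughly like the sketch below. The abstract does not specify the permutation scheme, so this is one assumed version: for an ortholog group, count the species in which its member gene is linked to a GWAS locus, then compare that count with counts obtained after randomly reassigning the linked genes within each species. All gene names, set sizes, and the function name are hypothetical.

```python
import random
from typing import Dict, List, Set


def permutation_p_value(
    group: Dict[str, str],
    linked: Dict[str, Set[str]],
    all_genes: Dict[str, List[str]],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical p-value for the number of species in which the group is GWAS-linked."""
    rng = random.Random(seed)
    observed = sum(gene in linked[sp] for sp, gene in group.items())
    at_least_as_extreme = 0
    for _ in range(n_perm):
        null_count = 0
        for sp, gene in group.items():
            # Draw a random linked set of the same size as the real one for this species.
            shuffled = set(rng.sample(all_genes[sp], len(linked[sp])))
            null_count += gene in shuffled
        at_least_as_extreme += null_count >= observed
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + at_least_as_extreme) / (1 + n_perm)


# Hypothetical setup: five species, 100 genes each, 10 of them linked to GWAS loci.
species = ["A", "B", "C", "D", "E"]
all_genes = {sp: [f"{sp}{i}" for i in range(100)] for sp in species}
linked = {sp: set(all_genes[sp][:10]) for sp in species}

# An ortholog group linked in every species, and one linked in none.
conserved_group = {sp: f"{sp}0" for sp in species}
unlinked_group = {sp: f"{sp}99" for sp in species}

p_conserved = permutation_p_value(conserved_group, linked, all_genes)
p_unlinked = permutation_p_value(unlinked_group, linked, all_genes)
print(p_conserved, p_unlinked)
```

A group linked in all five species is very unlikely under this null, so its empirical p-value is small, whereas a group linked in no species always has a p-value of 1.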
-
INTRODUCTION: During the independent process of cereal evolution, many trait shifts appear to have been under convergent selection to meet the specific needs of humans. Identification of convergently selected genes across cereals could help to clarify the evolution of crop species and to accelerate breeding programs. In the past several decades, researchers have debated whether convergent phenotypic selection in distinct lineages is driven by conserved molecular changes or by diverse molecular pathways. Two of the most economically important crops, maize and rice, display some conserved phenotypic shifts, including loss of seed dispersal, decreased seed dormancy, and increased grain number during evolution, even though they experienced independent selection. Hence, maize and rice can serve as an excellent system for understanding the extent of convergent selection among cereals.
RATIONALE: Despite the identification of a few convergently selected genes, our understanding of the extent of molecular convergence on a genome-wide scale between maize and rice is very limited. To learn how often selection acts on orthologous genes, we investigated the functions and molecular evolution of the grain yield quantitative trait locus KRN2 in maize and its rice ortholog OsKRN2. We also identified convergently selected genes on a genome-wide scale in maize and rice, using two large datasets.
RESULTS: We identified a selected gene, KRN2 (kernel row number2), that differs between domesticated maize and its wild ancestor, teosinte. This gene underlies a major quantitative trait locus for kernel row number in maize. Selection in the noncoding upstream regions resulted in a reduction of KRN2 expression and an increased grain number through an increase in kernel rows. The rice ortholog, OsKRN2, also underwent selection and negatively regulates grain number via control of secondary panicle branches. These orthologs encode WD40 proteins and function synergistically with a gene of unknown function, DUF1644, which suggests that a conserved protein interaction controls grain number in maize and rice. Field tests show that knockout of KRN2 in maize or OsKRN2 in rice increased grain yield by ~10% and ~8%, respectively, with no apparent trade-off in other agronomic traits. This suggests potential applications of KRN2 and its orthologs for crop improvement. On a genome-wide scale, we identified a set of 490 orthologous genes that underwent convergent selection during maize and rice evolution, including KRN2/OsKRN2. We found that the convergently selected orthologous genes appear to be significantly enriched in two specific pathways in both maize and rice: starch and sucrose metabolism, and biosynthesis of cofactors. A deep analysis of convergently selected genes in the starch metabolic pathway indicates that the degree of genetic convergence via convergent selection is related to the conservation and complexity of the gene network for a given selection.
CONCLUSION: Our findings show that common phenotypic shifts during maize and rice evolution acting on conserved genes are driven at least in part by convergent selection, which in maize and rice likely occurred both during and after domestication. We provide evolutionary and functional evidence on the convergent selection of KRN2/OsKRN2 for grain number between maize and rice. We further found that a complete loss-of-function allele of KRN2/OsKRN2 increased grain yield without an apparent negative impact on other agronomic traits. Exploring the role of KRN2/OsKRN2 and other convergently selected genes across the cereals could provide new opportunities to enhance the production of other global crops.
Shared selected orthologous genes in maize and rice for convergent phenotypic shifts during domestication and improvement. By comparing 3163 selected genes in maize and 18,755 selected genes in rice, we identified 490 orthologous gene pairs, including KRN2 and its rice ortholog OsKRN2, as having been convergently selected. Knockout of KRN2 in maize or OsKRN2 in rice increased grain yield by increasing kernel rows and secondary panicle branches, respectively.

