Title: Predicting transcriptional responses to cold stress across plant species
Although genome-sequence assemblies are available for a growing number of plant species, gene-expression responses to stimuli have been cataloged for only a subset of these species. Many genes show altered transcription patterns in response to abiotic stresses. However, orthologous genes in related species often exhibit different responses to a given stress. Accordingly, data on the regulation of gene expression in one species are not reliable predictors of orthologous gene responses in a related species. Here, we trained a supervised classification model to identify genes that transcriptionally respond to cold stress. A model trained with only features calculated directly from genome assemblies exhibited only modest decreases in performance relative to models trained by using genomic, chromatin, and evolution/diversity features. Models trained with data from one species successfully predicted which genes would respond to cold stress in other related species. Cross-species predictions remained accurate when training was performed in cold-sensitive species and predictions were performed in cold-tolerant species and vice versa. Models trained with data on gene expression in multiple species provided at least equivalent performance to models trained and tested in a single species and outperformed single-species models in cross-species prediction. These results suggest that classifiers trained on stress data from well-studied species may suffice for predicting gene-expression patterns in related, less-studied species with sequenced genomes.
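The cross-species setup described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the sequences are synthetic, the planted motif `MOTIF` is hypothetical, 3-mer counts stand in for "features calculated directly from genome assemblies", and a nearest-centroid rule stands in for the supervised classifier, whose actual type and features are not stated here. The sketch trains on one synthetic "species" and evaluates on a second one with a different base composition.

```python
import random
from itertools import product

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}
MOTIF = "CCGAC"  # hypothetical motif planted in "responsive" genes (assumption)


def make_gene(rng, base_weights, responsive, length=200, copies=5):
    """Draw a random sequence; responsive genes get non-overlapping motif copies."""
    seq = rng.choices("ACGT", weights=base_weights, k=length)
    if responsive:
        for slot in rng.sample(range(length // 10), copies):
            start = slot * 10
            seq[start:start + len(MOTIF)] = MOTIF
    return "".join(seq)


def kmer_counts(seq):
    """Sequence-only features: counts of every overlapping 3-mer."""
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        counts[INDEX[seq[i:i + K]]] += 1
    return counts


def make_species(rng, base_weights, n_genes=200):
    """Return (features, label) pairs; label 1 = responsive, 0 = non-responsive."""
    data = []
    for i in range(n_genes):
        label = i % 2
        data.append((kmer_counts(make_gene(rng, base_weights, label == 1)), label))
    return data


def fit_centroids(data):
    """Stand-in classifier: one mean feature vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def accuracy(centroids, data):
    return sum(predict(centroids, x) == y for x, y in data) / len(data)


rng = random.Random(0)
species_a = make_species(rng, [0.25, 0.25, 0.25, 0.25])  # weights for A, C, G, T
species_b = make_species(rng, [0.28, 0.22, 0.22, 0.28])  # more AT-rich "species"
cross_acc = accuracy(fit_centroids(species_a), species_b)
print(f"train on A, test on B: accuracy = {cross_acc:.2f}")
```

Because the planted signal is strong and shared by both synthetic species, the model transfers well in this toy setting; real orthologs diverge in ways this sketch does not model.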
Award ID(s):
1845175
NSF-PAR ID:
10320744
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
118
Issue:
10
ISSN:
0027-8424
Sponsoring Org:
National Science Foundation
More Like this
  1. Transcription factors (TFs) play a central role in regulating molecular level responses of plants to external stresses such as water limiting conditions, but identification of such TFs in the genome remains a challenge. Here, we describe a network-based supervised machine learning framework that accurately predicts and ranks all TFs in the genome according to their potential association with drought tolerance. We show that top ranked regulators fall mainly into two ‘age’ groups; genes that appeared first in land plants and genes that emerged later in the Oryza clade. TFs predicted to be high in the ranking belong to specific gene families, have relatively simple intron/exon and protein structures, and functionally converge to regulate primary and secondary metabolism pathways. Repeated trials of nested cross-validation tests showed that models trained only on regulatory network patterns, inferred from large transcriptome datasets, outperform models trained on heterogenous genomic features in the prediction of known drought response regulators. A new R/Shiny based web application, called the DroughtApp, provides a primer for generation of new testable hypotheses related to regulation of drought stress response. Furthermore, to test the system we experimentally validated predictions on the functional role of the rice transcription factor OsbHLH148, using RNA sequencing of knockout mutants in response to drought stress and protein-DNA interaction assays. Our study exemplifies the integration of domain knowledge for prioritization of regulatory genes in biological pathways of well-studied agricultural traits.
  2. Changes in gene expression are important for responses to abiotic stress. Transcriptome profiling of heat- or cold-stressed maize genotypes identifies many changes in transcript abundance. We used comparisons of expression responses in multiple genotypes to identify alleles with variable responses to heat or cold stress and to distinguish examples of cis- or trans-regulatory variation for stress-responsive expression changes. We used motifs enriched near the transcription start sites (TSSs) for thermal stress-responsive genes to develop predictive models of gene expression responses. Prediction accuracies can be improved by focusing only on motifs within unmethylated regions near the TSS and vary for genes with different dynamic responses to stress. Models trained on expression responses in a single genotype and promoter sequences provided lower performance when applied to other genotypes but this could be improved by using models trained on data from all three genotypes tested. The analysis of genes with cis-regulatory variation provides evidence for structural variants that result in presence/absence of transcription factor binding sites in creating variable responses. This study provides insights into cis-regulatory motifs for heat- and cold-responsive gene expression and defines a framework for developing models to predict expression responses across multiple genotypes.
  3. Like animals, plants have internal biological clocks that allow them to adapt to daily and yearly changes, such as day-night cycles or seasons turning. Unlike animals, however, plants cannot move when their environment becomes different, so they need to be able to weather these changes by adjusting which genes they switch on and off. To do this, plants keep track of how long days are using external cues such as light or temperature. One of the effects of climate change is that these cues become less reliable, making it harder for plants to adapt to their environment and survive. This is a potential problem for crop species, like Brassica rapa. This plant has many edible forms, including Chinese cabbage, oilseed, pak choi, and turnip. It is also a close relative of the well-studied model plant, Arabidopsis. Since evolving away from Arabidopsis, the genome of B. rapa tripled, meaning it has one, two, or three copies of each gene. This has allowed the extra gene copies to mutate and adapt to different purposes. The question is, what impact has this genome expansion had on the plant's biological clock? One way to find out is to perform RNA-sequencing experiments, which record the genes a plant is using at any one time. Here, Greenham, Sartor et al. report the results of a series of RNA-sequencing experiments performed every two hours across two days. Plants were first exposed to light-dark or temperature cycles and then samples were taken when the plants were in constant light and temperature. This revealed which genes B. rapa turned on and off in response to signals from the internal biological clock. It turns out that the biological clock of B. rapa controls close to three quarters of its genes. These genes showed distinct phases, increasing or decreasing in regular patterns. But the different copies of duplicated and triplicated genes did not necessarily all behave in the same way.
Many of the copies had different rhythms, and some increased and decreased in patterns totally opposite to their counterparts. Not only did the daily patterns differ, but responses to stressors like drought were also altered. Comparing these patterns to the patterns seen in Arabidopsis revealed that often, one B. rapa gene behaved just like its Arabidopsis equivalent, while its copies had evolved new behaviors. The different behaviors of the copies of each gene in B. rapa relative to its biological clock allow this plant to grow in different environments with varying temperatures and day lengths. Understanding how these adaptations work opens new avenues of research into how plants detect and respond to environmental signals. This could help to guide future work into targeting genes to improve crop growth and stress resilience.
  4. Wild cotton species can contribute to a valuable gene pool for genetic improvement, such as genes related to salt tolerance. However, reproductive isolation of different species poses an obstacle to producing hybrids through conventional breeding. Protoplast fusion technology for somatic cell hybridization provides an opportunity for genetic manipulation and targeting of agronomic traits. Transcriptome sequencing analysis of callus under salt stress is conducive to the study of salt tolerance genes. In this study, calli were induced to provide materials for extracting protoplasts and also for screening salt tolerance genes. Calli were successfully induced from leaves of Gossypium sturtianum (C1 genome) and hypocotyls of G. raimondii (D5 genome), and embryogenic calli of G. sturtianum and G. raimondii were induced on a differentiation medium with different concentrations of 2,4-D, KT, and IBA, respectively. In addition, embryogenic calli were also induced successfully from G. raimondii through suspension cultivation. Transcriptome sequencing analysis was performed on the calli of G. raimondii and G. sturtianum, which were treated with 200 mM NaCl at 0, 6, 12, 24, and 48 h, and a total of 12,524 genes were detected with different expression patterns under salt stress. Functional analysis showed that 3,482 genes, which were differentially expressed in calli of G. raimondii and G. sturtianum, were associated with biological processes of nucleic acid binding, plant hormone (such as ABA) biosynthesis, and signal transduction. We demonstrated that DEGs or TFs related to ABA metabolism were involved in the response to salt stress, including xanthoxin dehydrogenase genes (ABA2), sucrose non-fermenting 1-related protein kinases (SnRK2), NAM, ATAF1/2, and CUC2 transcription factors (NAC), and the WRKY class of zinc-finger proteins (WRKY).
This research has successfully induced calli from two diploid cotton species and revealed new genes responding to salt stress in callus tissue, which will lay the foundation for protoplast fusion for further understanding of salt stress responses in cotton.
  5. Plants respond to their environment by dynamically modulating gene expression. A powerful approach for understanding how these responses are regulated is to integrate information about cis-regulatory elements (CREs) into models called cis-regulatory codes. Transcriptional response to combined stress is typically not the sum of the responses to the individual stresses. However, cis-regulatory codes underlying combined stress response have not been established. Here we modeled transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana. We grouped genes by their pattern of response (independent, antagonistic and synergistic) and trained machine learning models to predict their response using putative CREs (pCREs) as features (median F-measure = 0.64). We then developed a deep learning approach to integrate additional omics information (sequence conservation, chromatin accessibility and histone modification) into our models, improving performance by 6.2%. While pCREs important for predicting independent and antagonistic responses tended to resemble binding motifs of transcription factors associated with heat and/or drought stress, important synergistic pCREs resembled binding motifs of transcription factors not known to be associated with stress. These findings demonstrate how in silico approaches can improve our understanding of the complex codes regulating response to combined stress and help us identify prime targets for future characterization.