Title: Expression‐based machine learning models for predicting plant tissue identity
Abstract
Premise: The selection of Arabidopsis as a model organism played a pivotal role in advancing genomic science. The competing frameworks, which would have selected an agricultural‐ or ecological‐based model species, were rejected in favor of building knowledge in a species that would facilitate genome‐enabled research.
Methods: Here, we examine the ability of models based on Arabidopsis gene expression data to predict tissue identity in other flowering plants. Across the machine learning algorithms compared, models trained and tested on Arabidopsis data achieved near‐perfect precision and recall, whereas models trained on Arabidopsis data and used to predict tissue identity across the flowering plants achieved precision values of 0.69 to 0.74 and recall values of 0.54 to 0.64.
Results: The identity of belowground tissue can be predicted more accurately than that of other tissue types, and the ability to predict tissue identity is not correlated with phylogenetic distance from Arabidopsis. The k‐nearest neighbors algorithm is the most successful, suggesting that gene expression signatures, rather than marker genes, are more valuable for building models of tissue and cell type prediction in plants.
Discussion: Our data‐driven results show that the assertion that knowledge from Arabidopsis is translatable to other plants does not always hold. Given the current abundance of sequencing data, we should reevaluate the scientific emphasis on Arabidopsis and prioritize plant diversity.
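The abstract reports that k‐nearest neighbors, which labels a sample by comparing its whole expression profile with labeled reference profiles, performed best, and it summarizes performance as precision and recall. The Python sketch below illustrates that general setup under stated assumptions; it is not the authors' pipeline. The log2(x + 1) transform, per‐sample z‐scoring, Euclidean distance, k = 3, macro‐averaged metrics, the function names, and the four‐gene toy profiles are all hypothetical choices made for illustration. It also assumes that samples from the second species have already been mapped onto the same ortholog‐matched genes, in the same order, as the reference species.

```python
import math
from collections import Counter


def log_normalize(profile):
    """log2(x + 1)-transform one expression profile, then z-score it across genes.

    Assumed preprocessing: z-scoring keeps the shape of the expression
    signature while discounting overall scale differences between samples.
    """
    logged = [math.log2(x + 1.0) for x in profile]
    if max(logged) == min(logged):
        # A flat profile carries no signature; avoid dividing by zero.
        return [0.0] * len(logged)
    mean = sum(logged) / len(logged)
    sd = math.sqrt(sum((v - mean) ** 2 for v in logged) / len(logged))
    return [(v - mean) / sd for v in logged]


def euclidean(a, b):
    # Assumes both profiles list the same ortholog-matched genes in the same order.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict a tissue label by majority vote among the k nearest training profiles."""
    neighbors = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common orders equal counts by first appearance, and neighbors are
    # sorted by distance, so a tie goes to the label with the closest member.
    return votes.most_common(1)[0][0]


def macro_precision_recall(y_true, y_pred):
    """Macro-averaged precision and recall over the tissue classes present in y_true."""
    labels = sorted(set(y_true))
    precisions, recalls = [], []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        predicted = sum(1 for p in y_pred if p == lab)
        actual = sum(1 for t in y_true if t == lab)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


if __name__ == "__main__":
    # Made-up counts for four hypothetical genes; not data from the study.
    reference = {
        "root": [[100, 5, 1, 2], [90, 8, 2, 1]],
        "leaf": [[2, 120, 5, 3], [3, 100, 4, 2]],
        "flower": [[1, 4, 80, 60], [2, 6, 70, 50]],
    }
    train_X, train_y = [], []
    for tissue, profiles in reference.items():
        for p in profiles:
            train_X.append(log_normalize(p))
            train_y.append(tissue)

    # Hypothetical samples from a second species, already mapped to the same genes.
    queries = [[80, 10, 3, 2], [5, 90, 10, 4], [3, 10, 60, 70]]
    truth = ["root", "leaf", "flower"]
    preds = [knn_predict(train_X, train_y, log_normalize(q), k=3) for q in queries]
    print(preds, macro_precision_recall(truth, preds))
```

Because each prediction uses the whole profile, no single marker gene has to be conserved for a tissue to be recognized; this is one possible reading of the abstract's conclusion about expression signatures versus marker genes, not a claim about the study's implementation.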
Award ID(s):
2310355
PAR ID:
10595899
Publisher / Repository:
Wiley
Journal Name:
Applications in Plant Sciences
Volume:
13
Issue:
1
ISSN:
2168-0450
Sponsoring Org:
National Science Foundation
More like this
  1. Mittelsten Scheid, Ortrun (Ed.)
    Heterochromatin is critical for maintaining genome stability, especially in flowering plants, where it relies on a feedback loop involving the H3K9 methyltransferase KRYPTONITE (KYP) and the DNA methyltransferase CHROMOMETHYLASE3 (CMT3). The H3K9 demethylase INCREASED IN BONSAI METHYLATION 1 (IBM1) counteracts the detrimental consequences of KYP-CMT3 activity in transcribed genes. IBM1 expression in Arabidopsis is uniquely regulated by methylation of its 7th intron, allowing it to monitor global H3K9me2 levels. We show that the methylated intron is prevalent across flowering plants and that its underlying sequence exhibits dynamic evolution. We also find extensive genetic and expression variation in KYP, CMT3, and IBM1 across flowering plants. We identify Arabidopsis accessions resembling weak ibm1 mutants and Brassicaceae species with reduced IBM1 expression or deletions. Evolution towards reduced IBM1 activity in some flowering plants could explain the frequent natural occurrence of diminished or lost CMT3 activity and loss of gene body DNA methylation, as cmt3 mutants in A. thaliana mitigate the deleterious effects of IBM1.
  2. Background: RNA secondary structure (RSS) can influence the regulation of transcription, RNA processing, and protein synthesis, among other processes. The 3′ untranslated regions (3′ UTRs) of mRNA are also key to many aspects of gene regulation. However, results regarding the roles of RSS in 3′ UTRs in gene expression are often contradictory across organisms and contexts. Results: Here, we incidentally observe that the primary substrate of miR159a (pri-miR159a), when embedded in a 3′ UTR, could promote mRNA accumulation. The enhanced expression is attributed to earlier polyadenylation of the transcript within the hybrid pri-miR159a-3′ UTR and, as a result, a poorly structured 3′ UTR. RNA decay assays indicate that poorly structured 3′ UTRs could promote mRNA stability, whereas highly structured 3′ UTRs destabilize mRNA in vivo. Genome-wide DMS-MaPseq also reveals a prevailing inverse relationship between 3′ UTR RSS and transcript accumulation in the transcriptomes of Arabidopsis, rice, and even humans. Mechanistically, transcripts with highly structured 3′ UTRs are preferentially degraded by the 3′–5′ exoribonuclease SOV and the 5′–3′ exoribonuclease XRN4, leading to decreased expression in Arabidopsis. Finally, we engineer 3′ UTRs with different structures into an endogenous FT gene and alter FT-regulated flowering time in Arabidopsis. Conclusions: We conclude that highly structured 3′ UTRs typically reduce accumulation of the transcripts that harbor them in Arabidopsis, a pattern that extends to rice and even mammals. Furthermore, our study provides a new strategy of engineering 3′ UTR RSS to modify plant traits in agricultural production and mRNA stability in biotechnology.
  3. Dubrovsky, Joseph (Ed.)
    A fundamental question in developmental biology is how the progeny of stem cells become differentiated tissues. The Arabidopsis root is a tractable model to address this question due to its simple organization and defined cell lineages. In particular, the zone of dividing cells at the root tip, the root apical meristem, presents an opportunity to map the gene regulatory networks underlying stem cell niche maintenance, tissue patterning, and cell identity acquisition. To identify molecular regulators of these processes, studies over the last 20 years have employed global profiling of gene expression patterns. However, these technologies are prone to information loss because they average gene expression signatures over multiple cell types and/or developmental stages. Recently developed high-throughput methods to profile gene expression at single-cell resolution have been successfully applied to plants. Here, we review insights from the first published single-cell mRNA sequencing and chromatin accessibility datasets generated from Arabidopsis roots. These studies successfully reconstruct developmental trajectories, phenotype cell identity mutants at unprecedented resolution, and reveal cell type-specific responses to environmental stimuli. The experimental insight gained from Arabidopsis paves the way for profiling roots from additional species.
  4. Background: Virus infection and herbivory can alter the expression of stress-responsive genes in plants. This study employed high-throughput transcriptomic and alternative splicing analysis to understand the separate and combined impacts of Myzus persicae (green peach aphid) and turnip mosaic virus (TuMV) on host gene expression in Arabidopsis thaliana. Results: Changes in transcript abundance show that aphids feeding on virus-infected plants increase the number of differentially expressed stress-responsive genes compared with challenge by either stressor alone. This study presents evidence that the combined virus-vector-host interaction induces significant changes in hormone and secondary metabolite biosynthesis, as well as in downstream factors involved in feedback loops within hormone signaling pathways. This study also shows that gene expression is regulated through alternative pre-mRNA splicing and the use of alternative transcription start and termination sites. Conclusions: These combined data suggest that complex genetic changes occur as plants adapt to the combined challenges posed by aphids and the viruses they vector. This study also provides more advanced analyses that could be used in the future to dissect the genetic mechanisms mediating tripartite interactions and to inform future breeding programs.
  5. In Arabidopsis, seasonal changes in spring induce flowering through expression of the florigen FLOWERING LOCUS T (FT). FT is expressed in unique phloem companion cells whose characteristics are unknown. Which genes are co-expressed with FT, and whether they have roles in flowering, remains unclear. Through tissue-specific translatome analysis, we discovered that under long-day conditions with the natural sunlight red/far-red ratio, the FT-producing cells express a gene encoding FPF1-LIKE PROTEIN 1 (FLP1). The master FT regulator CONSTANS (CO) controls FLP1 expression, suggesting FLP1's involvement in the photoperiod pathway. FLP1 promotes early flowering independently of FT, is active in the shoot apical meristem, and induces the expression of SEPALLATA 3 (SEP3), a key E-class homeotic gene. Unlike FT, FLP1 facilitates inflorescence stem elongation. Our cumulative evidence indicates that FLP1 may act as a mobile signal. Thus, FLP1 orchestrates floral initiation together with FT and promotes inflorescence stem elongation during reproductive transitions.