This content will become publicly available on January 1, 2026

Title: Expression‐based machine learning models for predicting plant tissue identity
Abstract Premise: The selection of Arabidopsis as a model organism played a pivotal role in advancing genomic science. The competing frameworks to select an agricultural‐ or ecological‐based model species were rejected, in favor of building knowledge in a species that would facilitate genome‐enabled research. Methods: Here, we examine the ability of models based on Arabidopsis gene expression data to predict tissue identity in other flowering plants. Comparing different machine learning algorithms, models trained and tested on Arabidopsis data achieved near perfect precision and recall values, whereas when tissue identity is predicted across the flowering plants using models trained on Arabidopsis data, precision values range from 0.69 to 0.74 and recall from 0.54 to 0.64. Results: The identity of belowground tissue can be predicted more accurately than other tissue types, and the ability to predict tissue identity is not correlated with phylogenetic distance from Arabidopsis. k‐nearest neighbors is the most successful algorithm, suggesting that gene expression signatures, rather than marker genes, are more valuable to create models for tissue and cell type prediction in plants. Discussion: Our data‐driven results highlight that the assertion that knowledge from Arabidopsis is translatable to other plants is not always true. Considering the current landscape of abundant sequencing data, we should reevaluate the scientific emphasis on Arabidopsis and prioritize plant diversity.
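The evaluation described in the abstract (train on one species, predict tissue identity in others, report precision and recall) can be illustrated with a minimal sketch. This is not the authors' pipeline: the three-gene expression profiles are invented toy values, and the choice of Euclidean distance, k = 3, and the function names `knn_predict` and `precision_recall` are assumptions made only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training profiles."""
    # Euclidean distance is an assumption; the abstract does not name a metric.
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def precision_recall(y_true, y_pred, label):
    """Per-class precision and recall computed from paired label lists."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy "reference species" training set: invented expression values for 3 genes.
train_X = [
    [9, 1, 1], [8, 2, 1], [9, 2, 2],   # root
    [1, 9, 2], [2, 8, 1], [1, 8, 2],   # leaf
    [1, 2, 9], [2, 1, 8], [2, 2, 9],   # flower
]
train_y = ["root"] * 3 + ["leaf"] * 3 + ["flower"] * 3

# Toy "other species" test set; the fourth sample is a leaf whose profile
# resembles the reference flower signature, so it is expected to be misassigned.
test_X = [[8, 1, 2], [7, 3, 1], [2, 9, 1], [2, 3, 8], [1, 1, 9], [2, 2, 8]]
y_true = ["root", "root", "leaf", "leaf", "flower", "flower"]

y_pred = [knn_predict(train_X, train_y, x) for x in test_X]

for tissue in ("root", "leaf", "flower"):
    p, r = precision_recall(y_true, y_pred, tissue)
    print(f"{tissue}: precision={p:.2f} recall={r:.2f}")
```

In this toy case a single misassigned leaf sample lowers recall for leaf and precision for flower, which is the kind of cross-species drop the abstract reports in aggregate.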
Award ID(s):
2310355
PAR ID:
10595899
Author(s) / Creator(s):
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Applications in Plant Sciences
Volume:
13
Issue:
1
ISSN:
2168-0450
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mittelsten Scheid, Ortrun (Ed.)
    Heterochromatin is critical for maintaining genome stability, especially in flowering plants, where it relies on a feedback loop involving the H3K9 methyltransferase, KRYPTONITE (KYP), and the DNA methyltransferase CHROMOMETHYLASE3 (CMT3). The H3K9 demethylase INCREASED IN BONSAI METHYLATION 1 (IBM1) counteracts the detrimental consequences of KYP-CMT3 activity in transcribed genes. IBM1 expression in Arabidopsis is uniquely regulated by methylation of the 7th intron, allowing it to monitor global H3K9me2 levels. We show the methylated intron is prevalent across flowering plants and its underlying sequence exhibits dynamic evolution. We also find extensive genetic and expression variations in KYP, CMT3, and IBM1 across flowering plants. We identify Arabidopsis accessions resembling weak ibm1 mutants and Brassicaceae species with reduced IBM1 expression or deletions. Evolution towards reduced IBM1 activity in some flowering plants could explain the frequent natural occurrence of diminished or lost CMT3 activity and loss of gene body DNA methylation, as cmt3 mutants in A. thaliana mitigate the deleterious effects of IBM1.
  2. Abstract Background: In the past few years, there has been an explosion in single-cell transcriptomics datasets, yet in vivo confirmation of these datasets is hampered in plants due to lack of robust validation methods. Likewise, modeling of plant development is hampered by paucity of spatial gene expression data. RNA fluorescence in situ hybridization (FISH) enables investigation of gene expression in the context of tissue type. Despite development of FISH methods for plants, easy and reliable whole mount FISH protocols have not yet been reported. Results: We adapt a 3-day whole mount RNA-FISH method for plant species based on a combination of prior protocols that employs hybridization chain reaction (HCR), which amplifies the probe signal in an antibody-free manner. Our whole mount HCR RNA-FISH method shows expected spatial signals with low background for gene transcripts with known spatial expression patterns in Arabidopsis inflorescences and monocot roots. It allows simultaneous detection of three transcripts in 3D. We also show that HCR RNA-FISH can be combined with endogenous fluorescent protein detection and with our improved immunohistochemistry (IHC) protocol. Conclusions: The whole mount HCR RNA-FISH and IHC methods allow easy investigation of 3D spatial gene expression patterns in entire plant tissues.
  3. Abstract The scarcity of accessible sites that are dynamic or cell type-specific in plants may be due in part to tissue heterogeneity in bulk studies. To assess the effects of tissue heterogeneity, we apply single-cell ATAC-seq to Arabidopsis thaliana roots and identify thousands of differentially accessible sites, sufficient to resolve all major cell types of the root. We find that the entirety of a cell’s regulatory landscape and its transcriptome independently capture cell type identity. We leverage this shared information on cell identity to integrate accessibility and transcriptome data to characterize developmental progression, endoreduplication and cell division. We further use the combined data to characterize cell type-specific motif enrichments of transcription factor families and link the expression of family members to changing accessibility at specific loci, resolving direct and indirect effects that shape expression. Our approach provides an analytical framework to infer the gene regulatory networks that execute plant development.
  4. Dubrovsky, Joseph (Ed.)
    Abstract A fundamental question in developmental biology is how the progeny of stem cells become differentiated tissues. The Arabidopsis root is a tractable model to address this question due to its simple organization and defined cell lineages. In particular, the zone of dividing cells at the root tip—the root apical meristem—presents an opportunity to map the gene regulatory networks underlying stem cell niche maintenance, tissue patterning, and cell identity acquisition. To identify molecular regulators of these processes, studies over the last 20 years employed global profiling of gene expression patterns. However, these technologies are prone to information loss due to averaging gene expression signatures over multiple cell types and/or developmental stages. Recently developed high-throughput methods to profile gene expression at single-cell resolution have been successfully applied to plants. Here, we review insights from the first published single-cell mRNA sequencing and chromatin accessibility datasets generated from Arabidopsis roots. These studies successfully reconstruct developmental trajectories, phenotype cell identity mutants at unprecedented resolution, and reveal cell type-specific responses to environmental stimuli. The experimental insight gained from Arabidopsis paves the way to profile roots from additional species. 
  5. Abstract Background: RNA secondary structure (RSS) can influence the regulation of transcription, RNA processing, and protein synthesis, among other processes. 3′ untranslated regions (3′ UTRs) of mRNA also hold the key for many aspects of gene regulation. However, there are often contradictory results regarding the roles of RSS in 3′ UTRs in gene expression in different organisms and/or contexts. Results: Here, we incidentally observe that the primary substrate of miR159a (pri-miR159a), when embedded in a 3′ UTR, could promote mRNA accumulation. The enhanced expression is attributed to the earlier polyadenylation of the transcript within the hybrid pri-miR159a-3′ UTR and, resultantly, a poorly structured 3′ UTR. RNA decay assays indicate that poorly structured 3′ UTRs could promote mRNA stability, whereas highly structured 3′ UTRs destabilize mRNA in vivo. Genome-wide DMS-MaPseq also reveals the prevailing inverse relationship between 3′ UTRs’ RSS and transcript accumulation in the transcriptomes of Arabidopsis, rice, and even human. Mechanistically, transcripts with highly structured 3′ UTRs are preferentially degraded by 3′–5′ exoribonuclease SOV and 5′–3′ exoribonuclease XRN4, leading to decreased expression in Arabidopsis. Finally, we engineer different structured 3′ UTRs to an endogenous FT gene and alter the FT-regulated flowering time in Arabidopsis. Conclusions: We conclude that highly structured 3′ UTRs typically cause reduced accumulation of the harbored transcripts in Arabidopsis. This pattern extends to rice and even mammals. Furthermore, our study provides a new strategy of engineering the 3′ UTRs’ RSS to modify plant traits in agricultural production and mRNA stability in biotechnology.