Title: A normalization method that controls for total RNA abundance affects the identification of differentially expressed genes, revealing bias toward morning‐expressed responses
SUMMARY RNA sequencing (RNA‐Seq) is widely used to investigate changes in gene expression at the transcriptional level in plants. Most plant RNA‐Seq analysis pipelines base their normalization approaches on the assumption that total transcript levels do not vary between samples. However, this assumption has not been demonstrated. In fact, many common experimental treatments and genetic alterations affect transcription efficiency or RNA stability, resulting in unequal total transcript abundance. Adding synthetic RNA controls (spike‐ins) is a simple correction that controls for variation in total mRNA levels. However, adding spike‐ins appropriately is challenging with complex plant tissue, and careful consideration of how they are added is essential to their successful use. We demonstrate that using external RNA spike‐ins as a normalization control produces differences in RNA‐Seq analysis compared with traditional normalization methods, even between two times of day in untreated plants. We illustrate the use of RNA spike‐ins with 3' RNA‐Seq and present a normalization pipeline that accounts for differences in total transcriptional levels. We evaluate the effect of normalization methods on the identification of differentially expressed genes, both for the effect of time of day on gene expression and for the response to chilling stress in sorghum.
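The core contrast in the summary, between normalizing to total reads and normalizing to spike-ins, can be illustrated with a toy calculation. This is a minimal sketch, not the authors' pipeline. It assumes raw read counts for two samples (labelled hypothetically as morning and evening), assumes the same amount of synthetic spike-in RNA was added per unit of starting material in each sample, and uses invented counts in which every endogenous transcript is truly halved in the second sample. The function names are hypothetical.

```python
import math

# Hypothetical toy counts for two samples (e.g. morning vs. evening).
# In sample 2 every endogenous transcript is truly at half its sample-1 level,
# so the fixed spike-in takes up a larger fraction of the reads.
endogenous = {
    "gene1": [120, 100],
    "gene2": [360, 300],
}
spike_in = [120, 200]  # reads from the same amount of spike-in RNA added to each sample


def total_count_normalize(counts):
    """Scale by each sample's total endogenous reads (assumes equal total mRNA)."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {g: [row[j] / totals[j] for j in range(n_samples)] for g, row in counts.items()}


def spike_in_normalize(counts, spikes):
    """Scale by spike-in reads, so a global change in mRNA level is preserved."""
    return {g: [row[j] / spikes[j] for j in range(len(spikes))] for g, row in counts.items()}


def log2_fold_change(values):
    """log2(sample 2 / sample 1) for a two-sample list of normalized values."""
    return math.log2(values[1] / values[0])


for name, norm in [("total-count", total_count_normalize(endogenous)),
                   ("spike-in", spike_in_normalize(endogenous, spike_in))]:
    for gene, vals in norm.items():
        print(name, gene, round(log2_fold_change(vals), 3))
```

With these numbers, total-count scaling reports a log2 fold change of 0.0 for both genes, hiding the global drop, while spike-in scaling reports -1.0. In practice a size factor would usually be estimated from many spike-in transcripts rather than from a single summed count; that refinement is omitted here.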
Award ID(s):
2210293
PAR ID:
10529317
Publisher / Repository:
The Plant Journal
Journal Name:
The Plant Journal
Volume:
118
Issue:
5
ISSN:
0960-7412
Page Range / eLocation ID:
1241 to 1257
Sponsoring Org:
National Science Foundation
More like this
  1. SUMMARY The application of high‐throughput sequencing to cellular transcriptome profiling (RNA‐seq) has enabled significant advances in our understanding of gene expression in plants. However, conventional RNA‐seq data mainly report cytoplasmic transcript abundance rather than actual transcription rates. As a result, conventional RNA‐seq is less sensitive for detecting unstable and low‐abundance nuclear RNA species, such as long non‐coding RNAs, and is less directly connected to chromatin features and processes such as DNA replication. To bridge this gap, several protocols have been established to profile newly synthesized RNA in plants and other eukaryotes. These protocols can be technically challenging and present their own difficulties and limitations. Here we analyze newly synthesized nuclear RNA metabolically labeled in vivo with 5‐ethynyl uridine (EU‐nuclear RNA) in maize (Zea mays L.) root tips and compare it with the entire nuclear RNA population. We also compare both nuclear RNA preparations with conventional RNA‐seq analysis of cellular RNA. The transcript abundance profiles of protein‐coding genes in nuclear RNA and EU‐nuclear RNA were tightly correlated with each other (R² = 0.767) but quite distinct from those of cellular RNA (R² = 0.170 or 0.293). Nuclear and EU‐nuclear RNA reads frequently map across entire genes, including introns, whereas cellular reads map predominantly to mature transcripts. Both nuclear and EU‐nuclear RNA showed a greater ability than cellular RNA to detect expressed protein‐coding and non‐coding genes.
  2. Abstract. Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures, including alternative splicing and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, including over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq, with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature for distinguishing correct splice junctions and removing false ones. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts resulting from degradation. AtRTD3 is a major improvement over existing transcriptomes, as demonstrated by analysis of an Arabidopsis cold-response RNA-seq time series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential usage of transcription start and polyadenylation sites. Conclusions: AtRTD3 is currently the most comprehensive Arabidopsis transcriptome. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis in any species.
  3. Abstract The transcriptional plasticity of cancer cells promotes intercellular heterogeneity in response to anticancer drugs and facilitates the generation of surviving cell subpopulations. Characterizing single-cell transcriptional heterogeneity after drug treatment can provide mechanistic insights into drug efficacy. Here, we used single-cell RNA-seq to examine the transcriptomic profiles of cancer cells treated with paclitaxel, celecoxib, or the combination of the two drugs. By normalizing the expression of endogenous genes to spike-in molecules, we found that cellular mRNA abundance is dynamically regulated after drug treatment. Using a random forest model, we identified gene signatures that classify single cells into three states: transcriptional repression, transcriptional amplification, and control-like. Treatment with paclitaxel or celecoxib alone generally repressed gene transcription across single cells. Interestingly, the drug combination resulted in transcriptional amplification and hyperactivation of the mitochondrial oxidative phosphorylation pathway, linked to enhanced cell-killing efficiency. Finally, we identified a regulatory module enriched in metabolism- and inflammation-related genes that was activated in a subpopulation of paclitaxel-treated cells; its expression predicted paclitaxel efficacy across cancer cell lines and in patient samples in vivo. Our study highlights the dynamic global transcriptional activity that drives single-cell heterogeneity during drug response and emphasizes the importance of adding spike-in molecules when studying gene expression regulation with single-cell RNA-seq.
  4. Mitochondrial and plastid functions depend on coordinated expression of proteins encoded by genomic compartments with radically different copy numbers (organellar versus nuclear genomes). In polyploids, doubling of the nuclear genome may add challenges to maintaining balanced expression of proteins involved in cytonuclear interactions. Here, we use ribo-depleted RNA sequencing (RNA-seq) to analyze transcript abundance for nuclear and organellar genomes in leaf tissue from four different polyploid angiosperms and their close diploid relatives. We find that even though plastid genomes contain <1% of the number of genes in the nuclear genome, they generate the majority (69.9 to 82.3%) of messenger RNA (mRNA) transcripts in the cell. Mitochondrial genes account for a much smaller percentage (1.3 to 3.7%) of the leaf mRNA pool but still produce much higher transcript abundances per gene than nuclear genes. Nuclear genes encoding proteins that functionally interact with mitochondrial or plastid gene products exhibit mRNA expression levels that are consistently more than 10-fold lower than those of their organellar counterparts, indicating an extreme cytonuclear imbalance at the RNA level despite the predominance of equimolar interactions at the protein level. Nevertheless, interacting nuclear and organellar genes show strongly correlated transcript abundances across functional categories, suggesting that the observed mRNA stoichiometric imbalance does not preclude coordination of cytonuclear expression. Finally, we show that nuclear genome doubling does not alter the cytonuclear expression ratios observed in diploid relatives in consistent or systematic ways, indicating that successful polyploid plants are able to compensate for cytonuclear perturbations associated with nuclear genome doubling.
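The last abstract in this list contrasts two different quantities: a genomic compartment's share of the total mRNA pool and its transcript abundance per gene. A minimal sketch of that arithmetic follows. The read counts and gene names are invented, the `compartment_summary` helper is hypothetical, and gene-length and library-size corrections that a real analysis would need are deliberately ignored.

```python
from collections import defaultdict

# Hypothetical toy data: gene -> (genomic compartment, mRNA read count).
# Gene length and library-size corrections are deliberately ignored.
gene_counts = {
    "nuc_1": ("nuclear", 10),
    "nuc_2": ("nuclear", 20),
    "nuc_3": ("nuclear", 30),
    "nuc_4": ("nuclear", 40),
    "pt_1": ("plastid", 300),
    "pt_2": ("plastid", 500),
    "mt_1": ("mitochondrial", 100),
}


def compartment_summary(counts):
    """Return each compartment's share of all reads and its mean reads per gene."""
    reads = defaultdict(int)
    genes = defaultdict(int)
    for compartment, n_reads in counts.values():
        reads[compartment] += n_reads
        genes[compartment] += 1
    total = sum(reads.values())
    return {
        c: {"share_of_pool": reads[c] / total, "reads_per_gene": reads[c] / genes[c]}
        for c in reads
    }


for compartment, stats in compartment_summary(gene_counts).items():
    print(f"{compartment}: {stats['share_of_pool']:.0%} of reads, "
          f"{stats['reads_per_gene']:.0f} reads per gene")
```

In this toy example the two plastid genes (2 of 7 genes) account for 80% of reads, and the single mitochondrial gene has only a 10% share of the pool yet four times the per-gene abundance of the average nuclear gene. This mirrors the distinction the abstract draws, without reproducing its data.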