Title: Efficient and cost-effective bacterial mRNA sequencing from low input samples through ribosomal RNA depletion
Abstract
Background: RNA sequencing is a powerful approach to quantify the genome-wide distribution of mRNA molecules in a population and thereby gain a deeper understanding of cellular functions and phenotypes. However, mRNA sequencing is more challenging for bacterial samples than for eukaryotic cells, because bacterial mRNAs lack the poly(A) tail that typically enables efficient capture and enrichment of mRNA over the far more abundant rRNA molecules in a cell. Moreover, bacterial cells often contain 100-fold less RNA than mammalian cells, which further complicates mRNA sequencing of non-cultivable and non-model bacterial species. To overcome these limitations, we report EMBR-seq (Enrichment of mRNA by Blocked rRNA), a method that efficiently depletes 5S, 16S and 23S rRNA by using blocking primers to prevent their amplification.
Results: With EMBR-seq, 90% of the sequenced RNA molecules from an E. coli culture derive from mRNA. We demonstrate that this increased efficiency provides a deeper view of the transcriptome without introducing technical amplification-induced biases. Moreover, whereas recent methods employ a large array of oligonucleotides to deplete rRNA, EMBR-seq uses one or a few oligonucleotides per rRNA, making it significantly more cost-effective, especially when applied to diverse bacterial species. Finally, compared with existing commercial kits for bacterial rRNA depletion, we show that EMBR-seq can successfully quantify the transcriptome from more than 500-fold less starting total RNA.
Conclusions: EMBR-seq provides an efficient and cost-effective approach to quantify global gene expression profiles from low-input bacterial samples.
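The headline efficiency metric in the abstract (the share of sequenced molecules that derive from mRNA) is a simple ratio over read counts grouped by RNA class. The sketch below is a minimal illustration, assuming reads have already been aligned and counted per feature with a biotype label; the input format, the helper name `biotype_fractions`, and the toy counts are invented for illustration and are not the authors' pipeline or data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def biotype_fractions(counts: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """Return the fraction of assigned reads that falls in each biotype.

    `counts` holds (feature_id, biotype, read_count) tuples, e.g. parsed
    from a gene-level count table annotated with biotypes (hypothetical format).
    """
    totals: Counter = Counter()
    for _feature, biotype, n in counts:
        totals[biotype] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no reads assigned to any feature")
    return {biotype: n / grand_total for biotype, n in totals.items()}


# Toy counts, invented for illustration (not data from the study).
toy = [
    ("rrsA", "rRNA", 50),  # 16S rRNA gene
    ("rrlA", "rRNA", 30),  # 23S rRNA gene
    ("rrfA", "rRNA", 20),  # 5S rRNA gene
    ("lacZ", "mRNA", 400),
    ("ompA", "mRNA", 500),
]
frac = biotype_fractions(toy)
print(f"mRNA fraction: {frac['mRNA']:.2f}")  # mRNA fraction: 0.90
```

Grouping by biotype rather than by individual rRNA gene keeps the metric independent of how many rRNA operons a given species carries, which matters when comparing depletion across different bacteria.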
Award ID(s):
1725797
PAR ID:
10306299
Publisher / Repository:
Springer Science + Business Media
Journal Name:
BMC Genomics
Volume:
21
Issue:
1
ISSN:
1471-2164
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The ability to profile transcriptomes and characterize global gene expression changes has been greatly enabled by the development of RNA sequencing technologies (RNA-seq). However, generating sequencing-compatible cDNA libraries from RNA samples can be time-consuming and expensive, especially for bacterial mRNAs, which lack the poly(A) tails often used to streamline this process for eukaryotic samples. Compared with the increasing throughput and decreasing cost of sequencing, library preparation has seen limited advances. Here, we describe bacterial-multiplexed-seq (BaM-seq), an approach that enables simple barcoding of many bacterial RNA samples and decreases the time and cost of library preparation. We also present targeted-bacterial-multiplexed-seq (TBaM-seq), which enables differential expression analysis of specific gene panels with over 100-fold enrichment in read coverage. In addition, we introduce the concept of transcriptome redistribution based on TBaM-seq, which dramatically reduces the required sequencing depth while still allowing quantification of both highly and lowly abundant transcripts. These methods accurately measure gene expression changes with high technical reproducibility and agree with gold-standard, lower-throughput approaches. Together, these library preparation protocols allow fast, affordable generation of sequencing libraries.
  2. Abstract Motivation: Accurate estimation of transcript isoform abundance is critical for downstream transcriptome analyses and can reveal precise molecular mechanisms underlying complex human diseases such as cancer. Isoform quantification approaches based on simplex mRNA sequencing (RNA-Seq) face the challenges of inherent sampling bias and unidentifiable read origins. A large-scale experiment showed that the consistency between RNA-Seq and other mRNA quantification platforms is relatively low at the isoform level compared with the gene level. In this project, we developed a platform-integrated model for transcript quantification (IntMTQ) to improve the performance of RNA-Seq in isoform expression estimation. IntMTQ, which benefits from the mRNA expression levels reported by other platforms, provides more precise RNA-Seq-based isoform quantification and leads to more accurate molecular signatures for disease phenotype prediction. Results: To assess the quality of isoform expression estimated by IntMTQ, we designed three tasks for clustering and classification of 46 cancer cell lines with four different mRNA quantification platforms, including the newly developed NanoString nCounter technology. The results demonstrate that the isoform expression levels learned by IntMTQ consistently provide more and better molecular features for downstream analyses than five baseline algorithms that consider RNA-Seq data only. An independent RT-qPCR experiment on seven genes in twelve cancer cell lines showed that IntMTQ improved overall transcript quantification. The platform-integrated algorithms could be applied to large-scale cancer studies in which both RNA-Seq and array-based platforms are available, such as The Cancer Genome Atlas (TCGA). Availability and implementation: Source code is available at https://github.com/CompbioLabUcf/IntMTQ. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Abstract Background: Low back pain is a leading cause of disability worldwide and is frequently attributed to intervertebral disc (IVD) degeneration. Although the contributions of the adjacent cartilage endplates (CEP) to IVD degeneration are well documented, the phenotype and functions of the resident CEP cells are critically understudied. To better characterize CEP cell phenotype and possible mechanisms of CEP degeneration, bulk and single-cell RNA sequencing of non-degenerated and degenerated CEP cells were performed. Methods: Human lumbar CEP cells from degenerated (Thompson grade ≥ 4) and non-degenerated (Thompson grade ≤ 2) discs were expanded for bulk (N = 4 non-degenerated, N = 4 degenerated) and single-cell (N = 1 non-degenerated, N = 1 degenerated) RNA sequencing. Genes identified from bulk RNA sequencing were categorized by function, and their expression was compared between non-degenerated and degenerated CEP cells. A PubMed literature review was also performed to determine which genes had previously been identified and studied in the CEP, IVD, and other cartilaginous tissues. For single-cell RNA sequencing, cell clusters were resolved using unsupervised clustering and functional annotation. Differential gene expression analysis and Gene Ontology enrichment analysis were used, respectively, to compare gene expression and functional enrichment between cell clusters, as well as between non-degenerated and degenerated CEP samples. Results: Bulk RNA sequencing revealed that 38 genes were significantly upregulated and 15 genes were significantly downregulated in degenerated CEP cells relative to non-degenerated cells (|fold change| ≥ 1.5). Of these, only 2 genes had previously been studied in CEP cells, and 31 had previously been studied in the IVD and other cartilaginous tissues. Single-cell RNA sequencing revealed 11 unique cell clusters, including multiple chondrocyte and progenitor subpopulations with distinct gene expression and functional profiles. Analysis of genes in the bulk RNA sequencing dataset showed that progenitor cell clusters from both samples were enriched in “non-degenerated” genes but not “degenerated” genes. In both bulk and single-cell analyses, gene expression and pathway enrichment analyses highlighted several pathways that may regulate CEP degeneration, including transcriptional regulation, translational regulation, intracellular transport, and mitochondrial dysfunction. Conclusions: This thorough RNA sequencing analysis highlighted numerous differences between non-degenerated and degenerated CEP cells, the phenotypic heterogeneity of CEP cells, and several pathways that may be relevant to CEP degeneration.
  4. Abstract Yeasts are naturally diverse, genetically tractable, and easy to grow, such that researchers can investigate any number of genotypes, environments, or interactions thereof. However, studies of yeast transcriptomes have been limited by the processing capabilities of traditional RNA sequencing techniques. Here we optimize a powerful, high‐throughput single‐cell RNA sequencing (scRNAseq) platform, SPLiT‐seq (Split Pool Ligation‐based Transcriptome sequencing), for yeasts and apply it to 43,388 cells of multiple species and ploidies. This platform uses a combinatorial barcoding strategy to enable massively parallel RNA sequencing of hundreds of yeast genotypes or growth conditions at once. The method can be applied to most species or strains of yeast for a fraction of the cost of traditional scRNAseq approaches. Thus, our technology lets researchers leverage “the awesome power of yeast” by allowing them to survey the transcriptomes of hundreds of strains and environments in a short period of time and with no specialized equipment. The key to this method is that sequential barcodes are probabilistically appended to cDNA copies of RNA while the molecules remain trapped inside each cell, so the transcriptome of each cell is labeled with a unique combination of barcodes. Because SPLiT‐seq uses the cell membrane as a container for this reaction, many cells can be processed together without physically isolating them from one another in separate wells or droplets. Further, the first barcode in the sequence can be chosen intentionally to identify samples from different environments or genetic backgrounds, enabling multiplexing of hundreds of unique perturbations in a single experiment. In addition to greater multiplexing capabilities, our method also facilitates a deeper investigation of biological heterogeneity, given its single‐cell nature. For example, in the data presented here, we detect transcriptionally distinct cell states related to cell cycle, ploidy, metabolic strategies, and so forth, all within clonal yeast populations grown in the same environment. Hence, our technology has two obvious and impactful applications for yeast research: the general study of transcriptional phenotypes across many strains and environments, and the investigation of cell‐to‐cell heterogeneity across the entire transcriptome.
  5. One important characteristic of single-cell RNA sequencing (scRNA-seq) data is its high sparsity: the gene-by-cell count matrix contains a high proportion of zeros. This sparsity has motivated widespread discussion of dropouts and missing data, as well as imputation algorithms for scRNA-seq analysis. Here, we investigate whether some genes are more prone to under-detection in scRNA-seq and, if so, what commonalities those genes share. From public data sources, we gathered paired bulk RNA-seq and scRNA-seq data from 53 human samples generated in diverse biological contexts. We derived pseudo-bulk gene expression by averaging the scRNA-seq data across cells. Comparisons of the paired bulk and pseudo-bulk gene expression profiles revealed a collection of genes that are frequently under-detected in scRNA-seq compared with bulk RNA-seq. This result was robust to randomization, in which unpaired bulk and pseudo-bulk gene expression profiles were compared. We performed a motif search on the last 350 bp of the identified genes and observed an enrichment of poly(T) motifs. As a mechanistic conjecture for why certain genes may be more prone to under-detection in scRNA-seq, we propose that poly(T) motifs near the ends of these genes may form hairpin structures with the poly(A) tails of their mRNA transcripts, making those transcripts difficult to capture during scRNA-seq library preparation.
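The pseudo-bulk comparison and tail motif scan described in item 5 can be sketched roughly as follows. This is a hypothetical illustration, not that study's code: normalizing each profile to proportions, the log2 cutoff of -2, and the 15-nt minimum poly(T) run are assumptions chosen for the example, and the toy counts and sequence are invented.

```python
import math
import re
from typing import Dict, List


def pseudo_bulk(sc_counts: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each gene's single-cell counts across all cells."""
    return {gene: sum(v) / len(v) for gene, v in sc_counts.items()}


def normalize(profile: Dict[str, float]) -> Dict[str, float]:
    """Scale a profile to sum to 1 so bulk and pseudo-bulk are comparable."""
    total = sum(profile.values())
    return {gene: v / total for gene, v in profile.items()}


def under_detected(bulk: Dict[str, float], pseudo: Dict[str, float],
                   log2_cutoff: float = -2.0, eps: float = 1e-6) -> List[str]:
    """Genes whose normalized pseudo-bulk level falls far below bulk.

    The cutoff is a hypothetical choice for illustration only.
    """
    b, p = normalize(bulk), normalize(pseudo)
    return sorted(
        g for g in b
        if math.log2((p.get(g, 0.0) + eps) / (b[g] + eps)) <= log2_cutoff
    )


def has_tail_poly_t(seq: str, tail_len: int = 350, min_run: int = 15) -> bool:
    """True if the last `tail_len` bases contain >= `min_run` consecutive T's."""
    tail = seq.upper()[-tail_len:]
    return re.search(f"T{{{min_run},}}", tail) is not None  # pattern: T{15,}


# Toy data, invented for illustration only.
sc = {"geneA": [10, 12, 8], "geneB": [0, 1, 0], "geneC": [5, 5, 5]}
bulk = {"geneA": 100, "geneB": 100, "geneC": 50}
flagged = under_detected(bulk, pseudo_bulk(sc))
print(flagged)  # ['geneB']

seqs = {"geneB": "A" * 400 + "T" * 20 + "G" * 30}
print({g: has_tail_poly_t(seqs[g]) for g in flagged})  # {'geneB': True}
```

Normalizing both profiles to proportions before taking the log ratio is one simple way to put bulk and averaged single-cell counts on a common scale; a real analysis would likely use a more robust normalization and test enrichment against a background set of genes.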