Title: DEIsoM: a hierarchical Bayesian model for identifying differentially expressed isoforms using biological replicates
Abstract

Motivation

High-throughput mRNA sequencing (RNA-Seq) is a powerful tool for quantifying gene expression. Identification of transcript isoforms that are differentially expressed in different conditions, such as in patients and healthy subjects, can provide insights into the molecular basis of diseases. Current transcript quantification approaches, however, do not take advantage of the shared information in the biological replicates, potentially decreasing sensitivity and accuracy.

Results

We present a novel hierarchical Bayesian model called Differentially Expressed Isoform detection from Multiple biological replicates (DEIsoM) for identifying differentially expressed (DE) isoforms from multiple biological replicates representing two conditions, e.g. multiple samples from healthy and diseased subjects. DEIsoM first estimates isoform expression within each condition by (1) capturing common patterns from sample replicates while allowing individual differences, and (2) modeling the uncertainty introduced by ambiguous read mapping in each replicate. Specifically, we introduce a Dirichlet prior distribution to capture the common expression pattern of replicates from the same condition, and treat the isoform expression of individual replicates as samples from this distribution. Ambiguous read mapping is modeled as a multinomial distribution, and ambiguous reads are assigned to the most probable isoform in each replicate. Additionally, DEIsoM couples efficient variational inference with a post-analysis method to improve the accuracy and speed of identification of DE isoforms over alternative methods. Application of DEIsoM to a hepatocellular carcinoma (HCC) dataset identifies biologically relevant DE isoforms. The relevance of these genes/isoforms to HCC is supported by principal component analysis (PCA), read coverage visualization, and the biological literature.
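The hierarchy described above can be illustrated with a small sketch. It is not the DEIsoM implementation: the function names, the example Dirichlet parameters, and the simplified "pick the compatible isoform with the largest proportion" rule are assumptions made for illustration, and the sketch ignores read-level likelihoods and the variational inference step.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one isoform-proportion vector from Dirichlet(alpha).

    Uses the standard construction: normalize independent Gamma(alpha_k, 1) draws.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_condition(alpha, n_replicates, seed=0):
    """Hypothetical helper: one proportion vector per replicate.

    All replicates share the condition-level parameter alpha, so they follow
    a common pattern while still differing from one another.
    """
    rng = random.Random(seed)
    return [sample_dirichlet(alpha, rng) for _ in range(n_replicates)]


def assign_read(compatible_isoforms, psi):
    """Assign an ambiguous read to its most probable compatible isoform.

    Simplifying assumption: probability is taken to be proportional to the
    replicate-level isoform proportion psi[k] alone.
    """
    return max(compatible_isoforms, key=lambda k: psi[k])


if __name__ == "__main__":
    # Three isoforms of one gene, four replicates of one condition (made-up values).
    replicates = simulate_condition([5.0, 3.0, 2.0], n_replicates=4, seed=1)
    for psi in replicates:
        # A read that maps equally well to isoforms 0 and 2.
        print([round(p, 3) for p in psi], "->", assign_read([0, 2], psi))
```

Larger values in the shared parameter vector pull the replicates closer to a common pattern, while smaller values allow more replicate-to-replicate variation.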

Availability and implementation

The software is available at https://github.com/hao-peng/DEIsoM

Supplementary information

Supplementary data are available at Bioinformatics online.

 
NSF-PAR ID:
10394873
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
33
Issue:
19
ISSN:
1367-4803
Page Range / eLocation ID:
p. 3018-3027
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Alternatively spliced genes produce multiple spliced isoforms, called transcript variants. In differential alternative splicing, transcript variant abundance differs across sample types. Differential alternative splicing is common in animal systems and influences cellular development in many processes, but its extent and significance are not as well known in plants. To investigate differential alternative splicing in plants, we examined RNA‐Seq data from rice seedlings. The data included three biological replicates per sample type, approximately 30 million sequence alignments per replicate, and four sample types: roots and shoots treated with exogenous cytokinin delivered hydroponically or a mock treatment. Cytokinin treatment triggered expression changes in thousands of genes but had negligible effect on splicing patterns. However, many genes were differentially spliced between mock‐treated roots and shoots, indicating that our methods were sufficiently sensitive to detect differential splicing between data sets. Quantitative fragment analysis of reverse transcriptase‐PCR products made from newly prepared rice samples confirmed 9 of 10 differential splicing events between rice roots and shoots. Differential alternative splicing typically changed the relative abundance of splice variants that co‐occurred in a data set. Analysis of a similar (but less deeply sequenced) RNA‐Seq data set from Arabidopsis showed the same pattern. In both the Arabidopsis and rice RNA‐Seq data sets, most genes annotated as alternatively spliced had small minor variant frequencies. Of splicing choices with abundant support for minor forms, most alternative splicing events were located within the protein‐coding sequence and maintained the annotated reading frame. A tool for visualizing protein annotations in the context of genomic sequence (ProtAnnot), together with a genome browser (Integrated Genome Browser), was used to visualize and assess effects of differential splicing on gene function. In general, differentially spliced regions coincided with conserved protein domains, indicating that differential alternative splicing is likely to affect protein function between root and shoot tissue in rice.

     
  2. Abstract

    Background

    Cell type specialization is a hallmark of complex multicellular organisms and is usually established through implementation of cell-type-specific gene expression programs. The multicellular green alga Volvox carteri has just two cell types, germ and soma, that have previously been shown to have very different transcriptome compositions which match their specialized roles. Here we interrogated another potential mechanism for differentiation in V. carteri, cell type specific alternative transcript isoforms (CTSAI).

    Methods

    We used pre-existing predictions of alternative transcripts and de novo transcript assembly with HISAT2 and Ballgown software to compile a list of loci with two or more transcript isoforms, identified a small subset that were candidates for CTSAI, and manually curated this subset of genes to remove false positives. We experimentally verified three candidates using semi-quantitative RT-PCR to assess relative isoform abundance in each cell type.

    Results

    Of the 1978 loci with two or more predicted transcript isoforms, 67 also showed cell type isoform expression biases. After curation, 15 strong candidates for CTSAI were identified, three of which were experimentally verified, and their predicted gene product functions were evaluated in light of potential cell type specific roles. A comparison of genes with predicted alternative splicing from Chlamydomonas reinhardtii, a unicellular relative of V. carteri, identified little overlap between ortholog pairs with alternative splicing in both species. Finally, we interrogated cell type expression patterns of 126 V. carteri predicted RNA binding protein (RBP) encoding genes and found 40 that showed either somatic or germ cell expression bias. These RBPs are potential mediators of CTSAI in V. carteri and suggest possible pre-adaptation for cell type specific RNA processing and a potential path for generating CTSAI in the early ancestors of metazoans and plants.

    Conclusions

    We predicted numerous instances of alternative transcript isoforms in Volvox, only a small subset of which showed cell type specific isoform expression bias. However, the validated examples of CTSAI supported existing hypotheses about cell type specialization in V. carteri, and also suggested new hypotheses about mechanisms of functional specialization for their gene products. Our data imply that CTSAI operates as a minor but important component of V. carteri cellular differentiation and could be used as a model for how alternative isoforms emerge and co-evolve with cell type specialization.

     
  3. Abstract

    Background

    The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3′-untranslated region (3′-UTR) of mRNA produces transcripts with shorter or longer 3′-UTRs. Often, the 3′-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3′-UTR APA is known to modulate translation and provides a means to regulate gene expression at the post-transcriptional level. Current bioinformatics pipelines have limited capability in profiling 3′-UTR APA events due to incomplete annotations and low-resolution analysis: widely available bioinformatics pipelines do not reference actionable polyadenylation (cleavage) sites but infer 3′-UTR APA only from RNA-seq read coverage, causing false positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3′-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations.

    Methods

    APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3′-UTR transcripts in the RNA-seq data. APA-Scan works in three major steps: (i) calculate the read coverage of the 3′-UTR regions of genes; (ii) identify the potential APA sites and evaluate the significance of the events between two biological conditions; (iii) generate a graphical representation of a user-specified event with 3′-UTR annotation and read coverage on the 3′-UTR regions. APA-Scan is implemented in Python3. Source code and a comprehensive user’s manual are freely available at https://github.com/compbiolabucf/APA-Scan.
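Step (ii) above can be illustrated with a minimal sketch. The abstract does not state which statistical test APA-Scan uses, so the Pearson chi-square test on a 2×2 table below is an assumption chosen for illustration, and the function name and input layout are hypothetical rather than part of the APA-Scan code base.

```python
import math


def apa_shift_test(short_a, long_a, short_b, long_b):
    """Test whether short vs. long 3'-UTR usage differs between two conditions.

    Inputs are read counts supporting the short and long 3'-UTR forms in
    condition A and condition B (hypothetical layout). Returns the Pearson
    chi-square statistic (1 degree of freedom, no continuity correction)
    and its p-value.
    """
    n = short_a + long_a + short_b + long_b
    row_a, row_b = short_a + long_a, short_b + long_b
    col_short, col_long = short_a + short_b, long_a + long_b
    # With an empty row or column the test is undefined; report no evidence.
    if 0 in (row_a, row_b, col_short, col_long):
        return 0.0, 1.0
    chi2 = n * (short_a * long_b - long_a * short_b) ** 2 / (
        row_a * row_b * col_short * col_long
    )
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts: condition A favors the short form, condition B the long form.
    print(apa_shift_test(90, 10, 10, 90))
```

A real pipeline would also normalize for region length and sequencing depth and correct for multiple testing across genes; those steps are omitted here.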

    Result

    APA-Scan was applied to both simulated and real RNA-seq datasets and compared with two widely used baselines, DaPars and APAtrap. In simulation, APA-Scan significantly improved the accuracy of 3′-UTR APA identification compared to both baselines. The performance of APA-Scan was also validated by 3′-end-seq data and qPCR on mouse embryonic fibroblast cells. The experiments confirm that APA-Scan can detect unannotated 3′-UTR APA events and improve genome annotation.

    Conclusion

    APA-Scan is a comprehensive computational pipeline to detect transcriptome-wide 3′-UTR APA events. The pipeline integrates both RNA-seq and 3′-end-seq data and can efficiently identify significant events with high-resolution short-read coverage plots.

     
  4. Abstract

    Motivation

    Accurate estimation of transcript isoform abundance is critical for downstream transcriptome analyses and can lead to precise molecular mechanisms for understanding complex human diseases, like cancer. Simplex mRNA Sequencing (RNA-Seq) based isoform quantification approaches are facing the challenges of inherent sampling bias and unidentifiable read origins. A large-scale experiment shows that the consistency between RNA-Seq and other mRNA quantification platforms is relatively low at the isoform level compared to the gene level. In this project, we developed a platform-integrated model for transcript quantification (IntMTQ) to improve the performance of RNA-Seq on isoform expression estimation. IntMTQ, which benefits from the mRNA expressions reported by the other platforms, provides more precise RNA-Seq-based isoform quantification and leads to more accurate molecular signatures for disease phenotype prediction.

    Results

    In the experiments to assess the quality of isoform expression estimated by IntMTQ, we designed three tasks for clustering and classification of 46 cancer cell lines with four different mRNA quantification platforms, including NanoString’s newly developed nCounter technology. The results demonstrate that the isoform expressions learned by IntMTQ consistently provide more and better molecular features for downstream analyses compared with five baseline algorithms which consider RNA-Seq data only. An independent RT-qPCR experiment on seven genes in twelve cancer cell lines showed that IntMTQ improved overall transcript quantification. The platform-integrated algorithms could be applied to large-scale cancer studies, such as The Cancer Genome Atlas (TCGA), with both RNA-Seq and array-based platforms available.

    Availability and implementation

    Source code is available at: https://github.com/CompbioLabUcf/IntMTQ.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Alternative splicing extends the coding potential of genomes by creating multiple isoforms from one gene. Isoforms can render transcript specificity and diversity to initiate multiple responses required during transcriptome adjustments in stressed environments. Although the prevalence of alternative splicing is widely recognized, how diverse isoforms facilitate stress adaptation in plants that thrive in extreme environments is unexplored. Here we examine how an extremophyte model, Schrenkiella parvula, coordinates alternative splicing in response to high salinity compared to a salt-stress sensitive model, Arabidopsis thaliana. We use Iso-Seq to generate full length reference transcripts and RNA-seq to quantify differential isoform usage in response to salinity changes. We find that single-copy orthologs where S. parvula has a higher number of isoforms than A. thaliana, as well as S. parvula genes observed and predicted using machine learning to have multiple isoforms, are enriched in stress associated functions. Genes that showed differential isoform usage were largely mutually exclusive from genes that were differentially expressed in response to salt. S. parvula transcriptomes maintained specificity in isoform usage, assessed via a measure of expression disorderedness, during transcriptome reprogramming under salt. Our study adds a novel resource and insight to study plant stress tolerance evolved in extreme environments.