Title: SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty
Abstract: Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at several levels of resolution: gene, isoform, and groups of isoforms aggregated by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.
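The aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the SEESAW implementation (which the abstract says is built on Salmon and Swish); the function names, isoform IDs, TSS group labels, and counts below are all hypothetical and chosen only to show how summing allelic counts over isoforms that share a transcription start site yields one allelic ratio per group.

```python
from collections import defaultdict


def aggregate_by_tss(counts, tss_of):
    """Sum per-isoform allelic counts into one (ref, alt) pair per TSS group.

    counts: {isoform_id: (ref_count, alt_count)}  -- hypothetical input layout
    tss_of: {isoform_id: tss_group_id}            -- hypothetical isoform-to-TSS map
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for isoform, (ref, alt) in counts.items():
        group = tss_of[isoform]
        totals[group][0] += ref
        totals[group][1] += alt
    return {group: (ref, alt) for group, (ref, alt) in totals.items()}


def allelic_ratio(ref, alt):
    """Fraction of counts assigned to the alternate allele (NaN if no counts)."""
    total = ref + alt
    return alt / total if total > 0 else float("nan")


# Toy data (made up): three isoforms of one gene, two of which share a TSS.
counts = {"tx1": (30.0, 50.0), "tx2": (10.0, 30.0), "tx3": (40.0, 40.0)}
tss_of = {"tx1": "tss_A", "tx2": "tss_A", "tx3": "tss_B"}

grouped = aggregate_by_tss(counts, tss_of)
for group, (ref, alt) in sorted(grouped.items()):
    print(group, ref, alt, round(allelic_ratio(ref, alt), 3))
```

In this toy case, reads that are hard to assign between tx1 and tx2 still land in the same group, tss_A, so the group-level counts are less sensitive to that ambiguity than the isoform-level counts are.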
Award ID(s):
2317838 2029424
PAR ID:
10431354
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Genome Biology
Volume:
24
Issue:
1
ISSN:
1474-760X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation: Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly desirable. However, the existing approaches to predicting isoform functions are far from satisfactory for at least two reasons: (i) unlike gene-level annotations, isoform-level functional annotations are scarce; (ii) the information about isoform functions is concealed in various types of data, including isoform sequences, co-expression relationships among isoforms, etc. Results: In this study, we present a novel approach, DIFFUSE (Deep learning-based prediction of IsoForm FUnctions from Sequences and Expression), to predict isoform functions. To integrate various types of data, our approach adopts a hybrid framework by first using a deep neural network (DNN) to predict the functions of isoforms from their genomic sequences and then refining the prediction using a conditional random field (CRF) based on co-expression relationships. To overcome the lack of isoform-level ground truth labels, we further propose an iterative semi-supervised learning algorithm to train both the DNN and CRF together. Our extensive computational experiments demonstrate that DIFFUSE could effectively predict the functions of isoforms and genes. It achieves an average area under the receiver operating characteristics curve of 0.840 and area under the precision–recall curve of 0.581 over 4184 GO functional categories, which are significantly higher than those of the state-of-the-art methods. We further validate the prediction results by analyzing the correlation between functional similarity, sequence similarity, expression similarity and structural similarity, as well as the consistency between the predicted functions and some well-studied functional features of isoform sequences.
Availability and implementation: https://github.com/haochenucr/DIFFUSE. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Abstract Motivation: Accurate estimation of transcript isoform abundance is critical for downstream transcriptome analyses and can lead to precise molecular mechanisms for understanding complex human diseases, like cancer. Simplex mRNA Sequencing (RNA-Seq) based isoform quantification approaches face the challenges of inherent sampling bias and unidentifiable read origins. A large-scale experiment shows that the consistency between RNA-Seq and other mRNA quantification platforms is relatively low at the isoform level compared to the gene level. In this project, we developed a platform-integrated model for transcript quantification (IntMTQ) to improve the performance of RNA-Seq on isoform expression estimation. IntMTQ, which benefits from the mRNA expressions reported by the other platforms, provides more precise RNA-Seq-based isoform quantification and leads to more accurate molecular signatures for disease phenotype prediction. Results: In the experiments to assess the quality of isoform expression estimated by IntMTQ, we designed three tasks for clustering and classification of 46 cancer cell lines with four different mRNA quantification platforms, including the newly developed NanoString nCounter technology. The results demonstrate that the isoform expressions learned by IntMTQ consistently provide more and better molecular features for downstream analyses compared with five baseline algorithms which consider RNA-Seq data only. An independent RT-qPCR experiment on seven genes in twelve cancer cell lines showed that IntMTQ improved overall transcript quantification. The platform-integrated algorithms could be applied to large-scale cancer studies, such as The Cancer Genome Atlas (TCGA), with both RNA-Seq and array-based platforms available. Availability and implementation: Source code is available at: https://github.com/CompbioLabUcf/IntMTQ. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Kendziorski, Christina (Ed.)
    Abstract Motivation: Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial-, or time-dependent AI signals may be dampened or not detected. Results: We introduce a statistical method, airpart, for identifying differential CTS AI from single-cell RNA-sequencing (scRNA-seq) data, or other spatially- or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. In order to account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower RMSE of allelic ratio estimates than existing methods. In real data, airpart identified differential AI (DAI) patterns across cell states and could be used to define trends of AI signal over spatial or time axes. Availability: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart.
  4. Behavioral evolution relies on genetic changes, yet few behaviors can be traced to specific genetic sequences in vertebrates. Here we provide experimental evidence showing that differentiation of a single gene has contributed to the evolution of divergent behavioral phenotypes in the white-throated sparrow, a common backyard songbird. In this species, a series of chromosomal inversions has formed a supergene that segregates with an aggressive phenotype. The supergene has captured ESR1, the gene that encodes estrogen receptor α (ERα); as a result, this gene is accumulating changes that now distinguish the supergene allele from the standard allele. Our results show that in birds of the more aggressive phenotype, ERα knockdown caused a phenotypic change to that of the less aggressive phenotype. We next showed that in a free-living population, aggression is predicted by allelic imbalance favoring the supergene allele. Finally, we identified cis-regulatory features, both genetic and epigenetic, that explain the allelic imbalance. This work provides a rare illustration of how genotypic divergence has led to behavioral phenotypic divergence in a vertebrate.
  5. Abstract Hibernation in brown bears is an annual process involving multiple physiologically distinct seasons—hibernation, active, and hyperphagia. While recent studies have characterized broad patterns of differential gene regulation and isoform usage between hibernation and active seasons, patterns of gene and isoform expression during hyperphagia remain relatively poorly understood. The hyperphagia stage occurs between active and hibernation seasons and involves the accumulation of large fat reserves in preparation for hibernation. Here, we use time-series analyses of gene expression and isoform usage to interrogate transcriptomic regulation associated with all three seasons. We identify a large number of genes with significant differential isoform usage (DIU) across seasons and show that these patterns of isoform usage are largely tissue-specific. We also show that DIU and differential gene-level expression responses are generally non-overlapping, with only a small subset of multi-isoform genes showing evidence of both gene-level expression changes and changes in isoform usage across seasons. Additionally, we investigate nuanced regulation of candidate genes involved in the insulin signaling pathway and find evidence of hyperphagia-specific gene expression and isoform regulation that may enhance fat accumulation during hyperphagia. Our findings highlight the value of using temporal analyses of both gene- and isoform-level gene expression when interrogating complex physiological phenotypes and provide new insight into the mechanisms underlying seasonal changes in bear physiology. 