Abstract. Motivation: Isoforms are mRNAs produced from the same gene locus by alternative splicing and may have different functions. Although gene functions have been studied extensively, little is known about the specific functions of isoforms. Recently, some computational approaches based on multiple instance learning have been proposed to predict isoform functions from annotated gene functions and expression data, but their performance is far from satisfactory, primarily because of the lack of labeled training data. To improve performance on this problem, we propose a novel deep learning method, DeepIsoFun, that combines multiple instance learning with domain adaptation. The latter technique helps to transfer the knowledge of gene functions to the prediction of isoform functions and provides additional labeled training data. Our model is trained on a deep neural network architecture so that it can adapt to different expression distributions associated with different gene ontology terms. Results: We evaluated the performance of DeepIsoFun on three expression datasets of human and mouse collected from SRA studies at different times. On each dataset, DeepIsoFun performed significantly better than the existing methods. In terms of area under the receiver operating characteristics curve, our method achieved an improvement of at least 26%, and in terms of area under the precision-recall curve, an improvement of at least 10% over the state-of-the-art methods. In addition, we study the divergence of the functions predicted by our method for isoforms from the same gene and the overall correlation between expression similarity and the similarity of predicted functions. Availability and implementation: https://github.com/dls03/DeepIsoFun/ Supplementary information: Supplementary data are available at Bioinformatics online.
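The multiple instance learning setting mentioned in the abstract above is often formalized by treating a gene as a bag and its isoforms as instances: a gene is taken to have a function if at least one of its isoforms does. The following is a minimal sketch of that idea, assuming max aggregation of per-isoform scores; the abstract does not state which aggregation DeepIsoFun uses, and the function name and data layout here are hypothetical.

```python
from typing import Dict, List


def gene_scores_from_isoforms(
    isoform_scores: Dict[str, float],
    gene_to_isoforms: Dict[str, List[str]],
) -> Dict[str, float]:
    """Aggregate isoform-level scores into gene-level scores.

    Assumption (not taken from the abstract): a gene's score for a
    function is the maximum score over its isoforms, which encodes
    "the gene has the function if at least one isoform has it".
    """
    return {
        gene: max(isoform_scores[iso] for iso in isoforms)
        for gene, isoforms in gene_to_isoforms.items()
        if isoforms  # skip genes with no listed isoforms
    }


# Hypothetical toy input: two genes, three isoforms, one GO term.
scores = {"A.1": 0.2, "A.2": 0.9, "B.1": 0.1}
mapping = {"A": ["A.1", "A.2"], "B": ["B.1"]}
gene_level = gene_scores_from_isoforms(scores, mapping)
```

Under this assumption, gene-level annotations can supervise a model whose outputs are defined per isoform, because the aggregated score is what gets compared with the gene label.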
DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning
Abstract. Motivation: Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly desirable. However, the existing approaches to predicting isoform functions are far from satisfactory for at least two reasons: (i) unlike gene-level annotations, isoform-level functional annotations are scarce; and (ii) the information about isoform functions is concealed in various types of data, including isoform sequences and co-expression relationships among isoforms. Results: In this study, we present a novel approach, DIFFUSE (Deep learning-based prediction of IsoForm FUnctions from Sequences and Expression), to predict isoform functions. To integrate various types of data, our approach adopts a hybrid framework that first uses a deep neural network (DNN) to predict the functions of isoforms from their genomic sequences and then refines the prediction using a conditional random field (CRF) based on co-expression relationships. To overcome the lack of isoform-level ground truth labels, we further propose an iterative semi-supervised learning algorithm to train the DNN and the CRF together. Our extensive computational experiments demonstrate that DIFFUSE can effectively predict the functions of isoforms and genes. It achieves an average area under the receiver operating characteristics curve of 0.840 and an average area under the precision–recall curve of 0.581 over 4184 GO functional categories, both significantly higher than those of the state-of-the-art methods. We further validate the prediction results by analyzing the correlation between functional similarity, sequence similarity, expression similarity and structural similarity, as well as the consistency between the predicted functions and some well-studied functional features of isoform sequences.
Availability and implementation: https://github.com/haochenucr/DIFFUSE. Supplementary information: Supplementary data are available at Bioinformatics online.
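The two metrics reported in the abstract above can be illustrated with a small, dependency-free sketch. It computes the area under the ROC curve as the fraction of positive/negative pairs ranked correctly (ties count as half), and approximates the area under the precision-recall curve by average precision, which is one common estimator. This is an illustration only, not the evaluation code used for DIFFUSE; the function names and the toy data are assumptions.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision at each positive's rank.

    Tied scores keep their input order here, so results with ties
    depend on that order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


# Hypothetical toy predictions for a single GO term.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

An average over GO categories, as reported in the abstract, would apply such functions per category and take the mean; how categories with very few positives are handled is a design choice the abstract does not describe.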
- Award ID(s): 1646333
- PAR ID: 10425979
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Bioinformatics
- Volume: 35
- Issue: 14
- ISSN: 1367-4803
- Page Range / eLocation ID: p. i284-i294
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract. Motivation: Gene deletion is traditionally thought of as a nonadaptive process that removes functional redundancy from genomes, such that it generally receives less attention than duplication in evolutionary turnover studies. Yet, mounting evidence suggests that deletion may promote adaptation via the “less-is-more” evolutionary hypothesis, as it often targets genes harboring unique sequences, expression profiles, and molecular functions. Hence, predicting the relative prevalence of redundant and unique functions among genes targeted by deletion, as well as the parameters underlying their evolution, can shed light on the role of gene deletion in adaptation. Results: Here, we present CLOUDe, a suite of machine learning methods for predicting evolutionary targets of gene deletion events from expression data. Specifically, CLOUDe models expression evolution as an Ornstein–Uhlenbeck process, and uses multi-layer neural network, extreme gradient boosting, random forest, and support vector machine architectures to predict whether deleted genes are “redundant” or “unique”, as well as several parameters underlying their evolution. We show that CLOUDe boasts high power and accuracy in differentiating between classes, and high accuracy and precision in estimating evolutionary parameters, with optimal performance achieved by its neural network architecture. Application of CLOUDe to empirical data from Drosophila suggests that deletion primarily targets genes with unique functions, with further analysis showing these functions to be enriched for protein deubiquitination. Thus, CLOUDe represents a key advance in learning about the role of gene deletion in functional evolution and adaptation. Availability and implementation: CLOUDe is freely available on GitHub (https://github.com/anddssan/CLOUDe).
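For readers unfamiliar with the Ornstein–Uhlenbeck process named in the abstract above, the sketch below simulates one trajectory of dX = alpha (theta - X) dt + sigma dW with the Euler–Maruyama scheme. It is a generic illustration of the process, not CLOUDe's model or code; the parameter names (optimum theta, pull strength alpha, noise scale sigma) follow a common convention and are assumptions here.

```python
import math
import random
from typing import List, Optional


def simulate_ou(
    x0: float,
    theta: float,
    alpha: float,
    sigma: float,
    dt: float,
    n_steps: int,
    seed: Optional[int] = None,
) -> List[float]:
    """Simulate an Ornstein-Uhlenbeck path with Euler-Maruyama steps.

    x0: starting value; theta: long-run optimum; alpha: strength of the
    pull toward theta; sigma: scale of the random fluctuations.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Deterministic pull toward theta plus a Gaussian increment
        # whose standard deviation scales with sqrt(dt).
        x = x + alpha * (theta - x) * dt + sigma * math.sqrt(dt) * noise
        path.append(x)
    return path


# With sigma = 0 the path relaxes deterministically toward theta.
drift_only = simulate_ou(x0=0.0, theta=1.0, alpha=1.0, sigma=0.0,
                         dt=0.01, n_steps=1000)
noisy = simulate_ou(x0=0.0, theta=1.0, alpha=1.0, sigma=0.5,
                    dt=0.01, n_steps=1000, seed=7)
```

The pull toward an optimum is what distinguishes this process from plain Brownian motion, which corresponds to alpha = 0 in the update above.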
-
Abstract. Motivation: Accurate estimation of transcript isoform abundance is critical for downstream transcriptome analyses and can lead to precise molecular mechanisms for understanding complex human diseases, like cancer. Simplex mRNA Sequencing (RNA-Seq) based isoform quantification approaches are facing the challenges of inherent sampling bias and unidentifiable read origins. A large-scale experiment shows that the consistency between RNA-Seq and other mRNA quantification platforms is relatively low at the isoform level compared to the gene level. In this project, we developed a platform-integrated model for transcript quantification (IntMTQ) to improve the performance of RNA-Seq on isoform expression estimation. IntMTQ, which benefits from the mRNA expressions reported by the other platforms, provides more precise RNA-Seq-based isoform quantification and leads to more accurate molecular signatures for disease phenotype prediction. Results: In the experiments to assess the quality of isoform expression estimated by IntMTQ, we designed three tasks for clustering and classification of 46 cancer cell lines with four different mRNA quantification platforms, including newly developed NanoString’s nCounter technology. The results demonstrate that the isoform expressions learned by IntMTQ consistently provide more and better molecular features for downstream analyses compared with five baseline algorithms which consider RNA-Seq data only. An independent RT-qPCR experiment on seven genes in twelve cancer cell lines showed that IntMTQ improved overall transcript quantification. The platform-integrated algorithms could be applied to large-scale cancer studies, such as The Cancer Genome Atlas (TCGA), with both RNA-Seq and array-based platforms available. Availability and implementation: Source code is available at: https://github.com/CompbioLabUcf/IntMTQ. Supplementary information: Supplementary data are available at Bioinformatics online.
-
Abstract. Background: Morphologic sex differences between males and females typically emerge after primordial germ cell migration and gonad formation, although sex is determined at fertilization based on chromosome composition. A key debated sexual difference is the embryonic developmental rate, with in vitro produced male embryos often developing faster. However, the molecular mechanisms driving early embryonic sex differences remain unclear. Results: To investigate the transcriptional sex difference during early development, in vitro produced bovine blastocysts were collected and sexed by PCR. A significant male-biased development was observed in expanded blastocysts. Ultra-low input RNA-seq analysis identified 837 DEGs, with 231 upregulated and 606 downregulated in males. Functional enrichment analysis revealed male-biased DEGs were associated with metabolic regulation, whereas female-biased DEGs were related to female gonad development, sex differentiation, inflammatory pathways, and TGF-beta signaling. Comparing the X chromosome to autosome expression ratio, we found that female-biased DEGs contributed to the higher X-linked gene dosage, a phenomenon not observed in male embryos. Moreover, we identified sex-biased transcription factors and RNA-binding proteins, including pluripotency factors such as SOX21 and PRDM14, and splicing factors FMR1 and HNRNPH2. Additionally, we revealed 1,555 significantly sex-biased differential alternative splicing (AS) events, predominantly skipped exons, mapped to 906 genes, with 59 overlapping with DEGs enriched in metabolic and autophagy pathways. By incorporating novel isoforms from long-read sequencing, we identified 1,151 sex-biased differentially expressed isoforms (DEIs) associated with 1,017 genes. Functional analysis showed that female-biased DEIs were involved in the negative regulation of transcriptional activity, while male-biased DEIs were related to energy metabolism. Furthermore, we identified sex-biased differential exon usage in DENND1B, DIS3L2, DOCK11, IL1RAPL2, and ZRSR2Y, indicating their sex-specific regulation in early embryo development.
Conclusion: This study provided a comprehensive analysis of transcriptome differences between male and female bovine blastocysts, integrating sex-biased gene expression, alternative splicing, and isoform dynamics. Our findings indicate that enriched metabolism processes in male embryos may contribute to the faster developmental pace, providing insights into sex-specific regulatory mechanisms during early embryogenesis.
Plain English summary: Male and female early embryos develop at different speeds, with male embryos often developing faster than female embryos. However, the reasons behind these early differences remain unclear. In this study, we examined gene activity in bovine embryos to uncover the biological factors regulating these early sex differences. We collected in vitro-produced bovine blastocysts, examined their sex, and confirmed that male embryos develop faster. By analyzing global gene activity, including alternative splicing, which allows one gene to code for multiple RNA isoforms and proteins, we found distinct gene expression profiles between male and female embryos. Male embryos showed higher activity in genes related to metabolism and cellular functions, while female embryos had increased activity in genes associated with female-specific gonad development and gene expression regulation. We also examined differences in how genes on the X chromosome were expressed. Female embryos had higher X-linked gene expression, which may contribute to sex-specific developmental regulation. Additionally, we identified sex-specific transcription factors and RNA-binding proteins that regulate early embryo development, some of which are known to control pluripotency and gene splicing. Overall, our study provides new insights into how gene activity shapes early sex differences, suggesting that enhanced metabolism in male embryos may be a key driver of their faster developmental rate.
Highlights: Male embryos develop faster due to increased gene expression in metabolism pathways; Female embryos exhibit higher X-linked gene expression, suggesting X-dosage compensation plays a role in early development; Sex-biased alternative splicing events contribute to embryonic metabolism, autophagy, and transcriptional regulation in embryos; Sex-biased isoform diversity contributes to distinct developmental regulation in male and female embryos; Key pluripotency factors (SOX21, PRDM14) and splicing regulators (FMR1, HNRNPH2) drive sex-specific gene expression.
-
Background: Cell type specialization is a hallmark of complex multicellular organisms and is usually established through implementation of cell-type-specific gene expression programs. The multicellular green alga Volvox carteri has just two cell types, germ and soma, that have previously been shown to have very different transcriptome compositions which match their specialized roles. Here we interrogated another potential mechanism for differentiation in V. carteri, cell type specific alternative transcript isoforms (CTSAI). Methods: We used pre-existing predictions of alternative transcripts and de novo transcript assembly with HISAT2 and Ballgown software to compile a list of loci with two or more transcript isoforms, identified a small subset that were candidates for CTSAI, and manually curated this subset of genes to remove false positives. We experimentally verified three candidates using semi-quantitative RT-PCR to assess relative isoform abundance in each cell type. Results: Of the 1978 loci with two or more predicted transcript isoforms, 67 also showed cell type isoform expression biases. After curation, 15 strong candidates for CTSAI were identified, three of which were experimentally verified, and their predicted gene product functions were evaluated in light of potential cell type specific roles. A comparison of genes with predicted alternative splicing from Chlamydomonas reinhardtii, a unicellular relative of V. carteri, identified little overlap between ortholog pairs with alternative splicing in both species. Finally, we interrogated cell type expression patterns of 126 V. carteri predicted RBP encoding genes and found 40 that showed either somatic or germ cell expression bias. These RBPs are potential mediators of CTSAI in V. carteri and suggest possible pre-adaptation for cell type specific RNA processing and a potential path for generating CTSAI in the early ancestors of metazoans and plants.
Conclusions: We predicted numerous instances of alternative transcript isoforms in Volvox, only a small subset of which showed cell type specific isoform expression bias. However, the validated examples of CTSAI supported existing hypotheses about cell type specialization in V. carteri, and also suggested new hypotheses about mechanisms of functional specialization for their gene products. Our data imply that CTSAI operates as a minor but important component of V. carteri cellular differentiation and could be used as a model for how alternative isoforms emerge and co-evolve with cell type specialization.
