Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Alternative cleavage and polyadenylation within introns (intronic APA) generate shorter mRNA isoforms; however, their physiological significance remains elusive. In this study, we developed a comprehensive workflow to analyze intronic APA profiles using the mammalian target of rapamycin (mTOR)-regulated transcriptome as a model system. Our investigation revealed two contrasting effects within the transcriptome in response to fluctuations in cellular mTOR activity: an increase in intronic APA for a subset of genes and a decrease for another subset of genes. The application of this workflow to RNA-seq data from The Cancer Genome Atlas demonstrated that this dichotomous intronic APA pattern is a consistent feature in transcriptomes across both normal tissues and various cancer types. Notably, our analyses of protein length changes resulting from intronic APA events revealed two distinct phenomena in proteome programming: a loss of functional domains due to significant changes in protein length or minimal alterations in C- terminal protein sequences within unstructured regions. Focusing on conserved intronic APA events across 10 different cancer types highlighted the prevalence of the latter cases in cancer transcriptomes, whereas the former cases were relatively enriched in normal tissue transcriptomes. These observations suggest potential, yet distinct, roles for intronic APA events during pathogenic processes and emphasize the abundance of protein isoforms with similar lengths in the cancer proteome. Furthermore, our investigation into the isoform-specific functions of JMJD6 intronic APA events supported the hypothesis that alterations in unstructured C-terminal protein regions lead to functional differences. Collectively, our findings underscore intronic APA events as a discrete molecular signature present in both normal tissues and cancer transcriptomes, highlighting the contribution of APA to the multifaceted functionality of the cancer proteome.more » « lessFree, publicly-accessible full text available October 1, 2025
-
Abstract Spatial transcriptomics data play a crucial role in cancer research, providing a nuanced understanding of the spatial organization of gene expression within tumor tissues. Unraveling the spatial dynamics of gene expression can unveil key insights into tumor heterogeneity and aid in identifying potential therapeutic targets. However, in many large-scale cancer studies, spatial transcriptomics data are limited, with bulk RNA-seq and corresponding Whole Slide Image (WSI) data being more common (e.g. TCGA project). To address this gap, there is a critical need to develop methodologies that can estimate gene expression at near-cell (spot) level resolution from existing WSI and bulk RNA-seq data. This approach is essential for reanalyzing expansive cohort studies and uncovering novel biomarkers that have been overlooked in the initial assessments. In this study, we present STGAT (Spatial Transcriptomics Graph Attention Network), a novel approach leveraging Graph Attention Networks (GAT) to discern spatial dependencies among spots. Trained on spatial transcriptomics data, STGAT is designed to estimate gene expression profiles at spot-level resolution and predict whether each spot represents tumor or non-tumor tissue, especially in patient samples where only WSI and bulk RNA-seq data are available. Comprehensive tests on two breast cancer spatial transcriptomics datasets demonstrated that STGAT outperformed existing methods in accurately predicting gene expression. Further analyses using the TCGA breast cancer dataset revealed that gene expression estimated from tumor-only spots (predicted by STGAT) provides more accurate molecular signatures for breast cancer sub-type and tumor stage prediction, and also leading to improved patient survival and disease-free analysis. Availability: Code is available at https://github.com/compbiolabucf/STGAT.
-
Abstract Background The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3′-untranslated region (3′-UTR) of mRNA produces transcripts with shorter or longer 3′-UTR. Often, 3′-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3′-UTR APA is known to modulate translation and provides a mean to regulate gene expression at the post-transcriptional level. Current bioinformatics pipelines have limited capability in profiling 3′-UTR APA events due to incomplete annotations and a low-resolution analyzing power: widely available bioinformatics pipelines do not reference actionable polyadenylation (cleavage) sites but simulate 3′-UTR APA only using RNA-seq read coverage, causing false positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3′-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations.
Methods APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3′-UTR transcripts in the RNA-seq data. APA-Scan works in three major steps: (i) calculate the read coverage of the 3′-UTR regions of genes; (ii) identify the potential APA sites and evaluate the significance of the events among two biological conditions; (iii) graphical representation of user specific event with 3′-UTR annotation and read coverage on the 3′-UTR regions. APA-Scan is implemented in Python3. Source code and a comprehensive user’s manual are freely available at
https://github.com/compbiolabucf/APA-Scan .Result APA-Scan was applied to both simulated and real RNA-seq datasets and compared with two widely used baselines DaPars and APAtrap. In simulation APA-Scan significantly improved the accuracy of 3′-UTR APA identification compared to the other baselines. The performance of APA-Scan was also validated by 3′-end-seq data and qPCR on mouse embryonic fibroblast cells. The experiments confirm that APA-Scan can detect unannotated 3′-UTR APA events and improve genome annotation.
Conclusion APA-Scan is a comprehensive computational pipeline to detect transcriptome-wide 3′-UTR APA events. The pipeline integrates both RNA-seq and 3′-end-seq data information and can efficiently identify the significant events with a high-resolution short reads coverage plots.
-
Robinson, Peter (Ed.)Abstract Motivation Accurate disease phenotype prediction plays an important role in the treatment of heterogeneous diseases like cancer in the era of precision medicine. With the advent of high throughput technologies, more comprehensive multi-omics data is now available that can effectively link the genotype to phenotype. However, the interactive relation of multi-omics datasets makes it particularly challenging to incorporate different biological layers to discover the coherent biological signatures and predict phenotypic outcomes. In this study, we introduce omicsGAN, a generative adversarial network model to integrate two omics data and their interaction network. The model captures information from the interaction network as well as the two omics datasets and fuse them to generate synthetic data with better predictive signals. Results Large-scale experiments on The Cancer Genome Atlas breast cancer, lung cancer and ovarian cancer datasets validate that (i) the model can effectively integrate two omics data (e.g. mRNA and microRNA expression data) and their interaction network (e.g. microRNA-mRNA interaction network). The synthetic omics data generated by the proposed model has a better performance on cancer outcome classification and patients survival prediction compared to original omics datasets. (ii) The integrity of the interaction network plays a vital role in the generation of synthetic data with higher predictive quality. Using a random interaction network does not allow the framework to learn meaningful information from the omics datasets; therefore, results in synthetic data with weaker predictive signals. Availability and implementation Source code is available at: https://github.com/CompbioLabUCF/omicsGAN. Supplementary information Supplementary data are available at Bioinformatics online.more » « less
-
null (Ed.)Microbes and viruses are known to alter host transcriptomes by means of infection. In light of recent challenges posed by the COVID-19 pandemic, a deeper understanding of the disease at the transcriptome level is needed. However, research about transcriptome reprogramming by post-transcriptional regulation is very limited. In this study, computational methods developed by our lab were applied to RNA-seq data to detect transcript variants (i.e., alternative splicing (AS) and alternative polyadenylation (APA) events). The RNA-seq data were obtained from a publicly available source, and they consist of mock-treated and SARS-CoV-2 infected (COVID-19) lung alveolar (A549) cells. Data analysis results show that more AS events are found in SARS-CoV-2 infected cells than in mock-treated cells, whereas fewer APA events are detected in SARS-CoV-2 infected cells. A combination of conventional differential gene expression analysis and transcript variants analysis revealed that most of the genes with transcript variants are not differentially expressed. This indicates that no strong correlation exists between differential gene expression and the AS/APA events in the mock-treated or SARS-CoV-2 infected samples. These genes with transcript variants can be applied as another layer of molecular signatures for COVID-19 studies. In addition, the transcript variants are enriched in important biological pathways that were not detected in the studies that only focused on differential gene expression analysis. Therefore, the pathways may lead to new molecular mechanisms of SARS-CoV-2 pathogenesis.more » « less
-
(1) Background: A simplistic understanding of the central dogma falls short in correlating the number of genes in the genome to the number of proteins in the proteome. Post-transcriptional alternative splicing contributes to the complexity of the proteome and is critical in understanding gene expression. mRNA-sequencing (RNA-seq) has been widely used to study the transcriptome and provides opportunity to detect alternative splicing events among different biological conditions. Despite the popularity of studying transcriptome variants with RNA-seq, few efficient and user-friendly bioinformatics tools have been developed for the genome-wide detection and visualization of alternative splicing events. (2) Results: We propose AS-Quant, (Alternative Splicing Quantitation), a robust program to identify alternative splicing events from RNA-seq data. We then extended AS-Quant to visualize the splicing events with short-read coverage plots along with complete gene annotation. The tool works in three major steps: (i) calculate the read coverage of the potential spliced exons and the corresponding gene; (ii) categorize the events into five different categories according to the annotation, and assess the significance of the events between two biological conditions; (iii) generate the short reads coverage plot for user specified splicing events. Our extensive experiments on simulated and real datasets demonstrate that AS-Quant outperforms the other three widely used baselines, SUPPA2, rMATS, and diffSplice for detecting alternative splicing events. Moreover, the significant alternative splicing events identified by AS-Quant between two biological contexts were validated by RT-PCR experiment. (3) Availability: AS-Quant is implemented in Python 3.0. Source code and a comprehensive user’s manual are freely available online.more » « less
-
null (Ed.)Deregulation of gene expression is associated with the pathogenesis of numerous human diseases including cancer. Current data analyses on gene expression are mostly focused on differential gene/transcript expression in big data-driven studies. However, a poor connection to the proteome changes is a widespread problem in current data analyses. This is partly due to the complexity of gene regulatory pathways at the post-transcriptional level. In this study, we overcome these limitations and introduce a graph-based learning model, PTNet, which simulates the microRNAs (miRNAs) that regulate gene expression post-transcriptionally in silico. Our model does not require large-scale proteomics studies to measure the protein expression and can successfully predict the protein levels by considering the miRNA–mRNA interaction network, the mRNA expression, and the miRNA expression. Large-scale experiments on simulations and real cancer high-throughput datasets using PTNet validated that (i) the miRNA-mediated interaction network affects the abundance of corresponding proteins and (ii) the predicted protein expression has a higher correlation with the proteomics data (ground-truth) than the mRNA expression data. The classification performance also shows that the predicted protein expression has an improved prediction power on cancer outcomes compared to the prediction done by the mRNA expression data only or considering both mRNA and miRNA. Availability: PTNet toolbox is available at http://github.com/CompbioLabUCF/PTNetmore » « less
-
The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3’-untranslated region (3’-UTR) of mRNA produces transcripts with shorter 3’-UTR. Often, 3’-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3’-UTR APA provides a means to regulate gene expression at the post-transcriptional level and is known to promote translation. Current bioinformatics pipelines have limited capability in profiling 3’-UTR APA events due to incomplete annotations and a low-resolution analyzing power: widely available bioinformatics pipelines do not reference actionable polyadenylation (cleavage) sites but simulate 3’-UTR APA only using RNA-seq read coverage, causing false positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3’-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations. APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3’-UTR transcripts in the RNA-seq data. The performance of APA-Scan was validated by qPCR.more » « less
-
mTOR-regulated U2af1 tandem exon splicing specifies transcriptome features for translational controlAbstract U2 auxiliary factor 1 (U2AF1) functions in 3′-splice site selection during pre-mRNA processing. Alternative usage of duplicated tandem exons in U2AF1 produces two isoforms, U2AF1a and U2AF1b, but their functional differences are unappreciated due to their homology. Through integrative approaches of genome editing, customized-transcriptome profiling and crosslinking-mediated interactome analyses, we discovered that the expression of U2AF1 isoforms is controlled by mTOR and they exhibit a distinctive molecular profile for the splice site and protein interactomes. Mechanistic dissection of mutually exclusive alternative splicing events revealed that U2AF1 isoforms’ inherent differential preferences of nucleotide sequences and their stoichiometry determine the 3′-splice site. Importantly, U2AF1a-driven transcriptomes feature alternative splicing events in the 5′-untranslated region (5′-UTR) that are favorable for translation. These findings unveil distinct roles of duplicated tandem exon-derived U2AF1 isoforms in the regulation of the transcriptome and suggest U2AF1a-driven 5′-UTR alternative splicing as a molecular mechanism of mTOR-regulated translational control.more » « less