Title: DeepPASTA: deep neural network based polyadenylation site analysis
Abstract:
Motivation: Alternative polyadenylation (polyA) sites near the 3′ end of a pre-mRNA create multiple mRNA transcripts with different 3′ untranslated regions (3′ UTRs). The sequence elements of a 3′ UTR are essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, numerous studies in the literature have reported the correlation between diseases and the shortening (or lengthening) of 3′ UTRs. As alternative polyA sites are common in mammalian genes, several machine learning tools have been published for predicting polyA sites from sequence data. These tools either consider limited sequence features or use relatively old algorithms for polyA site prediction. Moreover, none of the previous tools consider RNA secondary structures as a feature to predict polyA sites.
Results: In this paper, we propose a new deep learning model, called DeepPASTA, for predicting polyA sites from both sequence and RNA secondary structure data. The model is then extended to predict tissue-specific polyA sites. Moreover, the tool can predict the most dominant (i.e. most frequently used) polyA site of a gene in a specific tissue, and the relative dominance when two polyA sites of the same gene are given. Our extensive experiments demonstrate that DeepPASTA significantly outperforms the existing tools for polyA site prediction and for tissue-specific relative and absolute dominant polyA site prediction.
Availability and implementation: https://github.com/arefeen/DeepPASTA
Supplementary information: Supplementary data are available at Bioinformatics online.
Award ID(s):
1646333
PAR ID:
10124041
Author(s) / Creator(s):
 ;  ;  ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
35
Issue:
22
ISSN:
1367-4803
Page Range / eLocation ID:
p. 4577-4585
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: RNA secondary structure (RSS) can influence the regulation of transcription, RNA processing, and protein synthesis, among other processes. 3′ untranslated regions (3′ UTRs) of mRNA also hold the key for many aspects of gene regulation. However, there are often contradictory results regarding the roles of RSS in 3′ UTRs in gene expression in different organisms and/or contexts. Results: Here, we incidentally observe that the primary substrate of miR159a (pri-miR159a), when embedded in a 3′ UTR, could promote mRNA accumulation. The enhanced expression is attributed to the earlier polyadenylation of the transcript within the hybrid pri-miR159a-3′ UTR and, resultantly, a poorly structured 3′ UTR. RNA decay assays indicate that poorly structured 3′ UTRs could promote mRNA stability, whereas highly structured 3′ UTRs destabilize mRNA in vivo. Genome-wide DMS-MaPseq also reveals the prevailing inverse relationship between 3′ UTRs’ RSS and transcript accumulation in the transcriptomes of Arabidopsis, rice, and even human. Mechanistically, transcripts with highly structured 3′ UTRs are preferentially degraded by 3′–5′ exoribonuclease SOV and 5′–3′ exoribonuclease XRN4, leading to decreased expression in Arabidopsis. Finally, we engineer different structured 3′ UTRs to an endogenous FT gene and alter the FT-regulated flowering time in Arabidopsis. Conclusions: We conclude that highly structured 3′ UTRs typically cause reduced accumulation of the harbored transcripts in Arabidopsis. This pattern extends to rice and even mammals. Furthermore, our study provides a new strategy of engineering the 3′ UTRs’ RSS to modify plant traits in agricultural production and mRNA stability in biotechnology.
  2. Structures in the 5′ untranslated regions (UTRs) of mRNAs can physically modulate translation efficiency by impeding the scanning ribosome or by sequestering the translational start site. We assessed the impact of stable protein binding in 5′- and 3′-UTRs on translation efficiency by targeting the MS2 coat protein to a reporter RNA via its hairpin recognition site. Translation was assessed from the reporter RNA when coexpressed with MS2 coat proteins of varying affinities for the RNA, and at different expression levels. Binding of high-affinity proteins in the 5′-UTR hindered translation, whereas no effect was observed when the coat protein was targeted to the 3′-UTR. Inhibition of translation increased with coat protein concentration and affinity, reaching a maximum of 50%–70%. MS2 proteins engineered to bind two reporter mRNA sites had a stronger effect than those binding a single site. Our findings demonstrate that protein binding in an mRNA 5′-UTR physically impedes translation, with the effect governed by affinity, concentration, and sterics. 
  3. The 3′ untranslated regions (3′ UTRs) of mRNAs serve as hubs for post-transcriptional control as the targets of microRNAs (miRNAs) and RNA-binding proteins (RBPs). Sequences in 3′ UTRs confer alterations in mRNA stability, direct mRNA localization to subcellular regions, and impart translational control. Thousands of mRNAs are localized to subcellular compartments in neurons—including axons, dendrites, and synapses—where they are thought to undergo local translation. Despite an established role for 3′ UTR sequences in imparting mRNA localization in neurons, the specific RNA sequences and structural features at play remain poorly understood. The nervous system selectively expresses longer 3′ UTR isoforms via alternative polyadenylation (APA). The regulation of APA in neurons and the neuronal functions of longer 3′ UTR mRNA isoforms are starting to be uncovered. Surprising roles for 3′ UTRs are emerging beyond the regulation of protein synthesis and include roles as RBP delivery scaffolds and regulators of alternative splicing. Evidence is also emerging that 3′ UTRs can be cleaved, leading to stable, isolated 3′ UTR fragments which are of unknown function. Mutations in 3′ UTRs are implicated in several neurological disorders—more studies are needed to uncover how these mutations impact gene regulation and what is their relationship to disease severity. 
  4. mRNA therapeutics are revolutionizing the pharmaceutical industry, but methods to optimize the primary sequence for increased expression are still lacking. Here, we design 5’UTRs for efficient mRNA translation using deep learning. We perform polysome profiling of fully or partially randomized 5’UTR libraries in three cell types and find that UTR performance is highly correlated across cell types. We train models on our datasets and use them to guide the design of high-performing 5’UTRs using gradient descent and generative neural networks. We experimentally test designed 5’UTRs with mRNA encoding megaTAL™ gene editing enzymes for two different gene targets and in two different cell lines. We find that the designed 5’UTRs support strong gene editing activity. Editing efficiency is correlated between cell types and gene targets, although the best performing UTR was specific to one cargo and cell type. Our results highlight the potential of model-based sequence design for mRNA therapeutics.
  5. RNA‐protein interactions play essential roles in regulating gene expression. While some RNA‐protein interactions are “specific”, that is, the RNA‐binding proteins preferentially bind to particular RNA sequence or structural motifs, others are “non‐RNA specific.” Deciphering the protein‐RNA recognition code is essential for comprehending the functional implications of these interactions and for developing new therapies for many diseases. Because of the high cost of experimental determination of protein‐RNA interfaces, there is a need for computational methods to identify RNA‐binding residues in proteins. While most of the existing computational methods for predicting RNA‐binding residues in RNA‐binding proteins are oblivious to the characteristics of the partner RNA, there is growing interest in methods for partner‐specific prediction of RNA binding sites in proteins. In this work, we assess the performance of two recently published partner‐specific protein‐RNA interface prediction tools, PS‐PRIP and PRIdictor, along with our own new tools. Specifically, we introduce a novel metric, RNA‐specificity metric (RSM), for quantifying the RNA‐specificity of the RNA binding residues predicted by such tools. Our results show that the RNA‐binding residues predicted by previously published methods are oblivious to the characteristics of the putative RNA binding partner. Moreover, when evaluated using partner‐agnostic metrics, RNA partner‐specific methods are outperformed by the state‐of‐the‐art partner‐agnostic methods. We conjecture that either (a) the protein‐RNA complexes in PDB are not representative of the protein‐RNA interactions in nature, or (b) the current methods for partner‐specific prediction of RNA‐binding residues in proteins fail to account for the differences in RNA partner‐specific versus partner‐agnostic protein‐RNA interactions, or both.