
Title: Improved modeling of RNA-binding protein motifs in an interpretable neural model of RNA splicing

Sequence-specific RNA-binding proteins (RBPs) play central roles in splicing decisions. Here, we describe a modular splicing architecture that leverages in vitro-derived RNA affinity models for 79 human RBPs and the annotated human genome to produce improved models of RBP binding and activity. Binding and activity are modeled by separate Motif and Aggregator components that can be mixed and matched, enforcing sparsity to improve interpretability. Training a new Adjusted Motif (AM) architecture on the splicing task not only yields better splicing predictions but also improves prediction of RBP-binding sites in vivo and of splicing activity, assessed using independent data.
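The separation of binding (Motif) from activity (Aggregator) described above can be illustrated with a toy sketch. This is not the paper's Adjusted Motif architecture, which is a trained neural model. It is a minimal pure-Python illustration under stated assumptions: the motif scores, the threshold-based sparsity rule, the activity weights, and every name (`TOY_MOTIFS`, `motif_layer`, `aggregator`) are hypothetical and chosen only to show how the two components could be kept separate.

```python
import math

# Hypothetical per-position scores for two invented 3-nt motifs.
# Real affinity models would come from in vitro binding data.
TOY_MOTIFS = {
    "RBP_A": [  # prefers GCA
        {"A": -1.0, "C": -1.0, "G": 1.0, "U": -1.0},
        {"A": -1.0, "C": 1.0, "G": -1.0, "U": -1.0},
        {"A": 1.0, "C": -1.0, "G": -1.0, "U": -1.0},
    ],
    "RBP_B": [  # prefers UUU
        {"A": -1.0, "C": -1.0, "G": -1.0, "U": 1.0},
        {"A": -1.0, "C": -1.0, "G": -1.0, "U": 1.0},
        {"A": -1.0, "C": -1.0, "G": -1.0, "U": 1.0},
    ],
}


def motif_scores(seq, pwm):
    """Score every window of len(pwm) nucleotides in seq."""
    width = len(pwm)
    return [
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    ]


def sparsify(scores, threshold):
    """Assumed sparsity rule: zero out every score below the threshold."""
    return [s if s >= threshold else 0.0 for s in scores]


def motif_layer(seq, motifs, threshold):
    """Binding component: sparse per-position activations for each RBP."""
    return {
        name: sparsify(motif_scores(seq, pwm), threshold)
        for name, pwm in motifs.items()
    }


def aggregator(activations, activity_weights, bias=0.0):
    """Activity component: weighted sum of binding, squashed to (0, 1)."""
    total = bias + sum(
        activity_weights[name] * sum(acts)
        for name, acts in activations.items()
    )
    return 1.0 / (1.0 + math.exp(-total))


seq = "AAGCAUUUA"
acts = motif_layer(seq, TOY_MOTIFS, threshold=3.0)
# One enhancer-like and one silencer-like weight (both invented).
prob = aggregator(acts, {"RBP_A": 1.0, "RBP_B": -1.0})
```

Because the motif layer returns named, mostly zero activations, each nonzero entry can be read as a putative binding site for one RBP, and the aggregator can be swapped out without touching the binding component.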

Publisher / Repository:
Springer Science + Business Media
Journal Name:
Genome Biology
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. RNA-binding proteins (RBPs) participate in all stages of the RNA life cycle, from transcription and splicing to translation. Under the ENCODE project, a large number of RBPs were knocked down in human cancer cell lines, offering an excellent opportunity to infer targets of RBPs. Taking both RBP binding sites and RNA-seq profiles of RBP knockdown samples as input, we present a pipeline to identify causal RBP-RNA interactions. The pipeline employs a recent functional chi-square test (FunChisq) that deciphers directional association, and utilizes a novel functional index that measures the effect size of functional dependency. We examined ~45 million RBP-RNA pairs in leukemia (K562) and liver cancer (HepG2) cell lines for functional patterns as causal interaction candidates. Here, we report a total of 936,707 RBP-RNA pairs in the two cell lines that show statistically significant linear or nonlinear functional patterns. About 31% of these pairs have supportive biological evidence from other sources, suggesting the effectiveness of the pipeline. The interactions constitute RBP-specific regulatory networks that may represent core mechanisms in the two cancers. The pipeline is implemented through an R interface with pre-computed results and data libraries for users to query specific networks and visualize RBP-RNA interactions. Such networks serve as a useful resource for studying RNA dysregulation in cancer.
  2. RNA-binding proteins (RBPs) regulate all aspects of RNA biogenesis, from transcription, splicing, and translation to degradation, and they have a critical role in cellular homeostasis and functional diversity. Recent studies have indicated that altered expression of RBPs is associated with many human diseases ranging from neurologic disorders to cancer. The transcriptional coregulator yes-associated protein 1 (YAP1), a critical nuclear effector of the mammalian Hippo pathway, regulates cell fate, cell contact, metabolism, and developmental processes. This study demonstrates a link between YAP1 and nucleophosmin 1 (NPM1) protein. NPM1 is an RNA-binding protein that regulates many cellular activities, including ribosome biogenesis, RNA processing, chromatin remodeling, DNA repair, and genomic stability. We identified NPM1 from YAP1 protein complexes of androgen-responsive human cancer cells using proteomics approaches. Our proximity ligation assay demonstrated that YAP1 and NPM1 physically interacted with each other. The interaction between YAP1 and NPM1 occurred in cell nuclei and was regulated by androgen hormone signaling. In addition, our GST-pulldown assay demonstrated that NPM1 formed a protein complex with the proline-rich domain of YAP1. Furthermore, our enhanced RNA interactome capture (eRIC) assay showed that androgen also regulated the interaction of RBPs with polyA+ mRNA within the cell. Consistent with this observation, our eRIC assay combined with mass spectrometry enabled us to identify distinct RBP patterns in human cancer cells that are genetically related but phenotypically different. These observations indicate that global alterations of RBPs under changing environmental conditions may have essential roles in cellular physiology and disease biology.
  3. RNA-binding proteins (RBPs) modulate alternative splicing outcomes to determine isoform expression and cellular survival. To identify RBPs that directly drive alternative exon inclusion, we developed tethered function luciferase-based splicing reporters that provide rapid, scalable and robust readouts of exon inclusion changes and used these to evaluate 718 human RBPs. We performed enhanced cross-linking immunoprecipitation, RNA sequencing and affinity purification–mass spectrometry to investigate a subset of candidates with no prior association with splicing. Integrative analysis of these assays indicates surprising roles for TRNAU1AP, SCAF8 and RTCA in the modulation of hundreds of endogenous splicing events. We also leveraged our tethering assays and top candidates to identify potent and compact exon inclusion activation domains for splicing modulation applications. Using these identified domains, we engineered programmable fusion proteins that outperform current artificial splicing factors at manipulating inclusion of reporter and endogenous exons. This tethering approach characterizes the ability of RBPs to induce exon inclusion and yields new molecular parts for programmable splicing control.

  4. Objective: RNA-binding proteins (RBPs) are important regulators of gene expression that influence mRNA splicing, stability, localization, transport, and translational control. In particular, RBPs play an important role in neurons, which have a complex morphology. Previously, we showed that many RBPs play a conserved role in dendrite development in Drosophila dendritic arborization neurons and Caenorhabditis elegans (C. elegans) PVD neurons, including the cytoplasmic polyadenylation element binding proteins (CPEBs), Orb in Drosophila and CPB-3 in C. elegans, and the DEAD box RNA helicases, Me31B in Drosophila and CGH-1 in C. elegans. During these studies, we observed that fluorescently labeled CPB-3 and CGH-1 localize to cytoplasmic particles that are motile, and our research aims to further characterize these RBP-containing particles in live neurons.

    Here we extend previous work to show that CPB-3 and CGH-1 localize to motile particles within dendrites that move at a speed consistent with microtubule-based transport. This is consistent with a model in which CPB-3 and CGH-1 influence dendrite development through the transport and localization of their mRNA targets. Moreover, CPB-3 and CGH-1 rarely localize to the same particles, suggesting that these RBPs function in discrete ribonucleoprotein particles (RNPs) that may regulate distinct mRNAs.

  5. Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing, as well as the prediction scopes in these studies, have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and developing deep learning-based methods, including both hierarchical and multi-class approaches, to predict the types of NABPs for any given protein. The deep learning models employ two convolutional neural network layers and one long short-term memory layer. Our approaches outperform existing DBP and RBP predictors with a balanced prediction between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for the DBPs, with ~12% improvement. Moreover, we explored the prediction accuracy of single-stranded DNA-binding proteins and their effect on the overall accuracy of NABP predictions.
