Title: Structural analysis of MALAT1 long noncoding RNA in cells and in evolution
Although not canonically polyadenylated, the long noncoding RNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is stabilized by a highly conserved 76-nt triple helix structure on its 3′ end. The entire MALAT1 transcript is over 8000 nt long in humans. The strongest structural conservation signal in MALAT1 (as measured by covariation of base pairs) is in the triple helix structure. Primary sequence analysis of covariation alone does not reveal the degree of structural conservation of the entire full-length transcript, however. Furthermore, RNA structure is often context dependent; RNA binding proteins that are differentially expressed in different cell types may alter structure. We investigate here the in-cell and cell-free structures of the full-length human and green monkey (Chlorocebus sabaeus) MALAT1 transcripts in multiple tissue-derived cell lines using SHAPE chemical probing. Our data reveal levels of uniform structural conservation in different cell lines, in cells and cell-free, and even between species, despite significant differences in primary sequence. The uniformity of the structural conservation across the entire transcript suggests that, despite seeing covariation signals only in the triple helix junction of the lncRNA, the rest of the transcript's structure is remarkably conserved, at least in primates and across multiple cell types and conditions.
Award ID(s):
2014943
PAR ID:
10483878
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Cold Spring Harbor Laboratory Press
Date Published:
Journal Name:
RNA
Volume:
29
Issue:
5
ISSN:
1355-8382
Page Range / eLocation ID:
691-704
Subject(s) / Keyword(s):
green monkey MALAT1 primates RNA structure SHAPE
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Small molecule-based modulation of a triple helix in the long non-coding RNA metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) has been proposed as an attractive avenue for cancer treatment and a model system for understanding small molecule:RNA recognition. To elucidate fundamental recognition principles and structure–function relationships, we designed and synthesized nine novel analogs of a diphenylfuran-based small molecule DPFp8, a previously identified lead binder of MALAT1. We investigated the role of recognition modalities in binding and in silico studies along with the relationship between affinity, stability and in vitro enzymatic degradation of the triple helix. Specifically, molecular docking studies identified patterns driving affinity and selectivity, including limited ligand flexibility, as observed by ligand preorganization and 3D shape complementarity for the binding pocket. The use of differential scanning fluorimetry allowed rapid evaluation of ligand-induced thermal stabilization of the triple helix, which correlated with decreased in vitro degradation of this structure by the RNase R exonuclease. The magnitude of stabilization was related to binding mode and selectivity between the triple helix and its precursor stem loop structure. Together, this work demonstrates the value of scaffold-based libraries in revealing recognition principles and of raising broadly applicable strategies, including functional assays, for small molecule–RNA targeting.
  2. Dutch, Rebecca Ellis. (Ed.)
    ABSTRACT Opium poppy mosaic virus (OPMV) is a recently discovered umbravirus in the family Tombusviridae. OPMV has a plus-sense genomic RNA (gRNA) of 4,241 nucleotides (nt) from which replication protein p35 and p35 extension product p98, the RNA-dependent RNA polymerase (RdRp), are expressed. Movement proteins p27 (long distance) and p28 (cell to cell) are expressed from a 1,440-nt subgenomic RNA (sgRNA2). A highly conserved structure was identified just upstream from the sgRNA2 transcription start site in all umbraviruses, which includes a carmovirus consensus sequence, denoting generation by an RdRp-mediated mechanism. OPMV also has a second sgRNA of 1,554 nt (sgRNA1) that starts just downstream of a canonical exoribonuclease-resistant sequence (xrRNA D). sgRNA1 codes for a 30-kDa protein in vitro that is in frame with p28 and cannot be synthesized in other umbraviruses. Eliminating sgRNA1 or truncating the p30 open reading frame (ORF) without affecting p28 substantially reduced accumulation of OPMV gRNA, suggesting a functional role for the protein. The 652-nt 3′ untranslated region of OPMV contains two 3′ cap-independent translation enhancers (3′ CITEs), a T-shaped structure (TSS) near its 3′ end, and a Barley yellow dwarf virus-like translation element (BTE) in the central region. Only the BTE is functional in luciferase reporter constructs containing gRNA or sgRNA2 5′ sequences in vivo, which differs from how umbravirus 3′ CITEs were used in a previous study. Similarly to most 3′ CITEs, the OPMV BTE links to the 5′ end via a long-distance RNA-RNA interaction. Analysis of 14 BTEs revealed additional conserved sequences and structural features beyond the previously identified 17-nt conserved sequence.
    IMPORTANCE Opium poppy mosaic virus (OPMV) is an umbravirus in the family Tombusviridae. We determined that OPMV accumulates two similarly sized subgenomic RNAs (sgRNAs), with the smaller known to code for proteins expressed from overlapping open reading frames. The slightly larger sgRNA1 has a 5′ end just upstream from a previously predicted xrRNA D site, identifying this sgRNA as an unusually long product produced by exoribonuclease trimming. Although four umbraviruses have similar predicted xrRNA D sites, only sgRNA1 of OPMV can code for a protein that is an extension product of umbravirus ORF4. Inability to generate the sgRNA or translate this protein was associated with reduced gRNA accumulation in vivo. We also characterized the OPMV BTE structure, a 3′ cap-independent translation enhancer (3′ CITE). Comparisons of 13 BTEs with the OPMV BTE revealed additional stretches of sequence similarity beyond the 17-nt signature sequence, as well as conserved structural features not previously recognized in these 3′ CITEs.
  3. The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold’s purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5′ and 3′ untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
  4. Enteroviruses comprise a significant class of pathogens that cause various human diseases ranging from the common cold to poliomyelitis, acute flaccid paralysis, and myocarditis. The enteroviral (+)-strand RNA genome replication, essential for viral proliferation, has been proposed to depend on RNA structures at the genome’s extreme 5ʹ end. Such replication-linked RNAs (REPLRs) recruit essential proteins, the host poly-C binding protein 2 (PCBP2) and viral 3CD protein (precursor of the viral protease 3C and RNA-dependent RNA polymerase D) during genome replication, but the tertiary structures and mechanisms of the enteroviral REPLRs are mainly unknown. Recently, we have determined the crystal structures of CVB3, RVB14, and RVC15 REPLRs, revealing their highly conserved H-type four-way junction folds with co-axially stacked subdomains. The sA helix stacks on the sD helix, the sB helix on the sC helix, and the structure forms a unique long-range A•C•U base-triple between the sC-loop and the sD-helix. These conserved features enabled us to perform the structural prediction of additional enteroviral REPLRs through a homology modeling approach. The structure-guided binding studies with viral 3C revealed its primary binding site being the sD tetra-loop and a dinucleotide bulge. Moreover, the human PCBP2 binding studies revealed two binding sites for this protein – the sB loop and 3ʹ spacer, which collectively bind a single PCBP2 cooperatively. We also showed that the A•C•U base-triple disruption did not affect the 3C binding but did abrogate the PCBP2 interactions with the REPLR, suggesting a crucial role of this tertiary interaction in positioning the 3C and PCBP2 binding sites within the enteroviral REPLRs. Furthermore, oligonucleotides complementary to the spacer region diminished REPLR-PCBP2 interactions, highlighting the crucial function of this single-stranded segment in recruiting PCBP2. This insight sheds light on the potential for developing therapeutics to combat enteroviral infections by targeting this replication platform.
  5. RNA macromolecules, like proteins, fold to assume shapes that are intimately connected to their broadly recognized biological functions; however, because of their high charge and dynamic nature, RNA structures are far more challenging to determine. We introduce an approach that exploits the high brilliance of x-ray free-electron laser sources to reveal the formation and ready identification of angstrom-scale features in structured and unstructured RNAs. Previously unrecognized structural signatures of RNA secondary and tertiary structures are identified through wide-angle solution scattering experiments. With millisecond time resolution, we observe an RNA fold from a dynamically varying single strand through a base-paired intermediate to assume a triple-helix conformation. While the backbone orchestrates the folding, the final structure is locked in by base stacking. This method may help to rapidly characterize and identify structural elements in nucleic acids in both equilibrium and time-resolved experiments. 