

Title: Structural analysis of MALAT1 long noncoding RNA in cells and in evolution
Although not canonically polyadenylated, the long noncoding RNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is stabilized by a highly conserved 76-nt triple helix structure on its 3′ end. The entire MALAT1 transcript is over 8000 nt long in humans. The strongest structural conservation signal in MALAT1 (as measured by covariation of base pairs) is in the triple helix structure. Primary sequence analysis of covariation alone does not reveal the degree of structural conservation of the entire full-length transcript, however. Furthermore, RNA structure is often context dependent; RNA binding proteins that are differentially expressed in different cell types may alter structure. We investigate here the in-cell and cell-free structures of the full-length human and green monkey (Chlorocebus sabaeus) MALAT1 transcripts in multiple tissue-derived cell lines using SHAPE chemical probing. Our data reveal levels of uniform structural conservation in different cell lines, in cells and cell-free, and even between species, despite significant differences in primary sequence. The uniformity of the structural conservation across the entire transcript suggests that, despite seeing covariation signals only in the triple helix junction of the lncRNA, the rest of the transcript's structure is remarkably conserved, at least in primates and across multiple cell types and conditions.
Award ID(s):
2014943
PAR ID:
10483878
Publisher / Repository:
Cold Spring Harbor Laboratory Press
Journal Name:
RNA
Volume:
29
Issue:
5
ISSN:
1355-8382
Page Range / eLocation ID:
691-704
Subject(s) / Keyword(s):
green monkey MALAT1 primates RNA structure SHAPE
Sponsoring Org:
National Science Foundation
More Like this
  1. Small molecule-based modulation of a triple helix in the long non-coding RNA metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) has been proposed as an attractive avenue for cancer treatment and a model system for understanding small molecule:RNA recognition. To elucidate fundamental recognition principles and structure–function relationships, we designed and synthesized nine novel analogs of a diphenylfuran-based small molecule DPFp8, a previously identified lead binder of MALAT1. We investigated the role of recognition modalities in binding and in silico studies along with the relationship between affinity, stability and in vitro enzymatic degradation of the triple helix. Specifically, molecular docking studies identified patterns driving affinity and selectivity, including limited ligand flexibility, as observed by ligand preorganization and 3D shape complementarity for the binding pocket. The use of differential scanning fluorimetry allowed rapid evaluation of ligand-induced thermal stabilization of the triple helix, which correlated with decreased in vitro degradation of this structure by the RNase R exonuclease. The magnitude of stabilization was related to binding mode and selectivity between the triple helix and its precursor stem loop structure. Together, this work demonstrates the value of scaffold-based libraries in revealing recognition principles and of raising broadly applicable strategies, including functional assays, for small molecule–RNA targeting.
  2. Dutch, Rebecca Ellis. (Ed.)
    ABSTRACT Opium poppy mosaic virus (OPMV) is a recently discovered umbravirus in the family Tombusviridae. OPMV has a plus-sense genomic RNA (gRNA) of 4,241 nucleotides (nt) from which replication protein p35 and p35 extension product p98, the RNA-dependent RNA polymerase (RdRp), are expressed. Movement proteins p27 (long distance) and p28 (cell to cell) are expressed from a 1,440-nt subgenomic RNA (sgRNA2). A highly conserved structure was identified just upstream from the sgRNA2 transcription start site in all umbraviruses, which includes a carmovirus consensus sequence, denoting generation by an RdRp-mediated mechanism. OPMV also has a second sgRNA of 1,554 nt (sgRNA1) that starts just downstream of a canonical exoribonuclease-resistant sequence (xrRNA D). sgRNA1 codes for a 30-kDa protein in vitro that is in frame with p28 and cannot be synthesized in other umbraviruses. Eliminating sgRNA1 or truncating the p30 open reading frame (ORF) without affecting p28 substantially reduced accumulation of OPMV gRNA, suggesting a functional role for the protein. The 652-nt 3′ untranslated region of OPMV contains two 3′ cap-independent translation enhancers (3′ CITEs), a T-shaped structure (TSS) near its 3′ end, and a Barley yellow dwarf virus-like translation element (BTE) in the central region. Only the BTE is functional in luciferase reporter constructs containing gRNA or sgRNA2 5′ sequences in vivo, which differs from how umbravirus 3′ CITEs were used in a previous study. Similarly to most 3′ CITEs, the OPMV BTE links to the 5′ end via a long-distance RNA-RNA interaction. Analysis of 14 BTEs revealed additional conserved sequences and structural features beyond the previously identified 17-nt conserved sequence.
    IMPORTANCE Opium poppy mosaic virus (OPMV) is an umbravirus in the family Tombusviridae. We determined that OPMV accumulates two similarly sized subgenomic RNAs (sgRNAs), with the smaller known to code for proteins expressed from overlapping open reading frames. The slightly larger sgRNA1 has a 5′ end just upstream from a previously predicted xrRNA D site, identifying this sgRNA as an unusually long product produced by exoribonuclease trimming. Although four umbraviruses have similar predicted xrRNA D sites, only sgRNA1 of OPMV can code for a protein that is an extension product of umbravirus ORF4. Inability to generate the sgRNA or translate this protein was associated with reduced gRNA accumulation in vivo. We also characterized the OPMV BTE structure, a 3′ cap-independent translation enhancer (3′ CITE). Comparisons of 13 BTEs with the OPMV BTE revealed additional stretches of sequence similarity beyond the 17-nt signature sequence, as well as conserved structural features not previously recognized in these 3′ CITEs.
  3. The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold’s purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5′ and 3′ untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
  4. RNA macromolecules, like proteins, fold to assume shapes that are intimately connected to their broadly recognized biological functions; however, because of their high charge and dynamic nature, RNA structures are far more challenging to determine. We introduce an approach that exploits the high brilliance of x-ray free-electron laser sources to reveal the formation and ready identification of angstrom-scale features in structured and unstructured RNAs. Previously unrecognized structural signatures of RNA secondary and tertiary structures are identified through wide-angle solution scattering experiments. With millisecond time resolution, we observe an RNA fold from a dynamically varying single strand through a base-paired intermediate to assume a triple-helix conformation. While the backbone orchestrates the folding, the final structure is locked in by base stacking. This method may help to rapidly characterize and identify structural elements in nucleic acids in both equilibrium and time-resolved experiments. 
  5. Telomerase is essential for maintaining telomere integrity. Although telomerase function is widely conserved, the integral telomerase RNA (TR) that provides a template for telomeric DNA synthesis has diverged dramatically. Nevertheless, TR molecules retain 2 highly conserved structural domains critical for catalysis: a template-proximal pseudoknot (PK) structure and a downstream stem-loop structure. Here we introduce the authentic TR from the plant Arabidopsis thaliana, called AtTR, identified through next-generation sequencing of RNAs copurifying with Arabidopsis TERT. This RNA is distinct from the RNA previously described as the templating telomerase RNA, AtTER1. AtTR is a 268-nt Pol III transcript necessary for telomere maintenance in vivo and sufficient with TERT to reconstitute telomerase activity in vitro. Bioinformatics analysis identified 85 AtTR orthologs from 3 major clades of plants: angiosperms, gymnosperms, and lycophytes. Through phylogenetic comparisons, a secondary structure model conserved among plant TRs was inferred and verified using in vitro and in vivo chemical probing. The conserved plant TR structure contains a template-PK core domain enclosed by a P1 stem and a 3′ long-stem P4/5/6, both of which resemble a corresponding structural element in ciliate and vertebrate TRs. However, the plant TR contains additional stems and linkers within the template-PK core, allowing for expansion of PK structure from the simple PK in the smaller ciliate TR during evolution. Thus, the plant TR provides an evolutionary bridge that unites the disparate structures of previously characterized TRs from ciliates and vertebrates.