

Title: Smar2C2: A Simple and Efficient Protocol for the Identification of Transcription Start Sites
Abstract Promoters and the noncoding sequences that drive their function are fundamental aspects of genes that are critical to their regulation. The transcription preinitiation complex binds and assembles on promoters where it facilitates transcription. The transcription start site (TSS) is located downstream of the promoter sequence and is defined as the location in the genome where polymerase begins transcribing DNA into RNA. Knowing the location of TSSs is useful for annotation of genes, identification of non‐coding sequences important to gene regulation, detection of alternative TSSs, and understanding of 5′ UTR content. Several existing techniques make it possible to accurately identify TSSs, but they are often difficult to perform experimentally, require large amounts of input RNA, or are unable to identify a large number of TSSs from a single sample. Many of these protocols take advantage of template switching reverse transcriptases (TSRTs), which reliably place an adaptor at the 5′ end of a first strand synthesis of cDNA. Here, we introduce a protocol that exploits TSRT activity combined with rolling circle amplification to identify TSSs with several unique advantages over existing methods. Sequence adaptors are placed on the 5′ and 3′ ends of the full‐length cDNA copy of a transcript. A splint compatible with those adaptors is then used to circularize the full‐length cDNA. Linear DNA containing concatemers of the cDNA is generated using rolling circle amplification, and a sequencing library is formed by fragmenting the concatemers. This protocol is straightforward to execute, requiring limited bench time with relatively stable reagents. Using extremely low amounts of RNA input, this protocol produces large numbers of accurate, deduplicated TSSs genome-wide. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Splint generation
Basic Protocol 2: RNA extraction
Basic Protocol 3: cDNA synthesis
Basic Protocol 4: cDNA circularization and amplification
Basic Protocol 5: Library generation
Award ID(s):
1856627
PAR ID:
10403626
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Current Protocols
Volume:
3
Issue:
3
ISSN:
2691-1299
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Regulation of gene expression is a fundamental biological process that relies on transcription factors (TFs) recognizing specific cis motifs in the regulatory regions of the genes that they control. In most eukaryotic organisms, cis-regulatory elements are significantly enriched around the transcription start site (TSS). However, unlike other genic features, TSSs need to be experimentally determined, after which they become important components of genome annotations. One of the methods for experimentally determining TSSs at the genome-wide level is CAGE (cap analysis of gene expression). This chapter describes how to prepare a CAGE library for sequencing, starting with RNA extraction, library construction, and quality controls before proceeding to sequencing on the Illumina platform. We then describe how to use a computational pipeline to determine, from the alignment of CAGE tags, the genome-wide location of TSSs, followed by the statistical approaches required to cluster TSSs that operate as transcriptional units and to determine core promoter properties such as shape. The analyses described here focus on maize, since its large and still poorly annotated genome creates some unique challenges, but with some modifications they can be easily adapted for other organisms as well.
  2. Abstract Transcription rates are regulated by the interactions between RNA polymerase, sigma factor, and promoter DNA sequences in bacteria. However, it remains unclear how non-canonical sequence motifs collectively control transcription rates. Here, we combine massively parallel assays, biophysics, and machine learning to develop a 346-parameter model that predicts site-specific transcription initiation rates for any σ70 promoter sequence, validated across 22132 bacterial promoters with diverse sequences. We apply the model to predict genetic context effects, design σ70 promoters with desired transcription rates, and identify undesired promoters inside engineered genetic systems. The model provides a biophysical basis for understanding gene regulation in natural genetic systems and precise transcriptional control for engineering synthetic genetic systems.
  3. Polen, Tino (Ed.)
    ABSTRACT Regulation of gene expression is a vital component of cellular biology. Transcription factor proteins often bind regulatory DNA sequences upstream of transcription start sites to facilitate the activation or repression of RNA polymerase. Research laboratories have devoted many projects to understanding the transcription regulatory networks for transcription factors, as these regulated genes provide critical insight into the biology of the host organism. Various in vivo and in vitro assays have been developed to elucidate transcription regulatory networks. Several assays, including SELEX-seq and ChIP-seq, capture DNA-bound transcription factors to determine the preferred DNA-binding sequences, which can then be mapped to the host organism’s genome to identify candidate regulatory genes. In this protocol, we describe an alternative in vitro, iterative selection approach to ascertaining DNA-binding sequences of a transcription factor of interest using restriction endonuclease, protection, selection, and amplification (REPSA). Contrary to traditional antibody-based capture methods, REPSA selects for transcription factor-bound DNA sequences by challenging binding reactions with a type IIS restriction endonuclease. Cleavage-resistant DNA species are amplified by PCR and then used as inputs for the next round of REPSA. This process is repeated until a protected DNA species is observed by gel electrophoresis, which is an indication of a successful REPSA experiment. Subsequent high-throughput sequencing of REPSA-selected DNAs accompanied by motif discovery and scanning analyses can be used for determining transcription factor consensus binding sequences and potential regulated genes, providing critical first steps in determining organisms’ transcription regulatory networks.
    IMPORTANCE Transcription regulatory proteins are an essential class of proteins that help maintain cellular homeostasis by adapting the transcriptome based on environmental cues. Dysregulation of transcription factors can lead to diseases such as cancer, and many eukaryotic and prokaryotic transcription factors have become enticing therapeutic targets. Additionally, in many understudied organisms, the transcription regulatory networks for uncharacterized transcription factors remain unknown. As such, the need for experimental techniques to establish transcription regulatory networks is paramount. Here, we describe a step-by-step protocol for REPSA, an inexpensive, iterative selection technique to identify transcription factor-binding sequences without the need for antibody-based capture methods.
  4. Abstract Transcription initiation is regulated in a highly organized fashion to ensure proper cellular functions. Accurate identification of transcription start sites (TSSs) and quantitative characterization of transcription initiation activities are fundamental steps for studies of regulated transcription and core promoter structures. Several high-throughput techniques have been developed to sequence the very 5′ end of RNA transcripts (TSS sequencing) on the genome scale. Bioinformatics tools are essential for processing, analysis, and visualization of TSS sequencing data. Here, we present TSSr, an R package that provides rich functions for mapping TSSs and characterizing the structures and activities of core promoters based on all types of TSS sequencing data. Specifically, TSSr implements several newly developed algorithms for accurately identifying TSSs from mapped sequencing reads and inferring core promoters, which are prerequisites for subsequent functional analyses of TSS data. Furthermore, TSSr enables users to export various types of TSS data that can be visualized in a genome browser for inspection of promoter activities in association with other genomic features, and to generate publication-ready TSS graphs. These user-friendly features could greatly facilitate studies of transcription initiation based on TSS sequencing data. The source code and detailed documentation of TSSr can be freely accessed at https://github.com/Linlab-slu/TSSr.
  5. Abstract This article contains detailed synthetic protocols for preparation of 5‐cyanomethyluridine (cnm5U) and 5‐cyanouridine (cn5U) phosphoramidites. The synthesis of the cnm5U phosphoramidite building block starts with commercially available 5‐methyluridine (m5U), followed by bromination of the 5‐methyl group to install the cyano moiety using TMSCN/TBAF. The cn5U phosphoramidite is obtained by regular Vorbrüggen glycosylation of the protected ribofuranose with silylated 5‐cyanouracil. These two modified phosphoramidites are suitable for synthesis of RNA oligonucleotides on solid phase using conventional amidite chemistry. Our protocol provides access to two novel building blocks for constructing RNA‐based therapeutics. © 2020 Wiley Periodicals LLC. Basic Protocol 1: Preparation of cnm5U and cn5U phosphoramidites Basic Protocol 2: Synthesis, purification, and characterization of cnm5U‐ and cn5U‐modified RNA oligonucleotides
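Several of the abstracts above mention identifying TSSs from the 5′ ends of mapped reads and clustering nearby TSSs into units. As a rough illustration only, the sketch below counts 5′ end positions and merges sites that lie within a fixed distance of each other. It is not the algorithm used by Smar2C2, the CAGE pipeline, or TSSr. The function names, the `(chrom, strand, position)` input format, and the `max_gap` threshold of 20 bp are all assumptions made for this example.

```python
from collections import Counter


def count_five_prime_ends(reads):
    """Count reads per (chrom, strand, 5' position).

    `reads` is a hypothetical iterable of (chrom, strand, five_prime_pos)
    tuples, e.g. extracted beforehand from an alignment file.
    """
    return Counter(reads)


def _summarize(chrom, strand, sites):
    # The dominant TSS is the position with the most reads;
    # ties are broken in favour of the smaller coordinate.
    dominant_pos, _ = max(sites, key=lambda s: (s[1], -s[0]))
    return {
        "chrom": chrom,
        "strand": strand,
        "start": sites[0][0],
        "end": sites[-1][0],
        "dominant_tss": dominant_pos,
        "tags": sum(n for _, n in sites),
    }


def cluster_tss(counts, max_gap=20):
    """Merge TSS positions on the same chrom/strand that are <= max_gap apart.

    Simple distance-based clustering (an assumption for illustration).
    """
    by_key = {}
    for (chrom, strand, pos), n in counts.items():
        by_key.setdefault((chrom, strand), []).append((pos, n))

    clusters = []
    for (chrom, strand), sites in sorted(by_key.items()):
        sites.sort()
        current = [sites[0]]
        for site in sites[1:]:
            if site[0] - current[-1][0] <= max_gap:
                current.append(site)
            else:
                clusters.append(_summarize(chrom, strand, current))
                current = [site]
        clusters.append(_summarize(chrom, strand, current))
    return clusters


# Toy input: made-up positions, not real data.
reads = (
    [("chr1", "+", 100)] * 3
    + [("chr1", "+", 105)] * 5
    + [("chr1", "+", 300)]
    + [("chr1", "-", 105)] * 2
)
clusters = cluster_tss(count_five_prime_ends(reads))
for c in clusters:
    print(c)
```

A fixed-distance threshold is the simplest possible choice; published tools typically add expression filters and more careful cluster definitions, so treat this only as a starting point for reading their documentation.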