

Title: TSSr: an R package for comprehensive analyses of TSS sequencing data
Abstract

Transcription initiation is regulated in a highly organized fashion to ensure proper cellular functions. Accurate identification of transcription start sites (TSSs) and quantitative characterization of transcription initiation activities are fundamental steps for studies of regulated transcription and core promoter structures. Several high-throughput techniques have been developed to sequence the very 5′ end of RNA transcripts (TSS sequencing) on the genome scale. Bioinformatics tools are essential for processing, analysis, and visualization of TSS sequencing data. Here, we present TSSr, an R package that provides rich functions for mapping TSSs and characterizing the structures and activities of core promoters based on all types of TSS sequencing data. Specifically, TSSr implements several newly developed algorithms for accurately identifying TSSs from mapped sequencing reads and inferring core promoters, which are prerequisites for subsequent functional analyses of TSS data. Furthermore, TSSr enables users to export various types of TSS data that can be visualized in a genome browser for inspection of promoter activities in association with other genomic features, and to generate publication-ready TSS graphs. These user-friendly features could greatly facilitate studies of transcription initiation based on TSS sequencing data. The source code and detailed documentation of TSSr can be freely accessed at https://github.com/Linlab-slu/TSSr.

 
Award ID(s):
1951332
NSF-PAR ID:
10307831
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
NAR Genomics and Bioinformatics
Volume:
3
Issue:
4
ISSN:
2631-9268
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The transcription initiation landscape of eukaryotic genes is complex and highly dynamic. In eukaryotes, genes can generate multiple transcript variants that differ in their 5′ boundaries due to the use of alternative transcription start sites (TSSs), and the abundance of transcript isoforms is highly variable. Because of the large number and complexity of TSSs, it is not feasible to depict the details of the transcription initiation landscape of all genes using text-format genome annotation files. Therefore, it is necessary to provide data visualization of TSSs to represent quantitative TSS maps and the core promoters (CPs). In addition, the selection and activity of TSSs are influenced by various factors, such as transcription factors, chromatin remodeling and histone modifications. Thus, integration and visualization of functional genomic data related to these features could provide a better understanding of gene promoter architecture and the regulatory mechanisms of transcription initiation. Yeast species play important roles in research and human society, yet no database provides visualization and integration of functional genomic data in yeast. Here, we generated quantitative TSS maps for 12 important yeast species, inferred their CPs and built a public database, YeasTSS (www.yeastss.org). YeasTSS was designed as a central portal for visualization and integration of the TSS maps, CPs and functional genomic data related to transcription initiation in yeast. YeasTSS is expected to benefit the research community and public education by improving genome annotation, studies of promoter structure and the regulated control of transcription initiation, and the inference of gene regulatory networks.
  2. Regulation of gene expression starts with transcription initiation. Regulated transcription initiation is critical for generating correct transcripts with proper abundance. The impact of epigenetic control, such as histone modifications and chromatin remodelling, on gene regulation has been extensively investigated, but its specific role in regulating transcription initiation is far from well understood. Here we aimed to better understand the roles of genes involved in histone H3 methylation and chromatin remodelling in the regulation of transcription initiation at a genome scale, using budding yeast as a study system. We obtained maps of transcription start sites (TSSs) at single-nucleotide resolution by nAnT-iCAGE for a strain with depletion of MINC (Mot1-Ino80C-Nc2) by Mot1p and Ino80p anchor-away (Mot1&Ino80AA) and a strain with loss of histone methylation (set1Δset2Δdot1Δ), and compared them with those of their wild-type controls. Our study showed that the depletion of MINC stimulated transcription initiation from many new sites flanking the dominant TSS of genes, while the loss of histone methylation generated more TSSs in the coding region. Moreover, the depletion of MINC led to less confined boundaries of TSS clusters (TCs) and resulted in broader core promoters, and such patterns are not present in the ssdΔ mutant. Our data also show that MINC has distinctive impacts on TATA-containing and TATA-less promoters. In conclusion, our study shows that MINC is required for accurate identification of bona fide TSSs, particularly in TATA-containing promoters, and that histone methylation contributes to the repression of transcription initiation in coding regions.
  3. Abstract

    Promoters and the noncoding sequences that drive their function are fundamental aspects of genes that are critical to their regulation. The transcription preinitiation complex binds and assembles on promoters, where it facilitates transcription. The transcription start site (TSS) is located downstream of the promoter sequence and is defined as the location in the genome where polymerase begins transcribing DNA into RNA. Knowing the location of TSSs is useful for annotation of genes, identification of non‐coding sequences important to gene regulation, detection of alternative TSSs, and understanding of 5′ UTR content. Several existing techniques make it possible to accurately identify TSSs, but they are often difficult to perform experimentally, require large amounts of input RNA, or are unable to identify a large number of TSSs from a single sample. Many of these protocols take advantage of template switching reverse transcriptases (TSRTs), which reliably place an adaptor at the 5′ end of a first strand synthesis of cDNA. Here, we introduce a protocol that exploits TSRT activity combined with rolling circle amplification to identify TSSs with several unique advantages over existing methods. Sequence adaptors are placed on the 5′ and 3′ ends of the full‐length cDNA copy of a transcript. A splint compatible with those adaptors is then used to circularize the full‐length cDNA. Linear DNA containing concatemers of the cDNA is generated using rolling circle amplification, and a sequencing library is formed by fragmenting the concatemers. This protocol is straightforward to execute, requiring limited bench time with relatively stable reagents. Using extremely low amounts of RNA input, this protocol produces large numbers of accurate, deduplicated TSSs genome wide. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.

    Basic Protocol 1: Splint generation

    Basic Protocol 2: RNA extraction

    Basic Protocol 3: cDNA synthesis

    Basic Protocol 4: cDNA circularization and amplification

    Basic Protocol 5: Library generation

     
  4. Faust, Karoline (Ed.)
    ABSTRACT Much of our knowledge of bacterial transcription initiation has been derived from studying the promoters of Escherichia coli and Bacillus subtilis. Given the expansive diversity across the bacterial phylogeny, it is unclear how much of this knowledge can be applied to other organisms. Here, we report on bioinformatic analyses of promoter sequences of the primary σ factor (σ⁷⁰) by leveraging publicly available transcription start site (TSS) sequencing data sets for nine bacterial species spanning five phyla. This analysis identifies previously unreported differences in the −35 and −10 elements of σ⁷⁰-dependent promoters in several groups of bacteria. We found that Actinobacteria and Betaproteobacteria σ⁷⁰-dependent promoters lack the TTG triad in their −35 element, which is predicted to be conserved across the bacterial phyla. In addition, the majority of the Alphaproteobacteria σ⁷⁰-dependent promoters analyzed lacked the thymine at position −7 that is highly conserved in other phyla. Bioinformatic examination of the Alphaproteobacteria σ⁷⁰-dependent promoters identifies a significant overrepresentation of essential genes and ones encoding proteins with common cellular functions downstream of promoters containing an A, C, or G at position −7. We propose that transcription of many σ⁷⁰-dependent promoters in Alphaproteobacteria depends on the transcription factor CarD, which is an essential protein in several members of this phylum. Our analysis expands the knowledge of promoter architecture across the bacterial phylogeny and provides new information that can be used to engineer bacteria for use in medical, environmental, agricultural, and biotechnological processes.

    IMPORTANCE Transcription of DNA to RNA by RNA polymerase is essential for cells to grow, develop, and respond to stress. Understanding the process and control of transcription is important for health, disease, the environment, and biotechnology. Decades of research on a few bacteria have identified promoter DNA sequences that are recognized by the σ subunit of RNA polymerase. We used bioinformatic analyses to reveal previously unreported differences in promoter DNA sequences across the bacterial phylogeny. We found that many Actinobacteria and Betaproteobacteria promoters lack a sequence in their −35 DNA recognition element that was previously assumed to be conserved and that Alphaproteobacteria lack a thymine residue at position −7, also previously assumed to be conserved. Our work reports important new information about bacterial transcription, illustrates the benefits of studying bacteria across the phylogenetic tree, and proposes new lines of future investigation.
  5. Regulation of gene expression is a fundamental biological process that relies on transcription factors (TFs) recognizing specific cis motifs in the regulatory regions of the genes that they control. In most eukaryotic organisms, cis-regulatory elements are significantly enriched around the transcription start site (TSS). However, unlike other genic features, TSSs need to be determined experimentally, after which they become important components of genome annotations. One of the methods for experimentally determining TSSs at the genome-wide level is CAGE (cap analysis of gene expression). This chapter describes how to prepare a CAGE library for sequencing, starting with RNA extraction, library construction, and quality controls before proceeding to sequencing on the Illumina platform. We then describe how to use a computational pipeline to determine, from the alignment of CAGE tags, the genome-wide location of TSSs, followed by the statistical approaches required to cluster TSSs that operate as transcriptional units and to determine core promoter properties such as shape. The analyses described here focus on maize, since its large and yet deficiently annotated genome creates some unique challenges, but with some modifications they can be easily adapted for other organisms as well.