Title: Single-cell Iso-Sequencing enables rapid genome annotation for scRNAseq analysis
Abstract: Single-cell RNA sequencing is a powerful technique that continues to expand across various biological applications. However, incomplete 3′-UTR annotations can impede single-cell analysis, resulting in genes that are partially or completely uncounted. Performing single-cell RNA sequencing with incomplete 3′-UTR annotations can hinder the identification of cell identities and gene expression patterns and lead to erroneous biological inferences. We demonstrate that performing single-cell isoform sequencing in tandem with single-cell RNA sequencing can rapidly improve 3′-UTR annotations. Using threespine stickleback fish (Gasterosteus aculeatus), we show that gene models resulting from a minimal embryonic single-cell isoform sequencing dataset retained 26.1% more single-cell RNA sequencing reads than gene models from Ensembl alone. Furthermore, pooling our single-cell sequencing isoforms with a previously published adult bulk Iso-Seq dataset from stickleback, and merging the annotation with the Ensembl gene models, resulted in a marginal improvement (+0.8%) over the single-cell isoform sequencing only dataset. In addition, isoforms identified by single-cell isoform sequencing included thousands of new splicing variants. The improved gene models obtained using single-cell isoform sequencing led to successful identification of cell types and increased the reads identified for many genes in our single-cell RNA sequencing stickleback dataset. Our work illuminates single-cell isoform sequencing as a cost-effective and efficient mechanism to rapidly annotate genomes for single-cell RNA sequencing.
Award ID(s):
2015301
PAR ID:
10364150
Publisher / Repository:
Oxford University Press
Journal Name:
Genetics
Volume:
220
Issue:
3
ISSN:
1943-2631
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Alternate isoforms are important contributors to phenotypic diversity across eukaryotes. Although short-read RNA-sequencing has increased our understanding of isoform diversity, it is challenging to accurately detect full-length transcripts, preventing the identification of many alternate isoforms. Long-read sequencing technologies have made it possible to sequence full-length alternative transcripts, accurately characterizing alternative splicing events, alternate transcription start and end sites, and differences in UTR regions. Here, we use Pacific Biosciences (PacBio) long-read RNA-sequencing (Iso-Seq) to examine the transcriptomes of five organs in threespine stickleback fish (Gasterosteus aculeatus), a widely used genetic model species. The threespine stickleback fish has a refined genome assembly in which gene annotations are based on short-read RNA sequencing and predictions from coding sequence of other species. This suggests some of the existing annotations may be inaccurate or alternative transcripts may not be fully characterized. Using Iso-Seq we detected thousands of novel isoforms, indicating many isoforms are absent in the current Ensembl gene annotations. In addition, we refined many of the existing annotations within the genome. We noted many improperly positioned transcription start sites that were refined with long-read sequencing. The Iso-Seq-predicted transcription start sites were more accurate and verified through ATAC-seq. We also detected many alternative splicing events between sexes and across organs. We found a substantial number of genes in both somatic and gonadal samples that had sex-specific isoforms. Our study highlights the power of long-read sequencing to study the complexity of transcriptomes, greatly improving genomic resources for the threespine stickleback fish.
  2. Abstract. Background: The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3′-untranslated region (3′-UTR) of mRNA produces transcripts with shorter or longer 3′-UTR. Often, 3′-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3′-UTR APA is known to modulate translation and provides a means to regulate gene expression at the post-transcriptional level. Current bioinformatics pipelines have limited capability in profiling 3′-UTR APA events due to incomplete annotations and a low-resolution analyzing power: widely available bioinformatics pipelines do not reference actionable polyadenylation (cleavage) sites but simulate 3′-UTR APA only using RNA-seq read coverage, causing false positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3′-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations. Methods: APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3′-UTR transcripts in the RNA-seq data. APA-Scan works in three major steps: (i) calculate the read coverage of the 3′-UTR regions of genes; (ii) identify the potential APA sites and evaluate the significance of the events between two biological conditions; (iii) produce a graphical representation of a user-specific event with 3′-UTR annotation and read coverage on the 3′-UTR regions. APA-Scan is implemented in Python3. Source code and a comprehensive user’s manual are freely available at https://github.com/compbiolabucf/APA-Scan. Results: APA-Scan was applied to both simulated and real RNA-seq datasets and compared with two widely used baselines, DaPars and APAtrap. In simulation, APA-Scan significantly improved the accuracy of 3′-UTR APA identification compared to the other baselines. The performance of APA-Scan was also validated by 3′-end-seq data and qPCR on mouse embryonic fibroblast cells. The experiments confirm that APA-Scan can detect unannotated 3′-UTR APA events and improve genome annotation. Conclusion: APA-Scan is a comprehensive computational pipeline to detect transcriptome-wide 3′-UTR APA events. The pipeline integrates both RNA-seq and 3′-end-seq data information and can efficiently identify the significant events with high-resolution short-read coverage plots.
  3. The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3’-untranslated region (3’-UTR) of mRNA produces transcripts with shorter 3’-UTR. Often, 3’-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3’-UTR APA provides a means to regulate gene expression at the post-transcriptional level and is known to promote translation. Current bioinformatics pipelines have limited capability in profiling 3’-UTR APA events due to incomplete annotations and a low-resolution analyzing power: widely available bioinformatics pipelines do not reference actionable polyadenylation (cleavage) sites but simulate 3’-UTR APA only using RNA-seq read coverage, causing false positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3’-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations. APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3’-UTR transcripts in the RNA-seq data. The performance of APA-Scan was validated by qPCR. 
  4. Single-cell RNA sequencing (scRNAseq) is rapidly advancing our understanding of cellular composition within complex tissues and organisms. A major limitation in most scRNAseq analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming, subjective, and require expertise. Given the surge in cell sequencing, supervised methods, especially deep learning models, have been developed for automatic cell type identification (ACTI), which achieve high accuracy and scalability. However, all existing deep learning frameworks for ACTI lack interpretability and are used as “black-box” models. We present N-ACT (Neural-Attention for Cell Type identification): the first-of-its-kind interpretable deep neural network for ACTI utilizing neural attention to detect salient genes for use in cell type identification. We compare N-ACT to conventional annotation methods on two previously manually annotated data sets, demonstrating that N-ACT accurately identifies marker genes and cell types in an unsupervised manner, while performing comparably on multiple data sets to current state-of-the-art models in traditional supervised ACTI.
  5. Abstract Although long-read single-cell RNA isoform sequencing (scISO-Seq) can reveal alternative RNA splicing in individual cells, it suffers from a low read throughput. Here, we introduce HIT-scISOseq, a method that removes most artifact cDNAs and concatenates multiple cDNAs for PacBio circular consensus sequencing (CCS) to achieve high-throughput and high-accuracy single-cell RNA isoform sequencing. HIT-scISOseq can yield >10 million high-accuracy long-reads in a single PacBio Sequel II SMRT Cell 8M. We also report the development of scISA-Tools that demultiplex HIT-scISOseq concatenated reads into single-cell cDNA reads with >99.99% accuracy and specificity. We apply HIT-scISOseq to characterize the transcriptomes of 3375 corneal limbus cells and reveal cell-type-specific isoform expression in them. HIT-scISOseq is a high-throughput, high-accuracy, technically accessible method and it can accelerate the burgeoning field of long-read single-cell transcriptomics. 