Title: Protocol for fast scRNA-seq raw data processing using scKB and non-arbitrary quality control with COPILOT
We describe a protocol to perform fast and non-arbitrary quality control of single-cell RNA sequencing (scRNA-seq) raw data using scKB and COPILOT. scKB is a wrapper script of kallisto and bustools for accelerated alignment and transcript count matrix generation, which runs significantly faster than the popular tool Cell Ranger. COPILOT then offers non-arbitrary background noise removal by comparing distributions of low-quality and high-quality cells. Together, this protocol streamlines the processing workflow and provides an easy entry for new scRNA-seq users. For complete details on the use and execution of this protocol, please refer to Shahan et al. (2022).
Award ID(s):
2010686
PAR ID:
10470863
Publisher / Repository:
ScienceDirect
Date Published:
Journal Name:
STAR Protocols
Volume:
3
Issue:
4
ISSN:
2666-1667
Page Range / eLocation ID:
101729
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation: Single-cell RNA sequencing (scRNA-seq) has revolutionized biological sciences by revealing genome-wide gene expression levels within individual cells. However, a critical challenge faced by researchers is how to optimize the choices of sequencing platforms, sequencing depths and cell numbers in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information. Results: Here we present a flexible and robust simulator, scDesign, the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. In an evaluation based on 17 cell types and 6 different protocols, scDesign outperformed four state-of-the-art scRNA-seq simulation methods and led to rational experimental design. In addition, scDesign demonstrates reproducibility across biological replicates and independent studies. We also discuss the performance of multiple differential expression and dimension reduction methods based on the protocol-dependent scRNA-seq data generated by scDesign. scDesign is expected to be an effective bioinformatic tool that assists rational scRNA-seq experimental design and comparison of scRNA-seq computational methods based on specific research goals. Availability and implementation: We have implemented our method in the R package scDesign, which is freely available at https://github.com/Vivianstats/scDesign. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Abstract It is a challenging task to integrate scRNA-seq and scATAC-seq data obtained from different batches. Existing methods tend to use a pre-defined gene activity matrix to convert the scATAC-seq data into scRNA-seq data. The pre-defined gene activity matrix is often of low quality and does not reflect the dataset-specific relationship between the two data modalities. We propose scDART, a deep learning framework that integrates scRNA-seq and scATAC-seq data and learns cross-modality relationships simultaneously. Specifically, the design of scDART preserves cell trajectories in continuous cell populations, so it can be applied to trajectory inference on integrated data.
  3. Background: Understanding the genetic underpinnings of immune-mediated inflammatory diseases is crucial to improve treatments. Single-cell RNA sequencing (scRNA-seq) identifies cell states expanded in disease, but often overlooks genetic causality due to cost and small genotyping cohorts. Conversely, large genome-wide association studies (GWAS) are commonly accessible. Methods: We present a 3-step robust benchmarking analysis of integrating GWAS and scRNA-seq to identify genetically relevant cell states and genes in inflammatory diseases. First, we applied and compared the results of three recent algorithms, based on pathways (scGWAS), single-cell disease scores (scDRS), or both (scPagwas), according to accuracy/sensitivity and interpretability. While previous studies focused on coarse cell types, we used disease-specific, fine-grained single-cell atlases (183,742 and 228,211 cells) and GWAS data (Ns of 97,173 and 45,975) for rheumatoid arthritis (RA) and ulcerative colitis (UC). Second, given the lack of scRNA-seq for many diseases with GWAS, we further tested the tools’ resolution limits by differentiating between similar diseases with only one fine-grained scRNA-seq atlas. Lastly, we provide a novel evaluation of noncoding SNP incorporation methods by testing which enabled the highest sensitivity/accuracy of known cell-state calls. Results: We first found that single-cell based tools scDRS and scPagwas called superior numbers of supported cell states that were overlooked by scGWAS. While scGWAS and scPagwas were advantageous for gene exploration, scDRS effectively accounted for batch effect and captured cellular heterogeneity of disease-relevance without single-cell genotyping. For noncoding SNP integration, we found a key trade-off between statistical power and confidence with positional (e.g. MAGMA) and non-positional approaches (e.g. chromatin-interaction, eQTL). Even when directly incorporating noncoding SNPs through 5’ scRNA-seq measures of regulatory elements, non-disease-specific atlases gave misleading results by not containing disease-tissue specific transcriptomic patterns. Despite this criticality of tissue-specific scRNA-seq, we showed that scDRS enabled deconvolution of two similar diseases with a single fine-grained scRNA-seq atlas and separate GWAS. Indeed, we identified supported and novel genetic-phenotype linkages separating RA and ankylosing spondylitis, and UC and Crohn’s disease. Overall, while noting evolving single-cell technologies, our study provides key findings for integrating expanding fine-grained scRNA-seq, GWAS, and noncoding SNP resources to unravel the complexities of inflammatory diseases.
  4. During mammalian development, the left and right ventricles arise from early populations of cardiac progenitors known as the first and second heart fields, respectively. While these populations have been extensively studied in non-human model systems, their identification and study in human tissues in vivo have been limited due to the ethical and technical limitations of accessing gastrulation-stage human embryos. Human-induced pluripotent stem cells (hiPSCs) present an exciting alternative for modeling early human embryogenesis due to their well-established ability to differentiate into all embryonic germ layers. Here, we describe the development of a TBX5/MYL2 lineage tracing reporter system that allows for the identification of FHF progenitors and their descendants including left ventricular cardiomyocytes. Furthermore, using single-cell RNA sequencing (scRNA-seq) with oligonucleotide-based sample multiplexing, we extensively profiled differentiating hiPSCs across 12 timepoints in two independent iPSC lines. Surprisingly, our reporter system and scRNA-seq analysis revealed a predominance of FHF differentiation using the small molecule Wnt-based 2D differentiation protocol. We compared these data with existing murine and 3D cardiac organoid scRNA-seq data and confirmed the dominance of left ventricular cardiomyocytes (>90%) in our hiPSC-derived progeny. Together, our work provides the scientific community with a powerful new genetic lineage tracing approach as well as a single-cell transcriptomic atlas of hiPSCs undergoing cardiac differentiation.
  5. Abstract Numerous single‐cell transcriptomic datasets from identical tissues or cell lines are generated from different laboratories or single‐cell RNA sequencing (scRNA‐seq) protocols. The denoising of these datasets to eliminate batch effects is crucial for data integration, ensuring accurate interpretation and comprehensive analysis of biological questions. Although many scRNA‐seq data integration methods exist, most are inefficient and/or not conducive to downstream analysis. Here, DeepBID, a novel deep learning‐based method for batch effect correction, non‐linear dimensionality reduction, embedding, and cell clustering concurrently, is introduced. DeepBID utilizes a negative binomial‐based autoencoder with dual Kullback–Leibler divergence loss functions, aligning cell points from different batches within a consistent low‐dimensional latent space and progressively mitigating batch effects through iterative clustering. Extensive validation on multiple‐batch scRNA‐seq datasets demonstrates that DeepBID surpasses existing tools in removing batch effects and achieving superior clustering accuracy. When integrating multiple scRNA‐seq datasets from patients with Alzheimer's disease, DeepBID significantly improves cell clustering, effectively annotating unidentified cells, and detecting cell‐specific differentially expressed genes. 