NSF PAR Search | NSF Public Access Repository

Data-driven batch detection enhances single-cell omics data analysis

https://doi.org/10.1016/j.cels.2024.09.011

Zhang, Ziqi; Zhang, Xiuwei (October 2024, Cell Systems)

scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data

https://doi.org/10.1038/s41467-024-45227-w

Zhang, Ziqi; Zhao, Xinye; Bindra, Mehak; Qiu, Peng; Zhang, Xiuwei (December 2024, Nature Communications)

Abstract Single-cell RNA-sequencing (scRNA-seq) has been widely used for disease studies, where sample batches are collected from donors under different conditions including demographic groups, disease stages, and drug treatments. It is worth noting that the differences among sample batches in such a study are a mixture of technical confounders caused by batch effect and biological variations caused by condition effect. However, current batch effect removal methods often eliminate both technical batch effect and meaningful condition effect, while perturbation prediction methods solely focus on condition effect, resulting in inaccurate gene expression predictions due to unaccounted batch effect. Here we introduce scDisInFact, a deep learning framework that models both batch effect and condition effect in scRNA-seq data. scDisInFact learns latent factors that disentangle condition effect from batch effect, enabling it to simultaneously perform three tasks: batch effect removal, condition-associated key gene detection, and perturbation prediction. We evaluate scDisInFact on both simulated and real datasets, and compare its performance with baseline methods for each task. Our results demonstrate that scDisInFact outperforms existing methods that focus on individual tasks, providing a more comprehensive and accurate approach for integrating and predicting multi-batch multi-condition single-cell RNA-sequencing data.
CeSpGRN: Inferring cell-specific gene regulatory networks from single cell multi-omics and spatial data

https://doi.org/10.1101/2022.03.03.482887

Zhang, Ziqi; Han, Jongseok; Song, Le; Zhang, Xiuwei (November 2023, bioRxiv)

Abstract Single cell profiling techniques including multi-omics and spatial-omics technologies allow researchers to study cell-cell variation within a cell population. These variations extend to biological networks within cells, in particular, the gene regulatory networks (GRNs). GRNs rewire as the cells evolve, and different cells can have different governing GRNs. However, existing GRN inference methods usually infer a single GRN for a population of cells, without exploring the cell-cell variation in terms of their regulatory mechanisms. Recently, jointly profiled single cell transcriptomics and chromatin accessibility data have been used to infer GRNs. Although methods based on such multi-omics data were shown to improve over the accuracy of methods using only single cell RNA-seq (scRNA-seq) data, they do not take full advantage of the single cell resolution chromatin accessibility data. We propose CeSpGRN (CellSpecificGeneRegulatoryNetwork inference), which infers cell-specific GRNs from scRNA-seq, single cell multi-omics, or single cell spatial-omics data. CeSpGRN uses a Gaussian weighted kernel that allows the GRN of a given cell to be learned from the sequencing profile of itself and its neighboring cells in the developmental process. The kernel is constructed from the similarity of gene expressions or spatial locations between cells. When the chromatin accessibility data is available, CeSpGRN constructs cell-specific prior networks which are used to further improve the inference accuracy. We applied CeSpGRN to various types of real-world datasets and inferred various regulation changes that were shown to be important in cell development. We also quantitatively measured the performance of CeSpGRN on simulated datasets and compared with baseline methods. The results show that CeSpGRN has a superior performance in reconstructing the GRN for each cell, as well as in detecting the regulatory interactions that differ between cells. 
CeSpGRN is available at https://github.com/PeterZZQ/CeSpGRN.
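
The Gaussian weighted kernel described in the abstract can be illustrated with a minimal sketch. This is not the CeSpGRN implementation: the function name `gaussian_kernel_weights`, the `bandwidth` parameter, and the toy profiles are all assumptions made for illustration. It only shows how similarity between cell profiles (expression values or spatial coordinates) could be turned into per-cell weights over neighboring cells.

```python
import math


def gaussian_kernel_weights(profiles, bandwidth=1.0):
    """Return a row-normalized Gaussian kernel weight matrix.

    profiles:  list of equal-length numeric vectors, one per cell
               (gene expression values or spatial coordinates).
    bandwidth: hypothetical kernel width, not a value from the paper.
    """
    weights = []
    for p in profiles:
        row = []
        for q in profiles:
            # Squared Euclidean distance between the two cell profiles.
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            row.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
        total = sum(row)
        # Normalize so that each cell's weights over all cells sum to 1.
        weights.append([w / total for w in row])
    return weights


# Toy example: two nearby cells and one distant cell (made-up values).
cells = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
W = gaussian_kernel_weights(cells, bandwidth=1.0)
```

In this sketch, row `i` of `W` gives the weight each cell would contribute when estimating a quantity for cell `i`; nearby cells receive larger weights than distant ones, and a smaller `bandwidth` makes the estimate more cell-specific.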
LinRace: cell division history reconstruction of single cells using paired lineage barcode and gene expression data

https://doi.org/10.1038/s41467-023-44173-3

Pan, Xinhai; Li, Hechen; Putta, Pranav; Zhang, Xiuwei (December 2023, Nature Communications)

Abstract Lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expressions and lineage barcodes in single cells, which allows for inference of cell lineage and cell types at the whole organism level. While most state-of-the-art methods for lineage reconstruction utilize only the lineage barcode data, methods that incorporate gene expressions are emerging. Effectively incorporating the gene expression data requires a reasonable model of how gene expression data changes along generations of divisions. Here, we present LinRace (Lineage Reconstruction with asymmetric cell division model), which integrates lineage barcode and gene expression data using an asymmetric cell division model and infers cell lineages and ancestral cell states using Neighbor-Joining and maximum-likelihood heuristics. On both simulated and real data, LinRace outputs more accurate cell division trees than existing methods. With inferred ancestral states, LinRace can also show how a progenitor cell generates a large population of cells with various functionalities.
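
Neighbor-Joining, mentioned in the abstract, operates on a matrix of pairwise distances between cells. The sketch below shows one simple way such a matrix could be built from lineage barcodes, using a normalized Hamming distance. This is an assumption made for illustration, not the distance LinRace uses; the function names and the toy barcodes are hypothetical.

```python
def barcode_distance(a, b):
    """Normalized Hamming distance between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same number of target sites")
    # Fraction of target sites whose mutation states differ.
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(barcodes):
    """Pairwise distance matrix suitable as input to a tree-building method."""
    return [[barcode_distance(a, b) for b in barcodes] for a in barcodes]


# Toy barcodes: each character is a hypothetical mutation state at one site.
barcodes = ["0120", "0123", "3300"]
D = distance_matrix(barcodes)
```

Here the first two cells differ at one of four sites, so their distance is 0.25, while the third cell differs from the second at every site and has distance 1.0.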
CLARIFY: cell–cell interaction and gene regulatory network refinement from spatially resolved transcriptomics

https://doi.org/10.1093/bioinformatics/btad269

Bafna, Mihir; Li, Hechen; Zhang, Xiuwei (June 2023, Bioinformatics)

Abstract Motivation: Gene regulatory networks (GRNs) in a cell provide the tight feedback needed to synchronize cell actions. However, genes in a cell also take input from, and provide signals to, other neighboring cells. These cell–cell interactions (CCIs) and the GRNs deeply influence each other. Many computational methods have been developed for GRN inference in cells. More recently, methods were proposed to infer CCIs using single cell gene expression data with or without cell spatial location information. However, in reality, the two processes do not exist in isolation and are subject to spatial constraints. Despite this rationale, no methods currently exist to infer GRNs and CCIs using the same model. Results: We propose CLARIFY, a tool that takes GRNs as input, uses them and spatially resolved gene expression data to infer CCIs, while simultaneously outputting refined cell-specific GRNs. CLARIFY uses a novel multi-level graph autoencoder, which mimics cellular networks at a higher level and cell-specific GRNs at a deeper level. We applied CLARIFY to two real spatial transcriptomic datasets, one using seqFISH and the other using MERFISH, and also tested on simulated datasets from scMultiSim. We compared the quality of predicted GRNs and CCIs with state-of-the-art baseline methods that inferred either only GRNs or only CCIs. The results show that CLARIFY consistently outperforms the baseline in terms of commonly used evaluation metrics. Our results point to the importance of co-inference of CCIs and GRNs and to the use of layered graph neural networks as an inference tool for biological networks. Availability and implementation: The source code and data are available at https://github.com/MihirBafna/CLARIFY.
Studying temporal dynamics of single cells: expression, lineage and regulatory networks

https://doi.org/10.1007/s12551-023-01090-5

Pan, Xinhai; Zhang, Xiuwei (January 2023, Biophysical Reviews)

scDART: integrating unmatched scRNA-seq and scATAC-seq data and learning cross-modality relationship simultaneously

https://doi.org/10.1186/s13059-022-02706-x

Zhang, Ziqi; Yang, Chengkai; Zhang, Xiuwei (June 2022, Genome Biology)

Abstract It is a challenging task to integrate scRNA-seq and scATAC-seq data obtained from different batches. Existing methods tend to use a pre-defined gene activity matrix to convert the scATAC-seq data into scRNA-seq data. The pre-defined gene activity matrix is often of low quality and does not reflect the dataset-specific relationship between the two data modalities. We propose scDART, a deep learning framework that integrates scRNA-seq and scATAC-seq data and learns cross-modalities relationships simultaneously. Specifically, the design of scDART allows it to preserve cell trajectories in continuous cell populations and can be applied to trajectory inference on integrated data.
TedSim: temporal dynamics simulation of single-cell RNA sequencing data and cell division history

https://doi.org/10.1093/nar/gkac235

Pan, Xinhai; Li, Hechen; Zhang, Xiuwei (April 2022, Nucleic Acids Research)

Abstract Recently, lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expressions and lineage barcodes, which allows for the reconstruction of the cell division tree and makes it possible to reconstruct ancestral cell types and trace the origin of each cell type. Meanwhile, trajectory inference methods are widely used to infer cell trajectories and pseudotime in a dynamic process using gene expression data of present-day cells. Here, we present TedSim (single-cell temporal dynamics simulator), which simulates the cell division events from the root cell to present-day cells, simultaneously generating two data modalities for each single cell: the lineage barcode and gene expression data. TedSim is a framework that connects the two problems: lineage tracing and trajectory inference. Using TedSim, we conducted analysis to show that (i) TedSim generates realistic gene expression and barcode data, as well as realistic relationships between these two data modalities; (ii) trajectory inference methods can recover the underlying cell state transition mechanism with balanced cell type compositions; and (iii) integrating gene expression and barcode data can provide more insights into the temporal dynamics in cell differentiation compared to using only one type of data, but better integration methods need to be developed.
Inference of high-resolution trajectories in single-cell RNA-seq data by using RNA velocity

https://doi.org/10.1016/j.crmeth.2021.100095

Zhang, Ziqi; Zhang, Xiuwei (October 2021, Cell Reports Methods)

Hybrid Clustering of Single-Cell Gene Expression and Spatial Information via Integrated NMF and K-Means

https://doi.org/10.3389/fgene.2021.763263

Oh, Sooyoun; Park, Haesun; Zhang, Xiuwei (November 2021, Frontiers in Genetics)

Advances in single cell transcriptomics have allowed us to study the identity of single cells. This has led to the discovery of new cell types and high resolution tissue maps of them. Technologies that measure multiple modalities of such data add more detail, but they also complicate data integration. We offer an integrated analysis of the spatial location and gene expression profiles of cells to determine their identity. We propose scHybridNMF (single-cell Hybrid Nonnegative Matrix Factorization), which performs cell type identification by combining sparse nonnegative matrix factorization (sparse NMF) with k-means clustering to cluster high-dimensional gene expression and low-dimensional location data. We show that, under multiple scenarios, including the cases where there is a small number of genes profiled and the location data is noisy, scHybridNMF outperforms sparse NMF, k-means, and an existing method that uses a hidden Markov random field to encode cell location and gene expression data for cell type identification.