skip to main content


Search for: All records

Creators/Authors contains: "Cang, Zixuan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Single-cell RNA sequencing (scRNA-seq) enables dissecting cellular heterogeneity in tissues, resulting in numerous biological discoveries. Various computational methods have been devised to delineate cell types by clustering scRNA-seq data, where clusters are often annotated using prior knowledge of marker genes. In addition to identifying pure cell types, several methods have been developed to identify cells undergoing state transitions, which often rely on prior clustering results. The present computational approaches predominantly investigate the local and first-order structures of scRNA-seq data using graph representations, while scRNA-seq data frequently display complex high-dimensional structures. Here, we introduce scGeom, a tool that exploits the multiscale and multidimensional structures in scRNA-seq data by analyzing the geometry and topology through curvature and persistent homology of both cell and gene networks. We demonstrate the utility of these structural features to reflect biological properties and functions in several applications, where we show that curvatures and topological signatures of cell and gene networks can help indicate transition cells and the differentiation potential of cells. We also illustrate that structural characteristics can improve the classification of cell types.

     
    more » « less
  2. High-dimensional multimodal data arises in many scientific fields. The integration of multimodal data becomes challenging when there is no known correspondence between the samples and the features of different datasets. To tackle this challenge, we introduce AVIDA, a framework for simultaneously performing data alignment and dimension reduction. In the numerical experiments, Gromov-Wasserstein optimal transport and t-distributed stochastic neighbor embedding are used as the alignment and dimension reduction modules respectively. We show that AVIDA correctly aligns high-dimensional datasets without common features with four synthesized datasets and two real multimodal single-cell datasets. Compared to several existing methods, we demonstrate that AVIDA better preserves structures of individual datasets, especially distinct local structures in the joint low-dimensional visualization, while achieving comparable alignment performance. Such a property is important in multimodal single-cell data analysis as some biological processes are uniquely captured by one of the datasets. In general applications, other methods can be used for the alignment and dimension reduction modules. 
    more » « less
  3. Spatial transcriptomic technologies and spatially annotated single-cell RNA sequencing datasets provide unprecedented opportunities to dissect cell–cell communication (CCC). However, incorporation of the spatial information and complex biochemical processes required in the reconstruction of CCC remains a major challenge. Here, we present COMMOT (COMMunication analysis by Optimal Transport) to infer CCC in spatial transcriptomics, which accounts for the competition between different ligand and receptor species as well as spatial distances between cells. A collective optimal transport method is developed to handle complex molecular interactions and spatial constraints. Furthermore, we introduce downstream analysis tools to infer spatial signaling directionality and genes regulated by signaling using machine learning models. We apply COMMOT to simulation data and eight spatial datasets acquired with five different technologies to show its effectiveness and robustness in identifying spatial CCC in data with varying spatial resolutions and gene coverages. Finally, COMMOT identifies new CCCs during skin morphogenesis in a case study of human epidermal development. 
    more » « less
  4. Abstract

    One major challenge in analyzing spatial transcriptomic datasets is to simultaneously incorporate the cell transcriptome similarity and their spatial locations. Here, we introduce SpaceFlow, which generates spatially-consistent low-dimensional embeddings by incorporating both expression similarity and spatial information using spatially regularized deep graph networks. Based on the embedding, we introduce a pseudo-Spatiotemporal Map that integrates the pseudotime concept with spatial locations of the cells to unravel spatiotemporal patterns of cells. By comparing with multiple existing methods on several spatial transcriptomic datasets at both spot and single-cell resolutions, SpaceFlow is shown to produce a robust domain segmentation and identify biologically meaningful spatiotemporal patterns. Applications of SpaceFlow reveal evolving lineage in heart developmental data and tumor-immune interactions in human breast cancer data. Our study provides a flexible deep learning framework to incorporate spatiotemporal information in analyzing spatial transcriptomic data.

     
    more » « less
  5. Complex biological tissues consist of numerous cells in a highly coordinated manner and carry out various biological functions. Therefore, segmenting a tissue into spatial and functional domains is critically important for understanding and controlling the biological functions. The emerging spatial transcriptomic technologies allow simultaneous measurements of thousands of genes with precise spatial information, providing an unprecedented opportunity for dissecting biological tissues. However, how to utilize such noisy, sparse, and high dimensional data for tissue segmentation remains a major challenge. Here, we develop a deep learning-based method, named SCAN-IT by transforming the spatial domain identification problem into an image segmentation problem, with cells mimicking pixels and expression values of genes within a cell representing the color channels. Specifically, SCAN-IT relies on geometric modeling, graph neural networks, and an informatics approach, DeepGraphInfomax. We demonstrate that SCAN-IT can handle datasets from a wide range of spatial transcriptomics techniques, including the ones with high spatial resolution but low gene coverage as well as those with low spatial resolution but high gene coverage. We show that SCAN-IT outperforms state-of-the-art methods using a benchmark dataset with ground truth domain annotations. 
    more » « less
  6. Abstract

    The rapid development of spatial transcriptomics (ST) techniques has allowed the measurement of transcriptional levels across many genes together with the spatial positions of cells. This has led to an explosion of interest in computational methods and techniques for harnessing both spatial and transcriptional information in analysis of ST datasets. The wide diversity of approaches in aim, methodology and technology for ST provides great challenges in dissecting cellular functions in spatial contexts. Here, we synthesize and review the key problems in analysis of ST data and methods that are currently applied, while also expanding on open questions and areas of future development.

     
    more » « less