Title: Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
In digital pathology, the spatial context of cells is important for cell classification, cancer diagnosis, and prognosis. Modeling such complex cell context, however, is challenging: cells form different mixtures, lineages, clusters, and holes. To model such structural patterns in a learnable fashion, we introduce several mathematical tools from spatial statistics and topological data analysis. We incorporate these structural descriptors into a deep generative model as both conditional inputs and a differentiable loss. This way, we are able to generate high-quality multi-class cell layouts for the first time. We show that the topology-rich cell layouts can be used for data augmentation and improve the performance of downstream tasks such as cell classification.
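The abstract above does not say which topological descriptors are used, so the following is only an illustrative sketch of one common kind: 0-dimensional persistence (how clusters of cells merge as a distance scale grows), computed separately per cell class. It is not the authors' method. The function names `zero_dim_persistence` and `clusters_at_scale`, the class labels, and the coordinates are all hypothetical and chosen for illustration.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return the scales at which clusters of a 2D point set merge.

    Every point starts as its own cluster (born at scale 0). Processing
    pairwise distances in increasing order, each distance that joins two
    previously separate clusters is recorded as a "death" scale. These
    values equal the edge lengths of a minimum spanning tree.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two distinct clusters
            parent[ri] = rj
            deaths.append(d)
    return deaths


def clusters_at_scale(points, scale):
    """Count clusters when points closer than `scale` are linked."""
    deaths = zero_dim_persistence(points)
    return len(points) - sum(1 for d in deaths if d <= scale)


# Hypothetical multi-class layout: class label -> cell centroids (x, y).
layout = {
    "tumor": [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)],
    "lymphocyte": [(5, 5), (5, 6), (20, 20)],
}

for label, cells in layout.items():
    print(label, clusters_at_scale(cells, scale=2.0))
```

A generative model conditioned on structure would need a differentiable version of such a descriptor; this sketch only shows what the descriptor measures, under the assumptions stated above.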
Award ID(s):
2144901
PAR ID:
10417482
Journal Name:
IEEE Conference on Computer Vision and Pattern Recognition
ISSN:
2163-6648
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The tumor microenvironment (TME) is an immensely complex ecosystem [1,2]. This complexity underlies difficulties in elucidating principles of spatial organization and using molecular profiling of the TME for clinical use [3]. Through statistical analysis of 96 spatial transcriptomic (ST-seq) datasets spanning twelve diverse tumor types, we found a conserved distribution of multicellular, transcriptionally covarying units termed ‘Spatial Groups’ (SGs). SGs were either dependent on a hierarchical local spatial context – enriched for cell-extrinsic processes such as immune regulation and signal transduction – or independent from local spatial context – enriched for cell-intrinsic processes such as protein and RNA metabolism, DNA repair, and cell cycle regulation. We used SGs to define a measure of gene spatial heterogeneity – ‘spatial lability’ – and categorized all 96 tumors by their TME spatial lability profiles. The resulting classification captured spatial variation in cell-extrinsic versus cell-intrinsic biology and motivated class-specific strategies for therapeutic intervention. Using this classification to characterize pre-treatment biopsy samples of 16 non-small cell lung cancer (NSCLC) patients outside our database distinguished responders and non-responders to immune checkpoint blockade while programmed death-ligand 1 (PD-L1) status and spatially unaware bulk transcriptional markers did not. Our findings show conserved principles of TME spatial biology that are both biologically and clinically significant.
  2. Controlled table-to-text generation seeks to generate natural language descriptions for highlighted subparts of a table. Previous SOTA systems still employ a sequence-to-sequence generation method, which captures the table only as a linear structure and is brittle when table layouts change. We seek to go beyond this paradigm by (1) effectively expressing the relations of content pieces in the table, and (2) making our model robust to content-invariant structural transformations. Accordingly, we propose an equivariance learning framework, which encodes tables with a structure-aware self-attention mechanism. This prunes the full self-attention structure into an order-invariant graph attention that captures the connected graph structure of cells belonging to the same row or column, and it differentiates between relevant cells and irrelevant cells from the structural perspective. Our framework also modifies the positional encoding mechanism to preserve the relative position of tokens in the same cell but enforce position invariance among different cells. Our method can be plugged into existing table-to-text generation models, and it improves the performance of T5-based models on ToTTo and HiTab. Moreover, on a harder version of ToTTo, our models retain promising performance, while previous SOTA systems, even with transformation-based data augmentation, show significant performance drops.
  3. Interacting particle system (IPS) models have proven to be highly successful for describing the spatial movement of organisms. However, it is challenging to infer the interaction rules directly from data. In the field of equation discovery, the weak-form sparse identification of nonlinear dynamics (WSINDy) methodology has been shown to be computationally efficient for identifying the governing equations of complex systems from noisy data. Motivated by the success of IPS models in describing the spatial movement of organisms, we develop WSINDy for the second-order IPS to learn equations for communities of cells. Our approach learns the directional interaction rules for each individual cell that in aggregate govern the dynamics of a heterogeneous population of migrating cells. To sort a cell according to the active classes present in its model, we also develop a novel ad hoc classification scheme (which accounts for the fact that some cells do not have enough evidence to accurately infer a model). Aggregated models are then constructed hierarchically to simultaneously identify different species of cells present in the population and determine best-fit models for each species. We demonstrate the efficiency and proficiency of the method on several test scenarios, motivated by common cell migration experiments.
  4. Spatially resolved scRNA-seq (sp-scRNA-seq) technologies provide the potential to comprehensively profile gene expression patterns in tissue context. However, the development of computational methods lags behind the advances in these technologies, which limits the fulfillment of their potential. In this study, we develop a deep learning approach for clustering sp-scRNA-seq data, named Deep Spatially constrained Single-cell Clustering (DSSC). In this model, we integrate the spatial information of cells into the clustering process in two steps: (1) the spatial information is encoded by using a graphical neural network model, and (2) cell-to-cell constraints are built based on the spatial expression pattern of the marker genes and added to the model to guide the clustering process. Then, deep embedding clustering is performed on the bottleneck layer of the autoencoder by Kullback–Leibler (KL) divergence along with the learning of feature representation. DSSC is the first model that can use information from both spatial coordinates and marker genes to guide cell/spot clustering. Extensive experiments on both simulated and real data sets show that DSSC boosts clustering performance significantly compared with state-of-the-art methods. It has robust performance across different data sets with various cell type/tissue organization and/or cell type/tissue spatial dependency. We conclude that DSSC is a promising tool for clustering sp-scRNA-seq data.
  5. Abstract Different cell types aggregate and sort into hierarchical architectures during the formation of animal tissues. The resulting spatial organization depends (in part) on the strength of adhesion of one cell type to itself relative to other cell types. However, automated and unsupervised classification of these multicellular spatial patterns remains challenging, particularly given their structural diversity and biological variability. Recent developments based on topological data analysis show promise for revealing similarities in tissue architecture, but these methods remain computationally expensive. In this article, we show that multicellular patterns organized from two interacting cell types can be efficiently represented through persistence images. Our optimized combination of dimensionality reduction via autoencoders and hierarchical clustering achieved high classification accuracy for simulations with constant cell numbers. We further demonstrate that persistence images can be normalized to improve classification for simulations with varying cell numbers due to proliferation. Finally, we systematically consider the importance of incorporating different topological features as well as information about each cell type to improve classification accuracy. We envision that topological machine learning based on persistence images will enable versatile and robust classification of complex tissue architectures that occur in development and disease.