skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Multi-domain translation between single-cell imaging and sequencing data using autoencoders
Abstract The development of single-cell methods for capturing different data modalities including imaging and sequencing has revolutionized our ability to identify heterogeneous cell states. Different data modalities provide different perspectives on a population of cells, and their integration is critical for studying cellular heterogeneity and its function. While various methods have been proposed to integrate different sequencing data modalities, coupling imaging and sequencing has been an open challenge. We here present an approach for integrating vastly different modalities by learning a probabilistic coupling between the different data modalities using autoencoders to map to a shared latent space. We validate this approach by integrating single-cell RNA-seq and chromatin images to identify distinct subpopulations of human naive CD4+ T-cells that are poised for activation. Collectively, our approach provides a framework to integrate and translate between data modalities that cannot yet be measured within the same cell for diverse applications in biomedical discovery.  more » « less
Award ID(s):
1651995
PAR ID:
10208578
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
12
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In recent years, there has been a growing interest in profiling multiomic modalities within individual cells simultaneously. One such example is integrating combined single-cell RNA sequencing (scRNA-seq) data and single-cell transposase-accessible chromatin sequencing (scATAC-seq) data. Integrated analysis of diverse modalities has helped researchers make more accurate predictions and gain a more comprehensive understanding than with single-modality analysis. However, generating such multimodal data is technically challenging and expensive, leading to limited availability of single-cell co-assay data. Here, we propose a model for cross-modal prediction between the transcriptome and chromatin profiles in single cells. Our model is based on a deep neural network architecture that learns the latent representations from the source modality and then predicts the target modality. It demonstrates reliable performance in accurately translating between these modalities across multiple paired human scATAC-seq and scRNA-seq datasets. Additionally, we developed CrossMP, a web-based portal allowing researchers to upload their single-cell modality data through an interactive web interface and predict the other type of modality data, using high-performance computing resources plugged at the backend. 
    more » « less
  2. Abstract Integrating single-cell multi-omics data is a challenging task that has led to new insights into complex cellular systems. Various computational methods have been proposed to effectively integrate these rapidly accumulating datasets, including deep learning. However, despite the proven success of deep learning in integrating multi-omics data and its better performance over classical computational methods, there has been no systematic study of its application to single-cell multi-omics data integration. To fill this gap, we conducted a literature review to explore the use of multimodal deep learning techniques in single-cell multi-omics data integration, taking into account recent studies from multiple perspectives. Specifically, we first summarized different modalities found in single-cell multi-omics data. We then reviewed current deep learning techniques for processing multimodal data and categorized deep learning-based integration methods for single-cell multi-omics data according to data modality, deep learning architecture, fusion strategy, key tasks and downstream analysis. Finally, we provided insights into using these deep learning models to integrate multi-omics data and better understand single-cell biological mechanisms. 
    more » « less
  3. Abstract Tissue development and disease lead to changes in cellular organization, nuclear morphology, and gene expression, which can be jointly measured by spatial transcriptomic technologies. However, methods for jointly analyzing the different spatial data modalities in 3D are still lacking. We present a computational framework to integrate Spatial Transcriptomic data using over-parameterized graph-based Autoencoders with Chromatin Imaging data (STACI) to identify molecular and functional alterations in tissues. STACI incorporates multiple modalities in a single representation for downstream tasks, enables the prediction of spatial transcriptomic data from nuclear images in unseen tissue sections, and provides built-in batch correction of gene expression and tissue morphology through over-parameterization. We apply STACI to analyze the spatio-temporal progression of Alzheimer’s disease and identify the associated nuclear morphometric and coupled gene expression features. Collectively, we demonstrate the importance of characterizing disease progression by integrating multiple data modalities and its potential for the discovery of disease biomarkers. 
    more » « less
  4. Abstract Multimodal single-cell sequencing technologies provide unprecedented information on cellular heterogeneity from multiple layers of genomic readouts. However, joint analysis of two modalities without properly handling the noise often leads to overfitting of one modality by the other and worse clustering results than vanilla single-modality analysis. How to efficiently utilize the extra information from single cell multi-omics to delineate cell states and identify meaningful signal remains as a significant computational challenge. In this work, we propose a deep learning framework, named SAILERX, for efficient, robust, and flexible analysis of multi-modal single-cell data. SAILERX consists of a variational autoencoder with invariant representation learning to correct technical noises from sequencing process, and a multimodal data alignment mechanism to integrate information from different modalities. Instead of performing hard alignment by projecting both modalities to a shared latent space, SAILERX encourages the local structures of two modalities measured by pairwise similarities to be similar. This strategy is more robust against overfitting of noises, which facilitates various downstream analysis such as clustering, imputation, and marker gene detection. Furthermore, the invariant representation learning part enables SAILERX to perform integrative analysis on both multi- and single-modal datasets, making it an applicable and scalable tool for more general scenarios. 
    more » « less
  5. Abstract Single cell data integration methods aim to integrate cells across data batches and modalities, and data integration tasks can be categorized into horizontal, vertical, diagonal, and mosaic integration, where mosaic integration is the most general and challenging case with few methods developed. We propose scMoMaT, a method that is able to integrate single cell multi-omics data under the mosaic integration scenario using matrix tri-factorization. During integration, scMoMaT is also able to uncover the cluster specific bio-markers across modalities. These multi-modal bio-markers are used to interpret and annotate the clusters to cell types. Moreover, scMoMaT can integrate cell batches with unequal cell type compositions. Applying scMoMaT to multiple real and simulated datasets demonstrated these features of scMoMaT and showed that scMoMaT has superior performance compared to existing methods. Specifically, we show that integrated cell embedding combined with learned bio-markers lead to cell type annotations of higher quality or resolution compared to their original annotations. 
    more » « less