
Title: Multi-domain translation between single-cell imaging and sequencing data using autoencoders
Abstract

The development of single-cell methods for capturing different data modalities including imaging and sequencing has revolutionized our ability to identify heterogeneous cell states. Different data modalities provide different perspectives on a population of cells, and their integration is critical for studying cellular heterogeneity and its function. While various methods have been proposed to integrate different sequencing data modalities, coupling imaging and sequencing has been an open challenge. We here present an approach for integrating vastly different modalities by learning a probabilistic coupling between the different data modalities using autoencoders to map to a shared latent space. We validate this approach by integrating single-cell RNA-seq and chromatin images to identify distinct subpopulations of human naive CD4+ T-cells that are poised for activation. Collectively, our approach provides a framework to integrate and translate between data modalities that cannot yet be measured within the same cell for diverse applications in biomedical discovery.
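The translation step described above (encode a cell from one modality into the shared latent space, then decode it with the other modality's decoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the `LinearAutoencoder` class, the feature dimensions, and the helper names are hypothetical. The encoders and decoders are hand-built linear maps with orthonormal columns rather than trained networks, and both toy modalities are generated from the same latent vectors, so the latent spaces are aligned by construction. In the real setting that alignment has to be learned.

```python
import numpy as np


class LinearAutoencoder:
    """Toy stand-in for a trained autoencoder (assumption: purely linear maps)."""

    def __init__(self, mixing):
        # `mixing` has orthonormal columns, so its transpose inverts it
        # on its range: encode(decode(z)) == z.
        self.decoder = mixing      # shape (n_features, n_latent)
        self.encoder = mixing.T    # shape (n_latent, n_features)

    def encode(self, x):
        # x: (n_cells, n_features) -> z: (n_cells, n_latent)
        return x @ self.encoder.T

    def decode(self, z):
        # z: (n_cells, n_latent) -> x_hat: (n_cells, n_features)
        return z @ self.decoder.T


def random_orthonormal(n_features, n_latent, rng):
    """Return an (n_features, n_latent) matrix with orthonormal columns."""
    q, _ = np.linalg.qr(rng.standard_normal((n_features, n_latent)))
    return q


def translate(x_src, ae_src, ae_dst):
    """Map source-modality data to the target modality via the shared latent space."""
    return ae_dst.decode(ae_src.encode(x_src))


rng = np.random.default_rng(0)
n_cells, n_latent = 5, 3

# Hypothetical shared latent cell states underlying both modalities.
z = rng.standard_normal((n_cells, n_latent))

# Hypothetical feature sizes: 20 "expression" features, 50 "image" features.
ae_rna = LinearAutoencoder(random_orthonormal(20, n_latent, rng))
ae_img = LinearAutoencoder(random_orthonormal(50, n_latent, rng))

x_rna = ae_rna.decode(z)   # toy sequencing-like measurements
x_img = ae_img.decode(z)   # toy imaging-like measurements

# Translate sequencing-like data into the imaging-like modality.
x_img_pred = translate(x_rna, ae_rna, ae_img)
```

Because the toy data are noiseless and the latent spaces coincide by construction, `x_img_pred` matches `x_img` up to floating-point error. With real data the two modalities are not measured in the same cell, so no such per-cell ground truth is available for comparison.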

Authors:
; ; ; ; ; ; ;
Award ID(s):
1651995
Publication Date:
NSF-PAR ID:
10208578
Journal Name:
Nature Communications
Volume:
12
Issue:
1
ISSN:
2041-1723
Publisher:
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. Introduction: Vaso-occlusive crises (VOCs) are a leading cause of morbidity and early mortality in individuals with sickle cell disease (SCD). These crises are triggered by sickle red blood cell (sRBC) aggregation in blood vessels and are influenced by factors such as enhanced sRBC and white blood cell (WBC) adhesion to inflamed endothelium. Advances in microfluidic biomarker assays (i.e., SCD Biochip systems) have led to clinical studies of blood cell adhesion onto endothelial proteins, including fibronectin, laminin, P-selectin, and ICAM-1, functionalized in microchannels. These microfluidic assays allow mimicking the physiological aspects of human microvasculature and help characterize biomechanical properties of adhered sRBCs under flow. However, analysis of the microfluidic biomarker assay data has so far relied on manual cell counting and exhaustive visual morphological characterization of cells by trained personnel. Integrating deep learning algorithms with microscopic imaging of adhesion protein functionalized microfluidic channels can accelerate and standardize accurate classification of blood cells in microfluidic biomarker assays. Here we present a deep learning approach as a general-purpose analytical tool covering a wide range of conditions: channels functionalized with different proteins (laminin or P-selectin), with varying degrees of adhesion by both sRBCs and WBCs, and in both normoxic and hypoxic environments. Methods: Our neural networks were trained on a repository of manually labeled SCD Biochip microfluidic biomarker assay whole channel images. Each channel contained adhered cells pertaining to clinical whole blood under constant shear stress of 0.1 Pa, mimicking physiological levels in post-capillary venules. 
The machine learning (ML) framework consists of two phases: Phase I segments pixels belonging to blood cells adhered to the microfluidic channel surface, while Phase II associates pixel clusters with specific cell types (sRBCs or WBCs). Phase I is implemented through an ensemble of seven generative fully convolutional neural networks, and Phase II is an ensemble of five neural networks based on a Resnet50 backbone. Each pixel cluster is given a probability of belonging to one of three classes: adhered sRBC, adhered WBC, or non-adhered / other. Results and Discussion: We applied our trained ML framework to 107 novel whole channel images not used during training and compared the results against counts from human experts. As seen in Fig. 1A, there was excellent agreement in counts across all protein and cell types investigated: sRBCs adhered to laminin, sRBCs adhered to P-selectin, and WBCs adhered to P-selectin. Not only was the approach able to handle surfaces functionalized with different proteins, but it also performed well for high cell density images (up to 5000 cells per image) in both normoxic and hypoxic conditions (Fig. 1B). The average uncertainty for the ML counts, obtained from accuracy metrics on the test dataset, was 3%. This uncertainty is a significant improvement on the 20% average uncertainty of the human counts, estimated from the variance in repeated manual analyses of the images. Moreover, manual classification of each image may take up to 2 hours, versus about 6 minutes per image for the ML analysis. Thus, ML provides greater consistency in the classification at a fraction of the processing time. To assess which features the network used to distinguish adhered cells, we generated class activation maps (Fig. 1C-E). These heat maps indicate the regions of focus for the algorithm in making each classification decision. 
Intriguingly, the highlighted features were similar to those used by human experts: the dimple in partially sickled RBCs, the sharp endpoints for highly sickled RBCs, and the uniform curvature of the WBCs. Overall, the robust performance of the ML approach in our study sets the stage for generalizing it to other endothelial proteins and experimental conditions, a first step toward a universal microfluidic ML framework targeting blood disorders. Such a framework would not only be able to integrate advanced biophysical characterization into fast, point-of-care diagnostic devices, but also provide a standardized and reliable way of monitoring patients undergoing targeted therapies and curative interventions, including stem cell and gene-based therapies for SCD. Disclosures Gurkan: Dx Now Inc.: Patents & Royalties; Xatek Inc.: Patents & Royalties; BioChip Labs: Patents & Royalties; Hemex Health, Inc.: Consultancy, Current Employment, Patents & Royalties, Research Funding.
  2. Abstract

    Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of dissociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that robustly enable the identification and discrimination of specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene markers that jointly optimize cell label recovery using label-aware compressive classification methods. This results in a substantially more robust and less redundant set of markers than existing methods, most of which identify markers that separate each cell label from the rest. When applied to a data set given a hierarchy of cell types as labels, the markers found by our method improve the recovery of the cell type hierarchy with fewer markers than existing methods using a computationally efficient and principled optimization.

  3. Abstract STUDY QUESTION

    Is the combined use of fluorescence lifetime imaging microscopy (FLIM)-based metabolic imaging and second harmonic generation (SHG) spindle imaging a feasible and safe approach for noninvasive embryo assessment?

    SUMMARY ANSWER

    Metabolic imaging can sensitively detect meaningful metabolic changes in embryos, SHG produces high-quality images of spindles and the methods do not significantly impair embryo viability.

    WHAT IS KNOWN ALREADY

    Proper metabolism is essential for embryo viability. Metabolic imaging is a well-tested method for measuring metabolism of cells and tissues, but it is unclear if it is sensitive enough and safe enough for use in embryo assessment.

    STUDY DESIGN, SIZE, DURATION

    This study consisted of time-course experiments and control versus treatment experiments. We monitored the metabolism of 25 mouse oocytes with a noninvasive metabolic imaging system while exposing them to oxamate (cytoplasmic lactate dehydrogenase inhibitor) and rotenone (mitochondrial oxidative phosphorylation inhibitor) in series. Mouse embryos (n = 39) were measured every 2 h from the one-cell stage to blastocyst in order to characterize metabolic changes occurring during pre-implantation development. To assess the safety of FLIM illumination, n = 144 illuminated embryos were implanted into n = 12 mice, and n = 108 nonilluminated embryos were implanted into n = 9 mice.

    PARTICIPANTS/MATERIALS, SETTING, METHODS

    Experiments were performed in mouse embryos and oocytes. Samples were monitored with noninvasive, FLIM-based metabolic imaging of nicotinamide adenine dinucleotide (NADH) and flavin adenine dinucleotide (FAD) autofluorescence. Between NADH cytoplasm, NADH mitochondria and FAD mitochondria, a single metabolic measurement produces up to 12 quantitative parameters for characterizing the metabolic state of an embryo. For safety experiments, live birth rates and pup weights (mean ± SEM) were used as endpoints. For all test conditions, the level of significance was set at P < 0.05.

    MAIN RESULTS AND THE ROLE OF CHANCE

    Measured FLIM parameters were highly sensitive to metabolic changes due to both metabolic perturbations and embryo development. For oocytes, metabolic parameter values were compared before and after exposure to oxamate and rotenone. The metabolic measurements provided a basis for complete separation of the data sets. For embryos, metabolic parameter values were compared between the first division and morula stages, morula and blastocyst and first division and blastocyst. The metabolic measurements again completely separated the data sets. Exposure of embryos to excessive illumination dosages (24 measurements) had no significant effect on live birth rate (5.1 ± 0.94 pups/mouse for illuminated group; 5.7 ± 1.74 pups/mouse for control group) or pup weights (1.88 ± 0.10 g for illuminated group; 1.89 ± 0.11 g for control group).

    LIMITATIONS, REASONS FOR CAUTION

    The study was performed using a mouse model, so conclusions concerning sensitivity and safety may not generalize to human embryos. A limitation of the live birth data is also that although cages were routinely monitored, we could not preclude that some runt pups may have been eaten.

    WIDER IMPLICATIONS OF THE FINDINGS

    Promising proof-of-concept results demonstrate that FLIM with SHG provide detailed biological information that may be valuable for the assessment of embryo and oocyte quality. Live birth experiments support the method’s safety, arguing for further studies of the clinical utility of these techniques.

    STUDY FUNDING/COMPETING INTEREST(S)

    Supported by the Blavatnik Biomedical Accelerator Grant at Harvard University and by the Harvard Catalyst/The Harvard Clinical and Translational Science Center (National Institutes of Health Award UL1 TR001102), by NSF grants DMR-0820484 and PFI-TT-1827309 and by NIH grant R01HD092550-01. T.S. was supported by a National Science Foundation Postdoctoral Research Fellowship in Biology grant (1308878). S.F. and S.A. were supported by NSF MRSEC DMR-1420382. Becker and Hickl GmbH sponsored the research with the loaning of equipment for FLIM. T.S. and D.N. are cofounders and shareholders of LuminOva, Inc., and co-hold patents (US20150346100A1 and US20170039415A1) for metabolic imaging methods. D.S. is on the scientific advisory board for Cooper Surgical and has stock options with LuminOva, Inc.

  4. Abstract

    Integrating single-cell RNA sequencing (scRNA-seq) data with genotypes obtained from DNA sequencing studies facilitates the detection of functional genetic variants underlying cell type specific gene expression variation. Unfortunately, most existing scRNA-seq studies do not come with DNA sequencing data; thus, being able to call single nucleotide variants (SNVs) from scRNA-seq data alone can provide crucial and complementary information for the detection of functional SNVs, maximizing the potential of existing scRNA-seq studies. Here, we perform extensive analyses to evaluate the utility of two SNV calling pipelines (GATK and Monovar), originally designed for SNV calling in either bulk or single cell DNA sequencing data. In both pipelines, we examined various parameter settings to determine the accuracy of the final SNV call set and provide practical recommendations for applied analysts. We found that combining all reads from the single cells and following GATK Best Practices resulted in the highest number of SNVs identified with a high concordance. In individual single cells, Monovar resulted in better quality SNVs even though none of the pipelines analysed is capable of calling a reasonable number of SNVs with high accuracy. In addition, we found that SNV calling quality varies across different functional genomic regions. Our results open doors for novel ways to leverage the use of scRNA-seq for the future investigation of SNV function.

  5. Abstract Background Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from DNA sequencing data. However, due to technological and methodological limitations, current analyses are restricted to identifying tumor clones only based on either SNVs or CNAs, preventing a comprehensive characterization of a tumor’s clonal composition. Results To overcome these challenges, we formulate the identification of clones in terms of both SNVs and CNAs as an integration problem while accounting for uncertainty in the input SNV and CNA proportions. We thus characterize the computational complexity of this problem and we introduce PACTION (PArsimonious Clone Tree integratION), an algorithm that solves the problem using a mixed integer linear programming formulation. On simulated data, we show that tumor clones can be identified reliably, especially when further taking into account the ancestral relationships that can be inferred from the input SNVs and CNAs. On 49 tumor samples from 10 prostate cancer patients, our integration approach provides a higher resolution view of tumor evolution than previous studies. Conclusion PACTION is an accurate and fast method that reconstructs clonal architecture of cancer tumors by integrating SNV and CNA clones inferred using existing methods.