Title: Analysis of cellular phenotypes with unbiased image-based generative models
Observing changes in cellular phenotypes under experimental interventions is a powerful approach for studying biology and has many applications, including treatment design. Unfortunately, not all interventions can be tested experimentally, which limits our ability to study complex phenomena such as combinatorial treatments or continuous time or dose responses. In this work, we explore unbiased, image-based generative models to analyze phenotypic changes in cell morphology and tissue organization. The proposed approach is based on generative adversarial networks (GANs) conditioned on feature representations obtained with self-supervised learning. Our goal is to ensure that image-based phenotypes are accurately encoded in a latent space that can later be manipulated and used to generate images of novel phenotypic variations. We present an evaluation of our approach for phenotype analysis in a drug screen and a cancer tissue dataset.
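As a rough illustration of what manipulating a latent space can look like, the sketch below linearly interpolates between two latent codes, which is one common way to probe intermediate conditions such as a continuous dose response. It is not the authors' method: the function name `interpolate_latents`, the 16-dimensional codes, and the random vectors standing in for per-condition latent codes are all assumptions made for this example, and no generator is included.

```python
import numpy as np

def interpolate_latents(z_start, z_end, num_steps):
    """Return num_steps latent codes evenly spaced on the line from z_start to z_end."""
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    # Interpolation weights from 0 (start) to 1 (end), inclusive, as a column vector.
    weights = np.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - weights) * z_start + weights * z_end

# Hypothetical stand-ins for the latent codes of two conditions
# (for example, untreated and treated cells); real codes would come
# from a trained feature encoder, which is not shown here.
rng = np.random.default_rng(0)
z_control = rng.normal(size=16)
z_treated = rng.normal(size=16)

# Five codes along the path; each row could be passed to a conditional
# generator to render an image of an intermediate phenotype.
path = interpolate_latents(z_control, z_treated, num_steps=5)
print(path.shape)  # (5, 16)
```

In a real pipeline each row of `path` would condition the generator in place of a measured feature vector. Linear interpolation is only the simplest choice and may leave the region of latent space the model was trained on.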
Award ID(s):
2348683 2134695
PAR ID:
10503917
Author(s) / Creator(s):
Publisher / Repository:
NeurIPS GenBio Workshop
Date Published:
Journal Name:
Advances in neural information processing systems
ISSN:
1049-5258
Format(s):
Medium: X
Location:
New Orleans, LA
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons. RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types. RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. 
More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated. CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. 
TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution. Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic] 
  2. Abstract Background Access to quantitative information is crucial to obtain a deeper understanding of biological systems. In addition to being low-throughput, traditional image-based analysis is mostly limited to error-prone qualitative or semi-quantitative assessment of phenotypes, particularly for complex subcellular morphologies. The PVD neuron in Caenorhabditis elegans, which is responsible for harsh touch and thermosensation, undergoes structural degeneration as nematodes age, characterized by the appearance of dendritic protrusions. Analysis of these neurodegenerative patterns is labor-intensive and limited to qualitative assessment. Results In this work, we apply deep learning to perform quantitative image-based analysis of complex neurodegeneration patterns exhibited by the PVD neuron in C. elegans. We apply a convolutional neural network algorithm (Mask R-CNN) to identify neurodegenerative subcellular protrusions that appear after cold-shock or as a result of aging. A multiparametric phenotypic profile captures the unique morphological changes induced by each perturbation. We identify that acute cold-shock-induced neurodegeneration is reversible and depends on rearing temperature and, importantly, that aging and cold-shock induce distinct neuronal beading patterns. Conclusion The results of this work indicate that implementing deep learning for challenging image segmentation of PVD neurodegeneration enables quantitatively tracking subtle morphological changes in an unbiased manner. This analysis revealed that distinct patterns of morphological alteration are induced by aging and cold-shock, suggesting different mechanisms at play. This approach can be used to identify the molecular components involved in orchestrating neurodegeneration and to characterize the effect of other stressors on PVD degeneration.
  3. Image-based cell classification has become a common tool to identify phenotypic changes in cell populations. However, this methodology is limited to organisms possessing well-characterized species-specific reagents (e.g., antibodies) that allow cell identification, clustering and convolutional neural network (CNN) training. In the absence of such reagents, the power of image-based classification has remained mostly off-limits to many research organisms. We have developed an image-based classification methodology we named Image3C (Image-Cytometry Cell Classification) that requires neither species-specific reagents nor pre-existing knowledge about the sample. Image3C combines image-based flow cytometry with an unbiased, high-throughput cell cluster pipeline and CNN integration. Image3C exploits intrinsic cellular features and non-species-specific dyes to perform de novo cell composition analysis and to detect changes in cellular composition between different conditions. Therefore, Image3C expands the use of image-based analyses of cell population composition to research organisms in which detailed cellular phenotypes are unknown or for which species-specific reagents are not available.
  4. The success of image generative models has enabled us to build methods that can edit images based on text or other user input. However, these methods are bespoke, imprecise, require additional information, or are limited to only 2D image edits. We present GeoDiffuser, a zero-shot optimization-based method that unifies common 2D and 3D image-based object editing capabilities into a single method. Our key insight is to view image editing operations as geometric transformations. We show that these transformations can be directly incorporated into the attention layers in diffusion models to implicitly perform editing operations. Our training-free optimization method uses an objective function that seeks to preserve object style while generating plausible images, for instance with accurate lighting and shadows. It also inpaints disoccluded parts of the image where the object was originally located. Given a natural image and user input, we segment the foreground object using SAM and estimate a corresponding transform, which is used by our optimization approach for editing. GeoDiffuser can perform common 2D and 3D edits like object translation, 3D rotation, and removal. We present quantitative results, including a perceptual study, that show how our approach is better than existing methods.
  5. Abstract Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits. 