Title: Joint analysis of expression levels and histological images identifies genes associated with tissue morphology
Abstract: Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits.
Award ID(s):
1750729
PAR ID:
10217031
Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Volume:
12
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals. In addition, models often fail to predict the correct direction of effect of cis-regulatory genetic variation on expression.
  2. Tishkoff, Sarah A. (Ed.)
    Facial morphology is highly variable, both within and among human populations, and a sizable portion of this variation is attributable to genetics. Previous genome scans have revealed more than 100 genetic loci associated with different aspects of normal-range facial variation. Most of these loci have been detected in Europeans, with few studies focusing on other ancestral groups. Consequently, the degree to which facial traits share a common genetic basis across diverse sets of humans remains largely unknown. We therefore investigated the genetic basis of facial morphology in an East African cohort. We applied an open-ended data-driven phenotyping approach to a sample of 2,595 3D facial images collected on Tanzanian children. This approach segments the face into hierarchically arranged, multivariate features that capture the shape variation after adjusting for age, sex, height, weight, facial size and population stratification. Genome scans of these multivariate shape phenotypes revealed significant (p < 2.5 × 10⁻⁸) signals at 20 loci, which were enriched for active chromatin elements in human cranial neural crest cells and embryonic craniofacial tissue, consistent with an early developmental origin of the facial variation. Two of these associations were in highly conserved regions showing craniofacial-specific enhancer activity during embryological development (5q31.1 and 12q21.31). Six of the 20 loci surpassed a stricter threshold accounting for multiple phenotypes with study-wide significance (p < 6.25 × 10⁻¹⁰). Cross-population comparisons indicated 10 association signals were shared with Europeans (seven sharing the same associated SNP), and facilitated fine-mapping of causal variants at previously reported loci. Taken together, these results may point to both shared and population-specific components to the genetic architecture of facial variation.
  3. LMNA-related dilated cardiomyopathy (DCM) is an autosomal-dominant genetic condition with cardiomyocyte and conduction system dysfunction often resulting in heart failure or sudden death. The condition is caused by mutation in the Lamin A/C (LMNA) gene encoding Type-A nuclear lamin proteins involved in nuclear integrity, epigenetic regulation of gene expression, and differentiation. The molecular mechanisms of the disease are not completely understood, and there are no definitive treatments to reverse progression or prevent mortality. We investigated possible mechanisms of LMNA-related DCM using induced pluripotent stem cells derived from a family with a heterozygous LMNA c.357-2A>G splice-site mutation. We differentiated one LMNA-mutant iPSC line derived from an affected female (Patient) and two non-mutant iPSC lines derived from her unaffected sister (Control) and conducted single-cell RNA sequencing for 12 samples (four from Patients and eight from Controls) across seven time points: Day 0, 2, 4, 9, 16, 19, and 30. Our bioinformatics workflow identified 125,554 cells in raw data and 110,521 (88%) high-quality cells in sequentially processed data. Unsupervised clustering, cell annotation, and trajectory inference found complex heterogeneity: ten main cell types; many possible subtypes; and lineage bifurcation for cardiac progenitors to cardiomyocytes (CMs) and epicardium-derived cells (EPDCs). Data integration and comparative analyses of Patient and Control cells found cell type and lineage-specific differentially expressed genes (DEGs) with enrichment, supporting pathway dysregulation. Top DEGs and enriched pathways included 10 ZNF genes and RNA polymerase II transcription in pluripotent cells (PP); BMP4 and TGF Beta/BMP signaling, sarcomere gene subsets and cardiogenesis, CDH2 and EMT in CMs; LMNA and epigenetic regulation, as well as DDIT4 and mTORC1 signaling in EPDCs. 
Top DEGs also included XIST and other X-linked genes, six imprinted genes (SNRPN, PWAR6, NDN, PEG10, MEG3, MEG8), and enriched gene sets related to metabolism, proliferation, and homeostasis. We confirmed Lamin A/C haploinsufficiency by allelic expression and Western blot. Our complex Patient-derived iPSC model for Lamin A/C haploinsufficiency in PP, CM, and EPDC provided support for dysregulation of genes and pathways, many previously associated with Lamin A/C defects, such as epigenetic gene expression, signaling, and differentiation. Our findings support disruption of epigenomic developmental programs, as proposed in other LMNA disease models. We recognized other factors influencing epigenetics and differentiation; thus, our approach needs improvement to further investigate this mechanism in an iPSC-derived model. 
  4. The term “microgravity” is used to describe the “weightlessness” or “zero-g” circumstances that can only be found in space beyond Earth’s atmosphere. Rhodobacter sphaeroides is a gram-negative purple phototroph, used as a model organism for this study due to its genomic complexity and metabolic versatility. Its genome has been completely sequenced, and profiles of the differential gene expression under aerobic, semi-aerobic, and photosynthetic conditions were examined. In this study, we hypothesized that R. sphaeroides will show altered growth characteristics, morphological properties, and gene expression patterns when grown under simulated microgravity. To test this, we measured the optical density and colony-forming units of cell cultures grown under both microgravity and normal gravity conditions. Differences in the cell morphology were observed using scanning electron microscopy (SEM) images by measuring the length and the surface area of the cells under both conditions. Furthermore, we also identified homologous genes of R. sphaeroides using the differential gene expression study of Acidovorax under microgravity in our laboratory. Growth kinetics results showed that R. sphaeroides cells grown under microgravity experience a shorter log phase and early stationary phase compared to the cells growing under normal gravity conditions. The length and surface area of the cells under microgravity were significantly higher, confirming that bacterial cells experience altered morphological features when grown under microgravity conditions. Differentially expressed homologous gene analysis indicated that genes coding for several COG and GO functions, such as metabolism, signal transduction, transcription, translation, chemotaxis, and cell motility, are differentially expressed to adapt and survive microgravity.
  5. Leslie, Christina S. (Ed.)
    Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2,768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.