

Title: Deep learning for inferring gene relationships from single-cell expression data

Several methods were developed to mine gene–gene relationships from expression data. Examples include correlation and mutual information methods for coexpression analysis, clustering and undirected graphical models for functional assignments, and directed graphical models for pathway reconstruction. Using an encoding for gene expression data, followed by deep neural networks analysis, we present a framework that can successfully address all of these diverse tasks. We show that our method, convolutional neural network for coexpression (CNNC), improves upon prior methods in tasks ranging from predicting transcription factor targets to identifying disease-related genes to causality inference. CNNC’s encoding provides insights about some of the decisions it makes and their biological basis. CNNC is flexible and can easily be extended to integrate additional types of genomics data, leading to further improvements in its performance.
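As a point of reference for the coexpression baselines the abstract names (correlation and mutual information), the sketch below scores one gene pair both ways. It is a minimal illustration, not CNNC and not the authors' code: the function names, the bin count, and the plug-in histogram estimator for mutual information are all assumptions made for this example.

```python
import numpy as np


def pearson_correlation(x, y):
    """Pearson correlation between two genes' expression vectors (one value per cell)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def histogram_mutual_information(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) from a 2D histogram.

    The bin count is an illustrative assumption; the estimate is sensitive to it.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint distribution over bins
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nonzero = pxy > 0                      # skip empty bins to avoid log(0)
    ratio = pxy[nonzero] / (px @ py)[nonzero]
    return float(np.sum(pxy[nonzero] * np.log(ratio)))


# Synthetic counts for illustration only: gene_b depends on gene_a, gene_c does not.
rng = np.random.default_rng(0)
gene_a = rng.poisson(5, size=500)
gene_b = gene_a + rng.poisson(1, size=500)
gene_c = rng.poisson(5, size=500)

print(pearson_correlation(gene_a, gene_b), pearson_correlation(gene_a, gene_c))
print(histogram_mutual_information(gene_a, gene_b),
      histogram_mutual_information(gene_a, gene_c))
```

Both scores are symmetric in the two genes, so neither can by itself indicate the direction of a relationship.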

 
NSF-PAR ID:
10127008
Publisher / Repository:
Proceedings of the National Academy of Sciences
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
116
Issue:
52
ISSN:
0027-8424
Page Range / eLocation ID:
p. 27151-27158
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Current biotechnologies can simultaneously measure multiple high-dimensional modalities (e.g., RNA, DNA accessibility, and protein) from the same cells. A combination of different analytical tasks (e.g., multi-modal integration and cross-modal analysis) is required to comprehensively understand such data, inferring how gene regulation drives biological diversity and functions. However, current analytical methods are designed to perform a single task, only providing a partial picture of the multi-modal data. Here, we present UnitedNet, an explainable multi-task deep neural network capable of integrating different tasks to analyze single-cell multi-modality data. Applied to various multi-modality datasets (e.g., Patch-seq, multiome ATAC + gene expression, and spatial transcriptomics), UnitedNet demonstrates similar or better accuracy in multi-modal integration and cross-modal prediction compared with state-of-the-art methods. Moreover, by dissecting the trained UnitedNet with the explainable machine learning algorithm, we can directly quantify the relationship between gene expression and other modalities with cell-type specificity. UnitedNet is a comprehensive end-to-end framework that could be broadly applicable to single-cell multi-modality biology. This framework has the potential to facilitate the discovery of cell-type-specific regulation kinetics across transcriptomics and other modalities.

     
  2. Abstract Background

    Autosomal dominant polycystic kidney disease (ADPKD) is one of the most prevalent monogenic human diseases. It is mostly caused by pathogenic variants in PKD1 or PKD2 genes that encode interacting transmembrane proteins polycystin-1 (PC1) and polycystin-2 (PC2). Among many pathogenic processes described in ADPKD, those associated with cAMP signaling, inflammation, and metabolic reprogramming appear to regulate the disease manifestations. Tolvaptan, a vasopressin receptor-2 antagonist that regulates cAMP pathway, is the only FDA-approved ADPKD therapeutic. Tolvaptan reduces renal cyst growth and kidney function loss, but it is not tolerated by many patients and is associated with idiosyncratic liver toxicity. Therefore, additional therapeutic options for ADPKD treatment are needed.

    Methods

    As drug repurposing of FDA-approved drug candidates can significantly decrease the time and cost associated with traditional drug discovery, we used the computational approach signature reversion to detect inversely related drug response gene expression signatures from the Library of Integrated Network-Based Cellular Signatures (LINCS) database and identified compounds predicted to reverse disease-associated transcriptomic signatures in three publicly available Pkd2 kidney transcriptomic data sets of mouse ADPKD models. We focused on a pre-cystic model for signature reversion, as it was less impacted by confounding secondary disease mechanisms in ADPKD, and then compared the resulting candidates’ target differential expression in the two cystic mouse models. We further prioritized these drug candidates based on their known mechanism of action, FDA status, targets, and by functional enrichment analysis.

    Results

    With this in-silico approach, we prioritized 29 unique drug targets differentially expressed in Pkd2 ADPKD cystic models and 16 prioritized drug repurposing candidates that target them, including bromocriptine and mirtazapine, which can be further tested in-vitro and in-vivo.

    Conclusion

    Collectively, these results indicate drug targets and repurposing candidates that may effectively treat pre-cystic as well as cystic ADPKD.

  3. Abstract Motivation

    Three-dimensional (3D) genome organization plays important functional roles in cells. User-friendly tools for reconstructing 3D genome models from chromosomal conformation capturing data and analyzing them are needed for the study of 3D genome organization.

    Results

    We built a comprehensive graphical tool (GenomeFlow) to facilitate the entire process of modeling and analysis of 3D genome organization. This process includes the mapping of Hi-C data to one-dimensional (1D) reference genomes, the generation, normalization and visualization of two-dimensional (2D) chromosomal contact maps, the reconstruction and the visualization of the 3D models of chromosome and genome, the analysis of 3D models and the integration of these models with functional genomics data. This graphical tool is the first of its kind in reconstructing, storing, analyzing and annotating 3D genome models. It can reconstruct 3D genome models from Hi-C data and visualize them in real-time. This tool also allows users to overlay gene annotation, gene expression data and genome methylation data on top of 3D genome models.

    Availability and implementation

    The source code and user manual: https://github.com/jianlin-cheng/GenomeFlow.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  4. Summary

    Predicting gene regulatory networks (GRNs) from expression profiles is a common approach for identifying important biological regulators. Despite the increased use of inference methods, existing computational approaches often do not integrate RNA‐sequencing data analysis, are not automated or are restricted to users with bioinformatics backgrounds. To address these limitations, we developed tuxnet, a user‐friendly platform that can process raw RNA‐sequencing data from any organism with an existing reference genome using a modified tuxedo pipeline (hisat 2 + cufflinks package) and infer GRNs from these processed data. tuxnet is implemented as a graphical user interface and can mine gene regulations, either by applying a dynamic Bayesian network (DBN) inference algorithm, genist, or a regression tree‐based pipeline, rtp‐star. We obtained time‐course expression data of a PERIANTHIA (PAN) inducible line and inferred a GRN using genist to illustrate the use of tuxnet while gaining insight into the regulations downstream of the Arabidopsis root stem cell regulator PAN. Using rtp‐star, we inferred the network of ATHB13, a downstream gene of PAN, for which we obtained wild‐type and mutant expression profiles. Additionally, we generated two networks using temporal data from developmental leaf data and spatial data from root cell‐type data to highlight the use of tuxnet to form new testable hypotheses from previously explored data. Our case studies feature the versatility of tuxnet when using different types of gene expression data to infer networks and its accessibility as a pipeline for non‐bioinformaticians to analyze transcriptome data, predict causal regulations, assess network topology and identify key regulators.

     
  5. Abstract

    Spatial gene expression in tissue is characterized by regions in which particular genes are enriched or depleted. Frequently, these regions contain nested inside them subregions with distinct expression patterns. Segmentation methods in spatial transcriptomic (ST) data extract disjoint regions maximizing similarity over the greatest number of genes, typically on a particular spatial scale, thus lacking the ability to find region-within-region structure. We present NeST, which extracts spatial structure through coexpression hotspots—regions exhibiting localized spatial coexpression of some set of genes. Coexpression hotspots identify structure on any spatial scale, over any possible subset of genes, and are highly explainable. NeST also performs spatial analysis of cell-cell interactions via ligand-receptor, identifying active areas de novo without restriction of cell type or other groupings, in both two and three dimensions. Through application on ST datasets of varying type and resolution, we demonstrate the ability of NeST to reveal a new level of biological structure.

     