Title: Revisiting the use of graph centrality models in biological pathway analysis
Abstract The use of graph theory models is widespread in biological pathway analyses, as it is often desirable to evaluate the position of genes and proteins in the interaction networks of biological systems. In this article, we argue that the standard graph centrality measures do not sufficiently capture the informative topological organization of pathways and thus limit biological inference. While key pathway elements may appear both upstream and downstream in pathways, standard directed graph centralities attribute significant topological importance to the upstream elements and evaluate the downstream elements as having no importance. We present a directed graph framework, Source/Sink Centrality (SSC), to address the limitations of standard models. SSC separately measures the importance of a node in the upstream and the downstream of a pathway, as a sender and a receiver of biological signals, and combines the two terms to evaluate the centrality. To validate SSC, we evaluate the topological position of known human cancer genes and mouse lethal genes in their respective KEGG annotated pathways and show that SSC-derived centralities provide an effective framework for associating higher positional importance with the genes that have higher importance from a priori knowledge. While the presented work challenges some of the modeling assumptions in common pathway analyses, it provides a straightforward methodology to extend the existing models. The SSC extensions can result in a more informative topological description of pathways and thus more informative biological inference.
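The source/sink idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Katz-style attenuated walk count as the base centrality, and the function names (`walk_score`, `reverse`, `source_sink_score`) and the parameters `alpha` and `max_len` are hypothetical choices made for illustration. The score is computed once on the pathway graph (node as sender) and once on the reversed graph (node as receiver), and the two terms are summed.

```python
def walk_score(succ, alpha=0.5, max_len=10):
    """Attenuated count of walks leaving each node:
    sum over k = 1..max_len of alpha**k * (number of walks of length k)."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    # walks[v] = number of walks of the current length that start at v
    walks = {v: 1.0 for v in nodes}  # length 0: the empty walk
    score = {v: 0.0 for v in nodes}
    weight = 1.0
    for _ in range(max_len):
        # A walk of length k from v is an edge v -> w plus a walk of length k-1 from w.
        walks = {v: sum(walks[w] for w in succ.get(v, ())) for v in nodes}
        weight *= alpha
        for v in nodes:
            score[v] += weight * walks[v]
    return score


def reverse(succ):
    """Return the graph with every edge direction flipped."""
    rev = {}
    for u, vs in succ.items():
        rev.setdefault(u, [])
        for v in vs:
            rev.setdefault(v, []).append(u)
    return rev


def source_sink_score(succ, alpha=0.5, max_len=10):
    """Illustrative combination: sender term plus receiver term."""
    source = walk_score(succ, alpha, max_len)         # influence sent downstream
    sink = walk_score(reverse(succ), alpha, max_len)  # influence received from upstream
    return {v: source[v] + sink[v] for v in source}


# Toy three-step cascade A -> B -> C (hypothetical, not a KEGG pathway).
pathway = {"A": ["B"], "B": ["C"]}
print(walk_score(pathway))         # the sender-only term gives the terminal node C a score of 0
print(source_sink_score(pathway))  # the combined score gives C a nonzero value
```

On this toy cascade the sender-only term assigns the terminal node no importance, which is the behavior the abstract criticizes, while the combined score does not. How the two terms are weighted and which base centrality is used are open design choices in this sketch.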
Award ID(s):
1652442
PAR ID:
10253640
Journal Name:
BioData Mining
Volume:
13
Issue:
1
ISSN:
1756-0381
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Objective. To demonstrate the capability of a graph feature-based supervised machine learning (ML) algorithm applied to intracranial electroencephalogram recordings for the identification of seizure onset zones (SOZs) in individuals with drug-resistant epilepsy. Approach. Directed graphs are generated using three model-free measures of effective connectivity (EC): directed information, mutual information-guided Granger causality index (MI-GCI), and frequency-domain convergent cross-mapping (FD-CCM). Graph centrality measures at different sparsity levels are used as the classifier's features. Main results. The centrality features achieve high accuracies exceeding 90% in distinguishing SOZ electrodes from non-SOZ electrodes. Notably, a sparse graph representation with just ten features and simple ML models effectively achieves such performance. The study identifies FD-CCM centrality measures as particularly significant, with a mean AUC of 0.93, outperforming prior literature. The FD-CCM-based graph modeling also highlights elevated centrality measures among SOZ electrodes, emphasizing heightened activity relative to non-SOZ electrodes during ictogenesis. Significance. This research not only underscores the efficacy of automated SOZ identification but also illuminates the potential of specific EC measures in enhancing discriminative power within the context of epilepsy research.
  2. In this work, we present a novel method for constructing a topological map of biological hotspots in an aquatic environment using a Fast Marching-based Voronoi segmentation. Using this topological map, we develop a closed-form solution to the scheduling problem for any single path through the graph. Searching over the space of all paths allows us to compute a maximally informative path that traverses a subset of the hotspots, given some budget. Using a greedy-coverage algorithm, we can then compute an informative path. We evaluate our method in a set of simulated trials, both with randomly generated environments and a real-world environment. In these trials, we show that our method produces a topological graph which more accurately captures features in the environment than standard thresholding techniques. Additionally, we show that our method can improve the performance of a greedy-coverage algorithm in the informative path planning problem by guiding it to different informative areas to help it escape from local maxima.
  3. This paper describes a group-level classification of 14 patients with prefrontal cortex (pFC) lesions from 20 healthy controls using multi-layer graph convolutional networks (GCN) with features inferred from the scalp EEG recorded from the encoding phase of working memory (WM) trials. We first construct undirected and directed graphs to represent the WM encoding for each trial for each subject using distance correlation-based functional connectivity measures and differential directed information-based effective connectivity measures, respectively. Centrality measures of betweenness centrality, eigenvector centrality, and closeness centrality are inferred for each of the 64 channels from the brain connectivity. Along with the three centrality measures, each graph uses the relative band powers in the five frequency bands (delta, theta, alpha, beta, and gamma) as node features. The summarized graph representation is learned using two layers of GCN followed by mean pooling, and fully connected layers are used for classification. The final class label for a subject is decided using majority voting based on the results from all the subject's trials. The GCN-based model can correctly classify 28 of the 34 subjects (82.35% accuracy) with undirected edges represented by the functional connectivity measure of distance correlation and classify all 34 subjects (100% accuracy) with directed edges characterized by the effective connectivity measure of differential directed information.
  4. Abstract The network-based proximity between drug targets and disease genes can provide novel insights regarding the repercussions, interplay, and repositioning of drugs in the context of disease. Current understanding of, and treatment for reversing, the fibrotic process in systemic sclerosis (SSc) is limited. We have developed a network-based analysis for drug effects that takes into account the human interactome network, proximity measures between drug targets and disease-associated genes, genome-wide gene expression, and disease modules that emerge through pertinent analysis. Currently used and potential drugs showed a wide variation in proximity to SSc-associated genes and distinctive proximity to the SSc-relevant pathways, depending on their class and targets. Tyrosine kinase inhibitors (TyKIs) approach disease genes through multiple pathways, including both inflammatory and fibrosing processes. The SSc disease module includes the emerging molecular targets and is in better accord with the current knowledge of the pathophysiology of the disease. In the disease-module network, the greatest perturbing activity was shown by nintedanib, followed by imatinib, dasatinib, and acetylcysteine. Suppression of the SSc-relevant pathways and alleviation of skin fibrosis were remarkable in the inflammatory subsets of SSc patients receiving TyKI therapy. Our results show that network-based drug-disease proximity offers a novel perspective into a drug's therapeutic effect in the SSc disease module. This could be applied to drug combinations or drug repositioning, and be helpful in guiding clinical trial design and subgroup analysis.
  5. Markov Chain Monte Carlo (MCMC) has been the de facto technique for sampling and inference of large graphs such as online social networks. At the heart of MCMC lies the ability to construct an ergodic Markov chain that attains any given stationary distribution \pi, often in the form of random walks or crawling agents on the graph. Most of the works around MCMC, however, presume that the graph is undirected or has reciprocal edges, and become inapplicable when the graph is directed and non-reciprocal. Here we develop a similar framework for directed graphs called Non-Markovian Monte Carlo (NMMC) by establishing a mapping to convert \pi into the quasi-stationary distribution of a carefully constructed transient Markov chain on an extended state space. As applications, we demonstrate how to achieve any given distribution \pi on a directed graph and estimate the eigenvector centrality using a set of non-Markovian, history-dependent random walks on the same graph in a distributed manner. We also provide numerical results on various real-world directed graphs to confirm our theoretical findings, and present several practical enhancements to make our NMMC method ready for practical use in most directed graphs. To the best of our knowledge, the proposed NMMC framework for directed graphs is the first of its kind, unlocking all the limitations set by the standard MCMC methods for undirected graphs.