

Title: Revisiting the use of graph centrality models in biological pathway analysis
Abstract Graph theory models are widely used in biological pathway analysis, where it is often desirable to evaluate the position of genes and proteins within their interaction networks. In this article, we argue that the standard graph centrality measures do not sufficiently capture the informative topological organization of pathways and thus limit biological inference. While key pathway elements may appear both upstream and downstream in pathways, standard directed graph centralities attribute significant topological importance to the upstream elements and evaluate the downstream elements as having no importance. We present a directed graph framework, Source/Sink Centrality (SSC), to address the limitations of standard models. SSC separately measures the importance of a node in the upstream and the downstream of a pathway, as a sender and a receiver of biological signals, and combines the two terms to evaluate centrality. To validate SSC, we evaluate the topological position of known human cancer genes and mouse lethal genes in their respective KEGG-annotated pathways and show that SSC-derived centralities provide an effective framework for associating higher positional importance with genes of higher importance according to a priori knowledge. While the presented work challenges some of the modeling assumptions in common pathway analyses, it provides a straightforward methodology to extend the existing models. The SSC extensions can result in a more informative topological description of pathways and thus more informative biological inference.
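The source/sink idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which underlying centrality is used or how the two terms are combined, so the sketch assumes a Katz-style score (walks of length k weighted by alpha^k, truncated at max_len) and a plain sum. The function names and parameters are hypothetical.

```python
def reverse_graph(adj):
    """Return the graph with every edge direction flipped."""
    rev = {node: [] for node in adj}
    for src, targets in adj.items():
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


def walk_score(adj, alpha=0.5, max_len=10):
    """Katz-style score (an assumption of this sketch): for each node,
    sum alpha**k times the number of walks of length k that START at it."""
    nodes = set(adj) | {t for targets in adj.values() for t in targets}
    counts = {v: 1.0 for v in nodes}  # walks of length 0
    score = {v: 0.0 for v in nodes}
    for k in range(1, max_len + 1):
        # A walk of length k from v is an edge v->u plus a walk of length k-1 from u.
        counts = {v: sum(counts[u] for u in adj.get(v, [])) for v in nodes}
        for v in nodes:
            score[v] += (alpha ** k) * counts[v]
    return score


def source_sink_centrality(adj, alpha=0.5, max_len=10):
    """Sender term on the original graph plus receiver term on the
    reversed graph (combining by a plain sum is an assumption)."""
    source = walk_score(adj, alpha, max_len)
    sink = walk_score(reverse_graph(adj), alpha, max_len)
    return {v: source[v] + sink[v] for v in source}


# Toy three-gene cascade A -> B -> C (hypothetical data).
pathway = {"A": ["B"], "B": ["C"], "C": []}
print(walk_score(pathway, max_len=3))              # sender-only view
print(source_sink_centrality(pathway, max_len=3))  # combined view
```

On this toy cascade the sender-only score gives the terminal node C a value of zero, whereas the combined score gives A and C equal weight, which is the behavior the abstract argues for.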
Award ID(s):
1652442
NSF-PAR ID:
10253640
Journal Name:
BioData Mining
Volume:
13
Issue:
1
ISSN:
1756-0381
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Environmental stress from ultraviolet radiation, elevated temperatures or metal toxicity can lead to reactive oxygen species in cells, leading to oxidative DNA damage, premature aging, neurodegenerative diseases, and cancer. The transcription factor nuclear factor (erythroid-derived 2)-like 2 (Nrf2) activates many cytoprotective proteins within the nucleus to maintain homeostasis during oxidative stress. In vertebrates, Nrf2 levels are regulated by the Kelch-family protein Keap1 (Kelch-like ECH-associated protein 1) in the absence of stress according to a canonical redox control pathway. Little, however, is known about the redox control pathway used in early diverging metazoans. Our study examines the presence of known oxidative stress regulatory elements within non-bilaterian metazoans including free living and parasitic cnidarians, ctenophores, placozoans, and sponges. Cnidarians, with their pivotal position as the sister phylum to bilaterians, play an important role in understanding the evolutionary history of response to oxidative stress. Through comparative genomic and transcriptomic analysis our results show that Nrf homologs evolved early in metazoans, whereas Keap1 appeared later in the last common ancestor of cnidarians and bilaterians. However, key Nrf–Keap1 interacting domains are not conserved within the cnidarian lineage, suggesting this important pathway evolved with the radiation of bilaterians. Several known downstream Nrf targets are present in cnidarians suggesting that cnidarian Nrf plays an important role in oxidative stress response even in the absence of Keap1. Comparative analyses of key oxidative stress sensing and response proteins in early diverging metazoans thus provide important insights into the molecular basis of how these lineages interact with their environment and suggest a shared evolutionary history of regulatory pathways. 
Exploration of these pathways may prove important for the study of cancer therapeutics and broader research in oxidative stress, senescence, and the functional responses of early diverging metazoans to environmental change. 
  2. Abstract

    A common way to integrate and analyze large amounts of biological “omic” data is through pathway reconstruction: using condition-specific omic data to create a subnetwork of a generic background network that represents some process or cellular state. A challenge in pathway reconstruction is that adjusting pathway reconstruction algorithms’ parameters produces pathways with drastically different topological properties and biological interpretations. Due to the exploratory nature of pathway reconstruction, there is no ground truth for direct evaluation, so parameter tuning methods typically used in statistics and machine learning are inapplicable. We developed the pathway parameter advising algorithm to tune pathway reconstruction algorithms to minimize biologically implausible predictions. We leverage background knowledge in pathway databases to select pathways whose high-level structure resembles that of manually curated biological pathways. At the core of this method is a graphlet decomposition metric, which measures topological similarity to curated biological pathways. In order to evaluate pathway parameter advising, we compare its performance in avoiding implausible networks and reconstructing pathways from the NetPath database with other parameter selection methods across four pathway reconstruction algorithms. We also demonstrate how pathway parameter advising can guide reconstruction of an influenza host factor network. Pathway parameter advising is method agnostic; it is applicable to any pathway reconstruction algorithm with tunable parameters.

     
  3. Abstract

    Recent advancements in network science showed that the topological credentials of the elements (i.e., links) in a network carry important implications. Likewise, roadway segments (i.e., links) in a road network should be assessed based on their network position along with traffic conditions at a given geographic scale. The goal of this study is to present a framework that can identify and select critical links in a road network based on their topological importance such as centrality, and the effects of systematic interventions conducted on such links in improving overall system performance (vehicle delay, travel time) to provide an adequate level of service (LOS). A real-world road network (Boise downtown) is investigated by applying lane interventions on roadways experiencing high congestion. Microscopic traffic simulation and analyses are conducted to estimate the traffic flow parameters hence the performance of the road segments. The findings of this study show that interventions applied to critical and congested road segments improve the serviceability from LOS F to LOS E as well as from LOS D to LOS C. Besides, reduced travel time and vehicular delay (after applying intervention on critical components) are also observed for high demand OD pairs of the road network. As such the proposed framework has the potential to incorporate the topological credentials with traffic flow parameters and improve the performance of the road network. This systematic approach will help traffic managers and practitioners to develop strategies that enhance road network performance.

     
  4. Introduction Climate change is already affecting ecosystems around the world and forcing us to adapt to meet societal needs. The speed with which climate change is progressing necessitates a massive scaling up of the number of species with understood genotype-environment-phenotype (G×E×P) dynamics in order to increase ecosystem and agriculture resilience. An important part of predicting phenotype is understanding the complex gene regulatory networks present in organisms. Previous work has demonstrated that knowledge about one species can be applied to another using ontologically-supported knowledge bases that exploit homologous structures and homologous genes. These types of structures that can apply knowledge about one species to another have the potential to enable the massive scaling up that is needed through in silico experimentation. Methods We developed one such structure, a knowledge graph (KG) using information from Planteome and the EMBL-EBI Expression Atlas that connects gene expression, molecular interactions, functions, and pathways to homology-based gene annotations. Our preliminary analysis uses data from gene expression studies in Arabidopsis thaliana and Populus trichocarpa plants exposed to drought conditions. Results A graph query identified 16 pairs of homologous genes in these two taxa, some of which show opposite patterns of gene expression in response to drought. As expected, analysis of the upstream cis-regulatory region of these genes revealed that homologs with similar expression behavior had conserved cis-regulatory regions and potential interaction with similar trans-elements, unlike homologs that changed their expression in opposite ways. Discussion This suggests that even though the homologous pairs share common ancestry and functional roles, predicting expression and phenotype through homology inference needs careful consideration of integrating cis and trans-regulatory components in the curated and inferred knowledge graph. 
  5. Abstract

    Community structure is a fundamental topological characteristic of optimally organized brain networks. Currently, there is no clear standard or systematic approach for selecting the most appropriate community detection method. Furthermore, the impact of method choice on the accuracy and robustness of estimated communities (and network modularity), as well as method‐dependent relationships between network communities and cognitive and other individual measures, are not well understood. This study analyzed large datasets of real brain networks (estimated from resting‐state fMRI from N = 5251 pre/early adolescents in the adolescent brain cognitive development [ABCD] study), and N = 5338 synthetic networks with heterogeneous, data‐inspired topologies, with the goal to investigate and compare three classes of community detection methods: (i) modularity maximization‐based (Newman and Louvain), (ii) probabilistic (Bayesian inference within the framework of stochastic block modeling (SBM)), and (iii) geometric (based on graph Ricci flow). Extensive comparisons between methods and their individual accuracy (relative to the ground truth in synthetic networks), and reliability (when applied to multiple fMRI runs from the same brains) suggest that the underlying brain network topology plays a critical role in the accuracy, reliability and agreement of community detection methods. Consistent method (dis)similarities, and their correlations with topological properties, were estimated across fMRI runs. Based on synthetic graphs, most methods performed similarly and had comparable high accuracy only in some topological regimes, specifically those corresponding to developed connectomes with at least quasi‐optimal community organization. In contrast, in densely and/or weakly connected networks with difficult to detect communities, the methods yielded highly dissimilar results, with Bayesian inference within SBM having significantly higher accuracy compared to all others. 
Associations between method‐specific modularity and demographic, anthropometric, physiological and cognitive parameters showed mostly method invariance but some method dependence as well. Although method sensitivity to different levels of community structure may in part explain method‐dependent associations between modularity estimates and parameters of interest, method dependence also highlights potential issues of reliability and reproducibility. These findings suggest that a probabilistic approach, such as Bayesian inference in the framework of SBM, may provide consistently reliable estimates of community structure across network topologies. In addition, to maximize robustness of biological inferences, identified network communities and their cognitive, behavioral and other correlates should be confirmed with multiple reliable detection methods.

     