skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Identification of disease modules using higher-order network structure
Motivation: Higher-order interaction patterns among proteins have the potential to reveal mechanisms behind molecular processes and diseases. While clustering methods are used to identify functional groups within molecular interaction networks, these methods largely focus on edge density and do not explicitly take into consideration higher-order interactions. Disease genes in these networks have been shown to exhibit rich higher-order structure in their vicinity, and considering these higher-order interaction patterns in network clustering have the potential to reveal new disease-associated modules. Results: We propose a higher-order community detection method which identifies community structure in networks with respect to specific higher-order connectivity patterns beyond edges. Higher-order community detection on four different protein–protein interaction networks identifies biologically significant modules and disease modules that conventional edge-based clustering methods fail to discover. Higher-order clusters also identify disease modules from genome-wide association study data, including new modules that were not discovered by top-performing approaches in a Disease Module DREAM Challenge. Our approach provides a more comprehensive view of community structure that enables us to predict new disease–gene associations.  more » « less
Award ID(s):
1750981
PAR ID:
10515492
Author(s) / Creator(s):
; ;
Editor(s):
Forslund, Sofia
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics Advances
Volume:
3
Issue:
1
ISSN:
2635-0041
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called “higher-order interactions” that involve more than two nodes at a time. However, we have fewer rigorous methods that can provide insight from such representations. Here, we develop a computational framework for the problem of clustering hypergraphs with categorical edge labels — or different interaction types — where clusters corresponds to groups of nodes that frequently participate in the same type of interaction. Our methodology is based on a combinatorial objective function that is related to correlation clustering on graphs but enables the design of much more efficient algorithms that also seamlessly generalize to hypergraphs. When there are only two label types, our objective can be optimized in polynomial time, using an algorithm based on minimum cuts. Minimizing our objective becomes NP-hard with more than two label types, but we develop fast approximation algorithms based on linear programming relaxations that have theoretical cluster quality guarantees. We demonstrate the efficacy of our algorithms and the scope of the model through problems in edge-label community detection, clustering with temporal data, and exploratory data analysis. 
    more » « less
  2. Abstract: The growing complexity of biological data has spurred the development of innovative computational techniques to extract meaningful information and uncover hidden patterns within vast datasets. Biological networks, such as gene regulatory networks and protein-protein interaction networks, hold critical insights into biological features’ connections and functions. Integrating and analyzing high-dimensional data, particularly in gene expression studies, stands prominent among the challenges in deciphering these networks. Clustering methods play a crucial role in addressing these challenges, with spectral clustering emerging as a potent unsupervised technique considering intrinsic geometric structures. However, spectral clustering’s user-defined cluster number can lead to inconsistent and sometimes orthogonal clustering regimes. We propose theMulti-layer Bundling (MLB)method to address this limitation, combining multiple prominent clustering regimes to offer a comprehensive data view. We call the outcome clusters “bundles”. This approach refines clustering outcomes, unravels hierarchical organization, and identifies bridge elements mediating communication between network components. By layering clustering results, MLB provides a global-to-local view of biological feature clusters enabling insights into intricate biological systems. Furthermore, the method enhances bundle network predictions by integrating thebundle co-cluster matrixwith the affinity matrix. The versatility of MLB extends beyond biological networks, making it applicable to various domains where understanding complex relationships and patterns is needed. 
    more » « less
  3. null (Ed.)
    Abstract We analyze networks of functional correlations between brain regions to identify changes in their structure caused by Attention Deficit Hyperactivity Disorder ( adhd ). We express the task for finding changes as a network anomaly detection problem on temporal networks. We propose the use of a curvature measure based on the Forman–Ricci curvature, which expresses higher-order correlations among two connected nodes. Our theoretical result on comparing this Forman–Ricci curvature with another well-known notion of network curvature, namely the Ollivier–Ricci curvature, lends further justification to the assertions that these two notions of network curvatures are not well correlated and therefore one of these curvature measures cannot be used as an universal substitute for the other measure. Our experimental results indicate nine critical edges whose curvature differs dramatically in brains of adhd patients compared to healthy brains. The importance of these edges is supported by existing neuroscience evidence. We demonstrate that comparative analysis of curvature identifies changes that more traditional approaches, for example analysis of edge weights, would not be able to identify. 
    more » « less
  4. Valencia, Alfonso (Ed.)
    Abstract Motivation Protein function prediction, based on the patterns of connection in a protein–protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches use random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein–protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein–protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties. Results GLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein–protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein–protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein–protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson’s Disease GWAS genes, rediscover many genes which have known involvement in Parkinson’s disease pathways, plus suggest some new genes to study. Availability and implementation All code is publicly available and can be accessed here: https://github.com/kap-devkota/GLIDER. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  5. The structure of neural circuitry plays a crucial role in brain function. Previous studies of brain organization generally had to trade off between coarse descriptions at a large scale and fine descriptions on a small scale. Researchers have now reconstructed tens to hundreds of thousands of neurons at synaptic resolution, enabling investigations into the interplay between global, modular organization, and cell type-specific wiring. Analyzing data of this scale, however, presents unique challenges. To address this problem, we applied novel community detection methods to analyze the synapse-level reconstruction of an adult femaleDrosophila melanogasterbrain containing >20,000 neurons and 10 million synapses. Using a machine-learning algorithm, we find the most densely connected communities of neurons by maximizing a generalized modularity density measure. We resolve the community structure at a range of scales, from large (on the order of thousands of neurons) to small (on the order of tens of neurons). We find that the network is organized hierarchically, and larger-scale communities are composed of smaller-scale structures. Our methods identify well-known features of the fly brain, including its sensory pathways. Moreover, focusing on specific brain regions, we are able to identify subnetworks with distinct connectivity types. For example, manual efforts have identified layered structures in the fan-shaped body. Our methods not only automatically recover this layered structure, but also resolve finer connectivity patterns to downstream and upstream areas. We also find a novel modular organization of the superior neuropil, with distinct clusters of upstream and downstream brain regions dividing the neuropil into several pathways. These methods show that the fine-scale, local network reconstruction made possible by modern experimental methods are sufficiently detailed to identify the organization of the brain across scales, and enable novel predictions about the structure and function of its parts. Significance StatementThe Hemibrain is a partial connectome of an adult femaleDrosophila melanogasterbrain containing >20,000 neurons and 10 million synapses. Analyzing the structure of a network of this size requires novel and efficient computational tools. We applied a new community detection method to automatically uncover the modular structure in the Hemibrain dataset by maximizing a generalized modularity measure. This allowed us to resolve the community structure of the fly hemibrain at a range of spatial scales revealing a hierarchical organization of the network, where larger-scale modules are composed of smaller-scale structures. The method also allowed us to identify subnetworks with distinct cell and connectivity structures, such as the layered structures in the fan-shaped body, and the modular organization of the superior neuropil. Thus, network analysis methods can be adopted to the connectomes being reconstructed using modern experimental methods to reveal the organization of the brain across scales. This supports the view that such connectomes will allow us to uncover the organizational structure of the brain, which can ultimately lead to a better understanding of its function. 
    more » « less