skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Local hypergraph clustering using capacity releasing diffusion
Local graph clustering is an important machine learning task that aims to find a well-connected cluster near a set of seed nodes. Recent results have revealed that incorporating higher order information significantly enhances the results of graph clustering techniques. The majority of existing research in this area focuses on spectral graph theory-based techniques. However, an alternative perspective on local graph clustering arises from using max-flow and min-cut on the objectives, which offer distinctly different guarantees. For instance, a new method called capacity releasing diffusion (CRD) was recently proposed and shown to preserve local structure around the seeds better than spectral methods. The method was also the first local clustering technique that is not subject to the quadratic Cheeger inequality by assuming a good cluster near the seed nodes. In this paper, we propose a local hypergraph clustering technique called hypergraph CRD (HG-CRD) by extending the CRD process to cluster based on higher order patterns, encoded as hyperedges of a hypergraph. Moreover, we theoretically show that HG-CRD gives results about a quantity called motif conductance, rather than a biased version used in previous experiments. Experimental results on synthetic datasets and real world graphs show that HG-CRD enhances the clustering quality.  more » « less
Award ID(s):
1909528 2007481
PAR ID:
10542636
Author(s) / Creator(s):
;
Editor(s):
Sendiña-Nadal, Irene
Publisher / Repository:
Public Library of Science (PLoS)
Date Published:
Journal Name:
PLOS ONE
Volume:
15
Issue:
12
ISSN:
1932-6203
Page Range / eLocation ID:
e0243485
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study p -Laplacians and spectral clustering for a recently proposed hypergraph model that incorporates edge-dependent vertex weights (EDVW). These weights can reflect different importance of vertices within a hyperedge, thus conferring the hypergraph model higher expressivity and flexibility. By constructing submodular EDVW-based splitting functions, we convert hypergraphs with EDVW into submodular hypergraphs for which the spectral theory is better developed. In this way, existing concepts and theorems such as p -Laplacians and Cheeger inequalities proposed under the submodular hypergraph setting can be directly extended to hypergraphs with EDVW. For submodular hypergraphs with EDVW-based splitting functions, we propose an efficient algorithm to compute the eigenvector associated with the second smallest eigenvalue of the hypergraph 1-Laplacian. We then utilize this eigenvector to cluster the vertices, achieving higher clustering accuracy than traditional spectral clustering based on the 2-Laplacian. More broadly, the proposed algorithm works for all submodular hypergraphs that are graph reducible. Numerical experiments using real-world data demonstrate the effectiveness of combining spectral clustering based on the 1-Laplacian and EDVW. 
    more » « less
  2. The spectral properties of traditional (dyadic) graphs, where an edge connects exactly two vertices, are widely studied in different applications. These spectral properties are closely connected to the structural properties of dyadic graphs. We generalize such connections and characterize higher-order networks by their spectral information. We first split the higher-order graphs by their “edge orders” into several uniform hypergraphs. For each uniform hypergraph, we extract the corresponding spectral information from the transition matrices of carefully designed random walks. From each spectrum, we compute the first few spectral moments and use all such spectral moments across different “edge orders” as the higher-order graph representation. We show that these moments not only clearly indicate the return probabilities of random walks but are also closely related to various higher-order network properties such as degree distribution and clustering coefficient. Extensive experiments show the utility of this new representation in various settings. For instance, graph classification on higher-order graphs shows that this representation significantly outperforms other techniques. 
    more » « less
  3. This paper proposes and analyzes a novel clustering algorithm, called \emph{learning by unsupervised nonlinear diffusion (LUND)}, that combines graph-based diffusion geometry with techniques based on density and mode estimation. LUND is suitable for data generated from mixtures of distributions with densities that are both multimodal and supported near nonlinear sets. A crucial aspect of this algorithm is the use of time of a data-adapted diffusion process, and associated diffusion distances, as a scale parameter that is different from the local spatial scale parameter used in many clustering algorithms. We prove estimates for the behavior of diffusion distances with respect to this time parameter under a flexible nonparametric data model, identifying a range of times in which the mesoscopic equilibria of the underlying process are revealed, corresponding to a gap between within-cluster and between-cluster diffusion distances. These structures may be missed by the top eigenvectors of the graph Laplacian, commonly used in spectral clustering. This analysis is leveraged to prove sufficient conditions guaranteeing the accuracy of LUND. We implement LUND and confirm its theoretical properties on illustrative data sets, demonstrating its theoretical and empirical advantages over both spectral and density-based clustering. 
    more » « less
  4. Path-based solutions have been shown to be useful for various graph analysis tasks, such as link prediction and graph clustering. However, they are no longer adequate for handling complex and gigantic graphs. Recently, motif-based analysis has attracted a lot of attention. A motif, or a small graph with a few nodes, is often considered as a fundamental unit of a graph. Motif-based analysis captures high-order structure between nodes, and performs better than traditional "edge-based" solutions. In this paper, we study motif-path , which is conceptually a concatenation of one or more motif instances. We examine how motif-paths can be used in three path-based mining tasks, namely link prediction, local graph clustering and node ranking. We further address the situation when two graph nodes are not connected through a motif-path, and develop a novel defragmentation method to enhance it. Experimental results on real graph datasets demonstrate the use of motif-paths and defragmentation techniques improves graph analysis effectiveness. 
    more » « less
  5. Abstract The growing complexity of biological data has spurred the development of innovative computational techniques to extract meaningful information and uncover hidden patterns within vast datasets. Biological networks, such as gene regulatory networks and protein-protein interaction networks, hold critical insights into biological features’ connections and functions. Integrating and analyzing high-dimensional data, particularly in gene expression studies, stands prominent among the challenges in deciphering these networks. Clustering methods play a crucial role in addressing these challenges, with spectral clustering emerging as a potent unsupervised technique considering intrinsic geometric structures. However, spectral clustering’s user-defined cluster number can lead to inconsistent and sometimes orthogonal clustering regimes. We propose theMulti-layer Bundling (MLB)method to address this limitation, combining multiple prominent clustering regimes to offer a comprehensive data view. We call the outcome clusters “bundles”. This approach refines clustering outcomes, unravels hierarchical organization, and identifies bridge elements mediating communication between network components. By layering clustering results, MLB provides a global-to-local view of biological feature clusters enabling insights into intricate biological systems. Furthermore, the method enhances bundle network predictions by integrating thebundle co-cluster matrixwith the affinity matrix. The versatility of MLB extends beyond biological networks, making it applicable to various domains where understanding complex relationships and patterns is needed. 
    more » « less