skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Learning Mixed Membership from Adjacency Graph Via Systematic Edge Query: Identifiability and Algorithm
Graph clustering is a core technique for network analysis problems, e.g., community detection. This work puts forth a node clustering approach for largely incomplete adjacency graphs. Under the considered scenario, instead of having access to the complete graph, only a small amount of queries about the graph edges can be made for node clustering. This task is well-motivated in many large-scale network analysis problems, where complete graph acquisition is prohibitively costly. Prior work tackles this problem under the setting that the nodes only admit single membership and the clusters are disjoint, yet multiple membership nodes and overlapping clusters often arise in practice. Existing approaches also rely on random edge query patterns and convex optimization-based formulations, which give rise to a number of implementation and scalability challenges. This work offers a framework that provably learns the mixed membership of nodes from overlapping clusters using limited edge information. Our method is equipped with a systematic edge query pattern, which is arguably easier to implement relative to the random counterparts in certain applications, e.g., field survey based graph analysis. A lightweight scalable algorithm is proposed, and its performance characterizations are presented. Numerical experiments are used to showcase the effectiveness of our method  more » « less
Award ID(s):
2007836
PAR ID:
10292992
Author(s) / Creator(s):
;
Date Published:
Journal Name:
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Page Range / eLocation ID:
5370 to 5374
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    This work proposes a new unsupervised (or self-supervised) node representation learning method that aims to leverage the coarse-grain information that is available in most graphs. This extends previous attempts that only leverage fine-grain information (similarities within local neighborhoods) or global graph information (similarities across all nodes). Intuitively, the proposed method identifies nodes that belong to the same clusters and maximizes their mutual information. Thus, coarse-grain (cluster-level) similarities that are shared between nodes are preserved in their representations. The core components of the proposed method are (i) a jointly optimized clustering of nodes during learning and (ii) an Infomax objective term that preserves the mutual information among nodes of the same clusters. Our method is able to outperform competing state-of-art methods in various downstream tasks, such as node classification, link prediction, and node clustering. Experiments show that the average gain is between 0.2% and 6.1%, over the best competing approach, over all tasks. Our code is publicly available at: https://github.com/cmavro/Graph-InfoClust-GIC. 
    more » « less
  2. Path-based solutions have been shown to be useful for various graph analysis tasks, such as link prediction and graph clustering. However, they are no longer adequate for handling complex and gigantic graphs. Recently, motif-based analysis has attracted a lot of attention. A motif, or a small graph with a few nodes, is often considered as a fundamental unit of a graph. Motif-based analysis captures high-order structure between nodes, and performs better than traditional "edge-based" solutions. In this paper, we study motif-path , which is conceptually a concatenation of one or more motif instances. We examine how motif-paths can be used in three path-based mining tasks, namely link prediction, local graph clustering and node ranking. We further address the situation when two graph nodes are not connected through a motif-path, and develop a novel defragmentation method to enhance it. Experimental results on real graph datasets demonstrate the use of motif-paths and defragmentation techniques improves graph analysis effectiveness. 
    more » « less
  3. Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called “higher-order interactions” that involve more than two nodes at a time. However, we have fewer rigorous methods that can provide insight from such representations. Here, we develop a computational framework for the problem of clustering hypergraphs with categorical edge labels — or different interaction types — where clusters corresponds to groups of nodes that frequently participate in the same type of interaction. Our methodology is based on a combinatorial objective function that is related to correlation clustering on graphs but enables the design of much more efficient algorithms that also seamlessly generalize to hypergraphs. When there are only two label types, our objective can be optimized in polynomial time, using an algorithm based on minimum cuts. Minimizing our objective becomes NP-hard with more than two label types, but we develop fast approximation algorithms based on linear programming relaxations that have theoretical cluster quality guarantees. We demonstrate the efficacy of our algorithms and the scope of the model through problems in edge-label community detection, clustering with temporal data, and exploratory data analysis. 
    more » « less
  4. null (Ed.)
    This paper will demonstrate a novel method for consolidating data in an engineered hypercube network for the purpose of optimizing query processing. Query processing typically calls for merging data collected from a small subset of server nodes in a network. This poses the problem of managing efficiently the exchange of data between processing nodes to complete some relational data operation. The method developed here is designed to minimize data transfer, measured as the product of data quantity and network distance, by delegating the processing to a node that is relatively central to the subset. A hypercube not only supports simple computation of network distance between nodes, but also allows for identifying a node to serve as the center for any data consolidation operations.We will show how the consolidation process can be performed by selecting a subgraph of a complex network to simplify the selection of a central node and thus facilitate the computations required. We will also show a prototype implementation of a hypercube using Software-Defined Networking to support query optimization in a distributed heterogeneous database system, making use of network distance information and data quantity. 
    more » « less
  5. A source node updates its status as a point process and also forwards its updates to a network of observer nodes. Within the network of observers, these updates are forwarded as point processes from node to node. Each node wishes its knowledge of the source to be as timely as possible. In this network, timeliness is measured by a discrete form of age of information: each status change at the source is referred to as a version and the age at a node is how many versions out of date is its most recent update from the source. This work introduces a method for evaluating the average version age at each node in the network when nodes forward updates using a memoryless gossip protocol. This method is then demonstrated by version age analysis for a collection of simple networks. For gossip on a complete graph with symmetric updating rates, it is shown that each node has average age that grows as the logarithm of the network size. 
    more » « less