This content will become publicly available on December 1, 2025

Title: Finding conserved low‐diameter subgraphs in social and biological networks
Abstract: The analysis of social and biological networks often involves modeling clusters of interest as cliques or their graph‐theoretic generalizations. The k‐club model, which relaxes the requirement of pairwise adjacency in a clique to length‐bounded paths inside the cluster, has been used to model cohesive subgroups in social networks and functional modules or complexes in biological networks. However, if the graphs are time‐varying, or if they change under different conditions, we may be interested in clusters that preserve their property over time or under changes in conditions. To model such clusters that are conserved in a collection of graphs, we consider a cross‐graph k‐club model, a subset of nodes that forms a k‐club in every graph in the collection. In this article, we consider the canonical optimization problem of finding a cross‐graph k‐club of maximum cardinality in a graph collection. We develop integer programming approaches to solve this problem. Specifically, we introduce strengthened formulations, valid inequalities, and branch‐and‐cut algorithms based on delayed constraint generation. The results of our computational study indicate the significant benefits of using the approaches we introduce.
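The definition in the abstract can be illustrated with a small feasibility check. The sketch below is not the article's integer programming method; it only verifies, by breadth-first search, whether a given node subset induces a subgraph of diameter at most k in every graph of a collection. The function names, the adjacency-dictionary representation, and the two toy graphs `g1` and `g2` are assumptions made for illustration.

```python
from collections import deque


def is_k_club(adj, nodes, k):
    """Return True if `nodes` induces a subgraph of diameter at most k.

    `adj` maps each node to an iterable of its neighbors (undirected graph).
    """
    nodes = set(nodes)
    for source in nodes:
        # BFS restricted to the induced subgraph, truncated at depth k.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == k:
                continue  # do not expand beyond distance k
            for v in adj.get(u, ()):
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Every node of the subset must be reached within k hops.
        if len(dist) != len(nodes):
            return False
    return True


def is_cross_graph_k_club(graphs, nodes, k):
    """Return True if `nodes` is a k-club in every graph of the collection."""
    return all(is_k_club(adj, nodes, k) for adj in graphs)


# Hypothetical toy collection on the same node set {0, 1, 2, 3}.
g1 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}   # star centered at node 0
g2 = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}   # path 1-0-2-3

print(is_cross_graph_k_club([g1, g2], {0, 1, 2}, 2))
print(is_cross_graph_k_club([g1, g2], {0, 1, 2, 3}, 2))
```

In this toy collection, {0, 1, 2} is a 2-club in both graphs, while {0, 1, 2, 3} is a 2-club only in the star `g1`, because nodes 1 and 3 are three hops apart in `g2`. The optimization problem studied in the article asks for the largest subset that passes this check, which is a much harder task than verification.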
Award ID(s):
2145553
PAR ID:
10560982
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Networks
Volume:
84
Issue:
4
ISSN:
0028-3045
Page Range / eLocation ID:
509 to 527
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The analysis of social and biological networks often involves modeling clusters of interest as cliques or their graph-theoretic generalizations. The 𝑘-club model, which relaxes the requirement of pairwise adjacency in a clique to length-bounded paths inside the cluster, has been used to model cohesive subgroups in social networks and functional modules/complexes in biological networks. However, if the graphs are time-varying, or if they change under different conditions, we may be interested in clusters that preserve their property over time or under changes in conditions. To model such clusters that are conserved in a collection of graphs, we consider a cross-graph 𝑘-club model, a subset of nodes that forms a 𝑘-club in every graph in the collection. In this paper, we consider the canonical optimization problem of finding a cross-graph 𝑘-club of maximum cardinality. We introduce algorithmic ideas to solve this problem and evaluate their performance on some benchmark instances. Published in: Proceedings of The International Network Optimization Conference (INOC) 2022, Aachen, Germany
  2. Cliques and their generalizations are frequently used to model “tightly knit” clusters in graphs and identifying such clusters is a popular technique used in graph-based data mining. One such model is the s-club, which is a vertex subset that induces a subgraph of diameter at most s. This model has found use in a variety of fields because low-diameter clusters have practical significance in many applications. As this property is not hereditary on vertex-induced subgraphs, the diameter of a subgraph could increase upon the removal of some vertices and the subgraph could even become disconnected. For example, star graphs have diameter two but can be disconnected by removing the central vertex. The pursuit of a fault-tolerant extension of the s-club model has spawned two variants that we study in this article: robust s-clubs and hereditary s-clubs. We analyze the complexity of the verification and optimization problems associated with these variants. Then, we propose cut-like integer programming formulations for both variants whenever possible and investigate the separation complexity of the cut-like constraints. We demonstrate through our extensive computational experiments that the algorithmic ideas we introduce enable us to solve the problems to optimality on benchmark instances with several thousand vertices. This work lays the foundations for effective mathematical programming approaches for finding fault-tolerant s-clubs in large-scale networks. History: Accepted by David Alderson, Area Editor for Network Optimization: Algorithms & Applications. Funding: The computing for this project was performed at the High Performance Computing Center at Oklahoma State University supported in part through the National Science Foundation [Grant OAC-1531128]. This material is based upon work supported by the National Science Foundation under [Grants 1662757 and 1942065]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoc.2022.1231 . 
  3. Graph clustering is a fundamental problem in social network analysis, the goal of which is to group vertices of a graph into a series of densely knitted clusters with each cluster well separated from all the others. Classical graph clustering methods take advantage of the graph topology to model and quantify vertex proximity. With the proliferation of rich graph contents, such as user profiles in social networks, and gene annotations in protein interaction networks, it is essential to consider both the structure and content information of graphs for high-quality graph clustering. In this paper, we propose a graph embedding approach to clustering content-enriched graphs. The key idea is to embed each vertex of a graph into a continuous vector space where the localized structural and attributive information of vertices can be encoded in a unified, latent representation. Specifically, we quantify vertex-wise attribute proximity into edge weights, and employ truncated, attribute-aware random walks to learn the latent representations for vertices. We evaluate our attribute-aware graph embedding method in real-world attributed graphs, and the results demonstrate its effectiveness in comparison with state-of-the-art algorithms. 
  4. Statistical relational learning (SRL) and graph neural networks (GNNs) are two powerful approaches for learning and inference over graphs. Typically, they are evaluated in terms of simple metrics such as accuracy over individual node labels. Complex aggregate graph queries (AGQ) involving multiple nodes, edges, and labels are common in the graph mining community and are used to estimate important network properties such as social cohesion and influence. While graph mining algorithms support AGQs, they typically do not take into account uncertainty, or when they do, make simplifying assumptions and do not build full probabilistic models. In this paper, we examine the performance of SRL and GNNs on AGQs over graphs with partially observed node labels. We show that, not surprisingly, inferring the unobserved node labels as a first step and then evaluating the queries on the fully observed graph can lead to sub-optimal estimates, and that a better approach is to compute these queries as an expectation under the joint distribution. We propose a sampling framework to tractably compute the expected values of AGQs. Motivated by the analysis of subgroup cohesion in social networks, we propose a suite of AGQs that estimate the community structure in graphs. In our empirical evaluation, we show that by estimating these queries as an expectation, SRL-based approaches yield up to a 50-fold reduction in average error when compared to existing GNN-based approaches.
  5. We study a matrix completion problem that leverages a hierarchical structure of social similarity graphs as side information in the context of recommender systems. We assume that users are categorized into clusters, each of which comprises sub-clusters (or what we call “groups”). We consider a low-rank matrix model for the rating matrix, and a hierarchical stochastic block model that well respects practically-relevant social graphs. Under this setting, we characterize the information-theoretic limit on the number of observed matrix entries (i.e., optimal sample complexity) as a function of the quality of graph side information (to be detailed) by proving sharp upper and lower bounds on the sample complexity. Furthermore, we develop a matrix completion algorithm and empirically demonstrate via extensive experiments that the proposed algorithm achieves the optimal sample complexity.