skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: On Finding and Analyzing the Backbone of the k-Core Structure of a Graph
In many network applications, dense subgraphs have proven to be extremely useful. One particular type of dense subgraph known as the k-core has received a great deal of attention. k-cores have been used in a number of important applications, including identifying important nodes, speeding up community detection, network visualization, and others. However, little work has investigated the ‘skeletal’ structure of the k-core, and the effect of such structures on the properties of the overall k-core and network itself. In this paper, we propose the Skeletal Core Subgraph, which describes the backbone of the k-core structure of a graph. We show how to categorize graphs based on their skeletal cores, and demonstrate how to efficiently decompose a given graph into its Skeletal Core Subgraph. We show both theoretically and experimentally the relationship between the Skeletal Core Subgraph and properties of the graph, including its core resilience.  more » « less
Award ID(s):
1908048
PAR ID:
10462518
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2022 IEEE International Conference on Data Mining (ICDM)
Page Range / eLocation ID:
1017 to 1022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Finding from a big graph those subgraphs that satisfy certain conditions is useful in many applications such as community detection and subgraph matching. These problems have a high time complexity, but existing systems that attempt to scale them are all IO-bound in execution. We propose the first truly CPU-bound distributed framework called G-thinker for subgraph finding algorithms, which adopts a task-based computation model, and which also provides a user-friendly subgraph-centric vertex-pulling API for writing distributed subgraph finding algorithms that can be easily adapted from existing serial algorithms. To utilize all CPU cores of a cluster, G-thinker features (1) a highly concurrent vertex cache for parallel task access and (2) a lightweight task scheduling approach that ensures high task throughput. These designs well overlap communication with computation to minimize the idle time of CPU cores. To further improve load balancing on graphs where the workloads of individual tasks can be drastically different due to biased graph density distribution, we propose to prioritize the scheduling of those tasks that tend to be long running for processing and decomposition, plus a timeout mechanism for task decomposition to prevent long-running straggler tasks. The idea has been integrated into a novelty algorithm for maximum clique finding (MCF) that adopts a hybrid task decomposition strategy, which significantly improves the running time of MCF on dense and large graphs: The algorithm finds a maximum clique of size 1,109 on a large and dense WikiLinks graph dataset in 70 minutes. Extensive experiments demonstrate that G-thinker achieves orders of magnitude speedup compared even with the fastest existing subgraph-centric system, and it scales well to much larger and denser real network data. G-thinker is open-sourced at http://bit.ly/gthinker with detailed documentation. 
    more » « less
  2. The k-core of a graph is the largest induced sub-graph with minimum degree k. The problem of k-core decomposition finds the k-cores of a graph for all valid values of k, and it has many applications such as network analysis, computational biology and graph visualization. Currently, there are two types of parallel algorithms for k-core decomposition: (1) degree-based vertex peeling, and (2) iterative h-index refinement. There is, however, few studies on accelerating k-core decomposition using GPU. In this paper, we propose a highly optimized peeling algorithm on a GPU, and compare it with possible implementations on top of think-like-a-vertex graph-parallel GPU systems as well as existing serial and parallel k-core decomposition algorithms on CPUs. Extensive experiments show that our GPU algorithm is the overall winner in both time and space. Our source code is released at https://github.com/akhlaqueak/KCoreGPU. 
    more » « less
  3. Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph where each vertex connects to at least γ fraction of the other vertices. Quasi-clique is a natural definition for dense structures, so finding large and hence statistically significant quasi-cliques is useful in applications such as community detection in social networks and discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive, and even a recent algorithm for mining large maximal quasi-cliques is flawed and can lead to a lot of repeated searches. This paper proposes a parallel solution for mining maximal quasi-cliques that is able to fully utilize CPU cores. Our solution utilizes divide and conquer to decompose the workloads into independent tasks for parallel mining, and we addressed the problem of (i) drastic load imbalance among different tasks and (ii) difficulty in predicting the task running time and the time growth with task subgraph size, by (a) using a timeout-based task decomposition strategy, and by (b) utilizing a priority task queue to schedule long-running tasks earlier for mining and decomposition to avoid stragglers. Unlike our conference version in PVLDB 2020 where the solution was built on a distributed graph mining framework called G-thinker, this paper targets a single-machine multi-core environment which is more accessible to an average end user. A general framework called T-thinker is developed to facilitate the programming of parallel programs for algorithms that adopt divide and conquer, including but not limited to our quasi-clique mining algorithm. Additionally, we consider the problem of directly mining large quasi-cliques from dense parts of a graph, where we identify the repeated search issue of a recent method and address it using a carefully designed concurrent trie data structure. Extensive experiments verify that our parallel solution scales well with the number of CPU cores, achieving 26.68× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges with 32 mining threads. Additionally, mining large quasi-cliques from dense parts can provide an additional speedup of up to 89.46×. 
    more » « less
  4. null (Ed.)
    Deep learning methods for graphs achieve remarkable performance on many node-level and graph-level prediction tasks. However, despite the proliferation of the methods and their success, prevailing Graph Neural Networks (GNNs) neglect subgraphs, rendering subgraph prediction tasks challenging to tackle in many impactful applications. Further, subgraph prediction tasks present several unique challenges: subgraphs can have non-trivial internal topology, but also carry a notion of position and external connectivity information relative to the underlying graph in which they exist. Here, we introduce SubGNN, a subgraph neural network to learn disentangled subgraph representations. We propose a novel subgraph routing mechanism that propagates neural messages between the subgraph's components and randomly sampled anchor patches from the underlying graph, yielding highly accurate subgraph representations. SubGNN specifies three channels, each designed to capture a distinct aspect of subgraph topology, and we provide empirical evidence that the channels encode their intended properties. We design a series of new synthetic and real-world subgraph datasets. Empirical results for subgraph classification on eight datasets show that SubGNN achieves considerable performance gains, outperforming strong baseline methods, including node-level and graph-level GNNs, by 19.8% over the strongest baseline. SubGNN performs exceptionally well on challenging biomedical datasets where subgraphs have complex topology and even comprise multiple disconnected components. 
    more » « less
  5. ABSTRACT The internal velocity structure within dense gaseous cores plays a crucial role in providing the initial conditions for star formation in molecular clouds. However, the kinematic properties of dense gas at core scales (∼0.01−0.1 pc) has not been extensively characterized because of instrument limitations until the unique capabilities of GBT-Argus became available. The ongoing GBT-Argus Large Program, Dynamics in Star-forming Cores (DiSCo) thus aims to investigate the origin and distribution of angular momentum of star-forming cores. DiSCo will survey all starless cores and Class 0 protostellar cores in the Perseus molecular complex down to ∼0.01 pc scales with <0.05 km s−1 velocity resolution using the dense gas tracer N2H+. Here, we present the first data sets from DiSCo towards the B1 and NGC 1333 regions in Perseus. Our results suggest that a dense core’s internal velocity structure has little correlation with other core-scale properties, indicating these gas motions may be originated externally from cloud-scale turbulence. These first data sets also reaffirm the ability of GBT-Argus for studying dense core velocity structure and provided an empirical basis for future studies that address the angular momentum problem with a statistically broad sample. 
    more » « less