skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Faster Depth-First Subgraph Matching on GPUs
Subgraph search problems such as maximal clique enumeration and subgraph matching generate a search-space tree which is traversed in depth-first manner by serial backtracking algorithms that are recursive. Since Jenkins et al. reported the backtracking paradigm to be sub-optimal for GPU acceleration, breadth-first traversal of the search-space tree is widely adopted by GPU algorithms. However, they produce a lot of intermediate subgraphs that exhaust the GPU device memory. Recent works revive the depth-first backtracking paradigm for GPU acceleration, where each warp is a basic processing unit with its own stack in device memory for subgraph backtracking. However, they adopt complicated methods for load balancing that incur a lot of overheads. They also use hardcoded fixed space for stacks that is determined ad-hoc and may lead to inaccuracy when the allocated space is insufficient. In this paper, we use subgraph matching as a case study to propose novel depth-first GPU solutions to address the above problems. Our approach, called T-DFS, decomposes the compu- tation into independent tasks that process search-space subtrees, which are managed by an efficient lock-free circular task queue. Tasks are distributed to different warps for parallel processing, and a novel timeout mechanism is used to eliminate straggler tasks to ensure load balancing. We also support flexible and fine- grained dynamic memory allocation for stack spaces to avoid the stack space allocation pitfalls of existing works. Extensive experi- ments on real graphs show that T-DFS significantly outperforms existing depth-first GPU solutions for the subgraph matching application.  more » « less
Award ID(s):
2229394
PAR ID:
10498838
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
40th IEEE International Conference on Data Engineering (ICDE)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Finding from a big graph those subgraphs that satisfy certain conditions (aka. subgraph search) is useful in many applications such as community detection and subgraph matching. These problems often generate a search-space tree with size exponential to the size of the input graph. GPUs with thousands of cores are a natural choice to speed up subgraph search, but existing GPU solutions either conduct BFS on the search-space tree which leads to memory overflow due to intermediate subgraph-size explosion, or they conduct DFS on the search-space tree which is memory-efficient but can be 2 orders of magnitude slower than a BFS solution. In this paper, we present G2-AIMD, a subgraph-centric framework for efficient subGraph Search on GPUs, which enjoys the efficiency of BFS on the search-space tree, while avoids intermediate subgraph-size explosion with novel system designs such as adaptive chunk-size adjustment and host-memory subgraph buffering, inspired by the additive-increase/multiplicative-decrease (AIMD) algorithm in TCP congestion control. G2-AIMD provides a convenient subgraph-centric programming interface to facilitate the implementation of subgraph search algorithms on top, so as to enjoy the above performance merits. G2-AIMD also supports multi-GPU execution where each GPU only needs to load a fraction of the input graph. To demonstrate the efficiency and scalability of G2-AIMD, two algorithms were implemented on top with additional optimization techniques, and they significantly outperform the existing GPU solutions. 
    more » « less
  2. Finding frequent subgraph patterns in a big graph is an important problem with many applications such as classifying chemical compounds and building indexes to speed up graph queries. Since this problem is NP-hard, some recent parallel systems have been developed to accelerate the mining. However, they often have a huge memory cost, very long running time, suboptimal load balancing, and possibly inaccurate results. In this paper, we propose an efficient system called T-FSM for parallel mining of frequent subgraph patterns in a big graph. T-FSM adopts a novel task-based execution engine design to ensure high concurrency, bounded memory consumption, and effective load balancing. It also supports a new anti-monotonic frequentness measure called Fraction-Score, which is more accurate than the widely used MNI measure. Our experiments show that T-FSM is orders of magnitude faster than SOTA systems for frequent subgraph pattern mining. Our system code has been released at https://github.com/lyuheng/T-FSM. 
    more » « less
  3. In this demonstration paper, we describe FSM-Explorer, an interactive tool for that makes it easier for end-users to mine frequent subgraph patterns from a big graph G, and to explore the subgraph instances in G that match the patterns. FSM-Explorer not only supports the popular MNI support measure, but also the recently proposed Fraction-Score measure that is more accurate. Its backend engine is built on top of the recent T-FSM system that ensures high concurrency, bounded memory consumption, and effective load balancing. Using real-world data, we showcase how users can mine frequent subgraph patterns by parameter tuning in FSM-Explorer, and how they can conveniently examine the many matched instances in G one batch at a time to improve productivity. 
    more » « less
  4. In this paper, a data-driven method is proposed for fast cascading outage screening in power systems. The proposed method combines a deep convolutional neural network (deep CNN) and a depth-first search (DFS) algorithm. First, a deep CNN is constructed as a security assessment tool to evaluate system security status based on observable information.With its automatic feature extraction ability and the high generalization, a well-trained deep CNN can produce estimated AC optimal power flow (ACOPF) results for various uncertain operation scenarios, i.e., fluctuated load and system topology change, in a nearly computation-free manner. Second, a scenario tree is built to represent the potential operation scenarios and the associated cascading outages. The DFS algorithm is developed as a fast screening tool to calculate the expected security index value for each cascading outage path along the entire tree, which can be a reference for system operators to take predictive measures against system collapse. The simulation results of applying the proposed deep CNN and the DFS algorithm on standard test cases verify their accuracy, and the computational efficiency is thousands of times faster than the model-based traditional approach, which implies the great potential of the proposed algorithm for online applications. 
    more » « less
  5. Cloud computing has become an emerging trend for the software industry with the requirement of large infrastructure and resources. The future success of cloud computing depends on the effectiveness of instantiation of the infrastructure and utilization of available resources. Load Balancing ensures the fulfillment of these conditions to improve the cloud environment for the users. Load Balancing dynamically distributes the workload among the nodes in such a way that no single resource is either overwhelmed with tasks or underutilized. In this paper we propose a threshold based load balancing algorithm to ensure the equal distribution of the workload among the nodes. The main objective of the algorithms is to stop the VMs in the cloud being overloaded with tasks or being idle for lack allocation of tasks, when there are active tasks. We have simulated our proposed algorithm in the Cloudanalyst simulator with real world data scenarios. Simulation results shows that our proposed threshold based algorithm can provide a better response time for the task/requests and data processing time for the datacenters compared to the existing algorithms such as First Come First Serve (FCFS), Round Robin(RR) and Equally Spread Current Execution Load Balancing algorithm(ESCELB). 
    more » « less