skip to main content


Search for: All records

Creators/Authors contains: "Yan, Da"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Subgraph search problems such as maximal clique enumeration and subgraph matching generate a search-space tree which is traversed in depth-first manner by serial backtracking algorithms that are recursive. Since Jenkins et al. reported the backtracking paradigm to be sub-optimal for GPU acceleration, breadth-first traversal of the search-space tree is widely adopted by GPU algorithms. However, they produce a lot of intermediate subgraphs that exhaust the GPU device memory. Recent works revive the depth-first backtracking paradigm for GPU acceleration, where each warp is a basic processing unit with its own stack in device memory for subgraph backtracking. However, they adopt complicated methods for load balancing that incur a lot of overheads. They also use hardcoded fixed space for stacks that is determined ad-hoc and may lead to inaccuracy when the allocated space is insufficient. In this paper, we use subgraph matching as a case study to propose novel depth-first GPU solutions to address the above problems. Our approach, called T-DFS, decomposes the compu- tation into independent tasks that process search-space subtrees, which are managed by an efficient lock-free circular task queue. Tasks are distributed to different warps for parallel processing, and a novel timeout mechanism is used to eliminate straggler tasks to ensure load balancing. We also support flexible and fine- grained dynamic memory allocation for stack spaces to avoid the stack space allocation pitfalls of existing works. Extensive experi- ments on real graphs show that T-DFS significantly outperforms existing depth-first GPU solutions for the subgraph matching application. 
    more » « less
  2. Graph-theoretic algorithms and graph machine learning models are essential tools for addressing many real-life problems, such as social network analysis and bioinformatics. To support large-scale graph analytics, graph-parallel systems have been actively developed for over one decade, such as Google’s Pregel and Spark’s GraphX, which (i) promote a think-like-a-vertex computing model and target (ii) iterative algorithms and (iii) those problems that output a value for each vertex. However, this model is too restricted for supporting the rich set of heterogeneous operations for graph analytics and machine learning that many real applications demand. In recent years, two new trends emerge in graph-parallel systems research: (1) a novel think-like-a-task computing model that can efficiently support the various computationally expensive problems of subgraph search; and (2) scalable systems for learning graph neural networks. These systems effectively complement the diversity needs of graph-parallel tools that can flexibly work together in a comprehensive graph processing pipeline for real applications, with the capability of capturing structural features. This tutorial will provide an effective categorization of the recent systems in these two directions based on their computing models and adopted techniques, and will review the key design ideas of these systems. 
    more » « less
  3. Finding from a big graph those subgraphs that satisfy certain conditions (aka. subgraph search) is useful in many applications such as community detection and subgraph matching. These problems often generate a search-space tree with size exponential to the size of the input graph. GPUs with thousands of cores are a natural choice to speed up subgraph search, but existing GPU solutions either conduct BFS on the search-space tree which leads to memory overflow due to intermediate subgraph-size explosion, or they conduct DFS on the search-space tree which is memory-efficient but can be 2 orders of magnitude slower than a BFS solution. In this paper, we present G2-AIMD, a subgraph-centric framework for efficient subGraph Search on GPUs, which enjoys the efficiency of BFS on the search-space tree, while avoids intermediate subgraph-size explosion with novel system designs such as adaptive chunk-size adjustment and host-memory subgraph buffering, inspired by the additive-increase/multiplicative-decrease (AIMD) algorithm in TCP congestion control. G2-AIMD provides a convenient subgraph-centric programming interface to facilitate the implementation of subgraph search algorithms on top, so as to enjoy the above performance merits. G2-AIMD also supports multi-GPU execution where each GPU only needs to load a fraction of the input graph. To demonstrate the efficiency and scalability of G2-AIMD, two algorithms were implemented on top with additional optimization techniques, and they significantly outperform the existing GPU solutions. 
    more » « less
  4. In this demonstration paper, we describe FSM-Explorer, an interactive tool for that makes it easier for end-users to mine frequent subgraph patterns from a big graph G, and to explore the subgraph instances in G that match the patterns. FSM-Explorer not only supports the popular MNI support measure, but also the recently proposed Fraction-Score measure that is more accurate. Its backend engine is built on top of the recent T-FSM system that ensures high concurrency, bounded memory consumption, and effective load balancing. Using real-world data, we showcase how users can mine frequent subgraph patterns by parameter tuning in FSM-Explorer, and how they can conveniently examine the many matched instances in G one batch at a time to improve productivity. 
    more » « less
  5. Free, publicly-accessible full text available November 27, 2024
  6. The prediction of stable crystal structures is an important part of designing solid-state crystalline materials with desired properties. Recent advances in structural feature representations and generative neural networks promise the ability to efficiently create new stable structures to use for inverse design and to search for materials with tailored functionalities. 
    more » « less
    Free, publicly-accessible full text available July 3, 2024
  7. Finding frequent subgraph patterns in a big graph is an important problem with many applications such as classifying chemical compounds and building indexes to speed up graph queries. Since this problem is NP-hard, some recent parallel systems have been developed to accelerate the mining. However, they often have a huge memory cost, very long running time, suboptimal load balancing, and possibly inaccurate results. In this paper, we propose an efficient system called T-FSM for parallel mining of frequent subgraph patterns in a big graph. T-FSM adopts a novel task-based execution engine design to ensure high concurrency, bounded memory consumption, and effective load balancing. It also supports a new anti-monotonic frequentness measure called Fraction-Score, which is more accurate than the widely used MNI measure. Our experiments show that T-FSM is orders of magnitude faster than SOTA systems for frequent subgraph pattern mining. Our system code has been released at https://github.com/lyuheng/T-FSM. 
    more » « less
    Free, publicly-accessible full text available May 26, 2024
  8. The k-core of a graph is the largest induced sub-graph with minimum degree k. The problem of k-core decomposition finds the k-cores of a graph for all valid values of k, and it has many applications such as network analysis, computational biology and graph visualization. Currently, there are two types of parallel algorithms for k-core decomposition: (1) degree-based vertex peeling, and (2) iterative h-index refinement. There is, however, few studies on accelerating k-core decomposition using GPU. In this paper, we propose a highly optimized peeling algorithm on a GPU, and compare it with possible implementations on top of think-like-a-vertex graph-parallel GPU systems as well as existing serial and parallel k-core decomposition algorithms on CPUs. Extensive experiments show that our GPU algorithm is the overall winner in both time and space. Our source code is released at https://github.com/akhlaqueak/KCoreGPU. 
    more » « less
  9. Flood inundation mapping from Earth imagery plays a vital role in rapid disaster response and national water forecasting. However, the problem is non-trivial due to significant imagery noise and obstacles, complex spatial dependency on 3D terrains, spatial non-stationarity, and high computational cost. Existing machine learning approaches are mostly terrain-unaware and are prone to produce spurious results due to imagery noise and obstacles, requiring significant efforts in post-processing. Recently, several terrain- aware methods were proposed that incorporate complex spatial dependency (e.g., water flow directions on 3D terrains) but they assume that the inferred flood surface level is spatially stationary, making them insufficient for a large heterogeneous geographic area. To address these limitations, this paper proposes a novel spatial learning framework called hidden Markov forest, which decomposes a large heterogeneous area into local stationary zones, represents spatial dependency on 3D terrains via zonal trees (forest), and jointly infers the class map in different zonal trees with spatial regularization. We design efficient inference algorithms based on dynamic programming and multi-resolution filtering. Evaluations on real-world datasets show that our method outperforms baselines and our proposed computational refinement significantly reduces the time cost. 
    more » « less