This content will become publicly available on August 25, 2025

Title: Efficient Exact Subgraph Matching via GNN-based Path Dominance Embedding
The classic problem of exact subgraph matching returns those subgraphs in a large-scale data graph that are isomorphic to a given query graph. It has gained increasing importance in many real-world applications such as social network analysis, knowledge graph discovery in the Semantic Web, bibliographical network mining, and so on. In this paper, we propose a novel and effective graph neural network (GNN)-based path embedding framework (GNN-PE), which allows efficient exact subgraph matching without introducing false dismissals. Unlike traditional GNN-based graph embeddings that only produce approximate subgraph matching results, we carefully devise GNN-based embeddings for paths, such that if two paths (and the 1-hop neighbors of the vertices on them) have the subgraph relationship, their corresponding GNN-based embedding vectors will strictly follow the dominance relationship. With this newly designed property of path dominance embeddings, we are able to propose effective pruning strategies based on path label/dominance embeddings and guarantee no false dismissals for subgraph matching. We build multidimensional indexes over path embedding vectors, and develop an efficient subgraph matching algorithm that traverses indexes over graph partitions in parallel and applies our pruning methods. We also propose a cost-model-based query plan that obtains query paths from the query graph with low query cost. Through extensive experiments, we confirm the efficiency and effectiveness of our proposed GNN-PE approach for exact subgraph matching on both real and synthetic graph data.
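The dominance-based pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dominates` and `prune_candidates` are hypothetical, the embedding vectors are made-up numbers rather than GNN outputs, and the direction of the dominance test (query embedding component-wise less than or equal to the data-path embedding) is an assumption made for illustration.

```python
from typing import List, Sequence


def dominates(query_vec: Sequence[float], data_vec: Sequence[float]) -> bool:
    """Return True when query_vec is component-wise <= data_vec.

    Assumption (not stated in the abstract): a query path that is a
    subgraph of a data path has an embedding that is no larger in
    every dimension.
    """
    if len(query_vec) != len(data_vec):
        raise ValueError("embedding vectors must have the same dimension")
    return all(q <= d for q, d in zip(query_vec, data_vec))


def prune_candidates(
    query_vec: Sequence[float], data_vecs: Sequence[Sequence[float]]
) -> List[int]:
    """Keep the indices of data paths that pass the dominance test.

    A data path that fails the test cannot match under the assumed
    property, so dropping it causes no false dismissal. Survivors are
    only candidates and still need an exact isomorphism check.
    """
    return [i for i, vec in enumerate(data_vecs) if dominates(query_vec, vec)]


# Hypothetical 2-dimensional embeddings for one query path and three data paths.
candidates = prune_candidates([0.2, 0.5], [[0.3, 0.6], [0.1, 0.9], [0.2, 0.5]])
print(candidates)
```

The linear scan here stands in for the multidimensional index traversal mentioned in the abstract; an index would answer the same dominance query without visiting every data path.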
Award ID(s):
2217104
PAR ID:
10533448
Author(s) / Creator(s):
; ;
Publisher / Repository:
Very Large Data Base Endowment Inc.
Date Published:
Journal Name:
Proceedings of the International Conference on Very Large Data Bases
Volume:
17
Issue:
7
ISSN:
0278-2596
Page Range / eLocation ID:
1628-1641
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The big graph database provides strong modeling capabilities and efficient querying for complex applications. Subgraph isomorphism, which finds exact matches of a query graph in the database, is a challenging problem to solve efficiently. Current subgraph isomorphism approaches are mostly based on the pruning strategy proposed by Ullmann. These techniques have two significant drawbacks: first, they are unable to efficiently handle complex queries, and second, their implementations need large indexes that require large memory resources. In this paper, we describe a new subgraph isomorphism approach, the HyGraph algorithm, that is efficient both in querying and in the memory required for index creation. We compare the HyGraph algorithm with two popular existing approaches, GraphQL and Cypher, using complexity measures and experimentally using three big graph data sets: (1) a country-level population database, (2) a simulated bank database, and (3) a publicly available World Cup big graph database. It is shown that the HyGraph solution performs significantly better than (or equally to) competing algorithms for the query operations on these big databases, making it an excellent candidate for subgraph isomorphism queries in real scenarios.
  2. Skyline path queries (SPQs) extend skyline queries to multi-dimensional networks, such as multi-cost road networks (MCRNs). Such queries return a set of non-dominated paths between two given network nodes. Despite extensive work on evaluating different SPQ variants, SPQ evaluation is still very inefficient because no efficient index structures exist to support such queries. Existing index-building approaches for shortest-path query execution, when directly extended to support SPQs, take an unreasonable amount of space and time to build, making them impractical for processing large graphs. In this paper, we propose a novel index structure, the backbone index, and a corresponding index construction method that condenses an initial MCRN into multiple smaller summarized graphs of different granularity. We present efficient approaches to find approximate solutions to SPQs by utilizing the backbone index structure. Furthermore, to make good use of historical queries and their results, we propose two models, Skyline Path Graph Neural Network (SP-GNN) and Transfer SP-GNN (TSP-GNN), to support effective SPQ processing. Our extensive experiments on real-world large road networks show that the backbone index can support finding meaningful approximate SPQ solutions efficiently. The backbone index can be constructed in a reasonable time, dramatically outperforming the construction of other types of indexes for road networks. As far as we know, this is the first compact index structure that can support efficient approximate SPQ evaluation on large MCRNs. The results on the SP-GNN and TSP-GNN models also show that both models can help obtain approximate SPQ answers efficiently.
  3. Networks or graphs provide a natural and generic way for modeling rich structured data. Recent research on graph analysis has been focused on representation learning, of which the goal is to encode the network structures into distributed embedding vectors, so as to enable various downstream applications through off-the-shelf machine learning. However, existing methods mostly focus on node-level embedding, which is insufficient for subgraph analysis. Moreover, their leverage of network structures through path sampling or neighborhood preserving is implicit and coarse. Network motifs allow graph analysis in a finer granularity, but existing methods based on motif matching are limited to enumerated simple motifs and do not leverage node labels and supervision. In this paper, we develop NEST, a novel hierarchical network embedding method combining motif filtering and convolutional neural networks. Motif-based filtering enables NEST to capture exact small structures within networks, and convolution over the filtered embedding allows it to fully explore complex substructures and their combinations. NEST can be trivially applied to any domain and provide insight into particular network functional blocks. Extensive experiments on protein function prediction, drug toxicity prediction and social network community identification have demonstrated its effectiveness and efficiency. 
  4. Attributed subgraph matching is a powerful tool for explorative mining of large attributed networks. In many applications (e.g., network science of teams, intelligence analysis, finance informatics), the user might not know exactly what s/he is looking for, and thus must constantly revise the initial query graph based on what s/he finds in the current matching results. A major bottleneck in such an interactive matching scenario is efficiency, as simply rerunning the matching algorithm on the revised query graph is computationally prohibitive. In this paper, we propose a family of effective and efficient algorithms (FIRST) to support interactive attributed subgraph matching. There are two key ideas behind the proposed methods. The first is to recast the attributed subgraph matching problem as a cross-network node similarity problem, whose major computation lies in solving a Sylvester equation for the query graph and the underlying data graph. The second key idea is to exploit the smoothness between the initial and revised queries, which allows us to solve the new/updated Sylvester equation incrementally, without re-solving it from scratch. Experimental results show that our method (1) achieves up to a 16x speed-up when applied to networks with 6M+ nodes; (2) preserves more than 90% accuracy compared with existing methods; and (3) scales linearly with respect to the size of the data graph.
  5. Subgraph matching is a fundamental task in many applications which identifies all the embeddings of a query pattern in an input graph. Compilation-based subgraph matching systems generate specialized implementations for the provided patterns and often substantially outperform other systems. However, the generated code causes significant computation redundancy and the compilation process incurs too much overhead to be used online, both due to the inherent symmetry in the structure of the query pattern. In this paper, we propose an optimizing query compiler, named GraphZero, to completely address these limitations through symmetry breaking based on group theory. GraphZero implements three novel techniques. First, its schedule explorer efficiently prunes the schedule space without missing any high-performance schedule. Second, it automatically generates and enforces a set of restrictions to eliminate computation redundancy. Third, it generalizes orientation, a surprisingly effective optimization that was only used for clique patterns, to apply to arbitrary patterns. Evaluation on multiple query patterns shows that GraphZero outperforms two state-of-the-art compilation and non-compilation based systems by up to 40X and 2654X, respectively.