skip to main content


Title: FIRST: Fast Interactive Attributed Subgraph Matching
Attributed subgraph matching is a powerful tool for explorative mining of large attributed networks. In many applications (e.g., network science of teams, intelligence analysis, finance informatics), the user might not know what exactly s/he is looking for, and thus require the user to constantly revise the initial query graph based on what s/he finds from the current matching results. A major bottleneck in such an interactive matching scenario is the efficiency, as simply rerunning the matching algorithm on the revised query graph is computationally prohibitive. In this paper, we propose a family of effective and efficient algorithms (FIRST) to support interactive attributed subgraph matching. There are two key ideas behind the proposed methods. The first is to recast the attributed subgraph matching problem as a cross-network node similarity problem, whose major computation lies in solving a Sylvester equation for the query graph and the underlying data graph. The second key idea is to explore the smoothness between the initial and revised queries, which allows us to solve the new/updated Sylvester equation incrementally, without re-solving it from scratch. Experimental results show that our method can achieve (1) up to 16x speed-up when applying on networks with 6M$+$ nodes; (2) preserving more than 90% accuracy compared with existing methods; and (3) scales linearly with respect to the size of the data graph.  more » « less
Award ID(s):
1651203 1947135
PAR ID:
10062450
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
KDD
Page Range / eLocation ID:
1447 to 1456
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Sylvester equation offers a powerful and unifying primitive for a variety of important graph mining tasks, including network alignment, graph kernel, node similarity, subgraph matching, etc. A major bottleneck of Sylvester equation lies in its high computational complexity. Despite tremendous effort, state-of-the-art methods still require a complexity that is at least \em quadratic in the number of nodes of graphs, even with approximations. In this paper, we propose a family of Krylov subspace based algorithms (\fasten) to speed up and scale up the computation of Sylvester equation for graph mining. The key idea of the proposed methods is to project the original equivalent linear system onto a Kronecker Krylov subspace. We further exploit (1) the implicit representation of the solution matrix as well as the associated computation, and (2) the decomposition of the original Sylvester equation into a set of inter-correlated Sylvester equations of smaller size. The proposed algorithms bear two distinctive features. First, they provide the \em exact solutions without any approximation error. Second, they significantly reduce the time and space complexity for solving Sylvester equation, with two of the proposed algorithms having a \em linear complexity in both time and space. Experimental evaluations on a diverse set of real networks, demonstrate that our methods (1) are up to $10,000\times$ faster against Conjugate Gradient method, the best known competitor that outputs the exact solution, and (2) scale up to million-node graphs. 
    more » « less
  2. The classic problem of exact subgraph matching returns those subgraphs in a large-scale data graph that are isomorphic to a given query graph, which has gained increasing importance in many real-world applications such as social network analysis, knowledge graph discovery in the Semantic Web, bibliographical network mining, and so on. In this paper, we propose a novel and effective graph neural network (GNN)-based path embedding framework (GNN-PE), which allows efficient exact subgraph matching without introducing false dismissals. Unlike traditional GNN-based graph embeddings that only produce approximate subgraph matching results, in this paper, we carefully devise GNN-based embeddings for paths, such that: if two paths (and 1-hop neighbors of vertices on them) have the subgraph relationship, their corresponding GNN-based embedding vectors will strictly follow the dominance relationship. With such a newly designed property of path dominance embeddings, we are able to propose effective pruning strategies based on path label/dominance embeddings and guarantee no false dismissals for subgraph matching. We build multidimensional indexes over path embedding vectors, and develop an efficient subgraph matching algorithm by traversing indexes over graph partitions in parallel and applying our pruning methods. We also propose a cost-model-based query plan that obtains query paths from the query graph with low query cost. Through extensive experiments, we confirm the efficiency and effectiveness of our proposed GNN-PE approach for exact subgraph matching on both real and synthetic graph data. 
    more » « less
  3. null (Ed.)
    How can we identify the same or similar users from a collection of social network platforms (e.g., Facebook, Twitter, LinkedIn, etc.)? Which restaurant shall we recommend to a given user at the right time at the right location? Given a disease, which genes and drugs are most relevant? Multi-way association, which identifies strongly correlated node sets from multiple input networks, is the key to answering these questions. Despite its importance, very few multi-way association methods exist due to its high complexity. In this paper, we formulate multi-way association as a convex optimization problem, whose optimal solution can be obtained by a Sylvester tensor equation. Furthermore, we propose two fast algorithms to solve the Sylvester tensor equation, with a linear time and space complexity. We further provide theoretic analysis in terms of the sensitivity of the Sylvester tensor equation solution. Empirical evaluations demonstrate the efficacy of the proposed method. 
    more » « less
  4. null (Ed.)
    An active area of research in computational science is the design of algorithms for solving the subgraph matching problem to find copies of a given template graph in a larger world graph. Prior works have largely addressed single-channel networks using a variety of approaches. We present a suite of filtering methods for subgraph isomorphisms for multiplex networks (with different types of edges between nodes and more than one edge within each channel type). We aim to understand the entire solution space rather than focusing on finding one isomorphism. Results are shown on several classes of datasets: (a) Sudoku puzzles mapped to the subgraph isomorphism problem, (b) ErdsRnyi multigraphs, (c) real-world datasets from Twitter and transportation networks, (d) synthetic data created for the DARPA MAA program. 
    more » « less
  5. null (Ed.)
    Logical queries constitute an important subset of questions posed in knowledge graph question answering systems. Yet, effectively answering logical queries on large knowledge graphs remains a highly challenging problem. Traditional subgraph matching based methods might suffer from the noise and incompleteness of the underlying knowledge graph, often with a prolonged online response time. Recently, an alternative type of method has emerged whose key idea is to embed knowledge graph entities and the query in an embedding space so that the embedding of answer entities is close to that of the query. Compared with subgraph matching based methods, it can better handle the noisy or missing information in knowledge graph, with a faster online response. Promising as it might be, several fundamental limitations still exist, including the linear transformation assumption for modeling relations and the inability to answer complex queries with multiple variable nodes. In this paper, we propose an embedding based method (NewLook) to address these limitations. Our proposed method offers three major advantages. First (Applicability), it supports four types of logical operations and can answer queries with multiple variable nodes. Second (Effectiveness), the proposed NewLook goes beyond the linear transformation assumption, and thus consistently outperforms the existing methods. Third (Efficiency), compared with subgraph matching based methods, NewLook is at least 3 times faster in answering the queries; compared with the existing embed-ding based methods, NewLook bears a comparable or even faster online response and offline training time. 
    more » « less