The classic problem of exact subgraph matching returns those subgraphs in a large-scale data graph that are isomorphic to a given query graph, which has gained increasing importance in many real-world applications such as social network analysis, knowledge graph discovery in the Semantic Web, bibliographical network mining, and so on. In this paper, we propose a novel and effective graph neural network (GNN)-based path embedding framework (GNN-PE), which allows efficient exact subgraph matching without introducing false dismissals. Unlike traditional GNN-based graph embeddings that only produce approximate subgraph matching results, in this paper, we carefully devise GNN-based embeddings for paths, such that: if two paths (and 1-hop neighbors of vertices on them) have the subgraph relationship, their corresponding GNN-based embedding vectors will strictly follow the dominance relationship. With such a newly designed property of path dominance embeddings, we are able to propose effective pruning strategies based on path label/dominance embeddings and guarantee no false dismissals for subgraph matching. We build multidimensional indexes over path embedding vectors, and develop an efficient subgraph matching algorithm by traversing indexes over graph partitions in parallel and applying our pruning methods. We also propose a cost-model-based query plan that obtains query paths from the query graph with low query cost. Through extensive experiments, we confirm the efficiency and effectiveness of our proposed GNN-PE approach for exact subgraph matching on both real and synthetic graph data.
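The dominance-based pruning idea in this abstract can be illustrated with a minimal sketch. It is not the GNN-PE implementation: the vectors below are made-up numbers rather than GNN outputs, the names `dominates` and `prune_candidates` are hypothetical, and the direction of dominance (each query coordinate no larger than the matching data coordinate) is an assumption for illustration.

```python
from typing import List, Sequence


def dominates(data_vec: Sequence[float], query_vec: Sequence[float]) -> bool:
    """Return True if data_vec dominates query_vec.

    Assumed convention: every query coordinate is <= the matching
    data coordinate.
    """
    if len(data_vec) != len(query_vec):
        return False
    return all(q <= d for q, d in zip(query_vec, data_vec))


def prune_candidates(query_vec: Sequence[float],
                     data_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Keep only the indices of data paths whose embedding dominates the
    query path embedding; all other paths are pruned.

    If the embedding guarantees dominance whenever the subgraph relationship
    holds, no true match is ever discarded by this filter.
    """
    return [i for i, d in enumerate(data_vecs) if dominates(d, query_vec)]


# Hypothetical 2-dimensional embeddings for one query path and three data paths.
query_embedding = [0.2, 0.3]
data_embeddings = [[0.5, 0.4], [0.1, 0.9], [0.2, 0.3]]
survivors = prune_candidates(query_embedding, data_embeddings)
```

The surviving candidates would still need exact verification; the filter only rules out paths that cannot match.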
Workload-aware Subgraph Query Caching and Processing in Large Graphs
A subgraph query q, which returns all of its subgraph-isomorphic embeddings in a data graph g, is core to modern declarative querying in large graphs. In this paper, we address subgraph queries when query workload information W = {w₁, ..., wₙ} is available, where each wᵢ ∈ W is a previously issued query whose subgraph-isomorphic embeddings have all been cached beforehand. We introduce a workload-aware subgraph querying framework, WaSQ, that leverages the query workload for subgraph query rewriting, search plan refinement, partial result reuse, and false positive filtering to facilitate the whole subgraph querying process. Experimental studies on real-world graphs demonstrate that WaSQ achieves significant and consistent performance gains in comparison with state-of-the-art, workload-oblivious solutions for large-scale subgraph querying.
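To make the notions of "subgraph-isomorphic embedding" and "cached workload" concrete, here is a small sketch. It is not WaSQ: it uses a brute-force enumerator on undirected, unlabeled graphs and reuses a cached answer only when the exact same query is issued again, whereas the abstract also describes query rewriting, plan refinement, and partial result reuse. The names `embeddings` and `WorkloadCache` are hypothetical.

```python
from itertools import permutations


def embeddings(query_edges, data_edges):
    """Enumerate injective vertex mappings that send every query edge onto a
    data edge (undirected, unlabeled graphs; brute force, small inputs only)."""
    q_nodes = sorted({v for e in query_edges for v in e})
    d_nodes = sorted({v for e in data_edges for v in e})
    d_set = {frozenset(e) for e in data_edges}
    found = []
    for image in permutations(d_nodes, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        if all(frozenset((mapping[u], mapping[v])) in d_set
               for u, v in query_edges):
            found.append(mapping)
    return found


class WorkloadCache:
    """Stores the embeddings of previously issued queries (the workload W)."""

    def __init__(self, data_edges):
        self.data_edges = data_edges
        self.cache = {}
        self.hits = 0

    def query(self, query_edges):
        # Assumption: two queries match only if their edge sets are identical.
        key = frozenset(frozenset(e) for e in query_edges)
        if key in self.cache:
            self.hits += 1
        else:
            self.cache[key] = embeddings(query_edges, self.data_edges)
        return self.cache[key]


# A triangle 1-2-3 with a pendant vertex 4 attached to vertex 3.
data = [(1, 2), (2, 3), (1, 3), (3, 4)]
cache = WorkloadCache(data)
path_query = [("a", "b"), ("b", "c")]
triangle_query = [("a", "b"), ("b", "c"), ("a", "c")]
path_matches = cache.query(path_query)
triangle_matches = cache.query(triangle_query)
repeated = cache.query(path_query)  # answered from the cache
```

Exact-match reuse is the simplest possible policy; it is chosen here only to keep the sketch short.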
- Award ID(s):
- 1743142
- PAR ID:
- 10095243
- Date Published:
- Journal Name:
- 2019 IEEE 35th International Conference on Data Engineering
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- The celebrated Erdős-Pósa Theorem, in one formulation, asserts that for every c ∈ N, graphs with no subgraph (or equivalently, minor) isomorphic to the disjoint union of c cycles have bounded treewidth. What can we say about the treewidth of graphs containing no induced subgraph isomorphic to the disjoint union of c cycles? Let us call these graphs c-perforated. While 1-perforated graphs have treewidth one, complete graphs and complete bipartite graphs are examples of 2-perforated graphs with arbitrarily large treewidth. But there are sparse examples, too: Bonamy, Bonnet, Déprés, Esperet, Geniet, Hilaire, Thomassé and Wesolek constructed 2-perforated graphs with arbitrarily large treewidth and no induced subgraph isomorphic to K₃ or K₃,₃; we call these graphs occultations. Indeed, it turns out that a mild (and inevitable) adjustment of occultations provides examples of 2-perforated graphs with arbitrarily large treewidth and arbitrarily large girth, which we refer to as full occultations. Our main result shows that the converse also holds: for every c ∈ N, a c-perforated graph has large treewidth if and only if it contains, as an induced subgraph, either a large complete graph, or a large complete bipartite graph, or a large full occultation. This distinguishes c-perforated graphs, among graph classes purely defined by forbidden induced subgraphs, as the first to admit a grid-type theorem incorporating obstructions other than subdivided walls and their line graphs. More generally, for all c, o ∈ N, we establish a full characterization of induced subgraph obstructions to bounded treewidth in graphs containing no induced subgraph isomorphic to the disjoint union of c cycles, each of length at least o + 2.
- For unweighted graphs, finding isometric embeddings of a graph G is closely related to decompositions of G into Cartesian products of smaller graphs. When G is isomorphic to a Cartesian graph product, we call the factors of this product a factorization of G. When G is isomorphic to an isometric subgraph of a Cartesian graph product, we call those factors a pseudofactorization of G. Prior work has shown that an unweighted graph’s pseudofactorization can be used to generate a canonical isometric embedding into a product of the smallest possible pseudofactors. However, for arbitrary weighted graphs, which represent a richer variety of metric spaces, methods for finding isometric embeddings or determining their existence remain elusive, and indeed pseudofactorization and factorization have not previously been extended to this context. In this work, we address the problem of finding the factorization and pseudofactorization of a weighted graph G, where G satisfies the property that every edge constitutes a shortest path between its endpoints. We term such graphs minimal graphs, noting that every graph can be made minimal by removing edges not affecting its path metric. We generalize pseudofactorization and factorization to minimal graphs and develop new proof techniques that extend the previously proposed algorithms due to Graham and Winkler [Graham and Winkler, ’85] and Feder [Feder, ’92] for pseudofactorization and factorization of unweighted graphs. We show that any n-vertex, m-edge graph with positive integer edge weights can be factored in O(m²) time, plus the time to find all pairs shortest paths (APSP) distances in a weighted graph, resulting in an overall running time of O(m² + n² log log n). We also show that a pseudofactorization for such a graph can be computed in O(mn) time, plus the time to solve APSP, resulting in an O(mn + n² log log n) running time.
- Constructing a spanning tree of a graph is one of the most basic tasks in graph theory. We consider a relaxed version of this problem in the setting of local algorithms. The relaxation is that the constructed subgraph is a sparse spanning subgraph containing at most (1+ϵ)n edges (where n is the number of vertices and ϵ is a given approximation/sparsity parameter). In the local setting, the goal is to quickly determine whether a given edge e belongs to such a subgraph, without constructing the whole subgraph, but rather by inspecting (querying) the local neighborhood of e. The challenge is to maintain consistency, that is, to provide answers concerning different edges according to the same spanning subgraph. We first show that for general bounded-degree graphs, the query complexity of any such algorithm must be Ω(√n). This lower bound holds for constant-degree graphs that have high expansion. Next we design an algorithm for (bounded-degree) graphs with high expansion, obtaining a result that roughly matches the lower bound. We then turn to study graphs that exclude a fixed minor (and are hence non-expanding). We design an algorithm for such graphs, which may have an unbounded maximum degree. The query complexity of this algorithm is poly(1/ϵ, h) (independent of n and the maximum degree), where h is the number of vertices in the excluded minor. Though our two algorithms are designed for very different types of graphs (and have very different complexities), at a high level there are several similarities, and we highlight both the similarities and the differences.
- Big graph databases provide strong modeling capabilities and efficient querying for complex applications. Subgraph isomorphism, which finds exact matches of a query graph in the database, is a challenging problem to solve efficiently. Most current subgraph isomorphism approaches are based on the pruning strategy proposed by Ullmann. These techniques have two significant drawbacks: first, they are unable to handle complex queries efficiently, and second, their implementations need large indexes that require large memory resources. In this paper, we describe a new subgraph isomorphism approach, the HyGraph algorithm, that is efficient both in querying and in the memory required for index creation. We compare the HyGraph algorithm with two popular existing approaches, GraphQL and Cypher, using complexity measures and experimentally using three big graph data sets: (1) a country-level population database, (2) a simulated bank database, and (3) a publicly available World Cup big graph database. The HyGraph solution performs significantly better than (or equal to) the competing algorithms for the query operations on these big databases, making it an excellent candidate for subgraph isomorphism queries in real scenarios.
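The pseudofactorization abstract in the list above defines a minimal graph as one in which every edge is a shortest path between its endpoints, and notes that any graph can be made minimal by removing edges that do not affect its path metric. A minimal sketch of that preprocessing step follows. It assumes an undirected graph on vertices 0..n-1 with positive integer weights, uses plain Floyd-Warshall for all-pairs distances (not the faster APSP routine the abstract's running times rely on), and the function name `make_minimal` is hypothetical.

```python
def make_minimal(n, edges):
    """Drop every edge (u, v, w) whose weight exceeds the shortest-path
    distance between u and v; the remaining graph has the same path metric.

    n     -- number of vertices, labeled 0..n-1
    edges -- list of (u, v, w) tuples, undirected, positive integer weights
    """
    inf = float("inf")
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
    # Floyd-Warshall all-pairs shortest paths, O(n^3).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    # An edge survives only if it is itself a shortest path.
    return [(u, v, w) for u, v, w in edges if w == dist[u][v]]


# The edge (0, 2) of weight 3 is longer than the path 0-1-2 of length 2,
# so removing it leaves all distances unchanged.
triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 3)]
minimal_edges = make_minimal(3, triangle)
```

An edge whose weight equals the shortest-path distance is kept, since it still constitutes a shortest path between its endpoints.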

