Attributed subgraph matching is a powerful tool for explorative mining of large attributed networks. In many applications (e.g., network science of teams, intelligence analysis, finance informatics), the user might not know exactly what s/he is looking for, and thus needs to constantly revise the initial query graph based on what s/he finds in the current matching results. A major bottleneck in such an interactive matching scenario is efficiency, as simply rerunning the matching algorithm on the revised query graph is computationally prohibitive. In this paper, we propose a family of effective and efficient algorithms (FIRST) to support interactive attributed subgraph matching. There are two key ideas behind the proposed methods. The first is to recast the attributed subgraph matching problem as a cross-network node similarity problem, whose major computation lies in solving a Sylvester equation for the query graph and the underlying data graph. The second key idea is to exploit the smoothness between the initial and revised queries, which allows us to solve the new/updated Sylvester equation incrementally, without re-solving it from scratch. Experimental results show that our method can achieve (1) up to 16x speedup when applied to networks with 6M+ nodes; (2) preserving more …
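The abstract does not state the equation itself. As a minimal sketch, assume the cross-network similarity matrix S (query nodes × data nodes) satisfies S = α·Wq·S·Wdᵀ + (1 − α)·H, where Wq and Wd are row-normalized adjacency matrices of the query and data graphs and H holds node-attribute similarities; this is a Sylvester-type (Stein) equation. The hypothetical `solve_similarity` below solves it by plain fixed-point iteration, not by FIRST's incremental solver, and its `S0` warm-start argument only mimics the idea of reusing the previous solution after a small query revision.

```python
def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def row_normalize(A):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in A:
        s = sum(row)
        out.append([x / s for x in row] if s else [0.0] * len(row))
    return out


def solve_similarity(Wq, Wd, H, alpha=0.5, S0=None, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for S = alpha * Wq S Wd^T + (1 - alpha) * H.

    With row-normalized Wq, Wd and 0 <= alpha < 1, every entry of
    Wq S Wd^T is a weighted sum of entries of S whose weights sum to at
    most 1, so the update is a contraction and the iteration converges.
    S0 is an optional warm start. Returns (S, number_of_iterations).
    """
    S = [list(row) for row in (S0 if S0 is not None else H)]
    WdT = transpose(Wd)
    for it in range(1, max_iter + 1):
        P = matmul(matmul(Wq, S), WdT)
        S_new = [[alpha * p + (1 - alpha) * h for p, h in zip(p_row, h_row)]
                 for p_row, h_row in zip(P, H)]
        delta = max(abs(x - y) for r_new, r_old in zip(S_new, S)
                    for x, y in zip(r_new, r_old))
        S = S_new
        if delta < tol:
            break
    return S, it


if __name__ == "__main__":
    # Hypothetical toy data: a triangle query and a 4-node data graph.
    Wq = row_normalize([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
    Wd = row_normalize([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])
    # H[i][j] = 1 if query node i and data node j share a label.
    H = [[1.0 if a == b else 0.0 for b in "ABCA"] for a in "ABC"]
    S, cold_iters = solve_similarity(Wq, Wd, H)
    # Revised query: drop edge (1, 2) and reuse S as the warm start.
    Wq2 = row_normalize([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
    S2, warm_iters = solve_similarity(Wq2, Wd, H, S0=S)
    print("initial:", cold_iters, "iterations; revised (warm):", warm_iters)
```

Because the update is a contraction with factor α, a warm start and a cold start converge to the same S; starting from a nearby solution typically just needs fewer iterations.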
GFinder: Approximate Attributed Subgraph Matching
Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, and information retrieval to computer vision and natural language processing. Despite decades of effort, it is still highly challenging to balance matching accuracy against computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (GFINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques: (1) a novel auxiliary data structure (LOOKUPTABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune false candidates at an early stage. The proposed GFINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node-attributed and/or edge-attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% F1-score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph.
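The abstract does not describe the internals of LOOKUPTABLE, so the following is only a hypothetical sketch of the general idea of index-based candidate generation with early pruning: a label-to-vertices lookup table yields initial candidates for each query node, and a neighborhood filter discards candidates whose neighbor labels miss more than `max_missing` of the query node's neighbor labels, which loosely mimics tolerance to missing edges. The names `build_lookup_table`, `candidates`, and `max_missing` are illustrative assumptions, not GFINDER's API, and no top-k ranking is shown.

```python
from collections import Counter, defaultdict


def build_lookup_table(labels):
    """Map each node label to the set of data vertices that carry it."""
    table = defaultdict(set)
    for v, lab in labels.items():
        table[lab].add(v)
    return table


def neighbor_labels(adj, labels, v):
    """Multiset of labels among v's neighbors."""
    return Counter(labels[w] for w in adj[v])


def candidates(q_adj, q_labels, d_adj, d_labels, max_missing=0):
    """Candidate data vertices for each query vertex.

    A data vertex v is kept for query vertex u if it has u's label and its
    neighborhood lacks at most max_missing of the neighbor labels u needs.
    """
    table = build_lookup_table(d_labels)
    result = {}
    for u in q_adj:
        need = neighbor_labels(q_adj, q_labels, u)
        kept = set()
        for v in table.get(q_labels[u], ()):
            # Counter subtraction keeps only the labels v is still short of.
            missing = sum((need - neighbor_labels(d_adj, d_labels, v)).values())
            if missing <= max_missing:
                kept.add(v)
        result[u] = kept
    return result


if __name__ == "__main__":
    # Hypothetical toy graphs: a labeled triangle query; the data graph holds
    # a matching triangle (1, 2, 3) plus a partial match (4, 5).
    q_adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
    q_labels = {"a": "A", "b": "B", "c": "C"}
    d_adj = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4]}
    d_labels = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B"}
    print(candidates(q_adj, q_labels, d_adj, d_labels, max_missing=0))
    print(candidates(q_adj, q_labels, d_adj, d_labels, max_missing=1))
```

Raising `max_missing` trades pruning power for recall: with 1, vertices 4 and 5 survive as approximate candidates even though the edge to a "C" vertex is missing.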
NSF-PAR ID: 10159289
Journal Name: BigData
Page Range or eLocationID: 513 to 522
Sponsoring Org: National Science Foundation
More Like this


For a graph G on n vertices, naively sampling the position of a random walk at time t requires work Ω(t). We desire local access algorithms supporting position_G(t) queries, which return the position of a random walk from some fixed start vertex s at time t, where the joint distribution of returned positions is 1/poly(n) close to that of a uniformly random walk in ℓ1 distance. We first give an algorithm for local access to random walks on a given undirected d-regular graph with Õ(√n/(1 − λ)) runtime per query, where λ is the second-largest eigenvalue of the random walk matrix of the graph in absolute value. Since random d-regular graphs G(n, d) are expanders with high probability, this gives an Õ(√n) algorithm for a graph drawn from G(n, d) w.h.p., which improves on the naive method for small numbers of queries. We then prove that no algorithm with subconstant error given probe access to an input d-regular graph can have runtime better than Ω(√n/log(n)) per query in expectation when the input graph is drawn from G(n, d), obtaining a nearly matching lower bound. We further show an Ω(n^{1/4}) runtime per query lower …
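For contrast with the sublinear algorithm above, here is a minimal sketch of the naive baseline: a hypothetical `NaiveWalkOracle` that lazily extends a single walk and memoizes it, so repeated or out-of-order position(t) queries stay consistent with one random walk, while a query for a new largest t can cost Θ(t) steps. It is not the Õ(√n/(1 − λ)) algorithm from the abstract.

```python
import random


class NaiveWalkOracle:
    """Answers position(t) queries for one random walk, sampled lazily.

    The walk prefix is stored, so any query order returns answers that are
    consistent with a single walk; a query beyond the stored prefix extends
    it one uniformly random neighbor step at a time.
    """

    def __init__(self, adj, start, seed=None):
        self.adj = adj                  # vertex -> list of neighbors
        self.path = [start]             # path[t] = position at time t
        self.rng = random.Random(seed)

    def position(self, t):
        while len(self.path) <= t:
            current = self.path[-1]
            self.path.append(self.rng.choice(self.adj[current]))
        return self.path[t]


if __name__ == "__main__":
    # Hypothetical example: a cycle on 6 vertices (a 2-regular graph).
    cycle = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
    oracle = NaiveWalkOracle(cycle, start=0, seed=7)
    print(oracle.position(10), oracle.position(3), oracle.position(10))
```

The second call to `position(10)` is answered from the stored prefix, which is exactly the consistency requirement; the cost of the first call is what the local-access algorithms avoid.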

A subgraph matching query finds the subgraphs of a data graph G that match a given query graph Q. Traditional methods cannot handle big data graphs due to their high computational complexity. In this paper, we propose a distributed top-k subgraph search method over big graphs. The proposed method is designed at the level of a single vertex, and all vertices obtain their matching states separately without requiring global graph information. Therefore, it can be easily deployed on distributed platforms such as Hadoop. Evaluations of running time, number of messages, and supersteps show the efficiency and scalability of the proposed method.
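The vertex-level design can be illustrated with a hypothetical, single-machine simulation of Pregel-style supersteps: each data vertex keeps the set of query vertices it can still match and prunes it using only the sets its neighbors send, a graph-simulation-style filter. The function `vertex_centric_filter` is an assumption for illustration; it shows local, message-driven pruning and the superstep and message counts, not the paper's top-k ranking or its Hadoop deployment.

```python
def vertex_centric_filter(d_adj, d_labels, q_adj, q_labels, max_supersteps=50):
    """Synchronous simulation of vertex-centric candidate pruning.

    Each data vertex v keeps the set of query vertices it may still match.
    In every superstep, v sends its set to its neighbors; query vertex u
    survives at v only if every query neighbor of u appears in some
    received set. Only neighbor messages are used, never the global graph.
    Returns (state, supersteps, messages).
    """
    state = {v: {u for u in q_adj if q_labels[u] == d_labels[v]} for v in d_adj}
    supersteps = messages = 0
    while supersteps < max_supersteps:
        supersteps += 1
        # Message phase: every vertex sends its candidate set to each neighbor.
        inbox = {v: [] for v in d_adj}
        for v in d_adj:
            for w in d_adj[v]:
                inbox[w].append(state[v])
                messages += 1
        # Compute phase: each vertex updates its own state from its inbox only.
        changed = False
        new_state = {}
        for v in d_adj:
            keep = {u for u in state[v]
                    if all(any(nq in s for s in inbox[v]) for nq in q_adj[u])}
            changed |= keep != state[v]
            new_state[v] = keep
        state = new_state
        if not changed:  # no vertex changed its state: halt
            break
    return state, supersteps, messages


if __name__ == "__main__":
    # Hypothetical toy graphs: a labeled triangle query and a data graph with
    # one full triangle (1, 2, 3) and one dangling edge (4, 5).
    q_adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
    q_labels = {"a": "A", "b": "B", "c": "C"}
    d_adj = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4]}
    d_labels = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B"}
    print(vertex_centric_filter(d_adj, d_labels, q_adj, q_labels))
```

Every vertex runs the same compute step on its own inbox, which is why this style maps naturally onto bulk-synchronous frameworks; messages per superstep equal the sum of vertex degrees.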

Consider an algorithm performing a computation on a huge random object (for example, a random graph or a "long" random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally "on-the-fly" (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sublinear, since sampling the entire object up front would ruin its efficiency. Our first set of results focuses on undirected graphs with independent edge probabilities, i.e., each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. We then use this to obtain the first efficient local implementations for the Erdős–Rényi G(n, p) model for all values of p, and for the Stochastic Block model. As in previous local-access implementations for random graphs, we support VertexPair and NextNeighbor queries. In addition, we introduce a new RandomNeighbor query. Next, we give the first local-access implementation for AllNeighbors queries in …
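The on-the-fly idea for G(n, p) can be sketched naively: decide each vertex pair with an independent Bernoulli(p) coin the first time it is queried and memoize the outcome, so all later VertexPair and NextNeighbor answers stay consistent with one sampled graph. The class `LazyGnp` is hypothetical, and it assumes NextNeighbor returns neighbors in increasing vertex order; its `next_neighbor` simply scans candidate vertices one by one, which can take O(n) VertexPair queries per call when p is small, so it is a correct but inefficient baseline rather than the efficient implementation the abstract refers to.

```python
import random


class LazyGnp:
    """G(n, p) sampled on the fly: a vertex pair is decided when first queried."""

    def __init__(self, n, p, seed=None):
        self.n, self.p = n, p
        self.rng = random.Random(seed)
        self.decided = {}  # (min(u, v), max(u, v)) -> True if edge
        self.last = {}     # v -> largest vertex already scanned by next_neighbor(v)

    def vertex_pair(self, u, v):
        """Is {u, v} an edge? The answer is sampled once, then memoized."""
        if u == v:
            return False
        key = (min(u, v), max(u, v))
        if key not in self.decided:
            self.decided[key] = self.rng.random() < self.p
        return self.decided[key]

    def next_neighbor(self, v):
        """Smallest neighbor of v beyond the last one returned, or None."""
        w = self.last.get(v, -1) + 1
        while w < self.n:
            self.last[v] = w
            if self.vertex_pair(v, w):
                return w
            w += 1
        return None


if __name__ == "__main__":
    g = LazyGnp(n=10, p=0.3, seed=42)
    neighbors = []
    while (w := g.next_neighbor(0)) is not None:
        neighbors.append(w)
    print("neighbors of 0:", neighbors, "pairs sampled:", len(g.decided))
```

Because pairs are independent, deciding them in whatever order queries arrive still yields a graph distributed exactly as G(n, p); the efficient implementations aim to keep this guarantee while avoiding the linear scan.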