NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Efficient Structural Clustering in Large Uncertain Graphs

https://doi.org/10.1109/ICDE48307.2020.00215

Liang, Yongjiang; Hu, Tingting; Zhao, Peixiang (April 2020, 2020 IEEE 36th International Conference on Data Engineering (ICDE))

Clustering uncertain graphs based on the probabilisticgraph model has sparked extensive research and widely varying applications. Existing structural clustering methods rely heavily on the computation of pairwise reliable structural similarity between vertices, which has proven to be extremely costly, especially in large uncertain graphs. In this paper, we develop a new, decomposition-based method, ProbSCAN, for efficient reliable structural similarity computation with theoretically improved complexity. We further design a cost-effective index structure UCNO-Index, and a series of powerful pruning strategies to expedite reliable structural similarity computation in uncertain graphs. Experimental studies on eight real-world uncertain graphs demonstrate the effectiveness of our proposed solutions, which achieves orders of magnitude improvement of clustering efficiency, compared with the state-of-the-art structural clustering methods in large uncertain graphs.
more » « less
Full Text Available
Workload-aware Subgraph Query Caching and Processing in Large Graphs

Liang, Yongjiang; Zhao, Peixiang (January 2019, 2019 IEEE 35th International Conference on Data Engineering)

A subgraph query q that finds as output all its subgraph-isomorphic embeddings from a data graph g has been core to modern declarative querying in large graphs. In this paper, we address subgraph queries with the availability of query workload information, W = {w1,...,wn}, where wi ∈ W is a previously issued query with all its subgraph isomorphic embeddings cached beforehand. We introduce a workload-aware subgraph querying framework, WaSQ, that leverages query workload for subgraph query rewriting, search plan refinement, partial results reusing, and false positive filtering towards facilitating the whole subgraph querying process. Experimental studies in real-world graphs demonstrate that WaSQ achieves significant and consistent performance gains in comparison with state-of-the-art, workload-oblivious solutions for large-scale subgraph querying.
more » « less
Full Text Available
Similarity Search in Graph Databases: A Multi-Layered Indexing Approach

Liang, Yongjiang; Zhao, Peixiang (January 2017, 33rd IEEE International Conference on Data Engineering, ICDE 2017)

We consider in this paper the similarity search problem that retrieves relevant graphs from a graph database under the well-known graph edit distance (GED) constraint. Formally, given a graph database G = {g1, g2, . . . , gn} and a query graph q, we aim to search the graph gi ∈ g such that the graph edit distance between gi and q, GED(gi, q), is within a user-specified GED threshold, τ. In spite of its theoretical significance and wide applicability, the GED-based similarity search problem is challenging in large graph databases due in particular to a large amount of GED computation incurred, which has proven to be NP-hard. In this paper, we propose a parameterized, partition-based GED lower bound that can be instantiated into a series of tight lower bounds towards synergistically pruning false-positive graphs from before costly GED computation is performed. We design an efficient, selectivity-aware algorithm to partition graphs of into highly selective subgraphs. They are further incorporated in a cost-effective, multi-layered indexing structure, ML-Index (Multi-Layered Index), for GED lower bound cross-checking and false-positive graph filtering with theoretical performance guarantees. Experimental studies in real and synthetic graph databases validate the efficiency and effectiveness of ML-Index, which achieves up to an order of magnitude speedup over the state-of-the-art method for similarity search in graph databases.
more » « less
Full Text Available

Search for: All records