We consider in this paper the similarity search problem that retrieves relevant graphs from a graph database under the well-known graph edit distance (GED) constraint. Formally, given a graph database G = {g1, g2, . . . , gn} and a query graph q, we aim to search the graph gi ∈ g such that the graph edit distance between gi and q, GED(gi, q), is within a user-specified GED threshold, τ. In spite of its theoretical significance and wide applicability, the GED-based similarity search problem is challenging in large graph databases due in particular to a large amount of GED computation incurred, which has proven to be NP-hard. In this paper, we propose a parameterized, partition-based GED lower bound that can be instantiated into a series of tight lower bounds towards synergistically pruning false-positive graphs from before costly GED computation is performed. We design an efficient, selectivity-aware algorithm to partition graphs of into highly selective subgraphs. They are further incorporated in a cost-effective, multi-layered indexing structure, ML-Index (Multi-Layered Index), for GED lower bound cross-checking and false-positive graph filtering with theoretical performance guarantees. Experimental studies in real and synthetic graph databases validate the efficiency and effectiveness of ML-Index, which achieves up to an order of magnitude speedup over the state-of-the-art method for similarity search in graph databases. 
                        more » 
                        « less   
                    
                            
                            Wiener Indices of Minuscule Lattices
                        
                    
    
            The Wiener index of a finite graph $$G$$ is the sum over all pairs $(p,q)$ of vertices of $$G$$ of the distance between $$p$$ and $$q$$. When $$P$$ is a finite poset, we define its Wiener index as the Wiener index of the graph of its Hasse diagram. In this paper, we find exact expressions for the Wiener indices of the distributive lattices of order ideals in minuscule posets. For infinite families of such posets, we also provide results on the asymptotic distribution of the distance between two random order ideals. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2246877
- PAR ID:
- 10510578
- Publisher / Repository:
- Electronic Journal of Combinatorics
- Date Published:
- Journal Name:
- The Electronic Journal of Combinatorics
- Volume:
- 31
- Issue:
- 1
- ISSN:
- 1077-8926
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            null (Ed.)Abstract For positive integers n and d > 0, let $$G(\mathbb {Q}^n,\; d)$$ denote the graph whose vertices are the set of rational points $$\mathbb {Q}^n$$ , with $$u,v \in \mathbb {Q}^n$$ being adjacent if and only if the Euclidean distance between u and v is equal to d . Such a graph is deemed “non-trivial” if d is actually realized as a distance between points of $$\mathbb {Q}^n$$ . In this paper, we show that a space $$\mathbb {Q}^n$$ has the property that all pairs of non-trivial distance graphs $$G(\mathbb {Q}^n,\; d_1)$$ and $$G(\mathbb {Q}^n,\; d_2)$$ are isomorphic if and only if n is equal to 1, 2, or a multiple of 4. Along the way, we make a number of observations concerning the clique number of $$G(\mathbb {Q}^n,\; d)$$ .more » « less
- 
            We consider several variants of the map-matching problem, which seeks to find a path Q in graph G that has the smallest distance to a given trajectory P (which is likely not to be exactly on the graph). In a typical application setting, P models a noisy GPS trajectory from a person traveling on a road network, and the desired path Q should ideally correspond to the actual path in G that the person has traveled. Existing map-matching algorithms in the literature consider all possible paths in G as potential candidates for Q. We find solutions to the map-matching problem under different settings. In particular, we restrict the set of paths to shortest paths, or concatenations of shortest paths, in G. As a distance measure, we use the Fréchet distance, which is a suitable distance measure for curves since it takes the continuity of the curves into account.more » « less
- 
            Given a representation of a finite group G over some commutative base ring k, the cofixed space is the largest quotient of the representation on which the group acts trivially. If G acts by k-algebra automorphisms, then the cofixed space is a module over the ring of G-invariants. When the order of G is not invertible in the base ring, little is known about this module structure. We study the cofixed space in the case that G is the symmetric group on n letters acting on a polynomial ring by permuting its variables. When k has characteristic 0, the cofixed space is isomorphic to an ideal of the ring of symmetric polynomials in n variables. Localizing k at a prime integer p while letting n vary reveals striking behavior in these ideals. As n grows, the ideals stay stable in a sense, then jump in complexity each time n reaches a multiple of p.more » « less
- 
            We show that the square Hellinger distance between two Bayesian networks on the same directed graph, G, is subadditive with respect to the neighborhoods of G. Namely, if P and Q are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance, H2(P,Q), between P and Q is upper bounded by the sum, ∑vH2(P{v}∪Πv,Q{v}∪Πv), of the square Hellinger distances between the marginals of P and Q on every node v and its parents Πv in the DAG. Importantly, our bound does not involve the conditionals but the marginals of P and Q. We derive a similar inequality for more general Markov Random Fields. As an application of our inequality, we show that distinguishing whether two Bayesian networks P and Q on the same (but potentially unknown) DAG satisfy P=Q vs dTV(P,Q)>ϵ can be performed from Õ (|Σ|3/4(d+1)⋅n/ϵ2) samples, where d is the maximum in-degree of the DAG and Σ the domain of each variable of the Bayesian networks. If P and Q are defined on potentially different and potentially unknown trees, the sample complexity becomes Õ (|Σ|4.5n/ϵ2), whose dependence on n,ϵ is optimal up to logarithmic factors. Lastly, if P and Q are product distributions over {0,1}n and Q is known, the sample complexity becomes O(n‾√/ϵ2), which is optimal up to constant factors.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    