skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Random Graph Matching with Improved Noise Robustness
Graph matching, also known as network alignment, refers to finding a bijection between the vertex sets of two given graphs so as to maximally align their edges. This fundamental computational problem arises frequently in multiple fields such as computer vision and biology. Recently, there has been a plethora of work studying efficient algorithms for graph matching under probabilistic models. In this work, we propose a new algorithm for graph matching: Our algorithm associates each vertex with a signature vector using a multistage procedure and then matches a pair of vertices from the two graphs if their signature vectors are close to each other. We show that, for two Erdős–Rényi graphs with edge correlation 1−α, our algorithm recovers the underlying matching exactly with high probability when α≤1/(loglogn)C, where n is the number of vertices in each graph and C denotes a positive universal constant. This improves the condition α≤1/(logn)C achieved in previous work.  more » « less
Award ID(s):
1807316
PAR ID:
10324891
Author(s) / Creator(s):
; ;
Editor(s):
Belkin, Mikhail; Samory Kpotufe
Date Published:
Journal Name:
Proceedings of Thirty Fourth Conference on Learning Theory
Volume:
134
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Belkin, Mikhail; Kpotufe, Samory (Ed.)
    Graph matching, also known as network alignment, refers to finding a bijection between the vertex sets of two given graphs so as to maximally align their edges. This fundamental computational problem arises frequently in multiple fields such as computer vision and biology. Recently, there has been a plethora of work studying efficient algorithms for graph matching under probabilistic models. In this work, we propose a new algorithm for graph matching: Our algorithm associates each vertex with a signature vector using a multistage procedure and then matches a pair of vertices from the two graphs if their signature vectors are close to each other. We show that, for two Erdős–Rényi graphs with edge correlation $$1-\alpha$$, our algorithm recovers the underlying matching exactly with high probability when $$\alpha \le 1 / (\log \log n)^C$$, where $$n$$ is the number of vertices in each graph and $$C$$ denotes a positive universal constant. This improves the condition $$\alpha \le 1 / (\log n)^C$$ achieved in previous work. 
    more » « less
  2. Barman, Siddharth; Lasota, Sławomir (Ed.)
    We consider the problem of estimating the size of a maximum matching in low-arboricity graphs in the dynamic graph stream model. In this setting, an algorithm with limited memory makes multiple passes over a stream of edge insertions and deletions, resulting in a low-arboricity graph. Let n be the number of vertices of the input graph, and α be its arboricity. We give the following results. 1) As our main result, we give a three-pass streaming algorithm that produces an (α + 2)(1 + ε)-approximation and uses space O(ε^{-2}⋅α²⋅n^{1/2}⋅log n). This result should be contrasted with the Ω(α^{-5/2}⋅n^{1/2}) space lower bound established by [Assadi et al., SODA'17] for one-pass algorithms, showing that, for graphs of constant arboricity, the one-pass space lower bound can be achieved in three passes (up to poly-logarithmic factors). Furthermore, we obtain a two-pass algorithm that uses space O(ε^{-2}⋅α²⋅n^{3/5}⋅log n). 2) We also give a (1+ε)-approximation multi-pass algorithm, where the space used is parameterized by an upper bound on the size of a largest matching. For example, using O(log log n) passes, the space required is O(ε^{-1}⋅α²⋅k⋅log n), where k denotes an upper bound on the size of a largest matching. Finally, we define a notion of arboricity in the context of matrices. This is a natural measure of the sparsity of a matrix that is more nuanced than simply bounding the total number of nonzero entries, but less restrictive than bounding the number of nonzero entries in each row and column. For such matrices, we exploit our results on estimating matching size to present upper bounds for the problem of rank estimation in the dynamic data stream model. 
    more » « less
  3. Bojanczyk, M. et (Ed.)
    Embedding graphs in a geographical or latent space, i.e. inferring locations for vertices in Euclidean space or on a smooth manifold or submanifold, is a common task in network analysis, statistical inference, and graph visualization. We consider the classic model of random geometric graphs where n points are scattered uniformly in a square of area n, and two points have an edge between them if and only if their Euclidean distance is less than r. The reconstruction problem then consists of inferring the vertex positions, up to the symmetries of the square, given only the adjacency matrix of the resulting graph. We give an algorithm that, if r = n^α for α > 0, with high probability reconstructs the vertex positions with a maximum error of O(n^β) where β = 1/2-(4/3)α, until α ≥ 3/8 where β = 0 and the error becomes O(√{log n}). This improves over earlier results, which were unable to reconstruct with error less than r. Our method estimates Euclidean distances using a hybrid of graph distances and short-range estimates based on the number of common neighbors. We extend our results to the surface of the sphere in ℝ³ and to hypercubes in any constant dimension. 
    more » « less
  4. null (Ed.)
    This paper studies seeded graph matching for power-law graphs. Assume that two edge-correlated graphs are independently edge-sampled from a common parent graph with a power-law degree distribution. A set of correctly matched vertex-pairs is chosen at random and revealed as initial seeds. Our goal is to use the seeds to recover the remaining latent vertex correspondence between the two graphs. Departing from the existing approaches that focus on the use of high-degree seeds in $$1$$-hop neighborhoods, we develop an efficient algorithm that exploits the low-degree seeds in suitably-defined D-hop neighborhoods. Specifically, we first match a set of vertex-pairs with appropriate degrees (which we refer to as the first slice) based on the number of low-degree seeds in their D-hop neighborhoods. This approach significantly reduces the number of initial seeds needed to trigger a cascading process to match the rest of graphs. Under the Chung-Lu random graph model with n vertices, max degree Θ(√n), and the power-law exponent 2<β<3, we show that as soon as D> 4-β/3-β, by optimally choosing the first slice, with high probability our algorithm can correctly match a constant fraction of the true pairs without any error, provided with only Ω((log n)4-β) initial seeds. Our result achieves an exponential reduction in the seed size requirement, as the best previously known result requires n1/2+ε seeds (for any small constant ε>0). Performance evaluation with synthetic and real data further corroborates the improved performance of our algorithm. 
    more » « less
  5. We consider the maximum vertex-weighted matching problem (MVM), in which non-negative weights are assigned to the vertices of a graph, and the weight of a matching is the sum of the weights of the matched vertices. Although exact algorithms for MVM are faster than exact algorithms for the maximum edge-weighted matching problem, there are graphs on which these exact algorithms could take hundreds of hours. For a natural number k, we design a k/(k + 1)approximation algorithm for MVM on non-bipartite graphs that updates the matching along certain short paths in the graph: either augmenting paths of length at most 2k + 1 or weight-increasing paths of length at most 2k. The choice of k = 2 leads to a 2/3-approximation algorithm that computes nearly optimal weights fast. This algorithm could be initialized with a 2/3-approximate maximum cardinality matching to reduce its runtime in practice. A 1/2-approximation algorithm may be obtained using k = 1, which is faster than the 2/3-approximation algorithm but it computes lower weights. The 2/3-approximation algorithm has time complexity O(Δ2m) while the time complexity of the 1/2-approximation algorithm is O(Δm), where m is the number of edges and Δ is the maximum degree of a vertex. Results from our serial implementations show that on average the 1/2-approximation algorithm runs faster than the Suitor algorithm (currently the fastest 1/2-approximation algorithm) while the 2/3-approximation algorithm runs as fast as the Suitor algorithm but obtains higher weights for the matching. One advantage of the proposed algorithms is that they are well-suited for parallel implementation since they can process vertices to match in any order. The 1/2- and 2/3-approximation algorithms have been implemented on a shared memory parallel computer, and both approximation algorithms obtain good speedups, while the former algorithm runs faster on average than the parallel Suitor algorithm. Care is needed to design the parallel algorithm to avoid cyclic waits, and we prove that it is live-lock free. 
    more » « less