skip to main content


Title: Recovering a hidden community beyond the Kesten–Stigum threshold in O(|E|log*|V|) time
Abstract Community detection is considered for a stochastic block model graph of n vertices, with K vertices in the planted community, edge probability p for pairs of vertices both in the community, and edge probability q for other pairs of vertices. The main focus of the paper is on weak recovery of the community based on the graph G , with o ( K ) misclassified vertices on average, in the sublinear regime n 1- o (1) ≤ K ≤ o ( n ). A critical parameter is the effective signal-to-noise ratio λ = K 2 ( p - q ) 2 / (( n - K ) q ), with λ = 1 corresponding to the Kesten–Stigum threshold. We show that a belief propagation (BP) algorithm achieves weak recovery if λ > 1 / e, beyond the Kesten–Stigum threshold by a factor of 1 / e. The BP algorithm only needs to run for log * n + O (1) iterations, with the total time complexity O (| E |log * n ), where log * n is the iterated logarithm of n . Conversely, if λ ≤ 1 / e, no local algorithm can asymptotically outperform trivial random guessing. Furthermore, a linear message-passing algorithm that corresponds to applying a power iteration to the nonbacktracking matrix of the graph is shown to attain weak recovery if and only if λ > 1. In addition, the BP algorithm can be combined with a linear-time voting procedure to achieve the information limit of exact recovery (correctly classify all vertices with high probability) for all K ≥ ( n / log n ) (ρ BP + o (1)), where ρ BP is a function of p / q .  more » « less
Award ID(s):
1651588
NSF-PAR ID:
10089330
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Applied Probability
Volume:
55
Issue:
02
ISSN:
0021-9002
Page Range / eLocation ID:
325 to 352
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We consider a variant of the planted clique problem where we are allowed unbounded computational time but can only investigate a small part of the graph by adaptive edge queries. We determine (up to logarithmic factors) the number of queries necessary both for detecting the presence of a planted clique and for finding the planted clique. Specifically, let $G \sim G(n,1/2,k)$ be a random graph on $n$ vertices with a planted clique of size $k$. We show that no algorithm that makes at most $q = o(n^2 / k^2 + n)$ adaptive queries to the adjacency matrix of $G$ is likely to find the planted clique. On the other hand, when $k \geq (2+\epsilon) \log_2 n$ there exists a simple algorithm (with unbounded computational power) that finds the planted clique with high probability by making $q = O( (n^2 / k^2) \log^2 n + n \log n)$ adaptive queries. For detection, the additive $n$ term is not necessary: the number of queries needed to detect the presence of a planted clique is $n^2 / k^2$ (up to logarithmic factors). 
    more » « less
  2. Extremal combinatorics often deals with problems of maximizing a specific quantity related to substructures in large discrete structures. The first question of this kind that comes to one's mind is perhaps determining the maximum possible number of induced subgraphs isomorphic to a fixed graph $H$ in an $n$-vertex graph. The asymptotic behavior of this number is captured by the limit of the ratio of the maximum number of induced subgraphs isomorphic to $H$ and the number of all subgraphs with the same number vertices as $H$; this quantity is known as the _inducibility_ of $H$. More generally, one can define the inducibility of a family of graphs in the analogous way.Among all graphs with $k$ vertices, the only two graphs with inducibility equal to one are the empty graph and the complete graph. However, how large can the inducibility of other graphs with $k$ vertices be? Fix $k$, consider a graph with $n$ vertices join each pair of vertices independently by an edge with probability $\binom{k}{2}^{-1}$. The expected number of $k$-vertex induced subgraphs with exactly one edge is $e^{-1}+o(1)$. So, the inducibility of large graphs with a single edge is at least $e^{-1}+o(1)$. This article establishes that this bound is the best possible in the following stronger form, which proves a conjecture of Alon, Hefetz, Krivelevich and Tyomkyn: the inducibility of the family of $k$-vertex graphs with exactly $l$ edges where $0 more » « less
  3. Suppose a graph G is stochastically created by uniformly sampling vertices along a line segment and connecting each pair of vertices with a probability that is a known decreasing function of their distance. We ask if it is possible to reconstruct the actual positions of the vertices in G by only observing the generated unlabeled graph. We study this question for two natural edge probability functions — one where the probability of an edge decays exponentially with the distance and another where this probability decays only linearly. We initiate our study with the weaker goal of recovering only the order in which vertices appear on the line segment. For a segment of length n and a precision parameter δ, we show that for both exponential and linear decay edge probability functions, there is an efficient algorithm that correctly recovers (up to reflection symmetry) the order of all vertices that are at least δ apart, using only ˜ O( n / δ^2) samples (vertices). Building on this result, we then show that O( n^2 log n / δ^2) vertices (samples) are sufficient to additionally recover the location of each vertex on the line to within a precision of δ. We complement this result with an Ω( n^ 1.5 / δ ) lower bound on samples needed for reconstructing positions (even by a computationally unbounded algorithm), showing that the task of recovering positions is information-theoretically harder than recovering the order. We give experimental results showing that our algorithm recovers the positions of almost all points with high accuracy. 
    more » « less
  4. Abstract

    We establish rapid mixing of the random-cluster Glauber dynamics on random$$\varDelta $$Δ-regular graphs for all$$q\ge 1$$q1and$$pp<pu(q,Δ), where the threshold$$p_u(q,\varDelta )$$pu(q,Δ)corresponds to a uniqueness/non-uniqueness phase transition for the random-cluster model on the (infinite)$$\varDelta $$Δ-regular tree. It is expected that this threshold is sharp, and for$$q>2$$q>2the Glauber dynamics on random$$\varDelta $$Δ-regular graphs undergoes an exponential slowdown at$$p_u(q,\varDelta )$$pu(q,Δ). More precisely, we show that for every$$q\ge 1$$q1,$$\varDelta \ge 3$$Δ3, and$$pp<pu(q,Δ), with probability$$1-o(1)$$1-o(1)over the choice of a random$$\varDelta $$Δ-regular graph onnvertices, the Glauber dynamics for the random-cluster model has$$\varTheta (n \log n)$$Θ(nlogn)mixing time. As a corollary, we deduce fast mixing of the Swendsen–Wang dynamics for the Potts model on random$$\varDelta $$Δ-regular graphs for every$$q\ge 2$$q2, in the tree uniqueness region. Our proof relies on a sharp bound on the “shattering time”, i.e., the number of steps required to break up any configuration into$$O(\log n)$$O(logn)sized clusters. This is established by analyzing a delicate and novel iterative scheme to simultaneously reveal the underlying random graph with clusters of the Glauber dynamics configuration on it, at a given time.

     
    more » « less
  5. null (Ed.)
    Given a weighted planar bipartite graph G ( A ∪ B , E ) where each edge has an integer edge cost, we give an Õ( n 4/3 log nC ) time algorithm to compute minimum-cost perfect matching; here C is the maximum edge cost in the graph. The previous best-known planarity exploiting algorithm has a running time of O ( n 3/2 log n ) and is achieved by using planar separators (Lipton and Tarjan ’80). Our algorithm is based on the bit-scaling paradigm (Gabow and Tarjan ’89). For each scale, our algorithm first executes O ( n 1/3 ) iterations of Gabow and Tarjan’s algorithm in O ( n 4/3 ) time leaving only O ( n 2/3 ) vertices unmatched. Next, it constructs a compressed residual graph H with O ( n 2/3 ) vertices and O ( n ) edges. This is achieved by using an r -division of the planar graph G with r = n 2/3 . For each partition of the r -division, there is an edge between two vertices of H if and only if they are connected by a directed path inside the partition. Using existing efficient shortest-path data structures, the remaining O ( n 2/3 ) vertices are matched by iteratively computing a minimum-cost augmenting path, each taking Õ( n 2/3 ) time. Augmentation changes the residual graph, so the algorithm updates the compressed representation for each partition affected by the change in Õ( n 2/3 ) time. We bound the total number of affected partitions over all the augmenting paths by O ( n 2/3 log n ). Therefore, the total time taken by the algorithm is Õ( n 4/3 ). 
    more » « less