Title: Recovering a hidden community beyond the Kesten–Stigum threshold in O(|E|log*|V|) time
Abstract: Community detection is considered for a stochastic block model graph of $n$ vertices, with $K$ vertices in the planted community, edge probability $p$ for pairs of vertices both in the community, and edge probability $q$ for other pairs of vertices. The main focus of the paper is on weak recovery of the community based on the graph $G$, with $o(K)$ misclassified vertices on average, in the sublinear regime $n^{1-o(1)} \le K \le o(n)$. A critical parameter is the effective signal-to-noise ratio $\lambda = K^2(p-q)^2/((n-K)q)$, with $\lambda = 1$ corresponding to the Kesten–Stigum threshold. We show that a belief propagation (BP) algorithm achieves weak recovery if $\lambda > 1/e$, beyond the Kesten–Stigum threshold by a factor of $1/e$. The BP algorithm only needs to run for $\log^* n + O(1)$ iterations, with total time complexity $O(|E|\log^* n)$, where $\log^* n$ is the iterated logarithm of $n$. Conversely, if $\lambda \le 1/e$, no local algorithm can asymptotically outperform trivial random guessing. Furthermore, a linear message-passing algorithm that corresponds to applying a power iteration to the nonbacktracking matrix of the graph is shown to attain weak recovery if and only if $\lambda > 1$. In addition, the BP algorithm can be combined with a linear-time voting procedure to achieve the information limit of exact recovery (correctly classifying all vertices with high probability) for all $K \ge (n/\log n)(\rho_{\mathrm{BP}} + o(1))$, where $\rho_{\mathrm{BP}}$ is a function of $p/q$.
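As a rough illustration of the linear message-passing component described above, here is a minimal sketch of power iteration on the nonbacktracking matrix: one message per directed edge, updated by summing the incoming messages that do not backtrack. This is a sketch only; the initialization, centering, and thresholding details of the paper's algorithm are omitted, and the function name and `adj` (an adjacency-list dict) are our own conventions.

```python
import random
from collections import defaultdict

def nonbacktracking_power_iteration(adj, K, iters=10):
    """Sketch: power iteration on the nonbacktracking matrix B, where
    B[(u,v),(w,u)] = 1 iff w is a neighbor of u other than v. Messages
    live on directed edges; a vertex's score is its total incoming mass."""
    msgs = {(u, v): random.gauss(0.0, 1.0) for u in adj for v in adj[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in msgs:
            # Non-backtracking update: aggregate w -> u over neighbors w != v.
            new[(u, v)] = sum(msgs[(w, u)] for w in adj[u] if w != v)
        scale = max(abs(x) for x in new.values()) or 1.0
        msgs = {e: x / scale for e, x in new.items()}  # rescale for stability
    score = defaultdict(float)
    for (u, v), x in msgs.items():
        score[v] += x
    # Return the K highest-scoring vertices as the community estimate.
    return set(sorted(adj, key=lambda v: -score[v])[:K])
```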
Award ID(s):
1651588
PAR ID:
10089330
Date Published:
Journal Name:
Journal of Applied Probability
Volume:
55
Issue:
02
ISSN:
0021-9002
Page Range / eLocation ID:
325 to 352
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    We consider a variant of the planted clique problem where we are allowed unbounded computational time but can only investigate a small part of the graph by adaptive edge queries. We determine (up to logarithmic factors) the number of queries necessary both for detecting the presence of a planted clique and for finding the planted clique. Specifically, let $G \sim G(n, 1/2, k)$ be a random graph on $n$ vertices with a planted clique of size $k$. We show that no algorithm that makes at most $q = o(n^2/k^2 + n)$ adaptive queries to the adjacency matrix of $G$ is likely to find the planted clique. On the other hand, when $k \ge (2+\epsilon)\log_2 n$ there exists a simple algorithm (with unbounded computational power) that finds the planted clique with high probability by making $q = O((n^2/k^2)\log^2 n + n\log n)$ adaptive queries. For detection, the additive $n$ term is not necessary: the number of queries needed to detect the presence of a planted clique is $n^2/k^2$ (up to logarithmic factors).
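To fix ideas about the query model these bounds are stated in, here is a small sketch (our own construction, not the paper's algorithm): an oracle that lazily samples $G(n, 1/2, k)$ and charges one query per newly revealed pair, which is distributionally equivalent to fixing the graph in advance.

```python
import random

class PlantedCliqueOracle:
    """Adaptive edge-query access to G(n, 1/2, k): a uniformly random graph
    with a clique planted on k random vertices. Pairs are sampled lazily
    the first time they are probed; `queries` counts distinct probes."""

    def __init__(self, n, k, seed=None):
        self.rng = random.Random(seed)
        self.clique = set(self.rng.sample(range(n), k))
        self.cache = {}
        self.queries = 0

    def query(self, u, v):
        key = (min(u, v), max(u, v))
        if key not in self.cache:
            self.queries += 1
            if u in self.clique and v in self.clique:
                self.cache[key] = True  # pairs inside the clique are edges
            else:
                self.cache[key] = self.rng.random() < 0.5  # G(n, 1/2) noise
        return self.cache[key]
```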
  2.
    Given a weighted planar bipartite graph $G(A \cup B, E)$ where each edge has an integer edge cost, we give an $\tilde{O}(n^{4/3}\log(nC))$ time algorithm to compute a minimum-cost perfect matching; here $C$ is the maximum edge cost in the graph. The previous best-known planarity-exploiting algorithm has a running time of $O(n^{3/2}\log n)$ and is achieved by using planar separators (Lipton and Tarjan '80). Our algorithm is based on the bit-scaling paradigm (Gabow and Tarjan '89). For each scale, our algorithm first executes $O(n^{1/3})$ iterations of Gabow and Tarjan's algorithm in $O(n^{4/3})$ time, leaving only $O(n^{2/3})$ vertices unmatched. Next, it constructs a compressed residual graph $H$ with $O(n^{2/3})$ vertices and $O(n)$ edges. This is achieved by using an $r$-division of the planar graph $G$ with $r = n^{2/3}$. For each partition of the $r$-division, there is an edge between two vertices of $H$ if and only if they are connected by a directed path inside the partition. Using existing efficient shortest-path data structures, the remaining $O(n^{2/3})$ vertices are matched by iteratively computing a minimum-cost augmenting path, each taking $\tilde{O}(n^{2/3})$ time. Augmentation changes the residual graph, so the algorithm updates the compressed representation for each partition affected by the change in $\tilde{O}(n^{2/3})$ time. We bound the total number of affected partitions over all the augmenting paths by $O(n^{2/3}\log n)$. Therefore, the total time taken by the algorithm is $\tilde{O}(n^{4/3})$.
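Spelled out, the per-scale accounting from this abstract adds up as follows (a back-of-the-envelope restatement, using $m = O(n)$ for planar graphs):

```latex
\underbrace{O(n^{1/3}) \cdot O(n)}_{\text{Gabow--Tarjan iterations}}
\;+\;
\underbrace{O(n^{2/3}) \cdot \tilde{O}(n^{2/3})}_{\text{augmenting paths}}
\;+\;
\underbrace{O(n^{2/3}\log n) \cdot \tilde{O}(n^{2/3})}_{\text{partition updates}}
\;=\; \tilde{O}(n^{4/3}) \text{ per scale,}
```

and bit scaling runs $O(\log(nC))$ scales, giving $\tilde{O}(n^{4/3}\log(nC))$ overall.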
  3. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    We give the first almost-linear time algorithm for computing the maximal $k$-edge-connected subgraphs of an undirected unweighted graph for any constant $k$. More specifically, given an $n$-vertex $m$-edge graph $G = (V, E)$ and a number $k = \log^{o(1)} n$, we can deterministically compute in $O(m + n^{1+o(1)})$ time the unique vertex partition $\{V_1, \ldots, V_z\}$ such that, for every $i$, $V_i$ induces a $k$-edge-connected subgraph while every superset $V_i' \supset V_i$ does not. Previous algorithms with linear time work only when $k \le 2$ [Tarjan SICOMP'72]; otherwise they all require $\Omega(m + n\sqrt{n})$ time even when $k = 3$ [Chechik et al. SODA'17; Forster et al. SODA'20]. Our algorithm also extends to the decremental graph setting; we can deterministically maintain the maximal $k$-edge-connected subgraphs of a graph undergoing edge deletions in $m^{1+o(1)}$ total update time. Our key idea is a reduction to the dynamic algorithm supporting pairwise $k$-edge-connectivity queries [Jin and Sun FOCS'20].
  4. Suppose a graph $G$ is stochastically created by uniformly sampling vertices along a line segment and connecting each pair of vertices with a probability that is a known decreasing function of their distance. We ask if it is possible to reconstruct the actual positions of the vertices in $G$ by only observing the generated unlabeled graph. We study this question for two natural edge probability functions: one where the probability of an edge decays exponentially with the distance, and another where this probability decays only linearly. We initiate our study with the weaker goal of recovering only the order in which vertices appear on the line segment. For a segment of length $n$ and a precision parameter $\delta$, we show that for both exponential and linear decay edge probability functions, there is an efficient algorithm that correctly recovers (up to reflection symmetry) the order of all vertices that are at least $\delta$ apart, using only $\tilde{O}(n/\delta^2)$ samples (vertices). Building on this result, we then show that $O(n^2 \log n / \delta^2)$ vertices (samples) are sufficient to additionally recover the location of each vertex on the line to within a precision of $\delta$. We complement this result with an $\Omega(n^{1.5}/\delta)$ lower bound on the samples needed for reconstructing positions (even by a computationally unbounded algorithm), showing that the task of recovering positions is information-theoretically harder than recovering the order. We give experimental results showing that our algorithm recovers the positions of almost all points with high accuracy.
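For concreteness, here is a minimal sketch of the generative model just described (the specific decay functions below are our illustrative choices, not necessarily the paper's exact ones):

```python
import math
import random

def sample_line_graph(num_vertices, length, decay="exponential", seed=None):
    """Sample the model: vertices are uniform points on a segment of the
    given length; each pair (i, j) is joined independently with probability
    f(distance), where f decays exponentially or linearly. Returns the
    hidden positions and the unlabeled edge set a reconstruction algorithm
    would observe."""
    rng = random.Random(seed)
    pos = [rng.uniform(0, length) for _ in range(num_vertices)]
    edges = set()
    for i in range(num_vertices):
        for j in range(i + 1, num_vertices):
            d = abs(pos[i] - pos[j])
            # Illustrative decay functions (assumed for this sketch).
            p = math.exp(-d) if decay == "exponential" else max(0.0, 1.0 - d / length)
            if rng.random() < p:
                edges.add((i, j))
    return pos, edges  # reconstruction sees only `edges`, never `pos`
```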
  5. We present a sublinear time algorithm that allows one to sample multiple edges from a distribution that is pointwise $\epsilon$-close to the uniform distribution, in an amortized-efficient fashion. We consider the adjacency list query model, where access to a graph $G$ is given via degree and neighbor queries. The problem of sampling a single edge in this model has been raised by Eden and Rosenbaum (SOSA 18). Let $n$ and $m$ denote the number of vertices and edges of $G$, respectively. Eden and Rosenbaum provided upper and lower bounds of $\Theta^*(n/\sqrt{m})$ for sampling a single edge in general graphs (where $O^*(\cdot)$ suppresses $\mathrm{poly}(1/\epsilon)$ and $\mathrm{poly}(\log n)$ dependencies). We ask whether the query complexity lower bound for sampling a single edge can be circumvented when multiple samples are required. That is, can we get an improved amortized per-sample cost if we allow a preprocessing phase? We answer in the affirmative. We present an algorithm that, if one knows the number of required samples $q$ in advance, has an overall cost that is sublinear in $q$, namely $O^*(\sqrt{q} \cdot (n/\sqrt{m}))$, which is strictly preferable to the $O^*(q \cdot (n/\sqrt{m}))$ cost resulting from $q$ invocations of the algorithm by Eden and Rosenbaum. Subsequent to a preliminary version of this work, Tětek and Thorup (arXiv, preprint) proved that this bound is essentially optimal.
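As a point of reference for the query model (degree and neighbor queries), here is the standard rejection-sampling baseline for drawing a single exactly uniform edge. It assumes a known degree upper bound `d_max` and is not the amortized algorithm of this paper, nor Eden and Rosenbaum's; `degree` and `neighbor` stand in for the model's query oracles.

```python
import random

def sample_uniform_edge(vertices, degree, neighbor, d_max, rng=random):
    """Rejection sampling with degree/neighbor query access. An ordered pair
    (u, v) with v adjacent to u is returned with probability
    (1/n) * (deg(u)/d_max) * (1/deg(u)) = 1/(n * d_max),
    i.e. uniform over directed edges; expected trials are n*d_max/(2m)."""
    n = len(vertices)
    while True:
        u = vertices[rng.randrange(n)]            # uniform random vertex
        d = degree(u)                             # one degree query
        if d > 0 and rng.randrange(d_max) < d:    # accept with prob. d/d_max
            return (u, neighbor(u, rng.randrange(d)))  # uniform neighbor query
```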