skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, October 10 until 2:00 AM ET on Friday, October 11 due to maintenance. We apologize for the inconvenience.


Title: Evaluating Stability in Massive Social Networks: Efficient Streaming Algorithms for Structural Balance
Structural balance theory studies stability in networks. Given a n-vertex complete graph G = (V,E) whose edges are labeled positive or negative, the graph is considered balanced if every triangle either consists of three positive edges (three mutual "friends"), or one positive edge and two negative edges (two "friends" with a common "enemy"). From a computational perspective, structural balance turns out to be a special case of correlation clustering with the number of clusters at most two. The two main algorithmic problems of interest are: (i) detecting whether a given graph is balanced, or (ii) finding a partition that approximates the frustration index, i.e., the minimum number of edge flips that turn the graph balanced. We study these problems in the streaming model where edges are given one by one and focus on memory efficiency. We provide randomized single-pass algorithms for: (i) determining whether an input graph is balanced with O(log n) memory, and (ii) finding a partition that induces a (1 + ε)-approximation to the frustration index with O(n ⋅ polylog(n)) memory. We further provide several new lower bounds, complementing different aspects of our algorithms such as the need for randomization or approximation. To obtain our main results, we develop a method using pseudorandom generators (PRGs) to sample edges between independently-chosen vertices in graph streaming. Furthermore, our algorithm that approximates the frustration index improves the running time of the state-of-the-art correlation clustering with two clusters (Giotis-Guruswami algorithm [SODA 2006]) from n^O(1/ε²) to O(n²log³n/ε² + n log n ⋅ (1/ε)^O(1/ε⁴)) time for (1+ε)-approximation. These results may be of independent interest.  more » « less
Award ID(s):
2047061
NSF-PAR ID:
10488810
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Megow, Nicole; Smith, Adam
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Journal Name:
APPROX-RANDOM
Subject(s) / Keyword(s):
Streaming algorithms structural balance pseudo-randomness generator Theory of computation → Streaming, sublinear and near linear time algorithms Theory of computation → Pseudorandomness and derandomization
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bojańczyk, Mikołaj ; Merelli, Emanuela ; Woodruff, David P (Ed.)
    Given n points in 𝓁_p^d, we consider the problem of partitioning points into k clusters with associated centers. The cost of a clustering is the sum of p-th powers of distances of points to their cluster centers. For p ∈ [1,2], we design sketches of size poly(log(nd),k,1/ε) such that the cost of the optimal clustering can be estimated to within factor 1+ε, despite the fact that the compressed representation does not contain enough information to recover the cluster centers or the partition into clusters. This leads to a streaming algorithm for estimating the clustering cost with space poly(log(nd),k,1/ε). We also obtain a distributed memory algorithm, where the n points are arbitrarily partitioned amongst m machines, each of which sends information to a central party who then computes an approximation of the clustering cost. Prior to this work, no such streaming or distributed-memory algorithm was known with sublinear dependence on d for p ∈ [1,2). 
    more » « less
  2. null (Ed.)
    We consider the classical Minimum Balanced Cut problem: given a graph $G$, compute a partition of its vertices into two subsets of roughly equal volume, while minimizing the number of edges connecting the subsets. We present the first {\em deterministic, almost-linear time} approximation algorithm for this problem. Specifically, our algorithm, given an $n$-vertex $m$-edge graph $G$ and any parameter $1\leq r\leq O(\log n)$, computes a $(\log m)^{r^2}$-approximation for Minimum Balanced Cut on $G$, in time $O\left ( m^{1+O(1/r)+o(1)}\cdot (\log m)^{O(r^2)}\right )$. In particular, we obtain a $(\log m)^{1/\epsilon}$-approximation in time $m^{1+O(1/\sqrt{\epsilon})}$ for any constant $\epsilon$, and a $(\log m)^{f(m)}$-approximation in time $m^{1+o(1)}$, for any slowly growing function $m$. We obtain deterministic algorithms with similar guarantees for the Sparsest Cut and the Lowest-Conductance Cut problems. Our algorithm for the Minimum Balanced Cut problem in fact provides a stronger guarantee: it either returns a balanced cut whose value is close to a given target value, or it certifies that such a cut does not exist by exhibiting a large subgraph of $G$ that has high conductance. We use this algorithm to obtain deterministic algorithms for dynamic connectivity and minimum spanning forest, whose worst-case update time on an $n$-vertex graph is $n^{o(1)}$, thus resolving a major open problem in the area of dynamic graph algorithms. Our work also implies deterministic algorithms for a host of additional problems, whose time complexities match, up to subpolynomial in $n$ factors, those of known randomized algorithms. The implications include almost-linear time deterministic algorithms for solving Laplacian systems and for approximating maximum flows in undirected graphs. 
    more » « less
  3. Gørtz, Inge Li ; Farach-Colton, Martin ; Puglisi, Simon J ; Herman, Grzegorz (Ed.)
    We consider variants of the classic Multiway Cut problem. Multiway Cut asks to partition a graph G into k parts so as to separate k given terminals. Recently, Chandrasekaran and Wang (ESA 2021) introduced l_p-norm Multiway Cut, a generalization of the problem, in which the goal is to minimize the l_p norm of the edge boundaries of k parts. We provide an O(log^{1/2} n log^{1/2 + 1/p} k) approximation algorithm for this problem, improving upon the approximation guarantee of O(log^{3/2} n log^{1/2} k) due to Chandrasekaran and Wang. We also introduce and study Norm Multiway Cut, a further generalization of Multiway Cut. We assume that we are given access to an oracle, which answers certain queries about the norm. We present an O(log^{1/2} n log^{7/2} k) approximation algorithm with a weaker oracle and an O(log^{1/2} n log^{5/2} k) approximation algorithm with a stronger oracle. Additionally, we show that without any oracle access, there is no n^{1/4-ε} approximation algorithm for every ε > 0 assuming the Hypergraph Dense-vs-Random Conjecture. 
    more » « less
  4. We present O(log logn)-round algorithms in the Massively Parallel Computation (MPC) model, with ˜O(n) memory per machine, that compute a maximal independent set, a 1 + ε approximation of maximum matching, and a 2 + ε approximation of minimum vertex cover, for any n-vertex graph and any constant ε > 0. These improve the state of the art as follows: • Our MIS algorithm leads to a simple O(log log Δ)-round MIS algorithm in the CONGESTED-CLIQUE model of distributed computing, which improves on the ˜O (plog Δ)-round algorithm of Ghaffari [PODC’17]. • OurO(log logn)-round (1+ε)-approximate maximum matching algorithm simplifies or improves on the following prior work: O(log2 logn)-round (1 + ε)-approximation algorithm of Czumaj et al. [STOC’18] and O(log logn)-round (1 + ε)- approximation algorithm of Assadi et al. [arXiv’17]. • Our O(log logn)-round (2+ε)-approximate minimum vertex cover algorithm improves on an O(log logn)-round O(1)- approximation of Assadi et al. [arXiv’17]. 
    more » « less
  5. We consider the maximum matching problem in the semi-streaming model formalized by Feigenbaum, Kannan, McGregor, Suri, and Zhang that is inspired by giant graphs of today. As our main result, we give a two-pass (1/2 + 1/16)-approximation algorithm for triangle-free graphs and a two-pass (1/2 + 1/32)-approximation algorithm for general graphs; these improve the approximation ratios of 1/2 + 1/52 for bipartite graphs and 1/2 + 1/140 for general graphs by Konrad, Magniez, and Mathieu. In three passes, we are able to achieve approximation ratios of 1/2 + 1/10 for triangle-free graphs and 1/2 + 1/19.753 for general graphs. We also give a multi-pass algorithm where we bound the number of passes precisely—we give a (2/3 − ε)- approximation algorithm that uses 2/(3ε) passes for triangle-free graphs and 4/(3ε) passes for general graphs. Our algorithms are simple and combinatorial, use O(n log n) space, and (can be implemented to) have O(1) update time per edge. For general graphs, our multi-pass algorithm improves the best known deterministic algorithms in terms of the number of passes: * Ahn and Guha give a (2/3−ε)-approximation algorithm that uses O(log(1/ε)/ε2) passes, whereas our (2/3 − ε)-approximation algorithm uses 4/(3ε) passes; * they also give a (1 − ε)-approximation algorithm that uses O(log n · poly(1/ε)) passes, where n is the number of vertices of the input graph; although our algorithm is (2/3−ε)-approximation, our number of passes do not depend on n. Earlier multi-pass algorithms either have a large constant inside big-O notation for the number of passes or the constant cannot be determined due to the involved analysis, so our multi-pass algorithm should use much fewer passes for approximation ratios bounded slightly below 2/3. 
    more » « less