Title: Simultaneously Approximating All 𝓁_p-Norms in Correlation Clustering
This paper considers correlation clustering on unweighted complete graphs. We give a combinatorial algorithm that returns a single clustering solution that is simultaneously O(1)-approximate for all 𝓁_p-norms of the disagreement vector; in other words, a combinatorial O(1)-approximation of the all-norms objective for correlation clustering. This is the first proof that minimal sacrifice is needed in order to optimize different norms of the disagreement vector. In addition, our algorithm is the first combinatorial approximation algorithm for the 𝓁₂-norm objective, and more generally the first combinatorial algorithm for the 𝓁_p-norm objective when 1 < p < ∞. It is also faster than all previous algorithms that minimize the 𝓁_p-norm of the disagreement vector, with run-time O(n^ω), where O(n^ω) is the time for matrix multiplication on n × n matrices. When the maximum positive degree in the graph is at most Δ, this can be improved to a run-time of O(nΔ² log n).
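As a rough illustration of the objective named in the abstract (not the paper's algorithm), the sketch below computes a disagreement vector and its 𝓁_p-norm for a given clustering. It assumes the usual unweighted definition, where the entry for vertex v counts the disagreeing edges incident on v. The function names and the input encoding (`positive_edges`, `cluster_of`) are hypothetical choices made for this example.

```python
from itertools import combinations


def disagreement_vector(n, positive_edges, cluster_of):
    """Count disagreeing edges per vertex on an unweighted complete graph.

    n: number of vertices, labeled 0..n-1 (assumed encoding).
    positive_edges: set of frozenset({u, v}) pairs labeled positive;
        every other pair is treated as negative.
    cluster_of: list mapping each vertex to a cluster id.
    """
    disagree = [0] * n
    for u, v in combinations(range(n), 2):
        positive = frozenset((u, v)) in positive_edges
        together = cluster_of[u] == cluster_of[v]
        # A positive edge disagrees when its endpoints are separated;
        # a negative edge disagrees when its endpoints share a cluster.
        if positive != together:
            disagree[u] += 1
            disagree[v] += 1
    return disagree


def lp_norm(vec, p):
    """Return the l_p norm of a nonnegative vector; p may be float('inf')."""
    if p == float("inf"):
        return max(vec, default=0)
    return sum(x ** p for x in vec) ** (1.0 / p)


# Path 0-1-2-3 of positive edges, split into clusters {0, 1} and {2, 3}.
positive = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
vec = disagreement_vector(4, positive, [0, 0, 1, 1])
norms = {p: lp_norm(vec, p) for p in (1, 2, float("inf"))}
```

An all-norms guarantee means that one clustering keeps every entry of `norms` within a constant factor of the best value achievable for that p alone.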
Award ID(s):
2121744 1845146
PAR ID:
10563546
Author(s) / Creator(s):
; ;
Editor(s):
Bringmann, Karl; Grohe, Martin; Puppis, Gabriele; Svensson, Ola
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Volume:
297
ISSN:
1868-8969
ISBN:
978-3-95977-322-5
Page Range / eLocation ID:
297-297
Subject(s) / Keyword(s):
Approximation algorithms; correlation clustering; all-norms; lp-norms; Theory of computation → Approximation algorithms analysis
Format(s):
Medium: X Size: 20 pages; 1043813 bytes Other: application/pdf
Size(s):
20 pages 1043813 bytes
Right(s):
Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
Sponsoring Org:
National Science Foundation
More Like this
  1. Tauman Kalai, Yael (Ed.)
    In the weighted load balancing problem, the input is an n-vertex bipartite graph between a set of clients and a set of servers, and each client comes with some nonnegative real weight. The output is an assignment that maps each client to one of its adjacent servers, and the load of a server is then the sum of the weights of the clients assigned to it. The goal is to find an assignment that is well-balanced, typically captured by (approximately) minimizing either the 𝓁_∞- or 𝓁₂-norm of the server loads. Generalizing both of these objectives, the all-norm load balancing problem asks for an assignment that approximately minimizes all 𝓁_p-norm objectives for p ≥ 1, including p = ∞, simultaneously. Our main result is a deterministic O(log n)-pass O(1)-approximation semi-streaming algorithm for the all-norm load balancing problem. Prior to our work, only an O(log n)-pass O(log n)-approximation algorithm for the 𝓁_∞-norm objective was known in the semi-streaming setting. Our algorithm uses a novel application of the multiplicative weights update method to a mixed covering/packing convex program for the all-norm load balancing problem involving an infinite number of constraints. 
  2.
    In the Correlation Clustering problem, we are given a complete weighted graph G with its edges labeled as “similar” and “dissimilar” by a noisy binary classifier. For a clustering C of graph G, a similar edge is in disagreement with C if its endpoints belong to distinct clusters, and a dissimilar edge is in disagreement with C if its endpoints belong to the same cluster. The disagreements vector is a vector indexed by the vertices of G such that the v-th coordinate of the disagreements vector equals the weight of all disagreeing edges incident on v. The goal is to produce a clustering that minimizes the ℓ_p norm of the disagreements vector for p ≥ 1. We study the ℓ_p objective in Correlation Clustering under the following assumption: every similar edge has weight in [αw, w] and every dissimilar edge has weight at least αw (where α ≤ 1 and w > 0 is a scaling parameter). We give an O((1/α)^{1/2−1/(2p)} log(1/α)) approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.
  3.
    In the Correlation Clustering problem, we are given a complete weighted graph G with its edges labeled as “similar” and “dissimilar” by a noisy binary classifier. For a clustering C of graph G, a similar edge is in disagreement with C if its endpoints belong to distinct clusters, and a dissimilar edge is in disagreement with C if its endpoints belong to the same cluster. The disagreements vector, disagree, is a vector indexed by the vertices of G such that the v-th coordinate disagree_v equals the weight of all disagreeing edges incident on v. The goal is to produce a clustering that minimizes the ℓ_p norm of the disagreements vector for p ≥ 1. We study the ℓ_p objective in Correlation Clustering under the following assumption: every similar edge has weight in [αw, w] and every dissimilar edge has weight at least αw (where α ≤ 1 and w > 0 is a scaling parameter). We give an O((1/α)^{1/2−1/(2p)} ⋅ log(1/α)) approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.
  4. Aichholzer, Oswin; Wang, Haitao (Ed.)
    The 𝓁₂² min-sum k-clustering problem is to partition an input set into clusters C_1,…,C_k to minimize ∑_{i=1}^k ∑_{p,q ∈ C_i} ‖p-q‖₂². Although 𝓁₂² min-sum k-clustering is NP-hard, it is not known whether it is NP-hard to approximate 𝓁₂² min-sum k-clustering beyond a certain factor. In this paper, we give the first hardness-of-approximation result for the 𝓁₂² min-sum k-clustering problem. We show that it is NP-hard to approximate the objective to a factor better than 1.056 and moreover, assuming a balanced variant of the Johnson Coverage Hypothesis, it is NP-hard to approximate the objective to a factor better than 1.327. We then complement our hardness result by giving a fast PTAS for 𝓁₂² min-sum k-clustering. Specifically, our algorithm runs in time O(n^{1+o(1)} d ⋅ 2^{(k/ε)^{O(1)}}), which is the first nearly linear time algorithm for this problem. We also consider a learning-augmented setting, where the algorithm has access to an oracle that outputs a label i ∈ [k] for each input point, thereby implicitly partitioning the input dataset into k clusters that induce an approximately optimal solution, up to some amount of adversarial error α ∈ [0,1/2). We give a polynomial-time algorithm that outputs a (1+γα)/(1-α)²-approximation to 𝓁₂² min-sum k-clustering, for a fixed constant γ > 0.
  5. Meila, Marina; Zhang, Tong (Ed.)
    In the Correlation Clustering problem, we are given a complete weighted graph G with its edges labeled as “similar” and “dissimilar” by a noisy binary classifier. For a clustering C of graph G, a similar edge is in disagreement with C if its endpoints belong to distinct clusters, and a dissimilar edge is in disagreement with C if its endpoints belong to the same cluster. The disagreements vector, disagree, is a vector indexed by the vertices of G such that the v-th coordinate disagree_v equals the weight of all disagreeing edges incident on v. The goal is to produce a clustering that minimizes the ℓ_p norm of the disagreements vector for p ≥ 1. We study the ℓ_p objective in Correlation Clustering under the following assumption: every similar edge has weight in [αw, w] and every dissimilar edge has weight at least αw (where α ≤ 1 and w > 0 is a scaling parameter). We give an O((1/α)^{1/2−1/(2p)} ⋅ log(1/α)) approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.
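For the weighted setting described in the related entries above, the following minimal sketch computes the disagreements vector and checks the stated weight assumption (similar edges in [αw, w], dissimilar edges at least αw). It is only an illustration of the definitions, not the approximation algorithm from those papers. The function names and the dictionary-based encoding of the graph are assumptions made for this example.

```python
def weighted_disagreements(weights, similar, cluster_of):
    """Compute the weighted disagreements vector of a clustering.

    weights: dict {(u, v): weight}, one entry per vertex pair (assumed encoding).
    similar: set of (u, v) pairs labeled similar; other pairs are dissimilar.
    cluster_of: dict mapping each vertex to a cluster id.
    """
    disagree = {v: 0.0 for v in cluster_of}
    for (u, v), weight in weights.items():
        together = cluster_of[u] == cluster_of[v]
        # Similar edges disagree when cut; dissimilar edges disagree when uncut.
        if ((u, v) in similar) != together:
            disagree[u] += weight
            disagree[v] += weight
    return disagree


def satisfies_weight_assumption(weights, similar, alpha, w):
    """Check: similar weights lie in [alpha*w, w]; dissimilar weights are >= alpha*w."""
    for edge, weight in weights.items():
        if edge in similar:
            if not (alpha * w <= weight <= w):
                return False
        elif weight < alpha * w:
            return False
    return True


# Three vertices; edge (0, 2) is similar but cut by the clustering.
weights = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 2.0}
similar = {(0, 1), (0, 2)}
clusters = {0: "a", 1: "a", 2: "b"}
result = weighted_disagreements(weights, similar, clusters)
```

Here only the cut similar edge (0, 2) contributes, so vertices 0 and 2 each carry weight 0.5 and vertex 1 carries none.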