skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Local Correlation Clustering with Asymmetric Classification Errors
In the Correlation Clustering problem, we are given a complete weighted graph $$G$$ with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering $$\mathcal{C}$$ of graph $$G$$, a similar edge is in disagreement with $$\mathcal{C}$$, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with $$\mathcal{C}$$ if its endpoints belong to the same cluster. The disagreements vector, $$\mathbf{disagree}$$, is a vector indexed by the vertices of $$G$$ such that the $$v$$-th coordinate $$\mathbf{disagree}_v$$ equals the weight of all disagreeing edges incident on $$v$$. The goal is to produce a clustering that minimizes the $$\ell_p$$ norm of the disagreements vector for $$p\geq 1$$. We study the $$\ell_p$$ objective in Correlation Clustering under the following assumption: Every similar edge has weight in $$[\alpha\mathbf{w},\mathbf{w}]$$ and every dissimilar edge has weight at least $$\alpha\mathbf{w}$$ (where $$\alpha \leq 1$$ and $$\mathbf{w}>0$$ is a scaling parameter). We give an $$O\left((\frac{1}{\alpha})^{\frac{1}{2}-\frac{1}{2p}}\cdot \log\frac{1}{\alpha}\right)$$ approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.  more » « less
Award ID(s):
1955173 1934843 1718820
PAR ID:
10336938
Author(s) / Creator(s):
; ; ;
Editor(s):
Meila, Marina; Zhang, Tong
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
139
ISSN:
2640-3498
Page Range / eLocation ID:
4677-4686
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    In the Correlation Clustering problem, we are given a complete weighted graph G with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering C of graph G, a similar edge is in disagreement with C, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with C if its endpoints belong to the same cluster. The disagreements vector is a vector indexed by the vertices of G such that the v-th coordinate of the disagreements vector equals the weight of all disagreeing edges incident on v. The goal is to produce a clustering that minimizes the ℓp norm of the disagreements vector for p≥1. We study the ℓ_p objective in Correlation Clustering under the following assumption: Every similar edge has weight in [αw,w] and every dissimilar edge has weight at least αw (where α ≤ 1 and w > 0 is a scaling parameter). We give an O((1/α)^{1/2−1/(2p)} log 1/α) approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap. 
    more » « less
  2. null (Ed.)
    In the Correlation Clustering problem, we are given a complete weighted graph G with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering C of graph G, a similar edge is in disagreement with C, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with C if its endpoints belong to the same cluster. The disagreements vector, Agree, is a vector indexed by the vertices of G such that the v-th coordinate Disagre equals the weight of all disagreeing edges incident on v. The goal is to produce a clustering that minimizes the ℓp norm of the disagreements vector for p≥1. We study the ℓ_p objective in Correlation Clustering under the following assumption: Every similar edge has weight in [αw,w] and every dissimilar edge has weight at least αw (where α≤1 and w>0 is a scaling parameter). We give an O((1/α)^{1/2−1/2p}⋅log(1/α)) approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap. 
    more » « less
  3. In the Correlation Clustering problem, we are given a weighted graph $$G$$ with its edges labelled as "similar" or "dissimilar" by a binary classifier. The goal is to produce a clustering that minimizes the weight of "disagreements": the sum of the weights of "similar" edges across clusters and "dissimilar" edges within clusters. We study the correlation clustering problem under the following assumption: Every "similar" edge $$e$$ has weight $$w_e \in [ \alpha w, w ]$$ and every "dissimilar" edge $$e$$ has weight $$w_e \geq \alpha w$$ (where $$\alpha \leq 1$$ and $w > 0$ is a scaling parameter). We give a $$(3 + 2 \log_e (1/\alpha))$$ approximation algorithm for this problem. This assumption captures well the scenario when classification errors are asymmetric. Additionally, we show an asymptotically matching Linear Programming integrality gap of $$\Omega(\log 1/\alpha)$$ 
    more » « less
  4. Given a simple graph $$G$$, the irregularity strength of $$G$$, denoted $s(G)$, is the least positive integer $$k$$ such that there is a weight assignment on edges $$f: E(G) \to \{1,2,\dots, k\}$$ for which each vertex weight $$f^V(v):= \sum_{u: \{u,v\}\in E(G)} f(\{u,v\})$$ is unique amongst all $$v\in V(G)$$. In 1987, Faudree and Lehel conjectured that there is a constant $$c$$ such that $$s(G) \leq n/d + c$$ for all $$d$$-regular graphs $$G$$ on $$n$$ vertices with $d>1$, whereas it is trivial that $$s(G) \geq n/d$$. In this short note we prove that the Faudree-Lehel Conjecture holds when $$d \geq n^{0.8+\epsilon}$$ for any fixed $$\epsilon >0$$, with a small additive constant $c=28$ for $$n$$ large enough. Furthermore, we confirm the conjecture asymptotically by proving that for any fixed $$\beta\in(0,1/4)$$ there is a constant $$C$$ such that for all $$d$$-regular graphs $$G$$, $$s(G) \leq \frac{n}{d}(1+\frac{C}{d^{\beta}})+28$$, extending and improving a recent result of Przybyło that $$s(G) \leq \frac{n}{d}(1+ \frac{1}{\ln^{\epsilon/19}n})$$ whenever $$d\in [\ln^{1+\epsilon} n, n/\ln^{\epsilon}n]$$ and $$n$$ is large enough. 
    more » « less
  5. We present a weighted approach to compute a maximum cardinality matching in an arbitrary bipartite graph. Our main result is a new algorithm that takes as input a weighted bipartite graph G(A cup B,E) with edge weights of 0 or 1. Let w <= n be an upper bound on the weight of any matching in G. Consider the subgraph induced by all the edges of G with a weight 0. Suppose every connected component in this subgraph has O(r) vertices and O(mr/n) edges. We present an algorithm to compute a maximum cardinality matching in G in O~(m(sqrt{w} + sqrt{r} + wr/n)) time. When all the edge weights are 1 (symmetrically when all weights are 0), our algorithm will be identical to the well-known Hopcroft-Karp (HK) algorithm, which runs in O(m sqrt{n}) time. However, if we can carefully assign weights of 0 and 1 on its edges such that both w and r are sub-linear in n and wr=O(n^{gamma}) for gamma < 3/2, then we can compute maximum cardinality matching in G in o(m sqrt{n}) time. Using our algorithm, we obtain a new O~(n^{4/3}/epsilon^4) time algorithm to compute an epsilon-approximate bottleneck matching of A,B subsetR^2 and an 1/(epsilon^{O(d)}}n^{1+(d-1)/(2d-1)}) poly log n time algorithm for computing epsilon-approximate bottleneck matching in d-dimensions. All previous algorithms take Omega(n^{3/2}) time. Given any graph G(A cup B,E) that has an easily computable balanced vertex separator for every subgraph G'(V',E') of size |V'|^{delta}, for delta in [1/2,1), we can apply our algorithm to compute a maximum matching in O~(mn^{delta/1+delta}) time improving upon the O(m sqrt{n}) time taken by the HK-Algorithm. 
    more » « less