This content will become publicly available on June 30, 2025
 Award ID(s):
 1955351
 NSFPAR ID:
 10519132
 Publisher / Repository:
 International Machine Learning Society
 Date Published:
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

null (Ed.)In the Correlation Clustering problem, we are given a complete weighted graph G with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering C of graph G, a similar edge is in disagreement with C, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with C if its endpoints belong to the same cluster. The disagreements vector is a vector indexed by the vertices of G such that the vth coordinate of the disagreements vector equals the weight of all disagreeing edges incident on v. The goal is to produce a clustering that minimizes the ℓp norm of the disagreements vector for p≥1. We study the ℓ_p objective in Correlation Clustering under the following assumption: Every similar edge has weight in [αw,w] and every dissimilar edge has weight at least αw (where α ≤ 1 and w > 0 is a scaling parameter). We give an O((1/α)^{1/2−1/(2p)} log 1/α) approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.more » « less

null (Ed.)In the Correlation Clustering problem, we are given a complete weighted graph G with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering C of graph G, a similar edge is in disagreement with C, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with C if its endpoints belong to the same cluster. The disagreements vector, Agree, is a vector indexed by the vertices of G such that the vth coordinate Disagre equals the weight of all disagreeing edges incident on v. The goal is to produce a clustering that minimizes the ℓp norm of the disagreements vector for p≥1. We study the ℓ_p objective in Correlation Clustering under the following assumption: Every similar edge has weight in [αw,w] and every dissimilar edge has weight at least αw (where α≤1 and w>0 is a scaling parameter). We give an O((1/α)^{1/2−1/2p}⋅log(1/α)) approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.more » « less

Megow, Nicole ; Smith, Adam (Ed.)Structural balance theory studies stability in networks. Given a nvertex complete graph G = (V,E) whose edges are labeled positive or negative, the graph is considered balanced if every triangle either consists of three positive edges (three mutual "friends"), or one positive edge and two negative edges (two "friends" with a common "enemy"). From a computational perspective, structural balance turns out to be a special case of correlation clustering with the number of clusters at most two. The two main algorithmic problems of interest are: (i) detecting whether a given graph is balanced, or (ii) finding a partition that approximates the frustration index, i.e., the minimum number of edge flips that turn the graph balanced. We study these problems in the streaming model where edges are given one by one and focus on memory efficiency. We provide randomized singlepass algorithms for: (i) determining whether an input graph is balanced with O(log n) memory, and (ii) finding a partition that induces a (1 + ε)approximation to the frustration index with O(n ⋅ polylog(n)) memory. We further provide several new lower bounds, complementing different aspects of our algorithms such as the need for randomization or approximation. To obtain our main results, we develop a method using pseudorandom generators (PRGs) to sample edges between independentlychosen vertices in graph streaming. Furthermore, our algorithm that approximates the frustration index improves the running time of the stateoftheart correlation clustering with two clusters (GiotisGuruswami algorithm [SODA 2006]) from n^O(1/ε²) to O(n²log³n/ε² + n log n ⋅ (1/ε)^O(1/ε⁴)) time for (1+ε)approximation. These results may be of independent interest.more » « less

Meila, Marina ; Zhang, Tong (Ed.)In the Correlation Clustering problem, we are given a complete weighted graph $G$ with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering $\mathcal{C}$ of graph $G$, a similar edge is in disagreement with $\mathcal{C}$, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to the same cluster. The disagreements vector, $\mathbf{disagree}$, is a vector indexed by the vertices of $G$ such that the $v$th coordinate $\mathbf{disagree}_v$ equals the weight of all disagreeing edges incident on $v$. The goal is to produce a clustering that minimizes the $\ell_p$ norm of the disagreements vector for $p\geq 1$. We study the $\ell_p$ objective in Correlation Clustering under the following assumption: Every similar edge has weight in $[\alpha\mathbf{w},\mathbf{w}]$ and every dissimilar edge has weight at least $\alpha\mathbf{w}$ (where $\alpha \leq 1$ and $\mathbf{w}>0$ is a scaling parameter). We give an $O\left((\frac{1}{\alpha})^{\frac{1}{2}\frac{1}{2p}}\cdot \log\frac{1}{\alpha}\right)$ approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.more » « less

In recent years, incomplete multiview clustering (IMVC), which studies the challenging multiview clustering problem on missing views, has received growing research interests. Previous IMVC methods suffer from the following issues: (1) the inaccurate imputation for missing data, which leads to suboptimal clustering performance, and (2) most existing IMVC models merely consider the explicit presence of graph structure in data, ignoring the fact that latent graphs of different views also provide valuable information for the clustering task. To overcome such challenges, we present a novel method, termed Adaptive feature imputation with latent graph for incomplete multiview clustering (AGDIMC). Specifically, it captures the embbedded features of each view by incorporating the viewspecific deep encoders. Then, we construct partial latent graphs on complete data, which can consolidate the intrinsic relationships within each view while preserving the topological information. With the aim of estimating the missing sample based on the available information, we utilize an adaptive imputation layer to impute the embedded feature of missing data by using crossview soft cluster assignments and global cluster centroids. As the imputation progresses, the portion of complete data increases, contributing to enhancing the discriminative information contained in global pseudolabels. Meanwhile, to alleviate the negative impact caused by inferior impute samples and the discrepancy of cluster structures, we further design an adaptive imputation strategy based on the global pseudolabel and the local cluster assignment. Experimental results on multiple realworld datasets demonstrate the effectiveness of our method over existing approaches.