We introduce a novel criterion in clustering that seeks clusters with limited
This content will become publicly available on March 25, 2025
Motivated by applications to classification problems on metric data, we study the Weighted Metric Clustering problem: given a metric d over n points and a k × k symmetric matrix A with nonnegative entries, the goal is to find a k-partition of these points into clusters C_1,...,C_k, while minimizing the sum of A[i,j] * d(u,v) over all pairs of clusters C_i and C_j and all pairs of points u from C_i and v from C_j. Specific choices of A lead to Weighted Metric Clustering capturing well-studied graph partitioning problems in metric spaces, such as Min-Uncut, Min-k-Sum, Min-k-Cut, and more. Our main result is that Weighted Metric Clustering admits a polynomial-time approximation scheme (PTAS). Our algorithm handles all the above problems using the Sherali-Adams linear programming relaxation. This subsumes several prior works, unifies many of the techniques for various metric clustering objectives, and yields a PTAS for several new problems, including metric clustering on manifolds and a new family of hierarchical clustering objectives. Our experiments on the hierarchical clustering objective show that it better captures the ground-truth structural information compared to the popular Dasgupta's objective.
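To make the objective concrete, the following is a minimal sketch that evaluates the Weighted Metric Clustering cost of a given labeling and finds an optimal labeling by exhaustive search on a tiny instance. It is not the Sherali-Adams based PTAS from the abstract. The names `wmc_cost` and `brute_force_wmc` are hypothetical, the sum is assumed to run over ordered pairs (u, v), so each unordered pair is counted twice, and empty clusters are allowed for simplicity.

```python
from itertools import product


def wmc_cost(d, A, labels):
    """Sum of A[i][j] * d[u][v] over ordered pairs (u, v), u in C_i, v in C_j.

    d      -- n x n symmetric distance matrix (list of lists)
    A      -- k x k symmetric nonnegative weight matrix (list of lists)
    labels -- labels[u] is the index of the cluster containing point u
    Assumption: ordered pairs are summed, so unordered pairs count twice.
    """
    n = len(labels)
    return sum(
        A[labels[u]][labels[v]] * d[u][v]
        for u in range(n)
        for v in range(n)
    )


def brute_force_wmc(d, A):
    """Try all k^n labelings; only feasible for very small n.

    Assumption: empty clusters are permitted in this sketch.
    """
    n, k = len(d), len(A)
    best = min(product(range(k), repeat=n), key=lambda lab: wmc_cost(d, A, lab))
    return list(best), wmc_cost(d, A, best)


# Four points on a line; the identity matrix penalizes only within-cluster pairs.
xs = [0, 1, 10, 11]
d = [[abs(a - b) for b in xs] for a in xs]
A = [[1, 0], [0, 1]]
labels, cost = brute_force_wmc(d, A)
```

Changing `A` changes which pairs are penalized: an all-ones off-diagonal with a zero diagonal charges only pairs split across clusters, while the identity charges only pairs kept together.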
Award ID(s):
 1955351
 NSFPAR ID:
 10519129
 Publisher / Repository:
 Association for the Advancement of Artificial Intelligence
 Date Published:
 Journal Name:
 Proceedings of the AAAI Conference on Artificial Intelligence
 Volume:
 38
 Issue:
 8
 ISSN:
 2159-5399
 Page Range / eLocation ID:
 7926 to 7934
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

range of values associated with each cluster's elements. In clustering or classification the objective is to partition a set of objects into subsets, called clusters or classes, consisting of similar objects so that different clusters are as dissimilar as possible. We propose a number of objective functions that employ the range of the clusters as part of the objective function. Several of the proposed objectives mimic objectives based on sums of similarities. These objective functions are motivated by image segmentation problems, where the diameter, or range of values associated with objects in each cluster, should be small. It is demonstrated that range‐based problems are in general easier, in terms of their complexity, than the analogous similarity‐sum problems. Several of the problems we present could therefore be viable alternatives to existing clustering problems which are NP‐hard, offering the advantage of efficient algorithms. 
Belkin, Mikhail; Kpotufe, Samor (Ed.) We present an $e^{O(p)} (\log \ell) / (\log \log \ell)$-approximation algorithm for socially fair clustering with the $\ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $\ell$ groups. The goal is to find a $k$-medians, $k$-means, or, more generally, $\ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $\sum_{u \text{ in group } j} d(u, C)^p$. The socially fair clustering problem was independently proposed by Abbasi, Bhaskara, and Venkatasubramanian (2021) and Ghadiri, Samadi, and Vempala (2021). Our algorithm improves and generalizes their $O(\ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $\Omega(\ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $\Theta((\log \ell) / (\log \log \ell))$ for a fixed $p$. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. (2021).
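As an illustration of the objective only (not of the LP-based approximation algorithm), the sketch below evaluates the socially fair cost of a fixed set of centers: the maximum over groups of the sum of $d(u, C)^p$. The function name `socially_fair_cost` and its argument layout are assumptions made for this example.

```python
def socially_fair_cost(points, groups, centers, dist, p):
    """Maximum over groups j of the sum over u in group j of d(u, C)^p.

    points  -- list of points
    groups  -- list of groups, each a list of indices into points
               (a point may appear in several groups)
    centers -- nonempty list of chosen centers C
    dist    -- metric, called as dist(x, y)
    p       -- exponent of the l_p objective (p=1: k-medians, p=2: k-means)
    """
    def d_to_centers(x):
        # d(u, C) is the distance from u to its nearest center.
        return min(dist(x, c) for c in centers)

    return max(
        sum(d_to_centers(points[u]) ** p for u in group)
        for group in groups
    )


# Three points on a line, two groups, absolute difference as the metric.
pts = [0, 1, 10]
grps = [[0, 1], [2]]
line = lambda x, y: abs(x - y)
cost_two_centers = socially_fair_cost(pts, grps, [0, 10], line, p=1)
cost_one_center = socially_fair_cost(pts, grps, [0], line, p=2)
```

With a single center at 0, the second group alone determines the cost, which is the situation the max-over-groups objective is designed to expose.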

Given a metric space ℳ = (X,δ), a weighted graph G over X is a metric t-spanner of ℳ if for every u,v ∈ X, δ(u,v) ≤ δ_G(u,v) ≤ t⋅δ(u,v), where δ_G is the shortest path metric in G. In this paper, we construct spanners for finite sets in metric spaces in the online setting. Here, we are given a sequence of points (s₁, …, s_n), where the points are presented one at a time (i.e., after i steps, we have seen S_i = {s₁, …, s_i}). The algorithm is allowed to add edges to the spanner when a new point arrives; however, it is not allowed to remove any edge from the spanner. The goal is to maintain a t-spanner G_i for S_i for all i, while minimizing the number of edges and their total weight. Under the L₂-norm in ℝ^d for arbitrary constant d ∈ ℕ, we present an online (1+ε)-spanner algorithm with competitive ratio O_d(ε^{-d} log n), improving the previous bound of O_d(ε^{-(d+1)} log n). Moreover, the spanner maintained by the algorithm has O_d(ε^{1-d} log ε^{-1})⋅n edges, almost matching the (offline) optimal bound of O_d(ε^{1-d})⋅n. In the plane, a tighter analysis of the same algorithm provides an almost quadratic improvement of the competitive ratio to O(ε^{-3/2} log ε^{-1} log n), by comparing the online spanner with an instance-optimal spanner directly, bypassing the comparison to an MST (i.e., lightness). As a counterpart, we design a sequence of points that yields an Ω_d(ε^{-d}) lower bound for the competitive ratio for online (1+ε)-spanner algorithms in ℝ^d under the L₁-norm. Then we turn our attention to online spanners in general metrics. Note that it is not possible to obtain a spanner with stretch less than 3 with a subquadratic number of edges, even in the offline setting, for general metrics. We analyze an online version of the celebrated greedy spanner algorithm, dubbed ordered greedy. With stretch factor t = (2k-1)(1+ε) for k ≥ 2 and ε ∈ (0,1), we show that it maintains a spanner with O(ε^{-1} log ε^{-1})⋅n^{1+1/k} edges and O(ε^{-1} n^{1/k} log² n) lightness for a sequence of n points in a metric space. We show that these bounds cannot be significantly improved, by introducing an instance that achieves an Ω(1/k⋅n^{1/k}) competitive ratio on both sparsity and lightness. Furthermore, we establish the trade-off among stretch, number of edges and lightness for points in ultrametrics, showing that one can maintain a (2+ε)-spanner for ultrametrics with O(ε^{-1} log ε^{-1})⋅n edges and O(ε^{-2}) lightness.
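The online setting described above can be sketched as follows: when a point arrives, candidate edges to earlier points are examined, and an edge is added only if the current spanner does not already provide a path of length at most t times the metric distance. This is one plausible reading of an online greedy spanner, not necessarily the ordered greedy algorithm analyzed in the paper. In particular, the order in which candidate edges are examined (here, by increasing distance) is an assumption, and the names `graph_dist` and `online_greedy_spanner` are hypothetical.

```python
import heapq
import math


def graph_dist(adj, src, dst):
    """Shortest-path distance from src to dst in the weighted graph adj (Dijkstra)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == dst:
            return du
        if du > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = du + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


def online_greedy_spanner(points, metric, t):
    """Process points in arrival order; edges are only added, never removed."""
    adj = {}
    for i, p in enumerate(points):
        adj[i] = {}
        # Assumption: examine earlier points from nearest to farthest.
        earlier = sorted(range(i), key=lambda j: metric(p, points[j]))
        for j in earlier:
            w = metric(p, points[j])
            # Add the edge only if the stretch bound is currently violated.
            if graph_dist(adj, i, j) > t * w:
                adj[i][j] = w
                adj[j][i] = w
    return adj


# Three collinear points arriving left to right, stretch t = 1.5.
pts = [0, 1, 2]
line = lambda x, y: abs(x - y)
spanner = online_greedy_spanner(pts, line, t=1.5)
num_edges = sum(len(nbrs) for nbrs in spanner.values()) // 2
```

After each arrival, every pair involving the new point either has a direct edge or already satisfies the stretch bound, and earlier pairs are unaffected because adding edges can only shorten paths. The repeated Dijkstra calls make this sketch quadratic or worse per arrival, so it is meant for small inputs only.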

Understanding the structure of minor-free metrics, namely shortest path metrics obtained over a weighted graph excluding a fixed minor, has been an important research direction since the fundamental work of Robertson and Seymour. A fundamental idea that helps both to understand the structural properties of these metrics and to obtain strong algorithmic results is to construct a “small-complexity” graph that approximately preserves distances between pairs of points of the metric. We show the following two structural results for minor-free metrics: 1) Construction of a light subset spanner. Given a subset of vertices called terminals, and ϵ, in polynomial time we construct a subgraph that preserves all pairwise distances between terminals up to a multiplicative 1+ϵ factor, of total weight at most O_ϵ(1) times the weight of the minimal Steiner tree spanning the terminals. 2) Construction of a stochastic metric embedding into low treewidth graphs with expected additive distortion ϵD. Namely, given a minor-free graph G=(V,E,w) of diameter D, and parameter ϵ, we construct a distribution 𝒟 over dominating metric embeddings into treewidth-O_ϵ(log n) graphs such that ∀u,v∈V, E_{f∼𝒟}[d_H(f(u),f(v))] ≤ d_G(u,v) + ϵD. Our results have the following algorithmic consequences: (1) the first efficient approximation scheme for subset TSP in minor-free metrics; (2) the first approximation scheme for bounded-capacity vehicle routing in minor-free metrics; (3) the first efficient approximation scheme for bounded-capacity vehicle routing on bounded genus metrics. En route to the latter result, we design the first FPT approximation scheme for bounded-capacity vehicle routing on bounded-treewidth graphs (parameterized by the treewidth).