This paper presents universal algorithms for clustering problems, including the widely studied k-median, k-means, and k-center objectives. The input is a metric space containing all potential client locations. The algorithm must select k cluster centers such that they are a good solution for any subset of clients that actually realizes. Specifically, we aim for low regret, defined as the maximum over all subsets of the difference between the cost of the algorithm’s solution and that of an optimal solution. A universal algorithm’s solution sol for a clustering problem is said to be an (α, β)-approximation if for all subsets of clients C', it satisfies sol(C') ≤ α ⋅ opt(C') + β ⋅ mr, where opt(C') is the cost of the optimal solution for clients C' and mr is the minimum regret achievable by any solution.
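In formulas, with $C$ denoting the set of all potential client locations, the two quantities above are simply

    $\mathrm{regret}(\mathrm{sol}) = \max_{C' \subseteq C} \bigl(\mathrm{sol}(C') - \mathrm{opt}(C')\bigr), \qquad \mathrm{mr} = \min_{\mathrm{sol}} \mathrm{regret}(\mathrm{sol}).$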
Our main results are universal algorithms for the standard clustering objectives of k-median, k-means, and k-center that achieve (O(1), O(1))-approximations. These results are obtained via a novel framework for universal algorithms using linear programming (LP) relaxations. They generalize to other ℓ_p-objectives and to the setting where some subset of the clients is fixed. We also give hardness results showing that (α, β)-approximation is NP-hard if α or β is at most a certain constant, even for the widely studied special case of Euclidean metric spaces. This shows that, in a sense, (O(1), O(1))-approximation is the strongest type of guarantee obtainable for universal clustering.
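As a toy illustration of the regret objective (our sketch, not the paper's LP-based algorithm; it assumes centers are chosen among the client locations and brute-forces every subset, so it is exponential and only meant to make the definitions concrete):

    from itertools import chain, combinations

    def kmedian_cost(centers, clients, dist):
        # k-median cost: each realized client pays its distance to the nearest center.
        return sum(min(dist[x][f] for f in centers) for x in clients)

    def min_regret(points, k, dist):
        # Brute-force mr: the best over size-k center sets of the worst, over all
        # nonempty client subsets C', of sol(C') - opt(C').
        center_sets = list(combinations(points, k))
        subsets = list(chain.from_iterable(
            combinations(points, r) for r in range(1, len(points) + 1)))

        def opt(clients):
            return min(kmedian_cost(cs, clients, dist) for cs in center_sets)

        def regret(cs):
            return max(kmedian_cost(cs, clients, dist) - opt(clients)
                       for clients in subsets)

        return min(regret(cs) for cs in center_sets)

    # Four points on a line; distances are absolute differences.
    pts = [0, 1, 5, 6]
    d = {p: {q: abs(p - q) for q in pts} for p in pts}
    print(min_regret(pts, 2, d))  # 2 for this instance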
Ganesh, Arun; Maggs, Bruce M.; Panigrahi, Debmalya. ACM Transactions on Algorithms.
Fox, Kyle; Huang, Hongyao; Raichel, Benjamin. 33rd International Symposium on Algorithms and Computation (Bae, Sang Won; Park, Heejin, Eds.).
In this paper we introduce and formally study the problem of k-clustering with faulty centers. Specifically, we study the faulty versions of k-center, k-median, and k-means clustering, where centers have some probability of not existing, as opposed to prior work where clients had some probability of not existing. For all three problems we provide fixed parameter tractable algorithms, in the parameters k, d, and ε, that (1+ε)-approximate the minimum expected cost solutions for points in d dimensional Euclidean space. For Faulty k-center we additionally provide a 5-approximation for general metrics. Significantly, all of our algorithms have a small dependence on n. Specifically, our Faulty k-center algorithms have only linear dependence on n, while for our algorithms for Faulty k-median and Faulty k-means the dependence is still only n^(1 + o(1)).
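To make the expected-cost objective concrete, a single client's expected connection cost under independently failing centers can be computed by scanning the open centers closest-first. This is a hedged sketch with illustrative names; the independence assumption and the all-centers-fail convention are ours, not necessarily the paper's exact model:

    def expected_client_cost(dist_fail_pairs):
        # Expected distance from one client to its nearest *surviving* center.
        # Input: (distance, failure_probability) pairs, one per open center,
        # assumed to fail independently. Charges 0 if every center fails.
        expected, all_closer_failed = 0.0, 1.0
        for d, q in sorted(dist_fail_pairs):  # try centers closest-first
            expected += d * all_closer_failed * (1.0 - q)  # this one is the nearest survivor
            all_closer_failed *= q
        return expected

    # Centers at distances 1 and 3, failing with probability 0.2 and 0.5:
    # 1 * 0.8 + 3 * 0.2 * 0.5 = 1.1
    print(expected_client_cost([(1.0, 0.2), (3.0, 0.5)]))

In the model as sketched here, the Faulty k-median objective is, by linearity of expectation, the sum of this quantity over all clients.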
Nguyen, Thy; Chaturvedi, Anamay; Nguyen, Huy L. International Conference on Learning Representations.
We consider the problem of clustering in the learning-augmented setting. We are given a data set in $d$-dimensional Euclidean space, and a label for each data point given by a predictor indicating which subsets of points should be clustered together. This setting captures situations where we have access to auxiliary information about the data set relevant to our clustering objective, for instance the labels output by a neural network. Following prior work, we assume that each predicted cluster contains at most an $\alpha$ fraction of false positives and false negatives, for some $\alpha \in (0,c)$ with $c<1$, in the absence of which the labels would attain the optimal clustering cost $\mathrm{OPT}$. For a dataset of size $m$, we propose a deterministic $k$-means algorithm that produces centers with an improved bound on the clustering cost compared to the previous randomized state-of-the-art algorithm while preserving the $O(dm \log m)$ runtime. Furthermore, our algorithm works even when the predictions are not very accurate, i.e., our cost bound holds for $\alpha$ up to $1/2$, an improvement over the bound of $\alpha \le 1/7$ in previous work. For the $k$-medians problem, we again improve upon prior work by achieving a biquadratic improvement in the dependence of the approximation factor on the accuracy parameter $\alpha$, obtaining a cost of $(1+O(\alpha))\mathrm{OPT}$ while requiring essentially just $O(md \log^3 m/\alpha)$ runtime.
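A minimal sketch of the basic recipe in this setting (ours, not the authors' algorithm): take each predicted cluster's mean as a candidate k-means center. With few label errors, the empirical means land near the optimal centers; the robust algorithms in this line of work replace the plain mean with an outlier-resistant estimate to tolerate the $\alpha$ fraction of errors.

    import numpy as np

    def centers_from_predictions(X, labels, k):
        # Candidate k-means centers: the mean of each predicted cluster.
        # Assumes every label in {0, ..., k-1} appears at least once.
        return np.array([X[labels == j].mean(axis=0) for j in range(k)])

    def kmeans_cost(X, centers):
        # Sum of squared distances from each point to its nearest center.
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return sq.min(axis=1).sum()

    X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
    labels = np.array([0, 0, 1, 1])
    print(kmeans_cost(X, centers_from_predictions(X, labels, 2)))  # 0.01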
Dai, Zhen; Makarychev, Yury; Vakilian, Ali. 2022 ACM Conference on Fairness, Accountability, and Transparency.
We study the problem of fair k-median where each cluster is required to have a fair representation of individuals from different groups. In the fair representation k-median problem, we are given a set of points X in a metric space. Each point x ∈ X belongs to one of ℓ groups. Further, we are given fair representation parameters α_j and β_j for each group j ∈ [ℓ]. We say that a k-clustering C_1, ⋅⋅⋅, C_k fairly represents all groups if the number of points from group j in cluster C_i is between α_j |C_i| and β_j |C_i| for every j ∈ [ℓ] and i ∈ [k]. The goal is to find a set of k centers and an assignment ϕ such that the clustering defined by ϕ fairly represents all groups and minimizes the ℓ_1-objective ∑_{x ∈ X} d(x, ϕ(x)).
We present an O(log k)-approximation algorithm that runs in time n^{O(ℓ)}. Note that the known algorithms for the problem either (i) violate the fairness constraints by an additive term or (ii) run in time that is exponential in both k and ℓ. We also consider an important special case of the problem where and for all j ∈ [ℓ]. For this special case, we present an O(log k)-approximation algorithm that runs in time.
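Operationally, the fairness constraint and the objective above are straightforward to verify for a candidate solution; the sketch below uses illustrative names and is not from the paper.

    def fairly_represents(clusters, group_of, alpha, beta):
        # Checks alpha[j] * |C_i| <= #{x in C_i with group j} <= beta[j] * |C_i|
        # for every cluster C_i and every group j.
        for C in clusters:
            size = len(C)
            for j in range(len(alpha)):
                cnt = sum(1 for x in C if group_of[x] == j)
                if not (alpha[j] * size <= cnt <= beta[j] * size):
                    return False
        return True

    def l1_objective(assignment, dist):
        # sum over points x of d(x, phi(x)), where assignment maps x -> its center.
        return sum(dist(x, c) for x, c in assignment.items())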
Bruce Maggs, Arun Ganesh. Universal Algorithms for Clustering. Leibniz International Proceedings in Informatics 198. https://par.nsf.gov/biblio/10379274.