Given a data set of size n in d'dimensional Euclidean space, the kmeans problem asks for a set of k points (called centers) such that the sum of the l_2^2distances between the data points and the set of centers is minimized. Previous work on this problem in the local differential privacy setting shows how to achieve multiplicative approximation factors arbitrarily close to optimal, but suffers high additive error. The additive error has also been seen to be an issue in implementations of differentially private kmeans clustering algorithms in both the central and local settings. In this work, we introduce a new locally private kmeans clustering algorithm that achieves nearoptimal additive error whilst retaining constant multiplicative approximation factors and round complexity. Concretely, given any c>sqrt(2), our algorithm achieves O(k^(1 + O(1/(2c^21))) * sqrt(d' n) * log d' * poly log n) additive error with an O(c^2) multiplicative approximation factor.
more »
« less
Differentially Private kMeans via Exponential Mechanism and Max Cover
We introduce a new (ϵₚ, δₚ)differentially private algorithm for the kmeans clustering problem. Given a dataset in Euclidean space, the kmeans clustering problem requires one to find k points in that space such that the sum of squares of Euclidean distances between each data point and its closest respective point among the k returned is minimised. Although there exist privacypreserving methods with good theoretical guarantees to solve this problem, in practice it is seen that it is the additive error which dictates the practical performance of these methods. By reducing the problem to a sequence of instances of maximum coverage on a grid, we are able to derive a new method that achieves lower additive error than previous works. For input datasets with cardinality n and diameter Δ, our algorithm has an O(Δ² (k log² n log(1/δₚ)/ϵₚ + k √(d log(1/δₚ))/ϵₚ)) additive error whilst maintaining constant multiplicative error. We conclude with some experiments and find an improvement over previously implemented work for this problem.
more »
« less
 Award ID(s):
 1750716
 NSFPAR ID:
 10316109
 Date Published:
 Journal Name:
 Proceedings of the AAAI Conference on Artificial Intelligence
 Volume:
 35
 Issue:
 8
 ISSN:
 21595399
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We study subtrajectory clustering under the Fréchet distance. Given one or more trajectories, the task is to split the trajectories into several parts, such that the parts have a good clustering structure. We approach this problem via a new set cover formulation, which we think provides a natural formalization of the problem as it is studied in many applications. Given a polygonal curve P with n vertices in fixed dimension, integers k, ℓ ≥ 1, and a real value Δ > 0, the goal is to find k center curves of complexity at most ℓ such that every point on P is covered by a subtrajectory that has small Fréchet distance to one of the k center curves (≤ Δ). In many application scenarios, one is interested in finding clusters of small complexity, which is controlled by the parameter ℓ. Our main result is a bicriterial approximation algorithm: if there exists a solution for given parameters k, ℓ, and Δ, then our algorithm finds a set of k' center curves of complexity at most ℓ with covering radius Δ' with k' in O(kℓ2 log (kℓ)), and Δ' ≤ 19Δ. Moreover, within these approximation bounds, we can minimize k while keeping the other parameters fixed. If ℓ is a constant independent of n, then, the approximation factor for the number of clusters k is O(log k) and the approximation factor for the radius Δ is constant. In this case, the algorithm has expected running time in Õ(km2 + mn) and uses space in O(n + m), where m=⌈L/Δ⌉ and L is the total arclength of the curve P.more » « less

This paper studies the problem of clustering in metric spaces while preserving the privacy of individual data. Specifically, we examine differentially private variants of the kmedians and Euclidean kmeans problems. We present polynomial algorithms with constant multiplicative error and lower additive error than the previous stateoftheart for each problem. Additionally, our algorithms use a clustering algorithm without differential privacy as a blackbox. This allows practitioners to control the tradeoff between runtime and approximation factor by choosing a suitable clustering algorithm to use.more » « less

Belkin, Mikhail ; Kpotufe, Samor (Ed.)We present an $e^{O(p)} (\log \ell) / (\log \log \ell)$approximation algorithm for socially fair clustering with the $\ell_p$objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $\ell$ groups. The goal is to find a $k$medians, $k$means, or, more generally, $\ell_p$clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $\sum_{u \text{ in group } j} d(u, C)^p$. The socially fair clustering problem was independently proposed by Abbasi, Bhaskara, and Venkatasubramanian (2021) and Ghadiri, Samadi, and Vempala (2021). Our algorithm improves and generalizes their $O(\ell)$approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $\Omega(\ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $\Theta((\log \ell) / (\log \log \ell))$ for a fixed p. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. (2021).more » « less

Consider an instance of Euclidean kmeans or kmedians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1+ε) under a projection onto a random O(log(k /ε) / ε2)dimensional subspace. Further, the cost of every clustering is preserved within (1+ε). More generally, our result applies to any dimension reduction map satisfying a mild subGaussiantail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean kclustering with the distances raised to the pth power for any constant p. For kmeans, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for kmedians, it answers a question raised by Kannan.more » « less