skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on May 26, 2026

Title: Private Geometric Median in Nearly-Linear Time
Estimating the geometric median of a dataset is a robust counterpart to mean estimation, and is a fundamental problem in computational geometry. Recently, [HSU24] gave an (ε,δ)-differentially private algorithm obtaining an α-multiplicative approximation to the geometric median objective, 1n∑i∈[n]‖⋅−xi‖, given a dataset :={xi}i∈[n]⊂ℝd. Their algorithm requires n≳d‾‾√⋅1αε samples, which they prove is information-theoretically optimal. This result is surprising because its error scales with the \emph{effective radius} of  (i.e., of a ball capturing most points), rather than the worst-case radius. We give an improved algorithm that obtains the same approximation quality, also using n≳d‾‾√⋅1αϵ samples, but in time O˜(nd+dα2). Our runtime is nearly-linear, plus the cost of the cheapest non-private first-order method due to [CLM+16]. To achieve our results, we use subsampling and geometric aggregation tools inspired by FriendlyCore [TCK+22] to speed up the "warm start" component of the [HSU24] algorithm, combined with a careful custom analysis of DP-SGD's sensitivity for the geometric median objective.  more » « less
Award ID(s):
2505865
PAR ID:
10631406
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
https://doi.org/10.48550/arXiv.2505.20189
Date Published:
ISSN:
2505.20189
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mulzer, Wolfgang; Phillips, Jeff M (Ed.)
    A fundamental question is whether one can maintain a maximum independent set (MIS) in polylogarithmic update time for a dynamic collection of geometric objects in Euclidean space. For a set of intervals, it is known that no dynamic algorithm can maintain an exact MIS in sublinear update time. Therefore, the typical objective is to explore the trade-off between update time and solution size. Substantial efforts have been made in recent years to understand this question for various families of geometric objects, such as intervals, hypercubes, hyperrectangles, and fat objects. We present the first fully dynamic approximation algorithm for disks of arbitrary radii in the plane that maintains a constant-factor approximate MIS in polylogarithmic expected amortized update time. Moreover, for a fully dynamic set of n unit disks in the plane, we show that a 12-approximate MIS can be maintained with worst-case update time O(log n), and optimal output-sensitive reporting. This result generalizes to fat objects of comparable sizes in any fixed dimension d, where the approximation ratio depends on the dimension and the fatness parameter. Further, we note that, even for a dynamic set of disks of unit radius in the plane, it is impossible to maintain O(1+ε)-approximate MIS in truly sublinear update time, under standard complexity assumptions. Our results build on two recent technical tools: (i) The MIX algorithm by Cardinal et al. (ESA 2021) that can smoothly transition from one independent set to another; hence it suffices to maintain a family of independent sets where the largest one is an O(1)-approximate MIS. (ii) A dynamic nearest/farthest neighbor data structure for disks by Kaplan et al. (DCG 2020) and Liu (SICOMP 2022), which generalizes the dynamic convex hull data structure by Chan (JACM 2010), and quickly yields a "replacement" disk (if any) when a disk in one of our independent sets is deleted. 
    more » « less
  2. Given a set of points $$P = (P^+ \sqcup P^-) \subset \mathbb{R}^d$$ for some constant $$d$$ and a supply function $$\mu:P\to \mathbb{R}$$ such that $$\mu(p) > 0~\forall p \in P^+$$, $$\mu(p) < 0~\forall p \in P^-$$, and $$\sum_{p\in P}{\mu(p)} = 0$$, the geometric transportation problem asks one to find a transportation map $$\tau: P^+\times P^-\to \mathbb{R}_{\ge 0}$$ such that $$\sum_{q\in P^-}{\tau(p, q)} = \mu(p)~\forall p \in P^+$$, $$\sum_{p\in P^+}{\tau(p, q)} = -\mu(q) \forall q \in P^-$$, and the weighted sum of Euclidean distances for the pairs $$\sum_{(p,q)\in P^+\times P^-}\tau(p, q)\cdot ||q-p||_2$$ is minimized. We present the first deterministic algorithm that computes, in near-linear time, a transportation map whose cost is within a $$(1 + \varepsilon)$$ factor of optimal. More precisely, our algorithm runs in $$O(n\varepsilon^{-(d+2)}\log^5{n}\log{\log{n}})$$ time for any constant $$\varepsilon > 0$$. While a randomized $$n\varepsilon^{-O(d)}\log^{O(d)}{n}$$ time algorithm for this problem was discovered in the last few years, all previously known deterministic $$(1 + \varepsilon)$$-approximation algorithms run in~$$\Omega(n^{3/2})$$ time. A similar situation existed for geometric bipartite matching, the special case of geometric transportation where all supplies are unit, until a deterministic $$n\varepsilon^{-O(d)}\log^{O(d)}{n}$$ time $$(1 + \varepsilon)$$-approximation algorithm was presented at STOC 2022. Surprisingly, our result is not only a generalization of the bipartite matching one to arbitrary instances of geometric transportation, but it also reduces the running time for all previously known $$(1 + \varepsilon)$$-approximation algorithms, randomized or deterministic, even for geometric bipartite matching. In particular, we give the first $$(1 + \varepsilon)$$-approximate deterministic algorithm for geometric bipartite matching and the first $$(1 + \varepsilon)$$-approximate deterministic or randomized algorithm for geometric transportation with no dependence on $$d$$ in the exponent of the running time's polylog. As an additional application of our main ideas, we also give the first randomized near-linear $$O(\varepsilon^{-2} m \log^{O(1)} n)$$ time $$(1 + \varepsilon)$$-approximation algorithm for the uncapacitated minimum cost flow (transshipment) problem in undirected graphs with arbitrary \emph{real} edge costs. 
    more » « less
  3. We study the problem of fair k-median where each cluster is required to have a fair representation of individuals from different groups. In the fair representation k-median problem, we are given a set of points X in a metric space. Each point x ∈ X belongs to one of ℓ groups. Further, we are given fair representation parameters αj and β_j for each group j ∈ [ℓ]. We say that a k-clustering C_1, ⋅⋅⋅, C_k fairly represents all groups if the number of points from group j in cluster C_i is between α_j |C_i| and β_j |C_i| for every j ∈ [ℓ] and i ∈ [k]. The goal is to find a set of k centers and an assignment such that the clustering defined by fairly represents all groups and minimizes the ℓ_1-objective ∑_{x ∈ X} d(x, ϕ(x)). We present an O(log k)-approximation algorithm that runs in time n^{O(ℓ)}. Note that the known algorithms for the problem either (i) violate the fairness constraints by an additive term or (ii) run in time that is exponential in both k and ℓ. We also consider an important special case of the problem where and for all j ∈ [ℓ]. For this special case, we present an O(log k)-approximation algorithm that runs in time. 
    more » « less
  4. Given a data set of size n in d'-dimensional Euclidean space, the k-means problem asks for a set of k points (called centers) such that the sum of the l_2^2-distances between the data points and the set of centers is minimized. Previous work on this problem in the local differential privacy setting shows how to achieve multiplicative approximation factors arbitrarily close to optimal, but suffers high additive error. The additive error has also been seen to be an issue in implementations of differentially private k-means clustering algorithms in both the central and local settings. In this work, we introduce a new locally private k-means clustering algorithm that achieves near-optimal additive error whilst retaining constant multiplicative approximation factors and round complexity. Concretely, given any c>sqrt(2), our algorithm achieves O(k^(1 + O(1/(2c^2-1))) * sqrt(d' n) * log d' * poly log n) additive error with an O(c^2) multiplicative approximation factor. 
    more » « less
  5. We provide improved differentially private algorithms for identity testing of high-dimensional distributions. Specifically, for d-dimensional Gaussian distributions with known covariance Σ, we can test whether the distribution comes from N(μ∗,Σ) for some fixed μ∗ or from some N(μ,Σ) with total variation distance at least α from N(μ∗,Σ) with (ε,0)-differential privacy, using only O~(d1/2α2+d1/3α4/3⋅ε2/3+1α⋅ε) samples if the algorithm is allowed to be computationally inefficient, and only O~(d1/2α2+d1/4α⋅ε) samples for a computationally efficient algorithm. We also provide a matching lower bound showing that our computationally inefficient algorithm has optimal sample complexity. We also extend our algorithms to various related problems, including mean testing of Gaussians with bounded but unknown covariance, uniformity testing of product distributions over {−1,1}d, and tolerant testing. Our results improve over the previous best work of Canonne et al.~\cite{CanonneKMUZ20} for both computationally efficient and inefficient algorithms, and even our computationally efficient algorithm matches the optimal \emph{non-private} sample complexity of O(d√α2) in many standard parameter settings. In addition, our results show that, surprisingly, private identity testing of d-dimensional Gaussians can be done with fewer samples than private identity testing of discrete distributions over a domain of size d \cite{AcharyaSZ18}, which refutes a conjectured lower bound of~\cite{CanonneKMUZ20}. 
    more » « less