skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Approximation Algorithms for Socially Fair Clustering
We present an $$e^{O(p)} (\log \ell) / (\log \log \ell)$$-approximation algorithm for socially fair clustering with the $$\ell_p$$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $$\ell$$ groups. The goal is to find a $$k$$-medians, $$k$$-means, or, more generally, $$\ell_p$$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $$k$$ centers $$C$$ so as to minimize the maximum over all groups $$j$$ of $$\sum_{u \text{ in group } j} d(u, C)^p$$. The socially fair clustering problem was independently proposed by Abbasi, Bhaskara, and Venkatasubramanian (2021) and Ghadiri, Samadi, and Vempala (2021). Our algorithm improves and generalizes their $$O(\ell)$$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $$\Omega(\ell)$$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $$\Theta((\log \ell) / (\log \log \ell))$$ for a fixed p. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. (2021).  more » « less
Award ID(s):
1955173 1934843 1718820
PAR ID:
10336944
Author(s) / Creator(s):
;
Editor(s):
Belkin, Mikhail; Kpotufe, Samor
Date Published:
Journal Name:
Proceedings of the Conference on Learning Theory, PMLR
Volume:
134
Page Range / eLocation ID:
3246-3264
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the problem of fair k-median where each cluster is required to have a fair representation of individuals from different groups. In the fair representation k-median problem, we are given a set of points X in a metric space. Each point x ∈ X belongs to one of ℓ groups. Further, we are given fair representation parameters αj and β_j for each group j ∈ [ℓ]. We say that a k-clustering C_1, ⋅⋅⋅, C_k fairly represents all groups if the number of points from group j in cluster C_i is between α_j |C_i| and β_j |C_i| for every j ∈ [ℓ] and i ∈ [k]. The goal is to find a set of k centers and an assignment such that the clustering defined by fairly represents all groups and minimizes the ℓ_1-objective ∑_{x ∈ X} d(x, ϕ(x)). We present an O(log k)-approximation algorithm that runs in time n^{O(ℓ)}. Note that the known algorithms for the problem either (i) violate the fairness constraints by an additive term or (ii) run in time that is exponential in both k and ℓ. We also consider an important special case of the problem where and for all j ∈ [ℓ]. For this special case, we present an O(log k)-approximation algorithm that runs in time. 
    more » « less
  2. Meila, Marina; Zhang, Tong (Ed.)
    In the Correlation Clustering problem, we are given a complete weighted graph $$G$$ with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering $$\mathcal{C}$$ of graph $$G$$, a similar edge is in disagreement with $$\mathcal{C}$$, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with $$\mathcal{C}$$ if its endpoints belong to the same cluster. The disagreements vector, $$\mathbf{disagree}$$, is a vector indexed by the vertices of $$G$$ such that the $$v$$-th coordinate $$\mathbf{disagree}_v$$ equals the weight of all disagreeing edges incident on $$v$$. The goal is to produce a clustering that minimizes the $$\ell_p$$ norm of the disagreements vector for $$p\geq 1$$. We study the $$\ell_p$$ objective in Correlation Clustering under the following assumption: Every similar edge has weight in $$[\alpha\mathbf{w},\mathbf{w}]$$ and every dissimilar edge has weight at least $$\alpha\mathbf{w}$$ (where $$\alpha \leq 1$$ and $$\mathbf{w}>0$$ is a scaling parameter). We give an $$O\left((\frac{1}{\alpha})^{\frac{1}{2}-\frac{1}{2p}}\cdot \log\frac{1}{\alpha}\right)$$ approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap. 
    more » « less
  3. Megow, Nicole; Smith, Adam (Ed.)
    In this paper, we study the weighted k-server problem on the uniform metric in both the offline and online settings. We start with the offline setting. In contrast to the (unweighted) k-server problem which has a polynomial-time solution using min-cost flows, there are strong computational lower bounds for the weighted k-server problem, even on the uniform metric. Specifically, we show that assuming the unique games conjecture, there are no polynomial-time algorithms with a sub-polynomial approximation factor, even if we use c-resource augmentation for c < 2. Furthermore, if we consider the natural LP relaxation of the problem, then obtaining a bounded integrality gap requires us to use at least 𝓁 resource augmentation, where 𝓁 is the number of distinct server weights. We complement these results by obtaining a constant-approximation algorithm via LP rounding, with a resource augmentation of (2+ε)𝓁 for any constant ε > 0. In the online setting, an exp(k) lower bound is known for the competitive ratio of any randomized algorithm for the weighted k-server problem on the uniform metric. In contrast, we show that 2𝓁-resource augmentation can bring the competitive ratio down by an exponential factor to only O(𝓁² log 𝓁). Our online algorithm uses the two-stage approach of first obtaining a fractional solution using the online primal-dual framework, and then rounding it online. 
    more » « less
  4. Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)
    We consider two-cost network design models in which edges of the input graph have an associated cost and length. We build upon recent advances in hop-constrained oblivious routing to obtain two sets of results. We address multicommodity buy-at-bulk network design in the nonuniform setting. Existing poly-logarithmic approximations are based on the junction tree approach [Chekuri et al., 2010; Guy Kortsarz and Zeev Nutov, 2011]. We obtain a new polylogarithmic approximation via a natural LP relaxation. This establishes an upper bound on its integrality gap and affirmatively answers an open question raised in [Chekuri et al., 2010]. The rounding is based on recent results in hop-constrained oblivious routing [Ghaffari et al., 2021], and this technique yields a polylogarithmic approximation in more general settings such as set connectivity. Our algorithm for buy-at-bulk network design is based on an LP-based reduction to h-hop constrained network design for which we obtain LP-based bicriteria approximation algorithms. We also consider a fault-tolerant version of h-hop constrained network design where one wants to design a low-cost network to guarantee short paths between a given set of source-sink pairs even when k-1 edges can fail. This model has been considered in network design [Luis Gouveia and Markus Leitner, 2017; Gouveia et al., 2018; Arslan et al., 2020] but no approximation algorithms were known. We obtain polylogarithmic bicriteria approximation algorithms for the single-source setting for any fixed k. We build upon the single-source algorithm and the junction-tree approach to obtain an approximation algorithm for the multicommodity setting when at most one edge can fail. 
    more » « less
  5. Chan, Timothy; Fischer, Johannes; Iacono, John; Herman, Grzegorz (Ed.)
    In the Directed Steiner Tree (DST) problem the input is a directed edge-weighted graph G = (V,E), a root vertex r and a set S ⊆ V of k terminals. The goal is to find a min-cost subgraph that connects r to each of the terminals. DST admits an O(log² k/log log k)-approximation in quasi-polynomial time [Grandoni et al., 2022; Rohan Ghuge and Viswanath Nagarajan, 2022], and an O(k^{ε})-approximation for any fixed ε > 0 in polynomial-time [Alexander Zelikovsky, 1997; Moses Charikar et al., 1999]. Resolving the existence of a polynomial-time poly-logarithmic approximation is a major open problem in approximation algorithms. In a recent work, Friggstad and Mousavi [Zachary Friggstad and Ramin Mousavi, 2023] obtained a simple and elegant polynomial-time O(log k)-approximation for DST in planar digraphs via Thorup’s shortest path separator theorem [Thorup, 2004]. We build on their work and obtain several new results on DST and related problems. - We develop a tree embedding technique for rooted problems in planar digraphs via an interpretation of the recursion in [Zachary Friggstad and Ramin Mousavi, 2023]. Using this we obtain polynomial-time poly-logarithmic approximations for Group Steiner Tree [Naveen Garg et al., 2000], Covering Steiner Tree [Goran Konjevod et al., 2002] and the Polymatroid Steiner Tree [Gruia Călinescu and Alexander Zelikovsky, 2005] problems in planar digraphs. All these problems are hard to approximate to within a factor of Ω(log² n/log log n) even in trees [Eran Halperin and Robert Krauthgamer, 2003; Grandoni et al., 2022]. - We prove that the natural cut-based LP relaxation for DST has an integrality gap of O(log² k) in planar digraphs. This is in contrast to general graphs where the integrality gap of this LP is known to be Ω(√k) [Leonid Zosin and Samir Khuller, 2002] and Ω(n^{δ}) for some fixed δ > 0 [Shi Li and Bundit Laekhanukit, 2022]. - We combine the preceding results with density based arguments to obtain poly-logarithmic approximations for the multi-rooted versions of the problems in planar digraphs. For DST our result improves the O(R + log k) approximation of [Zachary Friggstad and Ramin Mousavi, 2023] when R = ω(log² k). 
    more » « less