skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, September 13 until 2:00 AM ET on Saturday, September 14 due to maintenance. We apologize for the inconvenience.


Title: Approximation Algorithms for Socially Fair Clustering
We present an $e^{O(p)} (\log \ell) / (\log \log \ell)$-approximation algorithm for socially fair clustering with the $\ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $\ell$ groups. The goal is to find a $k$-medians, $k$-means, or, more generally, $\ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $\sum_{u \text{ in group } j} d(u, C)^p$. The socially fair clustering problem was independently proposed by Abbasi, Bhaskara, and Venkatasubramanian (2021) and Ghadiri, Samadi, and Vempala (2021). Our algorithm improves and generalizes their $O(\ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $\Omega(\ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $\Theta((\log \ell) / (\log \log \ell))$ for a fixed p. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. (2021).  more » « less
Award ID(s):
1955173 1934843 1718820
NSF-PAR ID:
10336944
Author(s) / Creator(s):
;
Editor(s):
Belkin, Mikhail; Kpotufe, Samor
Date Published:
Journal Name:
Proceedings of the Conference on Learning Theory, PMLR
Volume:
134
Page Range / eLocation ID:
3246-3264
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the problem of fair k-median where each cluster is required to have a fair representation of individuals from different groups. In the fair representation k-median problem, we are given a set of points X in a metric space. Each point x ∈ X belongs to one of ℓ groups. Further, we are given fair representation parameters αj and β_j for each group j ∈ [ℓ]. We say that a k-clustering C_1, ⋅⋅⋅, C_k fairly represents all groups if the number of points from group j in cluster C_i is between α_j |C_i| and β_j |C_i| for every j ∈ [ℓ] and i ∈ [k]. The goal is to find a set of k centers and an assignment such that the clustering defined by fairly represents all groups and minimizes the ℓ_1-objective ∑_{x ∈ X} d(x, ϕ(x)). We present an O(log k)-approximation algorithm that runs in time n^{O(ℓ)}. Note that the known algorithms for the problem either (i) violate the fairness constraints by an additive term or (ii) run in time that is exponential in both k and ℓ. We also consider an important special case of the problem where and for all j ∈ [ℓ]. For this special case, we present an O(log k)-approximation algorithm that runs in time. 
    more » « less
  2. Megow, Nicole ; Smith, Adam (Ed.)
    In this paper, we study the weighted k-server problem on the uniform metric in both the offline and online settings. We start with the offline setting. In contrast to the (unweighted) k-server problem which has a polynomial-time solution using min-cost flows, there are strong computational lower bounds for the weighted k-server problem, even on the uniform metric. Specifically, we show that assuming the unique games conjecture, there are no polynomial-time algorithms with a sub-polynomial approximation factor, even if we use c-resource augmentation for c < 2. Furthermore, if we consider the natural LP relaxation of the problem, then obtaining a bounded integrality gap requires us to use at least 𝓁 resource augmentation, where 𝓁 is the number of distinct server weights. We complement these results by obtaining a constant-approximation algorithm via LP rounding, with a resource augmentation of (2+ε)𝓁 for any constant ε > 0. In the online setting, an exp(k) lower bound is known for the competitive ratio of any randomized algorithm for the weighted k-server problem on the uniform metric. In contrast, we show that 2𝓁-resource augmentation can bring the competitive ratio down by an exponential factor to only O(𝓁² log 𝓁). Our online algorithm uses the two-stage approach of first obtaining a fractional solution using the online primal-dual framework, and then rounding it online. 
    more » « less
  3. Meila, Marina ; Zhang, Tong (Ed.)
    In the Correlation Clustering problem, we are given a complete weighted graph $G$ with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering $\mathcal{C}$ of graph $G$, a similar edge is in disagreement with $\mathcal{C}$, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to the same cluster. The disagreements vector, $\mathbf{disagree}$, is a vector indexed by the vertices of $G$ such that the $v$-th coordinate $\mathbf{disagree}_v$ equals the weight of all disagreeing edges incident on $v$. The goal is to produce a clustering that minimizes the $\ell_p$ norm of the disagreements vector for $p\geq 1$. We study the $\ell_p$ objective in Correlation Clustering under the following assumption: Every similar edge has weight in $[\alpha\mathbf{w},\mathbf{w}]$ and every dissimilar edge has weight at least $\alpha\mathbf{w}$ (where $\alpha \leq 1$ and $\mathbf{w}>0$ is a scaling parameter). We give an $O\left((\frac{1}{\alpha})^{\frac{1}{2}-\frac{1}{2p}}\cdot \log\frac{1}{\alpha}\right)$ approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap. 
    more » « less
  4. null (Ed.)
    Directed Steiner Tree (DST) is a central problem in combinatorial optimization and theoretical computer science: Given a directed graph G = (V, E) with edge costs c ∈ ℝ_{≥ 0}^E, a root r ∈ V and k terminals K ⊆ V, we need to output a minimum-cost arborescence in G that contains an rrightarrow t path for every t ∈ K. Recently, Grandoni, Laekhanukit and Li, and independently Ghuge and Nagarajan, gave quasi-polynomial time O(log²k/log log k)-approximation algorithms for the problem, which are tight under popular complexity assumptions. In this paper, we consider the more general Degree-Bounded Directed Steiner Tree (DB-DST) problem, where we are additionally given a degree bound d_v on each vertex v ∈ V, and we require that every vertex v in the output tree has at most d_v children. We give a quasi-polynomial time (O(log n log k), O(log² n))-bicriteria approximation: The algorithm produces a solution with cost at most O(log nlog k) times the cost of the optimum solution that violates the degree constraints by at most a factor of O(log²n). This is the first non-trivial result for the problem. While our cost-guarantee is nearly optimal, the degree violation factor of O(log²n) is an O(log n)-factor away from the approximation lower bound of Ω(log n) from the Set Cover hardness. The hardness result holds even on the special case of the Degree-Bounded Group Steiner Tree problem on trees (DB-GST-T). With the hope of closing the gap, we study the question of whether the degree violation factor can be made tight for this special case. We answer the question in the affirmative by giving an (O(log nlog k), O(log n))-bicriteria approximation algorithm for DB-GST-T. 
    more » « less
  5. Del Pia, Alberto ; Kaibel, Volker (Ed.)
    A long-standing conjecture for the traveling salesman problem (TSP) states that the integrality gap of the standard linear programming relaxation of the TSP (sometimes called the Subtour LP or the Held-Karp bound) is at most 4/3 for symmetric instances of the TSP obeying the triangle inequality. In this paper we consider the half-integral case, in which a feasible solution to the LP has solution values in {0,1/2,1} . Karlin, Klein, and Oveis Gharan [9], in a breakthrough result, were able to show that in the half-integral case, the integrality gap is at most 1.49993; Gupta et al. [6] showed a slight improvement of this result to 1.4983. Both of these papers consider a hierarchy of critical tight sets in the support graph of the LP solution, in which some of the sets correspond to cycle cuts and the others to degree cuts. Here we show that if all the sets in the hierarchy correspond to cycle cuts, then we can find a distribution of tours whose expected cost is at most 4/3 times the value of the half-integral LP solution; sampling from the distribution gives us a randomized 4/3-approximation algorithm. We note that known bad cases for the integrality gap have a gap of 4/3 and have a half-integral LP solution in which all the critical tight sets in the hierarchy are cycle cuts; thus our result is tight. 
    more » « less