We study the problem of fair k-median where each cluster is required to have a fair representation of individuals from different groups. In the fair representation k-median problem, we are given a set of points X in a metric space. Each point x ∈ X belongs to one of ℓ groups. Further, we are given fair representation parameters αj and β_j for each group j ∈ [ℓ]. We say that a k-clustering C_1, ⋅⋅⋅, C_k fairly represents all groups if the number of points from group j in cluster C_i is between α_j |C_i| and β_j |C_i| for every j ∈ [ℓ] and i ∈ [k]. The goal is to find a set of k centers and an assignment such that the clustering defined by fairly represents all groups and minimizes the ℓ_1-objective ∑_{x ∈ X} d(x, ϕ(x)).
We present an O(log k)-approximation algorithm that runs in time n^{O(ℓ)}. Note that the known algorithms for the problem either (i) violate the fairness constraints by an additive term or (ii) run in time that is exponential in both k and ℓ. We also consider an important special case of the problem where and for all j ∈ [ℓ]. For this special case, we present an O(log k)-approximation algorithm that runs in time.
more »
« less
Approximation Algorithms for Socially Fair Clustering
We present an $e^{O(p)} (\log \ell) / (\log \log \ell)$-approximation algorithm for socially fair clustering with the $\ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $\ell$ groups. The goal is to find a $k$-medians, $k$-means, or, more generally, $\ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $\sum_{u \text{ in group } j} d(u, C)^p$. The socially fair clustering problem was independently proposed by Abbasi, Bhaskara, and Venkatasubramanian (2021) and Ghadiri, Samadi, and Vempala (2021). Our algorithm improves and generalizes their $O(\ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $\Omega(\ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $\Theta((\log \ell) / (\log \log \ell))$ for a fixed p. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. (2021).
more »
« less
- NSF-PAR ID:
- 10336944
- Editor(s):
- Belkin, Mikhail; Kpotufe, Samor
- Date Published:
- Journal Name:
- Proceedings of the Conference on Learning Theory, PMLR
- Volume:
- 134
- Page Range / eLocation ID:
- 3246-3264
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Chan, Timothy ; Fischer, Johannes ; Iacono, John ; Herman, Grzegorz (Ed.)We consider two-cost network design models in which edges of the input graph have an associated cost and length. We build upon recent advances in hop-constrained oblivious routing to obtain two sets of results. We address multicommodity buy-at-bulk network design in the nonuniform setting. Existing poly-logarithmic approximations are based on the junction tree approach [Chekuri et al., 2010; Guy Kortsarz and Zeev Nutov, 2011]. We obtain a new polylogarithmic approximation via a natural LP relaxation. This establishes an upper bound on its integrality gap and affirmatively answers an open question raised in [Chekuri et al., 2010]. The rounding is based on recent results in hop-constrained oblivious routing [Ghaffari et al., 2021], and this technique yields a polylogarithmic approximation in more general settings such as set connectivity. Our algorithm for buy-at-bulk network design is based on an LP-based reduction to h-hop constrained network design for which we obtain LP-based bicriteria approximation algorithms. We also consider a fault-tolerant version of h-hop constrained network design where one wants to design a low-cost network to guarantee short paths between a given set of source-sink pairs even when k-1 edges can fail. This model has been considered in network design [Luis Gouveia and Markus Leitner, 2017; Gouveia et al., 2018; Arslan et al., 2020] but no approximation algorithms were known. We obtain polylogarithmic bicriteria approximation algorithms for the single-source setting for any fixed k. We build upon the single-source algorithm and the junction-tree approach to obtain an approximation algorithm for the multicommodity setting when at most one edge can fail.more » « less
-
Megow, Nicole ; Smith, Adam (Ed.)In this paper, we study the weighted k-server problem on the uniform metric in both the offline and online settings. We start with the offline setting. In contrast to the (unweighted) k-server problem which has a polynomial-time solution using min-cost flows, there are strong computational lower bounds for the weighted k-server problem, even on the uniform metric. Specifically, we show that assuming the unique games conjecture, there are no polynomial-time algorithms with a sub-polynomial approximation factor, even if we use c-resource augmentation for c < 2. Furthermore, if we consider the natural LP relaxation of the problem, then obtaining a bounded integrality gap requires us to use at least 𝓁 resource augmentation, where 𝓁 is the number of distinct server weights. We complement these results by obtaining a constant-approximation algorithm via LP rounding, with a resource augmentation of (2+ε)𝓁 for any constant ε > 0. In the online setting, an exp(k) lower bound is known for the competitive ratio of any randomized algorithm for the weighted k-server problem on the uniform metric. In contrast, we show that 2𝓁-resource augmentation can bring the competitive ratio down by an exponential factor to only O(𝓁² log 𝓁). Our online algorithm uses the two-stage approach of first obtaining a fractional solution using the online primal-dual framework, and then rounding it online.more » « less
-
Meila, Marina ; Zhang, Tong (Ed.)In the Correlation Clustering problem, we are given a complete weighted graph $G$ with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering $\mathcal{C}$ of graph $G$, a similar edge is in disagreement with $\mathcal{C}$, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to the same cluster. The disagreements vector, $\mathbf{disagree}$, is a vector indexed by the vertices of $G$ such that the $v$-th coordinate $\mathbf{disagree}_v$ equals the weight of all disagreeing edges incident on $v$. The goal is to produce a clustering that minimizes the $\ell_p$ norm of the disagreements vector for $p\geq 1$. We study the $\ell_p$ objective in Correlation Clustering under the following assumption: Every similar edge has weight in $[\alpha\mathbf{w},\mathbf{w}]$ and every dissimilar edge has weight at least $\alpha\mathbf{w}$ (where $\alpha \leq 1$ and $\mathbf{w}>0$ is a scaling parameter). We give an $O\left((\frac{1}{\alpha})^{\frac{1}{2}-\frac{1}{2p}}\cdot \log\frac{1}{\alpha}\right)$ approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.more » « less
-
null (Ed.)Directed Steiner Tree (DST) is a central problem in combinatorial optimization and theoretical computer science: Given a directed graph G = (V, E) with edge costs c ∈ ℝ_{≥ 0}^E, a root r ∈ V and k terminals K ⊆ V, we need to output a minimum-cost arborescence in G that contains an rrightarrow t path for every t ∈ K. Recently, Grandoni, Laekhanukit and Li, and independently Ghuge and Nagarajan, gave quasi-polynomial time O(log²k/log log k)-approximation algorithms for the problem, which are tight under popular complexity assumptions. In this paper, we consider the more general Degree-Bounded Directed Steiner Tree (DB-DST) problem, where we are additionally given a degree bound d_v on each vertex v ∈ V, and we require that every vertex v in the output tree has at most d_v children. We give a quasi-polynomial time (O(log n log k), O(log² n))-bicriteria approximation: The algorithm produces a solution with cost at most O(log nlog k) times the cost of the optimum solution that violates the degree constraints by at most a factor of O(log²n). This is the first non-trivial result for the problem. While our cost-guarantee is nearly optimal, the degree violation factor of O(log²n) is an O(log n)-factor away from the approximation lower bound of Ω(log n) from the Set Cover hardness. The hardness result holds even on the special case of the Degree-Bounded Group Steiner Tree problem on trees (DB-GST-T). With the hope of closing the gap, we study the question of whether the degree violation factor can be made tight for this special case. We answer the question in the affirmative by giving an (O(log nlog k), O(log n))-bicriteria approximation algorithm for DB-GST-T.more » « less