Title: Fair Representation Clustering with Several Protected Classes
We study the problem of fair k-median, where each cluster is required to have a fair representation of individuals from different groups. In the fair representation k-median problem, we are given a set of points X in a metric space. Each point x ∈ X belongs to one of ℓ groups. Further, we are given fair representation parameters α_j and β_j for each group j ∈ [ℓ]. We say that a k-clustering C_1, …, C_k fairly represents all groups if the number of points from group j in cluster C_i is between α_j |C_i| and β_j |C_i| for every j ∈ [ℓ] and i ∈ [k]. The goal is to find a set of k centers and an assignment ϕ such that the clustering defined by ϕ fairly represents all groups and minimizes the ℓ_1-objective ∑_{x ∈ X} d(x, ϕ(x)). We present an O(log k)-approximation algorithm that runs in time n^{O(ℓ)}. Note that the known algorithms for the problem either (i) violate the fairness constraints by an additive term or (ii) run in time that is exponential in both k and ℓ. We also consider an important special case of the problem where and for all j ∈ [ℓ]. For this special case, we present an O(log k)-approximation algorithm that runs in time.
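The fairness constraint and the ℓ_1-objective above can be made concrete in a short sketch. The helper names (`fairly_represents`, `l1_cost`) and the dictionary encoding of groups and assignments are illustrative choices, not from the paper:

```python
from collections import Counter

def fairly_represents(clusters, group, alpha, beta):
    """Check the fair-representation constraint: for every cluster C_i and
    every group j, the number of group-j points in C_i must lie in
    [alpha[j] * |C_i|, beta[j] * |C_i|]."""
    for C in clusters:                      # each cluster is a list of points
        counts = Counter(group[x] for x in C)
        for j in alpha:                     # groups are the keys of alpha/beta
            if not alpha[j] * len(C) <= counts.get(j, 0) <= beta[j] * len(C):
                return False
    return True

def l1_cost(points, assignment, d):
    """The l1-objective: sum over points x of d(x, phi(x)),
    where assignment[x] is the center phi(x) serving x."""
    return sum(d(x, assignment[x]) for x in points)
```

For example, on the line with two groups and α_j = β_j = 1/2, a clustering that pairs one point from each group in every cluster is fair, while a clustering that puts two same-group points together is not.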
Award ID(s):
1718820 1934843 1955173
PAR ID:
10392770
Date Published:
Journal Name:
2022 ACM Conference on Fairness, Accountability, and Transparency
Page Range / eLocation ID:
814 - 823
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Belkin, Mikhail; Kpotufe, Samory (Ed.)
    We present an $$e^{O(p)} (\log \ell) / (\log \log \ell)$$-approximation algorithm for socially fair clustering with the $$\ell_p$$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $$\ell$$ groups. The goal is to find a $$k$$-medians, $$k$$-means, or, more generally, $$\ell_p$$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $$k$$ centers $$C$$ so as to minimize the maximum over all groups $$j$$ of $$\sum_{u \text{ in group } j} d(u, C)^p$$. The socially fair clustering problem was independently proposed by Abbasi, Bhaskara, and Venkatasubramanian (2021) and Ghadiri, Samadi, and Vempala (2021). Our algorithm improves and generalizes their $$O(\ell)$$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $$\Omega(\ell)$$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $$\Theta((\log \ell) / (\log \log \ell))$$ for a fixed p. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. (2021). 
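The socially fair objective described above — the maximum over groups of the group's ℓ_p clustering cost — can be computed directly. This is a minimal sketch of the objective only, not of the paper's LP-based algorithm; the function name and dictionary encoding are illustrative:

```python
def socially_fair_cost(points, group_of, centers, d, p):
    """Objective of socially fair clustering: max over groups j of
    sum_{u in group j} d(u, C)^p, where d(u, C) = min_{c in C} d(u, c)."""
    per_group = {}
    for u in points:
        contrib = min(d(u, c) for c in centers) ** p
        per_group[group_of[u]] = per_group.get(group_of[u], 0) + contrib
    return max(per_group.values())
```

Note how a small group with one far-away point can dominate the objective, which is exactly why minimizing the maximum over groups differs from the usual sum objective.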
  2. This article presents universal algorithms for clustering problems, including the widely studied k-median, k-means, and k-center objectives. The input is a metric space containing all potential client locations. The algorithm must select k cluster centers such that they are a good solution for any subset of clients that actually realizes. Specifically, we aim for low regret, defined as the maximum over all subsets of the difference between the cost of the algorithm's solution and that of an optimal solution. A universal algorithm's solution sol for a clustering problem is said to be an (α, β)-approximation if for all subsets of clients C′, it satisfies sol(C′) ≤ α · opt(C′) + β · mr, where opt(C′) is the cost of the optimal solution for clients C′ and mr is the minimum regret achievable by any solution. Our main results are universal algorithms for the standard clustering objectives of k-median, k-means, and k-center that achieve (O(1), O(1))-approximations. These results are obtained via a novel framework for universal algorithms using linear programming (LP) relaxations. These results generalize to other ℓ_p-objectives and the setting where some subset of the clients are fixed. We also give hardness results showing that (α, β)-approximation is NP-hard if α or β is at most a certain constant, even for the widely studied special case of Euclidean metric spaces. This shows that, in some sense, (O(1), O(1))-approximation is the strongest type of guarantee obtainable for universal clustering.
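The regret quantities in this definition can be computed by brute force on tiny instances, which makes the definition concrete. This toy sketch restricts centers to the given candidate locations and enumerates all client subsets; it illustrates the definitions of max-regret and mr only, not the article's LP framework:

```python
from itertools import combinations

def cost(centers, clients, d):
    """k-median cost: each realized client pays distance to nearest center."""
    return sum(min(d(x, c) for c in centers) for x in clients)

def max_regret(centers, locations, k, d):
    """Max over nonempty client subsets C of cost(centers, C) - opt(C)."""
    worst = 0
    for r in range(1, len(locations) + 1):
        for C in combinations(locations, r):
            opt = min(cost(S, C, d) for S in combinations(locations, k))
            worst = max(worst, cost(centers, C, d) - opt)
    return worst

def minimum_regret(locations, k, d):
    """mr: the smallest achievable max-regret over all size-k center sets."""
    return min(max_regret(S, locations, k, d)
               for S in combinations(locations, k))
```

On the line with locations {0, 1, 10} and k = 1, placing the center at 1 gives max-regret 9 (the worst subset is the lone client at 10), and no single center does better, so mr = 9.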
  3. Bojańczyk, Mikołaj; Merelli, Emanuela; Woodruff, David P (Ed.)
    Given n points in 𝓁_p^d, we consider the problem of partitioning points into k clusters with associated centers. The cost of a clustering is the sum of p-th powers of distances of points to their cluster centers. For p ∈ [1,2], we design sketches of size poly(log(nd),k,1/ε) such that the cost of the optimal clustering can be estimated to within factor 1+ε, despite the fact that the compressed representation does not contain enough information to recover the cluster centers or the partition into clusters. This leads to a streaming algorithm for estimating the clustering cost with space poly(log(nd),k,1/ε). We also obtain a distributed memory algorithm, where the n points are arbitrarily partitioned amongst m machines, each of which sends information to a central party who then computes an approximation of the clustering cost. Prior to this work, no such streaming or distributed-memory algorithm was known with sublinear dependence on d for p ∈ [1,2). 
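The quantity being sketched here is the ℓ_p^p clustering cost. A direct, uncompressed computation of that cost looks as follows; the compression to poly(log(nd), k, 1/ε) space is the paper's contribution and is not reproduced in this illustrative snippet:

```python
def lp_clustering_cost(points, centers, d, p):
    """Cost of a clustering: the sum of p-th powers of distances of
    points to their nearest cluster center."""
    return sum(min(d(x, c) for c in centers) ** p for x in points)
```

The streaming result estimates this value to within a 1 + ε factor without storing enough information to recover the centers or the partition itself.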
  4. Clustering plays a crucial role in computer science, facilitating data analysis and problem-solving across numerous fields. By partitioning large datasets into meaningful groups, clustering reveals hidden structures and relationships within the data, aiding tasks such as unsupervised learning, classification, anomaly detection, and recommendation systems. Particularly in relational databases, where data is distributed across multiple tables, efficient clustering is essential yet challenging due to the computational complexity of joining tables. This paper addresses this challenge by introducing efficient algorithms for k-median and k-means clustering on relational data without the need for pre-computing the join query results. For relational k-median clustering, we propose the first efficient relative approximation algorithm. For relational k-means clustering, our algorithm significantly improves both the approximation factor and the running time of the known relational k-means clustering algorithms, which suffer either from large constant approximation factors or expensive running times. Given a join query q and a database instance D of O(N) tuples, for both k-median and k-means clustering on the results of q on D, we propose randomized (1+ε)γ-approximation algorithms that run in roughly O(k² N^{fhw}) + T_γ(k²) time, where ε ∈ (0,1) is a constant parameter decided by the user, fhw is the fractional hypertree width of q, while γ and T_γ(x) represent the approximation factor and the running time, respectively, of a traditional clustering algorithm in the standard computational setting over x points.
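The naive baseline these relational algorithms avoid is materializing the join and then clustering its results, which can blow up to N^{fhw} tuples. A minimal sketch of that baseline for two tables and k = 1 (the function names and tuple encoding are illustrative assumptions, not from the paper):

```python
def natural_join(R, S):
    """Naive baseline: materialize the join of R(key, a) and S(key, b)
    on the shared key. Relational clustering algorithms avoid exactly
    this materialization."""
    return [(a, b) for (k1, a) in R for (k2, b) in S if k1 == k2]

def one_median_cost(points, center, d):
    """k-median cost with k = 1, computed over the materialized join results."""
    return sum(d(x, center) for x in points)
```

Even this two-table example shows the issue: a key shared by many tuples on both sides multiplies out in the join, so clustering without materialization is what makes the relational setting nontrivial.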
  5. Clustering is a fundamental problem in unsupervised machine learning, and due to its numerous societal implications, fair variants of it have recently received significant attention. In this work we introduce a novel definition of individual fairness for clustering problems. Specifically, in our model, each point j has a set of other points S(j) that it perceives as similar to itself, and it feels that it is being fairly treated if the quality of service it receives in the solution is α-close (in a multiplicative sense, for some given α ≥ 1) to that of the points in S(j). We begin our study by answering questions regarding the combinatorial structure of the problem, namely for what values of α the problem is well-defined, and what the behavior of the Price of Fairness (PoF) for it is. For the well-defined region of α, we provide efficient and easily implementable approximation algorithms for the k-center objective, which in certain cases also enjoy bounded-PoF guarantees. We finally complement our analysis with an extensive suite of experiments that validate the effectiveness of our theoretical results.
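One plausible reading of the α-closeness condition, under the assumption that "quality of service" means a point's distance to its assigned center, is the following check. Both the interpretation and the encoding (a `similar` map for S(j), a `service` map for per-point cost) are illustrative assumptions:

```python
def individually_fair(points, similar, service, alpha):
    """A solution is individually fair if every point j's service cost is
    alpha-close (multiplicatively, alpha >= 1) to that of each point it
    perceives as similar: service[j] <= alpha * service[i] for i in S(j)."""
    return all(service[j] <= alpha * service[i]
               for j in points for i in similar[j])
```

Under this reading, larger α relaxes the constraint, which matches the paper's question of identifying the values of α for which the problem is well-defined.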