<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>The Fairness-Quality Tradeoff in Clustering</title></titleStmt>
			<publicationStmt>
				<publisher>Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024</publisher>
				<date>12/10/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10574568</idno>
					<idno type="doi"></idno>
					
					<author>Rashida Hakim</author><author>Ana-Andreea Stoica</author><author>Christos H Papadimitriou</author><author>Mihalis Yannakakis</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Fairness in clustering has been considered extensively in the past; however, the trade-off between the two objectives -e.g., can we sacrifice just a little in the quality of the clustering to significantly increase fairness, or vice-versa? -has rarely been addressed. We introduce novel algorithms for tracing the complete trade-off curve, or Pareto front, between quality and fairness in clustering problems; that is, computing all clusterings that are not dominated in both objectives by other clusterings. Unlike previous work that deals with specific objectives for quality and fairness, we deal with all objectives for fairness and quality in two general classes encompassing most of the special cases addressed in previous work. Our algorithm must take exponential time in the worst case as the Pareto front itself can be exponential. Even when the Pareto front is polynomial, our algorithm may take exponential time, and we prove that this is inevitable unless P = NP. However, we also present a new polynomial-time algorithm for computing the entire Pareto front when the cluster centers are fixed, and for a fairness objective that minimizes the sum, over all clusters, of the imbalance between the two groups in each cluster.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Clustering is a fundamental problem in unsupervised learning with many applications, in which data points must be grouped into clusters so that the quality or cost of the clustering is optimized -where the cost can be k-median or k-center, among a long array of different quality objectives treated in the literature. In some of these applications, the data points being clustered are people -for example, when households are clustered to determine the location of a bus stop or a hospital -and in those cases considerations such as fairness and equity come into play. In recent work, various frameworks and methods have been proposed for improving the representation of different sensitive groups in clustering, in cases where optimizing for clustering cost alone may be quite unfair. In such works, the fairness criterion is often set as a constraint, for which we must compute the optimal solution. This is generally an intractable computational problem even for the simplest fairness objectives, since clustering on its own is NP-hard. The best one could hope for is an approximate solution based on a vanilla approximation algorithm for clustering that improves fairness; such approaches have indeed been recently proposed, e.g., Esmaeili et al. <ref type="bibr">[26]</ref>, see Chhabra et al. <ref type="bibr">[18]</ref> for a survey.</p><p>We take a more general approach to fair clustering: We aim to recover the entire Pareto front of clustering cost and fairness, rather than a single point of that front (such as the best-quality clustering under a fairness constraint). Our work aims to enable a practitioner to choose any trade-off point, rather than computing one of them. We give general sufficient conditions for which we can recover the Pareto front, up to an approximation inherited from the clustering problem itself, and encompassing a variety of quality and fairness objectives used in the literature.</p><p>38th Conference on Neural Information Processing Systems (NeurIPS 2024).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Our Contributions</head><p>We present a novel family of algorithms for tracing the Pareto front between a quality/cost objective and a fairness objective, where each objective can belong in a general class that encompasses most of the ones previously proposed in the literature. Our algorithm requires that the fairness objective satisfy an intuitive property: that it is pattern-based. Informally, a fairness objective is pattern-based if it is solely a function of the number of nodes of different attributes in each cluster (see Definition 2.1). Many fairness objectives used in the literature are pattern-based, from the well-known balance <ref type="bibr">[19]</ref> to objectives that measure the proportional violation of pre-specified upper and lower bounds for a sensitive attribute within a cluster <ref type="bibr">[10,</ref><ref type="bibr">40,</ref><ref type="bibr">2,</ref><ref type="bibr">9,</ref><ref type="bibr">26,</ref><ref type="bibr">32,</ref><ref type="bibr">22]</ref>. For the quality objective, we consider metric-based cost functions. We study the computation of the quality/fairness trade-off in two settings: in the assignment problem, the centers of the clusters are given and the cost is the sum of the distances of the data points from their cluster center; whereas in the clustering problem, determining the centers is a part of the problem. We show that our algorithm finds the exact Pareto front of the assignment problem, and an approximation of the Pareto front of the clustering problem (Theorems 3.4, 3.5), with an approximation ratio that depends on the baseline approximation ratio (denoted &#179;) for the minimum cost clustering problem without a fairness objective. The running time is O(kn l(k-1) ), where k is the number of clusters, l is the number of sensitive attribute values, and n is the size of the dataset.</p><p>Our Pareto front algorithms take time exponential in &#8467; and k, and this is necessary. One reason is that the Pareto front may be exponential in size. For many fairness objectives the Pareto front is provably polynomial in size, but even in those cases exponential time is still necessary in general because the baseline problem is NP-hard. Importantly, we also present the first nontrivial algorithmic result for the quality/fairness Pareto front problem in clustering: A polynomial-time algorithm for computing the Pareto front for a new fairness objective, namely the one minimizing the sum (or the maximum), over all clusters, of the deviations from parity of the two protected attributes in each cluster (see <ref type="bibr">Theorem 3.6)</ref>.</p><p>We empirically explore faster methods for computing an approximate Pareto front by adapting constraint optimization methods from fair clustering <ref type="bibr">[26]</ref> in Section 4. These experiments explore the trade-off between accuracy and speed for our problem by comparing the tighter approximation obtained through our proposed algorithms with the looser Pareto front approximation obtained using faster methods.</p><p>We believe that our work is the first to address and carry out the computation of the entire Pareto front, and also the first to simultaneously cover a large range of quality and fairness criteria. In comparison, extant literature on fair clustering typically optimizes one objective subject to a constraint on the other objective, with an individualized approach for each combination. Our work is particularly applicable in cases where a decision-maker may be willing to be suboptimal in one objective in order to achieve a better value in a second objective, but does not know a priori what bounds to set on either objective. Our algorithms allow them to explore the entire trade-off, from the highest quality clustering all the way down to the fairest clustering. Finally, our polynomial-time algorithm for computing the Pareto front when the fairness objective is the sum or maximum of imbalances (see <ref type="bibr">Theorem 3.6</ref>) is another novel contribution to the subject of fair clustering.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Related Work</head><p>Our work is inspired by a plethora of recent studies on fair clustering that propose various metrics for improving the representation of different sensitive attributes within clusters. Unsupervised clustering is known to be NP-hard even in simple settings <ref type="bibr">[4]</ref>. A variety of algorithms have been proposed to approximate the best possible clustering, with approximation ratios differing for different cost objectives (e.g., k-means <ref type="bibr">[3]</ref>, k-median <ref type="bibr">[14]</ref>, k-center <ref type="bibr">[38]</ref>). Fair clustering goals range from maximizing the worst ratio between two groups across clusters <ref type="bibr">[19]</ref>, to minimizing the discrepancy between cluster representation and proportional representation of a group in a population <ref type="bibr">[2,</ref><ref type="bibr">9,</ref><ref type="bibr">10,</ref><ref type="bibr">22,</ref><ref type="bibr">32,</ref><ref type="bibr">26,</ref><ref type="bibr">40]</ref>, and to equalizing the clustering cost for various groups <ref type="bibr">[7,</ref><ref type="bibr">15,</ref><ref type="bibr">31,</ref><ref type="bibr">50,</ref><ref type="bibr">54,</ref><ref type="bibr">62]</ref>. Many of these works focus on building specialized approaches for each of the fairness objectives defined, and their objective is to find the best quality clustering that satisfies a specified fairness constraint. Optimizing even for simplest fairness objectives can be NP-hard <ref type="bibr">[26]</ref>, with approximation guarantees involving a multiplicative and additive factor, which may depend on the particular objective form and data topology. Our work differs in two significant ways: first, we compute the entire trade-off curve rather than a single optimization point on the curve; second, we provide algorithms that are agnostic to the specific objectives used, giving sufficient conditions on the objectives. We note that our conditions favor most group-fairness objectives defined in the literature. We discuss extensions to other definitions of fairness and limitations of our work in Section 5.</p><p>Our work takes a different approach, by aiming to compute the entire Pareto front between seemingly opposing objectives. Knowing the entire Pareto front can aid decision-makers when faced with choosing optimal trade-offs. Many real-world applications are faced with such trade-offs: for example, in facility location problems where a central agent decides on the location of facilities (e.g., buses, hospitals, etc.) based on the location of individuals within a certain region. It is well-known that people of lower socio-economic status often travel longer distances <ref type="bibr">[57]</ref> and have fewer health facilities in their region <ref type="bibr">[29]</ref>. A decision maker has multiple objectives to balance: minimal travel time for the entire population and equal access to facilities for different populations. The Pareto front allows the decision maker to balance these objectives in the optimal way: how much improvement can a group of people gain by moving a facility slightly away from the solution given by a single objective (e.g., by performing k-means)? Computing the Pareto front for facility location problems with multiple objectives has a long history motivated by such questions, with real-world case studies in the Singapore road network <ref type="bibr">[39]</ref> and ambulance placement in Barbados <ref type="bibr">[35]</ref>. Further applications are found in extensive surveys <ref type="bibr">[28]</ref>.</p><p>From a theoretical standpoint, computing the entire Pareto front remains a difficult problem. When an underlying single-objective optimization problem is NP-hard, such as in many combinatorial optimization problems, computing an approximation for the Pareto front is the best one could hope for <ref type="bibr">[55]</ref>, for example for the multi-objective shortest path problem <ref type="bibr">[11,</ref><ref type="bibr">36,</ref><ref type="bibr">44,</ref><ref type="bibr">64]</ref>, zero one knapsack problem <ref type="bibr">[25,</ref><ref type="bibr">42,</ref><ref type="bibr">63]</ref>, the multi-objective spanning tree problem <ref type="bibr">[8,</ref><ref type="bibr">21,</ref><ref type="bibr">55]</ref>, among others. To our knowledge, our work is novel in proposing algorithms for solving a bi-objective optimization problem based on clustering and fairness objectives which require only general properties for the fairness objectives to recover an approximate Pareto front with theoretical guarantees.</p><p>More generally, multi-objective optimization has seen a variety of methods for computing points close to the Pareto front in various domains, with methods ranging from evolutionary algorithms <ref type="bibr">[20,</ref><ref type="bibr">56,</ref><ref type="bibr">61,</ref><ref type="bibr">65]</ref> to gradient-based methods <ref type="bibr">[12,</ref><ref type="bibr">47,</ref><ref type="bibr">49,</ref><ref type="bibr">51]</ref>, and more recently to hypernetworks <ref type="bibr">[16,</ref><ref type="bibr">37,</ref><ref type="bibr">48,</ref><ref type="bibr">60]</ref> (often requiring inputting a preference vector of the objective values that will output a point on the Pareto front). These methods have also been applied to a problem closely related to clustering: that of facility location with multiple objectives <ref type="bibr">[34,</ref><ref type="bibr">58,</ref><ref type="bibr">59]</ref>, considering specific objectives and without theoretical approximation guarantees.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminaries</head><p>We denote by X the set of data points in R d . Assume that X = {x 1 , . . . , x n }, and for j &#8804; n define X j = {x 1 , . . . , x j }. A clustering map is defined as &#981; : X &#8594; S, where S is a set of k cluster centers in R d and k is the number of clusters, considered fixed. A cluster C i is defined as all the data points for which &#981;(x) = s i , where s i &#8712; S is the center for the i-th cluster. We call a clustering C the set of all clusters C i (so |C| = k), and by K the set of all possible clusterings. Finally, we denote by &#195; : X &#8594; [l] the map between data points and a set of sensitive attributes, indexed from 1 to l (which may represent demographic groups, interest groups, etc). We denote by C a the set of data points in a cluster C with attribute a, and by X a the set of data points with attribute a.</p><p>Clustering and Assignment Problems. Unsupervised clustering optimizes a clustering cost objective c, often defined as the sum of distances between points and a set of candidate cluster centers. In a general form, the cost of a clustering is x&#8712;X d p (x, &#981;(x)) 1/p , for metric d and some value of p. We call such objectives metric-based. By varying p = 1, 2, &#8734; we can obtain the k-median, k-means, and k-center objectives, respectively. The clustering problem is thus finding the clustering that has the minimum cost, over all possible sets of centers and maps from X to the set of centers. Since the clustering problem is hard, one often considers, as a stepping stone, the assignment problem, where the centers have been fixed. We shall consider both problems in this paper. In the fair clustering problem we have two objectives, the clustering cost objective c, and a fairness objective f , that aims to balance the representation of sensitive attributes in clusters, further detailed below.</p><p>Pareto Front. Consider the set of all clusterings K of X , and the two objectives c (the cost objective) and f (the fairness objective<ref type="foot">foot_0</ref> ), each mapping K to R. We say that clustering C dominates clustering</p><p>, and one of the inequalities is strict. Intuitively, if a clustering C &#8242; is dominated, it is unworthy of further consideration, because it lags behind in both objectives of interest. If however a clustering C is undominated, that is, there is no clustering in K that is simultaneously better on both fronts, then it is part of the solution. The Pareto front is the set of all undominated clusterings. Fairness Objectives. Our main contribution is an algorithm for computing the Pareto front of the clustering and assignment problems for any metric-based quality function, and any fairness objective -always a function to be minimized -that satisfies the following general condition. </p><p>We discuss the pattern-based and mergeability properties in Appendix B, where we give examples of fairness objectives that are not pattern-based or mergeable. Below, we introduce several fairness objectives commonly found in the literature, which we use in our experiments. All of these objectives are pattern-based and mergeable, as proved formally in Appendix B.1.</p><p>Balance objective (Definition 1 in Chierichetti et al. <ref type="bibr">[19]</ref>): For sensitive attributes that can take two values, indexed 1 and 2, the balance of a cluster C is defined as BALANCE</p><p>The aim is to maximize balance, or equivalently, to minimize the negative balance f (C) = -BALANCE(C). BALANCE has been used in fair clustering as a measure of equalizing proportions of different groups across clusters <ref type="bibr">[19]</ref>. In practice, optimizing BALANCE is both difficult and does not measure how far the proportions of sensitive groups in a clustering are from their true proportions in the population. For this reason, objectives based on proportional violation have been proposed, allowing a central decision-maker to input upper and lower bounds for desired proportions of groups in each cluster, and measured the deviation from these bounds <ref type="bibr">[2,</ref><ref type="bibr">9,</ref><ref type="bibr">10,</ref><ref type="bibr">22,</ref><ref type="bibr">26,</ref><ref type="bibr">32,</ref><ref type="bibr">40]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proportional violation objectives:</head><p>As defined in Esmaeili et al. <ref type="bibr">[26]</ref>, for every sensitive attribute a &#8712; [l], define upper and lower bounds &#179; a and &#180;a with the aim of satisfying &#180;a|C| &#8804; |C a | &#8804; &#179; a |C| for all clusters C &#8712; C. Since this is not always feasible, we define the worst proportional violation of attribute a in cluster C as the minimum non-negative value</p><p>Then, the proportional violation-based objectives are defined as:</p><p>GROUP EGALITARIAN = min max</p><p>These objective operationalize utilitarian and egalitarian concepts from social choice <ref type="bibr">[13]</ref>, minimizing either the sum of proportional violations or the worst violation across attributes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sum of imbalances objective:</head><p>Finally, for two interest groups in the population (l = 2) the following objective is quite natural:</p><p>that is to say, the sum of the deviations from equality between the two attribute values in the clusters. This objective is most appropriate when datasets contain relatively equal proportions of the two groups. Rather remarkably, the Pareto front for this objective can be computed in polynomial time.</p><p>3 Algorithms for Computing the Pareto Front We shall now present the main Algorithm. Define an X -pattern P to be a k-tuple of l-tuples of nonnegative integers such that i&#8712;</p><p>] specifies the number of data points in cluster i of attribute a. That is, an X -pattern is a clustering except only the attribute values of the points have been specified. <ref type="foot">2</ref> Complete proofs to all subsequent results can be found in Appendix A.</p><p>Algorithm 1 Dynamic Programming Algorithm for Computing the Assignment Pareto Front 1: Input: Number of clusters k, a set of n data points X with l attribute values, k centers S = {s 1 , . . . , s k }, and metric-based cost objective c parameterized by d, p. 2: Output: A table T n containing the solutions of the assignment problem for X for all X -patterns. 3: Method: Dynamic programming. We shall compute T 0 , T 1 , . . . T n . 4: Initialize T 0 to contain the null pattern with cost 0 and the empty clustering. 5: for j = 1 to n do 6:</p><p>Generate all X j -patterns, where X j = {x 1 , . . . , x j }.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>7:</head><p>For each X j -pattern P , and for each cluster i such that P [i, a] &gt; 0, where a is the attribute 8: value of x j , look up in T j-1 the cost of the pattern P i , which is P with x j omitted from 9:</p><p>cluster i, and compute T j-1 (P i ) + d p (x j , s i ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>10:</head><p>Let i * be the cluster index that minimizes T j-1 (P i ) + d p (x j , s i ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>11:</head><p>Set T j (P ) &#8592; T j-1 (P i * ) + d p (x j , s i * ).</p><p>12:</p><p>Store at T j (P ) the clustering from T [P i * ], with x j added at cluster i * . return T n At the conclusion of the algorithm, the table T n contains the lowest cost clustering C for each X -pattern P , such that P C = P , together with its cost c(C). Then, we can find the Pareto front by first sorting these clusterings for all X -patterns P in increasing c, and then traversing them in order, computing the unfairness of each pattern, remembering the smallest unfairness we have seen so far, and omitting any pattern that has unfairness larger than the smallest seen so far.</p><p>Remark 3.1. The above calculation of T n can be achieved alternatively by a simpler to state but slightly slower algorithm: first generate all X -patterns, and then compute the optimum assignment of each by min-cost flow. Theorem 3.2. Algorithm 1 finds the Pareto front of the assignment problem in time O(kn l(k-1) ), for any metric-based clustering objective and any pattern-based fairness objective.</p><p>Proof Sketch: We prove this theorem by finding an invariant of Algorithm 1: T j [P ] stores the lowest cost assignment of X j that maps to pattern P . We maintain this invariant as we build up our table by searching over possible smaller patterns that we can add our next datapoint x j to and creating the lowest cost assignment that maps to P . Therefore, the assignments stored at T n [P ] are the candidate points for the Pareto front. A simple filtering heuristic, as described above, removes the dominated points from this set of candidates. In terms of running time, the number of possible patterns of total size up to n is upper bounded by n l(k-1) , since each of the lk entries of P takes values between 0 and n, and the tuple corresponding to the last cluster k is fully determined by the other clusters. For each considered pattern, we need to look up at most k previous entries of the table T .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Approximating the Pareto Front for the Clustering Problem</head><p>As we saw, Algorithm 1 computes the Pareto front of the Assignment problem exactly for any input centers S. We next show that it also provides an approximation for the Pareto front of the clustering problem. For this purpose, we first use a vanilla clustering algorithm A for the single-objective problem of minimizing the cost c, to obtain the set S of cluster centers, and then apply Algorithm 1 with this set of centers S. Let &#179; be the approximation ratio of algorithm A. Definition 3.3 (W-approximation of the Pareto Set for clustering). For parameters W = (w c , w f ), we define the W-approximation of the Pareto set X P as a set of feasible points</p><p>This definition is a direct generalization of &#1013;-approximate Pareto set defined by Papadimitriou and Yannakakis <ref type="bibr">[55]</ref>. One may recover the &#1013;-approximate definition by setting &#1013; = max(w c , w f ) -1.</p><p>Theorem 3.4. Algorithm 1 finds a (2 + &#179;, 1)-approximation of the Pareto set of the clustering problem with a metric-based cost objective c and a pattern-based and mergeable fairness objective f . Proof Sketch: We argue that for any clustering map &#981; * with centers S * in the Pareto set for clustering, there exists an assignment to the centers S found by an approximate vanilla clustering algorithm that achieves the same or better fairness and at most (2 + &#179;) times the clustering cost. Then, since Algorithm 1 finds the Pareto set of the assignment problem, we are guaranteed the stated approximation. We construct this assignment using a "routing" argument, first introduced in Bera et al. [9]: we create an assignment &#981; &#8242; by routing all points in &#981; * with center s * &#8712; S * to the center in S nearest to s * . Given that the clustering cost is metric-based, we use the triangle inequality on the cost objective to argue that the cost of &#981; &#8242; w.r.t. S is not more than (2 + &#179;) times the cost of &#981; * w.r.t. S * . Then, we use the mergeability property of the fairness objective to argue that &#981; &#8242; has a weakly better fairness than &#981; * . We can, in fact, modify Algorithm 1 to include non-mergeable fairness objectives, guaranteeing the same approximation ratio and time complexity: Proof Sketch: The trick here is to transform non-mergeable fairness objectives into mergeable ones, and then apply Algorithm 1. In doing so, we control this transformation through re-assigning the centers of potentially empty clusters (which become an issue in non-mergeable fairness functions). Specifically, for patterns that have empty clusters, we search for the best possible fairness over all ways to reassign the empty clusters' centers to other centers' locations and divide up the points in the non-empty centers. A detailed description and proof for this modified algorithm can be found in Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Intractability</head><p>The running time of Algorithm 1 for computing the Pareto front has the parameters k, l in the exponent of n. What evidence do we have that this computation is necessary?</p><p>One reason would be that the sheer size of the Pareto front may be exponential. For some objectives, including the BALANCE and the GROUP EGALITARIAN objectives defined in Section 2, the Pareto front is provably of polynomial size. The reason is that all possible values of these objectives are rational numbers involving integers that are all smaller than n, and there are O(n 2 ) possible different such rational numbers. For other objectives, there may be instances of exponential Pareto fronts, as the objectives that sum over the clusters.</p><p>However, a different argument provides a justification of the algorithm's exponential performance: In several papers in recent literature (e.g., see <ref type="bibr">[26,</ref><ref type="bibr">27]</ref>), it is shown that, for a variety of cost functions c and fairness objectives f (including all mentioned proportional violation based objectives), it is NP-hard to find the assignment C that has the smallest c(C) under the constraint that f (C) &#8804; F (for some bound F ). We conclude that, unless P = NP, there is no polynomial-time algorithm for outputting the Pareto front, for at least some combinations of c and f . As a footnote to this discussion, the above complexity argument rules out polynomial algorithms, but not exponential algorithms of the form, for example, O(2 k n 3 ), which is much more benign than O(n (k-1)l ). These would be ruled out, subject to complexity conjectures, if the problem of finding the least costly clustering subject to fairness constraints were shown to be W-complete, a more severe form of complexity that rules out this more benign exponential performance <ref type="bibr">[23]</ref>. We leave this as an interesting open question in the theory of the fairness-cost trade-off of clustering.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">A Polynomial Algorithm for the Pareto Front of the Sum of Imbalances</head><p>For specific cost and fairness objectives, there is still hope that polynomial-time algorithms exist. As we show below, for a simple objective derived from the BALANCE objective, such an algorithm is possible. For the SUM OF IMBALANCES fairness objective and a clustering objective defined in Section 2, we want to compute the Pareto front when l = 2. Surprisingly, it turns out that this problem can be solved in polynomial time by a reduction to the weighted matching problem.</p><p>Theorem 3.6. If l = 2 and the fairness objective f is the sum of imbalances</p><p>then the Pareto front of the assignment problem can be computed in polynomial time.</p><p>Proof Sketch: The image of f is contained in the integer set {1, &#8226; &#8226; &#8226; , n}. For each potential value j, we construct a graph G j that contains X as nodes and another j 'dummy' nodes. We put an edge (u, v) between every u, v &#8712; X with different sensitive attribute value with weight equal to the cost min i (d p (u, i) + d p (v, i)). Between every data point u &#8712; X and dummy point v put an edge (u, v) with cost min i d p (x, s i ). Finding the minimum cost perfect matching in this graph gives the minimum cost of an assignment with fairness value j. The same result can be shown similarly for the MAX IMBALANCE objective,</p><p>). Remark 3.7. These objectives are quite natural extensions of the BALANCE objective, as they minimize the sum or max, over the k clusters, of the deviation from equality between the two groups. It is worth noting that slight modification of these objectives place the problem back in the NP-complete space: A construction of Bercea et al. <ref type="bibr">[10]</ref> (see also <ref type="bibr">Esmaeili et al. [26]</ref>, Theorem 5.1) implies that for minimizing the deviations not from equality (1 : 1) but from the ratio 1 : 3, it is NP-complete to find even the point of the Pareto front with the best fairness value.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>We implement our proposed algorithm (Algorithm 1) for finding the Pareto front on three real-world datasets and five fairness objectives, as detailed below.</p><p>Datasets: We use the following real-world datasets for our experiments: the Adult dataset and the Census dataset retrieved from the UCI repository (as the Census1990 version) <ref type="bibr">[43,</ref><ref type="bibr">46]</ref>, and the BlueBike trip history dataset. <ref type="foot">3</ref> The Adult and Census datasets contain numeric attributes such as income, age, education level, often used in applications of fair clustering <ref type="bibr">[6,</ref><ref type="bibr">19,</ref><ref type="bibr">26]</ref>. The BlueBike dataset contains the starting location, ending location, as well as various user attributes for users of the BlueBike bike sharing system in the Boston area. We use as features the starting and ending longitude and latitude values for all rides during a period of a week in May, 2016. These routes are a proxy for common traffic patterns, for which clustering can inform of high-density areas, with the purpose of deciding on new locations for bike stations or public transportation. Clustering and related framings have long been used in facility location problems, for which fairness is a central question <ref type="bibr">[1,</ref><ref type="bibr">41,</ref><ref type="bibr">53,</ref><ref type="bibr">62]</ref>. As the datasets are prohibitively large, we sample from each 1, 000 data points. Further details about the datasets can be found in Appendix C.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Objectives: We implement the k-means clustering objective and five different fairness objectives, as defined in Section 2 (BALANCE, GROUP UTILITARIAN, GROUP UTILITARIAN-SUM, GROUP EGALITARIAN, and GROUP EGALITARIAN-SUM).</head><p>Experimental details: We first note that our approach for finding the Pareto front is agnostic to the specific vanilla clustering algorithm used. For our datasets, we use the k-means++ clustering algorithm as the vanilla clustering that has an approximation ratio of O(log(k)) <ref type="bibr">[5]</ref>. All datasets have numeric attributes, allowing a direct embedding into Euclidean space and using k-means++ directly on the features. We use the self-reported gender (male or female) as the sensitive attribute for all datasets. Each sensitive attribute a has a proportion p a in the general population. For the proportional violation objectives, we set upper and lower bounds as a &#182;-deviation from the true proportions (p a ) a : &#179; a = (1 + &#182;)p a , &#180;a = (1 - &#182;)p a . We set &#182; &#8712; [0.005, 0.05] for all experiments and k = 2 clusters. Additional experiments for k = 3 are reported in Appendix E, noting qualitatively similar results. All experiments are run on local computers, using Python 3.9, k-means++ <ref type="bibr">[5]</ref>, NetworkX <ref type="bibr">[33]</ref>, and CPLEX for the implementation of the FCBC algorithm <ref type="bibr">[52,</ref><ref type="bibr">26]</ref> with &#1013; = 2 -10 and N = 50 runs. An empirical analysis of the running time for Algorithms 1 and 2 can be found in Section D, Figure <ref type="figure">5</ref>. <ref type="foot">4</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Pareto Front on Real-World Data</head><p>Figure <ref type="figure">1</ref> illustrates the Pareto front recovered by our dynamic programming approach (Algorithm 1) on the real-world datasets: the curves obtained are an exact recovery of the Pareto front for the assignment problem (as we are not re-computing the clusters centers during the implementation), and thus an approximation for the true Pareto front of the clustering problem. We note that the BALANCE objective and the proportional violation objectives differ: higher balance is considered fairer, whereas lower proportional violation is considered fairer, hence the different shapes of the Pareto fronts. From an evaluation point of view, for each assignment found on the Pareto front, we compute its clustering cost with respect to its actual centers, rather than the initial centers found by k-means++.</p><p>We note that the Pareto front is often, but not always strictly convex or concave, as it simply contains all the undominated points. We note that the proportional violation values will always be worse for the summed objectives than for their min-max equivalents, since the worst proportional violation</p><p>A particular advantage of finding the entire Pareto front is visible for the BALANCE objective in all datasets: as the clustering cost increases, the gain in the BALANCE objective becomes negligible; thus, a practitioner wishing to achieve some level of fairness may gain a lot in quality by allowing BALANCE to decrease by a minimal amount. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Exploring Faster Pareto Front Approximations</head><p>We explore faster algorithms for recovering the Pareto front for the clustering problem. In particular, we leverage a recently proposed linear programming approach that imposes an upper bound on the clustering objective and optimizes a fairness objective, the FCBC algorithm proposed by Esmaeili et al. <ref type="bibr">[26]</ref>. For the GROUP UTILITARIAN and GROUP EGALITARIAN objectives with a clustering cost upper bound input U , FCBC presents a polynomial-time approach for finding an approximation of a point x on the Pareto front that has a clustering cost upper bounded by a quantity (2 + &#945;)U with a fairness additive approximation of &#951;. Here, &#945; is the approximation ratio of a vanilla clustering algorithm for the clustering objective, and &#951; is an additive approximation for the fairness objective. Thus, we can extend the W-approximation of a Pareto set definition (Definition 3.3) to include an additive approximation term: for parameters W = ((w c , v c ), (w f , v f )), we can define the (W, V)approximation of the Pareto set X P as a set of feasible points</p><p>We note that Algorithm 1 gives only a multiplicative approximation, so v c = v f = 0. In contrast, for a point x &#8712; X P on the true Pareto front, FCBC recovers a ((2 + &#945;, 0), (1, &#951;))-approximation. <ref type="foot">5</ref>We extend the FCBC algorithm by allowing a sweep over the clustering cost upper bound U , thus, in theory, obtaining an approximation of the Pareto front in polynomial time (for a detailed description, see Algorithm 2 in Appendix D).</p><p>Figure <ref type="figure">2</ref> shows that repeated FCBC (Algorithm 2) recovers few points on the Pareto front recovered by Algorithm 1: sometimes it recovered the vanilla clustering cost and fairness (the upper most left point in panels b,c,e,f), whereas in other cases it recovers dominated clustering assignments (in panels a-d). We attribute this inaccuracy in recovery to the additive approximation in the fairness objective. Furthermore, whereas both our dynamic programming approach and repeated FCBC have an approximation ratio of (2 + &#945;) in the clustering objective, dynamic programming often gets a strictly better cost for similar fairness values. In other words, where repeated FCBC gains in running time, 6 it loses in recovery accuracy and clustering cost. This is particularly problematic when the only point recovered by FCBC is the vanilla clustering assignment: this means that even when a practitioner may be willing to trade-off significant clustering cost in order to improve fairness, that trade-off is not realizable in practice solely through FCBC.</p><p>Furthermore, the FCBC algorithm does not work for objectives summing over the clusters, such as GROUP UTILITARIAN-SUM and GROUP EGALITARIAN-SUM. For such objectives, there is no polynomial time algorithm that can recover the Pareto front to within an additive approximation of O(n &#948; ), for &#948; &#8712; [0, 1) (see Theorem 7.1 in Esmaeili et al. <ref type="bibr">[26]</ref>). Our approach, however, can also include such objectives, as we show both in theory and in practice.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Discussion</head><p>Overall, our work shows the versatility of our proposed algorithms for a variety of fairness and clustering objectives. We provide simple properties sufficient in theory for the recovery of the Pareto front for the (clustering, fairness)-biobjective optimization problem, with extensive experiments on real-world datasets. We discuss limitations and future directions of our work in this section.</p><p>First, while our approach has the advantage of being agnostic to specific objectives, it loses in running time as compared to approaches that optimize for specific fairness objectives. While previous work provides approximation algorithms for optimization clustering under a fairness constraint, none offers a method for computing the Pareto front. On the question 'can we have faster algorithms with worse approximation bounds?', the answer is yes. Our paper provides (the first, to our knowledge) such exploration: We show in Section 4.2 that we can adapt such polynomial-time methods to compute an approximation of the Pareto front; however, what we gain in runtime, we lose in approximation bounds: the repeated FCBC algorithm has an additive approximation on the fairness objective that can get large in practice (we get a different Pareto front, with points quite far away in the objective cost, compared to the dynamic programming approach), and with no theoretical bounds on the additive approximation.</p><p>A future direction could study faster algorithms that can recover a similar approximation of the Pareto front. Interesting theoretical open questions emerge: while our analysis ruled out polynomial algorithms for general objectives, are there exponential algorithms (i.e. of the form O(2 k n 3 )), that still provide an improvement from O(n (k-1)l )-type of algorithms? And, are there other objectives</p><p>4200 4225 4250 4275 4300 Clustering cost 0.00 0.02 0.04 0.06 0.08 0.10 Proportional violation Adult Dyn Progr FCBC (a) Group Util 55000 57500 60000 62500 65000 Clustering cost 0.00 0.01 0.02 0.03 0.04 Proportional violation Census Dyn Progr FCBC (b) Group Util 2388 2389 2390 2391 Clustering cost 0.00 0.01 0.02 0.03 0.04 Proportional violation BlueBike Dyn Progr FCBC (c) Group Util 4200 4225 4250 4275 4300 4325 Clustering cost 0.00 0.02 0.04 0.06 Proportional violation Adult Dyn Progr FCBC (d) Group Egalit 53500 54000 54500 55000 Clustering cost 0.00 0.01 0.02 0.03 0.04 0.05 Proportional violation Census Dyn Progr FCBC (e) Group Egalit 2388 2389 2390 2391 Clustering cost 0.000 0.005 0.010 0.015 0.020 Proportional violation BlueBike Dyn Progr FCBC (f) Group Egalit that admit polynomial-time algorithms for computing an approximate Pareto, aside from SUM/MAX OF IMBALANCES? Furthermore, future work could investigate algorithms for recovering the Pareto front for fairness objectives that are not pattern-based, which is a prerequisite for our algorithms. We provide some examples of such objectives in the Appendix.</p><p>Our approach is best suited for group fairness definitions. Extending this work to other definitions would be an excellent avenue for future work. We note that for some individual fairness definitions, the notion of a Pareto front does not apply: for example, the socially fair k-means clustering notion proposed by Ghadiri et al. <ref type="bibr">[31]</ref> defines a single optimization objective that already incorporates fairness (by minimizing the max distance between points in each group and their cluster centers). With this definition, the fairness objective and the clustering objective are not separated in a multi-objective optimization problem, but rather combined in a single objective. Therefore, there is no Pareto front, since this notion applies to a multi-objective optimization problem. For other definitions of individual fairness, such as recent adaptations of individual fairness under bi-criteria optimization problems <ref type="bibr">[50]</ref>, one can apply a repeated approximation algorithm under an upper bound on one of the objectives (similar to the repeated-FCBC algorithm). In doing so, one would obtain loose approximation guarantees (see, for example, the approximation guarantee for individual fairness proved by Mahabadi and Vakilian <ref type="bibr">[50]</ref>). Dynamic programming approaches would not directly apply; however, we think this would be an excellent avenue for future work.</p><p>Finally, we assumed that the sensitive attributes are disjoint. For overlapping attributes, one could assign a new sensitive attribute to every overlap set and apply our approach, however, with a significant increase in running time. Future work could investigate alternative approaches that optimize running time for overlapping attributes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A Complete Proofs</head><p>Proof of Theorem 3.2: Define V P as the set of clusterings C that have the pettern P C = P and C is a clustering of X j . Let c be the metric-based cost function for the assignment to centers S, which is parameterized by a distance function d and non-negative exponent p. We will argue that, for all j, the clustering C stored at T j [P ] has the property that C = arg min C &#8242; &#8712;V P c(C &#8242; ). We will show that this invariant always holds by induction on the number of data points in the current level of the table j (equivalently, the number of points clustered by C in the main loop of Algorithm 1). The base case is j = 0, where we store the empty clustering, the unique clustering of 0 points.</p><p>Suppose we have some pattern P for the set of the first j points in our ordering, X j . Let C * = arg min C&#8712;V P c(C), so C * is optimal cost for P . Point x j must be in some cluster C i &#8712; C * . The clustering C &#8242; i of the remaining points (which are exactly the first j -1 points from the data) induces some pattern P &#8242; i = P C &#8242; i which is an X j-1 -pattern. Since C * was assumed to be of optimal quality, the clustering C &#8242; i must have the property that</p><p>Otherwise, C &#8242; i could be replaced in C * with the lower cost clustering that has the same pattern and reduce the cost of C * (a contradiction).</p><p>We have shown that C * = C i , where C i = C &#8242; i + x j is generated by assigning the first j -1 points according to C &#8242; i , and then adding point x j to cluster i. By the inductive assumption T j-1 [P &#8242; i ] correctly stores the optimal C &#8242; i . Since Algorithm 1 minimizes c(C i ) over cluster indices i, we correctly compute the optimal clustering of j points that maps to P . All candidate points for the Pareto set of the assignment problem are stored in T n , meaning that every clustering C &#8242; of X is weakly dominated by some clustering in T n . This is because every clustering C &#8712; K must map to some X -pattern P . By our algorithm invariant, C * &#8712; T n [P ] is the clustering that minimizes assignment cost across clusterings that have pattern P , and thus c(C * ) &#8804; c(C). In addition, since Proof of Theorem 3.4: Theorem 3.2 states that Algorithm 1 finds (exactly) the Pareto front of the assignment problem. Let &#981; * , S * be the optimal cost clustering map and set of centers, respectively, that satisfy a particular fairness bound F . Let &#981;, S be the clustering map and set of centers found by a vanilla clustering algorithm A, which achieves an &#179;-approximation to the best clustering cost. We define a new assignment &#981; &#8242; , S, by applying a "routing" argument, first introduced in Bera et al. <ref type="bibr">[9]</ref> and reused in Esmaeili et al. <ref type="bibr">[27]</ref>.</p><p>Define a function nrst(s * ) = arg min s&#8712;S d p (s, s * ) which returns the nearest center in S to an input center s * . Now define an assignment map &#981; &#8242; where vertices are routed from their center s * i &#8712; S * to nrst(s * i ) &#8712; S. In other words, for every point x, &#981; &#8242; (x) = nrst(&#981; * (x)). Now we can write:</p><p>The first and third inequalities follow from the triangle inequality on d p and the second inequality is due to the definition of the nrst function. In addition, &#179; ( x d p (x, &#981; * (x)))</p><p>1/p , since &#981; is the clustering found by the vanilla clustering algorithm A. Applying our previous inequality together with the triangle inequality on the p-norm:</p><p>This implies that &#981; &#8242; S is an assignment map with respect to centers S that has at most (2 + &#179;) times the cost of &#981; * with respect to centers S * .</p><p>In addition, the clustering C &#8242; associated with &#981; &#8242; can be generated by, for each s j &#8712; S, merging all clusters C * i &#8712; C * such that nrst(s * i ) = s j . This procedure can be done by sequentially merging pairs of clusters. Since the fairness objective is mergeable (see Def. 2.2), this implies that f (C * ) &#8805; f (C &#8242; ). So, the assignment map &#981; &#8242; with respect to centers S also satisfies the fairness bound F . There exists an assignment to centers S that is a (2+&#179;, 1)-approximation to &#981; * , S * , so the Pareto front of the assignment problem is a (2 + &#179;, 1)-approximation to the Pareto front of the clustering problem as desired.</p><p>Proof of Theorem 3.5. We describe in detail the algorithm modification needed to compute the Pareto front for non-mergeable fairness objectives. First, we give some necessary preliminaries: Definition A.1 (Refinement of a pattern and refinement DAG D). Consider the directed graph D with the X -patterns as nodes and edges (P 1 , P 2 ) if merging two nonempty rows of the pattern P 2 yields the pattern P 1 . A pattern P &#8242; is a refinement of another pattern P iff there is a path from P to P &#8242; in D, i.e. if P can be obtained from P &#8242; by merging different parts of P &#8242; . Note that D is a directed acyclic graph (DAG). Let R P be the set of P &#8242; that are refinements of P .</p><p>Similarly, one can define the refinement of a clustering C. Note that if a clustering C &#8242; is a refinement of another clustering C, then its pattern P C &#8242; is a refinement of the pattern of cluster C, P C . Next, we define a modified fairness function created from the non-mergeable function f , with the purpose of reducing it to a mergeable function and applying Algorithm 1.</p><p>Definition A.2 (Modified fairness function). If f is a pattern-based fairness function, define its associated modified function f to be f (P ) = min P &#8242; &#8712;R P (f (P &#8242; )) for every pattern P .</p><p>Note that f is still required to be pattern-based, and thus modified function is well-defined. By definition, f is a mergeable function. Furthermore, if f is mergeable then f = f . Lemma A.3. We can compute in O(n l(k-1) ) time the modified function f for all X -patterns P , and compute for each P a refinement P &#8242; such that f (P ) = f (P &#8242; ).</p><p>Proof. We compute f bottom up in the DAG D. We also compute for each node a pointer to its refinement pattern that has the minimum fairness cost. At the sinks P (patterns that have no outward edge pointing from them to other patterns), f (P ) = f (P ) and P points to itself only. For every non-sink node P , we set f (P ) = min v&#8712;N (P ) f (v) where N (P ) is the set of neighbors of P , and we set the pointer of P to the descendant pointed by the neighbor v of P with the minimum f (v).</p><p>The total time complexity is linear in the number of nodes and edges of D. The number of edges is at most k 2 times the number of nodes, since every node P &#8242; has at most k 2 incoming edges (there are at most k 2 choices for the two parts of P &#8242; that are merged to form a parent pattern). The number of nodes of D, i.e. X -patterns, is at most 4k( n 2 ) l(k-1) : All the components of all the rows of a pattern P , except possibly at most two components, are less than n/2; furthermore, specifying k -1 rows of P determines also the last row because the sum for each attribute value must match the given set of points. From the bounds on the number of nodes and edges of D, it follows that the time complexity is O(n l(k-1) ).</p><p>Recall that in a clustering we associate a center with each cluster, and the cost of a clustering is computed from the distances of the points from the center of their cluster. We allow different clusters to have the same point as their center. We show the following result, needed in the proof of Theorem 3.5: Lemma A.4. If we are given a clustering C with centers S and P &#8242; is a refinement of P C then we can compute efficiently a clustering C &#8242; with centers S &#8242; , which we call a center reassignment of C, such that P C &#8242; = P &#8242; and c(C &#8242; , S &#8242; ) = c(C, S). The centers S &#8242; will be a subset of the points in S but with multiplicity (i.e. different clusters may have the same center). Proof. Suppose that row i of pattern P is formed by merging a set J i of two or more nonempty parts of P &#8242; . Then we split the cluster C i of C into a set of |J i | subclusters, one for each part in J i , and we place in each subcluster a number of points from C i for each attribute value that matches the corresponding entry in the row of P &#8242; ; we assign to all the subclusters the same center as the center of C i . After performing the above splitting for all parts i of P that are refined in the pattern P &#8242; , we obtain a clustering C &#8242; such that P C &#8242; = P &#8242; . Since the subclusters created from splitting a cluster C i of C are assigned the same center as the center of C i , it follows from the definition of a metric-based cost function, that C and C &#8242; have the same cost.</p><p>We now describe the algorithm modification for non-mergeable fairness objectives. For a non-mergeable fairness objective f , let f = min P &#8242; &#8712;R P (f (P &#8242; )) be its associated modified function. Apply the algorithm of Lemma A.3 to compute f (P ) for every X -pattern P , and the corresponding pointer to its optimal refinement P &#8242; . As before, use a vanilla clustering approximation algorithm A to compute a clustering that approximates the minimum cost. Use the centers of this clustering in Algorithm 1 to construct the dynamic programming table T n . Process the patterns P as before in order of increasing cost, but now use the modified function f as the fairness objective to filter out dominated patterns, and for each undominated pattern P , replace the clustering C in T n [P ] by the center reassignment clustering C &#8242; constructed as in Lemma A.4. The algorithm returns the set of these center reassignment clusterings C &#8242; for all undominated patterns. By the definition of the modified fairness function and Lemma A.4, these are undominated clusterings for the original (non-mergeable) fairness function f and the clustering cost c.</p><p>From our timing analysis, the time complexity of the algorithm is O(kn l(k-1) ).</p><p>The proof of the approximation follows the proof of Theorem 3.4. We define &#981; * , S * , &#981;, S, and &#981; &#8242; , S similarly as before. Observe that in the construction of &#981; &#8242; , we simply merged clusters from &#981; * and gave them new centers. Since the pattern of a clustering is independent of the identities of the centers, the pattern of &#981; * is a refinement of the pattern of &#981; &#8242; . So by Lemma A.4, there exists a clustering &#981; &#8242;&#8242; that is a center reassignment of &#981; &#8242; and has the same pattern as &#981; * . Therefore, &#981; &#8242;&#8242; , which is one of the reassignments we search over in the algorithm, has f (C &#8242;&#8242; ) = f (&#981; * ) and c(C &#8242;&#8242; ) = c(&#981; &#8242; ) (here, we used interchangeably the clustering cost function c applied to the clustering or the clustering assignment map, which are equivalent). Since &#981; &#8242; has at most 2 + &#179; worse cost than &#981; * (see proof of Theorem 3.4), so does C &#8242;&#8242; . Therefore, the Pareto front computed by the algorithm is a (2 + &#179;, 1)-approximation to the Pareto front of the clustering problem, as desired.</p><p>Proof of Theorem 3.6: First we notice that in this case f can take at most n + 1 values: If n = |X | is even, then it can take only values the even integers between 0 and n, and if it is odd all odd integers between 1 and n -1. We shall treat the case of even n, the case of even n being very similar.</p><p>Given dataset X , metric d, and exponent p, for each even number F between 0 and n we must compute the best assignment that has fairness F . Once this is done, we only have to sort with respect to fairness and remove the dominated assignments (in a similar vein to the filtering heuristic of Algorithm 1). Clearly, this step is polynomial time since we can sort the points in O(n log(n)) and then make a single pass over them to remove the dominated points. All that remains to show is that we can compute the best assignment with fairness F also in polynomial time. We do so as follows.</p><p>Given an even number F , we construct a weighted graph G F with n nodes corresponding to the data points, plus F dummy nodes. We join every data point x with every dummy node z by an edge of weight min i&#8712;[k] d p (x, s i ). We join any two data points x, y with different attribute value (recall that there are only two attribute values) by an edge of length min k i=1 (d p (x, s i ) + d p (y, s i )). We call an edge of G F to be of type i &#8712; [k] if the i-th cluster achieves the minimum that defines the weight of the edge. (In other words, i is the argmin of the minimization expression.) A perfect matching in a graph is a set of disjoint edges that includes all the nodes. The weight of a matching M is defined as the sum of the weights of edges that are contained in the matching: w(M ) = e&#8712;M w(e). Compute the minimum weight perfect matching M in the graph G F . This can be done in polynomial time using Edmonds' algorithm <ref type="bibr">[24]</ref>, specifically in time O(N (E + N log N )) for a graph with N nodes and E edges <ref type="bibr">[30]</ref>; in our case the graph G F has at most 2n nodes and 2n 2 edges.</p><p>We will show that the minimum weight of a perfect matching is equal to the minimum cost of a clustering with fairness F , and furthermore we can derive a minimum cost clustering from the minimum weight matching M .</p><p>Consider any clustering C with f (C) = F . Starting from C, we construct a perfect matching M of G F as follows: For each cluster C &#8712; C, choose the attribute value a that has fewer nodes in C (breaking ties arbitrarily). Now, match these nodes of attribute a in C arbitrarily with nodes of the larger attribute group in C. Finally, match any remaining nodes of the other group (the larger attribute group) to dummy nodes.</p><p>We claim that such a matching M is possible, because the total number of nodes that cannot be matched is precisely F , the number of dummy nodes. We claim now that the weight of this matching w(M ) is at most the cost of the clustering, c(C); that is, w(M ) &#8804; c(C).</p><p>The weight of the matching is at most the clustering cost for the following reason: each matched pair x, y contributes to c(C) at least the weight of the corresponding matched edge; and any data point matched to a dummy node again contributes to c(C) at least the weight of the matched edge. Now consider the minimum weight perfect matching M of the weighted graph G F . Clearly, w( M ) &#8804; c(C). Construct from M a corresponding clustering &#264; as follows: any matched data point whose edge is of type i is placed in cluster i, while any data point matched in M with a dummy node with an edge of type i is added to cluster i. This clustering satisfies c( &#264;) = w( M ) &#8804; c(C). Since C was assumed to be an arbitrary clustering with f (C) = F , &#264; is the optimum such clustering.</p><p>The same result can be shown for the MAX IMBALANCE objective.</p><p>Theorem A.5. If l = 2 and the fairness objective f is the max imbalance</p><p>i ||, then the Pareto front of the assignment problem can be computed in polynomial time.</p><p>Proof. The objective can take again at most n+1 values (more precisely, at most 1+max(|X 1 |, |X 2 |) values). For each possible value F , construct a weighted graph G F , whose set N F of nodes consists of n nodes corresponding to the given set X of data points, a set D i of F dummy nodes for each i &#8712; [k], and if n + kF is odd, then N F has one more dummy node so that the total number of nodes is even. The edge set E F of G F consists of the following edges: for each pair x, y of data points with different attribute values, we include an edge (x, y) with weight min k i=1 (d p (x, s i ) + d p (y, s i )) and associate with the edge as its type the index i &#8712; [k] that achieves the minimum in the weight; for each data point x and dummy node z in D i , we include an edge (x, z) with weight d p (x, s i ) and associate type i to the edge; for every pair z, w of dummy nodes we include an edge (z, w) with weight 0 (we do not associate a type with these edges).</p><p>We then compute the minimum weight perfect matching M of the graph G F . Just like in the proof of Theorem 3.6, the minimum weight perfect matching can be found in polynomial time. The matching M induces an assignment &#264; of the data points: every point x is assigned to the cluster i corresponding to the type of the edge of M incident to x. By construction, the cost of the assignment is equal to the weight w( M ) of the matching. Furthermore, for every cluster i &#8712; [k], the number of data points in the cluster that are matched with dummy nodes in</p><p>Therefore the fairness cost of the assignment is at most F .</p><p>Conversely, any assignment C with fairness F induces a perfect matching M of G F , as follows: For each cluster i, match data points of cluster C i in C with opposite attribute values arbitrarily in pairs, and match the remaining ||C 1 i | -|C 2 i || data points to dummy nodes in D i . The rest of the dummy nodes that are not matched to a data point are matched arbitrarily in pairs. The weight of this matching M is then at most the cost of the clustering C, w(M ) &#8804; c(C). Therefore, c(C) &#8805; w( M ) = c( &#264;), that is, any assignment C with fairness F is dominated by the assignment &#264; that we derived from the minimum weight perfect matching. We show that BALANCE is also mergeable through a simple induction over the number of clusters to be merged. For the base case, take a clustering C and construct a clustering C &#8242; in which two clusters from C have been merged (assume without loss of generality that</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B Analyzing Fairness Objectives</head><p>For ease of notation, denote by</p><p>Note that all variables are non-negative. We assume that neither C 1 and C 2 are empty. We need to show that the fairness objective is at least as good for the merged clustering than for the original clustering. It is sufficient to show that</p><p>First, this property is true as we can easily prove by considering possible cases:  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Then, if equation 5 holds, then BALANCE(C) &#8804; BALANCE(C &#8242; ).</head><p>To see this, we there is a cluster index i &gt; 2 for which BALANCE(C) = BALANCE(C i ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Then, BALANCE(C</head><p>The SUM OF IMBALANCES and MAX IMBALANCE objectives are derived from the BALANCE objective. As they are only a function of the number of data points of different attributes in each cluster, they are also clearly pattern-based. To see that SUM OF IMBALANCES is also mergeable, we consider again the base case of the induction proof, for 2 clusters. For the SUM OF IMBALANCES objective, note that</p><p>| by the triangle inquality for any clusters C i , C j . In merging two clusters C i and C j from a clustering C, obtaining a clustering C &#8242; , the contribution to the objective of C i &#8746; C j and the empty cluster is exactly Finally, for the induction step, if mergeability holds for merging a set of w clusters, then it holds for merging a set of w + 1 clusters as well, by reducing to the base case: without loss of generality, denote the w + 1 clusters to be merged by C 1 , &#8226; &#8226; &#8226; , C w+1 . By the induction hypothesis, mergeability holds for merging any w clusters, so for &#8746; j&#8712;[w] C j . Then, the base case applies for the clusters &#8746; j&#8712;[w] C j and C w+1 .</p><p>Proportional violation objectives: First, we easily note that for any two clusterings C and C &#8242; that induce the same pattern &#8710; C a , we pair each cluster C &#8712; C with a different cluster</p><p>. Since all proportional violation-based objectives defined are only a function of (&#8710; C a ) a,C , it follows that all clusterings that have the same pattern will also have the same objective value for all four of the proportional violation-based objectives.</p><p>Finally, we will show that they are also mergeable. We show this again by induction over the number of merged clusters for each of the objectives. We start with the base case of two clusters, denoted without loss of generality by C 1 and C 2 . We assume that neither C 1 and C 2 are empty. We say that C 1 and C 2 are part of a clustering C, and we aim to show that the clustering C &#8242; in which C 1 and C 2 got merged will have OBJECTIVE(C &#8242; ) &#8804; OBJECTIVE(C), for OBJECTIVE is one of the GROUP UTILITARIAN, GROUP UTILITARIAN-SUM, GROUP EGALITARIAN, GROUP EGALITARIAN-SUM objectives. For a sensitive attribute a &#8712; [l], assume without loss of generality that</p><p>|C2| . Then, the following property holds, also known as the mediant inequality:</p><p>To see this, note that the left hand side is equivalent to:</p><p>The right hand side is equivalent to:</p><p>By the definition from equation 2, we have</p><p>for clusters C 1 , C 2 &#8712; C. As C 1 and C 2 got merged in C &#8242; , they got replaced by the empty cluster C &#8709; and the merged cluster</p><p>Using inequalities 9 and 10 with the mediant inequality, we get that</p><p>Since by definition, &#8710; C1&#8746;C2 a is the minimum value that satisfies equation 10, we get that</p><p>Note that the empty cluster will have</p><p>. This also implies that max</p><p>Since the attribute a &#8712; [l] was chosen arbitrarily, we also get that</p><p>and thus the GROUP UTILITARIAN objective cannot increase by merging two clusters. Similarly, for the GROUP UTILITARIAN-SUM, since</p><p>, and all other clusters have the same proportional violation in both clusterings, we also get that</p><p>Since the attribute a &#8712; [l] was chosen arbitrarily, we also get that</p><p>and thus the GROUP UTILITARIAN-SUM objective can also not increase by merging two clusters. Furthermore, equation 12 implies that max(</p><p>Since this holds for any arbitrary a &#8712; [l], it also implies that</p><p>and thus the GROUP EGALITARIAN objective cannot increase by merging two clusters. Finally, since</p><p>, and all other clusters have the same proportional violation in both clusterings, we also get that</p><p>Since the attribute a &#8712; [l] was chosen arbitrarily, we also get that</p><p>and thus the GROUP EGALITARIAN-SUM objective cannot increase by merging two clusters.</p><p>For the induction step, the proof is identical to the proof for the BALANCE objective: if mergeability holds for merging a set of w clusters, then it holds for merging a set of w + 1 clusters as well, by reducing to the base case: without loss of generality, denote the w + 1 clusters to be merged by C 1 , &#8226; &#8226; &#8226; , C w+1 . By the induction hypothesis, mergeability holds for merging any w clusters, so for &#8746; j&#8712;[w] C j . Then, the base case applies for the clusters &#8746; j&#8712;[w] C j and C w+1 for all objectives. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B.2 A discussion on pattern-based and mergeability properties</head><p>Example of non-pattern based fairness objectives. As mentioned in the introduction, informally, a pattern-based fairness objective computes a per-cluster fairness quantity that only depends on the number of data points from each sensitive attribute. In practice, many fairness objectives satisfy this property, as they aim to operationalize different versions of proportional representation, with the goal of having balanced clusters among different sensitive attributes. Objectives that define fairness through the proportion of k-centers that are closest to them are not pattern-based, since they do not depend on the number of other nodes of different attributes in the same cluster but rather on equalizing the proportions of k-center assignments to different attributes <ref type="bibr">[17,</ref><ref type="bibr">41,</ref><ref type="bibr">45]</ref> or the max average distance between points of different groups to their centers <ref type="bibr">[31]</ref>. For example, the definition in Ghadiri et al. <ref type="bibr">[31]</ref> states that for two groups R (red) and B (blue), the fairness objective is defined for a clustering C as</p><p>where X A is the subset of points from X with attribute A. Take the following set of points: 3 overlapping points x 1 , x 2 , x 3 , with x 1 , x 2 &#8712; B, x 3 &#8712; R, and 3 overlapping points x 4 , x 5 , x 6 at different coordinates than the first 3, with x 4 , x 5 &#8712; B, x 6 &#8712; R. Then, the clustering</p><p>x 5 } has a non-zero fairness objective value with respect to its true centers (by computing the centroids). The clustering</p><p>}, however, has the same pattern as (C 1 , C 2 ), but a fairness objective value of 0 with respect to its true centroids. Even if we evalute (C 1 , C 2 ) with respect to the centroids of (C &#8242; 1 , C &#8242; 2 ), it still has a non-zero fairness objective value. See Figure <ref type="figure">3</ref> (a) for an illustration.</p><p>Example of non-mergeable fairness objectives. As discussed in the introduction, a mergeable objective means that the objective value can only improve or stay the same when merging any number of clusters. While many fairness objectives are mergeable, we give an example below of non-mergeable objectives. We note that objectives that enforce a minimum number of data points in each cluster tend to not be mergeable. For example, the &#196; -ratio objective defined by <ref type="bibr">[32]</ref> is non-mergeable, where the objective is defined as:</p><p>The &#196; -ratio enforces a minimum number of the total points that must go in each cluster, so empty clusters violate the mergeability condition on the fairness objective (since the number of clusters k is fixed).</p><p>Pareto front approximation gap gets arbitrarily large without mergeability. We rely on the assignment problem to provide an approximation for the clustering problem when computing the Pareto front, which works for pattern-based and mergeable fairness objectives. However, for fairness objectives that are pattern-based but are not mergeable, if we apply directly Algorithm 1, without using a modified fairness function to filter the dominated patterns and adjust the clusterings, as we did in the proof of Theorem 3.5, then the resulting approximation ratio is no longer necessarily bounded, as we show in the example below. We showcase this for the &#196; -ratio objective.</p><p>We construct a dataset X such that |X | = 8m that contains 8 sets of m points each on the plane. Each point has a sensitive attribute, denote by b (blue) or r (red). We denote these sets by (P i ) i&#8712; <ref type="bibr">[8]</ref> , constructed in the Euclidean space with the following coordinates (see Figure <ref type="figure">3</ref> (b) for an illustration):</p><p>&#8226; P 1 contains 2m -1 blue points situated at coordinates (-&#1013;, 1)</p><p>&#8226; P 2 contains 2m -1 red points situated at coordinates at (&#1013;, 1)</p><p>&#8226; P 3 contains 1 blue point situated at coordinates (1, 0)</p><p>&#8226; P 4 contains 1 red point of situated at coordinates (1, 0)</p><p>&#8226; P 5 contains 2m -1 blue points situated at coordinates (&#1013;, -1)</p><p>&#8226; P 6 contains 2m -1 red points situated at coordinates (-&#1013;, -1)</p><p>&#8226; P 7 contains 1 blue point situated at coordinates (-1, 0)</p><p>&#8226; P 8 contains 1 red point situated at coordinates (-1, 0)</p><p>Set &#1013; &lt; 1 8m , and k = 4. Then, the cost of not assigning the 4 points at (1, 0) and (-1, 0) their own two centers outweighs the benefits of assigning 2 centers to the points clustered near (0, 1) or (0, -1). Therefore, the best k-means and k-median clustering yields centers S = {(0, 1), (1, 0), (0, -1), (-1, 0)}.</p><p>We set &#196; = 1 4 as our fairness constraint, which constrains every center to have exactly m red points and exactly m blue points. Observe that the optimal quality way &#981; to assign such points to s 2 is to assign m -1 of the red points in P 2 and m -1 of the blue points in P 5 to S 2 (in addition to the two points assigned to s 2 by the unfair k-means/medians clustering). So the center at s 2 clusters together 2 sets of m -1 points separated by distance 2 on the plane. Even if we allow moving the location of s 2 after the assignment, the clustering cost is still lower bounded by(2m -2) 1/p , since the best center is unit distance from 2(m -1) points. Symmetrically, we do the corresponding assignment to s 4 and get a clustering cost lower bound of 2(2m -2) 1/p . Setting our centers to be S * = {(0, 1), (0, 1), (0, -1), (0, -1)}, the assignment function &#981; * with optimal quality under the fairness constraint maps m points from each of P 1 and P 2 to s * 1 and maps the remainder of the points from P 1 and P 2 , as well as the points in P 3 and P 4 to s * 2 . Symmetrically, we do the corresponding assignment to s * 3 and s * 4 with the points from P 5 , P 6 , P 7 , and P 8 . This clustering achieves a clustering cost at most ((8m-4)&#1013; p +4( &#8730; 2) p ) 1/p , where the first term comes from the 8m-4 points in sets P 1 , P 2 , P 5 , P 6 and the second term comes from the 4 points in sets P 3 , P 4 , P 7 , P 8 .</p><p>Let &#981; be the lowest cost assignment to centers S, and let S m be the best possible centers for &#981;. Then we have that:</p><p>Considering the k-means or the k-median clustering objectives, we get (8m -4)&#1013; p &#8804; 1, since we set &#1013; &lt; 1 8m . Therefore,</p><p>Thus, for any constant b, the assignment Pareto set is not a (b, 1) multiplicative approximation for the clustering Pareto set for non-mergeable fairness functions. This justifies the need for a modified algorithm for non-mergeable fairness objectives that can change the set of centers while searching for the best assignment, as described in Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C Datasets details</head><p>For all datasets, we subsample 1, 000 data points, as the datasets sizes are prohibitively large. For the Adult and Census datasets, we use all numerical features available in the data and embed them in Euclidean space. For the BlueBike dataset, we use the starting and ending latitude and longitude as coordinates, directly embedded in Euclidean space. For all datasets, the gender of users is self-reported. Table <ref type="table">1</ref> describes the characteristics of all datasets. We note that some datasets naturally cluster into fairer clusters than others. For this reason, setting &#182; too high renders a single point on the Pareto front, since the vanilla clustering itself will be fair for proportional violation-based objectives. In fact, the proportional violation value will be equal to 0. For this reason, we experiment with various values of &#182;, reported in Table <ref type="table">1</ref> for the experiments presented in the main text. As described in the main text, we investigate alternatives for faster recovery of the Pareto front in the case of two objectives: clustering cost and fairness. We employ a recently proposed linear programming method that bounds the clustering objective and optimizes a fairness objective, up to an approximation, denoted the FCBC algorithm (Algorithm 1 in Esmaeili et al. <ref type="bibr">[26]</ref>). We implement the FCBC algorithm repeatedly by discretizing over the clustering cost space. We perform a sweep over the upper bound U values, setting its minimum value equal to the clustering cost of a vanilla clustering algorithm and its max value equal to a constant times the clustering cost of a vanilla clustering algorithm, for some chosen constant. We describe the approach formally in Algorithm 2, with an illustration in Figure <ref type="figure">4</ref>.</p><p>We compare the Pareto front recovered from Algorithm 2 with the Pareto front recovered from the dynamic programming approach in Algorithm 1 on all the real-world datasets in Figure <ref type="figure">2</ref> in Section 4, setting U max = U min &#8226; C, where U min is the clustering cost of a vanilla clustering algorithm (in our case, k-means++), C = 1.5, and N = 50. We note that setting C equal to 1 is equivalent to constraining the clustering cost in the FCBC algorithm to be equal to the vanilla clustering algorithm cost.</p><p>Running time comparison. We illustrate in Figure <ref type="figure">5</ref> the running time comparison between the dynamic programming approach from Algorithm 1, labeled as 'Dyn Progr', and the repeated FCBC approach from Algorithm 2, labeled as 'FCBC', for samples of different sizes of all the datasets. We showcase two objectives, GROUP UTILITARIAN and GROUP EGALITARIAN, since the original FCBC algorithm <ref type="bibr">[26]</ref> is designed only for these two, among the five objectives we described in Section 4. We note that Algorithm 1 has a similar running time for all other objectives. The experimental running time follows the analysis presented in Section 3: for k = 2 clusters and l = 2 sensitive attributes, one would expect a running time of O(n 2 ). Figure <ref type="figure">5</ref>: Running time comparison with our dynamic programming approach from Algorithm 1, labeled as 'Dyn Progr', and the repeated FCBC approach from Algorithm 2, labeled as 'FCBC', for each dataset (by column) and for the GROUP UTILITARIAN and GROUP EGALITARIAN objective (by row).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E Additional Experiments</head><p>We present experimental results on the three datasets and the five fairness objectives defined in Section 4 for k = 3 clusters in Figures <ref type="figure">6</ref> and <ref type="figure">7</ref>. We note that results are qualitatively similar as for k = 2 clusters. As mentioned in the main text, we note that the Pareto front need not be strictly convex for two minimization objectives, nor strictly concave for a minimization objective and a maximization objective (as in the case of the clustering objective and the balance objective), since it simply consists of the undominated points.</p><p>For the interested reader, we also showcase the Pareto front for the SUM OF IMBALANCES objective, for all three datasets, for k = 2 and k = 3, in Figure <ref type="figure">8</ref>. We note that since this objective is best suited for data with relatively equal proportions between the two groups, we subsample equal proportions of each gender for each of the three datasets, Adult, Census, and BlueBike. 700 725 750 775 800 Clustering cost 0.0 0.2 0.4 0.6 0.8 1.0 Proportional violation Adult Dyn Progr FCBC (a) Group Util 9000 10000 11000 Clustering cost 0.0 0.1 0.2 0.3 Proportional violation Census Dyn Progr FCBC (b) Group Util 410 420 430 440 Clustering cost 0.00 0.05 0.10 0.15 0.20 0.25 Proportional violation BlueBike Dyn Progr FCBC (c) Group Util 700 720 740 760 Clustering cost 0.0 0.2 0.4 0.6 Proportional violation Adult Dyn Progr FCBC (d) Group Egalit 9000 9500 10000 10500 Clustering cost 0.0 0.1 0.2 0.3 0.4 0.5 Proportional violation Census Dyn Progr FCBC (e) Group Egalit 410 420 430 Clustering cost 0.00 0.05 0.10 0.15 Proportional violation BlueBike Dyn Progr FCBC (f) Group Egalit Figure 7: Pareto front recovered by Algorithm 1 (labeled as 'Dyn Progr', in blue) and by Algorithm 2 (labeled as 'FCBC', in orange) for the Adult, Census, and BlueBike datasets (by column) and for the GROUP UTILITARIAN and GROUP EGALITARIAN objectives (by row), for k = 3 clusters. NeurIPS Paper Checklist 1. Claims Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? Answer: [Yes]</p><p>Justification: We provide a clear summary of our results in the abstract and introduction, stating all conditions needed for the results to hold. We summarize the experiments presented in the paper, with details in the main section and the Appendix.</p><p>Guidelines:</p><p>&#8226; The answer NA means that the abstract and introduction do not include the claims made in the paper. &#8226; The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers. &#8226; The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. &#8226; It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Limitations</head><p>Question: Does the paper discuss the limitations of the work performed by the authors?</p><p>Answer: [Yes]</p><p>Justification: The paper discusses all the necessary conditions for the results to hold. In particular, all our algorithms require that the first objective (the clustering objective) is based on a metric and that the second objective (the fairness objective) is pattern-based. Algorithm 1 also requires that the fairness objective is mergeable. We give theoretical bounds for the running time of all algorithms and we showcase the running time empirically for the algorithms present in the experimental section. We clearly state the datasets and parameters used. We discuss limitations in the discussion section in detail.</p><p>Guidelines:</p><p>&#8226; The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. &#8226; The authors are encouraged to create a separate "Limitations" section in their paper.</p><p>&#8226; The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. &#8226; The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. &#8226; The authors should reflect on the factors that influence the performance of the approach.</p><p>For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. &#8226; The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. &#8226; If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness. &#8226; While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren't acknowledged in the paper. The authors should use their best</p><p>Answer: [No] Justification: The results do not require error bars, as there is always the same set of Pareto points recovered for the Pareto front for the assignment problem when using a standard vanilla k-means clustering algorithm (in our case, k-means++) as an input to our algorithms. We subsample the data once and regarded fixed from there on. Guidelines:</p><p>&#8226; The answer NA means that the paper does not include experiments.</p><p>&#8226; The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. &#8226; The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions). &#8226; The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.) &#8226; The assumptions made should be given (e.g., Normally distributed errors). &#8226; It should be clear whether the error bar is the standard deviation or the standard error of the mean. &#8226; It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. &#8226; For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g., negative error rates). &#8226; If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Experiments Compute Resources</head><p>Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: We provide details on the machines used to run the experiments, the libraries needed, as well as an empirical running time analysis in the paper, both in the Experiments section as well as in the Appendix. Guidelines:</p><p>&#8226; The answer NA means that the paper does not include experiments.</p><p>&#8226; The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage. &#8226; The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute. &#8226; The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn't make it into the paper).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9.">Code Of Ethics</head><p>Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics <ref type="url">https://neurips.cc/public/EthicsGuidelines</ref>? Answer: [Yes] Justification: The paper conforms to the NeurIPS Code of Ethics. In particular, the datasets used are open widely used in related work; we do not use human participants; we discuss societal implications of our method with the purpose of improving fairness in clustering. Guidelines:</p><p>&#8226; The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.</p><p>&#8226; If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics. &#8226; The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="10.">Broader Impacts</head><p>Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?</p><p>Answer: [Yes] Justification: We discuss the broader impact of our paper, in particular potential use cases by practitioners who wish to use our method for decision-making in choosing an optimal trade-off between clustering and fairness objectives, in cases where the data points represent people.</p><p>Guidelines:</p><p>&#8226; The answer NA means that there is no societal impact of the work performed.</p><p>&#8226; If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact. &#8226; Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations. &#8226; The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster. &#8226; The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology. &#8226; If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="11.">Safeguards</head><p>Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?</p><p>Answer: [NA] Justification: The paper poses no such risks.</p><p>Guidelines:</p><p>&#8226; The answer NA means that the paper poses no such risks.</p><p>&#8226; Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters. &#8226; Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images. &#8226; We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.</p><p>12. Licenses for existing assets</p><p>&#8226; According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector. 15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained? Answer: [NA] Justification: The paper does not involve crowdsourcing nor research with human subjects. Guidelines: &#8226; The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. &#8226; Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper.</p><p>&#8226; We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution. &#8226; For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>We will treat f as a minimization objective, thus minimizing the unfairness of a clustering.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>Each clustering C maps to a pattern P C , with many different clusterings mapping to the same pattern.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>The BlueBike trip history dataset is retrieved from https://bluebikes.com/system-data.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3"><p>All code and data used in the paper can be found at this link.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4"><p>In particular, &#951; is dependent on the fairness objective, on U, and on the specific instance of the fair clustering problem (see Theorem</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_5"><p>6.1 in Esmaeili et al.<ref type="bibr">[26]</ref>). A potential limitation of this approximation is that &#951; is not efficiently computable in closed-form for a particular instance.<ref type="bibr">6</ref> The FCBC algorithm uses linear programming through the simplex method, with a worst-case running time of O(2 n ); in practice, it is faster than that, as noted in the Appendix.</p></note>
		</body>
		</text>
</TEI>
