<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Honor Among Bandits: No-Regret Learning for Online Fair Division</title></titleStmt>
			<publicationStmt>
				<publisher>NeurIPS</publisher>
				<date>10/11/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10548397</idno>
					<idno type="doi"></idno>
					
					<author>Ariel D Procaccia</author><author>Ben Schiffer</author><author>Shirley Zhang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means. Our goal is to maximize social welfare subject to allocating the goods fairly in expectation. When a player's value for an item is unknown at the time of allocation, we show that this problem reduces to a variant of (stochastic) multi-armed bandits, where there exists an arm for each player's value for each type of good. At each time step, we choose a distribution over arms which determines how the next item is allocated. We consider two sets of fairness constraints for this problem: envy-freeness in expectation and proportionality in expectation. Our main result is the design of an explore-then-commit algorithm that achieves Õ(T 2/3 ) regret while maintaining either fairness constraint. This result relies on unique properties fundamental to fair-division constraints that allow faster rates of learning, despite the restricted action space.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Fair allocation of indivisible goods is a fundamental problem with a wide range of applications; implemented algorithms for this task have been widely used in practice <ref type="bibr">[14]</ref>. We consider the online fair division setting, which introduces additional complexities as items arrive one by one and each item must be immediately and irrevocably allocated at its time of arrival. Crucially, this allocation must be done without knowledge of future items <ref type="bibr">[5]</ref>. One motivating example for this setting is a food bank that receives donations for a region and then allocates these donations among many different food pantries and soup kitchens in that region. Donations are often perishable, and therefore must be immediately allocated. Furthermore, donations can be unpredictable, and hence knowledge of future items is limited.</p><p>Two standard notions of fairness are envy-freeness and proportionality. Envy-freeness implies that every player is at least as happy with their own allocation as with any other player's allocation. Intuitively, envy-freeness guarantees that no player will want to trade their allocation for that of another player. Proportionality is a slightly weaker notion, which requires only that each of the n players receive at least a 1/n fraction of their total value for all items. Finding a solution which is envy-free or proportional is often interesting in and of itself, as can be seen from many previous results in fair division <ref type="bibr">[7,</ref><ref type="bibr">28,</ref><ref type="bibr">12,</ref><ref type="bibr">3]</ref>. In cases where there may exist multiple envy-free or proportional allocations, however, a natural goal is then to find the best solution among such allocations <ref type="bibr">[13]</ref>. In our work, we evaluate the quality of a fair solution by its (utilitarian) social welfare, which is defined as the sum over all players of each player's value for their own allocation.</p><p>We take a probabilistic approach to analyzing online fair division. In particular, we assume that there are a finite number of item types, and each player's value for each type of item is drawn from a random distribution. In practice, these distributions would not be known in advance and must be learned as items are allocated. For example, consider again the food bank. When a new food pantry opens, the values of that food pantry for different types of products are unknown. After items have been allocated to the food pantry, however, the food bank can easily collect information on the demand for various item types at the food pantry. Therefore, we primarily consider the setting where the player distributions are unknown in advance, and a player's true value for an item is observed if and only if that player receives the item. This problem can be viewed as a variant of the multi-armed bandits problem, as the goal is to learn unknown distributions (player values) while maintaining high reward (social welfare), subject to fairness constraints; with a finite number of types of items, pulling an arm represents allocating a specific item type to a specific player.</p><p>As is standard in the multi-armed bandits literature, we use the notion of regret to measure the difference between our algorithm's performance and that of the optimal policy that knows the value distributions and is subject to the same fairness constraints. Our overarching challenge is this: design online allocation algorithms that achieve low regret while maintaining fairness in the form of envy-freeness or proportionality.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Our Results</head><p>Our main result is that there exists a simple optimization-based explore-then-commit algorithm that achieves &#213;(T 2/3 ) regret and maintains envy-freeness in expectation (Algorithm 1 and Theorem 1). A variant of the same algorithm achieves &#213;(T 2/3 ) regret while maintaining proportionality in expectation. The key step of the algorithm is a linear program-based optimization that guarantees that the constraints are satisfied without significantly decreasing social welfare.</p><p>The main difficulty in this learning problem is that the envy-freeness and proportionality constraints depend on the unknown value distributions and may be tight constraints without any slack. We therefore develop novel machinery that relies on fundamental properties of these fairness notions. One observation is that our fairness constraints are always satisfied when players are treated equally. Another crucial property is that when players have unequal values, these fairness notions can be satisfied with slack (Property 2). The latter property is especially challenging to show for envyfreeness, and the combinatorial algorithm that achieves it (Lemma 1) should be of independent interest to researchers in fair division.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Related Work</head><p>Online Fair Division. Work in online fair division generally deals with dividing goods when there is uncertainty about the future. Early work in the area focused on axiomatic questions <ref type="bibr">[29,</ref><ref type="bibr">1]</ref>.</p><p>Our paper is most closely related to work by Benad&#232; et al. <ref type="bibr">[5]</ref>. Like us, they consider a setting where indivisible items arrive online and must be allocated immediately and irrevocably to players. They study several models for how the values of items are determined, ranging from a model where values are drawn i.i.d. from a distribution common to all players and items to, at the other extreme, an adversarial model with worst-case values. There are two fundamental differences between their work and ours. First, Benad&#232; et al. <ref type="bibr">[5]</ref> do not optimize social welfare; rather, they seek to either just minimize envy, or do so while (approximately) satisfying the axiomatic notion of Pareto efficiency. Second, and more crucially, they assume that the values of all players for an item are known at the time of its arrival, whereas in our model the values are unknown. It is precisely this modeling choice that induces a learning problem and underlies the connection between our setting and multi-armed bandits, which is absent in prior work in online fair division.</p><p>Fairness in Multi-armed Bandits. The other main body of literature related to our paper is multiarmed bandits with constraints. One notion of fairness in multi-armed bandits is the idea that similar individuals and/or groups should be treated similarly <ref type="bibr">[9,</ref><ref type="bibr">18,</ref><ref type="bibr">22]</ref>. The fairness constraint of Joseph et al. <ref type="bibr">[17]</ref> is that a worse arm is not pulled with higher probability than a better arm. Their definition of fairness is actually incompatible with maintaining envy-freeness (or proportionality), because maintaining envy-freeness may require allocating an item to a player with lower value to prevent envy. Another common fairness constraint in multi-armed bandits is that every arm receives a minimum fraction of pulls <ref type="bibr">[10,</ref><ref type="bibr">11,</ref><ref type="bibr">19,</ref><ref type="bibr">26]</ref>. This notion of fairness is also not compatible with envy-freeness because the optimal envy-free allocation may never give a player a specific item type. There also exist many other fairness notions in contextual bandits that are farther from our setting <ref type="bibr">[15,</ref><ref type="bibr">27,</ref><ref type="bibr">30,</ref><ref type="bibr">32]</ref>.</p><p>Wei et al. <ref type="bibr">[31]</ref> analyze a form of envy-freeness in contextual bandits, but their envy-freeness notion depends only on the treatment probabilities instead of the values.</p><p>Our paper is also closely related to work on multi-armed bandits subject to general linear constraints. Multiple works in linear bandits study "safety" with respect to a linear constraint that depends on the unknown true mean values <ref type="bibr">[2,</ref><ref type="bibr">8,</ref><ref type="bibr">24]</ref>. Amani et al. <ref type="bibr">[2]</ref> focus on a single constraint and specifically show that if there is positive slack in the optimal solution, then &#213;(T 1/2 ) regret is possible. If there is zero slack, however, their algorithm only achieves &#213;(T 2/3 ) regret. This differs from our work because envy-freeness and proportionality involve multiple constraints that can have zero slack. Note that our setting is similar but not equivalent to linear bandits, as a single arm is pulled in each step in our setting. There also exist many results for cumulative constraints in bandits <ref type="bibr">[20,</ref><ref type="bibr">21]</ref>. These are less closely related to our model as we consider constraints that must hold at every time step. Finally, there is a branch of multi-armed bandits that studies constraints in expectation at each step as in our paper. However, these works are also in the linear bandits setting and again require a safety gap that fairness constraints such as envy-freeness may not guarantee <ref type="bibr">[25]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Model</head><p>In this section we introduce our basic setting and terminology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Online Allocation With Unknown Values</head><p>Suppose we have a set of players N = [n] and a set of object types M = [m]. Given a set of T indivisible items, an allocation A = (A 1 , ..., A n ) is a partition of the T items among the n players, where player i receives the items in A i . In our model, we assume that every item j has a type k(j) &#8712; [m], and that there exists a (possibly unknown) matrix &#181; * such that each player i's value for an item of type k &#8712; [m] is independently drawn from a sub-Gaussian distribution with mean &#181; * ik . This distribution is independent of both player i's distributions for other item types and other players' distributions. For a specific item j, we denote player i's value for item j as V i (j), and similarly, player i's value for their allocation</p><p>We consider algorithms in the following online setting. At each time step t &#8712; [T ], an item j t of type k t arrives, where k t &#8764; D, for some known distribution D supported on [m]. We will assume that D = Unif([m]), or in other words that every item has an equal probability of being type 1, ..., m. We make this choice purely for ease of exposition, in order to simplify notation; our results and techniques extend seamlessly to arbitrary distributions D that do not depend on T , as we explain in Appendix C.1. The algorithm observes the item type k t , and must then immediately allocate the item j t to a player i t , at which time the algorithm observes that player's value V it (j t ). Note that the algorithm does not observe any other player's value for item j t . The high-level goal is to allocate these items in a manner that maximizes the social welfare of the final allocation of all T items. We denote X &#8712; R n&#215;m as a valid fractional allocation if i X ik = 1 for every k &#8712; [m]. One valid fractional allocation we will often refer to is the uniform at random allocation (UAR), where every entry is 1 n . At every time-step t before observing k t , the allocation algorithm ALG takes as input the history</p><p>t} and returns a fractional allocation X t = ALG(H t ), where (X t ) ik represents the probability of allocating the item to player i if the item is of type k. If the next item is of type k t , then the algorithm allocates the item randomly among the n players according to the distribution induced by the k th t column of X t , i.e. (X &#8868; t ) kt . Therefore, the t th item is allocated to player i with probability (X t ) ikt . We denote the final realized allocation that ALG returns as A(ALG), and the corresponding partial allocation up to time &#964; as A &#964; (ALG). This online process is summarized in pseudo-code in Appendix A.</p><p>We will also assume (explicitly in our theorem statements) that for all i, k, there exist known constants a, b &gt; 0 such that a &#8804; &#181; * ik &#8804; b. Intuitively, this is equivalent to assuming that players have non-zero values for all item types and that their values have a finite upper bound. This assumption is necessary because if we allow the means of values to be arbitrary close to zero, then it can be impossible to achieve regret of o(T ). This is formalized in Theorem 10 in Appendix C.2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Fairness Notions</head><p>We will primarily use two metrics of fairness to evaluate an online allocation algorithm ALG: envyfreeness in expectation and proportionality in expectation. Both are defined below. For two vectors x, y &#8712; R n , we use &#10216;x, y&#10217; = x &#8226; y to represent the dot product of the two vectors. Definition 1. Let X t = ALG(H t ) be the fractional allocation used by algorithm ALG at time t given history H t . Then ALG satisfies envy-freeness in expectation (EFE) if for all t and all H t ,</p><p>Let X t = ALG(H t ) be the fractional allocation used by algorithm ALG at time t given history H t . Then ALG satisfies proportionality in expectation (PE) if for all t and all histories</p><p>Intuitively, envy-freeness in expectation is equivalent to maintaining that at every time step t and before observing the item type k t , no player prefers the fractional allocation of any other player in X t . Similarly, proportionality in expectation is equivalent to maintaining that at every time step t and before observing the item type k t , the expected value of every player for their fractional allocation is at least 1/n times that player's value if they received the item with probability 1.</p><p>In Appendix B, we justify some of the implicit choices behind these definitions. Specifically, we discuss why we consider envy-freeness in expectation rather than its realization, and also why we require envy-freeness in expectation to hold at every individual time step. Analogous results for proportionality can be found in Appendix B.1. For the former question, Theorem 2 shows that in our setting, no algorithm can with high probability output an allocation A(ALG) with realized envy less than &#8730; T . Note that Benad&#232; et al. <ref type="bibr">[5]</ref> show that in the adversarial setting, no algorithm can guarantee o( &#8730; T ) realized envy. Conversely, they also show that when values are generated randomly and observed before allocation, there exists an algorithm that can guarantee o(1) realized envy with high probability. Theorem 2 shows that when values are still generated randomly but are unknown at the time of allocation (as in our setting), no algorithm can guarantee o( &#8730; T ) realized envy with high probability. We complement Theorem 2 with Theorem 3, which shows that any algorithm ALG that satisfies envy-freeness in expectation will output a final allocation A(ALG) with realized envy of at most</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8730;</head><p>T log(T ) with high probability. Therefore, envy-free in expectation algorithms are within a log(T ) factor of being "optimal" in terms of final realized envy.</p><p>We also show that requiring envy-freeness in expectation at every time step does not lead to any social welfare loss compared to requiring envy-freeness in expectation only at the end of T rounds. More specifically, Theorem 4 (again in Appendix B) implies that requiring that no player is envious in expectation of any other player at the end of all T rounds is equivalent to maintaining envy-freeness in expectation at all times t &#8712; [T ] when maximizing social welfare. A key step of our proof of Theorem 4 is showing that for every time-or history-dependent algorithm ALG which achieves envy-freeness in expectation at the end of T rounds, there exists another algorithm ALG &#8242; that is time-and history-independent, envy-free in expectation at every time step, and achieves the same social welfare. Therefore, maximizing social welfare only over algorithms which are envy-free in expectation at every time step is sufficient even if envy-freeness in expectation at the end of T rounds is all that is desired.</p><p>We can formulate our fairness notions as linear constraints, in the spirit of prior work in fair division <ref type="bibr">[4]</ref>. Formally, define &#10216;A, B&#10217; F as the Frobenius inner product of matrices A and B. For B &#8712; R n&#215;m , c &#8712; R, and a fractional allocation X, we represent the linear constraint &#10216;B, X&#10217; F &#8805; c as the tuple (B, c). A fractional allocation X satisfies a set of L linear constraints {(B &#8467; , c &#8467; )} L &#8467;=1 if for all &#8467; &#8712; [L], &#10216;B &#8467; , X&#10217; F &#8805; c &#8467; . Because the constraints represent "fairness in expectation" relative to the mean values, we will explicitly let the constraint matrix B &#8467; (&#181; * ) be a function of the mean value matrix &#181; * . Therefore, we will consider sets of constraints of the form {(B &#8467; (&#181; * ), c &#8467; )} L &#8467;=1 . Because these constraints are functions of &#181;, we will also refer to families of constraints {(B &#8467; (&#181;), c &#8467; )} L &#8467;=1 &#181; .</p><p>The following two remarks show how envy-freeness in expectation and proportionality in expectation can be represented within this framework.</p><p>Then the envy-freeness in expectation constraints for mean &#181; * as defined in Definition 1 correspond to efe(&#181; * ) := {(B efe &#8467; (&#181; * ), 0)} n 2 &#8467;=1 .</p><p>Remark 2. For every &#8467; &#8712; [n], construct B pe &#8467; (&#181; * ) as follows. For every k &#8712; [m], (B pe &#8467; (&#181; * )) &#8467;k =</p><p>(n-1) n</p><p>&#8226; &#181; * &#8467;k and (B pe &#8467; (&#181; * )) ik = -1 n &#8226; &#181; * &#8467;k for every i &#824; = &#8467;. Then the proportionality in expectation constraints for mean &#181; * as defined in Definition 2 correspond to pe(&#181; * ) = {(B pe &#8467; (&#181; * ), 0)} n &#8467;=1 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Regret and Problem Formulation</head><p>An algorithm ALG satisfies constraints {(B &#8467; (&#181; * ), c &#8467; )} L &#8467;=1 if for all t &#8712; [T ], the fractional allocation X t used by ALG at time t satisfies the constraints {(B &#8467; (&#181; * ), c &#8467; )} L &#8467;=1 . When &#181; * is known, the expected social welfare can be directly optimized over all algorithms ALG that satisfy constraints {(B &#8467; (&#181; * ), c &#8467; )} L &#8467;=1 . This problem is equivalent to solving LP (1) with &#181; = &#181; * and using the solution Y &#181; * as the fractional allocation for all time steps.</p><p>When &#181; * is unknown, we define the regret of an algorithm ALG as follows. Note that the baseline algorithm in this definition of regret is the optimal allocation algorithm under the constraints when &#181; * is known. Definition 3. Let Y &#181; * be the solution to LP (1) for &#181; = &#181; * . Let X t = ALG(H t ) be the fractional allocation used by algorithm ALG at time t given history H t . Then the T -step regret of ALG for constraints</p><p>We are now ready to present the formal problem statement. Problem 1. Suppose we are given n, m, T, a, b such that 0</p><p>. Given a family of fairness constraints {(B &#8467; (&#181;), c &#8467; )} L &#8467;=1 &#181; representing either envy-freeness in expectation or proportionality in expectation, the goal is to construct an algorithm ALG such that with probability 1 -1/T , X t = ALG(H t ) satisfies the constraints (B &#8467; (&#181; * ), c &#8467; )} L &#8467;=1 for all t &#8712; [T ] and the regret of ALG for constraints (B &#8467; (&#181; * ), c &#8467; )} L &#8467;=1 is o(T ). Note that the o(T ) regret in Problem 1 will be &#213;(T 2/3 ) for our results. We use the standard O() and &#213;() notation with respect to the number of time steps T , and therefore the constants represented by this notation may depend on other problem parameters such as n and m.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Fairness Machinery</head><p>Our goal in this section is to establish novel, fundamental properties of envy-freeness and proportionality in expectation, which will serve as a crucial part of the machinery underlying our regret bounds.</p><p>In the context of fairness, a natural assumption is that if a group of individuals are treated equally, then that group is considered to be treated fairly. In that spirit, our first key property is as follows. Property 1. For any &#8467; &#8712;</p><p>Informally, a set of constraints satisfies Property 1 if for any constraint in the set, the constraint is always satisfied when all players involved in the constraint have the same fractional allocation. An important consequence of Property 1 is that the uniform at random allocation satisfies every constraint. This implies that even with no information about the players' mean values, the uniform at random allocation will always be fair.</p><p>Note that envy-freeness in expectation satisfies Property 1 because any two players with equal allocations are never envious of each other. Proportionality in expectation also satisfies Property 1, because if every player has the same allocation, then every player is receiving exactly their proportional allocation. Observation 1. The envy-freeness in expectation and proportionality in expectation constraints satisfy Property 1.</p><p>Our second key property is more technical and novel. Intuitively, the property requires the existence of a fractional allocation X &#8242; that is only slightly worse than the optimal constrained allocation Y &#181; , but unlike the latter allocation, in X &#8242; the constraints either hold with slack or all players involved in the constraint are treated equally. Interestingly, this property does not hold for arbitrary sets of linear constraints, but relies on structure inherent to the envy-freeness in expectation and proportionality in expectation constraints. The bulk of the theoretical work of this paper is proving that the envy-freeness in expectation and proportionality in expectation constraints satisfy this property. Property 2. Let Y &#181; be the solution to LP <ref type="bibr">(1)</ref>. Then there exists a &#947; 0 such that for any &#947; &lt; &#947; 0 and any &#181;, there exists a fractional allocation X &#8242; such that &#10216;X &#8242; , &#181;&#10217; F &#8805; &#10216;Y &#181; , &#181;&#10217; F -C P2 &#947; for constant C P2 &gt; 0, and such that for each &#8467; &#8712; [L], either</p><p>The envy-freeness in expectation constraints satisfy Property 2.</p><p>Proof sketch. We will informally argue that we can transform Y &#181; into a fractional allocation X &#8242; which satisfies Property 2 through Algorithm 3. Algorithm 3 iterates over 'envy-with-slack-&#945;' graphs, which track whether a player prefers their allocation by at least &#945; over another player's allocation. More specifically, given parameters &#181;, X, and &#945;, the corresponding 'envy-with-slack' graph has vertices N and edge set E such that a directed edge from i to i &#8242; exists if and only if</p><p>At a high level, Algorithm 3 constructs 'envy-with-slack-&#945;' graphs with progressively smaller &#945;, with &#945; &#8805; &#947; for all iterations. The algorithm operates on sets of nodes called equivalence classes, where every pair of nodes in an equivalence class has the same allocation. Algorithm 3 makes progress in every iteration by either 1) merging two equivalence classes, or 2) removing an edge from the graph.</p><p>An overview of the algorithm is as follows. In each iteration, Algorithm 3 generally performs one of three operations and decreases &#945;. First, if there exists an equivalence class S with at least one incoming edge but no outgoing edges, then operation remove-incoming-edge transfers allocation probability from nodes in S to all other nodes. This will remove all incoming edges to S. If such an equivalence class does not exist, then Algorithm 3 finds a special type of directed cycle in the 'envy-with-slack' graph. The directed cycle is chosen so that the outgoing edge of each node i in the cycle is among i's outgoing edges with minimal weight. Therefore, each node i in the cycle is pointing to an i &#8242; &#8712; N for whom i has the least slack. If there exists some node i * &#8712; N which has an edge to some but not all of the nodes in the cycle, then operation cycle-shift gives each node in the cycle half of its current allocation and half of the next node's allocation. This will remove an outgoing edge from i * . If such a node does not exist, then Algorithm 3 either decreases &#945; to remove an edge or creates a new equivalence class by merging all equivalence classes that the nodes in the cycle belong to via operation average-clique.</p><p>However, such a merge may lead to envy, which is removed by Algorithm 4. Each call to Algorithm 4 removes envy from at least one edge. Algorithm 4 does so by first carefully redistributing allocation among the nodes until there exists a cycle where each node has non-negative envy (which is equivalent to a cycle with non-positive slack). Each node in the cycle is then given the allocation of the next node in the cycle. We prove that each call to Algorithm 4 decreases the number of edges with envy, while not increasing the number of edges in the 'envy-with-slack' graph. Furthermore, Algorithm 4 does not significantly decrease the social welfare of the allocation.</p><p>The three operations and Algorithm 4 each take as input an allocation X and returns a new allocation X &#8242; which is close in social welfare to X. Furthermore, each iteration begins with an envy-free allocation, and the size of the edge set of the 'envy-with-slack' graphs never increases throughout the algorithm. The maximum size of an equivalence class is n, so an edge must be removed from the 'envy-with-slack' graph every n iterations. There are at most n 2 edges, and the algorithm therefore terminates in at most n 3 iterations with an allocation which satisfies Property 2. For the numerous details, see Appendix F. Lemma 2. The proportionality in expectation constraints satisfy Property 2.</p><p>Proof sketch. Define the slack S i of a player i for an allocation as the amount by which that player's value for their allocation is greater than their proportional value. In other words, the slack is the amount of welfare a player can lose and still satisfy the proportionality constraint. We construct the fractional allocation X &#8242; in one of two different ways depending on the amount of total slack for the allocation Y &#181; across all n players.</p><p>If the amount of total slack across all n players is less than bn&#947; a , then we take X &#8242; = UAR. Note that the total slack is equivalent to the change in social welfare between Y &#181; and UAR. Therefore, because the total slack was less than bn&#947; a , the difference in social welfare between Y &#181; and UAR is at most bn&#947; a which is O(&#947;). Furthermore, by definition the UAR allocation satisfies option 2 of Property 2 for all constraints &#8467;.</p><p>If the amount of total slack is greater than bn&#947; a , then we construct X &#8242; from Y &#181; by transferring allocation away from players with slack greater than the required &#947; and redistributing this allocation so that every player has slack of at least &#947;. Specifically, each player i loses &#8710; ik of their allocation for item k, where</p><p>The allocation X &#8242; is then constructed as</p><p>Intuitively, this can be viewed as each player putting a part of their allocation (proportional to S i &#8226; &#947;) into a communal "pot." The pot, consisting of</p><p>for item k, is then divided evenly among all n players to form X &#8242; . By construction, no player loses more than S i social welfare when the pot is created, and every player receives at least &#947; additional social welfare when the pot is redistributed. Therefore, in the resulting allocation X &#8242; , every player prefers their allocation to their proportional value by at least &#947;, i.e. each player has a slack of at least &#947; for X &#8242; . Furthermore, the total difference in social welfare between Y &#181; and X &#8242; is at most O(&#947;). We have thus shown that in both cases, X &#8242; will satisfy all of the desired conditions. The full proof is relegated to Appendix E.</p><p>It will be useful to introduce two further properties that are immediately satisfied by the definitions of envy-freeness in expectation and proportionality in expectation. Property 3 guarantees a form of Lipschitz continuity in &#181; for the constraints. This is unsurprising, as the entries in the matrices B &#8467; (&#181;) for both envy-freeness in expectation and proportionality in expectation are linear in the entries of &#181;. Property 4 guarantees that the non-zero entries in the constraint matrices stay the same for all values of &#181;, which follows directly from Remarks 1 and 2 and the fact that &#181; ik is bounded away from 0. Property 3. There exists a K &gt; 0 such that</p><p>Observation 2. The envy-freeness in expectation and proportionality in expectation constraints satisfy Property 3.</p><p>Observation 3. The envy-freeness in expectation and proportionality in expectation constraints satisfy Property 4.</p><p>Recall that Property 2 implies that for every constraint &#8467;, either the constraint &#8467; has a slack of at least &#947; for X &#8242; or every player involved in constraint &#8467; is treated equally under allocation X &#8242; . A slack of &#947; in the constraint guarantees constraint satisfaction for all &#181; &#8242; close to &#181; if the constraints are continuous in &#181;. Treating every player equally for a given constraint also guarantees that the constraints are satisfied for all &#181; &#8242; by Property 1. Therefore, Properties 1 and 2 together with continuity (Property 3) imply that there exists a fractional allocation X &#8242; such that the social welfare of X &#8242; is close to the social welfare of Y &#181; and such that X &#8242; not only satisfies the constraints for &#181;, but also satisfies the constraints for any &#181; &#8242; close to &#181;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Algorithm and Regret Bounds</head><p>In this section, we present our main result, an explore-then-commit algorithm which achieves &#213;(T 2/3 ) regret while maintaining either proportionality in expectation or envy-freeness in expectation. The key step in Algorithm 1 is the optimization in LP (3) to guarantee that the fairness constraints are satisfied with high probability. For &#181; &#8712; R n&#215;m + and &#1013; &#8712; R n&#215;m + , we define the confidence region</p><p>for t &#8592; T 2/3 to T -1 do At time t, use fractional allocation X t = X. end for Proof sketch. We have already shown in Section 3 that both envy-freeness in expectation and proportionality in expectation satisfy Properties 1, 2, 3, and 4. Therefore, it suffices to show that Algorithm 1 achieves &#213;(T 2/3 ) regret for any family of constraints {{(B &#8467; (&#181;), c &#8467; )} L &#8467;=1 } &#181; that satisfies Properties 1, 2, 3, and 4.</p><p>The allocations used during the warm-up steps of Algorithm 1 are uniform at random, and therefore these allocations satisfy the constraints {(B &#8467; (&#181;), c &#8467; )} L &#8467;=1 for all &#181; by Property 1. Because the fractional allocations used in the first T 2/3 steps are all UAR, each arm, or (player, item) pair, will be sampled with probability 1  nm at each step. This implies by Hoeffding's inequality that with high probability, N ik = &#937;(T 2/3 ) for all i, k. The i, k entry in the &#1013; matrix is proportional to</p><p>, and therefore &#8741;&#1013;&#8741; 1 = &#213;(T -1/3 ) with high probability. Because the value distributions are sub-Gaussian, a standard application of Hoeffding's inequality also gives that with high probability, the true mean matrix will be within our confidence region, i.e. &#181; * &#8712; &#956; &#177; &#1013;. To summarize, because we used UAR for the first T 2/3 steps, we have that</p><p>For the rest of the proof we will assume that the high probability event in the equation above holds. The next step is to show that X has &#8741;&#1013;&#8741; 1 per-step regret compared to Y &#181; * . Let K be the Lipschitz constant of Property 3. Using Property 2 with &#181; = &#181; * and &#947; = 2K&#8741;&#1013;&#8741; 1 = &#213;(T -1/3 ) gives that there exists an allocation X &#8242; such that the social welfare of X &#8242; is only O(&#8741;&#1013;&#8741; 1 ) less than the social welfare of the optimal allocation Y &#181; * and such that every constraint &#8467; either has slack of at least 2K&#8741;&#1013;&#8741; 1 (option 1 of Property 2) or every player is treated equally in constraint &#8467; (option 2 of Property 2). We will now show that X &#8242; is a solution to LP (3). If option 1 holds for constraint &#8467; &#8712; [L], then by Property 3, X &#8242; will satisfy the constraint (B &#8467; (&#181;), c &#8467; ) for every &#181; &#8712; &#956; &#177; &#1013;. Formally, if option 1 holds for constraint &#8467;, then for any &#181; &#8712; &#956; &#177; &#1013;,</p><p>[Property 2: option 1]</p><p>If option 2 holds for constraint &#8467; &#8712; [L], then Properties 1 and 4 together guarantee that X &#8242; will satisfy the constraint (B &#8467; (&#181;), c &#8467; ) for every &#181; &#8712; &#956; &#177; &#1013;. Therefore, X &#8242; will satisfy all of the constraints {B &#8467; (&#181;), c &#8467; } L &#8467;=1 for every &#181; &#8712; &#956; &#177; &#1013;, which implies that X &#8242; is a solution to LP (3). Finally, because X is the optimal solution to LP (3), X must have higher social welfare than X &#8242; under means &#956;. Because &#8741;&#181; * -&#956;&#8741; 1 &#8804; &#8741;&#1013;&#8741; 1 , this implies that X must have at most O(&#8741;&#1013;&#8741; 1 ) less social welfare than X &#8242; under the true means &#181; * . An application of the triangle inequality gives that,</p><p>Thus, the total regret for the steps after the warm-up period is &#213;(T 2/3 ). The regret of the warm-up period is at most (b -a)T 2/3 due to the assumption that the mean values are bounded. We can therefore conclude that the total regret is &#213;(T 2/3 ), and this completes the proof of regret. Finally, we note that by construction of LP (3), if &#181; * &#8712; &#956; &#177; &#1013; then the chosen fractional allocations X must also satisfy the constraints {(B &#8467; (&#181; * ), c &#8467; )} L &#8467;=1 as desired. See Appendix D for the full proof.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Discussion</head><p>We conclude by discussing some limitations and open questions. First, Algorithm 1 involves solving a linear program with an infinite number of linear constraints. Linear programs with an infinite number of constraints (called semi-infinite programs) are well-studied and occur in many applications <ref type="bibr">[16,</ref><ref type="bibr">23]</ref>.</p><p>We also note that a finite number of (exponentially many) constraints suffices for envy-freeness in expectation and proportionality in expectation by bounding all of the possible extreme values of &#1013;. Nevertheless, we opted to avoid this representation because it significantly complicates the presentation of the algorithm. Furthermore, there also exists a polynomial time separation oracle for determining whether an allocation satisfies the infinitely many constraints, which would allow techniques such as the Ellipsoid Method <ref type="bibr">[6]</ref> to solve the linear program in polynomial time.</p><p>Second, while the regret coefficients for proportionality are polynomial in n, a practical limitation of Algorithm 1 for envy-freeness is that the &#213; term is exponential in n. We expect, however, that the worst-case bound we present in Lemma 1 is far from tight. Whether there exists a bound on the regret that is polynomial in n for learning under envy-freeness in expectation constraints is an open question for future work.</p><p>The other natural question that remains open for future work is whether we can achieve &#213;( &#8730; T ) regret while maintaining envy-freeness in expectation or proportionality in expectation. If the optimal solution Y &#181; * has a positive slack in every constraint, then an upper confidence bound (UCB) approach would be likely to work. Unfortunately, the fairness constraints for envy-freeness in expectation and proportionality in expectation are often tight for the optimal allocation. Furthermore, the constraints have a constant (greater than 1/n) dependence on every unknown value in the &#181; * matrix. Therefore, every mean value &#181; * ik might need to be learned with high accuracy even if the optimal allocation does not allocate item type k to player i.</p><p>We also note that there exist additional (albeit less prominent) fairness notions for the problem of online fair division, such as equitability, which may satisfy additional properties that allow for lower regret. We leave the question of studying more fairness notions for future work.</p><p>Finally, a broader question is whether the connection we have established between multi-armed bandits and online fair division might be leveraged to give a fresh perspective on additional problems in this area, such as online cake cutting <ref type="bibr">[29]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A Algorithmic Representation of Model</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B Motivating Fairness in Expectation</head><p>In this section, we will explore the relationship between envy-freeness in expectation and realized envy, as well as the relationship between requiring envy-freeness in expectation at every time step and only requiring envy-freeness in expectation at the end of round T . For this section, we will use the following notation. For any algorithm ALG, we denote X t = ALG(H t ) as the fractional allocation used at time t, and Pr ALG (i, k, H t ) := (X t ) ik .</p><p>Previous works on fair online allocation of indivisible goods have focused on the fairness of the final, realized allocation instead of studying fairness in expectation as in Definitions 1 and 2 <ref type="bibr">[5]</ref>. We define the realized envy of an allocation below. Definition 4. The realized envy of allocation A at time</p><p>We show in the following theorems that algorithms which are envy-free in expectation are within a log(T ) factor of being "optimal" in terms of final realized envy. Informally, Theorem 2 shows that in our setting, no algorithm can with high probability output an allocation A(ALG) with realized envy less than &#8730; t. Conversely, Theorem 3 shows that any algorithm ALG that satisfies envy-freeness in expectation will output a final allocation A(ALG) with realized envy of at most</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8730;</head><p>T log(T ) with high probability. Theorem 2. Suppose &#181; * is known. For any algorithm ALG and for any &#964; &#8712; [T ], with probability at least 1/16 the allocation A &#964; (ALG) has realized envy of more than &#8730; &#964; .</p><p>proof. Assume there are two players and only one item type, and assume that all values are drawn from N (&#181;, 1). Fix &#964; &#8805; log 2 (T ). As in Algorithm 2, define i t &#8712; {1, 2} as the player allocated the item at time t. The realized envy of the two players can be written as:</p><p>Realized Envy of Player</p><p>Realized Envy of Player 2 at time &#964; = &#964; t=0</p><p>Let v a i be the value of the assigned player for item i, and v u i be the value of the unassigned player for item i. Note that v a i , v u i &#8764; N (&#181;, 1), because at the time of allocation, ALG does not know either player's value for the item, and player values are therefore independent of the assignment.</p><p>By this coupling argument, Equations ( <ref type="formula">4</ref>) and ( <ref type="formula">5</ref>) can be rewritten as:</p><p>By the Central Limit Theorem, &#964; i=0 v a i and &#964; i=0 v u i are both N (&#964; &#8226; &#181;, &#964; ). Therefore, for any &#964; , with probability at least 1/16, we have that</p><p>Putting these equations together, we have that</p><p>However, this result with Equations ( <ref type="formula">6</ref>) and <ref type="bibr">(7)</ref> implies that the envy must be at least &#8730; &#964; for any choice of {i t }. Therefore, we have shown that with probability at least 1/16, the envy at time &#964; will be at least &#8730; &#964; for any possible algorithm.</p><p>Theorem 3. Suppose &#181; * is known. Also assume that all of the value distributions are bounded by a constant B. If ALG is deterministic (i.e. ALG(H t ) is a deterministic function of H t ) and satisfies envy-freeness in expectation, then with probability 1 -o(1/T ), the realized envy of A &#964; (ALG) at every time &#964; &#8712; [T ] is at most &#8730; &#964; log(T ).</p><p>proof. We will bound the realized envy of A(ALG) between any two specific players i, i &#8242; , which will then imply a bound on the realized envy of A(ALG) by a union bound. The key observation is that each round of Algorithm 2 consists of first a random draw from D to determine the item type k t and then a draw from &#958; t &#8764; Unif([0, 1]) which determines the player to which the item is allocated based on X t = ALG(H t ). Formally, the t th item is allocated to player i if</p><p>Define {v ti &#8242;&#8242; } n i &#8242;&#8242; =1 as the values of the players for the item at time t. This is the final source of randomness in round t. Therefore, the allocation of any player at time &#964; is a function of the random variable sequence {(k t , &#958; t , {v ti &#8242;&#8242; } n i &#8242;&#8242; =1 )} &#964; -1 t=0 . Let E &#964; represent the envy accrued by player i for player i &#8242; up until but not including time &#964; . Then</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Now we will apply McDiarmid's inequality to the function</head><p>Therefore, we can apply McDiarmid's inequality to get that</p><p>&#8804; T -log(T )/(2B 2 ) .</p><p>Since ALG is envy-free in expectation, we know that</p><p>Therefore, we must have that</p><p>Taking a union bound over all pairs i, i &#8242; and all &#964; &#8712; [T ], we have that</p><p>Therefore, with probability 1 -o(1/T ), the realized envy at evert time &#964; is at most &#8730; &#964; log(T ).</p><p>Note that Theorem 2 does not contradict the results of Benad&#232; et al. <ref type="bibr">[5]</ref>, who achieve envy-freeness of the realized allocation with high probability when the true player values for the items are known at time of allocation (as opposed to our model which only knows the item type). When player values for item j t are known before assignment, Theorem 2 does not apply.</p><p>We also defined envy-freeness in expectation as a constraint which needs to hold at every time step t.</p><p>In some fair division applications, we may only care about "fairness" of the total allocation at the end of the process. Therefore, we could instead only require that no player is envious in expectation of any other player at the end of all T rounds. However, Theorem 4 shows that this is equivalent to maintaining envy-freeness in expectation at all times t &#8712; [T ] when maximizing social welfare. Theorem 4. Suppose &#181; * is known. Let E be the class of all algorithms that are envy-free in expectation and let F be the class of all algorithms which satisfy</p><p>proof. By definition, E &#8838; F , which proves one direction of the desired equality. We will now show that for any ALG &#8712; F , there exists an algorithm ALG &#8242; &#8712; E such that</p><p>First, observe that by the definition of F , we have that for all i, i &#8242; ,</p><p>By linearity of expectation, Equation ( <ref type="formula">8</ref>) is equivalent to</p><p>Furthermore, by definition</p><p>We will construct the algorithm ALG &#8242; as follows. Suppose ALG &#8242; is time-independent and historyindependent, such that for all t, H t ,</p><p>The expected social welfare of ALG &#8242; is</p><p>Therefore, ALG and ALG &#8242; have the same expected social welfare. Finally, we need to show that ALG &#8242; &#8712; E. This is equivalent to showing that for all t and H t ,</p><p>Starting with the LHS and plugging in the definition of ALG &#8242; , we have that</p><p>as desired. Therefore, we have shown that ALG &#8242; &#8712; E.</p><p>Theorem 5. The algorithm which maximizes expected social welfare subject to EFE up to time T is time-independent, history-independent, and can be calculated in polynomial time.</p><p>proof. By Theorem 4, there exists an optimal algorithm that satisfies envy-freeness in expectation and that is time-independent and history-independent. To find the best such fractional allocation, all we must do is solve LP (1) with the envy-freeness in expectation constraints.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B.1 Proportionality</head><p>Theorems 2-5 also have equivalent forms for proportionality. We define the realized proportionality gap as the equivalent of envy for the proportionality constraints. This implies that a proportional allocation has non-positive proportionality gap. Definition 5. The realized proportionality gap of allocation A at time &#964; is max i</p><p>As in Theorems 2 and 3 , the following two results give that algorithms which satisfy proportionality in expectation are within a log(T ) factor of optimal for the realized proportionality gap. Theorem 6. Suppose &#181; * is known. For any algorithm ALG and for any &#964; &#8712; [T ], with probability at least 1/16 the allocation A &#964; (ALG) has realized proportionality gap of more than &#8730; &#964; .</p><p>proof. As in the proof of Theorem 2, assume there are two players and only one item type, and assume that all values are drawn from N (&#181;, 1). Then the realized proportionality gap of the two players can be written as:</p><p>Realized proportionality gap of Player</p><p>Realized proportionality gap of Player 2 at time &#964; = 1 2 &#964; t=0</p><p>) These equations only differ from those of realized envy by a scalar factor, and therefore the rest of the proof follows exactly as in Theorem 2.</p><p>Theorem 7. Suppose &#181; * is known. Also assume that all of the value distributions are bounded by a constant B. If ALG is deterministic (i.e. ALG(H t ) is a deterministic function of H t ) and satisfies proportionality in expectation, then with probability 1 -o(1/T ), the realized proportionality gap of A &#964; (ALG) at every time &#964; &#8712; [T ] is at most &#8730; &#964; log(T ).</p><p>proof. In this proof, we can let E t be the accrued "proportionality gap" of any player i. Then as in Theorem 3, an application of McDiarmid's inequality allows us to bound the realized proportionality gap with high probability for all &#964; .</p><p>The following two theorems are analogs of Theorem 4 and Theorem 5 for envy-freeness in expectation. Together, these theorems imply that maximizing social welfare subject to proportionality in expectation at every time step is equivalent to maximizing social welfare subject to proportionality in expectation only at the end of round T . Theorem 8. Suppose &#181; * is known. Let E be the class of all algorithms that are proportional in expectation and let F be the class of all algorithms which satisfy</p><p>proof. Suppose ALG &#8712; F . We will define ALG &#8242; as in Theorem 4 to be</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>By this construction, ALG</head><p>Because E &#8838; F and because we showed in Theorem 4 that ALG &#8242; and ALG have the same expected social welfare, the desired result follows.</p><p>Theorem 9. The algorithm which maximizes expected social welfare subject to PE up to time T is time-independent, history-independent, and can be calculated in polynomial time.</p><p>proof. By Theorem 8, there exists an optimal algorithm that satisfies proportionality in expectation that is time-independent and history-independent. To find the best such fractional allocation, all we must do is solve LP (1) with the proportionality in expectation constraints.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C Additional Model Notes C.1 Choice of D</head><p>In the body of the paper, we focus on the case when D is the uniform distribution over item types. However, our results generalize to any distribution D which does not depend on T . In this case, the social welfare of a fractional allocation X becomes i,k</p><p>For mean values &#181; * , define &#181; &#8242; = f D (&#181; * ). Then social welfare of a fractional allocation X with means &#181; * and distribution D is then simply 1 n &#10216;X, &#181; &#8242; &#10217; F as in the uniform D case. Similarly, the envy-freeness in expectation or proportionality in expectation constraints on X with means &#181; * and item distribution D can be represented as &#10216;B &#8467; (&#181; &#8242; ), X&#10217; F &#8805; c &#8467; , which is an equivalent form to the constraints when D is uniform. Therefore, for arbitrary D we can use Algorithm 1 with only two slight modifications. The first is we must transform the &#956;, &#181;, and other components of the linear programs using the function f D . The second modification is that we potentially need more exploration steps (larger T ) in the warm-up period to guarantee the same level of estimation of &#181; * , depending on the value of min i,k Pr D (k). However, since we require that D does not depend on T , this will not change the overall regret of the algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2 Lower Bound on Means</head><p>In this section, we show that if the means of player values can be arbitrarily close to zero, then it can be impossible to achieve a regret of o(T ). Theorem 10. For &#1013; &gt; 0, there does not exist an algorithm ALG such that for any possible &#181; * &#8712; [0, &#1013;] n&#215;m , for every t the fractional allocation X t chosen by ALG satisfies envy-freeness in expectation with probability greater than 1/2 and the regret of ALG is o(T ). The same result also holds for proportionality in expectation. proof. W.l.o.g. assume that &#1013; = 1. The proof also holds for any other constant &#1013;. Suppose the underlying value distributions are Bernoulli, that we have two players and two item types, and assume T &#8805; 2. We will consider two cases for &#181; * and show that no algorithm can with probability greater than 1/2 satisfy the constraints and have regret of o(T ) for both of these cases of &#181; * . First, let</p><p>If &#181; * = &#181; 1 or &#181; * = &#181; 2 , then player 1 will not have a realized value of 1 for any of the T items with probability at least 1/2. Therefore, no algorithm can differentiate between &#181; * = &#181; 1 and &#181; * = &#181; 2 with probability at least 1/2. The only fractional allocation that is envy-free for both &#181; * = &#181; 1 and &#181; * = &#181; 2 is the uniform at random allocation. However, this allocation has regret of &#8486;(T ) for &#181; * = &#181; 2 , as the best fractional allocation when &#181; * = &#181; 2 is the fractional allocation</p><p>This proves the desired result that no algorithm can be envy-free for &#181; * and have o(T ) regret for both possible realizations of &#181; * . Similarly, the uniform at random allocation is the only proportional allocation in this example, and therefore the same result holds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D Proof of Theorem 1</head><p>Observations 1, 2, and 3 give that the proportionality in expectation constraints and the envy-freeness in expectation constraints satisfy Properties 1, 3, and 4 respectively. Lemmas 1 and 2 respectively give that the proportionality in expectation constraints and the envy-freeness in expectation constraints satisfy Property 2. These results combined with Lemma 3 directly prove Theorem 1.</p><p>] n&#215;m be a family of constraints that satisfy Properties 1, 3, 4, and 2. Then with probability 1 -1/T , Algorithm 1 satisfies constraints {(B &#8467; (&#181; * ), c &#8467; )} L &#8467;=1 and has regret of &#213;(T 2/3 ) for constraints {(B &#8467; (&#181; * ), c &#8467; )} L &#8467;=1 .</p><p>proof. First, we note that the regret of the first T 2/3 steps can be bounded by</p><p>By Property 1, the uniform at random allocation satisfies constraints {(B &#8467; (&#181;), c &#8467; } L &#8467;=1 . Therefore, X t satisfies the constraints for all t &lt; T 2/3 . Furthermore, because the fractional allocation was uniform at random for the first T 2/3 steps, we have that for sufficiently large T ,</p><p>where the second to last inequality is by Hoeffding's Inequality and a union bound. This implies that with probability 1 -1 2T , &#8741;&#1013;&#8741; 1 = &#213;(T -1/3 ). Because the values are drawn from a Sub-Gaussian distribution, there exists a constant c &gt; 0 (which depends on the distribution of the values) such that by Hoeffding's inequality,</p><p>For the rest of this proof, we will assume that</p><p>which by Equations ( <ref type="formula">13</ref>) and ( <ref type="formula">14</ref>) happens with probability 1 -1/T .</p><p>If K is the Lipschitz constant for this family of constraints, then by Equation ( <ref type="formula">15</ref>), 2K&#8741;&#1013;&#8741; 1 &#8804; &#213;(T -1/3 ) &#8804; &#947; 0 for sufficiently large T , where &#947; 0 is from Property 2. Therefore, taking &#947; = 2K&#8741;&#1013;&#8741; 1 in Property 2 gives that there exists some fractional allocation X &#8242; such that</p><p>and such that for every constraint &#8467; &#8712;</p><p>For any &#181; &#8712; &#956; &#177; &#1013;, we have that &#8741;&#181; -&#181; * &#8741; 1 &#8804; 2&#8741;&#1013;&#8741; 1 by Equation ( <ref type="formula">16</ref>) and the triangle inequality. By the Lipschitz continuity assumption (Property 3), this implies that for all &#181; &#8712; &#956; &#177; &#1013;,</p><p>Therefore, if Equation ( <ref type="formula">18</ref>) holds for a constraint &#8467;, then for any &#181; &#8712; &#956; &#177; &#1013;,</p><p>[Equation <ref type="bibr">(18)</ref>] Therefore, we have shown that if Equation ( <ref type="formula">18</ref>) holds for a constraint &#8467;, then the fractional allocation X &#8242; satisfies constraint (B &#8467; (&#181;), c &#8467; ) for all &#181; &#8712; &#956; &#177; &#1013;. If Equation <ref type="bibr">(18)</ref> does not hold for a constraint &#8467;,</p><p>by Property 4, this implies by Property 1 that &#10216;B &#8467; (&#181;), X &#8242; &#10217; F &#8805; c &#8467; for all &#181;. Therefore, we have shown that X &#8242; satisfies {(B &#8467; (&#181;), c &#8467; )} L &#8467;=1 for all &#181; &#8712; &#956; &#177; &#1013;, and thus X &#8242; satisfies the constraints in LP (3). Because X is the optimal solution to LP (3), we have that</p><p>By Equation ( <ref type="formula">16</ref>), this implies that</p><p>Therefore,</p><p>[Equations ( <ref type="formula">17</ref>) and ( <ref type="formula">20</ref>)]</p><p>).</p><p>[Equation ( <ref type="formula">15</ref>)]</p><p>Combining with Equation ( <ref type="formula">12</ref>), this gives a total regret of</p><p>= &#213;(T 2/3 ).</p><p>Lastly, we must show that the constraints are satisfied by the fractional allocation used by the algorithm for t &#8805; T 2/3 . This is because if Equation ( <ref type="formula">16</ref>) holds, then any solution to LP (3) must satisfy the constraints {(B &#8467; (&#181; * ), c &#8467; } L &#8467;=1 , and therefore the fractional allocation used by the algorithm for all t &#8805; T 2/3 will satisfy these constraints. Recall that all of the above relies on Equations ( <ref type="formula">16</ref>) and ( <ref type="formula">15</ref>) holding, which happens with probability 1 -1/T as desired.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E Proof of Lemma 2</head><p>For proportionality in expectation, LP (1) can be rewritten as the following linear program.</p><p>In order to show that the proportionality constraints satisfy Property 2, we want to construct an X &#8242; such that 1. X &#8242; decreases the social welfare relative to Y &#181; by O(&#947;) and 2. For every constraint i &#8712; [n], either X &#8242; = UAR or</p><p>LP <ref type="bibr">(22)</ref> has n constraints, one corresponding to each player. Define</p><p>S i is the slack on the ith constraint when using the optimal solution Y &#181; . Now we have two cases depending on n i=1 S i , the total amount of slack across all n players. Case 1:</p><p>a n&#947; Let X &#8242; = UAR. This will result in an decrease of social welfare of at most b a n&#947; compared to Y &#181; . To see why, note that the slack of constraint i is equivalent to how much player i prefers their fractional allocation in Y &#181; to UAR. Therefore, switching to UAR from Y &#181; decreases the total social welfare by S i per player, and therefore decreases the total social welfare by</p><p>. Furthermore, X &#8242; = UAR clearly satisfies the other condition because every player is treated equally.</p><p>Case 2:</p><p>a n&#947; Intuitively, in this case we want to redistribute the slack from the constraints with a lot of slack to the constraints without much slack. To do this, construct X &#8242; as follows. Define</p><p>By construction, we have that</p><p>Because</p><p>Furthermore, for every i,</p><p>With Equation <ref type="bibr">(25)</ref>, this implies that &#8710; ik &#8804; Y &#181; ik . Finally, we note that</p><p>Now we are ready to construct X &#8242; . Let</p><p>In order for this to be a valid allocation, we need that X &#8242; ik &#8805; 0, which is true because we showed above that &#8710; ik &#8804; Y &#181; ik . We also need that</p><p>Next, we will show that Equation ( <ref type="formula">23</ref>) is satisfied for all i for fractional allocation X &#8242; . Starting with the left hand side of Equation ( <ref type="formula">23</ref>), we have</p><p>[Eq <ref type="bibr">(26)</ref>] Furthermore, we can bound the decrease in social welfare between fractional allocation X &#8242; and Y &#181; by</p><p>[Equation ( <ref type="formula">26</ref>)] Therefore, we have shown that X &#8242; has the desired properties, and thus the proportionality constraints satisfy Property 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>F Proof of Lemma 1</head><p>For Section F only, we will assume w.l.o.g. that a = 1 and that b &#8805; 1. This is without loss of generality because envy-freeness in expectation constraints and social welfare are both scale invariant. Therefore, scaling every player's values (and mean values) by 1/a will give an equivalent problem where a = 1.</p><p>To prove Lemma 1, will show that we can transform Y &#181; into a fractional allocation X &#8242; which satisfies Property 2 through Algorithm 3. Algorithm 3 iterates over the following types of 'envy-with-slack' graphs, which track whether a player prefers their allocation by at least &#945; over another player's allocation. Definition 6. Let create-slack-graph(&#181;, X, &#945;) be the subroutine which returns the directed graph with vertices N and edges generated as follows. Suppose G = create-slack-graph(&#181;, X, &#945;). Then a directed edge from i to i &#8242; exists in G if and only if</p><p>At a high level, Algorithm 3 constructs 'envy-with-slack' graphs with progressively smaller &#945;, with &#945; &#8805; &#947; for all iterations. The algorithm operates on sets of nodes called equivalence classes, where every pair of nodes in an equivalence class has the same allocation. We represent the set of equivalence classes in a fractional allocation X as S(X). Definition 7. Let S(X) be the set of equivalence classes of fractional allocation X, where two nodes i, i &#8242; are part of the same equivalence class S &#8712; S(X) if and only if</p><p>Algorithm 3 generally makes progress by either 1) merging two equivalence classes, or 2) removing an edge from the graph. We formalize this model below.</p><p>Each iteration r begins with some allocation X r and a slack value &#945; r . Algorithm 3 then generates from these parameters a directed graph G r = create-slack-graph(&#181;, X r , &#945; r ), which is the 'envywith-slack' graph for allocation X r given means &#181;. As in standard graph notation, for a graph G we define V (G) as the vertices of G and E(G) as the edges of G. Each edge e = (i, i &#8242; ) &#8712; E(G r ) has a weight w e = &#10216;X i , &#181; i &#10217; -&#10216;X i &#8242; , &#181; i &#10217;. For a set of vertices S, we use &#948; + (S) to represent the edges with head in S and tail in N \S, and similarly, we use &#948; -(S) to represent the edges with tail in S and head in N \S. For notational convenience, we let &#948; -(i) = &#948; -({i}), and &#948; + (i) = &#948; + ({i}). Throughout this section, we will use the notation sw(X, &#181;) = &#10216;X, &#181;&#10217; F .</p><p>An overview of the algorithm is as follows. In each iteration, Algorithm 3 generally performs one of three operations and removes an edge by decreasing &#945;. First, if there exists an equivalence class S with at least one incoming edge but no outgoing edges, then operation remove-incoming-edge transfers allocation probability from nodes in S to all other nodes. If such an equivalence class does not exist, then Algorithm 3 finds a specific type of cycle in the 'envy-with-slack' graph. If there exists some node which has an edge to some but not all of the nodes in the cycle, then operation cycle-shift gives each node in the cycle half of its current allocation and half of the next node's allocation. If such a node does not exist and all edges in the graph have low enough weight, then operation average-clique instead creates a new equivalence class by merging all equivalence classes that the nodes in the cycle belong to. Such a merge may lead to envy, which is removed by a call to Algorithm 4. We define each of the three operations formally below, where each operation returns a new allocation X &#8242; . Definition 8. Let S &#8712; S(X).</p><p>Define X Sk = X ik for every i &#8712; S. Let remove-incoming-edge(S, &#945;, X) be the subroutine which returns X &#8242; , where</p><p>Let C be a cycle in a graph G = create-slack-graph(&#181;, X, &#945;) and next(i) be the node which i points to in C. Then the subroutine cycle-shift(C, X) returns X &#8242; , where</p><p>Definition 10. Let Q be a clique in a graph G = create-slack-graph(&#181;, X, &#945;). Then the subroutine average-clique(Q, X) returns X &#8242; , where</p><p>We also define two intermediary operations. Note that the arg min function may return the empty set, a singleton, or a set with multiple elements. Definition 11. Let G = create-slack-graph(&#181;, X, &#945;) with edge set E. For each equivalence class S &#8712; S(X), let E S = arg min e&#8712;&#948; + (S) w e , where the size of E S may be 0, 1, or &gt; 1. Let E &#8242; = S E S , and let G &#8242; = (N, E &#8242; ). Then the subroutine find-special-cycle(G, S(X)) returns a cycle C of G &#8242; where V (C) contains at most one member of each equivalence class S &#8712; S(X) (and returns &#8709; if no such cycle exists).</p><p>Algorithm 4 Remove Envy (also referred to as subroutine remove-envy(&#181;, X 0 ))</p><p>while w (u,v) &lt; 0 and there does not exist a cycle in G r containing (u, v) do</p><p>// Let prev(i) and next(i) be the nodes before and after i in C, respectively.</p><p>We begin by proving some helpful lemmas. It will be convenient to define the following term. Definition 14. Let X be an envy-free fractional allocation for &#181; if for all i, i &#8242; &#8712; N , &#10216;X i , &#181; i &#10217; -&#10216;X i &#8242; , &#181; i &#10217; &#8805; 0. Lemma 4. Let X be an envy-free fractional allocation for &#181;, and let G = create-slack-graph(&#181;, X, &#945;) with edge set E. Suppose that there exists some equivalence class S &#8712; S(X) such that &#948; -(S) &#824; = &#8709; and &#948; + (S) = &#8709; in G, and let X &#8242; = remove-incoming-edge(S, &#945;, X).</p><p>proof. We first show that if an edge e / &#8712; E, then e / &#8712; E &#8242; . Note that for any i, i &#8242; such that (i, i &#8242; ) / &#8712; E,</p><p>Therefore, to show that an edge (i, i &#8242; ) not in E is also not in E &#8242; , it suffices to show that</p><p>The subroutine remove-incoming-edge(S, &#945;, X) only transfers weight from S to N \S, which implies that no node in N \S will gain an edge to a node in S. Formally,</p><p>Every pair of nodes i, i &#8242; &#8712; S has their fractional allocation reduced by the same amount, so</p><p>Similarly, every pair of nodes i, i &#8242; &#8712; N \S has their fractional allocation increased by the same amount, so</p><p>Finally, we observe that for any i &#8712; S and i &#8242; &#8712; N \S,</p><p>where the second inequality is because</p><p>Recall that &#948; -(S) &#824; = &#8709; in G. We will now show that &#948; -(S) = &#8709; in G &#8242; . Observe that for i &#8712; S and i &#8242; &#8712; N \S,</p><p>Therefore, (i &#8242; , i) / &#8712; E, which implies &#948; -(S) = &#8709; in G &#8242; . We conclude that all edges in E &#8242; exist in E, and at least one edge in E does not exist in E &#8242; , which implies that |E &#8242; | &lt; |E|. Lemma 5. Let X be an envy-free fractional allocation for &#181;, and let G = create-slack-graph(&#181;, X, &#945;) with edge set E. Suppose that there exists some equivalence class S &#8712; S(X) such that &#948; -(S) &#824; = &#8709; and &#948; + (S) = &#8709; in G, and let X &#8242; = remove-incoming-edge(S, &#945;, X). Then X &#8242; is an envy-free allocation for &#181;.</p><p>proof. Let G &#8242; = create-slack-graph(&#181;, X &#8242; , &#945;/2b) with edge set E &#8242; . It suffices to show that w e &#8805; 0 &#8704;e &#8712; E &#8242; . By Lemma 4, if an edge (i, i &#8242; ) / &#8712; E, then</p><p>Consider any i &#8712; S and i &#8242; &#8712; N \S. Then there must not exist an edge (i, i &#8242; ) &#8712; E by assumption. Also by assumption, for any i, i &#8242; such that (i, i &#8242; ) &#8712; E, we have that &#10216;X i , &#181; i &#10217; -&#10216;X i &#8242; , &#181; i &#10217; &#8805; 0. By a direct application of Equations ( <ref type="formula">30</ref>), <ref type="bibr">(31)</ref>, and (32), we can conclude that &#10216;X &#8242; i , &#181; i &#10217; -&#10216;X &#8242; i &#8242; , &#181; i &#10217; &#8805; 0 as well. Lemma 6. Let G = create-slack-graph(&#181;, X, &#945;) with edge set E. Suppose that there exists some equivalence class S &#8712; S(X) such that &#948; -(S) &#824; = &#8709; and &#948; + (S) = &#8709; in G, and let X &#8242; = remove-incoming-edge(S, &#945;, X). Then sw(X, &#181;) -sw(X &#8242; , &#181;) &#8804; &#945;n 2 .</p><p>proof. Observe that</p><p>This implies that sw(X, &#181;) -sw(X &#8242; , &#181;) &#8804; &#945;n 2 .</p><p>Lemma 7. Let X be an envy-free fractional allocation for &#181; and let G = create-slack-graph(&#181;, X, &#945;) with edge set E. Let C = find-special-cycle(G, S(X)) and suppose there exists a node u such that for i, i</p><p>proof. We first show that if an edge e / &#8712; E, then e / &#8712; |E &#8242; |. Suppose that i &#8712; V (C), i &#8242; &#8712; N , and edge</p><p>Now, suppose that i, i &#8242; / &#8712; V (C) and edge (i, i &#8242; ) / &#8712; E. Then</p><p>Finally, suppose that i &#8712; V (C), i &#8242; &#8712; N \V (C), and edge (i &#8242; , i) / &#8712; E. Then</p><p>We have shown that if e &#8712; E &#8242; , then e &#8712; E. Now we will show that there exists at least one edge e such that e &#8712; E, but e / &#8712; E &#8242; . Consider the node u described in the lemma statement, and let i &#8712; V (C) be some node such that (u, i)</p><p>This implies that (u, i) / &#8712; E &#8242; , as desired.</p><p>Lemma 8. Let X be an envy-free fractional allocation for &#181; and let G = create-slack-graph(&#181;, X, &#945;) with edge set E. Let C = find-special-cycle(G, S(X)) and let X &#8242; = cycle-shift(C, X). Then X &#8242; is an envy-free allocation for &#181;.</p><p>proof. Let G &#8242; = create-slack-graph(&#181;, X &#8242; , &#945;/2) with edge set E &#8242; . It suffices to show that w e &#8805; 0 for all e &#8712; E &#8242; . By Lemma 7, if an edge (i, i &#8242; ) / &#8712; E, then</p><p>Then by Equation (34),</p><p>Lemma 9. Let X be an envy-free fractional allocation for &#181; and let G = create-slack-graph(&#181;, X, &#945;) with edge set E. Let C = find-special-cycle(G, S(X)) and let</p><p>This implies that sw(X, &#181;) -sw(X &#8242; , &#181;) &#8804; n&#945; 2 .</p><p>Lemma 10. Let X be an envy-free fractional allocation for &#181;. Let G = create-slack-graph(&#181;, X, &#945;) with edge set E, and let Q be a clique in G. Let X &#8242; = average-clique(Q, X) and</p><p>Finally, suppose that i &#8712; Q, i &#8242; &#8712; N \Q, and (i &#8242; , i) / &#8712; E. Then</p><p>Lemma 11. Let X be an envy-free fractional allocation for &#181;. Let G = create-slack-graph(&#181;, X, &#945;) with edge set E, and let Q be a clique in G. Let X &#8242; = average-clique(Q, X). Then</p><p>Rearranging, this implies that sw(X, &#181;) -sw(X &#8242; , &#181;) &#8804; n &#8226; &#945;.</p><p>Lemma 13. Let G = create-slack-graph(&#181;, X, 0) with edge set E, and let X &#8242; = remove-envy(&#181;, X). Further let G &#8242; = create-slack-graph(&#181;, X &#8242; , 0) with edge set E &#8242; . Then</p><p>proof. First, we show that if an edge e / &#8712; E, then e / &#8712; E &#8242; . In other words, we will show that no new edges with positive envy (negative weight) are added. Observe that within the while loop, remove-envy distributes &#946; i * from a set U to N \U . If at the beginning of the while loop &#8707;i &#8712; U such that w i,i &#8242; &lt; 0 for some i &#8242; &#8712; N \U , then &#946; i * = 0 and i &#8242; is added to U . Otherwise, by definition &#946; i * is at most the minimum fraction that the set U needs to give away in order to create a new 0 envy edge between any node in U and a node i &#8242; &#8712; N \U . Therefore, distribute-equally cannot create a new edge with positive envy by our choice of &#946; i * , which implies that no new edge with positive envy could have been created by the end of the while loop. Now, suppose that w (u,v) &lt; 0 at the end of the while loop. Then there is a cycle C in G r containing (u, v). Then for every i, &#10216;X &#8242; i , &#181; i &#10217; &#8805; &#10216;X r i , &#181; i &#10217;. The total set of allocations has not changed, so no positive envy is introduced. Now, we show that there exists an edge e &#8712; E such that e / &#8712; E &#8242; . That is, we show that we have removed some edge with positive envy. If w (u,v) = 0 at the end of the while loop, then (u, v) &#8712; E and (u, v) &#824; &#8712; E &#8242; so we are done. Otherwise, there is a cycle in G r containing (u, v). As the overall set of allocations has not changed and &#10216;X v , &#181; u &#10217; -&#10216;X u , &#181; u &#10217; &gt; 0, we must have i &#8712; V (C) s.t. &#10216;X &#8242; i , &#181; u &#10217; -&#10216;X &#8242; u , &#181; u &#10217; &lt; 0 &lt; i &#8712; V (C) s.t. &#10216;X i , &#181; u &#10217; -&#10216;X u , &#181; u &#10217; &lt; 0 .</p><p>Therefore, there exists some edge e = (u, i) for i &#8712; V (C) such that e &#8712; E but e / &#8712; E &#8242; .</p><p>Applying the above once for each node, we obtain sw(X &#8467; , &#181;)</p><p>&#8805; sw(X &#8467;-1 , &#181;) -n &#8226; &#945; 4n 2 . Rearranging, this implies that sw(X &#8467;-1 , &#181;) -sw(X &#8467; , &#181;) &#8804; &#945; 4n .</p><p>Because &#8467; &#8804; n 2 , we therefore must have sw(X 0 , &#181;) -sw(X &#8467; , &#181;) &#8804; n&#945; 4 .</p><p>Lemma 16. Let G = create-slack-graph(&#181;, X, &#945;) with edge set E. Suppose that for S &#8712; S(X), if &#948; -(S) / &#8712; &#8709;, then &#948; + (S) / &#8712; &#8709;. Further suppose that &#945; 4bn 3 &#8804; &#10216;X i , &#181; i &#10217; -&#10216;X i &#8242; , &#181; i &#10217; &#8704;i, i &#8242; &#8712; N Let X 0 = X and define X &#8467; = remove-envy(&#181;, X &#8467;-1 ). Finally, for &#8467; &#8804; n 2 let G &#8242; = create-slack-graph(&#181;, X &#8467; , &#945;</p><p>2 ) with edge set E &#8242; . Then |E &#8242; | &#8804; |E|.</p><p>proof. It suffices to show that if an edge (i, i &#8242; ) / &#8712; E, then (i, i &#8242; ) / &#8712; E &#8242; . By Lemma 14, in each call to remove-envy, the utility of i for the allocation of i decreases by at most &#945; 4n 2 . Therefore,</p><p>Again by Lemma 14, in each call to remove-envy, the utility of i for the allocation of i &#8242; increases by at most &#945; 4n 3 . Therefore,</p><p>Putting both equations together, we have</p><p>Lemma 17. Let X &#8242; = remove-envy(&#181;, X). Then |S(X &#8242; )| &#8804; |S(X)|.</p><p>proof. It suffices to show that no equivalence class S &#8712; S(X) becomes smaller (i.e. strictly loses members) during remove-envy. First, suppose for contradiction that S first becomes smaller during the while loop during iteration r. Then in iteration r, it must be the case that S &#8745; U &#824; = &#8709; and S &#8745; (N \U ) &#824; = &#8709;, otherwise the allocation of each member of S would have changed by the same amount. Furthermore, it must be the case that in iteration r, &#946; i * &gt; 0, otherwise no allocation would have been transferred. In iteration r, consider some i &#8712; (S &#8745; U ) and i &#8242; &#8712; (S &#8745; (N \U )). Then &#946; i = 0, as i begins iteration r with zero envy towards i &#8242; . Because &#946; i * &#8804; min i &#946; i , this means &#946; i * &#8804; 0, which is a contradiction . Therefore, no equivalence class S becomes smaller during the while loop. To conclude the proof, we observe that in the cycle elimination step after the while loop, the total set of allocations remains the same, which implies that the set of equivalence classes remains the same as well.</p><p>We are finally ready to prove Lemma 1.</p><p>Proof of Lemma 1. We first prove by induction that every iteration r starts with an envy-free allocation X r . The base case is satisfied because X 0 = Y &#181; , and Y &#181; is envy-free by definition. Now, suppose that the inductive hypothesis holds for all iterations up to and including r. We will show that iteration r + 1 starts with an envy-free allocation. If X r+1 = X r , then we can directly invoke the inductive hypothesis for iteration r. Otherwise, suppose that remove-incoming-edge was called in iteration r of Algorithm 3. Then by Lemma 5, iteration r + 1 starts with an envy-free allocation. Suppose instead that cycle-shift was called in iteration r. Then by Lemma 8, iteration r + 1 starts with an envy-free allocation. Finally, suppose that average-clique was called in iteration r. Then the remove-envy is called repeatedly as long as there exists an edge with negative weight in E(G &#8242; ). By Lemma 13, each call to remove-envy removes an edge with negative weight from E(G &#8242; ), and adds no new edges with negative weight. There are a finite number of edges in E(G &#8242; ), so this loop terminates. Therefore, iteration r + 1 starts with an envy-free allocation.</p><p>Next, we prove that in every iteration, either an edge is removed from the envy-with-slack graph, or the number of equivalence classes decreases. Formally, for two iterations r and r + 1, we prove that either We now prove that the algorithm terminates with an X r which satisfies Proposition 2. Each iteration either removes an edge or merges two equivalence classes. Because the maximum number of edges is n 2 and the number of equivalence classes is n, Algorithm 3 terminates in at most n 3 iterations. We need to show that Algorithm 3 terminates with &#945; r &#8805; &#947;. If an iteration r does not call remove-envy, then an edge is removed and &#945; r+1 &#8805; &#945; r 4bn 4 . There can be at most n 2 such iterations. If an iteration r does call remove-envy, then &#945; r+1 = &#945; r 2n . There can be at most n such iterations which call remove-envy between every iteration which does not call remove-envy, for a total of at most n 3 iterations. Therefore, &#945; r &#8805; &#945; 0 (4bn 4 ) n 2 &#8226; (2n) n 3 = &#945; 0 e n 2 log(4bn 4 )+n 3 log(2n) .</p><p>Choosing &#945; 0 = &#947; &#8226; (e n 2 log(4bn 4 )+n 3 log(2n) ) thus implies that &#945; r &#8805; &#947; for every iteration r if &#947; &#8226; (e n 2 log(4bn 4 )+n 3 log(2n) ) &#8804; 1. Therefore, we set &#947; 0 = e -n 2 log(4bn 4 )-n 3 log(2n) . Finally, we need to show that Algorithm 3 does not significantly decrease the social welfare. By Lemmas 6, 9 and 12, we know that each of the operations remove-incoming-edge, cycle-shift, and average-clique change the social welfare by at most O(&#947;). Each of these operations is called at most n 3 times. Each of remove-incoming-edge and cycle-shift are called at most once for each edge, or at most n 2 times. Operation average-clique is called at most n times for each edge, or at most n 3 times. Finally, Lemma 15 bounds the total social welfare loss from all calls to remove-envy between any two calls to average-clique by O(&#947;). Therefore, sw(Y &#181; , &#181;) -sw(X r , &#181;) = O(&#947;), as desired.</p></div></body>
		</text>
</TEI>
