To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments, the problem remains understudied-- despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DR-NVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length. 
                        more » 
                        « less   
                    
                            
                            One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning
                        
                    
    
            In recent years, federated learning has been embraced as an approach for bringing about collaboration across large populations of learning agents. However, little is known about how collaboration protocols should take agents’ incentives into account when allocating individual resources for communal learning in order to maintain such collaborations. Inspired by game theoretic notions, this paper introduces a framework for incentive-aware learning and data sharing in federated learning. Our stable and envy-free equilibria capture notions of collaboration in the presence of agents interested in meeting their learning objectives while keeping their own sample collection burden low. For example, in an envy-free equilibrium, no agent would wish to swap their sampling burden with any other agent and in a stable equilibrium, no agent would wish to unilaterally reduce their sampling burden. In addition to formalizing this framework, our contributions include characterizing the structural properties of such equilibria, proving when they exist, and showing how they can be computed. Furthermore, we compare the sample complexity of incentive-aware collaboration with that of optimal collaboration when one ignores agents’ incentives. 
        more » 
        « less   
        
    
    
                            - PAR ID:
- 10283402
- Date Published:
- Journal Name:
- International Conference on Machine Learning
- Volume:
- 38
- Page Range / eLocation ID:
- 1005-1014
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Penalty-based strategies, such as congestion pricing, have been employed to improve traffic network efficiency, but they face criticism for their negative impact on users and equity concerns. Collaborative routing, which allows users to negotiate route choices, offers a solution that considers individual heterogeneity. Personalized incentives can encourage such collaboration and are more politically acceptable than penalties. This study proposes a collaborative routing strategy that uses personalized incentives to guide users towards desired traffic states while promoting multidimensional equity. Three equity dimensions are considered: accessibility equity (equal access to jobs, services, and education), inclusion equity (route suggestions and incentives that do not favor specific users), and utility equity (envy-free solutions where no user feels others have more valuable incentives). The strategy prioritizes equitable access to societal services and activities, ensuring accessibility equity in routing solutions. Inclusion equity is maintained through non-negative incentives that consider user heterogeneity without excluding anyone. An envy-free compensation mechanism achieves utility equity by eliminating envy over incentive-route bundles. A constrained traffic assignment (CTA) formulation and consensus optimization variant are then devised to break down the centralized problem into smaller, manageable parts and a decentralized algorithm is developed for scalability in large transportation networks and user populations. Numerical studies investigate the model's enhancement of equity dimensions and the impact of hyperparameters on system objective tradeoffs and demonstrate the algorithm convergence.more » « less
- 
            In 1979, Hylland and Zeckhauser [26] gave a simple and general mechanism for a 6 one-sided matching market, given cardinal utilities of agents over goods. They use the power of a 7 pricing mechanism, which endows their mechanism with several desirable properties – it produces an 8 allocation that is Pareto optimal and envy-free, and the mechanism is incentive compatible in the 9 large. It therefore provides an attractive, off-the-shelf method for running an application involving 10 such a market. With matching markets becoming ever more prevalent and impactful, it is imperative 11 to characterize the computational complexity of this mechanism. 12 We present the following results: 13 1. A combinatorial, strongly polynomial time algorithm for the dichotomous case, i.e., 0/1 14 utilities, and more generally, when each agent’s utilities come from a bi-valued set. 15 2. An example that has only irrational equilibria; hence this problem is not in PPAD. 16 3. A proof of membership of the problem in the class FIXP; as a corollary we get that an 17 HZ equilibrium can always be expressed via algebraic numbers. For this purpose, we give 18 a new proof of the existence of an HZ equilibrium using Brouwer’s fixed point theorem; 19 the proof of Hylland and Zeckhauser used Kakutani’s fixed point theorem, which is more 20 involved. 21 4. A proof of membership of the problem of computing an approximate HZ equilibrium in the 22 class PPAD.more » « less
- 
            Federated reinforcement learning (FRL) has emerged as a promising paradigm for reducing the sample complexity of reinforcement learning tasks by exploiting information from different agents. However, when each agent interacts with a po- tentially different environment, little to nothing is known theoretically about the non-asymptotic performance of FRL algorithms. The lack of such results can be attributed to various technical challenges and their intricate interplay: Markovian sampling, linear function approximation, multiple local updates to save communi- cation, heterogeneity in the reward functions and transition kernels of the agents’ MDPs, and continuous state-action spaces. Moreover, in the on-policy setting, the behavior policies vary with time, further complicating the analysis. In response, we introduce FedSARSA, a novel federated on-policy reinforcement learning scheme, equipped with linear function approximation, to address these challenges and provide a comprehensive finite-time error analysis. Notably, we establish that FedSARSA converges to a policy that is near-optimal for all agents, with the ex- tent of near-optimality proportional to the level of heterogeneity. Furthermore, we prove that FedSARSA leverages agent collaboration to enable linear speedups as the number of agents increases, which holds for both fixed and adaptive step-size configurations.more » « less
- 
            We study fair resource allocation with strategic agents. It is well-known that, across multiple fundamental problems in this domain, truthfulness and fairness are incompatible. For example, when allocating indivisible goods, no truthful and deterministic mechanism can guarantee envy-freeness up to one item (EF1), even for two agents with additive valuations. Or, in cake-cutting, no truthful and deterministic mechanism always outputs a proportional allocation, even for two agents with piecewise constant valuations. Our work stems from the observation that, in the context of fair division, truthfulness is used as a synonym for Dominant Strategy Incentive Compatibility (DSIC), requiring that an agent prefers reporting the truth, no matter what other agents report.In this paper, we instead focus on Bayesian Incentive Compatible (BIC) mechanisms, requiring that agents are better off reporting the truth in expectation over other agents' reports. We prove that, when agents know a bit less about each other, a lot more is possible: using BIC mechanisms we can achieve fairness notions that are unattainable by DSIC mechanisms in both the fundamental problems of allocation of indivisible goods and cake-cutting. We prove that this is the case even for an arbitrary number of agents, as long as the agents' priors about each others' types satisfy a neutrality condition. Notably, for the case of indivisible goods, we significantly strengthen the state-of-the-art negative result for efficient DSIC mechanisms, while also highlighting the limitations of BIC mechanisms, by showing that a very general class of welfare objectives is incompatible with Bayesian Incentive Compatibility. Combined, these results give a near-complete picture of the power and limitations of BIC and DSIC mechanisms for the problem of allocating indivisible goods.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    