- PAR ID:
- 10283402
- Date Published:
- Journal Name:
- International Conference on Machine Learning
- Volume:
- 38
- Page Range / eLocation ID:
- 1005-1014
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Federated reinforcement learning (FRL) has emerged as a promising paradigm for reducing the sample complexity of reinforcement learning tasks by exploiting information from different agents. However, when each agent interacts with a potentially different environment, little to nothing is known theoretically about the non-asymptotic performance of FRL algorithms. The lack of such results can be attributed to various technical challenges and their intricate interplay: Markovian sampling, linear function approximation, multiple local updates to save communication, heterogeneity in the reward functions and transition kernels of the agents' MDPs, and continuous state-action spaces. Moreover, in the on-policy setting, the behavior policies vary with time, further complicating the analysis. In response, we introduce FedSARSA, a novel federated on-policy reinforcement learning scheme, equipped with linear function approximation, to address these challenges and provide a comprehensive finite-time error analysis. Notably, we establish that FedSARSA converges to a policy that is near-optimal for all agents, with the extent of near-optimality proportional to the level of heterogeneity. Furthermore, we prove that FedSARSA leverages agent collaboration to enable linear speedups as the number of agents increases, which holds for both fixed and adaptive step-size configurations.
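The federated on-policy scheme this abstract describes can be illustrated with a toy example: each agent runs SARSA(0) with linear function approximation in its own (heterogeneous) environment, and a server periodically averages the local parameter vectors. The following is a minimal sketch of that general pattern, not the paper's FedSARSA implementation; ToyMDP, eps_greedy, and all constants are hypothetical stand-ins.

```python
import numpy as np

class ToyMDP:
    """Small random MDP with one-hot features; a stand-in for each agent's
    potentially different environment."""
    def __init__(self, n_states, n_actions, seed):
        self.rng = np.random.default_rng(seed)
        self.P = self.rng.dirichlet(np.ones(n_states),
                                    size=(n_states, n_actions))
        self.R = self.rng.uniform(0.0, 1.0, size=(n_states, n_actions))
        self.nS, self.nA, self.s = n_states, n_actions, 0

    def phi(self, s, a):                      # linear features phi(s, a)
        f = np.zeros(self.nS * self.nA)
        f[s * self.nA + a] = 1.0
        return f

    def step(self, a):                        # sample reward and next state
        r = self.R[self.s, a]
        self.s = self.rng.choice(self.nS, p=self.P[self.s, a])
        return r, self.s

def eps_greedy(env, theta, s, eps=0.1):       # on-policy behavior policy
    if env.rng.random() < eps:
        return int(env.rng.integers(env.nA))
    return int(np.argmax([theta @ env.phi(s, a) for a in range(env.nA)]))

def fed_sarsa(envs, rounds=50, local_steps=20, alpha=0.05, gamma=0.9):
    theta = np.zeros(envs[0].nS * envs[0].nA)   # shared parameter vector
    for _ in range(rounds):
        local_params = []
        for env in envs:                        # each agent runs SARSA locally
            th, s = theta.copy(), env.s
            a = eps_greedy(env, th, s)
            for _ in range(local_steps):        # multiple local updates
                r, s2 = env.step(a)
                a2 = eps_greedy(env, th, s2)    # on-policy next action
                td = r + gamma * th @ env.phi(s2, a2) - th @ env.phi(s, a)
                th += alpha * td * env.phi(s, a)
                s, a = s2, a2
            local_params.append(th)
        theta = np.mean(local_params, axis=0)   # server averages local models
    return theta

envs = [ToyMDP(5, 2, seed=k) for k in range(4)]  # heterogeneous agents
print(fed_sarsa(envs)[:4])
```

Averaging only every `local_steps` steps mirrors the multiple-local-updates setup the abstract highlights as a way to save communication; with heterogeneous MDPs the averaged parameters settle near a compromise among the agents' individual optima.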
- Penalty-based strategies, such as congestion pricing, have been employed to improve traffic network efficiency, but they face criticism for their negative impact on users and equity concerns. Collaborative routing, which allows users to negotiate route choices, offers a solution that considers individual heterogeneity. Personalized incentives can encourage such collaboration and are more politically acceptable than penalties. This study proposes a collaborative routing strategy that uses personalized incentives to guide users towards desired traffic states while promoting multidimensional equity. Three equity dimensions are considered: accessibility equity (equal access to jobs, services, and education), inclusion equity (route suggestions and incentives that do not favor specific users), and utility equity (envy-free solutions where no user feels others have more valuable incentives). The strategy prioritizes equitable access to societal services and activities, ensuring accessibility equity in routing solutions. Inclusion equity is maintained through non-negative incentives that consider user heterogeneity without excluding anyone. An envy-free compensation mechanism achieves utility equity by eliminating envy over incentive-route bundles. A constrained traffic assignment (CTA) formulation and a consensus optimization variant are then devised to break down the centralized problem into smaller, manageable parts, and a decentralized algorithm is developed for scalability to large transportation networks and user populations. Numerical studies investigate how the model enhances the equity dimensions, examine the impact of hyperparameters on system-objective tradeoffs, and demonstrate the algorithm's convergence.
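The consensus-optimization decomposition mentioned above can be illustrated with a deliberately small example: consensus ADMM splits a coupled problem into per-user subproblems plus a lightweight averaging step, which is the structural idea behind decentralizing the routing problem. The quadratic objective here is a hypothetical stand-in for the CTA formulation, chosen only because its local updates have a closed form.

```python
import numpy as np

def consensus_admm(c, w, rho=1.0, iters=100):
    """Consensus ADMM for: min sum_i (w_i/2)(x_i - c_i)^2  s.t.  x_i = z.
    Each user solves a small local subproblem; only x_i + u_i is shared."""
    n = len(c)
    x, u, z = np.zeros(n), np.zeros(n), 0.0
    for _ in range(iters):
        x = (w * c + rho * (z - u)) / (w + rho)  # local, per-user updates
        z = np.mean(x + u)                       # lightweight coordination step
        u = u + x - z                            # dual (disagreement) update
    return z, x

c = np.array([1.0, 2.0, 4.0])   # each user's preferred state (hypothetical)
w = np.array([1.0, 1.0, 2.0])   # heterogeneous weights on user preferences
z, x = consensus_admm(c, w)
print(z, x)                     # z approaches the weighted mean of c (2.75)
```

The split is what buys scalability: the per-user update stays small regardless of population size, so only the averaging step grows with the number of users.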
- We study a model-free federated linear quadratic regulator (LQR) problem where M agents with unknown, distinct yet similar dynamics collaboratively learn an optimal policy to minimize an average quadratic cost while keeping their data private. To exploit the similarity of the agents' dynamics, we propose to use federated learning (FL) to allow the agents to periodically communicate with a central server to train policies by leveraging a larger dataset from all the agents. With this setup, we seek to understand the following questions: (i) Is the learned common policy stabilizing for all agents? (ii) How close is the learned common policy to each agent's own optimal policy? (iii) Can each agent learn its own optimal policy faster by leveraging data from all agents? To answer these questions, we propose a federated and model-free algorithm named FedLQR. Our analysis overcomes numerous technical challenges, such as heterogeneity in the agents' dynamics, multiple local updates, and stability concerns. We show that FedLQR produces a common policy that, at each iteration, is stabilizing for all agents. We provide bounds on the distance between the common policy and each agent's local optimal policy. Furthermore, we prove that when learning each agent's optimal policy, FedLQR achieves a sample complexity reduction proportional to the number of agents M in a low-heterogeneity regime, compared to the single-agent setting.
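A minimal sketch of the federated, model-free LQR pattern follows, under strong simplifying assumptions (scalar dynamics, deterministic rollouts, a two-point finite-difference gradient estimate): each agent estimates the gradient of its own LQR cost purely from trajectory costs, and the server averages those estimates to update a common gain. This is not the paper's FedLQR algorithm, only an illustration of the local-estimate/average/update loop; all names and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def lqr_cost(a, b, k, x0=1.0, q=1.0, r=0.1, horizon=30):
    """Finite-horizon cost of the scalar policy u = -k x, computed by rollout
    (model-free in spirit: only trajectory costs are consumed downstream)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += q * x ** 2 + r * u ** 2
        x = a * x + b * u
    return cost

def fedlqr_sketch(systems, k0=0.0, rounds=200, smooth=0.05, lr=0.01):
    k = k0                                  # common policy gain
    for _ in range(rounds):
        grads = []
        for a, b in systems:                # each agent: its own dynamics
            d = rng.choice([-1.0, 1.0])     # random perturbation direction
            g = d * (lqr_cost(a, b, k + smooth * d)
                     - lqr_cost(a, b, k - smooth * d)) / (2 * smooth)
            grads.append(g)                 # zeroth-order gradient estimate
        k -= lr * np.mean(grads)            # server averages, then updates
    return k

# Distinct yet similar scalar systems x' = a x + b u (hypothetical values)
systems = [(0.9, 1.0), (0.95, 1.05), (0.85, 0.95), (0.92, 1.0)]
print(fedlqr_sketch(systems))
```

Because the systems are similar, the averaged gradient points in a useful direction for every agent, which is the intuition behind the M-fold sample complexity reduction in the low-heterogeneity regime.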
- We investigate learning the equilibria in non-stationary multi-agent systems and address the challenges that differentiate multi-agent learning from single-agent learning. Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges. To overcome these obstacles, we propose a versatile black-box approach applicable to a broad spectrum of problems, such as general-sum games, potential games, and Markov games, when equipped with appropriate learning and testing oracles for stationary environments. Our algorithms can achieve O(Δ^{1/4} T^{3/4}) regret when the degree of nonstationarity, as measured by total variation Δ, is known, and O(Δ^{1/5} T^{4/5}) regret when Δ is unknown, where T is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on the number of agents from the oracles. As a side contribution that may be of independent interest, we show how to test for various types of equilibria by a black-box reduction to single-agent learning, including Nash equilibria, correlated equilibria, and coarse correlated equilibria.
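The black-box reduction this abstract describes can be caricatured in a few lines: run a base learner built for stationary environments, then repeatedly test its output, restarting the learner once the test detects drift. The oracles below are toy stand-ins (a two-action problem whose better action flips mid-horizon, with "equilibrium" reduced to the best arm), not the paper's learning and testing oracles or its restart schedule.

```python
import numpy as np

rng = np.random.default_rng(3)

def wrapper(learn, test, T, block):
    """Restart pattern suggested by the abstract: learn with an oracle built
    for stationary environments, then keep testing its output, restarting
    the learner once the test detects drift."""
    t, sol = 0, None
    while t < T:
        sol = learn(t, t + block)          # learning phase (oracle call)
        t += block
        while t < T and test(sol, t):      # testing phase (cheap checks)
            t += 1
    return sol

# Toy stand-ins for the oracles (hypothetical payoffs and noise levels)
def payoff(a, t, T=2000):
    return (0.7, 0.3)[a] if t < T // 2 else (0.3, 0.7)[a]

def learn(t0, t1):
    est = [np.mean([rng.normal(payoff(a, t), 0.3) for t in range(t0, t1)])
           for a in (0, 1)]
    return int(np.argmax(est))

def test(arm, t, n=30):
    gap = np.mean([rng.normal(payoff(arm, t), 0.3)
                   - rng.normal(payoff(1 - arm, t), 0.3) for _ in range(n)])
    return gap > -0.1                      # passes while arm still looks best

print(wrapper(learn, test, T=2000, block=100))   # prints 1 after the flip
```

The regret of such a scheme is driven by how the block length trades off re-learning cost against the time spent exploiting a stale solution, which is where knowledge of the total variation Δ enters the stated bounds.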
- We develop a theory of policymaking between an agent and an overseer, with a principal whose welfare is affected by agent-overseer interactions. The agent can increase the quality of policy outcomes through costly capacity investments. Oversight and agent bias jointly determine optimal agent capacity investments. We show that when oversight improves agent investment incentives, the principal always benefits from an agent with biases opposite the overseer. Competing agent-overseer biases translate into higher quality policy outcomes than the principal could induce were she monitoring the agent. Effective oversight is necessary for these incentive effects. The results imply that political principals ought to consider the nature of the broader policymaking environment when appointing agents to make policy on their behalf and when designing managerial strategies aimed at motivating agents. (JEL D73, D82, H11)