NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Hidden-Role Games: Equilibrium Concepts and Computation

Carminati, L; Zhang, B; Farina, G; Gatti, N; Sandholm, T (July 2024, EC24)

In this paper, we study the class of games known as hidden-role games in which players are assigned privately to teams and are faced with the challenge of recognizing and cooperating with teammates. This model includes both popular recreational games such as the Mafia/Werewolf family and The Resistance (Avalon) and many real-world settings, such as distributed systems where nodes need to work together to accomplish a goal in the face of possible corruptions. There has been little to no formal mathematical grounding of such settings in the literature, and it was previously not even clear what the right solution concepts (notions of equilibria) should be. A suitable notion of equilibrium should take into account the communication channels available to the players (e.g., can they communicate? Can they communicate in private?). Defining such suitable notions turns out to be a nontrivial task with several surprising conse- quences. In this paper, we provide the first rigorous definition of equilibrium for hidden-role games, which overcomes serious limitations of other solution concepts not designed for hidden-role games. We then show that in certain cases, including the above recreational games, optimal equilibria can be computed efficiently. In most other cases, we show that computing an optimal equilibrium is at least NP-hard or coNP-hard. Lastly, we experimentally validate our approach by computing exact equilibria for complete 5- and 6-player Avalon instances whose size in terms of number of information sets is larger than 1056.
more » « less
Full Text Available
Steering No-Regret Learners to a Desired Equilibrium

Zhang, B; Farina, G; Anagostides, I; Cacciamani, F; McAleer, S; Haupt, A; Celli, A; Gatti, N; Conitzer, V; Sandholm, T (July 2024, EC24)

A mediator observes no-regret learners playing an extensive-form game repeatedly across T rounds. The mediator attempts to steer players toward some desirable predetermined equilibrium by giving (nonnegative) payments to players. We call this the steering problem. The steering problem captures problems several problems of interest, among them equilibrium selection and information design (persuasion). If the mediator’s budget is unbounded, steering is trivial because the mediator can simply pay the players to play desirable actions. We study two bounds on the mediator’s payments: a total budget and a per-round budget. If the mediator’s total budget does not grow with T, we show that steering is impossible. However, we show that it is enough for the total budget to grow sublinearly with T, that is, for the average payment to vanish. When players’ full strategies are observed at each round, we show that constant per-round budgets permit steering. In the more challenging setting where only trajectories through the game tree are observable, we show that steering is impossible with constant per-round budgets in general extensive-form games, but possible in normal-form games or if the per-round budget may itself depend on T. We also show how our results can be generalized to the case when the equilibrium is being computed online while steering is happening. We supplement our theoretical positive results with experiments highlighting the efficacy of steering in large games.
more » « less
Full Text Available
Confronting Reward Model Overoptimization with Constrained RLHF

Moskovitz, T; Singh, A; Strouse, DJ; Sandholm, T; Salakhutdinov, R; Dragan, A; McAleer, S (May 2024, ICLR24)

Large language models are typically aligned with human preferences by optimizing reward models (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models which each capture a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriately weight these component RMs when combining them. Compounding this difficulty, because any RM is only a proxy for human evaluation, this process is vulnerable to overoptimization, wherein past a certain point, accumulating higher reward is associated with worse human ratings. In this paper, we perform, to our knowledge, the first study on overoptimization in composite RMs, showing that correlation between component RMs has a significant effect on the locations of these points. We then introduce an approach to solve this issue using constrained reinforcement learning as a means of preventing the agent from exceeding each RM’s threshold of usefulness. Our method addresses the problem of weighting component RMs by learning dynamic weights, naturally expressed by Lagrange multipliers. As a result, each RM stays within the range at which it is an effective proxy, improving evaluation performance. Finally, we introduce an adaptive method using gradient-free optimization to identify and optimize towards these points during a single run.
more » « less
Full Text Available
Mediator Interpretation and Faster Learning Algorithms for Linear Correlated Equilibria in General Sequential Games

Zhang, B; Farina, G; Sandholm, T (May 2024, ICLR24)

A recent paper by Farina & Pipis (2023) established the existence of uncoupled no-linear-swap regret dynamics with polynomial-time iterations in extensive-form games. The equilibrium points reached by these dynamics, known as linear correlated equilibria, are currently the tightest known relaxation of correlated equilibrium that can be learned in polynomial time in any finite extensive-form game. However, their properties remain vastly unexplored, and their computation is onerous. In this paper, we provide several contributions shedding light on the fundamental nature of linear-swap regret. First, we show a connection between linear deviations and a generalization of communication deviations in which the player can make queries to a “mediator” who replies with action recommendations, and, critically, the player is not constrained to match the timing of the game as would be the case for communication deviations. We coin this latter set the untimed communication (UTC) deviations. We show that the UTC deviations coincide precisely with the linear deviations, and therefore that any player minimizing UTC regret also minimizes linear-swap regret. We then leverage this connection to develop state-of-the-art no-regret algorithms for computing linear correlated equilibria, both in theory and in practice. In theory, our algorithms achieve polynomially better per-iteration runtimes; in practice, our algorithms represent the state of the art by several orders of magnitude.
more » « less
Full Text Available
A Multiagent Path Search Algorithm for Large-Scale Coalition Structure Generation

Taguelmimt, R; Aknine, S; Boukredera, D; Changder, N; Sandholm, T (May 2024, AAMAS24)

Coalition structure generation (CSG) is a critical problem in multiagent systems, involving the optimal partitioning of agents into disjoint coalitions to maximize social welfare. This paper introduces SALDAE, a novel multiagent path finding algorithm for CSG on a coalition structure graph. SALDAE employs various heuristics and strategies for efficient search, making it an anytime algorithm suitable for handling large-scale problems
more » « less
Full Text Available
Efficient Size-based Hybrid Algorithm for Optimal Coalition Structure Generation

Taguelmimt, R; Aknine, S; Boukredera, D; Changder, N; Sandholm, T (May 2024, AAMAS24)

Coalition Structure Generation (CSG) involves dividing agents into coalitions in such a way as to coordinate them into solving problems together efficiently. In this paper, we revisit the CSG problem and propose a new search method that introduces an offline phase to speed up the search process, where the best coalition sets to search are preprocessed. These sets are calculated only once regardless of the coalition values and can be reused each time a CSG instance is to be solved. Then our search in the online phase combines dynamic programming with integer partition-based search in a novel way.
more » « less
Full Text Available
Learning Accurate and Interpretable Decision Trees

Balcan, Maria Florina; Sharma, Dravyansh (April 2024, The 40th Conference on Uncertainty in Artificial Intelligence)

Full Text Available
Exponential Lower Bounds on the Double Oracle Algorithm in Zero-Sum Games

Zhang, B; Sandholm, T (March 2024, IJCAI24)

The double oracle algorithm is a popular method of solving games, because it is able to reduce computing equilibria to computing a series of best responses. However, its theoretical properties are not well understood. In this paper, we provide exponential lower bounds on the performance of the double oracle algorithm in both partiallyobservable stochastic games (POSGs) and extensiveform games (EFGs). Our results depend on what is assumed about the tiebreaking scheme—that is, which meta-Nash equilibrium or best response is chosen, in the event that there are multiple to pick from. In particular, for EFGs, our lower bounds require adversarial tiebreaking, whereas for POSGs, our lower bounds apply regardless of how ties are broken.
more » « less
Full Text Available
Model-Free Preference Elicitation

Martin, C; Boutilier, C; Sandholm, T; Meshi, O (March 2024, IJCAI24)

In recommender systems, preference elicitation (PE) is an effective way to learn about a user’s preferences to improve recommendation quality. Expected value of information (EVOI), a Bayesian technique that computes expected gain in user utility, has proven to be effective in selecting useful PE queries. Most EVOI methods use probabilistic models of user preferences and query responses to compute posterior utilities. By contrast, we develop model-free variants of EVOI that rely on function approximation to obviate the need for specific modeling assumptions. Specifically, we learn user response and utility models from existing data (often available in real-world recommender systems), which are used to estimate EVOI rather than relying on explicit probabilistic inference. We augment our approach by using online planning, specifically, Monte Carlo tree search, to further enhance our elicitation policies. We show that our approach offers significant improvement in recommendation quality over standard baselines on several PE tasks.
more » « less
Full Text Available
Imperfect-Recall Games: Equilibrium Concepts and Their Complexity

Tewolde, E; Zhang, B; Oesterheld, C; Zampetakis, M; Sandholm, T; Goldberg, P; Conitzer, V (March 2024, IJCAI24)

We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before. An example is the absentminded driver game, as well as team games in which the members have limited communication capabilities. In the framework of extensiveform games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer settings across three different solution concepts: Nash, multiselves based on evidential decision theory (EDT), and multiselves based on causal decision theory (CDT). We are interested in both exact and approximate solution computation. As special cases, we consider (1) single-player games, (2) two-player zero-sum games and relationships to maximin values, and (3) games without exogenous stochasticity (chance nodes). We relate these problems to the complexity classes P, PPAD, PLS, ΣP2 , ∃R, and ∃∀R.
more » « less
Full Text Available

« Prev Next »

Search for: All records