Learning in Markov Games with Adaptive Adversaries: Policy Regret, Fundamental Barriers, and Efficient Algorithms

Nguyen-Tang, Thanh; Arora, Raman

This content will become publicly available on December 1, 2025

We study learning in a dynamically evolving environment modeled as a Markov game between a learner and a strategic opponent that can adapt to the learner’s strategies. While most existing work on Markov games focuses on external regret as the learning objective, external regret becomes inadequate when the adversaries are adaptive. In this work, we focus on policy regret, a counterfactual notion that aims to compete with the return that would have been attained if the learner had followed the best fixed sequence of policies in hindsight. We show that if the opponent has unbounded memory or is non-stationary, then sample-efficient learning is not possible. For memory-bounded and stationary adversaries, we show that learning is still statistically hard if the set of feasible strategies for the learner is exponentially large. To guarantee learnability, we introduce a new notion of consistent adaptive adversaries, wherein the adversary responds similarly to similar strategies of the learner. We provide algorithms that achieve √T policy regret against memory-bounded, stationary, and consistent adversaries.

Award ID(s): 1943251

PAR ID: 10572977

Author(s) / Creator(s): Nguyen-Tang, Thanh; Arora, Raman

Publisher / Repository: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

Date Published: 2024-12-01

ISSN: 1049-5258

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
The DOI is not currently available.
