NSF PAR Search | NSF Public Access Repository

Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

Gu, Shangding; Shi, Laixi; Wen, Muning; Jin, Ming; Mazumdar, Eric; Chi, Yuejie; Wierman, Adam; Spanos, Costas (April 2025, The Thirteenth International Conference on Learning Representations)

Free, publicly-accessible full text available April 24, 2026
Scalable Dynamic Resource Allocation via Domain Randomized Reinforcement Learning

https://doi.org/10.1109/GLOBECOM52923.2024.10901010

Wang, Yiqi; Shi, Laixi; Lee, Martin Hyungwoo; Sydir, Jaroslaw; Zhou, Zhu; Chi, Yuejie; Li, Bin (December 2024, IEEE)

Full Text Available
Near-Optimal Distributionally Robust Reinforcement Learning with General Lp Norms

Clavier, Pierre; Shi, Laixi; Le Pennec, Erwan; Mazumdar, Eric; Wierman, Adam; Geist, Matthieu (September 2024, The Thirty-eighth Annual Conference on Neural Information Processing Systems)

To address the challenges of the sim-to-real gap and sample efficiency in reinforcement learning (RL), this work studies distributionally robust Markov decision processes (RMDPs), which optimize the worst-case performance when the deployed environment lies within an uncertainty set around some nominal MDP. Despite recent efforts, the sample complexity of RMDPs has remained largely undetermined. While the statistical implications of distributional robustness in RL have been explored in some specific cases, the generalizability of the existing findings remains unclear, especially in comparison to standard RL. Assuming access to a generative model that samples from the nominal MDP, we examine the sample complexity of RMDPs using a class of generalized $L_p$ norms as the 'distance' function for the uncertainty set, under the two commonly adopted $sa$-rectangular and $s$-rectangular conditions. Our results imply that RMDPs can be more sample-efficient to solve than standard MDPs using generalized $L_p$ norms in both the $sa$- and $s$-rectangular cases, potentially inspiring more empirical research. We provide a near-optimal upper bound and a matching minimax lower bound for the $sa$-rectangular scenarios. For the $s$-rectangular cases, we improve the state-of-the-art upper bound and also derive a lower bound using the $L_\infty$ norm that verifies the tightness.
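The worst-case objective described in this abstract can be illustrated with a small tabular sketch. This is not the authors' algorithm: it assumes a known nominal kernel (no sampling), an $sa$-rectangular uncertainty set, and specifically an $L_1$ ball of a given radius around each nominal transition row, for which the inner minimization has a simple greedy solution. The names `worst_case_expectation` and `robust_value_iteration`, and the toy MDP, are hypothetical.

```python
def worst_case_expectation(p0, v, radius):
    """Minimize sum(p[s] * v[s]) over distributions p with ||p - p0||_1 <= radius.

    Greedy solution (assumption: L1 ball): move up to radius/2 probability
    mass onto the lowest-value state, taking it from the highest-value states.
    """
    p = list(p0)
    lo = min(range(len(v)), key=lambda s: v[s])
    budget = min(radius / 2.0, 1.0 - p[lo])
    p[lo] += budget
    for s in sorted(range(len(v)), key=lambda s: v[s], reverse=True):
        if budget <= 0.0:
            break
        if s == lo:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
    return sum(pi * vi for pi, vi in zip(p, v))


def robust_value_iteration(P, R, gamma, radius, iters=200):
    """Robust Bellman iteration with per-(state, action) L1 uncertainty sets.

    P[s][a] is the nominal next-state distribution, R[s][a] the reward.
    """
    n_states = len(R)
    v = [0.0] * n_states
    for _ in range(iters):
        v = [
            max(
                R[s][a] + gamma * worst_case_expectation(P[s][a], v, radius)
                for a in range(len(R[s]))
            )
            for s in range(n_states)
        ]
    return v


# Hypothetical two-state, two-action MDP used only for illustration.
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
R = [[0.0, 0.1], [1.0, 0.5]]
nominal = robust_value_iteration(P, R, gamma=0.9, radius=0.0)
robust = robust_value_iteration(P, R, gamma=0.9, radius=0.2)
print(nominal, robust)
```

With `radius=0.0` the sketch reduces to standard value iteration on the nominal MDP; a positive radius can only lower the resulting values, since the adversary picks the least favorable kernel in each uncertainty set.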
Full Text Available
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices

Woo, Jiin; Shi, Laixi; Joshi, Gauri; Chi, Yuejie (July 2024, Proceedings of the 41st International Conference on Machine Learning)

Full Text Available
Sample-efficient robust multi-agent reinforcement learning in the face of environmental uncertainty

Shi, Laixi; Mazumdar, Eric; Chi, Yuejie; Wierman, Adam (July 2024, Proceedings of the 41st International Conference on Machine Learning)

To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, the problem remains understudied in multi-agent environments, despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DR-NVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DR-NVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length.
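As a reading aid, the per-agent worst-case objective described above can be written as follows. The notation is an assumption made for illustration and is not taken from the abstract: $P^0$ is the nominal transition kernel, $\mathcal{U}^{\sigma_i}(P^0)$ is agent $i$'s uncertainty set of size $\sigma_i$, and $V_i^{\pi,P}$ is agent $i$'s value under joint policy $\pi$ and kernel $P$.

$$V_i^{\pi,\sigma_i}(s) = \inf_{P \in \mathcal{U}^{\sigma_i}(P^0)} V_i^{\pi,P}(s)$$

Under this notation, a joint product policy $\pi = (\pi_i, \pi_{-i})$ would be a robust Nash equilibrium if no agent can improve its own worst-case value by deviating unilaterally:

$$V_i^{(\pi_i,\pi_{-i}),\sigma_i}(s) \ge V_i^{(\pi_i',\pi_{-i}),\sigma_i}(s) \quad \text{for all agents } i, \text{ policies } \pi_i', \text{ and states } s.$$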
Full Text Available
Settling the sample complexity of model-based offline reinforcement learning

https://doi.org/10.1214/23-AOS2342

Li, Gen; Shi, Laixi; Chen, Yuxin; Chi, Yuejie; Wei, Yuting (February 2024, The Annals of Statistics)

Full Text Available
The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model

Shi, Laixi; Li, Gen; Wei, Yuting; Chen, Yuxin; Geist, Matthieu; Chi, Yuejie (December 2023, 37th Conference on Neural Information Processing Systems (NeurIPS 2023))
Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

https://doi.org/10.1093/imaiai/iaac034

Li, Gen; Shi, Laixi; Chen, Yuxin; Chi, Yuejie (February 2023, Information and Inference: A Journal of the IMA)

Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been achieved toward characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors) with $T$ the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g. $S^6A^4\,\mathrm{poly}(H)$ for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to as the initial burn-in cost), our method improves, by at least a factor of $S^5A^3$, upon any prior memory-efficient algorithm that is asymptotically regret-optimal. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration–exploitation trade-offs.
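The stated improvement factor follows from comparing the two burn-in thresholds quoted in the abstract. As a rough check that looks only at the dependence on $S$ and $A$ (the $\mathrm{poly}(H)$ factors are not specified in the abstract and are set aside here as an assumption):

$$\frac{S^6A^4}{SA} = S^{6-1}A^{4-1} = S^5A^3$$

For a sense of scale, with illustrative values $S = 10$ and $A = 5$ (chosen here only as an example), the prior threshold scales as $10^6 \cdot 5^4 = 6.25 \times 10^8$ while the new one scales as $10 \cdot 5 = 50$, a ratio of $10^5 \cdot 5^3 = 1.25 \times 10^7$.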
Full Text Available