NSF PAR Search | NSF Public Access Repository

Optimizing Relevance and Diversity in Online Matching Markets: A Time-Adaptive Attenuation Approach

https://doi.org/10.1613/jair.1.16635

Xu, Evan Yifan; Xu, Pan (June 2025, Journal of Artificial Intelligence Research)

Real-world online matching markets (OMMs) often involve multiple objectives, such as maximizing relevance and diversity in online recommendation and crowdsourcing systems. In this paper, we propose a generic bi-objective maximization model for OMMs with the following features: (1) there are two types of agents—offline and online—with online agents arriving dynamically and stochastically; (2) upon each online agent’s arrival, an immediate and irrevocable decision must be made regarding which subset of relevant offline agents to assign; and (3) each offline and online agent has a specific matching capacity, i.e., an upper bound on the number of allowable matchings. Our model supports two general linear objective functions defined over all possible assignments to online agents. We formulate a bi-objective linear program (LP) and design an LP-based parameterized algorithm. Departing from prevalent non-adaptive attenuation methods, we introduce a time-adaptive attenuation framework that achieves an almost tight competitive ratio for each objective. To complement our theoretical analysis, we implement the proposed algorithm and evaluate it against several heuristics using two real-world datasets. Extensive experimental results demonstrate the flexibility and effectiveness of our approach, validating our theoretical predictions.
Free, publicly-accessible full text available June 23, 2026
Robust Offline Reinforcement Learning with Linearly Structured $$f$$-Divergence Regularization

Tang, Cheng; Liu, Zhishuai; Xu, Pan (May 2025, Proceedings of the 42nd International Conference on Machine Learning)

The Robust Regularized Markov Decision Process (RRMDP) is proposed to learn policies robust to dynamics shifts by adding regularization to the transition dynamics in the value function. Existing methods mostly use unstructured regularization, potentially leading to conservative policies under unrealistic transitions. To address this limitation, we propose a novel framework, the $$d$$-rectangular linear RRMDP ($$d$$-RRMDP), which introduces latent structures into both transition kernels and regularization. We focus on offline reinforcement learning, where an agent learns policies from a precollected dataset in the nominal environment. We develop the Robust Regularized Pessimistic Value Iteration (R2PVI) algorithm that employs linear function approximation for robust policy learning in $$d$$-RRMDPs with $$f$$-divergence based regularization terms on transition kernels. We provide instance-dependent upper bounds on the suboptimality gap of R2PVI policies, demonstrating that these bounds are influenced by how well the dataset covers state-action spaces visited by the optimal robust policy under robustly admissible transitions. We establish information-theoretic lower bounds to verify that our algorithm is near-optimal. Finally, numerical experiments validate that R2PVI learns robust policies and exhibits superior computational efficiency compared to baseline methods.
Free, publicly-accessible full text available May 1, 2026
Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction

He, Yiting; Liu, Zhishuai; Xu, Pan (May 2025, Proceedings of the 42nd International Conference on Machine Learning)

Off-dynamics reinforcement learning (RL), where training and deployment transition dynamics are different, can be formulated as learning in a robust Markov decision process (RMDP) where uncertainties in transition dynamics are imposed. Existing literature mostly assumes access to generative models allowing arbitrary state-action queries or pre-collected datasets with a good state coverage of the deployment environment, bypassing the challenge of exploration. In this work, we study a more realistic and challenging setting where the agent is limited to online interaction with the training environment. To capture the intrinsic difficulty of exploration in online RMDPs, we introduce the supremal visitation ratio, a novel quantity that measures the mismatch between the training dynamics and the deployment dynamics. We show that if this ratio is unbounded, online learning becomes exponentially hard. We propose the first computationally efficient algorithm that achieves sublinear regret in online RMDPs with $$f$$-divergence based transition uncertainties. We also establish matching regret lower bounds, demonstrating that our algorithm achieves optimal dependence on both the supremal visitation ratio and the number of interaction episodes. Finally, we validate our theoretical results through comprehensive numerical experiments.
Free, publicly-accessible full text available May 1, 2026
A New Regret-analysis Framework for Budgeted Multi-Armed Bandits

https://doi.org/10.1613/jair.1.16261

Xu, Evan Yifan; Xu, Pan (January 2025, Journal of Artificial Intelligence Research)

We consider two versions of the (stochastic) budgeted Multi-Armed Bandit problem. The first one was introduced by Tran-Thanh et al. (AAAI, 2012): Pulling each arm incurs a fixed deterministic cost and yields a random reward sampled i.i.d. from an unknown distribution (prior free). We have a global budget B and aim to devise a strategy to maximize the expected total reward. The second one was introduced by Ding et al. (AAAI, 2013): It has the same setting as before, except that the costs of each arm are i.i.d. samples from an unknown distribution (and independent of its rewards). We propose a new budget-based regret-analysis framework and design two simple algorithms to illustrate the power of our framework. Our regret bounds for both problems not only match the optimal bound of O(ln B) but also significantly reduce the dependence on other input parameters (assumed constants), compared with the two studies of Tran-Thanh et al. (AAAI, 2012) and Ding et al. (AAAI, 2013), both of which used a time-based framework. Extensive experimental results show the effectiveness and computational efficiency of our proposed algorithms and confirm our theoretical predictions.
Free, publicly-accessible full text available January 6, 2026
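
The fixed-cost setting in the budgeted bandit abstract above (each pull costs a known deterministic amount, rewards are random, and play stops when the global budget B can no longer pay for any arm) can be illustrated with a small simulation. The sketch below is not the authors' algorithm or analysis framework. It is a generic heuristic that pulls the affordable arm with the highest upper-confidence estimate of reward per unit cost. The function name `ratio_ucb_budgeted_bandit`, the Bernoulli reward model, and the specific confidence bonus are all assumptions made for illustration.

```python
import math
import random


def ratio_ucb_budgeted_bandit(means, costs, budget, seed=0):
    """Simulate a budgeted bandit with fixed, known, positive costs.

    Hypothetical illustration only: rewards are assumed Bernoulli with the
    given means, and the arm choice is a generic ratio-UCB heuristic, not
    the method from the paper listed above.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms          # number of times each arm was pulled
    reward_sums = [0.0] * n_arms  # cumulative reward observed per arm
    remaining = budget
    total_reward = 0.0
    t = 0

    while True:
        # Only arms whose cost fits in the remaining budget can be pulled.
        affordable = [i for i in range(n_arms) if costs[i] <= remaining]
        if not affordable:
            break
        t += 1

        untried = [i for i in affordable if pulls[i] == 0]
        if untried:
            # Pull each affordable arm once before using the index.
            arm = untried[0]
        else:
            def index(i):
                mean_hat = reward_sums[i] / pulls[i]
                bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
                # Optimistic reward estimate divided by the known cost.
                return (mean_hat + bonus) / costs[i]

            arm = max(affordable, key=index)

        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
        total_reward += reward
        remaining -= costs[arm]

    return total_reward, pulls, remaining


if __name__ == "__main__":
    total, pulls, left = ratio_ucb_budgeted_bandit([0.2, 0.8], [1, 2], 100)
    print("total reward:", total, "pulls:", pulls, "budget left:", left)
```

Integer costs keep the budget arithmetic exact. The random-cost variant described in the same abstract would also need a per-arm cost estimate, which this sketch does not attempt.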
Promoting fairness among dynamic agents in online-matching markets under known stationary arrival distributions

Ma, Will; Xu, Pan (December 2024, Advances in Neural Information Processing Systems)

Free, publicly-accessible full text available December 16, 2025
Parameter-dependent competitive analysis for online capacitated coverage maximization through boostings and attenuations

Xu, Pan (July 2024, Forty-first International Conference on Machine Learning)

Full Text Available
Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

Hsu, Hao-Lun; Wang, Weixin; Pajic, Miroslav; Xu, Pan (September 2024, Advances in Neural Information Processing Systems)

We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo exploration (LMC) strategy, respectively, which are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a $$\widetilde{\mathcal{O}}(d^{3/2}H^2\sqrt{MK})$$ regret bound with communication complexity $$\widetilde{\mathcal{O}}(dHM^2)$$, where $$d$$ is the feature dimension, $$H$$ is the horizon length, $$M$$ is the number of agents, and $$K$$ is the number of episodes. This is the first theoretical result for randomized exploration in cooperative MARL. We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., $$N$$-chain), a video game, and a real-world problem in energy systems. Our experimental results support that our framework can achieve better performance, even under conditions of misspecified transition models. Additionally, we establish a connection between our unified framework and the practical application of federated learning.
Full Text Available
Efficient and robust sequential decision making algorithms

https://doi.org/10.1002/aaai.12186

Xu, Pan (September 2024, AI Magazine)

Sequential decision‐making involves making informed decisions based on continuous interactions with a complex environment. This process is ubiquitous in various applications, including recommendation systems and clinical treatment design. My research has concentrated on addressing two pivotal challenges in sequential decision‐making: (1) How can we design algorithms that efficiently learn the optimal decision strategy with minimal interactions and limited sample data? (2) How can we ensure robustness in decision‐making algorithms when faced with distributional shifts due to environmental changes and the sim‐to‐real gap? This paper summarizes and expands upon the talk I presented at the AAAI 2024 New Faculty Highlights program, detailing how my research aims to tackle these challenges.
Promoting External and Internal Equities under Ex-Ante/Ex-Post Metrics in Online Resource Allocation

Sankararaman, Karthik A; Srinivasan, Aravind; Xu, Pan (July 2024, Proc. International Conference on Machine Learning (ICML))

Full Text Available
