NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Multi-task Representation Learning for Fixed Budget Pure-Exploration in Linear and Bilinear Bandits

Mukherjee, Subhojyoti; Xie, Qiaomin; Nowak, Robert D (August 2025, Reinforcement Learning Journal)

Free, publicly-accessible full text available August 8, 2026
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning

Mukherjee, Subhojyoti; Hanna, Josiah P; Xie, Qiaomin; Nowak, Robert D (August 2025, Reinforcement Learning Journal)

Free, publicly-accessible full text available August 8, 2026
Stable Offline Value Function Learning with Bisimulation-based Representations

Pavse, Brahma S; Chen, Yudong; Xie, Qiaomin; Hanna, Josiah P (July 2025, Proceedings of Machine Learning Research)

Free, publicly-accessible full text available July 17, 2026
A Piecewise Lyapunov Analysis of Sub-quadratic SGD: Applications to Robust and Quantile Regression

https://doi.org/10.1145/3744970.3727269

Zhang, Yixuan; Huo, Dongyan Lucy; Chen, Yudong; Xie, Qiaomin (June 2025, ACM SIGMETRICS Performance Evaluation Review)

Motivated by robust and quantile regression problems, we investigate the stochastic gradient descent (SGD) algorithm for minimizing an objective functionfthat is locally strongly convex with a sub--quadratic tail. This setting covers many widely used online statistical methods. We introduce a novel piecewise Lyapunov function that enables us to handle functionsfwith only first-order differentiability, which includes a wide range of popular loss functions such as Huber loss. Leveraging our proposed Lyapunov function, we derive finite-time moment bounds under general diminishing stepsizes, as well as constant stepsizes. We further establish the weak convergence, central limit theorem and bias characterization under constant stepsize, providing the first geometrical convergence result for sub--quadratic SGD. Our results have wide applications, especially in online statistical methods. In particular, we discuss two applications of our results. 1) Online robust regression: We consider a corrupted linear model with sub--exponential covariates and heavy--tailed noise. Our analysis provides convergence rates comparable to those for corrupted models with Gaussian covariates and noise. 2) Online quantile regression: Importantly, our results relax the common assumption in prior work that the conditional density is continuous and provide a more fine-grained analysis for the moment bounds.
more » « less
Free, publicly-accessible full text available June 16, 2026
Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way

Kwon, Jeongyeol; Dotson, Luke; Chen, Yudong; Xie, Qiaomin (May 2025, Proceedings of Machine Learning Research)

Free, publicly-accessible full text available May 3, 2026
Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent

https://doi.org/10.1609/aaai.v39i17.34035

Li, Xiang; Xie, Qiaomin (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

The convergence behavior of Stochastic Gradient Descent (SGD) crucially depends on the stepsize configuration. When using a constant stepsize, the SGD iterates form a Markov chain, enjoying fast convergence during the initial transient phase. However, when reaching stationarity, the iterates oscillate around the optimum without making further progress. In this paper, we study the convergence diagnostics for SGD with constant stepsize, aiming to develop an effective dynamic stepsize scheme. We propose a novel coupling-based convergence diagnostic procedure, which monitors the distance of two coupled SGD iterates for stationarity detection. Our diagnostic statistic is simple and is shown to track the transition from transience stationarity theoretically. We conduct extensive numerical experiments and compare our method against various existing approaches. Our proposed coupling-based stepsize scheme is observed to achieve superior performance across a diverse set of convex and non-convex problems. Moreover, our results demonstrate the robustness of our approach to a wide range of hyperparameters.
more » « less
Free, publicly-accessible full text available April 11, 2026
The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize

Huo, Dongyan; Zhang, Yixuan; Chen, Yudong; Xie, Qiaomin (September 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

In this work, we investigate stochastic approximation (SA) with Markovian data and nonlinear updates under constant stepsize. Existing work has primarily focused on either i.i.d. data or linear update rules. We take a new perspective and carefully examine the simultaneous presence of Markovian dependency of data and nonlinear update rules, delineating how the interplay between these two structures leads to complications that are not captured by prior techniques. By leveraging the smoothness and recurrence properties of the SA updates, we develop a fine-grained analysis of the correlation between the SA iterates and Markovian data. This enables us to overcome the obstacles in existing analysis and establish for the first time the weak convergence of the joint process. Furthermore, we present a precise characterization of the asymptotic bias of the SA iterates. As a by-product of our analysis, we derive finite-time bounds on higher moment and present non-asymptotic geometric convergence rates for the iterates, along with a Central Limit Theorem.
more » « less
Full Text Available
Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

Zhang, Yixuan; Xie, Qiaomin (August 2024, Reinforcement Learning Journal)

Stochastic Approximation (SA) is a widely used algorithmic approach in various fields, including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is particularly popular due to its empirical success. In this paper, we study asynchronous Q-learning with constant stepsize, which is commonly used in practice for its fast convergence. By connecting the constant stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance and establish its exponential convergence rate. We also establish a Central Limit Theory for Q-learning iterates, demonstrating the asymptotic normality of the averaged iterates. Moreover, we provide an explicit expansion of the asymptotic bias of the averaged iterate in stepsize. Specifically, the bias is proportional to the stepsize up to higher-order terms and we provide an explicit expression for the linear coefficient. This precise characterization of the bias allows the application of Richardson-Romberg (RR) extrapolation technique to construct a new estimate that is provably closer to the optimal Q function. Numerical results corroborate our theoretical finding on the improvement of the RR extrapolation method.
more » « less
Full Text Available
Inception: Efficiently Computable Misinformation Attacks on Markov Games

McMahan, Jeremy; Wu, Young; Chen, Yudong; Zhu, Jerry; Xie, Qiaomin (August 2024, Reinforcement Learning Journal)

We study security threats to Markov games due to information asymmetry and misinformation. We consider an attacker player who can spread misinformation about its reward function to influence the robust victim player's behavior. Given a fixed fake reward function, we derive the victim's policy under worst-case rationality and present polynomial-time algorithms to compute the attacker's optimal worst-case policy based on linear programming and backward induction. Then, we provide an efficient inception (""planting an idea in someone's mind"") attack algorithm to find the optimal fake reward function within a restricted set of reward functions with dominant strategies. Importantly, our methods exploit the universal assumption of rationality to compute attacks efficiently. Thus, our work exposes a security vulnerability arising from standard game assumptions under misinformation.
more » « less
Full Text Available
Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

Wu, Young; Mcmahan, Jeremy; Chen, Yiding; Chen, Yudong; Zhu, Jerry; Xie, Qiaomin (July 2024, Proceedings of Machine Learning Research)

We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of a game and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm that solves a convex optimization problem with linear constraints and then performs random perturbation to obtain a modification plan with a near-optimal cost.
more » « less
Full Text Available

« Prev Next »

Search for: All records