NSF PAR Search | NSF Public Access Repository


Uncertainty quantification for Markov chains with application to temporal difference learning

Wu, Weichen; Wei, Yuting; Rinaldo, Alessandro (February 2025, stat.ml)

Markov chains are fundamental to statistical machine learning, underpinning key methodologies such as Markov Chain Monte Carlo (MCMC) sampling and temporal difference (TD) learning in reinforcement learning (RL). Given their widespread use, it is crucial to establish rigorous probabilistic guarantees on their convergence, uncertainty, and stability. In this work, we develop novel, high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains, addressing key limitations in existing theoretical tools for handling dependent data. We leverage these results to analyze the TD learning algorithm, a widely used method for policy evaluation in RL. Our analysis yields a sharp high-probability consistency guarantee that matches the asymptotic variance up to logarithmic factors. Furthermore, we establish an $O(T^{-1/4}\log T)$ distributional convergence rate for the Gaussian approximation of the TD estimator, measured in convex distance. These findings provide new insights into statistical inference for RL algorithms, bridging the gaps between classical stochastic approximation theory and modern reinforcement learning applications.
Free, publicly-accessible full text available February 20, 2026
Hybrid reinforcement learning breaks sample size barriers in linear MDPs

Tan, Kevin; Fan, Wei; Wei, Yuting (December 2024, Neural Information Processing Systems)

Full Text Available
Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning

Yang, Tong; Cen, Shicong; Wei, Yuting; Chen, Yuxin; Chi, Yuejie (December 2024, 38th Conference on Neural Information Processing Systems)

Full Text Available
Theoretical insights for diffusion guidance: A case study for Gaussian mixture models

Wu, Yuchen; Chen, Minshuo; Li, Zihao; Wang, Mengdi; Wei, Yuting (July 2024, International Conference on Machine Learning)
Towards Non-Asymptotic Convergence for Diffusion-Based Generative Models

Li, Gen; Wei, Yuting; Chen, Yuxin; Chi, Yuejie (May 2024, International Conference on Learning Representations)
Accelerating convergence of score-based diffusion models, provably

Li, Gen; Huang, Yu; Efimov, Timofey; Wei, Yuting; Chi, Yuejie; Chen, Yuxin (July 2024, International Conference on Machine Learning)
Accelerating Convergence of Score-Based Diffusion Models, Provably

Li, Gen; Huang, Yu; Efimov, Timofey; Wei, Yuting; Chi, Yuejie; Chen, Yuxin (July 2024, International Conference on Machine Learning)
Towards Non-Asymptotic Convergence for Diffusion-Based Generative Models

Li, Gen; Wei, Yuting; Chen, Yuxin; Chi, Yuejie (May 2024, The Twelfth International Conference on Learning Representations)

Full Text Available
Debiasing Evaluations That Are Biased by Evaluations

Wang, Jingyan; Stelmakh, Ivan; Wei, Yuting; Shah, Nihar (February 2024, Journal of Machine Learning Research)

It is common to evaluate a set of items by soliciting people to rate them. For example, universities ask students to rate the teaching quality of their instructors, and conference organizers ask authors of submissions to evaluate the quality of the reviews. However, in these applications, students often give a higher rating to a course if they receive higher grades in that course, and authors often give a higher rating to the reviews if their papers are accepted to the conference. In this work, we call these external factors the "outcome" experienced by people, and consider the problem of mitigating these outcome-induced biases in the given ratings when some information about the outcome is available. We formulate the information about the outcome as a known partial ordering on the bias. We propose a debiasing method by solving a regularized optimization problem under this ordering constraint, and also provide a carefully designed cross-validation method that adaptively chooses the appropriate amount of regularization. We provide theoretical guarantees on the performance of our algorithm, as well as experimental evaluations.
Full Text Available
Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Cen, Shicong; Wei, Yuting; Chi, Yuejie (January 2024, Journal of Machine Learning Research)

Full Text Available
