

Search for: All records

Creators/Authors contains: "Yang, Jing"


  1. Free, publicly-accessible full text available May 7, 2025
  2. Free, publicly-accessible full text available April 1, 2025
  3. Free, publicly-accessible full text available March 1, 2025
  4. Free, publicly-accessible full text available February 1, 2025
  5. Free, publicly-accessible full text available December 10, 2024
  6. Free, publicly-accessible full text available August 30, 2024
  7. Free, publicly-accessible full text available June 25, 2024
  8. Krause, Andreas and (Ed.)
    General function approximation is a powerful tool for handling large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called the dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set-based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret of the proposed algorithm and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better than the existing UCB-type algorithms in the small variation budget scenario. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.
  9. Free, publicly-accessible full text available August 1, 2024