  1. Diffusion policies have achieved superior performance in imitation learning and offline reinforcement learning (RL) due to their rich expressiveness. However, the conventional diffusion training procedure requires samples from the target distribution, which is unavailable in online RL since we cannot sample from the optimal policy. Backpropagating the policy gradient through the diffusion process incurs large computational cost and instability, and is therefore expensive and not scalable. To enable efficient training of diffusion policies in online RL, we generalize conventional denoising score matching by reweighting the loss function. The resulting Reweighted Score Matching (RSM) preserves the optimal solution and the low computational cost of denoising score matching, while eliminating the need to sample from the target distribution and allowing the learned policy to optimize value functions. We introduce two tractable reweighted loss functions that solve two commonly used policy-optimization problems, policy mirror descent and max-entropy policy, yielding two practical algorithms named Diffusion Policy Mirror Descent (DPMD) and Soft Diffusion Actor-Critic (SDAC). In comprehensive comparisons on MuJoCo benchmarks, the proposed algorithms outperform recent online RL methods with diffusion policies on most tasks, and DPMD improves on soft actor-critic by more than 120% on Humanoid and Ant.
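    A minimal sketch of the reweighting idea: standard denoising score matching regresses the noise added to target-distribution samples, while a value-based weight lets samples from any behavior distribution stand in for the unavailable optimal policy. The softmax-over-Q weighting, network signatures, and noise schedule below are illustrative assumptions, not the paper's exact construction.

    ```python
    import torch

    def reweighted_dsm_loss(score_net, q_net, states, actions, t, alphas, sigmas, temp=1.0):
        """Reweighted denoising score matching (illustrative sketch).

        Plain DSM needs (state, action) samples from the target policy. Here
        each behavior-policy sample is instead reweighted by a softmax over
        Q-values, tilting the regression toward high-value actions. The
        Q-based weighting and all signatures are assumptions for
        illustration, not the paper's exact loss.
        """
        eps = torch.randn_like(actions)                      # forward-process noise
        a_t = alphas[t].unsqueeze(-1) * actions + sigmas[t].unsqueeze(-1) * eps
        eps_pred = score_net(states, a_t, t)                 # network predicts the noise
        per_sample = ((eps_pred - eps) ** 2).sum(-1)         # standard DSM residual
        with torch.no_grad():                                # weights carry no gradient
            w = torch.softmax(q_net(states, actions).squeeze(-1) / temp, dim=0)
        return (w * per_sample).sum()
    ```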
    Free, publicly-accessible full text available July 13, 2026
  2. Ozay, Necmiye; Balzano, Laura; Panagou, Dimitra; Abate, Alessandro (Ed.)
    The pursuit of robustness has recently been a popular topic in reinforcement learning (RL) research, yet existing methods generally suffer from computational issues that obstruct their real-world implementation. In this paper, we consider MDPs with low-rank structure, where the transition kernel can be written as a linear product of a feature map and factors. We introduce *duple perturbation* robustness, i.e., perturbation of both the feature map and the factors, via a novel characterization of (ξ, η)-ambiguity sets designed for computational efficiency. Our low-rank robust MDP formulation is compatible with the low-rank function-representation view and is therefore naturally applicable to practical RL problems with large or even continuous state-action spaces. It also gives rise to a provably efficient and practical algorithm with a theoretical convergence-rate guarantee. Lastly, the robustness of the proposed approach is demonstrated by numerical experiments, including classical control tasks with continuous state-action spaces.
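    To make the low-rank structure concrete, the sketch below computes a crude worst-case one-step value under small perturbations of both the feature map and the factors. The first-order penalty and all variable names are illustrative assumptions; the paper's (ξ, η)-ambiguity-set characterization is exact and tighter.

    ```python
    import numpy as np

    def duple_robust_backup(phi, mu, v, xi, eta):
        """Crude worst-case one-step value under duple perturbation.

        Nominal low-rank kernel: P(s' | s, a) = phi(s, a) @ mu(s'). The
        feature map is perturbed by at most xi and the factors by at most
        eta (l2 / operator norm), and a first-order bound lower-bounds
        E[v(s')]. This penalty form is an illustrative assumption, not the
        paper's exact treatment.
        """
        nominal = phi @ mu @ v             # E[v(s')] under the nominal kernel
        penalty = xi * np.linalg.norm(mu @ v) \
            + eta * np.linalg.norm(phi) * np.linalg.norm(v)
        return nominal - penalty

    # Toy numbers (not a proper stochastic kernel): 3 features, 4 next states.
    phi = np.array([0.2, 0.5, 0.3])
    mu = np.random.rand(3, 4)
    v = np.ones(4)
    print(duple_robust_backup(phi, mu, v, xi=0.05, eta=0.05))
    ```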
    Free, publicly-accessible full text available June 4, 2026
  3. Free, publicly-accessible full text available June 1, 2026
  4. Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
    Free, publicly-accessible full text available May 3, 2026
  5. Free, publicly-accessible full text available April 26, 2026
  6. Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
    Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy using *only* experiences from another, potentially unknown, behavior policy. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the *curse of horizon*. However, the major bottleneck in applying DICE estimators is the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a *linear representation* of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. Such a primal-dual representation not only bypasses the non-convex non-concave optimization in vanilla DICE, enabling a computationally efficient algorithm, but also paves the way for more efficient use of historical data. Our algorithm, SpectralDICE, is the first to leverage a linear representation of the primal-dual variables and is both computationally and sample efficient; its performance is supported by a rigorous theoretical sample-complexity guarantee and a thorough empirical evaluation on various benchmarks.
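    The payoff of a linear primal-dual representation is that the saddle point reduces to a linear system. The sketch below conveys the flavor with an LSTD-style least-squares solve under assumed spectral features; the feature construction, regularization, and estimator form are illustrative, not SpectralDICE's exact derivation.

    ```python
    import numpy as np

    def spectral_dice_estimate(phi_sa, phi_next, rewards, phi_init, gamma=0.99, reg=1e-3):
        """OPE with linear features (illustrative, LSTD-style sketch).

        Assumes features under which the value function and the
        distribution correction ratio are linear, as a spectral
        decomposition of the transition operator would provide. The saddle
        point then collapses to a regularized linear solve; this form is an
        assumption for illustration, not SpectralDICE's exact estimator.
        """
        d = phi_sa.shape[1]
        A = phi_sa.T @ (phi_sa - gamma * phi_next) + reg * np.eye(d)
        b = phi_sa.T @ rewards
        w = np.linalg.solve(A, b)          # value-function weights
        return float(phi_init @ w)         # estimated value at the initial state
    ```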
    Free, publicly-accessible full text available May 3, 2026
  7. Free, publicly-accessible full text available May 23, 2026
  8. Despite their modeling power for problems under uncertainty, robust optimization (RO) and adaptive RO (ARO) can produce overly conservative solutions, degrading the objective value relative to the nominal case. One main reason for this conservatism is that, in many practical applications, uncertainty sets are designed constraint-wise, without accounting for couplings across multiple constraints. In this paper, we define a coupled uncertainty set as the intersection of a constraint-wise uncertainty set and a coupling set, and we study the benefit of coupling in alleviating conservatism in RO and ARO. We provide tight, computable upper and lower bounds on the objective-value improvement of RO and ARO problems under coupled uncertainty over constraint-wise uncertainty. In addition, we relate the power of adaptability over static solutions to the coupling of the uncertainty set. Computational results demonstrate the benefit of coupling in applications. Funding: I. Wang was supported by the NSF CAREER Award [ECCS 2239771] and the Wallace Memorial Honorific Fellowship from Princeton University. B. Stellato was supported by the NSF CAREER Award [ECCS 2239771].
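    A small worked example of how a coupling set curbs conservatism, using a classic budget set as the coupling (an illustrative stand-in for the paper's coupled sets): intersecting a constraint-wise box with a budget on total deviation shrinks the worst case.

    ```python
    import numpy as np

    def worst_case_box(c):
        """max c @ u over the constraint-wise box |u_i| <= 1: every
        coordinate moves adversarially at once."""
        return np.abs(c).sum()

    def worst_case_coupled(c, budget):
        """max c @ u over the box intersected with the coupling set
        {u : sum_i |u_i| <= budget}: only the `budget` largest coefficients
        can move (integer budget assumed). The budget set is a classic
        illustrative coupling, not the paper's general construction."""
        mags = np.sort(np.abs(c))[::-1]
        return mags[: int(budget)].sum()

    c = np.array([3.0, 1.0, 0.5, 0.25])
    print(worst_case_box(c))         # 4.75 : constraint-wise worst case
    print(worst_case_coupled(c, 2))  # 4.00 : coupled worst case is smaller
    ```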
    Free, publicly-accessible full text available April 1, 2026
  9. We consider the Cauchy problem for the logarithmically singular surface quasi-geostrophic (SQG) equation, introduced by Ohkitani,
    $$\partial_t \theta - \nabla^\perp \log\big(10+(-\Delta)^{\frac{1}{2}}\big)\theta \cdot \nabla \theta = 0,$$
    and establish local existence and uniqueness of smooth solutions in the scale of Sobolev spaces with exponent decreasing with time. Such a decrease of the Sobolev exponent is necessary, as we have shown in the companion paper (Chae et al., Illposedness via degenerate dispersion for generalized surface quasi-geostrophic equations with singular velocities, arXiv:2308.02120) that the problem is strongly ill-posed in any fixed Sobolev space. The time dependence of the Sobolev exponent can be removed when there is a dissipation term strictly stronger than log. These results improve the well-posedness statements of Chae et al. (Comm Pure Appl Math 65(8):1037–1066, 2012). This well-posedness result can be applied to describe the long-time dynamics of the $\delta$-SQG equations, defined by
    $$\partial_t \theta + \nabla^\perp \big(10+(-\Delta)^{\frac{1}{2}}\big)^{-\delta}\theta \cdot \nabla \theta = 0,$$
    for all sufficiently small $\delta>0$ depending on the size of the initial data. For the same range of $\delta$, we establish global well-posedness of smooth solutions to the dissipative SQG equations.
    Free, publicly-accessible full text available April 1, 2026
  10. Free, publicly-accessible full text available February 26, 2026