Title: Verifying Fundamental Solution Groups for Lossless Wave Equations via Stationary Action and Optimal Control
Abstract: A new optimal control based representation for stationary action trajectories is constructed by exploiting connections between semiconvexity, semiconcavity, and stationarity. This new representation is used to verify a known two-point boundary value problem characterization of stationary action.
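As orientation for the abstract above, a minimal finite-dimensional sketch of how stationary action leads to a two-point boundary value problem is given below; the quadratic kinetic term and the potential V are illustrative assumptions and do not reproduce the lossless wave equation formulation of the paper. With fixed endpoints, the action over a horizon t is

    J_t[x] = \int_0^t \frac{1}{2} \|\dot{x}(s)\|^2 - V(x(s)) \, ds ,

and requiring the first variation to vanish, \delta J_t[x] = 0, over perturbations that vanish at both endpoints yields the Euler--Lagrange equation together with two-point boundary data,

    \ddot{x}(s) = -\nabla V(x(s)), \qquad x(0) = x_0, \quad x(t) = x_t ,

which is the kind of two-point boundary value problem characterization referred to above.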
Award ID(s): 1908918
NSF-PAR ID: 10288226
Author(s) / Creator(s):
Date Published:
Journal Name: Applied Mathematics & Optimization
ISSN: 0095-4616
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. A new optimal control based representation for stationary action trajectories is constructed by exploiting connections between semiconvexity, semiconcavity, and stationarity. This new representation is used to verify a known two-point boundary value problem characterization of stationary action. 
  2. Krause, Andreas, et al. (Eds.)
    General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make a first such attempt. We first propose a new complexity metric called the dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better in the small variation budget scenario than the existing UCB-type algorithms. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation. (A minimal sketch of the sliding-window mechanism appears after this list.)
  3. The Community Earth System Model 2 (CESM2) is the latest Earth System Model developed by the National Center for Atmospheric Research in collaboration with the university community and is significantly advanced in most components compared to its predecessor (CESM1). Here, CESM2's representation of the large-scale atmospheric circulation and its variability is assessed. Further context is provided through comparison to the CESM1 large ensemble and other models from the Coupled Model Intercomparison Project (CMIP5 and CMIP6). This includes an assessment of the representation of jet streams and storm tracks, stationary waves, the global divergent circulation, the annular modes, the North Atlantic Oscillation, and blocking. Compared to CESM1, CESM2 is substantially improved in the representation of the storm tracks, Northern Hemisphere (NH) stationary waves, NH winter blocking, and the global divergent circulation. It ranks within the top 10% of CMIP-class models in many of these features. Some features of the Southern Hemisphere (SH) circulation have degraded, such as the SH jet strength, stationary waves, and blocking, although the SH jet stream is placed at approximately the correct location. This analysis also highlights systematic deficiencies in these features across the new CMIP6 archive, such as the continued tendency for the SH jet stream to be placed too far equatorward, the North Atlantic westerlies to be too strong over Europe, the storm tracks as measured by low-level meridional wind variance to be too weak, and a lack of blocking in the North Atlantic sector. (A minimal sketch of the jet-position and storm-track diagnostics appears after this list.)
  4. We consider the imitation learning problem of learning a policy in a Markov Decision Process (MDP) setting where the reward function is not given, but demonstrations from experts are available. Although the goal of imitation learning is to learn a policy that produces behaviors nearly as good as the experts' for a desired task, the assumption that demonstrated behaviors are consistently optimal is often violated in practice. Finding a policy that is distributionally robust against noisy demonstrations, based on an adversarial construction, potentially solves this problem by avoiding optimistic generalizations of the demonstrated data. This paper studies Distributionally Robust Imitation Learning (DRoIL) and establishes a close connection between DRoIL and Maximum Entropy Inverse Reinforcement Learning. We show that DRoIL can be seen as a framework that maximizes a generalized concept of entropy. We develop a novel approach to transform the objective function into a convex optimization problem over a polynomial number of variables for a class of loss functions that are additive over state and action spaces. Our approach lets us optimize both stationary and non-stationary policies and, unlike prevalent previous methods, does not require repeatedly solving an inner reinforcement learning problem. We experimentally show the significant benefits of DRoIL's new optimization method on synthetic data and a highway driving environment. (A sketch of the maximum-entropy building block behind this connection appears after this list.)
  5. We consider the problem of offline reinforcement learning (RL), a well-motivated setting of RL that aims at policy optimization using only historical data. Despite its wide applicability, theoretical understanding of offline RL, such as its optimal sample complexity, remains largely open even in basic settings such as tabular Markov Decision Processes (MDPs). In this paper, we propose Off-Policy Double Variance Reduction (OPDVR), a new variance reduction based algorithm for offline RL. Our main result shows that OPDVR provably identifies an ϵ-optimal policy with Õ(H^2 / (d_m ϵ^2)) episodes of offline data in the finite-horizon stationary transition setting, where H is the horizon length and d_m is the minimal marginal state-action distribution induced by the behavior policy. This improves over the best known upper bound by a factor of H. Moreover, we establish an information-theoretic lower bound of Ω(H^2 / (d_m ϵ^2)), which certifies that OPDVR is optimal up to logarithmic factors. Lastly, we show that OPDVR also achieves rate-optimal sample complexity under alternative settings such as finite-horizon MDPs with non-stationary transitions and infinite-horizon MDPs with discounted rewards. (A sketch of estimating the occupancy term d_m from offline data appears after this list.)
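
The sliding-window mechanism referenced in item 2 can be illustrated with a short sketch: only the most recent W transitions are kept when fitting estimates, so data generated by an older version of a non-stationary MDP is gradually discarded. The buffer class, window size, and placeholder transitions below are illustrative assumptions and are not the SW-OPEA algorithm itself.

    from collections import deque

    class SlidingWindowBuffer:
        """Retain only the most recent `window` transitions of a non-stationary MDP."""

        def __init__(self, window: int):
            self.window = window
            self.buffer = deque(maxlen=window)  # automatically evicts the oldest entries

        def add(self, state, action, reward, next_state):
            self.buffer.append((state, action, reward, next_state))

        def data(self):
            # Estimators fit on this list only see recent, hence relevant, transitions.
            return list(self.buffer)

    # Minimal usage: stream 5000 placeholder transitions, fit on the last 1000 only.
    buf = SlidingWindowBuffer(window=1000)
    for t in range(5000):
        buf.add(t % 10, t % 4, 1.0, (t + 1) % 10)
    print(len(buf.data()))  # -> 1000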
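
Two of the diagnostics named in item 3 have simple concrete forms: jet position can be measured as the latitude of maximum zonal-mean zonal wind, and the storm track as the temporal variance of low-level meridional wind (normally bandpass-filtered first). The NumPy-only sketch below, with synthetic fields and assumed array shapes, illustrates those two metrics and is not the paper's exact methodology.

    import numpy as np

    def jet_latitude(u_zonal_mean, lats):
        """Latitude of the maximum zonal-mean zonal wind (a simple jet-position metric)."""
        return lats[np.argmax(u_zonal_mean)]

    def storm_track_metric(v_daily):
        """Crude storm-track proxy: variance over time of meridional wind.

        v_daily has shape (time, lat, lon); a 2-6 day bandpass filter would
        normally be applied before taking the variance.
        """
        return v_daily.var(axis=0)

    # Minimal usage with synthetic fields.
    lats = np.linspace(-90, 90, 73)
    u = 20.0 * np.exp(-((lats + 50.0) / 10.0) ** 2)   # idealized SH jet centred near 50S
    print(jet_latitude(u, lats))                       # -> -50.0

    v = np.random.default_rng(0).normal(size=(360, 73, 144))
    print(storm_track_metric(v).shape)                 # -> (73, 144)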
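
Item 4 ties distributionally robust imitation to Maximum Entropy Inverse Reinforcement Learning. The standard building block of that connection is soft value iteration: given a reward estimate, it returns the stochastic policy that trades expected reward against policy entropy. The tabular sketch below shows that standard component under assumed array shapes; it is not the DRoIL convex optimization itself.

    import numpy as np

    def soft_value_iteration(P, R, gamma=0.95, iters=200):
        """Tabular soft (maximum-entropy) value iteration.

        P : (S, A, S) transition probabilities, R : (S, A) reward estimate.
        Returns a stochastic policy pi of shape (S, A) with
        pi(a|s) proportional to exp(Q(s, a) - V(s)).
        """
        S, A = R.shape
        V = np.zeros(S)
        for _ in range(iters):
            Q = R + gamma * (P @ V)              # expected next soft value, shape (S, A)
            V = np.log(np.exp(Q).sum(axis=1))    # soft (log-sum-exp) Bellman backup
        pi = np.exp(Q - V[:, None])              # maximum-entropy policy
        return pi / pi.sum(axis=1, keepdims=True)

    # Minimal usage on a random 4-state, 2-action MDP.
    rng = np.random.default_rng(0)
    P = rng.random((4, 2, 4)); P /= P.sum(axis=2, keepdims=True)
    R = rng.random((4, 2))
    print(soft_value_iteration(P, R).round(3))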
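
The sample-complexity bounds in item 5 scale with 1/d_m, where d_m is the minimal marginal state-action occupancy under the behavior policy. A crude way to build intuition is a plug-in estimate of that quantity from an offline dataset, as in the sketch below; the dataset format and the pooled (rather than per-timestep) counting are illustrative assumptions and not part of OPDVR.

    import numpy as np

    def empirical_min_occupancy(dataset, num_states, num_actions):
        """Plug-in estimate of the smallest state-action visitation frequency.

        dataset: list of (state, action, reward, next_state) tuples collected by
        the behavior policy. A returned value of 0 flags state-action pairs that
        the offline data never covers, which is what makes offline RL hard.
        """
        counts = np.zeros((num_states, num_actions))
        for s, a, _, _ in dataset:
            counts[s, a] += 1
        return counts.min() / max(len(dataset), 1)

    # Minimal usage: a uniform behavior policy walking a 5-state, 2-action chain.
    rng = np.random.default_rng(0)
    data, s = [], 0
    for _ in range(10_000):
        a = int(rng.integers(2))
        s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        data.append((s, a, 0.0, s_next))
        s = s_next
    print(empirical_min_occupancy(data, num_states=5, num_actions=2))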