Title: Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning
A key challenge in solving the deterministic inverse reinforcement learning problem online and in real time is the existence of non-unique solutions. Nonuniqueness necessitates the study of the notion of equivalent solutions and convergence to such solutions. While offline algorithms that result in convergence to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer is developed to generate solutions that are approximately equivalent. Novel data-richness conditions are developed to facilitate the analysis, and simulation results are provided to demonstrate the effectiveness of the developed technique.
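The nonuniqueness described above can be illustrated in a toy setting. The sketch below assumes a scalar, discrete-time linear-quadratic problem (dynamics x_{k+1} = a x_k + b u_k, stage cost q x^2 + r u^2), not the online, observer-based setting of the paper, and it is not the authors' regularized history stack observer. It shows that scaling q and r by the same positive constant leaves the optimal feedback gain unchanged, so observed behavior identifies the cost weights only up to scale. Fixing r = 1 is one simple normalization that selects a single representative of the equivalence class. The function names are hypothetical.

```python
"""Toy illustration of nonunique (equivalent) cost weights in scalar LQR.

Assumptions (hypothetical, not from the paper): scalar discrete-time
dynamics x_{k+1} = a*x_k + b*u_k, stage cost q*x^2 + r*u^2, and an
optimal policy u = -k*x.
"""


def riccati_gain(a: float, b: float, q: float, r: float,
                 iters: int = 10_000, tol: float = 1e-12) -> tuple[float, float]:
    """Iterate the scalar discrete-time Riccati equation to a fixed point.

    Returns (p, k): the value-function weight p and the optimal gain k.
    """
    p = q
    for _ in range(iters):
        # p = q + a^2 p - (a b p)^2 / (r + b^2 p)
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol * max(1.0, abs(p))
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return p, k


def recover_q_with_r_fixed(a: float, b: float, k: float) -> float:
    """Inverse step: given an observed gain k, return the q that produces it
    when r is normalized to 1 (one representative of the equivalence class)."""
    # From k = a b p / (1 + b^2 p):  p = k / (b (a - k b))
    p = k / (b * (a - k * b))
    # Substitute p back into the Riccati equation with r = 1 and solve for q.
    return p - a * a * p + (a * b * p) ** 2 / (1.0 + b * b * p)


if __name__ == "__main__":
    a, b = 1.2, 0.5
    for scale in (1.0, 10.0, 100.0):
        _, k = riccati_gain(a, b, q=3.0 * scale, r=2.0 * scale)
        print(f"scale={scale:6.1f}  gain k={k:.10f}")  # same k for every scale
    _, k_obs = riccati_gain(a, b, q=3.0, r=2.0)
    print("recovered q with r=1:", recover_q_with_r_fixed(a, b, k_obs))
```

Every pair (c*q, c*r) with c > 0 yields the same gain, so an inverse method that only observes the policy cannot distinguish between them without an extra condition such as the normalization used here.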
Award ID(s):
1925147 2027999
PAR ID:
10477506
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-2806-6
Page Range / eLocation ID:
3989 to 3994
Format(s):
Medium: X
Location:
San Diego, CA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. A deep neural network (DNN)-based adaptive controller with a real-time and concurrent learning (CL)-based adaptive update law is developed for a class of uncertain, nonlinear dynamic systems. The DNN in the control law is used to approximate the uncertain nonlinear dynamic model. The inner-layer weights of the DNN are updated offline using data collected in real-time; whereas, the output-layer DNN weights are updated online (i.e., in real-time) using the Lyapunov- and CL-based adaptation law. Specifically, the inner-layer weights of the DNN are trained offline (concurrent to real-time execution) after a sufficient amount of data is collected in real-time to improve the performance of the system, and after training is completed the inner-layer DNN weights are updated in batch-updates. The key development in this work is that the output-layer DNN update law is augmented with CL-based terms to ensure that the output-layer DNN weight estimates converge to within a ball of their optimal values. A Lyapunov-based stability analysis is performed to ensure semi-global exponential convergence to an ultimate bound for the trajectory tracking errors and the output-layer DNN weight estimation errors. 
  2. Path planning is a fundamental and critical task in many robotic applications. For energy‐constrained robot platforms, path planning solutions are desired that achieve minimum‐time arrival with minimal energy consumption. Uncertain environments, such as wind conditions, pose challenges to the design of effective minimum time‐energy path planning solutions. In this article, we develop a minimum time‐energy path planning solution in continuous state and control input spaces using integral reinforcement learning (IRL). To provide a baseline solution for the performance evaluation of the proposed solution, we first develop a theoretical analysis for the minimum time‐energy path planning problem in a known environment using Pontryagin's minimum principle. We then provide an online adaptive solution in an unknown environment using IRL. This is done by transforming the minimum time‐energy problem into an approximate minimum time‐energy problem and then developing an IRL‐based optimal control strategy. Convergence of the IRL‐based optimal control strategy is proven. Simulation studies are developed to compare the theoretical analysis and the proposed IRL‐based algorithm.
  3. This paper proposes barrier functions for the study of forward invariance in hybrid systems modeled by hybrid inclusions. After introducing an appropriate notion of a barrier function, we propose sufficient conditions to guarantee forward invariance properties of a set for hybrid systems with nonuniqueness of solutions, solutions terminating prematurely, and Zeno solutions. Our conditions involve infinitesimal conditions on the barrier certificate and Minkowski functionals. Examples illustrate the results. 
  4. We investigate online network topology identification from smooth nodal observations acquired in a streaming fashion. Different from non-adaptive batch solutions, our distinctive goal is to track the (possibly) dynamic adjacency matrix with affordable memory and computational costs by processing signal snapshots online. To this end, we leverage and truncate dual-based proximal gradient (DPG) iterations to solve a composite smoothness-regularized, time-varying inverse problem. Numerical tests with synthetic and real electrocorticography data showcase the effectiveness of the novel lightweight iterations when it comes to tracking slowly-varying network connectivity. We also show that the online DPG algorithm converges faster than a primal-based baseline of comparable complexity. Aligned with reproducible research practices, we share the code developed to produce all figures included in this paper. 
  5. TUNERCAR is a toolchain that jointly optimizes racing strategy, planning methods, control algorithms, and vehicle parameters for an autonomous racecar. In this paper, we detail the target hardware, software, simulators, and systems infrastructure for this toolchain. Our methodology employs a parallel implementation of CMA-ES, which enables simulations to proceed 6 times faster than real-world rollouts. We show our approach can reduce the lap times in autonomous racing, given a fixed computational budget. For all tested tracks, our method provides the lowest lap time, with relative improvements in lap time of 7–21%. We demonstrate improvements of over 15 seconds/lap over a naive random search method with an equivalent computational budget, and improvements of over 2 seconds/lap over expert solutions. We further compare the performance of our method against hand-tuned solutions submitted by over 30 international teams composed of graduate students working in the field of autonomous vehicles. Finally, we discuss the effectiveness of utilizing an online planning mechanism to reduce the reality gap between our simulation and actual tests.