Abstract For practical reasons, reinforcement learning has proven to be a difficult task outside of simulation when applied to physical experiments. Here we derive an optimal approach to model-free reinforcement learning, achieved entirely online, through careful experimental design and algorithmic decision making. We design a reinforcement learning scheme to implement traditionally episodic algorithms for an unstable 1-dimensional mechanical environment. The training scheme is completely autonomous, requiring no human to be present throughout the learning process. We show that the pseudo-episodic technique allows for additional learning updates with off-policy actor-critic and experience replay methods. We show that including these additional updates between periods of traditional training episodes can improve the speed and consistency of learning. Furthermore, we validate the procedure on experimental hardware. In the physical environment, several algorithm variants learned rapidly, each surpassing the baseline maximum reward. The algorithms in this research are model-free and use only information obtained by an onboard sensor during training.
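As a rough illustration of the pseudo-episodic idea, the sketch below runs a live episode on a toy unstable plant, stores every transition in a replay buffer, and then performs extra off-policy updates from the buffer before the next episode begins. Tabular Q-learning stands in for the paper's off-policy actor-critic, and the plant dynamics, state discretization, and hyperparameters are illustrative assumptions rather than details from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, ACTIONS = 21, np.array([-1.0, 0.0, 1.0])   # discretized positions, torque choices
Q = np.zeros((N_STATES, len(ACTIONS)))
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.2
buffer = []                                            # experience replay store

def to_state(x):
    """Discretize position x in [-5, 5] into one of N_STATES bins."""
    return int(np.clip((x + 5.0) / 10.0 * (N_STATES - 1), 0, N_STATES - 1))

def plant(x, u, dt=0.05):
    """Toy unstable 1-D plant: x drifts away from the origin unless pushed back."""
    x2 = x + dt * (0.8 * x + 2.0 * u) + 0.01 * rng.standard_normal()
    return x2, -x2 ** 2, abs(x2) > 5.0                 # next state, reward, failure flag

def q_update(s, a, r, s2, done):
    target = r + (0.0 if done else GAMMA * Q[s2].max())
    Q[s, a] += ALPHA * (target - Q[s, a])

for episode in range(200):
    x, done = float(rng.uniform(-0.5, 0.5)), False
    for t in range(200):                               # live, online episode
        s = to_state(x)
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(Q[s].argmax())
        x, r, done = plant(x, ACTIONS[a])
        s2 = to_state(x)
        q_update(s, a, r, s2, done)                    # update from the live transition
        buffer.append((s, a, r, s2, done))
        if done:
            break
    # Pseudo-episodic phase: extra off-policy updates replayed between episodes,
    # e.g. while the hardware is being returned to its starting configuration.
    for _ in range(500):
        q_update(*buffer[rng.integers(len(buffer))])
```

The point of the sketch is only the loop structure: the between-episode updates are drawn from stored experience and consume no additional interaction with the physical system.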
Enabling Artificial Intelligence Studies in Off-Road Mobility Through Physics-Based Simulation of Multiagent Scenarios
Abstract We describe a simulation environment that enables the design and testing of control policies for off-road mobility of autonomous agents. The environment is demonstrated in conjunction with the training and assessment of a reinforcement learning policy that uses sensor fusion and interagent communication to enable the movement of mixed convoys of human-driven and autonomous vehicles. Policies learned on rigid terrain are shown to transfer to hard (silt-like) and soft (snow-like) deformable terrains. The environment described provides the following: multivehicle multibody dynamics cosimulation in a time/space-coherent infrastructure that relies on the Message Passing Interface standard for low-latency parallel computing; sensor simulation (e.g., camera, GPS, IMU); simulation of a virtual world that can be altered by the agents present in the simulation; training that uses reinforcement learning to “teach” the autonomous vehicles to drive on an obstacle-riddled course. The software stack described is open source. Relevant movies: Project Chrono. Off-road AV simulations, 2020.
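The central infrastructure idea, keeping many vehicle simulations time- and space-coherent through MPI synchronization points, can be sketched generically as below. This is not the Chrono/SynChrono API; it is a minimal mpi4py illustration in which each rank advances a placeholder point-mass vehicle and all ranks exchange states at every step, so that any agent (or learned policy) can sense the rest of the convoy at the same simulated time.

```python
import numpy as np
from mpi4py import MPI   # requires an MPI installation and mpi4py

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

DT, STEPS = 0.01, 1000
state = np.array([rank * 10.0, 0.0])              # [position, speed] along the course

def advance(s, throttle, dt):
    """Placeholder longitudinal dynamics; Chrono would integrate a full multibody model."""
    pos, vel = s
    vel += dt * (5.0 * throttle - 0.1 * vel)
    return np.array([pos + dt * vel, vel])

convoy = comm.allgather(state)                     # initial coherent snapshot of all vehicles
for step in range(STEPS):
    # 1) decide an action; a trained policy would map sensor data to driving commands here
    throttle = 0.5
    if rank > 0:                                   # toy convoy-keeping rule in place of a policy
        gap = convoy[rank - 1][0] - state[0]
        throttle = 0.8 if gap > 12.0 else 0.2
    # 2) local dynamics step (in the real stack, vehicle + deformable terrain cosimulation)
    state = advance(state, throttle, DT)
    # 3) synchronization point: every rank sees every vehicle at the same simulated time
    convoy = comm.allgather(state)

if rank == 0:
    print("final positions:", [round(float(s[0]), 1) for s in convoy])
```

Run with, e.g., `mpiexec -n 4 python convoy_cosim.py` (a hypothetical file name); the per-rank step followed by an all-to-all exchange is what keeps the multiagent scenario time/space-coherent as the convoy grows.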
- Award ID(s): 1835674
- PAR ID: 10349474
- Date Published:
- Journal Name: Journal of Computational and Nonlinear Dynamics
- Volume: 17
- Issue: 5
- ISSN: 1555-1415
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
The deep neural network (DNN) model for computer vision tasks (object detection and classification) is widely used in autonomous vehicles, such as driverless cars and unmanned aerial vehicles. However, DNN models are shown to be vulnerable to adversarial image perturbations. The generation of adversarial examples against inferences of DNNs has been actively studied recently. The generation typically relies on optimizations taking an entire image frame as the decision variable. Hence, given a new image, the computationally expensive optimization needs to start over, as there is no learning between the independent optimizations. Very few approaches have been developed for attacking online image streams while taking into account the underlying physical dynamics of autonomous vehicles, their mission, and the environment. The article presents a multi-level reinforcement learning framework that can effectively generate adversarial perturbations to misguide autonomous vehicles’ missions. In existing image-attack methods against autonomous vehicles, optimization steps are repeated for every image frame; this framework removes the need for fully converged optimization at every frame. Using multi-level reinforcement learning, we integrate a state estimator and a generative adversarial network that generates the adversarial perturbations. Because the reinforcement learning agent, consisting of a state estimator, an actor, and a critic, uses only image streams, the proposed framework can misguide the vehicle to increase the adversary’s reward without knowing the states of the vehicle and the environment. Simulation studies and a robot demonstration are provided to validate the proposed framework’s performance.
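One way to see the computational argument (not the paper's actual architecture, which combines a state estimator, actor, critic, and a generative network) is to contrast per-frame iterative optimization with a one-shot generator on a toy victim model. Everything below, the linear scorer, the L-infinity budget, and the closed-form "generator", is an invented stand-in used only to show why amortizing the attack across an image stream removes the per-frame optimization loop.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.standard_normal(16)                 # toy victim model: score = W @ image
EPS = 0.1                                   # L-infinity perturbation budget

def attack_by_optimization(img, steps=50, lr=0.05):
    """Gradient ascent on the score, restarted from scratch for every frame (expensive)."""
    delta = np.zeros_like(img)
    for _ in range(steps):
        delta = np.clip(delta + lr * W, -EPS, EPS)   # d(W @ (img + delta))/d(delta) = W
    return delta

def attack_by_generator(img):
    """Stand-in for a trained generator: one cheap pass per frame, no inner loop."""
    return EPS * np.sign(W)

stream = rng.standard_normal((100, 16))     # stand-in for an online image stream
opt = np.mean([W @ (img + attack_by_optimization(img)) for img in stream])
gen = np.mean([W @ (img + attack_by_generator(img)) for img in stream])
print(f"mean attacked score, per-frame optimization: {opt:.3f}")
print(f"mean attacked score, amortized generator:    {gen:.3f}")
```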
Soaring birds often rely on ascending thermal plumes in the atmosphere as they search for prey or migrate across large distances. The landscape of convective currents is turbulent and rapidly shifts on timescales of a few minutes as thermals constantly form, disintegrate, or are transported away by the wind. How soaring birds find and navigate thermals within this complex landscape is unknown. Reinforcement learning can be used to find an effective navigational strategy as a sequence of decisions taken in response to environmental cues. Reinforcement learning was applied to train gliders in the field to autonomously navigate atmospheric thermals. Gliders of two-meter wingspan were equipped with a flight controller that enabled an on-board implementation of autonomous flight policies via precise control over their bank angle and pitch. Learning is severely challenged by a multitude of physical effects and the unpredictability of the natural environment. A navigational strategy was determined solely from the experiences collected over several days in the field using exploratory behavioral policies. Bird-like performance was achieved and several viable biological mechanosensory cues were identified for soaring birds, which are also directly applicable to the development of autonomous soaring vehicles.
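A toy version of the learning problem, heavily simplified and not the field study's actual controller, is sketched below: an agent flying on a 2-D plane observes only whether its climb rate rose or fell since the last step and uses SARSA to learn which turn to make. The Gaussian thermal, the eight-heading kinematics, and all constants are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
HEADINGS = np.linspace(0, 2 * np.pi, 8, endpoint=False)
ACTIONS = [-1, 0, 1]                      # turn left, keep heading, turn right
Q = np.zeros((3, 3))                      # rows: lift falling / flat / rising
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def lift(pos):
    """Vertical wind: a single Gaussian thermal centred at the origin."""
    return 3.0 * np.exp(-np.dot(pos, pos) / 50.0)

def observe(delta):
    return 0 if delta < -0.01 else (2 if delta > 0.01 else 1)

def policy(obs):
    return rng.integers(3) if rng.random() < EPS else int(Q[obs].argmax())

for episode in range(500):
    pos = rng.uniform(-15, 15, size=2)    # start somewhere near the thermal
    heading, prev_lift = rng.integers(8), lift(pos)
    obs, act = 1, policy(1)
    for t in range(200):
        heading = (heading + ACTIONS[act]) % 8
        pos = pos + np.array([np.cos(HEADINGS[heading]), np.sin(HEADINGS[heading])])
        w = lift(pos)                      # climb rate doubles as the reward signal
        obs2 = observe(w - prev_lift)
        act2 = policy(obs2)
        Q[obs, act] += ALPHA * (w + GAMMA * Q[obs2, act2] - Q[obs, act])
        obs, act, prev_lift = obs2, act2, w
```

The intent is only to show the structure the abstract describes: a local cue comes in, a turn decision goes out, and climb rate provides the reward.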
Cyber defense exercises are an important avenue to understand the technical capacity of organizations when faced with cyber-threats. Information derived from these exercises often reveals previously unseen methods of exploiting vulnerabilities in an organization, which in turn lead to better defense mechanisms that can counter previously unknown exploits. With recent developments in cyber battle simulation platforms, we can generate a defense exercise environment and train reinforcement learning (RL) based autonomous agents to attack the system described by the simulated environment. In this paper, we describe a two-player game-based RL environment that simultaneously improves the performance of both the attacker and defender agents. We further accelerate the convergence of the RL agents by guiding them with expert knowledge from Cybersecurity Knowledge Graphs on attack and mitigation steps. We have implemented and integrated our proposed approaches into the CyberBattleSim system.
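The "both sides improve" idea can be illustrated with something far simpler than the actual system: two independent Q-learners, an attacker and a defender, repeatedly playing a small abstract zero-sum game. The payoff matrix and learning constants below are invented, and nothing here uses the CyberBattleSim API or the knowledge-graph guidance the paper adds.

```python
import numpy as np

rng = np.random.default_rng(3)
# Payoff to the attacker for (attack action, defense action):
# rows = exploits, columns = mitigations. Invented numbers for illustration.
PAYOFF = np.array([[ 1.0, -0.5,  0.2],
                   [-0.3,  0.8, -0.4],
                   [ 0.4, -0.2,  0.6]])
q_att, q_def = np.zeros(3), np.zeros(3)
ALPHA, EPS = 0.05, 0.1

def pick(q):
    """Epsilon-greedy action selection over a small action-value table."""
    return rng.integers(len(q)) if rng.random() < EPS else int(q.argmax())

for round_ in range(5000):
    a, d = pick(q_att), pick(q_def)
    r = PAYOFF[a, d]                      # attacker gain is defender loss
    q_att[a] += ALPHA * (r - q_att[a])    # both sides update from the same encounter
    q_def[d] += ALPHA * (-r - q_def[d])

print("attacker prefers exploit", int(q_att.argmax()),
      "| defender prefers mitigation", int(q_def.argmax()))
```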
Pedagogical planners can provide adaptive support to students in narrative-centered learning environments by dynamically scaffolding student learning and tailoring problem scenarios. Reinforcement learning (RL) is frequently used for pedagogical planning in narrative-centered learning environments. However, RL-based pedagogical planning raises significant challenges due to the scarcity of data for training RL policies. Most prior work has relied on limited-size datasets and offline RL techniques for policy learning. Unfortunately, offline RL techniques do not support on-demand exploration and evaluation, which can adversely impact the quality of induced policies. To address the limitation of data scarcity and offline RL, we propose INSIGHT, an online RL framework for training data-driven pedagogical policies that optimize student learning in narrative-centered learning environments. The INSIGHT framework consists of three components: a narrative-centered learning environment simulator, a simulated student agent, and an RL-based pedagogical planner agent, which uses a reward metric that is associated with effective student learning processes. The framework enables the generation of synthetic data for on-demand exploration and evaluation of RL-based pedagogical planning. We have implemented INSIGHT with OpenAI Gym for a narrative-centered learning environment testbed with rule-based simulated student agents and a deep Q-learning-based pedagogical planner. Our results show that online deep RL algorithms can induce near-optimal pedagogical policies in the INSIGHT framework, while offline deep RL algorithms only find suboptimal policies even with large amounts of data.
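To make the three-component setup concrete, the sketch below wraps a rule-based simulated student in a Gym-style reset()/step() environment that a planner could query for unlimited synthetic episodes. The student model, the support actions, and the delayed learning-gain reward are invented placeholders, not the INSIGHT framework's actual components.

```python
import numpy as np

rng = np.random.default_rng(2)
ACTIONS = ["hint", "easier_problem", "harder_problem", "no_support"]

class SimulatedStudentEnv:
    """Gym-style reset()/step() interface around a toy rule-based simulated student."""

    def reset(self):
        self.knowledge = rng.uniform(0.1, 0.3)   # latent mastery in [0, 1]
        self.turn = 0
        return np.array([0.0])                   # the planner sees only noisy evidence

    def step(self, action):
        # Rule-based student response: harder problems only help once mastery is high.
        gain = {"hint": 0.05, "easier_problem": 0.03,
                "harder_problem": 0.08 if self.knowledge > 0.5 else 0.01,
                "no_support": 0.02}[ACTIONS[action]]
        self.knowledge = min(1.0, self.knowledge + gain + 0.01 * rng.standard_normal())
        self.turn += 1
        correct = float(rng.random() < self.knowledge)       # observable evidence
        reward = self.knowledge if self.turn == 20 else 0.0  # delayed learning-gain reward
        return np.array([correct]), reward, self.turn == 20, {}

# A planner (e.g. deep Q-learning) would interact with this interface for
# on-demand exploration; a random placeholder policy is used here to show the loop.
env = SimulatedStudentEnv()
obs, done = env.reset(), False
while not done:
    obs, r, done, _ = env.step(rng.integers(len(ACTIONS)))
```

Training against such a simulator, rather than a fixed logged dataset, is the on-demand exploration and evaluation that the abstract contrasts with offline RL.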