Title: Reinforcement-Learning-Based Risk-Sensitive Optimal Feedback Mechanisms of Biological Motor Control
Risk sensitivity is a fundamental aspect of biological motor control that accounts for both the expectation and the variability of movement cost in the face of uncertainty. However, most computational models of biological motor control rely on model-based risk-sensitive optimal control, which requires an accurate internal representation in the central nervous system to predict the outcomes of motor commands. In reality, the dynamics of human-environment interaction are too complex to be modeled accurately, and noise further complicates system identification. To address this issue, this paper proposes a novel risk-sensitive computational mechanism for biological motor control based on reinforcement learning (RL) and adaptive dynamic programming (ADP). The proposed ADP-based mechanism suggests that humans can directly learn an approximation of the risk-sensitive optimal feedback controller from noisy sensory data, without the need for system identification. Numerical validation of the proposed mechanism is conducted on an arm-reaching task under a divergent force field. The preliminary computational results align with experimental observations reported in the computational neuroscience literature.
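To make the risk-sensitive criterion concrete, here is a minimal sketch (not from the paper) of the standard exponential cost used in risk-sensitive optimal control, evaluated on synthetic movement-cost samples; the risk parameter theta and all numbers are illustrative assumptions. For theta > 0 the criterion penalizes cost variability on top of the mean, which is the behavior the proposed mechanism is designed to capture.

```python
import numpy as np

# Risk-sensitive (exponential) criterion:
#   J_theta = (2/theta) * log E[exp((theta/2) * J)].
# For theta > 0 (risk-averse) it penalizes both the mean and the variability
# of the movement cost J; it reduces to the risk-neutral E[J] as theta -> 0.
def risk_sensitive_cost(samples, theta):
    if abs(theta) < 1e-12:
        return samples.mean()                        # risk-neutral limit
    return (2.0 / theta) * np.log(np.mean(np.exp(0.5 * theta * samples)))

rng = np.random.default_rng(0)
costs = rng.normal(10.0, 3.0, size=100_000)          # synthetic noisy cost samples
print(risk_sensitive_cost(costs, 0.0))               # ~ 10.0 (the mean)
print(risk_sensitive_cost(costs, 0.1))               # ~ 10.2: variability penalized
```

For Gaussian costs this criterion evaluates to approximately mean + (theta/4) * variance, so a risk-averse controller trades a slightly higher expected cost for lower variability.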
Award ID(s): 2210320
PAR ID: 10513247
Author(s) / Creator(s): ; ;
Publisher / Repository: IEEE
Date Published:
ISBN: 979-8-3503-0124-3
Page Range / eLocation ID: 7944 to 7949
Subject(s) / Keyword(s): motor control; learning-based control; risk-sensitive optimal control
Format(s): Medium: X
Location: Singapore, Singapore
Sponsoring Org: National Science Foundation
More Like this
1. Sensorimotor tasks that humans perform are often affected by different sources of uncertainty. Nevertheless, the central nervous system (CNS) can gracefully coordinate our movements. Most learning frameworks rely on the internal model principle, which requires a precise internal representation in the CNS to predict the outcomes of our motor commands. However, learning a perfect internal model in a complex environment over a short period of time is a nontrivial problem; indeed, achieving proficient motor skills may require years of training for some difficult tasks. Internal models alone may not be adequate to explain motor adaptation behavior during the early phase of learning. Recent studies investigating the active regulation of motor variability, the presence of suboptimal inference, and model-free learning have challenged some of the traditional viewpoints on the sensorimotor learning mechanism. As a result, it may be necessary to develop a computational framework that can account for these new phenomena. Here, we develop a novel theory of motor learning based on model-free adaptive optimal control, which can bypass some of the difficulties in existing theories. The new theory builds on our recently developed adaptive dynamic programming (ADP) and robust ADP (RADP) methods and is especially useful for accounting for motor learning behavior when an internal model is inaccurate or unavailable. Our preliminary computational results are in line with experimental observations reported in the literature and can account for some phenomena that are inexplicable using existing models.
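As context for the ADP machinery named above, the sketch below shows Kleinman-style policy iteration for a continuous-time LQR problem: the classical model-based recursion that ADP/RADP methods approximate directly from movement data when no internal model is available. The plant matrices are hypothetical placeholders, not a model of arm dynamics.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Kleinman policy iteration (model-based backbone of ADP); matrices are
# hypothetical. ADP replaces the Lyapunov step with a data-driven fit.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

K = np.zeros((1, 2))             # initial admissible (stabilizing) policy
for _ in range(10):
    Ak = A - B @ K
    # Policy evaluation: solve Ak' P + P Ak + Q + K' R K = 0
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement
    K = np.linalg.solve(R, B.T @ P)
print(K)                         # converges to the optimal LQR gain
```

Each pass alternates policy evaluation (a Lyapunov equation) with policy improvement; model-free ADP performs the same two steps using least-squares fits to measured trajectories instead of the plant matrices.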
2. Noise is ubiquitous in sensorimotor interaction and contaminates the information available to the central nervous system (CNS) for motor learning. An interesting question is how the CNS manages motor learning with imprecise information. Integrating ideas from reinforcement learning and adaptive optimal control, this paper develops a novel computational mechanism to explain the robustness of human motor learning to imprecise information caused by the control-dependent noise inherent in sensorimotor systems. Starting from an initial admissible control policy, in each learning trial the mechanism collects the noisy sensory data (corrupted by control-dependent noise), forms an imprecise evaluation of the current policy's performance, and then constructs an updated policy based on that evaluation. As the number of learning trials increases, the generated policies provably converge to a (potentially small) neighborhood of the optimal policy under mild conditions, despite the imprecise information in the learning process. The mechanism synthesizes the policies directly from sensory data, without identifying an internal forward model. Our preliminary computational results on two classic arm-reaching tasks are in line with experimental observations reported in the literature. The model-free control principle proposed in the paper sheds more light on the inherent robustness of human sensorimotor systems to imprecise information, especially control-dependent noise, in the CNS.
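The trial-by-trial loop described here can be illustrated with a minimal, hypothetical sketch on a scalar plant with control-dependent noise (the dynamics and gains below are invented for illustration, not taken from the paper): each iteration fits a quadratic Q-function to noisy rollout data by least squares (the "imprecise evaluation") and then improves the feedback gain.

```python
import numpy as np

# Model-free policy iteration on a scalar plant x+ = a*x + b*u + noise,
# where the noise scales with the control (control-dependent noise).
rng = np.random.default_rng(1)
a, b = 1.1, 0.5                  # unknown to the learner; used only to simulate
q, r = 1.0, 1.0                  # stage cost weights
sigma = 0.1                      # control-dependent noise scale

def feat(x, u):                  # quadratic Q-function features
    return np.array([x * x, 2 * x * u, u * u])

k = 1.0                          # initial admissible gain (|a - b*k| < 1)
for it in range(8):
    Phi, y = [], []
    x = 1.0
    for t in range(400):
        u = -k * x + 0.1 * rng.standard_normal()        # exploratory input
        w = sigma * u * rng.standard_normal()           # control-dependent noise
        x_next = a * x + b * u + w
        u_next = -k * x_next
        Phi.append(feat(x, u) - feat(x_next, u_next))   # Bellman residual features
        y.append(q * x * x + r * u * u)                 # observed stage cost
        x = x_next if abs(x_next) < 10 else 1.0         # reset if diverged
    # Imprecise policy evaluation: least-squares fit of the Q-function
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    Hxx, Hxu, Huu = theta
    k = Hxu / Huu                                       # policy improvement
print("learned gain:", k)
```

The noise makes each evaluation imprecise, yet the iterates settle in a neighborhood of the optimal gain, mirroring the convergence behavior the abstract describes.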
3. This paper studies the infinite-horizon adaptive optimal control of continuous-time linear periodic (CTLP) systems. A novel value-iteration (VI) based off-policy ADP algorithm is proposed for a general class of CTLP systems, so that approximate optimal solutions can be obtained directly from the collected data, without exact knowledge of the system dynamics. Under mild conditions, proofs of uniform convergence of the proposed algorithm to the optimal solutions are given for both the model-based and model-free cases. The VI-based ADP algorithm is able to find suboptimal controllers without requiring knowledge of an initial stabilizing controller. Application to the optimal control of a triple inverted pendulum subjected to a periodically varying load demonstrates the feasibility and effectiveness of the proposed method.
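The practical distinction is that value iteration can start from a zero value function and needs no stabilizing initial gain, unlike policy iteration. A minimal discrete-time LQR analogue (not the paper's CTLP algorithm; the matrices are placeholders) illustrates the recursion:

```python
import numpy as np

# Value iteration for discrete-time LQR: starts from P = 0, no stabilizing
# initial gain required. Matrices are hypothetical (a marginally unstable
# double integrator), not from the paper.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

P = np.zeros((2, 2))
for _ in range(500):
    G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # gain at this sweep
    P = Q + A.T @ P @ (A - B @ G)                        # Riccati recursion
print("VI gain:", G)
```

The off-policy ADP algorithm in the paper reproduces, in spirit, this sweep from collected input-state data for the periodic continuous-time setting.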
4. An interior permanent magnet (IPM) motor is a prime electric motor used in electric vehicles (EVs), robots, and electric drones. In these applications, maximum torque per ampere (MTPA), flux-weakening, and maximum torque per volt (MTPV) techniques play a critical role in the efficient and reliable torque control of an IPM motor. Although several approaches have been proposed and developed for this purpose, each has its specific limitations. The objective of this paper is to develop a neural network (NN) method to determine MTPA, flux-weakening, and MTPV operating points for the most efficient torque control of the motor over its full speed range. The NN is trained offline using the Levenberg-Marquardt backpropagation algorithm, which avoids the disadvantages associated with online NN training. A cloud computing system is proposed for routine offline NN training, which enables the lifetime adaptivity and learning capabilities of the offline-trained NN and overcomes the computational challenges of online NN training. In addition, for the proposed NN mechanism, training data are collected and stored in a highly random manner, which makes lifetime adaptivity much more feasible and efficient to implement than with other methods. The proposed method is evaluated via both simulation and hardware experiments, which demonstrate the strong performance of the NN-based MTPA, MTPV, and flux-weakening control for an IPM motor over its full speed range. Overall, the proposed method achieves fast and accurate current-reference generation with a simple NN structure for optimal torque control of an IPM motor.
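A minimal sketch of the offline training idea, assuming synthetic placeholder data rather than real IPM measurements: a small network mapping a (torque, speed) operating point to d/q current references is fitted with SciPy's Levenberg-Marquardt least-squares solver.

```python
import numpy as np
from scipy.optimize import least_squares

# Offline Levenberg-Marquardt training of a tiny MLP that maps a scaled
# (torque, speed) operating point to (i_d*, i_q*) current references.
# The training pairs below are synthetic placeholders, not real IPM data.
rng = np.random.default_rng(2)
X = rng.uniform([-1, 0], [1, 1], size=(200, 2))           # (torque, speed)
Yd = np.column_stack([-0.3 * X[:, 0] ** 2,                # placeholder i_d*
                      0.8 * X[:, 0] * (1 - 0.2 * X[:, 1])])  # placeholder i_q*

H = 8                                                     # hidden units
sizes = [(H, 2), (H,), (2, H), (2,)]                      # W1, b1, W2, b2

def unpack(w):
    parts, i = [], 0
    for s in sizes:
        n = int(np.prod(s))
        parts.append(w[i:i + n].reshape(s)); i += n
    return parts

def residuals(w):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1.T + b1)                            # forward pass
    return (h @ W2.T + b2 - Yd).ravel()                   # fit error, flattened

w0 = 0.5 * rng.standard_normal(sum(int(np.prod(s)) for s in sizes))
sol = least_squares(residuals, w0, method="lm")           # Levenberg-Marquardt
print("final RMS error:", np.sqrt(np.mean(sol.fun ** 2)))
```

Because training happens offline on batched data, the expensive Jacobian-based LM updates are acceptable; at run time the trained network is only evaluated, which supports fast current-reference generation.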
5. The majority of past research on lane-changing controller design for autonomous vehicles (AVs) assumes full knowledge of the model dynamics of the AV and the surrounding vehicles. In the real world, this assumption is not realistic, as accurate dynamic models are difficult to obtain and the model parameters may change over time due to various factors. Thus, there is a need for a learning-based lane-change controller design methodology that can learn the optimal control policy in real time using sensor data. In this paper, we address this need by introducing an optimal learning-based control methodology that solves the real-time lane-changing problem of AVs, where the input-state data of the AV are used to generate a near-optimal lane-changing controller via the approximate/adaptive dynamic programming (ADP) technique. In this type of complex lane-changing maneuver, the lateral dynamics depend on the longitudinal velocity of the vehicle. If the longitudinal velocity is assumed constant, a linear parameter-invariant model can be used; however, assuming constant velocity while performing a lane-changing maneuver is not realistic and might increase the risk of accidents, especially during a lane abortion when the surrounding vehicles are not cooperative. The dynamics of the AV are therefore modeled as a linear parameter-varying system, which leaves two challenges for the lane-changing controller design: parameter-varying and unknown dynamics. By combining gain scheduling with ADP, we propose a learning-based control algorithm that generates a near-optimal lane-changing controller without knowing the accurate dynamic model of the AV. The inclusion of gain scheduling makes the controller applicable to nonlinear and/or parameter-varying AV dynamics. The stability of the learning-based gain-scheduling controller is also rigorously proved. Moreover, a data-driven lane-changing decision-making algorithm is introduced that makes the AV abort a lane change if safety conditions are violated during the maneuver. Finally, the proposed learning-based gain-scheduling controller design algorithm and the lane-changing decision-making methodology are numerically validated using MATLAB, SUMO simulations, and the NGSIM dataset.
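The gain-scheduling half of this approach can be sketched in a few lines, with an invented velocity-dependent lateral model; the paper learns the grid gains via ADP from data rather than computing them from a model as done here. An optimal gain is designed at several longitudinal speeds and interpolated between grid points at run time.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Gain scheduling over longitudinal speed v for speed-dependent lateral
# dynamics. A_of_v and B are hypothetical placeholders, not a vehicle model.
def A_of_v(v):                   # toy lateral-error dynamics, speed-dependent
    return np.array([[0.0, 1.0], [-2.0 / v, -3.0 / v]])

B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

speeds = np.linspace(10.0, 30.0, 5)                    # scheduling grid (m/s)
gains = []
for v in speeds:
    P = solve_continuous_are(A_of_v(v), B, Q, R)
    gains.append(np.linalg.solve(R, B.T @ P).ravel())  # LQR gain at speed v
gains = np.array(gains)

def scheduled_gain(v):           # interpolate between grid gains at run time
    return np.array([np.interp(v, speeds, gains[:, i]) for i in range(2)])

print(scheduled_gain(17.5))
```

Replacing the ARE solve at each grid point with a data-driven ADP iteration removes the need for the model while keeping the scheduling structure, which is the combination the paper proposes.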