Title: Neural Networks Trained via Reinforcement Learning Stabilize Walking of a Three-Dimensional Biped Model With Exoskeleton Applications
Our group is developing a cyber-physical walking system (CPWS) for people paralyzed by spinal cord injuries (SCI). The current CPWS consists of a functional neuromuscular stimulation (FNS) system and a powered lower-limb exoskeleton for walking with leg movements in the sagittal plane. We are developing neural control systems that learn to assist the user of this CPWS to walk with stability. In a previous publication (Liu et al., Biomimetics, 2019, 4, 28), we showed a neural controller that stabilized a simulated biped in the sagittal plane. We are considering adding degrees of freedom to the CPWS to allow more natural walking movements and improved stability. Thus, in this paper, we present a new neural-network-enhanced control system that stabilizes a three-dimensional simulated biped model of a human wearing an exoskeleton. Results show that it stabilizes human/exoskeleton models and is robust to impact disturbances. The simulated biped walks at a steady pace across a range of typical human ambulatory speeds from 0.7 to 1.3 m/s, follows waypoints with a precision of 0.3 m, remains stable and continues walking forward despite impact disturbances, and adapts its speed to compensate for persistent external disturbances. Furthermore, the neural network controller stabilizes human models of different statures, from 1.4 to 2.2 m tall, without any changes to the control parameters. Please see videos at the following link: 3D biped walking control.
Award ID(s):
1739800
NSF-PAR ID:
10292968
Journal Name:
Frontiers in Robotics and AI
Volume:
8
ISSN:
2296-9144
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A control system for bipedal walking in the sagittal plane was developed in simulation. The biped model was built based on anthropometric data for a 1.8 m tall male of average build. At the core of the controller is a deep deterministic policy gradient (DDPG) neural network that was trained in GAZEBO, a physics simulator, to predict the ideal foot placement to maintain stable walking despite external disturbances. The complexity of the DDPG network was decreased through carefully selected state variables and a distributed control system. Additional controllers for the hip joints during their stance phases and the ankle joint during toe-off phase help to stabilize the biped during walking. The simulated biped can walk at a steady pace of approximately 1 m/s, and during locomotion it can maintain stability with a 30 kg·m/s impulse applied forward on the torso or a 40 kg·m/s impulse applied rearward. It also maintains stable walking with a 10 kg backpack or a 25 kg front pack. The controller was trained on a 1.8 m tall model, but also stabilizes models 1.4–2.3 m tall with no changes. 
  2. Abstract. Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges in developing such a robust controller is handling different degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers either are patient-condition specific or involve tuning of many control parameters, which can behave unreliably and even fail to maintain balance. Methods: We present a novel, deep neural network, reinforcement learning-based robust controller for an LLRE based on a decoupled offline human-exoskeleton simulation training with three independent networks, which aims to provide reliable walking assistance against various and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE’s proprioceptive signals, including joint kinematic states, and subsequently predicts real-time position control targets for the actuated joints. To handle uncertain human interaction forces, the control policy is trained intentionally with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected with the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy to different human conditions, we employ domain randomization during training that includes not only randomization of exoskeleton dynamics properties but, more importantly, randomization of human muscle strength to simulate the variability of the patient’s disability. Through this decoupled deep reinforcement learning framework, the trained controller of LLREs is able to provide reliable walking assistance to patients with different degrees of neuromuscular disorders without any control parameter tuning. Results and conclusion: A universal, RL-based walking controller is trained and virtually tested on an LLRE system to verify its effectiveness and robustness in assisting users with different disabilities such as passive muscles (quadriplegic), muscle weakness, or hemiplegic conditions without any control parameter tuning. Analysis of the RMSE for joint tracking, CoP-based stability, and gait symmetry shows the effectiveness of the controller. An ablation study also demonstrates the strong robustness of the control policy under large exoskeleton dynamic property ranges and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer since it uses only proprioception information of the LLRE (joint sensory state) as the input. Furthermore, the controller is shown to be able to handle different patient conditions without the need for patient-specific control parameter tuning. 
  3. Task-invariant feedback control laws for powered exoskeletons are preferred to assist human users across varying locomotor activities. This goal can be achieved with energy shaping methods, where certain nonlinear partial differential equations, i.e., matching conditions, must be satisfied to find the achievable dynamics. Based on the energy shaping methods, open-loop systems can be mapped to closed-loop systems with a desired analytical expression of energy. In this paper, the desired energy consists of modified potential energy that is well-defined and unified across different contact conditions along with the energy of virtual springs and dampers that improve energy recycling during walking. The human-exoskeleton system achieves the input-output passivity and Lyapunov stability during the whole walking period with the proposed method. The corresponding controller provides assistive torques that closely match the human torques of a simulated biped model and able-bodied human subjects’ data. 
  4. Abstract: For many planar bipedal models, each step is divided into a finite-time single support period and an instantaneous double support period. During single support, the biped is typically underactuated and thus has limited ability to reject disturbances. The instantaneous nature of the double support period prevents nonimpulsive control during this period. However, if the double support period is expanded to finite time, it becomes overactuated. While it has been hypothesized that this overactuation during a finite-time double support period may improve disturbance rejection capabilities, this has not yet been tested. This paper presents a refined biped model by developing a finite-time, adaptive double support controller capable of handling the overactuation and limiting slip. Using simulations, we quantify the disturbance rejection capabilities of this controller and directly compare them to a typical, instantaneous double support model for a range of gait speeds and perturbations. We find that the finite-time double support controller increased the walking stability of the biped in approximately half of the cases, indicating that a finite-time double support period does not automatically increase disturbance rejection capabilities. We also find that the timing and magnitude of the perturbation can affect whether a finite-time double support period enhances stability. Finally, we demonstrate that the adaptive controller reduces slipping. 
  5. A control system for simulated two-dimensional bipedal walking was developed. The biped model was built based on anthropometric data. At the core of the controller is a Deep Deterministic Policy Gradients (DDPG) neural network that is trained in GAZEBO, a physics simulator, to predict the ideal foot location to maintain stable walking under external impulse load. Additional controllers for hip joint movement during stance phase, and ankle joint torque during toe-off, help to stabilize the robot during walking. The simulated robot can walk at a steady pace of approximately 1 m/s, and during locomotion it can maintain stability with a 30 N·s impulse applied at the torso. This work implements the DDPG algorithm to solve the biped walking control problem. The complexity of the DDPG network is decreased through carefully selected state variables and a distributed control system. 