Title: Implementation of Deep Deterministic Policy Gradients for Controlling Dynamic Bipedal Walking
A control system for bipedal walking in the sagittal plane was developed in simulation. The biped model was built based on anthropometric data for a 1.8 m tall male of average build. At the core of the controller is a deep deterministic policy gradient (DDPG) neural network that was trained in GAZEBO, a physics simulator, to predict the ideal foot placement to maintain stable walking despite external disturbances. The complexity of the DDPG network was decreased through carefully selected state variables and a distributed control system. Additional controllers for the hip joints during their stance phases and the ankle joint during toe-off phase help to stabilize the biped during walking. The simulated biped can walk at a steady pace of approximately 1 m/s, and during locomotion it can maintain stability with a 30 kg·m/s impulse applied forward on the torso or a 40 kg·m/s impulse applied rearward. It also maintains stable walking with a 10 kg backpack or a 25 kg front pack. The controller was trained on a 1.8 m tall model, but also stabilizes models 1.4–2.3 m tall with no changes.
Award ID(s):
1739800
NSF-PAR ID:
10112280
Journal Name:
Biomimetics
Volume:
4
Issue:
1
ISSN:
2313-7673
Page Range / eLocation ID:
28
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A control system for simulated two-dimensional bipedal walking was developed. The biped model was built based on anthropometric data. At the core of the controller is a Deep Deterministic Policy Gradients (DDPG) neural network that is trained in GAZEBO, a physics simulator, to predict the ideal foot location to maintain stable walking under external impulse loads. Additional controllers for hip joint movement during the stance phase and for ankle joint torque during toe-off help to stabilize the robot during walking. The simulated robot can walk at a steady pace of approximately 1 m/s, and during locomotion it can maintain stability with a 30 N·s impulse applied at the torso. This work implements the DDPG algorithm to solve the biped walking control problem. The complexity of the DDPG network is decreased through carefully selected state variables and a distributed control system.
  2. Our group is developing a cyber-physical walking system (CPWS) for people paralyzed by spinal cord injuries (SCI). The current CPWS consists of a functional neuromuscular stimulation (FNS) system and a powered lower-limb exoskeleton for walking with leg movements in the sagittal plane. We are developing neural control systems that learn to assist the user of this CPWS to walk with stability. In a previous publication (Liu et al., Biomimetics, 2019, 4, 28), we showed a neural controller that stabilized a simulated biped in the sagittal plane. We are considering adding degrees of freedom to the CPWS to allow more natural walking movements and improved stability. Thus, in this paper, we present a new neural-network-enhanced control system that stabilizes a three-dimensional simulated biped model of a human wearing an exoskeleton. Results show that it stabilizes human/exoskeleton models and is robust to impact disturbances. The simulated biped walks at a steady pace within a range of typical human ambulatory speeds from 0.7 to 1.3 m/s, follows waypoints with a precision of 0.3 m, remains stable and continues walking forward despite impact disturbances, and adapts its speed to compensate for persistent external disturbances. Furthermore, the neural network controller stabilizes human models of different statures from 1.4 to 2.2 m tall without any changes to the control parameters. Please see videos at the following link: 3D biped walking control.
  3. Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges for developing such a robust controller is to handle different degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers either are patient-condition specific or involve tuning of many control parameters, which could behave unreliably and even fail to maintain balance.
    Methods: We present a novel, deep neural network, reinforcement learning-based robust controller for an LLRE based on a decoupled offline human-exoskeleton simulation training with three independent networks, which aims to provide reliable walking assistance against various and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE’s proprioceptive signals, including joint kinematic states, and subsequently predicts real-time position control targets for the actuated joints. To handle uncertain human interaction forces, the control policy is trained intentionally with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected with the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy to different human conditions, we employ domain randomization during training that includes not only randomization of exoskeleton dynamics properties but, more importantly, randomization of human muscle strength to simulate the variability of the patient’s disability. Through this decoupled deep reinforcement learning framework, the trained controller of LLREs is able to provide reliable walking assistance to patients with different degrees of neuromuscular disorders without any control parameter tuning.
    Results and conclusion: A universal, RL-based walking controller is trained and virtually tested on an LLRE system to verify its effectiveness and robustness in assisting users with different disabilities such as passive muscles (quadriplegic), muscle weakness, or hemiplegic conditions without any control parameter tuning. Analysis of the RMSE for joint tracking, CoP-based stability, and gait symmetry shows the effectiveness of the controller. An ablation study also demonstrates the strong robustness of the control policy under large exoskeleton dynamic property ranges and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer since it uses only proprioception information of the LLRE (joint sensory state) as the input. Furthermore, the controller is shown to be able to handle different patient conditions without the need for patient-specific control parameter tuning.
  4. We propose a locomotion framework for bipedal robots consisting of a new motion planning method, dubbed trajectory optimization for walking robots plus (TOWR+), and a new whole-body control method, dubbed implicit hierarchical whole-body controller (IHWBC). For versatility, we consider the use of a composite rigid body (CRB) model to optimize the robot’s walking behavior. The proposed CRB model considers the floating base dynamics while accounting for the effects of the heavy distal mass of humanoids using a pre-trained centroidal inertia network. TOWR+ leverages the phase-based parameterization of its precursor, TOWR, and optimizes for base and end-effectors motions, feet contact wrenches, as well as contact timing and locations without the need to solve a complementary problem or integer program. The use of IHWBC enforces unilateral contact constraints (i.e., non-slip and non-penetration constraints) and a task hierarchy through the cost function, relaxing contact constraints and providing an implicit hierarchy between tasks. This controller provides additional flexibility and smooth task and contact transitions as applied to our 10 degree-of-freedom, line-feet biped robot DRACO. In addition, we introduce a new open-source and light-weight software architecture, dubbed planning and control (PnC), that implements and combines TOWR+ and IHWBC. PnC provides modularity, versatility, and scalability so that the provided modules can be interchanged with other motion planners and whole-body controllers and tested in an end-to-end manner. In the experimental section, we first analyze the performance of TOWR+ using various bipeds. We then demonstrate balancing behaviors on the DRACO hardware using the proposed IHWBC method. Finally, we integrate TOWR+ and IHWBC and demonstrate step-and-stop behaviors on the DRACO hardware. 
  5. For many planar bipedal models, each step is divided into a finite-time single support period and an instantaneous double support period. During single support, the biped is typically underactuated and thus has limited ability to reject disturbances. The instantaneous nature of the double support period prevents nonimpulsive control during this period. However, if the double support period is expanded to finite time, it becomes overactuated. While it has been hypothesized that this overactuation during a finite-time double support period may improve disturbance rejection capabilities, this has not yet been tested. This paper presents a refined biped model by developing a finite-time, adaptive double support controller capable of handling the overactuation and limiting slip. Using simulations, we quantify the disturbance rejection capabilities of this controller and directly compare them to a typical, instantaneous double support model for a range of gait speeds and perturbations. We find that the finite-time double support controller increased the walking stability of the biped in approximately half of the cases, indicating that a finite-time double support period does not automatically increase disturbance rejection capabilities. We also find that the timing and magnitude of the perturbation can affect whether a finite-time double support period enhances stability. Finally, we demonstrate that the adaptive controller reduces slipping.