Title: Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control
This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot’s I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including: robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.
Award ID(s):
1944722
PAR ID:
10551271
Publisher / Repository:
SAGE Publications
Journal Name:
The International Journal of Robotics Research
ISSN:
0278-3649
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This work presents a hierarchical framework for bipedal locomotion that combines a Reinforcement Learning (RL)-based high-level (HL) planner policy for the online generation of task space commands with a model-based low-level (LL) controller to track the desired task space trajectories. Different from traditional end-to-end learning approaches, our HL policy takes insights from the angular momentum-based linear inverted pendulum (ALIP) to carefully design the observation and action spaces of the Markov Decision Process (MDP). This simple yet effective design creates an insightful mapping between a low-dimensional state that effectively captures the complex dynamics of bipedal locomotion and a set of task space outputs that shape the walking gait of the robot. The HL policy is agnostic to the task space LL controller, which increases the flexibility of the design and generalization of the framework to other bipedal robots. This hierarchical design results in a learning-based framework with improved performance, data efficiency, and robustness compared with the ALIP model-based approach and state-of-the-art learning-based frameworks for bipedal locomotion. The proposed hierarchical controller is tested in three different robots, Rabbit, a five-link underactuated planar biped; Walker2D, a seven-link fully-actuated planar biped; and Digit, a 3D humanoid robot with 20 actuated joints. The trained policy naturally learns human-like locomotion behaviors and is able to effectively track a wide range of walking speeds while preserving the robustness and stability of the walking gait even under adversarial conditions. 
  2. Bipedal locomotion over compliant terrain is an important and largely underexplored problem in the robotics community. Although robot walking has been achieved on some non-rigid surfaces with existing control methodologies, there is a need for a systematic framework applicable to different bipeds that enables stable locomotion over various compliant terrains. In this work, a novel energy-based framework is proposed that allows the dynamic locomotion of bipeds across a wide range of compliant surfaces. The proposed framework utilizes an extended version of the 3D dual spring-loaded inverted pendulum (Dual-SLIP) model that supports compliant terrains, while a bio-inspired controller is employed to regulate expected perturbations of extremely low ground-stiffness levels. An energy-based methodology is introduced for tuning the bio-inspired controller to enable dynamic walking with robustness to a wide range of low ground-stiffness one-step perturbations. The proposed system and controller are shown to mimic the vertical ground reaction force (GRF) responses observed in human walking over compliant terrains. Moreover, they succeed in handling repeated unilateral stiffness perturbations under specific conditions. This work can advance the field of biped locomotion by providing a biomimetic method for generating stable human-like walking trajectories for bipedal robots over various compliant surfaces. Furthermore, the concepts of the proposed framework could be incorporated into the design of controllers for lower-limb prostheses with adjustable stiffness to improve their robustness over compliant surfaces.
  3. This paper systematically decomposes a quadrupedal robot into bipeds to rapidly generate walking gaits and then recomposes these gaits to obtain quadrupedal locomotion. We begin by decomposing the full-order, nonlinear and hybrid dynamics of a three-dimensional quadrupedal robot, including its continuous and discrete dynamics, into two bipedal systems that are subject to external forces. Using the hybrid zero dynamics (HZD) framework, gaits for these bipedal robots can be rapidly generated (on the order of seconds) along with corresponding controllers. The decomposition is achieved in such a way that the bipedal walking gaits and controllers can be composed to yield dynamic walking gaits for the original quadrupedal robot; the result is the rapid generation of dynamic quadruped gaits utilizing the full-order dynamics. This methodology is demonstrated through the rapid generation (3.96 seconds on average) of four stepping-in-place gaits and one diagonally symmetric ambling gait at 0.35 m/s on a quadrupedal robot (the Vision 60, with 36 state variables and 12 control inputs), both in simulation and through outdoor experiments. This suggests a new approach for fast quadrupedal trajectory planning using full-body dynamics, without the need for empirical model simplification, wherein methods from dynamic bipedal walking can be directly applied to quadrupeds.
  4. Can we design motion primitives for complex legged systems uniformly for different terrain types without neglecting modeling details? This paper presents a method for rapidly generating quadrupedal locomotion on sloped terrains, from modeling to gait generation to hardware demonstration. At the core of this approach is the observation that a quadrupedal robot can be exactly decomposed into coupled bipedal robots. Formally, this is represented through the framework of coupled control systems, wherein isolated subsystems interact through coupling constraints. We demonstrate this concept in the context of quadrupeds and use it to reduce the gait planning problem for uneven terrains to bipedal walking generation via hybrid zero dynamics. This reduction method allows for the formulation of a nonlinear optimization problem that leverages low-dimensional bipedal representations to generate dynamic walking gaits on slopes for the full-order quadrupedal robot dynamics. The result is the ability to rapidly generate quadrupedal walking gaits on a variety of slopes. We demonstrate these walking behaviors on the Vision 60 quadrupedal robot: in simulation, by walking on sloped terrains of 13°, 15°, 20°, and 25°, and experimentally, through successful locomotion on outdoor grass slopes of 13° and 20° to 25°.
  5. We propose a locomotion framework for bipedal robots consisting of a new motion planning method, dubbed trajectory optimization for walking robots plus (TOWR+), and a new whole-body control method, dubbed implicit hierarchical whole-body controller (IHWBC). For versatility, we consider the use of a composite rigid body (CRB) model to optimize the robot’s walking behavior. The proposed CRB model considers the floating base dynamics while accounting for the effects of the heavy distal mass of humanoids using a pre-trained centroidal inertia network. TOWR+ leverages the phase-based parameterization of its precursor, TOWR, and optimizes for base and end-effectors motions, feet contact wrenches, as well as contact timing and locations without the need to solve a complementary problem or integer program. The use of IHWBC enforces unilateral contact constraints (i.e., non-slip and non-penetration constraints) and a task hierarchy through the cost function, relaxing contact constraints and providing an implicit hierarchy between tasks. This controller provides additional flexibility and smooth task and contact transitions as applied to our 10 degree-of-freedom, line-feet biped robot DRACO. In addition, we introduce a new open-source and light-weight software architecture, dubbed planning and control (PnC), that implements and combines TOWR+ and IHWBC. PnC provides modularity, versatility, and scalability so that the provided modules can be interchanged with other motion planners and whole-body controllers and tested in an end-to-end manner. In the experimental section, we first analyze the performance of TOWR+ using various bipeds. We then demonstrate balancing behaviors on the DRACO hardware using the proposed IHWBC method. Finally, we integrate TOWR+ and IHWBC and demonstrate step-and-stop behaviors on the DRACO hardware. 