This content will become publicly available on October 1, 2024

Title: Template Model Inspired Task Space Learning for Robust Bipedal Locomotion
This work presents a hierarchical framework for bipedal locomotion that combines a Reinforcement Learning (RL)-based high-level (HL) planner policy for the online generation of task space commands with a model-based low-level (LL) controller to track the desired task space trajectories. Unlike traditional end-to-end learning approaches, our HL policy draws on insights from the angular momentum-based linear inverted pendulum (ALIP) to carefully design the observation and action spaces of the Markov Decision Process (MDP). This simple yet effective design creates an insightful mapping between a low-dimensional state that effectively captures the complex dynamics of bipedal locomotion and a set of task space outputs that shape the walking gait of the robot. The HL policy is agnostic to the task space LL controller, which increases the flexibility of the design and the generalization of the framework to other bipedal robots. This hierarchical design results in a learning-based framework with improved performance, data efficiency, and robustness compared with the ALIP model-based approach and state-of-the-art learning-based frameworks for bipedal locomotion. The proposed hierarchical controller is tested on three different robots: Rabbit, a five-link underactuated planar biped; Walker2D, a seven-link fully-actuated planar biped; and Digit, a 3D humanoid robot with 20 actuated joints. The trained policy naturally learns human-like locomotion behaviors and is able to effectively track a wide range of walking speeds while preserving the robustness and stability of the walking gait even under adversarial conditions.
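As background on the ALIP template named in the abstract, the following is a minimal sketch of the standard sagittal-plane ALIP dynamics and their closed-form solution over a single stance phase. It is not the authors' implementation: the function name, the parameter values, and the choice of state (CoM position `x` relative to the contact point, angular momentum `L` about the contact point) are assumptions made for illustration, and the abstract does not specify the exact observation vector used by the HL policy.

```python
import math


def alip_predict(x0, L0, t, mass, height, g=9.81):
    """Propagate the ALIP state (x, L) forward by t seconds.

    Assumed model (constant CoM height, point contact):
        x_dot = L / (mass * height)
        L_dot = mass * g * x
    where x is the CoM position relative to the contact point and
    L is the angular momentum about the contact point.
    """
    lam = math.sqrt(g / height)          # pendulum natural frequency
    c, s = math.cosh(lam * t), math.sinh(lam * t)
    x = c * x0 + s / (mass * height * lam) * L0
    L = mass * height * lam * s * x0 + c * L0
    return x, L


if __name__ == "__main__":
    # Hypothetical numbers: 30 kg robot, 0.8 m CoM height, 0.35 s step.
    x_end, L_end = alip_predict(x0=-0.05, L0=12.0, t=0.35, mass=30.0, height=0.8)
    print(f"predicted end-of-step state: x = {x_end:.4f} m, L = {L_end:.4f} kg m^2/s")
```

A low-dimensional state of this kind is what makes the template attractive as an observation: two numbers per plane summarize where the gait is heading by the end of the step.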
Award ID(s):
2144156
NSF-PAR ID:
10496139
Publisher / Repository:
IEEE
Page Range / eLocation ID:
8582 to 8589
Format(s):
Medium: X
Location:
Detroit, MI, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Controller design for bipedal walking on dynamic rigid surfaces (DRSes), which are rigid surfaces moving in the inertial frame (e.g., ships and airplanes), remains largely underexplored. This paper introduces a hierarchical control approach that achieves stable underactuated bipedal walking on a horizontally oscillating DRS. The highest layer of our approach is a real-time motion planner that generates desired global behaviors (i.e., center of mass trajectories and footstep locations) by stabilizing a reduced-order robot model. One key novelty of this layer is the derivation of the reduced-order model by analytically extending the angular momentum based linear inverted pendulum (ALIP) model from stationary to horizontally moving surfaces. The other novelty is the development of a discrete-time foot-placement controller that exponentially stabilizes the hybrid, linear, time-varying ALIP. The middle layer translates the desired global behaviors into the robot’s full-body reference trajectories for all directly actuated degrees of freedom, while the lowest layer exponentially tracks those reference trajectories based on the full-order, hybrid, nonlinear robot model. Simulations confirm that the proposed framework ensures stable walking of a planar underactuated biped under different swaying DRS motions and gait types. 
  2. We propose a locomotion framework for bipedal robots consisting of a new motion planning method, dubbed trajectory optimization for walking robots plus (TOWR+), and a new whole-body control method, dubbed implicit hierarchical whole-body controller (IHWBC). For versatility, we consider the use of a composite rigid body (CRB) model to optimize the robot’s walking behavior. The proposed CRB model considers the floating base dynamics while accounting for the effects of the heavy distal mass of humanoids using a pre-trained centroidal inertia network. TOWR+ leverages the phase-based parameterization of its precursor, TOWR, and optimizes for base and end-effector motions, feet contact wrenches, as well as contact timing and locations without the need to solve a complementarity problem or integer program. The use of IHWBC enforces unilateral contact constraints (i.e., non-slip and non-penetration constraints) and a task hierarchy through the cost function, relaxing contact constraints and providing an implicit hierarchy between tasks. This controller provides additional flexibility and smooth task and contact transitions as applied to our 10 degree-of-freedom, line-feet biped robot DRACO. In addition, we introduce a new open-source and light-weight software architecture, dubbed planning and control (PnC), that implements and combines TOWR+ and IHWBC. PnC provides modularity, versatility, and scalability so that the provided modules can be interchanged with other motion planners and whole-body controllers and tested in an end-to-end manner. In the experimental section, we first analyze the performance of TOWR+ using various bipeds. We then demonstrate balancing behaviors on the DRACO hardware using the proposed IHWBC method. Finally, we integrate TOWR+ and IHWBC and demonstrate step-and-stop behaviors on the DRACO hardware.
  3. This study proposes a hierarchically integrated framework for safe task and motion planning (TAMP) of bipedal locomotion in a partially observable environment with dynamic obstacles and uneven terrain. The high-level task planner employs linear temporal logic for a reactive game synthesis between the robot and its environment and provides a formal guarantee on navigation safety and task completion. To address environmental partial observability, a belief abstraction model is designed by partitioning the environment into multiple belief regions and employed at the high-level navigation planner to estimate the dynamic obstacles' location. This additional location information of dynamic obstacles offered by belief abstraction enables less conservative long-horizon navigation actions beyond guaranteeing immediate collision avoidance. Accordingly, a synthesized action planner sends a set of locomotion actions to the middle-level motion planner while incorporating safe locomotion specifications extracted from safety theorems based on a reduced-order model (ROM) of the locomotion process. The motion planner employs the ROM to design safety criteria and a sampling algorithm to generate nonperiodic motion plans that accurately track high-level actions. At the low level, a foot placement controller based on an angular-momentum linear inverted pendulum model is implemented and integrated with an ankle-actuated passivity-based controller for full-body trajectory tracking. To address external perturbations, this study also investigates the safe sequential composition of the keyframe locomotion state and achieves robust transitions against external perturbations through reachability analysis. The overall TAMP framework is validated with extensive simulations and hardware experiments on bipedal walking robots Cassie and Digit designed by Agility Robotics. 
  4. We present a framework to generate periodic trajectory references for a 3D under-actuated bipedal robot, using a linear inverted pendulum (LIP) based controller with adaptive neural regulation. We use the LIP template model to estimate the robot's center of mass (CoM) position and velocity at the end of the current step, and formulate a discrete controller that determines the next footstep location to achieve a desired walking profile. In the frontal plane, this controller is augmented with a neural-network-based adaptive term that reduces the model mismatch between the template and the physical robot, a mismatch that particularly affects the lateral motion. Then, the foot placement location computed for the LIP model is used to generate task space trajectories (CoM and swing foot trajectories) for the actual robot to realize stable walking. We use a fast, real-time QP-based inverse kinematics algorithm that produces joint references from the task space trajectories, which makes the formulation independent of the knowledge of the robot dynamics. Finally, we implemented and evaluated the proposed approach in simulation and in hardware experiments with a Digit robot, obtaining stable periodic locomotion in both cases.
  5. This paper presents a novel framework for learning robust bipedal walking by combining a data-driven state representation with a Reinforcement Learning (RL) based locomotion policy. The framework utilizes an autoencoder to learn a low-dimensional latent space that captures the complex dynamics of bipedal locomotion from existing locomotion data. This reduced dimensional state representation is then used as states for training a robust RL-based gait policy, eliminating the need for heuristic state selections or the use of template models for gait planning. The results demonstrate that the learned latent variables are disentangled and directly correspond to different gaits or speeds, such as moving forward, backward, or walking in place. Compared to traditional template model-based approaches, our framework exhibits superior performance and robustness in simulation. The trained policy effectively tracks a wide range of walking speeds and demonstrates good generalization capabilities to unseen scenarios. 
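Item 4 above describes predicting the end-of-step CoM state with a LIP template and then choosing the next footstep. A minimal sketch of that idea follows, assuming a one-step deadbeat rule on the end-of-step velocity; the abstract does not say which discrete controller the authors use, so the rule, the function names, and the numbers here are illustrative assumptions only, and the neural adaptive term is not modeled.

```python
import math


def lip_step_end(x0, v0, T, height, g=9.81):
    """Closed-form LIP state after T seconds (x measured from the stance foot).

    Assumed model: x_ddot = (g / height) * x.
    """
    lam = math.sqrt(g / height)
    c, s = math.cosh(lam * T), math.sinh(lam * T)
    return c * x0 + s / lam * v0, lam * s * x0 + c * v0


def deadbeat_foot_offset(v_end, v_des, T, height, g=9.81):
    """Foot location ahead of the CoM at touchdown (hypothetical deadbeat rule).

    Chosen so that, under the LIP model, the CoM velocity at the end of
    the NEXT step equals v_des.
    """
    lam = math.sqrt(g / height)
    c, s = math.cosh(lam * T), math.sinh(lam * T)
    return (c * v_end - v_des) / (lam * s)


if __name__ == "__main__":
    T, H = 0.4, 0.9                      # hypothetical step time and CoM height
    _, v_end = lip_step_end(x0=-0.10, v0=0.6, T=T, height=H)
    u = deadbeat_foot_offset(v_end, v_des=0.5, T=T, height=H)
    # After touchdown the CoM sits at -u relative to the new stance foot.
    _, v_next = lip_step_end(x0=-u, v0=v_end, T=T, height=H)
    print(f"foot offset = {u:.4f} m, next end-of-step velocity = {v_next:.4f} m/s")
```

The deadbeat rule is used here only because it is short enough to verify by hand; on hardware a less aggressive gain is usually preferred, since it is less sensitive to model mismatch.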