Title: Data-Driven Latent Space Representation for Robust Bipedal Locomotion Learning
This paper presents a novel framework for learning robust bipedal walking by combining a data-driven state representation with a Reinforcement Learning (RL)-based locomotion policy. The framework uses an autoencoder to learn a low-dimensional latent space that captures the complex dynamics of bipedal locomotion from existing locomotion data. This reduced-dimensional representation is then used as the state for training a robust RL-based gait policy, eliminating the need for heuristic state selection or template models for gait planning. The results demonstrate that the learned latent variables are disentangled and correspond directly to different gaits or speeds, such as moving forward, moving backward, or walking in place. Compared to traditional template-model-based approaches, our framework exhibits superior performance and robustness in simulation. The trained policy effectively tracks a wide range of walking speeds and generalizes well to unseen scenarios.
Award ID(s): 2144156
NSF-PAR ID: 10496141
Publisher / Repository: IEEE
Journal Name: IEEE International Conference on Robotics and Automation
ISSN: 1049-3492
Format(s): Medium: X
Location: Yokohama, Japan
Sponsoring Org: National Science Foundation
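
To make the pipeline described in the abstract concrete, below is a minimal sketch (PyTorch) of the core idea. The state and latent dimensions and the layer sizes are illustrative assumptions, not values from the paper: an autoencoder is trained to reconstruct recorded locomotion states, and its frozen encoder then serves as the observation map for the RL gait policy.

```python
# Minimal sketch (assumptions flagged in comments): an autoencoder compresses
# a high-dimensional robot state into a low-dimensional latent vector, and the
# trained encoder is reused as the observation map for an RL gait policy.
import torch
import torch.nn as nn

STATE_DIM = 60   # assumed full robot state size (joint positions/velocities, base pose)
LATENT_DIM = 4   # assumed latent size capturing gait/speed

class GaitAutoencoder(nn.Module):
    def __init__(self, state_dim=STATE_DIM, latent_dim=LATENT_DIM):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def train_step(model, optimizer, batch):
    """One reconstruction step over a batch of recorded locomotion states."""
    recon, _ = model(batch)
    loss = nn.functional.mse_loss(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# After training, the frozen encoder replaces heuristic state selection:
# the RL policy observes z = model.encoder(full_state) instead of full_state.
```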
More Like this
  1. This work presents a hierarchical framework for bipedal locomotion that combines a Reinforcement Learning (RL)-based high-level (HL) planner policy for online generation of task-space commands with a model-based low-level (LL) controller that tracks the desired task-space trajectories. Unlike traditional end-to-end learning approaches, our HL policy draws insights from the angular momentum-based linear inverted pendulum (ALIP) to carefully design the observation and action spaces of the Markov Decision Process (MDP). This simple yet effective design creates an insightful mapping between a low-dimensional state that effectively captures the complex dynamics of bipedal locomotion and a set of task-space outputs that shape the robot's walking gait. The HL policy is agnostic to the task-space LL controller, which increases the flexibility of the design and the generalization of the framework to other bipedal robots. This hierarchical design yields a learning-based framework with improved performance, data efficiency, and robustness compared with the ALIP model-based approach and state-of-the-art learning-based frameworks for bipedal locomotion. The proposed hierarchical controller is tested on three different robots: Rabbit, a five-link underactuated planar biped; Walker2D, a seven-link fully-actuated planar biped; and Digit, a 3D humanoid robot with 20 actuated joints. The trained policy naturally learns human-like locomotion behaviors and effectively tracks a wide range of walking speeds while preserving the robustness and stability of the walking gait even under adversarial conditions.
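
As a rough illustration of this hierarchical split (not the authors' code), the sketch below assumes hypothetical field names for the ALIP-inspired observation and the task-space action; the low-level tracking law is deliberately left abstract, mirroring the claim that the HL policy is agnostic to it.

```python
# Illustrative sketch: a learned high-level policy maps an ALIP-inspired
# reduced state to task-space commands, which a model-based low-level
# controller tracks. All field names and dimensions are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class ALIPState:
    com_pos_xy: np.ndarray           # CoM position relative to the stance foot
    angular_momentum_xy: np.ndarray  # angular momentum about the contact point
    velocity_command: np.ndarray     # operator's desired walking speed

@dataclass
class TaskSpaceCommand:
    swing_foot_target: np.ndarray    # desired swing-foot landing position
    torso_orientation: np.ndarray    # desired torso roll/pitch/yaw
    step_duration: float

def high_level_step(policy, s: ALIPState) -> TaskSpaceCommand:
    """HL policy (e.g., a small MLP trained with RL) runs at a low rate."""
    obs = np.concatenate([s.com_pos_xy, s.angular_momentum_xy, s.velocity_command])
    out = policy(obs)  # policy is any callable returning a flat action vector
    return TaskSpaceCommand(out[0:3], out[3:6], float(out[6]))

def low_level_step(robot, cmd: TaskSpaceCommand):
    """Model-based LL controller runs at a high rate and is swappable: any
    task-space controller (e.g., an inverse-dynamics QP) that tracks cmd's
    trajectories works, which is what keeps the HL policy robot-agnostic."""
    ...
```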
  2. Abstract
    Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges in developing such a robust controller is handling the varying degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers are either patient-condition specific or involve tuning many control parameters, and can behave unreliably or even fail to maintain balance.
    Methods: We present a novel deep-neural-network, reinforcement learning-based robust controller for an LLRE, built on decoupled offline human-exoskeleton simulation training with three independent networks, which aims to provide reliable walking assistance against varied and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE's proprioceptive signals, including joint kinematic states, and predicts real-time position control targets for the actuated joints. To handle uncertain human interaction forces, the control policy is deliberately trained with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected to the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy to different human conditions, we employ domain randomization during training, randomizing not only the exoskeleton's dynamics properties but, more importantly, human muscle strength, to simulate the variability of patients' disabilities. Through this decoupled deep reinforcement learning framework, the trained LLRE controller provides reliable walking assistance to patients with different degrees of neuromuscular disorders without any control parameter tuning.
    Results and conclusion: A universal RL-based walking controller is trained and virtually tested on an LLRE system to verify its effectiveness and robustness in assisting users with different disabilities, such as passive muscles (quadriplegia), muscle weakness, or hemiplegic conditions, without any control parameter tuning. Analysis of the RMSE for joint tracking, CoP-based stability, and gait symmetry shows the effectiveness of the controller. An ablation study also demonstrates the strong robustness of the control policy under large ranges of exoskeleton dynamic properties and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer, since it uses only the LLRE's proprioception (joint sensory state) as input. Furthermore, the controller is shown to handle different patient conditions without patient-specific control parameter tuning.
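
A minimal sketch of the domain-randomization step described above, with parameter names and ranges that are illustrative assumptions rather than the paper's values:

```python
# Sketch: at every training episode, exoskeleton dynamics parameters and
# per-muscle-group strengths are resampled so the learned controller cannot
# overfit to a single patient condition. Ranges and names are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def randomize_episode():
    return {
        # exoskeleton dynamics properties
        "link_mass_scale": rng.uniform(0.8, 1.2),
        "joint_friction":  rng.uniform(0.0, 0.3),
        "motor_strength":  rng.uniform(0.9, 1.1),
        # human condition: per-muscle-group strength scaling, where 0.0
        # models passive (quadriplegic) muscles and 1.0 full strength
        "muscle_strength": rng.uniform(0.0, 1.0, size=8),
    }

# Training-loop outline: the control policy sees only the LLRE's
# proprioceptive joint state, so the randomized human dynamics enter
# purely through the simulated interaction forces.
for episode in range(3):
    params = randomize_episode()
    print(params["muscle_strength"].round(2))
```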
  3. This paper systematically decomposes a quadrupedal robot into bipeds to rapidly generate walking gaits and then recomposes these gaits to obtain quadrupedal locomotion. We begin by decomposing the full-order, nonlinear, and hybrid dynamics of a three-dimensional quadrupedal robot, including its continuous and discrete dynamics, into two bipedal systems subject to external forces. Using the hybrid zero dynamics (HZD) framework, gaits for these bipedal robots can be generated rapidly (on the order of seconds) along with corresponding controllers. The decomposition is performed such that the bipedal walking gaits and controllers can be composed to yield dynamic walking gaits for the original quadrupedal robot - the result is the rapid generation of dynamic quadruped gaits utilizing the full-order dynamics. This methodology is demonstrated through the rapid generation (3.96 seconds on average) of four stepping-in-place gaits and one diagonally symmetric ambling gait at 0.35 m/s on a quadrupedal robot - the Vision 60, with 36 state variables and 12 control inputs - both in simulation and through outdoor experiments. This suggests a new approach for fast quadrupedal trajectory planning using full-body dynamics, without empirical model simplification, wherein methods from dynamic bipedal walking can be applied directly to quadrupeds.
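
The decompose/compose pipeline might be organized as in the following conceptual sketch; the state-index conventions and the optimizer interface are assumptions, and the HZD optimization itself is stubbed out:

```python
# Conceptual sketch: split the quadruped's state into two bipeds coupled
# through external forces, generate each biped gait with an HZD-style
# optimizer (stubbed), then recompose into a quadrupedal gait.
import numpy as np

def decompose(q_quadruped: np.ndarray):
    """Split the 36-dim quadruped state into a 'front' and a 'rear' biped,
    each carrying a copy of the shared body coordinates (assumed layout:
    12 body + 12 front-leg + 12 rear-leg coordinates)."""
    body, front_legs, rear_legs = q_quadruped[:12], q_quadruped[12:24], q_quadruped[24:]
    return np.concatenate([body, front_legs]), np.concatenate([body, rear_legs])

def hzd_gait(q_biped, external_force):
    """Placeholder for the fast (seconds-scale) HZD gait optimization of a
    biped subject to the coupling force from its counterpart."""
    ...

def compose(gait_front, gait_rear):
    """Merge the two bipedal gaits, enforcing agreement on the shared body
    coordinates, to obtain a full quadrupedal walking gait."""
    ...
```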
  4. Can we design motion primitives for complex legged systems uniformly across different terrain types without neglecting modeling details? This paper presents a method for rapidly generating quadrupedal locomotion on sloped terrains, from modeling to gait generation to hardware demonstration. At the core of this approach is the observation that a quadrupedal robot can be exactly decomposed into coupled bipedal robots. Formally, this is represented through the framework of coupled control systems, wherein isolated subsystems interact through coupling constraints. We demonstrate this concept in the context of quadrupeds and use it to reduce the gait-planning problem for uneven terrains to bipedal walking generation via hybrid zero dynamics. This reduction allows the formulation of a nonlinear optimization problem that leverages low-dimensional bipedal representations to generate dynamic walking gaits on slopes for the full-order quadrupedal robot dynamics. The result is the ability to rapidly generate quadrupedal walking gaits on a variety of slopes. We demonstrate these walking behaviors on the Vision 60 quadrupedal robot: in simulation, by walking on sloped terrains of 13°, 15°, 20°, and 25°; and experimentally, through successful locomotion on 13° and 20° ~ 25° sloped outdoor grasslands.
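
A toy sketch of the reduced slope-gait optimization, with stubbed cost and constraint functions standing in for the actual HZD formulation (the decision-variable count and the way the slope enters are assumptions):

```python
# Sketch: decision variables parameterize a low-dimensional bipedal gait,
# the slope angle enters the constraints, and solving once per slope yields
# a gait for the full quadruped after composition. Stubs are placeholders.
import numpy as np
from scipy.optimize import minimize

SLOPE_DEG = 20.0  # one of the tested slopes

def gait_cost(w):
    """e.g., integrated torque-squared over one step (stubbed)."""
    return float(np.dot(w, w))

def hzd_constraints(w, slope_rad):
    """Hybrid-invariance and ground-clearance constraints on the slope
    (stubbed as a single placeholder residual)."""
    return w.sum() - slope_rad

res = minimize(
    gait_cost,
    x0=np.zeros(10),  # assumed 10 gait parameters (e.g., Bezier coefficients)
    constraints={"type": "eq", "fun": hzd_constraints,
                 "args": (np.deg2rad(SLOPE_DEG),)},
)
print(res.x.round(3))
```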
  5. In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-output linearization controller based on a nominal model, along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework that learns the model uncertainty present in the CBF and CLF constraints, as well as in other control-affine dynamic constraints of the quadratic program. The trained policy is combined with the nominal-model-based CBF-CLF-QP, resulting in the Reinforcement Learning-based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with a one-step preview, achieving stable and safe walking under model uncertainty.
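
For intuition, here is a minimal CBF-CLF-QP sketch (using cvxpy) in which the scalar placeholders delta_clf and delta_cbf stand in for the RL-learned corrections to the nominal-model constraint terms; all numbers are illustrative, not from the paper:

```python
# Sketch: the QP minimizes deviation from a nominal input subject to a
# relaxed CLF (stability) constraint and a hard CBF (safety) constraint,
# with learned residuals correcting the nominal Lie-derivative terms.
import cvxpy as cp
import numpy as np

m = 2                       # number of control inputs (assumed)
u_nominal = np.zeros(m)     # e.g., from input-output linearization
u = cp.Variable(m)
relax = cp.Variable(nonneg=True)   # CLF relaxation slack

# Nominal Lie derivatives (placeholders) plus learned residuals.
LfV, LgV, delta_clf = -1.0, np.array([0.5, -0.3]), 0.1   # CLF terms
Lfh, Lgh, delta_cbf = 0.8, np.array([0.2, 0.4]), -0.05   # CBF terms
gamma, alpha = 1.0, 1.0     # CLF/CBF class-K gains
V, h = 0.5, 0.3             # current Lyapunov / barrier values

objective = cp.Minimize(cp.sum_squares(u - u_nominal) + 100 * relax)
constraints = [
    LfV + delta_clf + LgV @ u + gamma * V <= relax,  # CLF: stability, relaxable
    Lfh + delta_cbf + Lgh @ u + alpha * h >= 0,      # CBF: safety, hard
]
cp.Problem(objective, constraints).solve()
print(u.value)
```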