Title: PM-FSM: Policies Modulating Finite State Machine for Robust Quadrupedal Locomotion
Deep reinforcement learning (deep RL) has emerged as an effective tool for developing controllers for legged robots. However, vanilla deep RL often requires a tremendous number of training samples and still struggles to achieve robust behaviors. Instead, researchers have investigated policy architectures that incorporate human experts' knowledge, such as Policies Modulating Trajectory Generators (PMTG). This architecture builds a recurrent control loop by combining a parametric trajectory generator (TG) with a feedback policy network to achieve more robust behaviors. In this work, we propose Policies Modulating Finite State Machine (PM-FSM), which replaces the TGs with contact-aware finite state machines (FSMs) that offer more flexible control of each leg. The FSMs give the policy an explicit notion of contact events, allowing it to negotiate unexpected perturbations. We demonstrate that the proposed architecture achieves more robust behaviors in various scenarios, such as challenging terrains and external perturbations, on both simulated and real robots.
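As a concrete illustration of this control loop, here is a minimal Python sketch: each leg gets a contact-aware FSM, and the policy both modulates FSM parameters (e.g., swing duration) and adds a residual to the nominal foot target. The class and parameter names (LegFSM, swing_duration, the residual scaling) are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a PM-FSM-style control loop (illustrative only;
# class and parameter names are assumptions, not the paper's code).
import numpy as np

class LegFSM:
    """Contact-aware finite state machine for a single leg."""
    SWING, STANCE = 0, 1

    def __init__(self, swing_duration=0.25, stance_duration=0.25):
        self.state, self.timer = self.STANCE, 0.0
        self.swing_duration = swing_duration
        self.stance_duration = stance_duration

    def step(self, dt, in_contact):
        """Advance the FSM; a contact event can end swing early."""
        self.timer += dt
        if self.state == self.SWING:
            if in_contact or self.timer >= self.swing_duration:
                self.state, self.timer = self.STANCE, 0.0
        elif self.timer >= self.stance_duration:
            self.state, self.timer = self.SWING, 0.0

    def nominal_foot_height(self):
        """Phase-parameterized nominal foot height (toy trajectory)."""
        if self.state == self.SWING:
            phase = min(self.timer / self.swing_duration, 1.0)
            return 0.08 * np.sin(np.pi * phase)  # lift foot in swing
        return 0.0                               # foot down in stance

def pm_fsm_step(policy, obs, fsms, contacts, dt=0.01):
    """One tick: the policy sees FSM states, modulates each leg's
    swing duration, and adds a residual to the nominal target."""
    fsm_obs = np.array([[f.state, f.timer] for f in fsms]).ravel()
    action = policy(np.concatenate([obs, fsm_obs]))
    targets = []
    for i, fsm in enumerate(fsms):
        dur_scale, residual = action[2 * i], action[2 * i + 1]
        fsm.swing_duration = 0.25 * (1.0 + 0.5 * np.tanh(dur_scale))
        fsm.step(dt, contacts[i])
        targets.append(fsm.nominal_foot_height() + 0.02 * np.tanh(residual))
    return np.array(targets)

# Toy usage: four legs, a zero policy, no contacts on this tick.
fsms = [LegFSM() for _ in range(4)]
policy = lambda x: np.zeros(8)
targets = pm_fsm_step(policy, np.zeros(30), fsms, [False] * 4)
```

The key property the sketch tries to capture is that a touch-down event can terminate a swing phase early, so the policy receives an explicit, per-leg notion of contact rather than relying on a fixed open-loop trajectory.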
Award ID(s):
2024768
PAR ID:
10434903
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
PM-FSM: Policies Modulating Finite State Machine for Robust Quadrupedal Locomotion
Page Range / eLocation ID:
4063 to 4069
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled, human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.
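The dual-history design can be sketched in a few lines of PyTorch: a long-term I/O history is compressed by a 1-D convolutional encoder, concatenated with the raw short-term history, and fed to an action MLP. The layer sizes, history lengths, and I/O dimension below are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative dual-history policy (layer sizes, history lengths, and
# I/O dimension are assumptions, not the paper's exact architecture).
import torch
import torch.nn as nn

class DualHistoryPolicy(nn.Module):
    """Encode a long-term I/O history with a 1-D CNN, concatenate a raw
    short-term history, and map the result to actions with an MLP."""

    def __init__(self, io_dim=46, act_dim=10, long_len=100, short_len=4):
        super().__init__()
        self.long_encoder = nn.Sequential(
            nn.Conv1d(io_dim, 32, kernel_size=6, stride=3), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened embedding size
            n = self.long_encoder(torch.zeros(1, io_dim, long_len)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(n + io_dim * short_len, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, long_hist, short_hist):
        # long_hist: (B, io_dim, long_len); short_hist: (B, io_dim, short_len)
        z = self.long_encoder(long_hist)
        s = short_hist.flatten(start_dim=1)
        return self.head(torch.cat([z, s], dim=1))

policy = DualHistoryPolicy()
action = policy(torch.randn(1, 46, 100), torch.randn(1, 46, 4))
```

The intuition, per the abstract, is that the long-horizon encoding lets the policy infer time-invariant dynamics shifts (an implicit system identification), while the short history reacts quickly to time-variant events such as contacts.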
  2. Deep reinforcement learning (RL) has led to encouraging successes in many challenging control tasks. However, a deep RL model lacks interpretability due to the difficulty of identifying how the model's control logic relates to its network structure. Programmatic policies structured in more interpretable representations emerge as a promising solution. Yet two shortcomings remain: First, synthesizing programmatic policies requires optimizing over the discrete and non-differentiable search space of program architectures. Previous works are suboptimal because they only enumerate program architectures greedily, guided by a pretrained RL oracle. Second, these works do not exploit compositionality, an important programming concept, to reuse and compose primitive functions into a complex function for new tasks. Our first contribution is a programmatically interpretable RL framework that conducts program architecture search on top of a continuous relaxation of the architecture space defined by programming language grammar rules. Our algorithm allows policy architectures to be learned with policy parameters via bilevel optimization using efficient policy-gradient methods, and thus does not require a pretrained oracle. Our second contribution is improving programmatic policies to support compositionality by integrating primitive functions that capture task-agnostic skills into a composite program to solve novel RL problems. Experimental results demonstrate that our algorithm excels in discovering optimal programmatic policies that are highly interpretable.
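One way to picture the continuous relaxation is a DARTS-style mixture: each grammar expansion keeps a softmax over candidate primitives, so the discrete "which rule" choice becomes differentiable and can be trained jointly with the policy parameters. The grammar and primitive set in the sketch below are toy assumptions, not the paper's language.

```python
# Toy sketch of relaxing a discrete program-architecture choice
# into a differentiable softmax mixture (illustrative assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftProgramNode(nn.Module):
    """One grammar expansion: a softmax mixture over candidate
    primitives; argmax over alpha discretizes it after search."""

    def __init__(self, obs_dim):
        super().__init__()
        # Candidate primitive functions the grammar could produce.
        self.primitives = nn.ModuleList([
            nn.Linear(obs_dim, 1),                      # affine term
            nn.Sequential(nn.Linear(obs_dim, 8),
                          nn.Tanh(), nn.Linear(8, 1)),  # small nonlinear term
        ])
        # Architecture weights: the relaxed "which rule" decision.
        self.alpha = nn.Parameter(torch.zeros(len(self.primitives)))

    def forward(self, obs):
        w = F.softmax(self.alpha, dim=0)
        outs = torch.stack([p(obs) for p in self.primitives], dim=0)
        return (w.view(-1, 1, 1) * outs).sum(dim=0)

    def discretize(self):
        return int(self.alpha.argmax())  # pick the dominant primitive

node = SoftProgramNode(obs_dim=4)
out = node(torch.randn(8, 4))   # differentiable mixture during search
chosen = node.discretize()      # discrete primitive after training
```

In the bilevel setting the abstract describes, the architecture weights (alpha) and the policy parameters would be updated on separate objectives via policy gradients; discretizing each node afterwards recovers an interpretable program.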
  3. Distributed manipulators, consisting of a set of actuators or robots working cooperatively to achieve a manipulation task, are robust and flexible tools for performing a range of planar manipulation skills. One novel example is the delta array, a distributed manipulator composed of a grid of delta robots, capable of performing dexterous manipulation tasks using strategies that incorporate both dynamic and static contact. Hand-designing effective distributed control policies for such a manipulator can be complex and time-consuming, given the high-dimensional action space and unfamiliar system dynamics. In this paper, we examine the principles guiding the development and control of such a delta array for a planar translation task. We explore policy learning as a robust cooperative control approach, allowing for smooth manipulation of a range of objects and showing improved accuracy and efficiency over baseline human-designed policies.
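To make the high-dimensional action space concrete, here is a toy stand-in for the planar-translation task: a learned policy must emit a 2-D displacement for every robot in the grid. The grid size, dynamics, and reward below are illustrative assumptions, not the paper's setup.

```python
# Toy environment interface for cooperative delta-array translation.
# Grid size, object dynamics, and reward are illustrative assumptions.
import numpy as np

class DeltaArrayTranslationEnv:
    """Stand-in: each delta robot applies a small planar nudge, and
    the object moves with the mean nudge across the array."""

    def __init__(self, grid=(8, 8), seed=0):
        self.n = grid[0] * grid[1]           # number of delta robots
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.obj = self.rng.uniform(-1, 1, size=2)   # object xy
        self.goal = self.rng.uniform(-1, 1, size=2)  # target xy
        return self._obs()

    def _obs(self):
        return np.concatenate([self.obj, self.goal])

    def step(self, action):
        # action: (n, 2) per-robot planar displacements in [-1, 1].
        action = np.clip(action.reshape(self.n, 2), -1, 1)
        self.obj = self.obj + 0.01 * action.mean(axis=0)
        err = np.linalg.norm(self.goal - self.obj)
        return self._obs(), -err, err < 0.02, {}

env = DeltaArrayTranslationEnv()
obs = env.reset()
obs, reward, done, _ = env.step(np.zeros((env.n, 2)))
```

Even in this toy version, a policy must map a low-dimensional observation to an (n, 2) action, which illustrates why hand-designing coordinated per-robot commands becomes unwieldy as the grid grows.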
  4. Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed under the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state-perturbation attacks. In this work, we propose the State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called the robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of a robust agent policy for finite-state, finite-action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is publicly available at https://songyanghan.github.io/what_is_solution/.
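The worst-case expected state value at the heart of the robust objective can be approximated with a small adversarial search. A common approach is projected gradient descent on a critic, sketched below; the l-infinity perturbation ball and PGD hyperparameters are illustrative assumptions, not necessarily the RMA3C implementation.

```python
# Illustrative worst-case state search for robust MARL training.
# The l-infinity ball, step count, and critic are assumptions.
import torch

def worst_case_obs(critic, obs, eps=0.05, steps=5, lr=0.02):
    """Approximately minimize V(obs + delta) over ||delta||_inf <= eps
    with projected gradient descent, yielding an adversarial state."""
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        value = critic(obs + delta).sum()
        (grad,) = torch.autograd.grad(value, delta)
        with torch.no_grad():
            delta -= lr * grad.sign()   # step toward lower value
            delta.clamp_(-eps, eps)     # project back into the ball
    return (obs + delta).detach()

# Toy critic and usage: a batch of 4 observations of dimension 8.
critic = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
perturbed = worst_case_obs(critic, torch.randn(4, 8))
```

Training the actor against observations produced by worst_case_obs, rather than the true states, is one way to instantiate the max-min objective the paper analyzes.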