

Title: Learning Reactive and Predictive Differentiable Controllers for Switching Linear Dynamical Models
Humans leverage the dynamics of the environment and their own bodies to accomplish challenging tasks such as grasping an object while walking past it or pushing off a wall to turn a corner. Such tasks often involve switching dynamics as the robot makes and breaks contact. Learning these dynamics is a challenging problem and prone to model inaccuracies, especially near contact regions. In this work, we present a framework for learning composite dynamical behaviors from expert demonstrations. We learn a switching linear dynamical model with contacts encoded in switching conditions as a close approximation of our system dynamics. We then use discrete-time LQR as the differentiable policy class for data-efficient learning of control to develop a control strategy that operates over multiple dynamical modes and takes into account discontinuities due to contact. In addition to predicting interactions with the environment, our policy effectively reacts to inaccurate predictions such as unanticipated contacts. Through simulation and real-world experiments, we demonstrate generalization of learned behaviors to different scenarios and robustness to model inaccuracies during execution.
Award ID(s): 1925130, 1956163
NSF-PAR ID: 10293206
Journal Name: IEEE International Conference on Robotics and Automation
ISSN: 1049-3492
Sponsoring Org: National Science Foundation
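
The abstract above describes discrete-time LQR used as a differentiable policy class over a switching linear dynamical model. As a rough, generic illustration only (not the paper's formulation), the sketch below shows a finite-horizon discrete-time LQR backward pass for a single linear mode in plain numpy; the matrices and the toy double-integrator numbers are placeholders, and the paper's contact-dependent mode switching and differentiation through the recursion are not shown.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, horizon):
    """Finite-horizon discrete-time LQR backward pass.

    Returns time-ordered feedback gains K_t so that u_t = -K_t @ x_t.
    A, B define one linear mode x_{t+1} = A x_t + B u_t; Q, R, Qf are
    stage and terminal cost weights (all placeholders for illustration).
    """
    P = Qf
    gains = []
    for _ in range(horizon):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update of the cost-to-go matrix
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        gains.append(K)
    return gains[::-1]  # gains[0] now corresponds to the first time step

# Toy double-integrator mode with illustrative numbers only.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
K0 = lqr_gains(A, B, Q, R, Qf, horizon=50)[0]
u = -K0 @ np.array([1.0, 0.0])  # feedback action for the current state
```

In the paper's setting, gains like these would be computed per dynamical mode of the switching model and the recursion kept differentiable so that cost and model parameters can be fit from expert demonstrations; none of that machinery is reproduced here.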
More Like this
  1. Modelling and learning the dynamics of intricate dynamic interactions prevalent in common tasks such as pushing a heavy door or picking up an object in one sweeping motion is a challenging problem. One needs to consider the dynamics of both the individual objects and the interactions among them. In this work, we present a method that enables efficient learning of the dynamics of interacting systems by simultaneously learning a dynamic graph structure and a stable and locally linear forward dynamic model of the system. The dynamic graph structure encodes evolving contact modes along a trajectory by making probabilistic predictions over the edge activations. Introducing a temporal dependence in the learned graph structure enables incorporating contact measurement updates, which allows for more accurate forward predictions. The learned stable and locally linear dynamics enable the use of optimal control algorithms such as iLQR for long-horizon planning and control for complex interactive tasks. Through experiments in simulation and in the real world, we evaluate the performance of our method by using the learned interaction dynamics for control and demonstrate generalization to more objects and interactions not seen during training. We also introduce a control scheme that takes advantage of contact measurement updates and hence is robust to prediction inaccuracies during execution.
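
Item 1 above learns a dynamic graph whose edge activations encode evolving contact modes, together with stable, locally linear forward dynamics. Purely as a toy illustration of how probabilistic edge activations could gate locally linear models (this construction is not the paper's architecture, and every name in it is made up), one can blend per-edge linear contributions by their predicted activation probabilities:

```python
import numpy as np

def blended_linear_step(x, u, edge_probs, A_edges, B_edges, A_free, B_free):
    """Illustrative forward step: mix locally linear contact contributions
    weighted by predicted edge (contact) activation probabilities.

    x, u                   : current state and control
    edge_probs[k]          : probability that contact edge k is active (placeholder)
    A_edges[k], B_edges[k] : linear contribution associated with edge k (placeholder)
    A_free, B_free         : contact-free linear dynamics (placeholder)
    """
    A, B = A_free.copy(), B_free.copy()
    for p, Ak, Bk in zip(edge_probs, A_edges, B_edges):
        A += p * Ak
        B += p * Bk
    return A @ x + B @ u
```

For fixed edge probabilities the blended step is linear in the state and control, which is what makes optimizers such as iLQR convenient to apply; the paper additionally enforces stability of the learned dynamics and updates the edge probabilities from contact measurements, neither of which this toy captures.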
  2. We articulate the design imperatives for machine-learning-based digital twins for nonlinear dynamical systems, which can be used to monitor the “health” of the system and anticipate future collapse. The fundamental requirement for digital twins of nonlinear dynamical systems is dynamical evolution: the digital twin must be able to evolve its dynamical state at the present time to the next time step without further state input, a requirement that reservoir computing naturally meets. We conduct extensive tests using prototypical systems from optics, ecology, and climate, where the respective specific examples are a chaotic CO2 laser system, a model of phytoplankton subject to seasonality, and the Lorenz-96 climate network. We demonstrate that, with a single or parallel reservoir computer, the digital twins are capable of a variety of challenging forecasting and monitoring tasks. Our digital twin has the following capabilities: (1) extrapolating the dynamics of the target system to predict how it may respond to a changing dynamical environment, e.g., a driving signal that it has never experienced before; (2) performing continual forecasting and monitoring with sparse real-time updates under non-stationary external driving; (3) inferring hidden variables in the target system and accurately reproducing/predicting their dynamical evolution; (4) adapting to external driving with different waveforms; and (5) extrapolating the global bifurcation behaviors to network systems of different sizes. These features make our digital twins appealing in applications such as monitoring the health of critical systems and forecasting their potential collapse induced by environmental changes or perturbations. Such systems can be an infrastructure, an ecosystem, or a regional climate system.

     
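The digital twins in item 2 are built on reservoir computing. As a generic reference point only, a bare-bones echo state network (one common reservoir-computing variant) is sketched below in numpy; the class, hyperparameters, and training setup are illustrative rather than taken from the paper, the external driving channel the paper relies on is omitted, and targets are assumed to be the next-step state so that the readout can be fed back for closed-loop forecasting.

```python
import numpy as np

rng = np.random.default_rng(0)

class EchoStateNetwork:
    """Minimal echo state network; sizes, spectral radius, and ridge
    penalty are illustrative defaults, not values from the paper."""

    def __init__(self, n_in, n_res=300, spectral_radius=0.9, ridge=1e-6):
        self.W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
        W = rng.normal(0.0, 1.0, (n_res, n_res))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W, self.ridge, self.W_out = W, ridge, None

    def _run(self, inputs):
        """Drive the reservoir with an input sequence; return all states."""
        r = np.zeros(self.W.shape[0])
        states = []
        for u in inputs:
            r = np.tanh(self.W @ r + self.W_in @ u)
            states.append(r)
        return np.array(states), r

    def fit(self, inputs, targets):
        """Ridge regression from reservoir states to next-step targets."""
        states, _ = self._run(inputs)
        reg = self.ridge * np.eye(states.shape[1])
        self.W_out = np.linalg.solve(states.T @ states + reg,
                                     states.T @ targets).T

    def forecast(self, warmup_inputs, n_steps):
        """Closed-loop prediction: feed each readout back as the next input."""
        _, r = self._run(warmup_inputs)
        preds = []
        for _ in range(n_steps):
            y = self.W_out @ r
            r = np.tanh(self.W @ r + self.W_in @ y)
            preds.append(y)
        return np.array(preds)
```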
  3. Learning policies in simulation is promising for reducing human effort when training robot controllers. This is especially true for soft robots that are more adaptive and safe but also more difficult to accurately model and control. The sim2real gap is the main barrier to successfully transferring policies from simulation to a real robot. System identification can be applied to reduce this gap, but traditional identification methods require a lot of manual tuning. Data-driven alternatives can tune dynamical models directly from data but are often data hungry, which again demands human effort for data collection. This work proposes a data-driven, end-to-end differentiable simulator focused on the exciting but challenging domain of tensegrity robots. To the best of the authors’ knowledge, this is the first differentiable physics engine for tensegrity robots that supports cable, contact, and actuation modeling. The aim is to develop a reasonably simplified, data-driven simulation, which can learn approximate dynamics with limited ground-truth data. The dynamics must be accurate enough to generate policies that can be transferred back to the ground-truth system. As a first step in this direction, the current work demonstrates sim2sim transfer, where the unknown physical model of MuJoCo acts as a ground-truth system. Two different tensegrity robots are used for evaluation and learning of locomotion policies: a 6-bar and a 3-bar tensegrity. The results indicate that, when the differentiable engine is used for training, only 0.25% of the ground-truth data are needed to train a policy that works on the ground-truth system, compared to training the policy directly on the ground-truth system.
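
Item 3 describes a differentiable engine with cable, contact, and actuation models for tensegrity robots. The sketch below is not that engine: it is a cable-only numpy toy combining tension-only spring forces with a semi-implicit Euler step, where per-cable stiffness and rest lengths stand in for the kind of parameters a differentiable simulator would fit from limited ground-truth data. A real implementation would live in an autodiff framework and add rod, contact, and actuation models.

```python
import numpy as np

def cable_forces(positions, cables, stiffness, rest_lengths):
    """Tension-only cable forces (slack cables apply no force).

    positions : (N, 3) point-mass positions
    cables    : list of (i, j) index pairs connected by a cable
    stiffness, rest_lengths : per-cable parameters, here just placeholders
    for the quantities a differentiable engine would learn from data.
    """
    f = np.zeros_like(positions)
    for c, (i, j) in enumerate(cables):
        d = positions[j] - positions[i]
        length = np.linalg.norm(d)
        stretch = max(length - rest_lengths[c], 0.0)  # cables cannot push
        force = stiffness[c] * stretch * d / (length + 1e-9)
        f[i] += force
        f[j] -= force
    return f

def semi_implicit_step(positions, velocities, masses, cables,
                       stiffness, rest_lengths, dt=1e-3, g=-9.81):
    """One semi-implicit Euler step with gravity and cable forces only
    (no rods, contact, or actuation, unlike the engine described above)."""
    forces = cable_forces(positions, cables, stiffness, rest_lengths)
    forces[:, 2] += masses * g                    # gravity acts along z
    velocities = velocities + dt * forces / masses[:, None]
    positions = positions + dt * velocities
    return positions, velocities
```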
  4. We develop an approach to improve the learning capabilities of robotic systems by combining learned predictive models with experience-based state-action policy mappings. Predictive models provide an understanding of the task and the dynamics, while experience-based (model-free) policy mappings encode favorable actions that override planned actions. We refer to our approach of systematically combining model-based and model-free learning methods as hybrid learning. Our approach efficiently learns motor skills and improves the performance of predictive models and experience-based policies. Moreover, our approach enables policies (both model-based and model-free) to be updated using any off-policy reinforcement learning method. We derive a deterministic method of hybrid learning by optimally switching between learning modalities. We adapt our method to a stochastic variation that relaxes some of the key assumptions in the original derivation. Our deterministic and stochastic variations are tested on a variety of robot control benchmark tasks in simulation as well as a hardware manipulation task. We extend our approach for use with imitation learning methods, where experience is provided through demonstrations, and we test the expanded capability with a real-world pick-and-place task. The results show that our method is capable of improving the performance and sample efficiency of learning motor skills in a variety of experimental domains. 
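The hybrid learning described in item 4 combines a model-based plan with an experience-based (model-free) policy that can override planned actions. A deliberately simplified gate is sketched below: the model-free action is executed only when a learned critic estimates it to improve on the planned one. The names, the critic, and the threshold are placeholders for illustration; the paper derives an optimal switching rule rather than this heuristic.

```python
import numpy as np

def hybrid_action(x, planned_action, policy_action, q_value, threshold=0.0):
    """Execute the experience-based (model-free) action only when its
    estimated advantage over the model-based plan exceeds a threshold;
    otherwise keep the planned action. q_value is assumed to be a learned
    critic; everything here is a placeholder, not the paper's rule."""
    advantage = q_value(x, policy_action) - q_value(x, planned_action)
    return policy_action if advantage > threshold else planned_action

# Toy usage with a made-up quadratic critic that prefers small actions.
q = lambda x, u: -np.sum(x ** 2) - 0.1 * np.sum(u ** 2)
x = np.array([0.5, -0.2])
u_exec = hybrid_action(x, planned_action=np.array([0.3]),
                       policy_action=np.array([0.1]), q_value=q)
```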
  5. Whole-body control (WBC) is a generic task-oriented control method for feedback control of loco-manipulation behaviors in humanoid robots. The combination of WBC and model-based walking controllers has been widely utilized in various humanoid robots. However, to date, the WBC method has not been employed for unsupported passive-ankle dynamic locomotion. As such, in this article, we devise a new WBC, dubbed the whole-body locomotion controller (WBLC), that can achieve experimental dynamic walking on unsupported passive-ankle biped robots. A key aspect of WBLC is the relaxation of contact constraints such that the control commands produce reduced jerk when switching foot contacts. To achieve robust dynamic locomotion, we conduct an in-depth analysis of uncertainty for our dynamic walking algorithm called the time-to-velocity-reversal (TVR) planner. The uncertainty study is fundamental as it allows us to improve the control algorithms and mechanical structure of our robot to fulfill the tolerated uncertainty. In addition, we conduct extensive experimentation on: (1) unsupported dynamic balancing (i.e., in-place stepping) with a six-degree-of-freedom biped, Mercury; (2) unsupported directional walking with Mercury; (3) walking over an irregular and slippery terrain with Mercury; and (4) in-place walking with our newly designed ten-DoF viscoelastic liquid-cooled biped, DRACO. Overall, the main contributions of this work are: (a) achieving various modalities of unsupported dynamic locomotion of passive-ankle bipeds using a WBLC controller and a TVR planner; (b) conducting an uncertainty analysis to improve the mechanical structure and the controllers of Mercury; and (c) devising a whole-body control strategy that reduces movement jerk during walking.
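
Item 5 motivates relaxing contact constraints so that commands produce less jerk at contact switches. As a loose illustration of that single idea (not the WBLC formulation), the sketch below solves for joint accelerations by weighted least squares, tracking a desired task-space acceleration while enforcing the contact condition only softly; all matrices are placeholders, and a real whole-body controller solves a constrained QP with full dynamics, torque limits, and friction cones.

```python
import numpy as np

def soft_wbc_accelerations(J_task, a_task_des, J_contact, dJc_qdot,
                           w_task=1.0, w_contact=10.0):
    """Weighted least-squares sketch of one whole-body control idea.

    Rows encode two objectives:
      J_task @ qdd    ~ a_task_des   (task-space tracking)
      J_contact @ qdd ~ -dJc_qdot    (contact points keep zero acceleration,
                                      enforced softly via w_contact)
    Lowering w_contact relaxes the contact constraint, which is one way to
    reduce jerk when contacts switch.
    """
    A = np.vstack([w_task * J_task, w_contact * J_contact])
    b = np.concatenate([w_task * a_task_des, w_contact * (-dJc_qdot)])
    qdd, *_ = np.linalg.lstsq(A, b, rcond=None)
    return qdd
```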