skip to main content


Title: REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer
A popular paradigm in robotic learning is to train a policy from scratch for every new robot. This is not only inefficient but also often impractical for complex robots. In this work, we consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology. Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail due to optimal action and/or state distribution being mismatched in different robots. In this paper, we propose a novel method named REvolveR of using continuous evolutionary models for robotic policy transfer implemented in a physics simulator. We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters. An expert policy on the source robot is transferred through training on a sequence of intermediate robots that gradually evolve into the target robot. Experiments on a physics simulator show that the proposed continuous evolutionary model can effectively transfer the policy across robots and achieve superior sample efficiency on new robots. The proposed method is especially advantageous in sparse reward settings where exploration can be significantly reduced.  more » « less
Award ID(s):
2024594
PAR ID:
10366292
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
International Conference on Machine Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The ability to learn from human demonstration endows robots with the ability to automate various tasks. However, directly learning from human demonstration is challenging since the structure of the human hand can be very different from the desired robot gripper. In this work, we show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning, where a five-finger human dexterous hand robot gradually evolves into a commercial two-finger-gripper robot, while repeated interacting in a physics simulator to continuously update the policy that is first learned from human demonstration. To deal with the high dimensions of robot parameters, we propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy. Through experiments on human object manipulation datasets, we show that our framework can efficiently transfer the expert human agent policy trained from human demonstrations in diverse modalities to a target commercial robot. 
    more » « less
  2.  
    more » « less
  3. null (Ed.)
    Learning policies in simulation is promising for reducing human effort when training robot controllers. This is especially true for soft robots that are more adaptive and safe but also more difficult to accurately model and control. The sim2real gap is the main barrier to successfully transfer policies from simulation to a real robot. System identification can be applied to reduce this gap but traditional identification methods require a lot of manual tuning. Data-driven alternatives can tune dynamical models directly from data but are often data hungry, which also incorporates human effort in collecting data. This work proposes a data-driven, end-to-end differentiable simulator focused on the exciting but challenging domain of tensegrity robots. To the best of the authors’ knowledge, this is the first differentiable physics engine for tensegrity robots that supports cable, contact, and actuation modeling. The aim is to develop a reasonably simplified, data-driven simulation, which can learn approximate dynamics with limited ground truth data. The dynamics must be accurate enough to generate policies that can be transferred back to the ground-truth system. As a first step in this direction, the current work demonstrates sim2sim transfer, where the unknown physical model of MuJoCo acts as a ground truth system. Two different tensegrity robots are used for evaluation and learning of locomotion policies, a 6-bar and a 3-bar tensegrity. The results indicate that only 0.25% of ground truth data are needed to train a policy that works on the ground truth system when the differentiable engine is used for training against training the policy directly on the ground truth system. 
    more » « less
  4. This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot’s visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 15 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details are available (diffusion-policy.cs.columbia.edu).

     
    more » « less
  5. Recent Internet-of-Things (IoT) networks span across a multitude of stationary and robotic devices, namely unmanned ground vehicles, surface vessels, and aerial drones, to carry out mission-critical services such as search and rescue operations, wildfire monitoring, and flood/hurricane impact assessment. Achieving communication synchrony, reliability, and minimal communication jitter among these devices is a key challenge both at the simulation and system levels of implementation due to the underpinning differences between a physics-based robot operating system (ROS) simulator that is time-based and a network-based wireless simulator that is event-based, in addition to the complex dynamics of mobile and heterogeneous IoT devices deployed in a real environment. Nevertheless, synchronization between physics (robotics) and network simulators is one of the most difficult issues to address in simulating a heterogeneous multi-robot system before transitioning it into practice. The existing TCP/IP communication protocol-based synchronizing middleware mostly relied on Robot Operating System 1 (ROS1), which expends a significant portion of communication bandwidth and time due to its master-based architecture. To address these issues, we design a novel synchronizing middleware between robotics and traditional wireless network simulators, relying on the newly released real-time ROS2 architecture with a master-less packet discovery mechanism. Additionally, we propose a ground and aerial agents’ velocity-aware customized QoS policy for Data Distribution Service (DDS) to minimize the packet loss and transmission latency between a diverse set of robotic agents, and we offer the theoretical guarantee of our proposed QoS policy. We performed extensive network performance evaluations both at the simulation and system levels in terms of packet loss probability and average latency with line-of-sight (LOS) and non-line-of-sight (NLOS) and TCP/UDP communication protocols over our proposed ROS2-based synchronization middleware. Moreover, for a comparative study, we presented a detailed ablation study replacing NS-3 with a real-time wireless network simulator, EMANE, and masterless ROS2 with master-based ROS1. Our proposed middleware attests to the promise of building a largescale IoT infrastructure with a diverse set of stationary and robotic devices that achieve low-latency communications (12% and 11% reduction in simulation and reality, respectively) while satisfying the reliability (10% and 15% packet loss reduction in simulation and reality, respectively) and high-fidelity requirements of mission-critical applications. 
    more » « less