This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot’s visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 15 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details are available (diffusion-policy.cs.columbia.edu).
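As a rough illustration of the denoising-based inference described in the abstract, the sketch below runs a generic DDPM-style reverse process over a short action sequence conditioned on observation features. The network `noise_pred_net`, the linear noise schedule, the horizon, and all shapes are assumptions made for illustration; they are not the authors' released implementation (see diffusion-policy.cs.columbia.edu for that).

```python
import torch

def sample_action_sequence(noise_pred_net, obs_features, horizon=16, action_dim=7,
                           num_steps=100, device="cpu"):
    """Sample an action sequence by iterative denoising (illustrative DDPM sketch)."""
    # Standard linear beta schedule; the actual schedule is a design choice.
    betas = torch.linspace(1e-4, 2e-2, num_steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Start from pure Gaussian noise over the whole action horizon.
    actions = torch.randn(1, horizon, action_dim, device=device)
    for t in reversed(range(num_steps)):
        t_batch = torch.full((1,), t, device=device, dtype=torch.long)
        eps = noise_pred_net(actions, t_batch, obs_features)      # predicted noise
        coef = (1.0 - alphas[t]) / torch.sqrt(1.0 - alpha_bars[t])
        actions = (actions - coef * eps) / torch.sqrt(alphas[t])  # posterior mean step
        if t > 0:
            actions = actions + torch.sqrt(betas[t]) * torch.randn_like(actions)
    return actions  # (1, horizon, action_dim) denoised action sequence
```

In a receding-horizon setup, only the first few actions of the denoised sequence would be executed before re-planning from the new observation.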
- PAR ID: 10548426
- Publisher / Repository: SAGE Publications
- Date Published:
- Journal Name: The International Journal of Robotics Research
- ISSN: 0278-3649
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
A popular paradigm in robotic learning is to train a policy from scratch for every new robot. This is not only inefficient but also often impractical for complex robots. In this work, we consider the problem of transferring a policy across two robots with significantly different parameters such as kinematics and morphology. Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched across the two robots. In this paper, we propose REvolveR, a novel method that uses continuous evolutionary models for robot policy transfer, implemented in a physics simulator. We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters. An expert policy on the source robot is transferred by training on a sequence of intermediate robots that gradually evolve into the target robot. Experiments in a physics simulator show that the proposed continuous evolutionary model can effectively transfer the policy across robots and achieve superior sample efficiency on new robots. The proposed method is especially advantageous in sparse-reward settings, where exploration can be significantly reduced.
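A minimal sketch of the interpolation idea, assuming the policy is fine-tuned on each intermediate robot; `make_env` and `finetune` are hypothetical callables supplied by the caller, and linear interpolation of a flat parameter vector is only one possible evolutionary schedule.

```python
import numpy as np

def evolve_policy(policy, source_params, target_params, make_env, finetune,
                  num_stages=20):
    """Transfer `policy` from the source robot to the target robot by training
    on a sequence of interpolated intermediate robots (illustrative sketch)."""
    source = np.asarray(source_params, dtype=float)
    target = np.asarray(target_params, dtype=float)
    for stage in range(1, num_stages + 1):
        alpha = stage / num_stages                 # 0 -> source robot, 1 -> target robot
        interp = (1 - alpha) * source + alpha * target
        env = make_env(interp)                     # hypothetical: build the intermediate robot
        policy = finetune(policy, env)             # hypothetical: adapt the policy in simulation
    return policy
```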
A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image-based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities.
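The plan-by-gradient-descent computation can be illustrated with the short sketch below, assuming a learned image `encoder` and latent `forward_model` are available as callables; the horizon, optimizer, and loss are placeholder choices, not the paper's exact ones.

```python
import torch

def plan_actions(encoder, forward_model, obs_img, goal_img, horizon=10,
                 action_dim=4, steps=50, lr=0.1):
    """Optimize an action sequence so the unrolled latent state reaches the goal
    (illustrative sketch of planning by gradient descent in a latent space)."""
    z_start = encoder(obs_img).detach()
    z_goal = encoder(goal_img).detach()
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        z_t = z_start
        for a in actions:                          # unroll the latent forward model
            z_t = forward_model(z_t, a)
        loss = torch.nn.functional.mse_loss(z_t, z_goal)  # latent distance to the goal image
        loss.backward()                            # gradients flow through the unrolled model
        opt.step()
    return actions.detach()
```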
We present ChainedDiffuser, a policy architecture that unifies action keypose prediction and trajectory diffusion generation for learning robot manipulation from demonstrations. Our main innovation is to use a global transformer-based action predictor to predict actions at keyframes, a task that requires multimodal semantic scene understanding, and to use a local trajectory diffuser to predict trajectory segments that connect predicted macro-actions. ChainedDiffuser sets a new record on established manipulation benchmarks, and outperforms both state-of-the-art keypose (macro-action) prediction models that use motion planners for trajectory prediction, and trajectory diffusion policies that do not predict keyframe macro-actions. We conduct experiments in both simulated and real-world environments and demonstrate ChainedDiffuser's ability to solve a wide range of manipulation tasks involving interactions with diverse objects.
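A rough sketch of the chaining idea, assuming a trained keypose predictor and a trained segment-level trajectory diffuser exposed through the hypothetical callables below; the interfaces are illustrative, not the released ChainedDiffuser API.

```python
def rollout(keypose_predictor, segment_diffuser, scene_obs, current_pose):
    """Chain macro-action (keypose) prediction with diffusion-generated trajectory
    segments connecting consecutive keyposes (illustrative sketch)."""
    keyposes = keypose_predictor(scene_obs)                 # global predictor: keyframe actions
    trajectory = []
    start = current_pose
    for keypose in keyposes:
        segment = segment_diffuser(scene_obs, start, keypose)  # local diffuser: waypoints
        trajectory.extend(segment)
        start = keypose                                     # next segment starts at this keypose
    return trajectory
```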
Despite its rich tradition, there are key limitations to researchers' ability to make generalizable inferences about state policy innovation and diffusion. This paper introduces new data and methods to move from empirical analyses of single policies to the analysis of comprehensive populations of policies and rigorously inferred diffusion networks. We have gathered policy adoption data appropriate for estimating policy innovativeness and tracing diffusion ties in a targeted manner (e.g., by policy domain, time period, or policy type) and extended the development of methods necessary to accurately and efficiently infer those ties. Our state policy innovation and diffusion (SPID) database includes 728 different policies coded by topic area. We provide an overview of this new dataset and illustrate two key uses: (i) static and dynamic innovativeness measures and (ii) latent diffusion networks that capture common pathways of diffusion between states across policies. The scope of the data allows us to compare patterns in both across policy topic areas. We conclude that these new resources will enable researchers to empirically investigate classes of questions that were difficult or impossible to study previously, but whose roots go back to the origins of the political science policy innovation and diffusion literature.
In reinforcement learning, importance sampling is a widely used method for evaluating an expectation under the data distribution of one policy when the data has in fact been generated by a different policy. Importance sampling requires computing the likelihood ratio between the action probabilities of a target policy and those of the data-producing behavior policy. In this article, we study importance sampling where the behavior policy's action probabilities are replaced by their maximum likelihood estimates under the observed data. We show this general technique reduces variance due to sampling error in Monte Carlo style estimators. We introduce two novel estimators that use this technique to estimate expected values that arise in the RL literature. We find that these general estimators reduce the variance of Monte Carlo sampling methods, leading to faster learning for policy gradient algorithms and more accurate off-policy policy evaluation. We also provide theoretical analysis showing that our new estimators are consistent and have asymptotically lower variance than Monte Carlo estimators.
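For discrete state and action spaces, the core idea can be sketched as follows: the behavior policy's action probabilities in the importance weight are replaced by their empirical (maximum likelihood) frequencies computed from the logged data. The helper names and the per-step weighting are illustrative assumptions, not the article's exact estimators.

```python
from collections import Counter

def estimated_is_weights(transitions, target_prob):
    """Per-step importance weights using the MLE of the behavior policy.

    transitions: list of (state, action) pairs logged under the behavior policy
    target_prob: callable returning the target policy's probability pi(a|s)
    Illustrative sketch for discrete, hashable states and actions.
    """
    state_counts = Counter(s for s, _ in transitions)
    sa_counts = Counter(transitions)
    weights = []
    for s, a in transitions:
        behavior_mle = sa_counts[(s, a)] / state_counts[s]   # empirical estimate of pi_b(a|s)
        weights.append(target_prob(s, a) / behavior_mle)     # per-step likelihood ratio
    return weights
```

A trajectory-level weight for a Monte Carlo estimator would then be the product of the per-step ratios along that trajectory.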