


Search for: All records

Creators/Authors contains: "Song, Shuran"


  1. Free, publicly-accessible full text available January 15, 2026
  2. Free, publicly-accessible full text available October 4, 2025
  3. Free, publicly-accessible full text available May 13, 2025
  4. We present a prototype virtual reality user interface for robot teleoperation that supports high-level specification of 3D object positions and orientations in remote assembly tasks. Users interact with virtual replicas of task objects. They asynchronously assign multiple goals in the form of 6DoF destination poses, and manage and monitor the execution of these goals, without needing to be familiar with specific robots and their capabilities. The user interface employs two different spatiotemporal visualizations for assigned goals: one represents all goals within the user’s workspace (Aggregated View), while the other depicts each goal within a separate world in miniature (Timeline View). We conducted a user study of the interface without the robot system to compare how these visualizations affect user efficiency and task load. The results show that while the Aggregated View helped the participants finish the task faster, the participants preferred the Timeline View.
  5. Tan, Jie; Toussaint, Marc; Darvish, Kourosh (Ed.)
    Most successes in autonomous robotic assembly have been restricted to a single target or category. We propose to investigate general part assembly, the task of creating novel target assemblies with unseen part shapes. As a fundamental step toward a general part assembly system, we tackle the task of determining the precise poses of the parts in the target assembly, which we term “rearrangement planning”. We present General Part Assembly Transformer (GPAT), a transformer-based model architecture that accurately predicts part poses by inferring how each part shape corresponds to the target shape. Our experiments on both 3D CAD models and real-world scans demonstrate GPAT’s generalization abilities to novel and diverse target and part shapes.
  6. Many real-world factory tasks require human expertise and involvement for robot control. However, traditional robot operation requires that users undergo extensive and time-consuming robot-specific training to understand the specific constraints of each robot. We describe a user interface that supports a user in assigning and monitoring remote assembly tasks in Virtual Reality (VR) through high-level goal-based instructions rather than low-level direct control. Our user interface is part of a testbed in which a motion-planning algorithm determines, verifies, and executes robot-specific trajectories in simulation. 
  7. This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot’s visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 15 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods, with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and, during inference, iteratively optimizes with respect to this gradient field via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions, including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details are available at diffusion-policy.cs.columbia.edu.

     
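The inference loop described in the Diffusion Policy abstract (following the gradient of a score function with stochastic Langevin steps, combined with receding horizon control) can be illustrated with a small toy sketch. It is not the authors' implementation. It assumes a hand-written analytic score for a two-mode Gaussian mixture in place of a learned, observation-conditioned network. It also uses plain unannealed Langevin dynamics rather than a trained model's noise schedule and an invented one-dimensional "move left or right" task. All function names (`mixture_score`, `langevin_sample`, `action_modes`, `receding_horizon_rollout`) and parameter values are hypothetical. The rollout samples a whole action sequence but executes only its first few actions before re-planning.

```python
import math
import random


def mixture_score(x, means, sigma):
    """Analytic score grad_x log p(x) of an equal-weight mixture of isotropic
    Gaussians N(means[j], sigma^2 I). Hypothetical stand-in for a learned score network."""
    diffs = [[m_i - x_i for m_i, x_i in zip(m, x)] for m in means]
    log_w = [-0.5 * sum(d * d for d in diff) / sigma ** 2 for diff in diffs]
    top = max(log_w)                                   # stabilize the softmax
    w = [math.exp(lw - top) for lw in log_w]           # unnormalized responsibilities
    total = sum(w)
    return [sum(w[j] * diffs[j][i] for j in range(len(means))) / (total * sigma ** 2)
            for i in range(len(x))]


def langevin_sample(score_fn, dim, rng, n_steps=500, step=1e-4):
    """Unadjusted Langevin dynamics starting from Gaussian noise:
    x <- x + (step / 2) * score(x) + sqrt(step) * z,  z ~ N(0, I)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    noise_scale = math.sqrt(step)
    for _ in range(n_steps):
        s = score_fn(x)
        x = [x_i + 0.5 * step * s_i + noise_scale * rng.gauss(0.0, 1.0)
             for x_i, s_i in zip(x, s)]
    return x


def action_modes(obs, horizon, speed=0.1):
    """Hypothetical two-mode action distribution: from position `obs`,
    move right or left at constant speed for `horizon` steps."""
    right = [obs + speed * t for t in range(1, horizon + 1)]
    left = [obs - speed * t for t in range(1, horizon + 1)]
    return [right, left]


def receding_horizon_rollout(obs, rng, n_replans=5, horizon=8, n_exec=2, sigma=0.02):
    """Sample a whole action sequence, execute only its first `n_exec`
    actions, then re-observe and re-plan from the new position."""
    trajectory = [obs]
    for _ in range(n_replans):
        means = action_modes(obs, horizon)
        plan = langevin_sample(lambda a: mixture_score(a, means, sigma), horizon, rng)
        for action in plan[:n_exec]:        # each action is the next position target
            obs = action
            trajectory.append(obs)
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    # Bimodal 1-D target: each Langevin chain settles near -1 or +1.
    modes = [[-1.0], [1.0]]
    samples = [langevin_sample(lambda x: mixture_score(x, modes, 0.1), 1, rng, step=1e-3)[0]
               for _ in range(100)]
    print("fraction near +1:", sum(s > 0 for s in samples) / len(samples))
    print("rollout:", [round(p, 3) for p in receding_horizon_rollout(0.0, rng)])
```

In this toy, each sampled plan commits to one mode (left or right) rather than averaging the two, which is the multimodality property the abstract highlights. The length of the executed chunk (`n_exec`) relative to the planning horizon is an assumed knob that trades responsiveness against how often the rollout can switch modes.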