This content will become publicly available on May 20, 2026
Integrating Model-based Control and RL for Sim2Real Transfer of Tight Insertion Policies
Object insertion under tight tolerances (less than 1 mm) is an important but challenging assembly task, as even small errors can result in undesirable contacts. Recent efforts have focused on Reinforcement Learning (RL), which often depends on the careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved insertion accuracy. The policy is trained exclusively in simulation and transferred zero-shot to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with residual RL, which is trained in simulation using only a sparse, goal-reaching reward. A curriculum over observation noise and action magnitude is used to train the residual RL policy. Both policy components take as input the SE(3) poses of the plug and the socket and return an SE(3) pose transform for the plug, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions in simulation and on the real system. The proposed approach outperforms recent RL-based methods in this domain as well as prior efforts with hybrid policies, and ablations highlight the impact of each component of the approach.
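As a concrete illustration of the hybrid structure described above, the sketch below combines a potential-field step toward the socket with a scaled learned residual. This is a minimal, translation-only sketch under assumed names and gains (the actual policy acts on full SE(3) poses and uses a trained residual network), not the paper's implementation.

```python
import numpy as np

def potential_field_action(plug_pos, socket_pos, gain=1.0, max_step=0.005):
    """Model-based base action: move the plug along the attractive-field
    gradient toward the socket, with the per-step motion capped."""
    step = gain * (socket_pos - plug_pos)
    norm = np.linalg.norm(step)
    if norm > max_step:
        step *= max_step / norm
    return step

def hybrid_action(plug_pos, socket_pos, residual_policy, alpha=0.5):
    """Base action plus a learned residual correction; `alpha` scales the
    residual, loosely mirroring an action-magnitude curriculum."""
    obs = np.concatenate([plug_pos, socket_pos])
    return potential_field_action(plug_pos, socket_pos) + alpha * residual_policy(obs)

# Toy usage with a zero residual standing in for the trained network.
plug, socket = np.array([0.02, -0.01, 0.10]), np.array([0.0, 0.0, 0.05])
print(hybrid_action(plug, socket, lambda obs: np.zeros(3)))
```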
- Award ID(s): 2309866
- PAR ID: 10631615
- Publisher / Repository: IEEE International Conference on Robotics and Automation (ICRA)
- Date Published:
- Format(s): Medium: X
- Location: Atlanta, GA
- Sponsoring Org: National Science Foundation
More Like this
Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal control strategy for spacecraft attitude control problems. Two main types of Attitude Control Systems (ACS) are presented. First, the general ACS problem with full actuation is considered, but with saturation constraints on the applied torques, representing thruster-based ACSs. Second, an attitude control problem with a reaction-wheel-based ACS is considered, which has more constraints on control authority. The agent is trained using the Proximal Policy Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is trained using curriculum learning. We compare the RL-based controller to a QRF (quaternion rate feedback) attitude controller, a well-established state-feedback control strategy. We investigate the nominal performance and robustness with respect to uncertainty in the system dynamics. Our RL-based attitude control agent adapts to any spacecraft mass without needing to re-train. In the range of 0.1 to 100,000 kg, our agent achieves 2% better performance than a QRF controller tuned for the same mass range, and similar performance to the QRF controller tuned specifically for a given mass. The trained RL agent for the reaction-wheel-based ACS achieved a reward roughly 10 times higher than that of a tuned QRF controller.
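The QRF baseline referenced above is a standard state-feedback law: the commanded torque is a proportional term on the vector part of the attitude-error quaternion plus a damping term on the body angular rate. The sketch below is an illustrative scalar-gain version (scalar-first quaternion convention; gains, shortest-rotation sign handling, and torque saturation are simplified assumptions), not the controller tuned in the study.

```python
import numpy as np

def quat_error_vec(q_current, q_target):
    """Vector part of the error quaternion q_target^-1 (x) q_current
    (scalar-first convention, unit quaternions assumed)."""
    w1, x1, y1, z1 = q_target
    w1, x1, y1, z1 = w1, -x1, -y1, -z1          # conjugate = inverse for unit quaternions
    w2, x2, y2, z2 = q_current
    return np.array([
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 + y1 * w2 + z1 * x2 - x1 * z2,
        w1 * z2 + z1 * w2 + x1 * y2 - y1 * x2,
    ])

def qrf_torque(q_current, q_target, omega, kp=0.5, kd=2.0):
    """Quaternion rate feedback: proportional on attitude error, derivative on body rate."""
    return -kp * quat_error_vec(q_current, q_target) - kd * np.asarray(omega)
```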
Tamim Asfour (Ed.). A reinforcement learning (RL) control policy could fail in a new or perturbed environment that differs from the training environment, due to the presence of dynamic variations. For controlling systems with continuous state and action spaces, we propose an add-on approach to robustifying a pre-trained RL policy by augmenting it with an L1 adaptive controller (L1AC). Leveraging the capability of an L1AC for fast estimation and active compensation of dynamic variations, the proposed approach can improve the robustness of an RL policy that is trained either in a simulator or in the real world without consideration of a broad class of dynamic variations. Numerical and real-world experiments empirically demonstrate the efficacy of the proposed approach in robustifying RL policies trained using both model-free and model-based methods.
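The augmentation pattern described above can be pictured as: applied input = pre-trained RL action + adaptive correction, where the correction comes from a state predictor, a fast adaptation law, and a low-pass filter. Below is a highly simplified scalar sketch of that pattern; the first-order plant model, all gains, and the omission of projection and saturation are illustrative assumptions, not the paper's design.

```python
import numpy as np

dt = 0.001        # integration / control step
a_m = -3.0        # desired dynamics: x_dot ~ a_m * x + u
gamma = 1.0e4     # adaptation gain (L1 relies on fast adaptation)
omega = 30.0      # bandwidth of the low-pass filter in the control path
P = 1.0 / (-2.0 * a_m)   # scalar Lyapunov equation 2 * a_m * P = -1

def l1_step(x, x_hat, sigma_hat, u_ad, u_rl):
    """One step of augmenting the RL action u_rl with an adaptive correction."""
    u = u_rl + u_ad                               # RL action + adaptive term
    x_hat_dot = a_m * x_hat + u + sigma_hat       # state predictor
    x_tilde = x_hat - x                           # prediction error
    sigma_hat += dt * (-gamma * P * x_tilde)      # adaptation law (projection omitted)
    u_ad += dt * omega * (-sigma_hat - u_ad)      # low-pass filter toward -sigma_hat
    x_hat += dt * x_hat_dot
    return u, x_hat, sigma_hat, u_ad
```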
We consider the problem of optimal control of district cooling energy plants (DCEPs) consisting of multiple chillers, a cooling tower, and a thermal energy storage (TES), in the presence of time-varying electricity prices. A straightforward application of model predictive control (MPC) requires solving a challenging mixed-integer nonlinear program (MINLP) because of the on/off operation of the chillers and the complexity of the DCEP model. Reinforcement learning (RL) is an attractive alternative since its real-time control computation is much simpler. But designing an RL controller is challenging due to myriad design choices and computationally intensive training. In this paper, we propose an RL controller and an MPC controller for minimizing the electricity cost of a DCEP and compare them via simulations. The two controllers are designed to be comparable in terms of objective and information requirements. The RL controller uses a novel Q-learning algorithm that is based on least-squares policy iteration. We describe the design choices for the RL controller, including the choice of state space and basis functions, that are found to be effective. The proposed MPC controller does not need a mixed-integer solver for implementation, but only a nonlinear program (NLP) solver. A rule-based baseline controller is also proposed to aid in comparison. Simulation results show that the proposed RL and MPC controllers achieve similar savings over the baseline controller, about 17%.
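The least-squares policy iteration mentioned above alternates between fitting a linear Q-function to a fixed batch of transitions (an LSTD-Q solve) and acting greedily on the fit. The snippet below is a generic sketch of the inner solve; the feature map, sample format, and ridge regularization are assumptions for illustration, not the paper's actual state space or basis functions.

```python
import numpy as np

def lstdq_solve(samples, phi, policy, gamma=0.99, ridge=1e-6):
    """Fit weights w so that Q(s, a) ~= w . phi(s, a) from (s, a, r, s') samples,
    evaluating the greedy `policy` at the next state."""
    k = len(phi(*samples[0][:2]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A + ridge * np.eye(k), b)
```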
Recent advancements in Digital Twin (DT) technology have opened new avenues for smart manufacturing. These systems increasingly depend on adaptive control mechanisms to optimize complex processes and reduce production wastage. This research presents an innovative approach that integrates the Soft Actor-Critic (SAC) Reinforcement Learning (RL) algorithm with DT technology and Robot Operating System 2 (ROS2) to enable real-time adaptive control in robotic manufacturing. Our experimental setup consists of a ViperX 300 S robot arm, on which two distinct tasks were implemented to simulate adaptive control of a manufacturing process: (1) static target reaching and (2) dynamic target following. The system architecture combines the Unity game engine's simulation environment with ROS2 for seamless and robust DT synchronization. We implemented a hierarchical reward structure to address common RL challenges, including local-minima avoidance, convergence acceleration, and training stability, while leveraging transfer learning to efficiently adapt trained behavior models across tasks. Experimental results demonstrate rapid policy convergence and robust task execution, with performance metrics including cumulative reward, value loss, policy loss, and entropy validating the effectiveness of the approach. To the best of our knowledge, this is the first study to integrate Unity with a ROS2-based DT for real-time synchronization and adaptive physical robot control using RL. Unlike prior works limited to offline or low-frequency simulations, our framework achieves stable 20 ms joint-level synchronization, enabling deployment of learned behaviors directly to physical robotic systems through the virtual platform. This work advances the integration of RL with a realistic DT framework for industrial and manufacturing robotics applications, providing a framework for enhanced adaptive real-time control in smart additive manufacturing (AM) processes.
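The 20 ms joint-level synchronization described above amounts to a fixed-rate bridge between the simulated twin and the physical arm. Below is a minimal rclpy sketch of such a bridge; the topic names, message type, and QoS depth are assumptions for illustration, not the paper's interfaces.

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import JointState

class TwinSyncNode(Node):
    def __init__(self):
        super().__init__('twin_sync')
        self.pub = self.create_publisher(JointState, '/twin/joint_targets', 10)
        self.latest = None
        self.create_subscription(JointState, '/sim/joint_states', self.on_sim, 10)
        self.create_timer(0.02, self.tick)        # 20 ms -> 50 Hz sync rate

    def on_sim(self, msg):
        self.latest = msg                         # cache the most recent simulated pose

    def tick(self):
        if self.latest is not None:
            self.pub.publish(self.latest)         # forward to the physical-robot side

def main():
    rclpy.init()
    rclpy.spin(TwinSyncNode())
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```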