skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on May 20, 2026

Title: Integrating Model-based Control and RL for Sim2Real Transfer of Tight Insertion Policies
Object insertion under tight tolerances (less than 1mm) is an important but challenging assembly task as even small errors can result in undesirable contacts. Recent efforts focused on Reinforcement Learning (RL), which often depends on careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved insertion accuracy. The policy is trained exclusively in simulation and is zero-shot transferred to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with residual RL, which is trained in simulation given only a sparse, goal-reaching reward. A curriculum scheme over observation noise and action magnitude is used for training the residual RL policy. Both policy components use as input the SE(3) poses of both the plug and the socket and return the plug's SE(3) pose transform, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions in simulation and reality. The proposed approach outperforms recent RL-based methods in this domain and prior efforts with hybrid policies. Ablations highlight the impact of each component of the approach.  more » « less
Award ID(s):
2309866
PAR ID:
10631615
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
IEEE International Conference on Robotics and Automation (ICRA)
Date Published:
Format(s):
Medium: X
Location:
Atlanta, GA
Sponsoring Org:
National Science Foundation
More Like this
  1. Object insertion under tight tolerances (less than 1mm) is an important but challenging assembly task as even small errors can result in undesirable contacts. Recent efforts focused on Reinforcement Learning (RL), which often depends on careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved insertion accuracy. The policy is trained exclusively in simulation and is zero-shot transferred to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with residual RL, which is trained in simulation given only a sparse, goal-reaching reward. A curriculum scheme over observation noise and action magnitude is used for training the residual RL policy. Both policy components use as input the SE(3) poses of both the plug and the socket and return the plug's SE(3) pose transform, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions in simulation and reality. The proposed approach outperforms recent RL-based methods in this domain and prior efforts with hybrid policies. Ablations highlight the impact of each component of the approach. 
    more » « less
  2. Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal control strategy for spacecraft attitude control problems. Two main types of Attitude Control Systems (ACS) are presented. First, the general ACS problem with full actuation is considered, but with saturation constraints on the applied torques, representing thruster-based ACSs. Second, an attitude control problem with reaction wheel based ACS is considered, which has more constraints on control authority. The agent is trained using the Proximal Policy Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is trained using curriculum learning. We compare the RL based controller to a QRF (quaternion rate feedback) attitude controller, a well-established state feedback control strategy. We investigate the nominal performance and robustness with respect to uncertainty in system dynamics. Our RL based attitude control agent adapts to any spacecraft mass without needing to re-train. In the range of 0.1 to 100,000 kg, our agent achieves 2% better performance to a QRF controller tuned for the same mass range, and similar performance to the QRF controller tuned specifically for a given mass. The performance of the trained RL agent for the reaction wheel based ACS achieved 10 higher better reward then that of a tuned QRF controller 
    more » « less
  3. Tamim Asfour, editor in (Ed.)
    A reinforcement learning (RL) control policy could fail in a new/perturbed environment that is different from the training environment, due to the presence of dynamic variations. For controlling systems with continuous state and action spaces, we propose an add-on approach to robustifying a pre-trained RL policy by augmenting it with an L1 adaptive controller (L1AC). Leveraging the capability of an L1AC for fast estimation and active ompensation of dynamic variations, the proposed approach can improve the robustness of an RL policy which is trained either in a simulator or in the real world without consideration of a broad class of dynamic variations. Numerical and real-world experiments empirically demonstrate the efficacy of the proposed approach in robustifying RL policies trained using both model-free and modelbased methods. 
    more » « less
  4. We consider the problem of optimal control of district cooling energy plants (DCEPs) consisting of multiple chillers, a cooling tower, and a thermal energy storage (TES), in the presence of time-varying electricity price. A straightforward application of model predictive control (MPC) requires solving a challenging mixed-integer nonlinear program (MINLP) because of the on/off of chillers and the complexity of the DCEP model. Reinforcement learning (RL) is an attractive alternative since its real-time control computation is much simpler. But designing an RL controller is challenging due to myriad design choices and computationally intensive training. In this paper, we propose an RL controller and an MPC controller for minimizing the electricity cost of a DCEP and compare them via simulations. The two controllers are designed to be comparable in terms of objective and information requirements. The RL controller uses a novel Q-learning algorithm that is based on least-squares policy iteration. We describe the design choices for the RL controller, including the choice of state space and basis functions, that are found to be effective. The proposed MPC controller does not need a mixed integer solver for implementation, but only a nonlinear program (NLP) solver. A rule-based baseline controller is also proposed to aid in comparison. Simulation results show that the proposed RL and MPC controllers achieve similar savings over the baseline controller, about 17%. 
    more » « less
  5. Abstract Background Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges for developing such a robust controller is to handle different degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers either are patient-condition specific or involve tuning of many control parameters, which could behave unreliably and even fail to maintain balance. Methods We present a novel, deep neural network, reinforcement learning-based robust controller for a LLRE based on a decoupled offline human-exoskeleton simulation training with three independent networks, which aims to provide reliable walking assistance against various and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE’s proprioceptive signals, including joint kinematic states, and subsequently predicts real-time position control targets for the actuated joints. To handle uncertain human interaction forces, the control policy is trained intentionally with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected with the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy to different human conditions, we employ domain randomization during training that includes not only randomization of exoskeleton dynamics properties but, more importantly, randomization of human muscle strength to simulate the variability of the patient’s disability. Through this decoupled deep reinforcement learning framework, the trained controller of LLREs is able to provide reliable walking assistance to patients with different degrees of neuromuscular disorders without any control parameter tuning. Results and conclusion A universal, RL-based walking controller is trained and virtually tested on a LLRE system to verify its effectiveness and robustness in assisting users with different disabilities such as passive muscles (quadriplegic), muscle weakness, or hemiplegic conditions without any control parameter tuning. Analysis of the RMSE for joint tracking, CoP-based stability, and gait symmetry shows the effectiveness of the controller. An ablation study also demonstrates the strong robustness of the control policy under large exoskeleton dynamic property ranges and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer since it uses only proprioception information of the LLRE (joint sensory state) as the input. Furthermore, the controller is shown to be able to handle different patient conditions without the need for patient-specific control parameter tuning. 
    more » « less