With the rapid development of deep reinforcement learning technology, it gradually demonstrates excellent potential and is becoming the most promising solution in the robotics. However, in the smart manufacturing domain, there is still not too much research involved in dynamic adaptive control mechanisms optimizing complex processes. This research advances the integration of Soft Actor-Critic (SAC) with digital twins for industrial robotics applications, providing a framework for enhanced adaptive real-time control for smart additive manufacturing processing. The system architecture combines Unity’s simulation environment with ROS2 for seamless digital twin synchronization, while leveraging transfer learning to efficiently adapt trained models across tasks. We demonstrate our methodology using a Viper X300s robot arm with the proposed hierarchical reward structure to address the common reinforcement learning challenges in two distinct control scenarios. The results show rapid policy convergence and robust task execution in both simulated and physical environments demonstrating the effectiveness of our approach.
more »
« less
This content will become publicly available on November 11, 2026
Digital twin-enabled real-time control for robot arm-based manufacturing via reinforcement learning
Recent advancements in Digital Twin (DT) technology have opened new avenues for smart manufacturing. These systems increasingly depend on adaptive control mechanisms to optimize complex processes and reduce production wastage. This research presents an innovative approach that integrates Soft Actor-Critic (SAC) Reinforcement Learning (RL) algorithm with DT technology with Robot Operating System 2 (ROS2) to enable real-time adaptive control in robotic manufacturing. Our experimental setup consists of a ViperX 300 S robot arm, in which two distinct Tasks: (1) static target reaching and (2) dynamic target following were implemented for simulating adaptive control of manufacturing process. The innovative system architecture combines Unity game engine’s simulation environment with ROS2 for seamless and robust DT synchronization. We implemented a hierarchical reward structure to address common RL challenges, including local minima avoidance, convergence acceleration, and training stability, while leveraging transfer learning to efficiently adapt trained behavior models across tasks. Experimental results demonstrate rapid policy convergence and robust task execution, with performance metrics including cumulative reward, value loss, policy loss, and entropy validating the effectiveness of the approach. To the best of our knowledge, this is the first study to integrate Unity with ROS2-based DT for real-time synchronization and adaptive physical robot control using RL. Unlike prior works limited to offline or low-frequency simulations, our framework achieves stable 20 ms joint-level synchronization, enabling deployment of learned behaviors directly to physical robotic systems through virtual platform. This work advances the integration of RL with realistic DT framework for industrial and manufacturing robotics applications, providing a framework for enhanced adaptive real-time control in smart additive manufacturing (AM) processes.
more »
« less
- Award ID(s):
- 2348013
- PAR ID:
- 10647078
- Publisher / Repository:
- Springer Netherlands
- Date Published:
- Journal Name:
- Journal of Intelligent Manufacturing
- ISSN:
- 0956-5515
- Subject(s) / Keyword(s):
- Control, Robotics, Automation, Industrial Automation, Motor Control, Rehabilitation Robotics, Robotic Engineering, Robotics
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Recent Internet-of-Things (IoT) networks span across a multitude of stationary and robotic devices, namely unmanned ground vehicles, surface vessels, and aerial drones, to carry out mission-critical services such as search and rescue operations, wildfire monitoring, and flood/hurricane impact assessment. Achieving communication synchrony, reliability, and minimal communication jitter among these devices is a key challenge both at the simulation and system levels of implementation due to the underpinning differences between a physics-based robot operating system (ROS) simulator that is time-based and a network-based wireless simulator that is event-based, in addition to the complex dynamics of mobile and heterogeneous IoT devices deployed in a real environment. Nevertheless, synchronization between physics (robotics) and network simulators is one of the most difficult issues to address in simulating a heterogeneous multi-robot system before transitioning it into practice. The existing TCP/IP communication protocol-based synchronizing middleware mostly relied on Robot Operating System 1 (ROS1), which expends a significant portion of communication bandwidth and time due to its master-based architecture. To address these issues, we design a novel synchronizing middleware between robotics and traditional wireless network simulators, relying on the newly released real-time ROS2 architecture with a master-less packet discovery mechanism. Additionally, we propose a ground and aerial agents’ velocity-aware customized QoS policy for Data Distribution Service (DDS) to minimize the packet loss and transmission latency between a diverse set of robotic agents, and we offer the theoretical guarantee of our proposed QoS policy. We performed extensive network performance evaluations both at the simulation and system levels in terms of packet loss probability and average latency with line-of-sight (LOS) and non-line-of-sight (NLOS) and TCP/UDP communication protocols over our proposed ROS2-based synchronization middleware. Moreover, for a comparative study, we presented a detailed ablation study replacing NS-3 with a real-time wireless network simulator, EMANE, and masterless ROS2 with master-based ROS1. Our proposed middleware attests to the promise of building a largescale IoT infrastructure with a diverse set of stationary and robotic devices that achieve low-latency communications (12% and 11% reduction in simulation and reality, respectively) while satisfying the reliability (10% and 15% packet loss reduction in simulation and reality, respectively) and high-fidelity requirements of mission-critical applications.more » « less
-
Mobile robot navigation is a critical aspect of robotics, with applications spanning from service robots to industrial automation. However, navigating in complex and dynamic environments poses many challenges, such as avoiding obstacles, making decisions in real-time, and adapting to new situations. Reinforcement Learning (RL) has emerged as a promising approach to enable robots to learn navigation policies from their interactions with the environment. However, application of RL methods to real-world tasks such as mobile robot navigation, and evaluating their performance under various training–testing settings has not been sufficiently researched. In this paper, we have designed an evaluation framework that investigates the RL algorithm’s generalization capability in regard to unseen scenarios in terms of learning convergence and success rates by transferring learned policies in simulation to physical environments. To achieve this, we designed a simulated environment in Gazebo for training the robot over a high number of episodes. The training environment closely mimics the typical indoor scenarios that a mobile robot can encounter, replicating real-world challenges. For evaluation, we designed physical environments with and without unforeseen indoor scenarios. This evaluation framework outputs statistical metrics, which we then use to conduct an extensive study on a deep RL method, namely the proximal policy optimization (PPO). The results provide valuable insights into the strengths and limitations of the method for mobile robot navigation. Our experiments demonstrate that the trained model from simulations can be deployed to the previously unseen physical world with a success rate of over 88%. The insights gained from our study can assist practitioners and researchers in selecting suitable RL approaches and training–testing settings for their specific robotic navigation tasks.more » « less
-
Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decisionmaking processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach on a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are highdimensional point clouds.more » « less
-
Object insertion under tight tolerances (less than 1mm) is an important but challenging assembly task as even small errors can result in undesirable contacts. Recent efforts focused on Reinforcement Learning (RL), which often depends on careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved insertion accuracy. The policy is trained exclusively in simulation and is zero-shot transferred to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with residual RL, which is trained in simulation given only a sparse, goal-reaching reward. A curriculum scheme over observation noise and action magnitude is used for training the residual RL policy. Both policy components use as input the SE(3) poses of both the plug and the socket and return the plug's SE(3) pose transform, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions in simulation and reality. The proposed approach outperforms recent RL-based methods in this domain and prior efforts with hybrid policies. Ablations highlight the impact of each component of the approach.more » « less
An official website of the United States government
