
Title: Reinforcement Learning for Energy-Efficient Toolpath Generation in Additive Manufacturing
Toolpath design plays a significant role in determining the efficiency of Additive Manufacturing (AM) processes. Traditional toolpath optimization frequently depends on empirical rules, which may not adequately account for the complex dynamics of the printing process. This study introduces a novel reinforcement learning (RL) approach, leveraging Proximal Policy Optimization (PPO), to optimize toolpath generation with a particular focus on reducing energy consumption. A custom-built environment simulates the toolpath planning scenario as a discrete grid, in which an RL agent representing the printing nozzle learns to navigate and optimize its path. The PPO agent was trained on grids of increasing complexity (10×10 and 25×25) using two reward systems: a default system and an energy-optimized system based on a custom energy model. The energy model penalizes energy-intensive vertical and diagonal movements while rewarding horizontal movements. Training results showed that the energy-optimized model achieved a significant reduction in energy consumption without compromising toolpath efficiency. On the 10×10 grid, energy consumption decreased from 92.7 kWms to 83.5 kWms, while on the 25×25 grid it dropped from 400.2 kWms to 395.4 kWms. Statistical analysis using paired t-tests confirmed these reductions with p-values of 0.00, demonstrating the effectiveness of incorporating energy constraints in RL training for AM. This research highlights the potential of RL in improving the sustainability and efficiency of AM processes through intelligent toolpath design.
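As a rough illustration of the setup described in the abstract, the sketch below outlines a Gymnasium-style grid environment in which the nozzle agent takes 8-connected moves and each step's reward includes an energy term that is smallest for horizontal moves and larger for vertical and diagonal ones. The class name, action set, cost weights, and step/goal bonuses are illustrative assumptions, not the study's actual implementation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToolpathGridEnv(gym.Env):
    """Nozzle navigates a discrete grid; the reward penalizes energy-hungry moves."""

    # 8-connected moves as (dx, dy) offsets.
    MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]

    def __init__(self, size=10, energy_weights=(1.0, 2.0, 3.0)):
        super().__init__()
        self.size = size
        # Assumed relative energy costs: horizontal < vertical < diagonal.
        self.w_horiz, self.w_vert, self.w_diag = energy_weights
        self.action_space = spaces.Discrete(len(self.MOVES))
        self.observation_space = spaces.Box(0, size - 1, shape=(2,), dtype=np.int32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.array([0, 0], dtype=np.int32)
        self.goal = np.array([self.size - 1, self.size - 1], dtype=np.int32)
        return self.pos.copy(), {}

    def _energy_cost(self, dx, dy):
        if dx != 0 and dy != 0:
            return self.w_diag   # diagonal move: most expensive
        if dy != 0:
            return self.w_vert   # vertical move: costlier than horizontal
        return self.w_horiz      # horizontal move: cheapest

    def step(self, action):
        dx, dy = self.MOVES[action]
        self.pos = np.clip(self.pos + np.array([dx, dy]), 0, self.size - 1).astype(np.int32)
        # Small per-step penalty plus the move's energy cost; bonus on reaching the goal.
        reward = -0.1 - self._energy_cost(dx, dy)
        terminated = bool(np.array_equal(self.pos, self.goal))
        if terminated:
            reward += 10.0
        return self.pos.copy(), reward, terminated, False, {}
```

The default reward system from the study would correspond to dropping the energy term, and the energy-optimized agent could then be trained with an off-the-shelf PPO implementation (for example, Stable-Baselines3's PPO with an MLP policy) before comparing per-episode energy totals between the two reward settings with a paired t-test.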
Award ID(s):
2301725
PAR ID:
10587101
Publisher / Repository:
University of Texas at Austin
Subject(s) / Keyword(s):
Additive Manufacturing; reinforcement learning; toolpath optimization; sustainable manufacturing
Format(s):
Medium: X
Location:
University of Texas at Austin
Right(s):
open
Institution:
The University of Texas at Austin
Sponsoring Org:
National Science Foundation
More Like this
  1. Serverless Function-as-a-Service (FaaS) is an emerging cloud computing paradigm that frees application developers from infrastructure management tasks such as resource provisioning and scaling. To reduce the tail latency of functions and improve resource utilization, recent research has focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to existing heuristics-based resource management approaches, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. In this paper, we show that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.6x higher function tail latency degradation on multi-tenant serverless FaaS platforms and is unable to converge during training. We then propose and implement a customized multi-agent RL algorithm based on Proximal Policy Optimization, i.e., multi-agent PPO (MA-PPO). We show that in multi-tenant environments, MA-PPO enables each agent to be trained until convergence and provides online performance comparable to S-RL in single-tenant cases with less than 10% degradation. Besides, MA-PPO provides a 4.4x improvement in S-RL performance (in terms of function tail latency) in multi-tenant cases.
  2. This research project aims to develop a resource management framework for efficient allocation of 5G network resources to IoT (Internet of Things) devices. As 5G technology is increasingly integrated with IoT applications, the diverse demands and use cases of IoT devices necessitate dynamic resource management. The focus of this study is to develop an IoT device environment that utilizes reinforcement learning (RL) for resource adjustment. The environment observes IoT device parameters including the current BER (bit error rate), allocated bandwidth, and current signal power level. Actions the RL agent can take on the environment include adjustments to the bandwidth and the signal power level of an IoT device. One implementation of the environment is currently tested with the PPO (Proximal Policy Optimization) and DDPG (Deep Deterministic Policy Gradient) RL algorithms using a continuous action space. Initial results show that PPO models train at a faster rate, while DDPG models explore a wider range of states, leading to better model predictions. Another version is tested with PPO and DQN (Deep Q-Networks) using a discrete action space. DQN demonstrates slightly better results than PPO, possibly because its value-based approach is better suited to discrete action spaces.
  3. High-performance applications necessitate rapid and dependable transfer of massive datasets across geographically dispersed locations. Traditional file transfer tools often suffer from resource underutilization and instability due to fixed configurations or monolithic optimization methods. We propose AutoMDT, a novel Modular Data Transfer Architecture, to address these issues by employing a deep reinforcement learning agent to simultaneously optimize concurrency levels for read, network, and write operations. This solution incorporates a lightweight network–system simulator, enabling offline training of a Proximal Policy Optimization (PPO) agent in approximately 45 minutes on average, thereby overcoming the impracticality of lengthy online training in production networks. AutoMDT’s modular design decouples I/O and network tasks, allowing the agent to capture complex buffer dynamics precisely and adapt to changing system and network conditions quickly. Evaluations on production-grade testbeds show that AutoMDT achieves up to 8x faster convergence and a reduction in transfer completion times compared to state-of-the-art solutions. 
  4. Abstract Reinforcement learning (RL), a subset of machine learning (ML), could optimize and control biomanufacturing processes, such as improved production of therapeutic cells. Here, the process of CAR T‐cell activation by antigen‐presenting beads and their subsequent expansion is formulated in silico. The simulation is used as an environment to train RL‐agents to dynamically control the number of beads in culture to maximize the population of robust effector cells at the end of the culture. We make periodic decisions of incremental bead addition or complete removal. The simulation is designed to operate in OpenAI Gym, enabling testing of different environments, cell types, RL‐agent algorithms, and state inputs to the RL‐agent. RL‐agent training is demonstrated with three different algorithms (PPO, A2C, and DQN), each sampling three different state input types (tabular, image, mixed); PPO‐tabular performs best for this simulation environment. Using this approach, training of the RL‐agent on different cell types is demonstrated, resulting in unique control strategies for each type. Sensitivity to input‐noise (sensor performance), number of control step interventions, and advantages of pre‐trained RL‐agents are also evaluated. Therefore, we present an RL framework to maximize the population of robust effector cells in CAR T‐cell therapy production. 
  5. Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal control strategy for spacecraft attitude control problems. Two main types of Attitude Control Systems (ACS) are presented. First, the general ACS problem with full actuation is considered, but with saturation constraints on the applied torques, representing thruster-based ACSs. Second, an attitude control problem with a reaction wheel based ACS is considered, which has more constraints on control authority. The agent is trained using the Proximal Policy Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is trained using curriculum learning. We compare the RL-based controller to a QRF (quaternion rate feedback) attitude controller, a well-established state feedback control strategy. We investigate the nominal performance and robustness with respect to uncertainty in system dynamics. Our RL-based attitude control agent adapts to any spacecraft mass without needing to be re-trained. In the range of 0.1 to 100,000 kg, our agent achieves 2% better performance than a QRF controller tuned for the same mass range, and similar performance to a QRF controller tuned specifically for a given mass. The trained RL agent for the reaction wheel based ACS achieved a reward about 10 times higher than that of a tuned QRF controller.