Title: Gathering Physical Particles with a Global Magnetic Field Using Reinforcement Learning
For biomedical applications in targeted therapy delivery and interventions, a large swarm of micro-scale particles ("agents") must be moved through a maze-like environment ("vascular system") to a target region ("tumor"). Due to limited on-board capabilities, these agents cannot move autonomously; instead, they are controlled by an external global force that acts uniformly on all particles. In this work, we demonstrate how to use a time-varying magnetic field to gather particles at a desired location. We use reinforcement learning to train networks that gather particles efficiently. We explain methods to overcome the simulation-to-reality gap and deploy the trained networks on a set of mazes and goal locations. The hardware experiments demonstrate fast convergence and robustness to both sensor and actuation noise. To encourage extensions and to serve as a benchmark for the reinforcement learning community, the code is available on GitHub.
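The core difficulty described above is that every particle receives the same control input. A minimal gridworld sketch (our illustration, not the paper's trained network) shows why gathering is still possible: walls block motion, so repeated global moves collapse the swarm toward the goal. The maze layout, the one-step greedy policy, and all names here are our assumptions.

```python
# Toy sketch of global-control gathering: every particle receives the SAME
# move each step; particles blocked by walls stay put, which lets the
# controller merge the swarm. A one-step greedy policy stands in for the
# paper's learned network.

MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}

def step(particles, walls, move):
    """Apply one global move; a particle blocked by a wall stays put."""
    dx, dy = MOVES[move]
    out = []
    for (x, y) in particles:
        nxt = (x + dx, y + dy)
        out.append((x, y) if nxt in walls else nxt)
    return out

def greedy_policy(particles, walls, goal):
    """One-step lookahead: pick the move minimizing total distance to goal."""
    def cost(ps):
        return sum(abs(x - goal[0]) + abs(y - goal[1]) for x, y in ps)
    return min(MOVES, key=lambda m: cost(step(particles, walls, m)))

def gather(particles, walls, goal, max_steps=200):
    """Repeatedly apply the greedy global move until all particles coincide."""
    for _ in range(max_steps):
        if all(p == goal for p in particles):
            break
        particles = step(particles, walls, greedy_policy(particles, walls, goal))
    return particles
```

On an open grid a uniform translation preserves all relative positions, so the walls are essential: pushing the swarm against a boundary is what reduces its spread.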
Award ID(s):
2130793, 1553063
PAR ID:
10393220
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Page Range / eLocation ID:
10126 to 10132
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The development of reinforcement learning (RL) algorithms has created a paradigm where agents learn directly by observing the environment and learn policies to perform tasks autonomously. In network environments, such agents can control and monitor traffic as well as help preserve the confidentiality, integrity, and availability of resources and services in the network. In software defined networks (SDN), the centralized controller in the control plane has become the single point of failure for the entire network. Reactive routing in SDNs makes such networks vulnerable to denial-of-service (DoS) attacks that aim to overwhelm switch memory and the control channel between SDN switches and controllers. One potential solution is an intelligent mechanism that detects and blocks such attacks with minimal performance overhead for the controller and control channel. In this work, we investigate the practicality and effectiveness of an RL approach to cope with DoS attacks in SDN networks that utilize programmable switches. Assuming the existence of a reliable reward function, we demonstrate that an RL-based approach can successfully adapt to the changing nature of attack traffic to detect and mitigate attacks without overwhelming switch memory and the control channel in SDN.
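The key assumption above is a reliable reward function for detection decisions. A toy sketch (our illustration, not the paper's system) of that idea: an epsilon-greedy agent learns whether to allow or block flows from a coarse traffic-rate bucket, with a reward we invent here of +1 for a correct decision and -1 otherwise.

```python
import random

# Toy contextual-bandit sketch of RL-based DoS mitigation: the "state" is a
# coarse packets-per-second bucket, and the agent learns a block/allow
# policy from a hand-made +1/-1 correctness reward (our assumption).

ACTIONS = ("allow", "block")

def make_agent(n_buckets=4):
    """Q-table over (rate bucket, action) pairs, initialized to zero."""
    return {(b, a): 0.0 for b in range(n_buckets) for a in ACTIONS}

def choose(q, bucket, eps=0.1, rng=random):
    """Epsilon-greedy action selection."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(bucket, a)])

def update(q, bucket, action, reward, lr=0.2):
    """Move the Q-value toward the observed reward."""
    q[(bucket, action)] += lr * (reward - q[(bucket, action)])

def train(q, episodes=2000, rng=random):
    for _ in range(episodes):
        is_attack = rng.random() < 0.5
        # Simulated traffic: attack flows land in the high-rate buckets.
        bucket = rng.choice([2, 3] if is_attack else [0, 1])
        action = choose(q, bucket, rng=rng)
        correct = (action == "block") == is_attack
        update(q, bucket, action, 1.0 if correct else -1.0)
    return q
```

After training, the agent prefers to block high-rate buckets and allow low-rate ones; the real system would of course face noisier features and an adaptive attacker.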
  3. Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent’s inputs, which raises concerns about deploying such agents in the real world. To address this issue, we propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against lp-norm bounded adversarial attacks. Our framework is compatible with popular deep reinforcement learning algorithms and we demonstrate its performance with deep Q-learning, A3C and PPO. We experiment on three deep RL benchmarks (Atari, MuJoCo and ProcGen) to show the effectiveness of our robust training algorithm. Our RADIAL-RL agents consistently outperform prior methods when tested against attacks of varying strength and are more computationally efficient to train. In addition, we propose a new evaluation method called Greedy Worst-Case Reward (GWC) to measure attack agnostic robustness of deep RL agents. We show that GWC can be evaluated efficiently and is a good estimate of the reward under the worst possible sequence of adversarial attacks. All code used for our experiments is available at https://github.com/tuomaso/radial_rl_v2. 
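The Greedy Worst-Case Reward idea can be sketched in a few lines (our simplified reading of the concept, not the released implementation): roll out a trajectory in which, at every step, the adversary makes the agent take the action with the lowest estimated value, and accumulate the resulting reward as a robustness estimate.

```python
# Simplified sketch of a greedy worst-case rollout versus a clean greedy
# rollout. The MDP interface (q_values, transition, reward) is our
# illustration; the paper restricts the adversary to lp-bounded input
# perturbations, which we abstract away here.

def gwc_rollout(q_values, transition, reward, state, horizon):
    """Adversary greedily forces the lowest-value action at each step."""
    total = 0.0
    for _ in range(horizon):
        q = q_values(state)
        worst = min(q, key=q.get)   # greedy worst-case action choice
        total += reward(state, worst)
        state = transition(state, worst)
    return total

def greedy_rollout(q_values, transition, reward, state, horizon):
    """Unattacked baseline: agent takes its highest-value action."""
    total = 0.0
    for _ in range(horizon):
        q = q_values(state)
        best = max(q, key=q.get)
        total += reward(state, best)
        state = transition(state, best)
    return total
```

On any MDP the worst-case total is a lower bound on the clean total, which is what makes it useful as an attack-agnostic robustness score.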
  4. Mobile wireless networks present several challenges for any learning system, due to uncertain and variable device movement, a decentralized network architecture, and constraints on network resources. In this work, we use deep reinforcement learning (DRL) to learn a scalable and generalizable forwarding strategy for such networks. We make the following contributions: i) we use hierarchical RL to design DRL packet agents rather than device agents, to capture the packet forwarding decisions that are made over time and improve training efficiency; ii) we use relational features to ensure generalizability of the learned forwarding strategy to a wide range of network dynamics and enable offline training; and iii) we incorporate both forwarding goals and network resource considerations into packet decision-making by designing a weighted DRL reward function. Our results show that our DRL agent often achieves a similar delay per packet delivered as the optimal forwarding strategy and outperforms all other strategies including state-of-the-art strategies, even on scenarios on which the DRL agent was not trained. 
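Contribution (iii) above, a weighted reward that balances forwarding goals against resource use, can be sketched as follows. The weights and feature names here are our illustration; the paper's exact terms differ.

```python
# Hypothetical weighted reward for a packet-forwarding agent: a delivery
# bonus traded off against per-step delay and congestion on the chosen
# relay. All coefficients are illustrative assumptions.

def forwarding_reward(delivered, delay_steps, queue_len,
                      w_goal=1.0, w_delay=0.1, w_queue=0.05):
    """Reward = delivery bonus - weighted delay - weighted congestion."""
    bonus = w_goal if delivered else 0.0
    return bonus - w_delay * delay_steps - w_queue * queue_len
```

Tuning the weights shifts the learned strategy along the delay-versus-resource trade-off: raising `w_queue` pushes packets away from congested relays even at some cost in delivery delay.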
  5. In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and in learning the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP), where the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent's task and expose the internal structure of the reward function. An RM can describe high-level knowledge and encode non-Markovian reward functions. To tackle the computational complexity, we propose a decentralized learning algorithm, decentralized graph-based reinforcement learning using reward machines (DGRM), which equips each agent with a localized policy so that agents make decisions independently based on locally available information. DGRM uses the actor-critic structure, and we introduce a tabular Q-function for discrete-state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. To further improve efficiency, we also propose the deep DGRM algorithm, which uses deep neural networks to approximate the Q-function and policy function for large-scale or continuous-state problems. The effectiveness of the proposed DGRM algorithm is evaluated in three case studies: two wireless communication case studies, with independent and dependent reward functions respectively, and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and that agents can accomplish complex tasks with the help of RMs. DGRM improves the global accumulated reward by 119% compared to the baseline in the COVID-19 pandemic mitigation case.
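A reward machine, as used above, is a small finite automaton over event labels whose transitions emit reward, which is how it encodes non-Markovian tasks. A minimal sketch (our illustration of the RM concept, not the DGRM implementation) for the task "reach A, then reach B":

```python
# Minimal reward-machine sketch: states u0 -> u1 -> u2, where the reward
# for reaching B is only emitted AFTER A has been visited. Events not
# listed in the transition table self-loop with zero reward.

class RewardMachine:
    def __init__(self):
        # (state, event) -> (next_state, reward)
        self.delta = {
            ("u0", "A"): ("u1", 0.0),   # first milestone, no reward yet
            ("u1", "B"): ("u2", 1.0),   # task complete
        }
        self.state = "u0"

    def step(self, event):
        """Advance the machine on one event label; return emitted reward."""
        self.state, reward = self.delta.get((self.state, event),
                                            (self.state, 0.0))
        return reward

    @property
    def done(self):
        return self.state == "u2"
```

Because the machine state records whether A has already been visited, pairing it with the environment state makes the combined process Markovian again, which is what lets standard RL machinery be applied.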