skip to main content


Title: Gathering Physical Particles with a Global Magnetic Field Using Reinforcement Learning
For biomedical applications in targeted therapy delivery and interventions, a large swarm of micro-scale particles (“agents”) has to be moved through a maze-like environment (“vascular system”) to a target region (“tumor”). Due to limited on-board capabilities, these agents cannot move autonomously; instead, they are controlled by an external global force that acts uniformly on all particles. In this work, we demonstrate how to use a time-varying magnetic field to gather particles to a desired location. We use reinforcement learning to train networks to efficiently gather particles. Methods to overcome the simulation-to-reality gap are explained, and the trained networks are deployed on a set of mazes and goal locations. The hardware experiments demonstrate fast convergence, and robustness to both sensor and actuation noise. To encourage extensions and to serve as a benchmark for the reinforcement learning community, the code is available at Github.  more » « less
Award ID(s):
2130793 1553063
NSF-PAR ID:
10393220
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Page Range / eLocation ID:
10126 to 10132
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The development of reinforcement learning (RL) algorithms has created a paradigm where the agents are trained to learn directly by observing the environment and learning policies to perform tasks autonomously. In the case of network environments, these agents can control and monitor the traffic as well as help preserve the confidentiality, integrity, and availability of resources and services in the network. In the case of software defined networks (SDN), the centralized controller in the control plane has become the single point of failure for the entire network. Reactive routing in SDNs makes such networks vulnerable to denial-of-service (DoS) attacks that aim to overwhelm switch memory and the control channel between SDN switches and controllers. One potential solution to cope with such attacks is to use an intelligent mechanism to detect and block them with minimal performance overhead for the controller and control channel. In this work, we investigate the practicality and effectiveness of a reinforcement learning (RL) approach to cope with DoS attacks in SDN networks that utilize programmable switches. Assuming the existence of a reliable reward function, we demonstrate that an RL-based approach can successfully adapt to the changing nature of attack traffic to detect and mitigate attacks without overwhelming switch memory and the control channel in SDN. 
    more » « less
  2. In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and how to learn the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP), where the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent’s task and expose reward function internal structures. RM has the capacity to describe high-level knowledge and encode non-Markovian reward functions. We propose a decentralized learning algorithm to tackle computational complexity, called decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently based on the information available to the agents. DGRM uses the actor-critic structure, and we introduce the tabular Q-function for discrete state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. To further improve efficiency, we also propose the deep DGRM algorithm, using deep neural networks to approximate the Q-function and policy function to solve large-scale or continuous state problems. The effectiveness of the proposed DGRM algorithm is evaluated by three case studies, two wireless communication case studies with independent and dependent reward functions, respectively, and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and agents can accomplish complex tasks with the help of RM. DGRM improves the global accumulated reward by 119% compared to the baseline in the case of COVID-19 pandemic mitigation. 
    more » « less
  3. Mobile wireless networks present several challenges for any learning system, due to uncertain and variable device movement, a decentralized network architecture, and constraints on network resources. In this work, we use deep reinforcement learning (DRL) to learn a scalable and generalizable forwarding strategy for such networks. We make the following contributions: i) we use hierarchical RL to design DRL packet agents rather than device agents, to capture the packet forwarding decisions that are made over time and improve training efficiency; ii) we use relational features to ensure generalizability of the learned forwarding strategy to a wide range of network dynamics and enable offline training; and iii) we incorporate both forwarding goals and network resource considerations into packet decision-making by designing a weighted DRL reward function. Our results show that our DRL agent often achieves a similar delay per packet delivered as the optimal forwarding strategy and outperforms all other strategies including state-of-the-art strategies, even on scenarios on which the DRL agent was not trained. 
    more » « less
  4. In cooperative multi-agent reinforcement learning (Co-MARL), a team of agents must jointly optimize the team's longterm rewards to learn a designated task. Optimizing rewards as a team often requires inter-agent communication and data sharing, leading to potential privacy implications. We assume privacy considerations prohibit the agents from sharing their environment interaction data. Accordingly, we propose Privacy-Engineered Value Decomposition Networks (PE-VDN), a Co-MARL algorithm that models multi-agent coordination while provably safeguarding the confidentiality of the agents' environment interaction data. We integrate three privacy-engineering techniques to redesign the data flows of the VDN algorithm-an existing Co-MARL algorithm that consolidates the agents' environment interaction data to train a central controller that models multi-agent coordination-and develop PE-VDN. In the first technique, we design a distributed computation scheme that eliminates Vanilla VDN's dependency on sharing environment interaction data. Then, we utilize a privacy-preserving multi-party computation protocol to guar-antee that the data flows of the distributed computation scheme do not pose new privacy risks. Finally, we enforce differential privacy to preempt inference threats against the agents' training data-past environment interactions-when they take actions based on their neural network predictions. We implement PE-VDN in StarCraft Multi-Agent Competition (SMAC) and show that it achieves 80% of Vanilla VDN's win rate while maintaining differential privacy levels that provide meaningful privacy guarantees. The results demonstrate that PE-VDN can safeguard the confidentiality of agents' environment interaction data without sacrificing multi-agent coordination. 
    more » « less
  5. While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. 
    more » « less