NSF PAR Search | NSF Public Access Repository

Adversarial Inception Backdoor Attacks against Reinforcement Learning

Rathbun, Ethan; Oprea, Alina; Amato, Christopher (July 2025, 42nd International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available July 13, 2026

Recent works have demonstrated the vulnerability of Deep Reinforcement Learning (DRL) algorithms to training-time backdoor poisoning attacks. The objectives of these attacks are twofold: induce pre-determined, adversarial behavior in the agent upon observing a fixed trigger during deployment, while allowing the agent to solve its intended task during training. Prior attacks assume arbitrary control over the agent's rewards, inducing values far outside the environment's natural constraints. This results in brittle attacks that fail once the proper reward constraints are enforced. Thus, in this work we propose a new class of backdoor attacks against DRL, which are the first to achieve state-of-the-art performance under strict reward constraints. These "inception" attacks manipulate the agent's training data by inserting the trigger into prior observations and replacing high-return actions with those of the targeted adversarial behavior. We formally define these attacks and prove they achieve both adversarial objectives against arbitrary Markov Decision Processes (MDPs). Using this framework, we devise an online inception attack which achieves a 100% attack success rate on multiple environments under constrained rewards while minimally impacting the agent's task performance.
SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents

Rathbun, Ethan; Amato, Christopher; Oprea, Alina (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

Free, publicly-accessible full text available December 10, 2025
