skip to main content

Title: TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents. In Proc. 57th ACM/IEEE Design Automation Conference (DAC), 2020, March 2020
We present TrojDRL, a tool for exploring and evaluating backdoor attacks on deep reinforcement learning agents.TrojDRL exploits the sequential nature of deep reinforcement learning (DRL) and considers different gradations of threat models. We show that untargeted attacks on state-of-the-art actor-critic algorithms can circumvent existing defenses built on the assumption of backdoors being targeted. We evaluated TrojDRL on a broad set of DRL benchmarks and showed that the attacks require only poisoning as little as 0.025% of the training data. Compared with existing works of backdoor attacks on classification models, TrojDRL provides a first step towards understanding the vulnerability of DRL agents.
; ; ;
Award ID(s):
1740079 1750009
Publication Date:
Journal Name:
Proc. 57th ACM/IEEE Design Automation Conference (DAC), 2020
Sponsoring Org:
National Science Foundation
More Like this
  1. Despite their tremendous success in a range of domains, deep learning systems are inherently susceptible to two types of manipulations: adversarial inputs -- maliciously crafted samples that deceive target deep neural network (DNN) models, and poisoned models -- adversely forged DNNs that misbehave on pre-defined inputs. While prior work has intensively studied the two attack vectors in parallel, there is still a lack of understanding about their fundamental connections: what are the dynamic interactions between the two attack vectors? what are the implications of such interactions for optimizing existing attacks? what are the potential countermeasures against the enhanced attacks? Answering these key questions is crucial for assessing and mitigating the holistic vulnerabilities of DNNs deployed in realistic settings. Here we take a solid step towards this goal by conducting the first systematic study of the two attack vectors within a unified framework. Specifically, (i) we develop a new attack model that jointly optimizes adversarial inputs and poisoned models; (ii) with both analytical and empirical evidence, we reveal that there exist intriguing "mutual reinforcement" effects between the two attack vectors -- leveraging one vector significantly amplifies the effectiveness of the other; (iii) we demonstrate that such effects enable a large designmore »spectrum for the adversary to enhance the existing attacks that exploit both vectors (e.g., backdoor attacks), such as maximizing the attack evasiveness with respect to various detection methods; (iv) finally, we discuss potential countermeasures against such optimized attacks and their technical challenges, pointing to several promising research directions.« less
  2. While Deep Reinforcement Learning has emerged as a de facto approach to many complex experience-driven networking problems, it remains challenging to deploy DRL into real systems. Due to the random exploration or half-trained deep neural networks during the online training process, the DRL agent may make unexpected decisions, which may lead to system performance degradation or even system crash. In this paper, we propose PnP-DRL, an offline-trained, plug and play DRL solution, to leverage the batch reinforcement learning approach to learn the best control policy from pre-collected transition samples without interacting with the system. After being trained without interaction with systems, our Plug and Play DRL agent will start working seamlessly, without additional exploration or possible disruption of the running systems. We implement and evaluate our PnP-DRL solution on a prevalent experience-driven networking problem, Dynamic Adaptive Streaming over HTTP (DASH). Extensive experimental results manifest that 1) The existing batch reinforcement learning method has its limits; 2) Our approach PnP-DRL significantly outperforms classical adaptive bitrate algorithms in average user Quality of Experience (QoE); 3) PnP-DRL, unlike the state-of-the-art online DRL methods, can be off and running without learning gaps, while achieving comparable performances.
  3. Robustness of Deep Reinforcement Learning (DRL) algorithms towards adversarial attacks in real world applications such as those deployed in cyber-physical systems (CPS) are of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Nonetheless, attacks on the RL agent's action space (corresponding to actuators in engineering systems) are equally perverse, but such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent with decoupled constraints as the budget of attack. We propose the white-box Myopic Action Space (MAS) attack algorithm that distributes the attacks across the action space dimensions. Next, we reformulate the optimization problem above with the same objective function, but with a temporally coupled constraint on the attack budget to take into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm that distributes the attacks across the action and temporal dimensions. Our results showed that using the same amount of resources, the LAS attack deteriorates the agent's performance significantly more than the MAS attack. This reveals the possibility that with limited resource,more »an adversary can utilize the agent's dynamics to malevolently craft attacks that causes the agent to fail. Additionally, we leverage these attack strategies as a possible tool to gain insights on the potential vulnerabilities of DRL agents.« less
  4. Deep reinforcement learning (DRL) augments the reinforcement learning framework, which learns a sequence of actions that maximizes the expected reward, with the representative power of deep neural networks. Recent works have demonstrated the great potential of DRL in medicine and healthcare. This paper presents a literature review of DRL in medical imaging. We start with a comprehensive tutorial of DRL, including the latest model-free and model-based algorithms. We then cover existing DRL applications for medical imaging, which are roughly divided into three main categories: (I) parametric medical image analysis tasks including landmark detection, object/lesion detection, registration, and view plane localization; (ii) solving optimization tasks including hyperparameter tuning, selecting augmentation strategies, and neural architecture search; and (iii) miscellaneous applications including surgical gesture segmentation, personalized mobile health intervention, and computational model personalization. The paper concludes with discussions of future perspectives.
  5. Mobile devices such as drones and autonomous vehicles increasingly rely on object detection (OD) through deep neural networks (DNNs) to perform critical tasks such as navigation, target-tracking and surveillance, just to name a few. Due to their high complexity, the execution of these DNNs requires excessive time and energy. Low-complexity object tracking (OT) is thus used along with OD, where the latter is periodically applied to generate "fresh" references for tracking. However, the frames processed with OD incur large delays, which does not comply with real-time applications requirements. Offloading OD to edge servers can mitigate this issue, but existing work focuses on the optimization of the offloading process in systems where the wireless channel has a very large capacity. Herein, we consider systems with constrained and erratic channel capacity, and establish parallel OT (at the mobile device) and OD (at the edge server) processes that are resilient to large OD latency. We propose Katch-Up, a novel tracking mechanism that improves the system resilience to excessive OD delay. We show that this technique greatly improves the quality of the reference available to tracking, and boosts performance up to 33%. However, while Katch-Up significantly improves performance, it also increases the computing loadmore »of the mobile device. Hence, we design SmartDet, a low-complexity controller based on deep reinforcement learning (DRL) that learns to achieve the right trade-off between resource utilization and OD performance. SmartDet takes as input highly-heterogeneous context-related information related to the current video content and the current network conditions to optimize frequency and type of OD offloading, as well as Katch-Up utilization. We extensively evaluate SmartDet on a real-world testbed composed by a JetSon Nano as mobile device and a GTX 980 Ti as edge server, connected through a Wi-Fi link, to collect several network-related traces, as well as energy measurements. We consider a state-of-the-art video dataset (ILSVRC 2015 - VID) and state-of-the-art OD models (EfficientDet 0, 2 and 4). Experimental results show that SmartDet achieves an optimal balance between tracking performance – mean Average Recall (mAR) and resource usage. With respect to a baseline with full Katch-Up usage and maximum channel usage, we still increase mAR by 4% while using 50% less of the channel and 30% power resources associated with Katch-Up. With respect to a fixed strategy using minimal resources, we increase mAR by 20% while using Katch-Up on 1/3 of the frames.« less