Inverse reinforcement learning (IRL) deals with estimating an agent’s utility function from its actions. In this paper, we consider how an agent can hide its strategy and mitigate an adversarial IRL attack; we call this inverse IRL (I-IRL). How should the decision maker choose its response to ensure that an adversary performing IRL reconstructs its strategy poorly? This paper presents four results. First, we present an adversarial IRL algorithm that estimates the agent’s strategy while controlling the agent’s utility function. Second, we propose an I-IRL result that mitigates the IRL algorithm used by the adversary. Our I-IRL results are based on revealed preference theory in microeconomics. The key idea is for the agent to deliberately choose sub-optimal responses so that its true strategy is sufficiently masked. Third, we give a sample complexity result for our main I-IRL result when the agent has noisy estimates of the adversary-specified utility function. Finally, we illustrate our I-IRL scheme in a radar problem where a meta-cognitive radar is trying to mitigate an adversarial target.
Metacognitive Radar: Masking Cognition From an Inverse Reinforcement Learner
A metacognitive radar switches between two modes of cognition: one mode to achieve a high-quality estimate of targets, and the other mode to hide its utility function (plan). To achieve high-quality estimates of targets, a cognitive radar performs a constrained utility maximization to adapt its sensing mode in response to a changing target environment. If an adversary can estimate the utility function of a cognitive radar, it can determine the radar’s sensing strategy and mitigate the radar performance via electronic countermeasures (ECM). This article discusses a metacognitive radar that switches between two modes of cognition: achieving satisfactory estimates of a target while hiding its strategy from an adversary that detects cognition. The radar does so by transmitting purposefully designed suboptimal responses to spoof the adversary’s Neyman–Pearson detector. We provide theoretical guarantees by ensuring that the Type-I error probability of the adversary’s detector exceeds a predefined level for a specified tolerance on the radar’s performance loss. We illustrate our cognition-masking scheme via numerical examples involving waveform adaptation and beam allocation. We show that small purposeful deviations from the optimal emission confuse the adversary by significant amounts, thereby masking the radar’s cognition. Our approach uses ideas from revealed preference in microeconomics and adversarial inverse reinforcement learning. Our proposed algorithms provide a principled approach for system-level electronic counter-countermeasures to hide the radar’s strategy from an adversary. We also provide performance bounds for our cognition-masking scheme when the adversary has misspecified measurements of the radar’s response.
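To make the revealed-preference idea concrete, the following is a minimal sketch of the kind of consistency test an adversary could run on observed probe/response pairs. It is not the article's algorithm: it checks the Generalized Axiom of Revealed Preference (GARP), which by Afriat's theorem holds exactly when some non-satiated utility function rationalizes the data. The function name `satisfies_garp`, the variable names, the normalized budget of 1, and the Cobb-Douglas example are all assumptions made for illustration.

```python
import numpy as np


def satisfies_garp(probes, responses, tol=1e-9):
    """Return True if the (probe, response) pairs are consistent with GARP.

    probes[k] is the hypothetical adversary probe (a price-like vector) at
    step k, and responses[k] is the response observed under that probe.
    """
    probes = np.asarray(probes, dtype=float)
    responses = np.asarray(responses, dtype=float)

    # cost[i, j] = probes[i] . responses[j]
    cost = probes @ responses.T
    own = np.diag(cost)  # cost of each response under its own probe

    # Direct revealed preference: response i is revealed preferred to
    # response j if j was affordable when i was chosen.
    closure = own[:, None] >= cost - tol

    # Transitive closure via Warshall's algorithm.
    for k in range(len(own)):
        closure |= closure[:, [k]] & closure[[k], :]

    # strict[j, i]: response i was strictly cheaper than response j under
    # probe j, so j is strictly directly revealed preferred to i.
    strict = own[:, None] > cost + tol

    # GARP fails if i is revealed preferred to j (possibly indirectly)
    # while j is strictly directly revealed preferred to i.
    return not (closure & strict.T).any()


# Responses of an assumed Cobb-Douglas maximizer with budget probe . x <= 1:
# the optimal response is x = 0.5 / probe in each coordinate.
probes = np.array([[1.0, 2.0], [2.0, 1.0], [1.0, 1.0]])
rational_responses = 0.5 / probes

# A hand-built pair of responses that contradict each other.
bad_probes = np.array([[1.0, 2.0], [2.0, 1.0]])
bad_responses = np.array([[0.0, 0.5], [0.5, 0.0]])

print(satisfies_garp(probes, rational_responses))
print(satisfies_garp(bad_probes, bad_responses))
```

In this sketch the first dataset passes the test and the second fails it. A cognition-masking scheme in the spirit of the abstract would deliberately perturb responses so that a test of this type becomes unreliable for the adversary, while keeping the radar's performance loss within a tolerance; the specific detector and perturbation design are given in the article, not here.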
- Award ID(s): 2312198
- PAR ID: 10518929
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: IEEE Transactions on Aerospace and Electronic Systems
- Volume: 59
- Issue: 6
- ISSN: 0018-9251
- Page Range / eLocation ID: 8826 to 8844
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Moving target defense (MTD) is a proactive defense approach that aims to thwart attacks by continuously changing the attack surface of a system (e.g., changing host or network configurations), thereby increasing the adversary’s uncertainty and attack cost. To maximize the impact of MTD, a defender must strategically choose when and what changes to make, taking into account both the characteristics of its system as well as the adversary’s observed activities. Finding an optimal strategy for MTD presents a significant challenge, especially when facing a resourceful and determined adversary who may respond to the defender’s actions. In this paper, we propose a multi-agent partially-observable Markov Decision Process model of MTD and formulate a two-player general-sum game between the adversary and the defender. To solve this game, we propose a multi-agent reinforcement learning framework based on the double oracle algorithm. Finally, we provide experimental results to demonstrate the effectiveness of our framework in finding optimal policies.
- Audio-based sensing enables fine-grained human activity detection, such as sensing hand gestures and contact-free estimation of the breathing rate. A passive adversary, equipped with microphones, can leverage the ongoing sensing to infer private information about individuals. Further, with multiple microphones, a beamforming-capable adversary can defeat the previously proposed privacy protection obfuscation techniques. Such an adversary can isolate the obfuscation signal and cancel it, even when situated behind a wall. AudioSentry is the first to address the privacy problem in audio sensing by protecting the users against a multi-microphone adversary. It utilizes the commodity audio-capable devices already available in the user’s environment to form a distributed obfuscator array. AudioSentry packs a novel technique to carefully generate obfuscation beams in different directions, preventing the multi-microphone adversary from canceling the obfuscation signal. AudioSentry follows this with a dynamic channel estimation scheme to preserve authorized sensing under obfuscation. AudioSentry offers the advantages of being practical to deploy and effective against an adversary with a large number of microphones. Our extensive evaluations with commodity devices show that AudioSentry protects the user’s privacy against a 16-microphone adversary with only four commodity obfuscators, regardless of the adversary’s position. AudioSentry provides its privacy-preserving features with little overhead on the authorized sensor.
- This paper considers the problem of tracking and predicting dynamical processes with model switching. The classical approach to this problem has been to use an interacting multiple model (IMM) which uses multiple Kalman filters and an auxiliary system to estimate the posterior probability of each model given the observations. More recently, data-driven approaches such as recurrent neural networks (RNNs) have been used for tracking and prediction in a variety of settings. An advantage of data-driven approaches like the RNN is that they can be trained to provide good performance even when the underlying dynamic models are unknown. This paper studies the use of temporal convolutional networks (TCNs) in this setting since TCNs are also data-driven but have certain structural advantages over RNNs. Numerical simulations demonstrate that a TCN matches or exceeds the performance of an IMM and other classical tracking methods in two specific settings with model switching: (i) a Gilbert-Elliott burst noise communication channel that switches between two different modes, each modeled as a linear system, and (ii) a maneuvering target tracking scenario where the target switches between a linear constant velocity mode and a nonlinear coordinated turn mode. In particular, the results show that the TCN tends to identify a mode switch as fast or faster than an IMM and that, in some cases, the TCN can perform almost as well as an omniscient Kalman filter with perfect knowledge of the current mode of the dynamical system.
- This paper proposes a distributed estimation and control algorithm to allow a team of robots to search for and track an unknown number of targets. The number of targets in the area of interest varies over time as targets enter or leave, and there are many sources of sensing uncertainty, including false positive detections, false negative detections, and measurement noise. The robots use a novel distributed Multiple Hypothesis Tracker (MHT) to estimate both the number of targets and the states of each target. A key contribution is a new data association method that reallocates target tracks across the team. The distributed MHT is compared against another distributed multi-target tracker to test its utility for multi-robot, multi-target tracking.