Inverse reinforcement learning (IRL) deals with estimating an agent’s utility function from its actions. In this paper, we consider how an agent can hide its strategy and mitigate an adversarial IRL attack; we call this inverse IRL (I-IRL). How should the decision maker choose its response to ensure a poor reconstruction of its strategy by an adversary performing IRL to estimate the agent’s strategy? This paper comprises four results: First, we present an adversarial IRL algorithm that estimates the agent’s strategy while controlling the agent’s utility function. Then, we propose an I-IRL result that mitigates the IRL algorithm used by the adversary. Our I-IRL results are based on revealed preference theory in microeconomics. The key idea is for the agent to deliberately choose sub-optimal responses so that its true strategy is sufficiently masked. Third, we give a sample complexity result for our main I-IRL result when the agent has noisy estimates of the adversary-specified utility function. Finally, we illustrate our I-IRL scheme in a radar problem where a meta-cognitive radar is trying to mitigate an adversarial target.
more »
« less
Controlled Sensing with Corrupted Commands
We consider a non-adaptive controlled sensing sce-nario in which the actions of the decision maker are corrupted by an adversary. The objective of the decision maker is to either detect the presence of the corruption or make a correct decision. Accordingly, the performance of a controlled sensing strategy is measured in terms of the error probability when there is no adversary, denoted PE,0 , and the error probability when an adversary is present, denoted PE,1 . Our main result is Stein-lemma like characterization of the optimal achievable error exponent of PE,0 subject to a constraint on PE,1 . We also illustrate the result with numerical examples.
more »
« less
- PAR ID:
- 10413178
- Date Published:
- Journal Name:
- Proc. of 58th Annual Allerton Conference on Communication, Control, and Computing
- Page Range / eLocation ID:
- 1 to 7
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Guruswami, Venkatesan (Ed.)Traditional fraud detection is often based on finding statistical anomalies in data sets and transaction histories. A sophisticated fraudster, aware of the exact kinds of tests being deployed, might be difficult or impossible to catch. We are interested in paradigms for fraud detection that are provably robust against any adversary, no matter how sophisticated. In other words, the detection strategy should rely on signals in the data that are inherent in the goals the adversary is trying to achieve. Specifically, we consider a fraud detection game centered on a random walk on a graph. We assume this random walk is implemented by having a player at each vertex, who can be honest or not. In particular, when the random walk reaches a vertex owned by an honest player, it proceeds to a uniformly random neighbor at the next timestep. However, when the random walk reaches a dishonest player, it instead proceeds to an arbitrary neighbor chosen by an omniscient Adversary. The game is played between the Adversary and a Referee who sees the trajectory of the random walk. At any point during the random walk, if the Referee determines that a {specific} vertex is controlled by a dishonest player, the Referee accuses that player, and therefore wins the game. The Referee is allowed to make the occasional incorrect accusation, but must follow a policy that makes such mistakes with small probability of error. The goal of the adversary is to make the cover time large, ideally infinite, i.e., the walk should never reach at least one vertex. We consider the following basic question: how much can the omniscient Adversary delay the cover time without getting caught? Our main result is a tight upper bound on this delay factor. We also discuss possible applications of our results to settings such as Rotor Walks, Leader Election, and Sybil Defense.more » « less
-
Real-time decision making in IoT applications relies upon space-efficient evaluation of queries over streaming data. To model the uncertainty in the classification of data being processed, we consider the model of probabilistic strings --- sequences of discrete probability distributions over a finite set of events, and initiate the study of space complexity of streaming computation for different classes of queries over such probabilistic strings. We first consider the problem of computing the probability that a word, sampled from the distribution defined by the probabilistic string read so far, is accepted by a given deterministic finite automaton. We show that this regular pattern matching problem can be solved using space that is only poly-logarithmic in the string length (and polynomial in the size of the DFA) if we are allowed a multiplicative approximation error. Then we show how to generalize this result to quantitative queries specified by additive cost register automata --- these are automata that map strings to numerical values using finite control and registers that get updated using linear transformations. Finally, we consider the case when updates in such an automaton involve tests, and in particular, when there is a counter variable that can be either incremented or decremented but decrements only apply when the counter value is non-zero. In this case, the desired answer depends on the probability distribution over the set of possible counter values that can range from 0 to n for a string of length n. Under a mild assumption, namely probabilities of the individual events are bounded away from 0 and 1, we show that there is an algorithm that can compute all n entries of this probability distribution vector to within additive 1/poly(n) error using space that is only Õ(n). In establishing these results, we introduce several new technical ideas that may prove useful for designing space-efficient algorithms for other query models over probabilistic strings.more » « less
-
null (Ed.)The primary objective of this paper is to develop computationally efficient methods for optimal stopping of an adaptive Phase II dose-finding clinical trial, where the decision maker may terminate the trial for efficacy or abandon it as a result of futility. We develop two solution methods and compare them in terms of computational time and several performance metrics such as the probability of correct stopping decision. One proposed method is an application of the one-step look-ahead policy to this problem. The second proposal builds a diffusion approximation to the state variable in the continuous regime and approximates the trial’s stopping time by optimal stopping of a diffusion process. The secondary objective of the paper is to compare these methods on different dose-response curves, particularly when the true dose-response curve has no significant advantage over a placebo. Our results, which include a real clinical trial case study, show that look-ahead policies perform poorly in terms of the probability of correct decision in this setting, whereas our diffusion approximation method provides robust solutions.more » « less
-
A metacognitive radar switches between two modes of cognition— one mode to achieve a high-quality estimate of targets, and the other mode to hide its utility function (plan). To achieve high-quality es- timates of targets, a cognitive radar performs a constrained utility maximization to adapt its sensing mode in response to a changing target environment. If an adversary can estimate the utility function of a cognitive radar, it can determine the radar’s sensing strategy and mitigate the radar performance via electronic countermeasures (ECM). This article discusses a metacognitive radar that switches between two modes of cognition: achieving satisfactory estimates of a target while hiding its strategy from an adversary that detects cognition. The radar does so by transmitting purposefully designed suboptimal responses to spoof the adversary’s Neyman–Pearson de- tector. We provide theoretical guarantees by ensuring that the Type-I error probability of the adversary’s detector exceeds a predefined level for a specified tolerance on the radar’s performance loss. We illustrate our cognition-masking scheme via numerical examples in- volving waveform adaptation and beam allocation. We show that small purposeful deviations from the optimal emission confuse the adversary by significant amounts, thereby masking the radar’s cognition. Our approach uses ideas from revealed preference in microeconomics and adversarial inverse reinforcement learning. Our proposed algorithms provide a principled approach for system-level electronic counter- countermeasures to hide the radar’s strategy from an adversary. We also provide performance bounds for our cognition-masking scheme when the adversary has misspecified measurements of the radar’s response.more » « less