This paper studies the synthesis of controllers for cyber-physical systems (CPSs) that are required to carry out complex time-sensitive tasks in the presence of an adversary. The time-sensitive task is specified as a formula in the metric interval temporal logic (MITL). CPSs that operate in adversarial environments have typically been abstracted as stochastic games (SGs); however, because traditional SG models do not incorporate a notion of time, they cannot be used in a setting where the objective is time-sensitive. To address this, we introduce durational stochastic games (DSGs). DSGs generalize SGs to incorporate a notion of time and model the adversary's abilities to tamper with the control input (actuator attack) and manipulate the timing information that is perceived by the CPS (timing attack). We define notions of spatial, temporal, and spatio-temporal robustness to quantify the amounts by which system trajectories under the synthesized policy can be perturbed in space and time without affecting satisfaction of the MITL objective. In the case of an actuator attack, we design computational procedures to synthesize controllers that will satisfy the MITL task along with a guarantee of its robustness. In the presence of a timing attack, we relax the robustness constraint and develop a value-iteration-based procedure to compute the CPS policy as a finite-state controller that maximizes the probability of satisfying the MITL task. A numerical evaluation of our approach is presented on a signalized traffic network to illustrate our results.
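As a deliberately simplified illustration (my own toy formulation, not the paper's formal definitions), the temporal robustness of a single "eventually within [a, b]" deadline that is met at time t can be taken as the largest timing perturbation that preserves satisfaction:

```python
def temporal_robustness(t, a, b):
    """Toy temporal-robustness margin: the largest shift d such that the
    event time t can move by up to d in either direction and still land
    inside the deadline interval [a, b]. A negative value means the
    deadline is already violated."""
    return min(t - a, b - t)
```

For example, an event at t = 5 against the interval [2, 10] has margin 3: a timing attack that skews the perceived event time by less than 3 units cannot cause a violation.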
Optimal Secure Control with Linear Temporal Logic Constraints
Prior work on automatic control synthesis for cyber-physical systems under logical constraints has primarily focused on environmental disturbances or modeling uncertainties; however, the impact of deliberate and malicious attacks has been less studied. In this paper, we consider a discrete-time dynamical system with a linear temporal logic (LTL) constraint in the presence of an adversary, which is modeled as a stochastic game. We assume that the adversary observes the control policy before choosing an attack strategy. We investigate two problems. In the first problem, we synthesize a robust control policy for the stochastic game that maximizes the probability of satisfying the LTL constraint. A value-iteration-based algorithm is proposed to compute the optimal control policy. In the second problem, we focus on a subclass of LTL constraints that consist of an arbitrary LTL formula and an invariant constraint. We then investigate the problem of computing a control policy that minimizes the expected number of invariant-constraint violations while maximizing the probability of satisfying the arbitrary LTL formula. We characterize the optimality condition for the desired control policy, and a policy-iteration-based algorithm is proposed to compute it. We illustrate the proposed approaches using two numerical case studies.
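The value-iteration step can be sketched as a max-min Bellman update on a (hypothetical) product game whose accepting states encode satisfaction of the LTL constraint: the controller maximizes, and the adversary, who reacts to the controller's choice, minimizes the probability of reaching the accepting set. This is a generic sketch, not the paper's exact algorithm; the state space, actions, and transition structure are invented for illustration:

```python
# Toy max-min value iteration for a stochastic game: the controller picks
# action a to maximize, and the adversary picks attack b to minimize, the
# probability of reaching the accepting set (e.g., accepting states of the
# product of the system with an automaton for the LTL constraint).

def value_iteration(P, accepting, n_states, actions, attacks, tol=1e-8):
    """P[(s, a, b)] is a dict mapping successor states to probabilities."""
    V = [1.0 if s in accepting else 0.0 for s in range(n_states)]
    while True:
        V_new = list(V)
        for s in range(n_states):
            if s in accepting:
                continue  # accepting states are treated as absorbing
            V_new[s] = max(
                min(sum(p * V[t] for t, p in P[(s, a, b)].items())
                    for b in attacks)
                for a in actions
            )
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
```

The fixed point gives, at each state, the best satisfaction probability the controller can guarantee against any attack strategy.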
Award ID(s): 1656981
NSF-PAR ID: 10131731
Journal Name: IEEE Transactions on Automatic Control
ISSN: 0018-9286
Sponsoring Org: National Science Foundation
More Like this


Detection of malicious behavior is a fundamental problem in security. One of the major challenges in using detection systems in practice is in dealing with an overwhelming number of alerts that are triggered by normal behavior (the so-called false positives), obscuring alerts resulting from actual malicious activity. While numerous methods for reducing the scope of this issue have been proposed, ultimately one must still decide how to prioritize which alerts to investigate, and most existing prioritization methods are heuristic, for example, based on suspiciousness or priority scores. We introduce a novel approach for computing a policy for prioritizing alerts using adversarial reinforcement learning. Our approach assumes that the attackers know the full state of the detection system and dynamically choose an optimal attack as a function of this state, as well as of the alert prioritization policy. The first step of our approach is to capture the interaction between the defender and attacker in a game-theoretic model. To tackle the computational complexity of solving this game to obtain a dynamic stochastic alert prioritization policy, we propose an adversarial reinforcement learning framework. In this framework, we use neural reinforcement learning to compute best-response policies for both the defender and the adversary to an arbitrary stochastic policy of the other. We then use these in a double-oracle framework to obtain an approximate equilibrium of the game, which in turn yields a robust stochastic policy for the defender. Extensive experiments using case studies in fraud and intrusion detection demonstrate that our approach is effective in creating robust alert prioritization policies.
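The double-oracle loop can be sketched in miniature. In the paper the best-response oracles are neural reinforcement learners over alert-prioritization policies; the toy version below replaces them with exhaustive search over small finite strategy sets and solves each restricted zero-sum game approximately by fictitious play (the solver choice and the example game are illustrative, not from the paper):

```python
def solve_zero_sum(M, iters=20000):
    """Approximate equilibrium of a zero-sum matrix game (row maximizes)
    via fictitious play; returns empirical mixed strategies (row, col)."""
    n, m = len(M), len(M[0])
    row_counts, col_counts = [0] * n, [0] * m
    r, c = 0, 0
    for _ in range(iters):
        row_counts[r] += 1
        col_counts[c] += 1
        # each player best-responds to the opponent's empirical mixture
        r = max(range(n), key=lambda i: sum(col_counts[j] * M[i][j] for j in range(m)))
        c = min(range(m), key=lambda j: sum(row_counts[i] * M[i][j] for i in range(n)))
    return [x / iters for x in row_counts], [x / iters for x in col_counts]

def double_oracle(payoff, defender_all, attacker_all, tol=1e-3):
    """Grow restricted strategy sets D, A until neither side's best
    response improves on the restricted game's equilibrium value."""
    D, A = [defender_all[0]], [attacker_all[0]]
    while True:
        M = [[payoff(d, a) for a in A] for d in D]
        p, q = solve_zero_sum(M)
        value = sum(p[i] * q[j] * M[i][j]
                    for i in range(len(D)) for j in range(len(A)))
        # best-response oracles (neural RL in the paper, brute force here)
        d_br = max(defender_all,
                   key=lambda d: sum(q[j] * payoff(d, A[j]) for j in range(len(A))))
        a_br = min(attacker_all,
                   key=lambda a: sum(p[i] * payoff(D[i], a) for i in range(len(D))))
        d_gain = sum(q[j] * payoff(d_br, A[j]) for j in range(len(A))) - value
        a_gain = value - sum(p[i] * payoff(D[i], a_br) for i in range(len(D)))
        added = False
        if d_gain > tol and d_br not in D:
            D.append(d_br); added = True
        if a_gain > tol and a_br not in A:
            A.append(a_br); added = True
        if not added:
            return p, q, D, A, value
```

On matching pennies the loop discovers both pure strategies for each side and returns an approximately uniform mixture with value near zero.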

Cyber-physical systems are performing increasingly complex tasks, which are often modeled using formal languages such as temporal logic. The system's ability to perform the required tasks can be curtailed by malicious adversaries that mount intelligent attacks. At present, however, synthesis in the presence of such attacks has received limited research attention. In particular, the problem of synthesizing a controller when the required specifications cannot be satisfied completely due to adversarial attacks has not been studied. In this paper, we focus on the minimum violation control synthesis problem for a stochastic finite-state discrete-time system under linear temporal logic constraints in the presence of an adversary. A minimum violation control strategy is one that satisfies the most important tasks defined by the user while violating the less important ones. We model the interaction between the controller and adversary using a concurrent Stackelberg game and present a nonlinear programming problem to formulate and solve for the optimal control policy. To reduce the computational effort, we develop a heuristic algorithm that solves the problem efficiently and demonstrate our proposed approach using a numerical case study.
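The Stackelberg structure, in which the adversary observes the controller's (mixed) strategy before responding, can be illustrated on a plain matrix game. The sketch below is a toy stand-in for the paper's nonlinear program: it grid-searches the leader's mixing probability over two actions and assumes ties break in the leader's favor (the strong Stackelberg convention); the payoff matrices in the example are invented:

```python
def stackelberg_commitment(L, F, grid=1000):
    """Leader commits to mixing probability p over two actions; the
    follower observes p and best-responds under its own payoffs F;
    the leader's payoff under L is then evaluated. Brute-force grid
    search over p (a toy stand-in for an optimization formulation)."""
    m = len(F[0])
    best_p, best_val, best_j = None, float("-inf"), None
    for k in range(grid + 1):
        p = k / grid
        fu = [p * F[0][j] + (1 - p) * F[1][j] for j in range(m)]
        top = max(fu)
        # among the follower's best responses, ties break in the
        # leader's favor (strong Stackelberg convention)
        br = [jj for jj in range(m) if abs(fu[jj] - top) < 1e-12]
        j = max(br, key=lambda jj: p * L[0][jj] + (1 - p) * L[1][jj])
        val = p * L[0][j] + (1 - p) * L[1][j]
        if val > best_val:
            best_p, best_val, best_j = p, val, j
    return best_p, best_val, best_j
```

Note that the leader's optimal commitment can be strictly mixed even when the game has a pure Nash equilibrium, which is what makes the commitment model, rather than simultaneous play, the right fit when the adversary observes the policy first.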

We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problem are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an [Formula: see text] upper bound on the duality gap and regret, where d is the linear dimension, H the horizon, and T the total number of time steps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct both upper and lower confidence bounds on the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash equilibrium of a general-sum game is computationally hard, our algorithm instead solves for a coarse correlated equilibrium (CCE), which can be obtained efficiently. To the best of our knowledge, such a CCE-based scheme for optimism has not appeared in the literature and might be of interest in its own right.
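The CCE step can be illustrated independently of the value-iteration machinery. One standard way to obtain an approximate CCE of a two-player general-sum matrix game, sketched below with invented payoff matrices (the paper instead uses its upper and lower confidence bounds as the payoffs), is to run no-regret dynamics for both players and take the time-averaged joint play:

```python
import math

def hedge_cce(A, B, T=2000, eta=0.05):
    """Both players run Hedge (multiplicative weights) with full-information
    feedback; the time-averaged joint distribution of play is an approximate
    coarse correlated equilibrium. A and B are the two payoff matrices."""
    n, m = len(A), len(A[0])
    w1, w2 = [1.0] * n, [1.0] * m
    joint = [[0.0] * m for _ in range(n)]
    for _ in range(T):
        s1, s2 = sum(w1), sum(w2)
        p = [w / s1 for w in w1]
        q = [w / s2 for w in w2]
        for i in range(n):
            for j in range(m):
                joint[i][j] += p[i] * q[j] / T
        # expected payoff of each own action vs. the opponent's mixture
        u1 = [sum(q[j] * A[i][j] for j in range(m)) for i in range(n)]
        u2 = [sum(p[i] * B[i][j] for i in range(n)) for j in range(m)]
        w1 = [w * math.exp(eta * u) for w, u in zip(w1, u1)]
        w2 = [w * math.exp(eta * u) for w, u in zip(w2, u2)]
    return joint

def external_regret(joint, A, B):
    """Max gain from any fixed unilateral deviation; a value near zero
    certifies that `joint` is an approximate CCE."""
    n, m = len(A), len(A[0])
    v1 = sum(joint[i][j] * A[i][j] for i in range(n) for j in range(m))
    v2 = sum(joint[i][j] * B[i][j] for i in range(n) for j in range(m))
    marg1 = [sum(joint[i][j] for j in range(m)) for i in range(n)]
    marg2 = [sum(joint[i][j] for i in range(n)) for j in range(m)]
    dev1 = max(sum(marg2[j] * A[i][j] for j in range(m)) for i in range(n)) - v1
    dev2 = max(sum(marg1[i] * B[i][j] for i in range(n)) for j in range(m)) - v2
    return max(dev1, dev2)
```

This avoids the hardness of exact Nash computation in general-sum games: no-regret averaging (or a linear program over joint distributions) yields a CCE in polynomial time.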

In hierarchical planning for Markov decision processes (MDPs), temporal abstraction allows planning with macro-actions that take place at different time scales in the form of sequential composition. In this paper, we propose a novel approach to compositional reasoning and hierarchical planning for MDPs under co-safe temporal logic constraints. In addition to sequential composition, we introduce a composition of policies based on generalized logic composition: given sub-policies for subtasks and a new task expressed as a logic composition of subtasks, a semi-optimal policy, which is optimal among plans that use only the sub-policies, can be obtained by simply composing the sub-policies. A synthesis algorithm is thus developed to compute optimal policies efficiently by planning with primitive actions, policies for subtasks, and compositions of sub-policies, maximizing the probability of satisfying constraints specified in a fragment of co-safe temporal logic. We demonstrate the correctness and efficiency of the proposed method on stochastic planning examples with a single agent and multiple task specifications.
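The composition idea can be sketched for a disjunction of subtasks: at each state, follow the sub-policy whose value (its probability of satisfying its own subtask) is largest. This is a simplified rendering of generalized logic composition, with all names hypothetical; the resulting policy is semi-optimal in the sense that it is the best achievable when committing to one sub-policy per state:

```python
def compose_or(policies, values):
    """Compose sub-policies for a disjunctive task: `policies[k]` maps a
    state to an action for subtask k, and `values[k]` maps a state to the
    probability that sub-policy k satisfies subtask k from that state.
    The composed policy follows the highest-value sub-policy per state."""
    def policy(state):
        best = max(range(len(values)),
                   key=lambda k: values[k].get(state, 0.0))
        return policies[best](state)
    return policy
```

Because the composed policy only ever delegates to existing sub-policies, it can be built without replanning from primitive actions, which is the source of the efficiency gain the abstract describes.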