Title: Enforcing Optimal Moving Target Defense Policies
This paper introduces a control-theoretic approach to model, analyze, and select optimal security policies for Moving Target Defense (MTD) deployment strategies. A Markov Decision Process (MDP) scheme is presented to model the states of the system from the attacker's point of view. A value iteration method based on the Bellman optimality equation is used to select the optimal policy for each state defined in the system. The model is then used to analyze the impact of various costs on the optimal policy, and it is applied to two case studies to evaluate its performance.
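The value-iteration step described above can be made concrete with a minimal sketch. It assumes a generic finite MDP given as per-action transition matrices and per-action reward vectors (e.g., negated deployment and attack costs); the state space, cost values, and discount factor are illustrative placeholders, not the paper's model.

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-6):
        """Bellman-optimality value iteration for a finite MDP.
        P[a][s, s'] : transition probabilities for action a,
        R[a][s]     : immediate reward (e.g., negative MTD deployment/attack cost),
        gamma       : discount factor."""
        n_actions, n_states = len(P), P[0].shape[0]
        V = np.zeros(n_states)
        while True:
            # Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * V(s')
            Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
            V_new = Q.max(axis=0)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        policy = Q.argmax(axis=0)   # best action for each modeled system state
        return V, policy

Running this on a small hand-built (P, R) pair returns, for every modeled system state, the action that maximizes the discounted Bellman objective, which is the per-state optimal-policy selection the abstract refers to.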
Award ID(s):
1564293 1516636
PAR ID:
10186608
Author(s) / Creator(s):
Date Published:
Journal Name:
2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC)
Page Range / eLocation ID:
753 to 759
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This work considers the sample and computational complexity of obtaining an ε-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. In this model, the learner accesses the underlying transition model via a sampling oracle that provides a sample of the next state, when given any state-action pair as input. We are interested in a basic and unresolved question in model based planning: is this naïve “plug-in” approach — where we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP — non-asymptotically, minimax optimal? Our main result answers this question positively. With regards to computation, our result provides a simpler approach towards minimax optimal planning: in comparison to prior model-free results, we show that using any high accuracy, black-box planning oracle in the empirical model suffices to obtain the minimax error rate. The key proof technique uses a leave-one-out analysis, in a novel “absorbing MDP” construction, to decouple the statistical dependency issues that arise in the analysis of model-based planning; this construction may be helpful more generally.
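A minimal sketch of the "plug-in" estimator described in item 1: draw N next-state samples from the generative model for every state-action pair, form the maximum-likelihood (empirical) transition matrix, and hand the empirical MDP to any black-box planner. The sampling-oracle name sample_next_state and the per-pair sample count N are placeholders introduced here for illustration.

    import numpy as np

    def plug_in_model(sample_next_state, n_states, n_actions, N):
        """Maximum-likelihood transition estimate from a generative model.
        sample_next_state(s, a) -> s' is the sampling oracle (placeholder name)."""
        P_hat = np.zeros((n_actions, n_states, n_states))
        for s in range(n_states):
            for a in range(n_actions):
                for _ in range(N):                       # N samples per (s, a) pair
                    P_hat[a, s, sample_next_state(s, a)] += 1.0
        return P_hat / N                                 # empirical MLE of P(s' | s, a)

    # Any high-accuracy black-box planner applied to (P_hat, R), for instance the
    # value-iteration sketch earlier on this page, then yields a policy for the
    # empirical MDP; the result quoted above concerns exactly this pipeline.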
  2. In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. This serves as an extreme test for an agent's ability to effectively use historical data, which is known to be critical for efficient RL. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP using the offline dataset; (b) learning a near-optimal policy in this pessimistic MDP. The design of the pessimistic MDP is such that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the pessimistic MDP. This enables the pessimistic MDP to serve as a good surrogate for purposes of policy evaluation and learning. Theoretically, we show that MOReL is minimax optimal (up to log factors) for offline RL. Empirically, MOReL matches or exceeds state-of-the-art results on widely used offline RL benchmarks. Overall, the modular design of MOReL enables translating advances in its components (e.g., in model learning, planning, etc.) to improvements in offline RL.
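One way to realize step (a) of item 2 is sketched below under explicit assumptions: a dynamics and reward model (P_hat, R_hat) has already been learned from the offline data, a per-pair uncertainty score (e.g., ensemble disagreement) is available, and state-action pairs whose uncertainty exceeds a threshold are redirected to a low-reward absorbing HALT state. The threshold and penalty values are illustrative, and this is not claimed to be MOReL's exact construction.

    import numpy as np

    def make_pessimistic_mdp(P_hat, R_hat, uncertainty, threshold, penalty=-100.0):
        """Pessimistic-MDP sketch: poorly supported (s, a) pairs lead to an
        absorbing HALT state that pays a large negative reward forever."""
        n_actions, n_states, _ = P_hat.shape
        halt = n_states                                   # index of the extra HALT state
        P = np.zeros((n_actions, n_states + 1, n_states + 1))
        R = np.full((n_actions, n_states + 1), penalty)
        P[:, :n_states, :n_states] = P_hat                # copy the learned dynamics
        R[:, :n_states] = R_hat                           # ...and the learned rewards
        unknown = np.zeros((n_actions, n_states + 1), dtype=bool)
        unknown[:, :n_states] = uncertainty > threshold   # pairs the data cannot support
        P[unknown] = 0.0
        P[unknown, halt] = 1.0                            # unknown pairs jump to HALT
        R[unknown] = penalty                              # ...and are penalized
        P[:, halt, halt] = 1.0                            # HALT is absorbing
        return P, R

Because any policy that strays into poorly supported regions is dragged into HALT, its value in this surrogate sits (approximately) below its value in the real environment, which is the lower-bound property the abstract relies on.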
  3. In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline would greatly expand where RL can be applied, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and overcome common pitfalls of model-based RL like model exploitation. Theoretically, we show that MOReL is minimax optimal (up to log factors) for offline RL. Through experiments, we show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks. Moreover, the modular design of MOReL enables future advances in its components (e.g., in model learning, planning etc.) to directly translate into improvements for offline RL. 
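Since the P-MDP in item 3 serves as a surrogate for policy evaluation, a small companion sketch is given below: exact evaluation of a fixed policy in a finite tabular model by solving the linear Bellman system. The tabular (P, R) inputs and the discount factor are assumptions made for illustration.

    import numpy as np

    def evaluate_policy(P, R, policy, gamma=0.99):
        """Exact policy evaluation in a finite MDP: solve V = R_pi + gamma * P_pi V.
        P[a, s, s'] and R[a, s] are tabular; policy[s] is the action taken in s."""
        n_states = P.shape[1]
        idx = np.arange(n_states)
        P_pi = P[policy, idx, :]      # transition matrix under the fixed policy
        R_pi = R[policy, idx]         # reward vector under the fixed policy
        return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

By the lower-bound property described in the abstract, the value computed in the pessimistic model is (approximately) a conservative estimate of the policy's performance in the real environment.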
  4. We study optimal pricing in a single-server queueing system that can be observable or unobservable, depending on how customers receive information to estimate sojourn time. Our primary objective is to determine whether the service provider is better off making the system observable or unobservable under optimal pricing. We formulate the optimal pricing problem using Markov decision process (MDP) models for both observable and unobservable systems. For unobservable systems, the problem is studied using an MDP with a fixed-point equation as equilibrium constraints. We show that the MDPs for both observable and unobservable queues are special cases of a generalized arrivals-based MDP model, in which the optimal arrival rate (rather than price) is set in each state. Then, we show that the optimal policy that solves the generalized MDP exhibits a monotone structure in that the optimal arrival rate is non-increasing in the queue length, which allows for developing efficient algorithms to determine optimal pricing policies. Next, we show that if no customers overestimate sojourn time in the observable system, it is in the interest of the service provider to make the system observable. We also show that if all customers overestimate sojourn time, the service provider is better off making the system unobservable. Lastly, we learn from numerical results that when customers are heterogeneous in estimating their sojourn time, the service provider is expected to receive a higher gain by making the system observable if on average customers do not significantly overestimate sojourn time. 
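A toy instance of the arrivals-based MDP in item 4 is sketched below: value iteration on a uniformized single-server queue in which the action is the arrival rate induced in each state. All numbers (revenue curve, service rate, holding cost, discount, buffer size) are made up for illustration and are not the paper's model; the monotone-structure result quoted above predicts that the resulting optimal rate is non-increasing in the queue length.

    import numpy as np

    rates   = np.array([0.0, 0.3, 0.6, 0.9])      # candidate arrival rates (actions)
    revenue = np.array([0.0, 0.27, 0.42, 0.45])   # revenue per unit time at each rate
    mu, h, gamma, K = 1.0, 0.2, 0.98, 20          # service rate, holding cost, discount, buffer
    Lam = rates.max() + mu                        # uniformization constant

    V = np.zeros(K + 1)                           # value per queue length n = 0..K
    for _ in range(5000):                         # value iteration on the uniformized chain
        Q = np.empty((K + 1, len(rates)))
        for n in range(K + 1):
            for a, lam in enumerate(rates):
                up   = V[min(n + 1, K)]           # arrival (blocked at the buffer limit)
                down = V[max(n - 1, 0)]           # service completion (no-op when empty)
                stay = V[n]
                nxt  = (lam * up + mu * down + (Lam - lam - mu) * stay) / Lam
                Q[n, a] = (revenue[a] - h * n) / Lam + gamma * nxt
        V = Q.max(axis=1)

    opt_rate = rates[Q.argmax(axis=1)]            # expected to be non-increasing in n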
  5. Opioid overdose rescue is very time-sensitive. Hence, drone-delivered naloxone has the potential to be a transformative innovation due to its easily deployable and flexible nature. We formulate a Markov Decision Process (MDP) model to dispatch the appropriate drone after an overdose request arrives and to relocate the drone to its next waiting location after having completed its current task. Since the underlying optimization problem is subject to the curse of dimensionality, we solve it using ad-hoc state aggregation and evaluate it through a simulation with higher granularity. Our simulation-based comparative study is based on emergency medical service data from the state of Indiana. We compare the optimal policy resulting from the scaled-down MDP model with a myopic policy as the baseline. We consider the impact of drone type and service area type on outcomes, which offers insights into the performance of the MDP suboptimal policy under various settings. 
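A loose sketch of the kind of ad-hoc state aggregation and myopic fallback mentioned in item 5; the zoning rule, state fields, and fallback action below are hypothetical and are not the paper's formulation.

    from collections import defaultdict

    def zone_of(x, y):
        """Hypothetical 2x2 grid of zones over a unit-square service area."""
        return int(x > 0.5) + 2 * int(y > 0.5)

    def aggregate(drones, request_xy):
        """Map the fine-grained state (per-drone position and busy flag, plus the
        request location) to a coarse key: idle-drone counts per zone and the
        zone of the pending request."""
        idle_per_zone = [0, 0, 0, 0]
        for d in drones:
            if not d["busy"]:
                idle_per_zone[zone_of(d["x"], d["y"])] += 1
        return tuple(idle_per_zone), zone_of(*request_xy)

    # Dispatch/relocation decisions are stored per aggregate state; aggregate states
    # never visited while solving the scaled-down MDP fall back to the myopic baseline.
    policy = defaultdict(lambda: "dispatch_nearest_idle")

    def act(drones, request_xy):
        return policy[aggregate(drones, request_xy)]

Aggregation of this kind trades optimality for tractability, which is why the resulting policy is suboptimal and is compared against the myopic baseline in a higher-granularity simulation.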