Optimal exploration of engineering systems can be guided by the principle of Value of Information (VoI), which accounts for the topological importance of components, their reliability, and the management costs. For series systems, in most cases higher inspection priority should be given to unreliable components. For redundant systems such as parallel systems, analysis of one-shot decision problems shows that higher inspection priority should be given to more reliable components. This paper investigates the optimal exploration of redundant systems in long-term decision making with sequential inspection and repair. When the expected cumulative discounted cost is considered, it may become more efficient to give higher inspection priority to less reliable components, in order to preserve system redundancy. To investigate this problem, we develop a Partially Observable Markov Decision Process (POMDP) framework for sequential inspection and maintenance of redundant systems, in which the VoI analysis is embedded in the optimal selection of exploratory actions. We investigate the use of alternative approximate POMDP solvers for parallel and more general systems, compare their computational complexity and performance, and show how the inspection priorities depend on the economic discount factor, the degradation rate, the inspection precision, and the repair cost.
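A minimal one-shot sketch of the VoI comparison described above, for a hypothetical two-component parallel system (all priors, costs, and the inspection error rate are illustrative placeholders, not values from the paper): it computes the expected cost reduction from inspecting each component before acting.

```python
import numpy as np

# Hypothetical two-component parallel system (illustrative numbers, not taken from the paper).
p = np.array([0.95, 0.70])   # prior P(component works): component 0 reliable, component 1 unreliable
C_FAIL = 100.0               # cost if the system fails (both components failed)
C_REPAIR = 8.0               # cost of repairing one component, which then works for sure
EPS = 0.10                   # inspection error: P(observe "failed" | works) = P(observe "works" | failed)

def best_expected_cost(belief):
    """One-shot decision: repair one component or do nothing, given beliefs P(component works)."""
    do_nothing = C_FAIL * (1.0 - belief[0]) * (1.0 - belief[1])
    repair = C_REPAIR        # a repaired component works, so the parallel system cannot fail
    return min(do_nothing, repair)

def value_of_information(i):
    """Preposterior analysis: expected cost reduction from inspecting component i before acting."""
    prior_cost = best_expected_cost(p)
    preposterior_cost = 0.0
    for p_obs_given_works in (1.0 - EPS, EPS):           # observation "works", then "failed"
        p_obs = p_obs_given_works * p[i] + (1.0 - p_obs_given_works) * (1.0 - p[i])
        posterior = p.copy()
        posterior[i] = p_obs_given_works * p[i] / p_obs  # Bayesian update of component i only
        preposterior_cost += p_obs * best_expected_cost(posterior)
    return prior_cost - preposterior_cost

for i in range(2):
    print(f"VoI of inspecting component {i}: {value_of_information(i):.3f}")
```

With these illustrative numbers, inspecting the unreliable component never changes the best action, so its VoI is zero, while an observed failure of the reliable component does trigger a repair and thus carries positive VoI, consistent with the one-shot behavior quoted in the abstract; the paper's long-term POMDP analysis is what can reverse this priority.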
Cyber vulnerability maintenance policies that address the incomplete nature of inspection
Abstract: In cybersecurity, incomplete inspection, resulting mainly from computers being turned off during the scan, poses a challenge for scheduling maintenance actions. This article proposes the application of partially observable Markov decision processes to derive cost‐effective cyber maintenance actions that minimize total costs. We consider several types of hosts having vulnerabilities at various levels of severity. The maintenance cost structure in our proposed model consists of the direct costs of maintenance actions in addition to potential incident costs associated with different security states. To assess the benefits of optimal policies obtained from partially observable Markov decision processes, we use real‐world data from a major university. Compared with alternative policies using simulations, the optimal control policies can significantly reduce expected maintenance expenditures per host and relatively quickly mitigate the most important vulnerabilities.
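A minimal sketch of the belief bookkeeping such a model requires when a scan may simply not happen (the three-state host model and transition matrix below are hypothetical placeholders, not the article's fitted parameters): a missed scan is uninformative, but the belief still drifts toward worse security states.

```python
import numpy as np

# Hypothetical three-state host model (placeholder numbers, not the article's fitted parameters):
# state 0 = no known vulnerabilities, 1 = medium-severity, 2 = high-severity.
T = np.array([[0.90, 0.08, 0.02],    # per-period state transitions (vulnerability accumulation)
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])

def belief_step(belief, scan_result):
    """Propagate the belief one period, then condition on a scan that may not have happened."""
    predicted = belief @ T
    if scan_result is None:                  # host was off during the scan: no new information
        return predicted
    posterior = np.zeros(3)
    posterior[scan_result] = predicted[scan_result]   # the scan, when it happens, is assumed exact here
    return posterior / posterior.sum()

belief = np.array([1.0, 0.0, 0.0])           # the host starts with no known vulnerabilities
for result in [None, None, 1, None]:         # two missed scans, one "medium severity" reading, one more miss
    belief = belief_step(belief, result)
    print(np.round(belief, 3))
```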
- Award ID(s): 1912166
- PAR ID: 10453980
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Journal Name: Applied Stochastic Models in Business and Industry
- Volume: 35
- Issue: 6
- ISSN: 1524-1904
- Page Range / eLocation ID: p. 1390-1410
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Optimal Joint Defense and Monitoring for Networks Security under Uncertainty: A POMDP‐Based Approach
The increasing interconnectivity in our infrastructure poses a significant security challenge, with external threats having the potential to penetrate and propagate throughout the network. Bayesian attack graphs have proven to be effective in capturing the propagation of attacks in complex interconnected networks. However, most existing security approaches fail to systematically account for the limitation of resources and uncertainty arising from the complexity of attacks and possible undetected compromises. To address these challenges, this paper proposes a partially observable Markov decision process (POMDP) model for network security under uncertainty. The POMDP model accounts for uncertainty in monitoring and defense processes, as well as the probabilistic attack propagation. This paper develops two security policies based on the optimal stationary defense policy for the underlying POMDP state process (i.e., a network with known compromises): the estimation‐based policy that performs the defense actions corresponding to the optimal minimum mean square error state estimation and the distribution‐based policy that utilizes the posterior distribution of network compromises to make defense decisions. Optimal monitoring policies are designed to specifically support each of the defense policies, allowing dynamic allocation of monitoring resources to capture network vulnerabilities/compromises. The performance of the proposed policies is examined in terms of robustness, accuracy, and uncertainty using various numerical experiments.
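A minimal sketch contrasting the two policy constructions described above, under stated assumptions: the underlying MDP's action costs are a random placeholder table, and the MMSE (posterior-mean) state estimate is rounded to the nearest discrete state before the policy lookup, which may differ from the paper's exact construction.

```python
import numpy as np

# Hypothetical underlying MDP: 4 compromise states, 3 defense actions.  Q[s, a] is a random
# placeholder for the optimal cost-to-go of action a when the network state s is known.
rng = np.random.default_rng(0)
Q = rng.uniform(0.0, 10.0, size=(4, 3))
mdp_policy = Q.argmin(axis=1)                 # optimal defense action for each *known* state
states = np.arange(4)

def estimation_based_action(belief):
    """Point-estimate the state first (posterior mean, rounded to a discrete state), then act on it."""
    s_hat = int(round(float(belief @ states)))
    return int(mdp_policy[s_hat])

def distribution_based_action(belief):
    """Average the MDP action costs over the whole posterior and pick the minimizer."""
    return int((belief @ Q).argmin())

belief = np.array([0.1, 0.2, 0.3, 0.4])       # posterior over compromise states from monitoring
print(estimation_based_action(belief), distribution_based_action(belief))
```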
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Planning in real-world settings often entails addressing partial observability while aligning with users’ requirements. We present a novel framework for expressing users’ constraints and preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) policies in the setting of goal-oriented partially observable Markov decision processes (gPOMDPs). We present the first formal analysis of such constraints and prove that while the expected cost function of a parameterized BSQ policy w.r.t. its parameters is not convex, it is piecewise constant and yields an implicit discrete parameter search space that is finite for finite horizons. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior with guaranteed user alignment. Analysis proves that our algorithms converge to the optimal user-aligned behavior in the limit. Empirical results show that parameterized BSQ policies provide a computationally feasible approach for user-aligned planning in partially observable settings.
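A toy illustration of why the piecewise-constant result makes the parameter search discrete, using a hypothetical one-parameter threshold policy over a two-state belief (the listed reachable beliefs are made up): only the threshold's position relative to the finitely many beliefs reachable within the horizon affects behavior, so finitely many candidate values suffice.

```python
# Toy illustration (two-state belief, hypothetical numbers): with a finite horizon only finitely
# many beliefs are reachable, so a threshold parameter only matters through its position
# relative to those beliefs, making the expected cost piecewise constant in the parameter.
reachable_beliefs = [0.12, 0.35, 0.50, 0.65, 0.88]   # reachable values of P(state = "unsafe")

def policy_behavior(theta):
    """Behavior of the BSQ-style policy 'act cautiously if P(unsafe) >= theta' on the reachable beliefs."""
    return tuple(b >= theta for b in reachable_beliefs)

# Midpoints between consecutive reachable beliefs (plus the endpoints) hit every distinct behavior,
# so the continuous parameter search collapses to a finite one.
candidates = [0.0] + [(a + b) / 2 for a, b in zip(reachable_beliefs, reachable_beliefs[1:])] + [1.0]
distinct_behaviors = {policy_behavior(t) for t in candidates}
print(f"{len(distinct_behaviors)} distinct behaviors from {len(candidates)} candidate thresholds")
```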
Abstract: Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the unavailability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), typically benefit from the knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. As a further contribution, we compare the use of Transformers and long short-term memory networks, which constitute model-free RL solutions and work directly on the observation space, with an approach termed the belief-input method, which works on the belief space by exploiting the learned POMDP model for belief inference. We apply these methods to the real-world problem of optimal maintenance planning for railway assets and compare the results with the current real-life policy. We show that the RL policy learned by the belief-input method is able to outperform the real-life policy by yielding significantly reduced life-cycle costs.
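A minimal sketch of the domain-randomization step, with hypothetical posterior draws standing in for the MCMC output and a bare simulation loop in place of the deep RL update: each training episode uses a different sampled model, so the trained policy sees the full model uncertainty rather than a single point estimate.

```python
import numpy as np

# Hypothetical posterior samples of a 2-state degradation matrix (state 1 = "failed", absorbing),
# standing in for the MCMC output; the deep RL update itself is omitted for brevity.
rng = np.random.default_rng(1)
posterior_samples = [np.array([[0.95, 0.05], [0.00, 1.00]]),
                     np.array([[0.90, 0.10], [0.00, 1.00]]),
                     np.array([[0.85, 0.15], [0.00, 1.00]])]

def run_episode(T, horizon=10):
    """Simulate the asset degrading under transition matrix T (actions omitted for brevity)."""
    state, cost = 0, 0.0
    for _ in range(horizon):
        state = rng.choice(2, p=T[state])
        cost += 100.0 if state == 1 else 1.0   # hypothetical failure cost vs. in-service cost
    return cost

for episode in range(5):
    T = posterior_samples[rng.integers(len(posterior_samples))]  # the domain-randomization step
    episode_cost = run_episode(T)
    # ...a deep RL agent (recurrent, Transformer, or belief-input) would be updated here...
    print(f"episode {episode}: cost {episode_cost:.1f}")
```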
Moving target defense (MTD) is a proactive defense approach that aims to thwart attacks by continuously changing the attack surface of a system (e.g., changing host or network configurations), thereby increasing the adversary’s uncertainty and attack cost. To maximize the impact of MTD, a defender must strategically choose when and what changes to make, taking into account both the characteristics of its system as well as the adversary’s observed activities. Finding an optimal strategy for MTD presents a significant challenge, especially when facing a resourceful and determined adversary who may respond to the defender’s actions. In this paper, we propose a multi-agent partially-observable Markov Decision Process model of MTD and formulate a two-player general-sum game between the adversary and the defender. To solve this game, we propose a multi-agent reinforcement learning framework based on the double oracle algorithm. Finally, we provide experimental results to demonstrate the effectiveness of our framework in finding optimal policies.
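For orientation, a standard double-oracle loop on a small zero-sum matrix game, with linear programs solving each restricted game and exact argmax/argmin oracles supplying best responses; the paper's MTD game is general-sum and its best responses come from multi-agent reinforcement learning, and the payoff matrix below is random placeholder data.

```python
import numpy as np
from scipy.optimize import linprog

# Double-oracle sketch on a random zero-sum matrix game (placeholder payoffs, not MTD data).
rng = np.random.default_rng(2)
A = rng.uniform(-1.0, 1.0, size=(6, 6))      # A[i, j]: payoff to the row player (defender, maximizing)

def solve_restricted(A_sub):
    """Maximin mixed strategies of the restricted zero-sum game, one LP per player."""
    m, n = A_sub.shape
    # Row player: maximize v subject to A_sub^T x >= v, sum(x) = 1, x >= 0.
    row = linprog(c=np.r_[np.zeros(m), -1.0],
                  A_ub=np.c_[-A_sub.T, np.ones(n)], b_ub=np.zeros(n),
                  A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    # Column player: minimize w subject to A_sub y <= w, sum(y) = 1, y >= 0.
    col = linprog(c=np.r_[np.zeros(n), 1.0],
                  A_ub=np.c_[A_sub, -np.ones(m)], b_ub=np.zeros(m),
                  A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return row.x[:m], col.x[:n]

rows, cols = [0], [0]                         # start with one strategy per player
while True:
    x, y = solve_restricted(A[np.ix_(rows, cols)])
    br_row = int(np.argmax(A[:, cols] @ y))   # defender's best response to the attacker mixture
    br_col = int(np.argmin(x @ A[rows, :]))   # attacker's best response to the defender mixture
    if br_row in rows and br_col in cols:
        break                                 # neither player gains from adding a new strategy
    rows, cols = sorted(set(rows + [br_row])), sorted(set(cols + [br_col]))
print("equilibrium support:", rows, cols)
```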