Often in manufacturing systems, scenarios arise where the demand for maintenance exceeds the capacity of maintenance resources. This results in the problem of allocating the limited resources among machines competing for them. This maintenance scheduling problem can be formulated as a Markov decision process (MDP) with the goal of finding the optimal dynamic maintenance action given the current system state. However, as the system becomes more complex, solving an MDP suffers from the curse of dimensionality. To overcome this issue, we propose a two-stage approach that first optimizes a static condition-based maintenance (CBM) policy using a genetic algorithm (GA) and then improves the policy online via Monte Carlo tree search (MCTS). The static policy significantly reduces the state space of the online problem by allowing us to ignore machines that are not sufficiently degraded. Furthermore, we formulate MCTS to seek a maintenance schedule that maximizes the long-term production volume of the system to reconcile the conflict between maintenance and production objectives. We demonstrate that the resulting online policy is an improvement over the static CBM policy found by GA.
more »
« less
Online Maintenance Prioritization Via Monte Carlo Tree Search and Case-Based Reasoning
Abstract When maintenance resources in a manufacturing system are limited, a challenge arises in determining how to allocate these resources among multiple competing maintenance jobs. This work formulates an online prioritization problem to tackle this challenge using a Markov decision process (MDP) to model the system behavior and Monte Carlo tree search (MCTS) to seek optimal maintenance actions in various states of the system. Further, case-based reasoning (CBR) is adopted to retain and reuse search experience gathered from MCTS to reduce the computational effort needed over time and to improve decision-making efficiency. The proposed method results in increased system throughput when compared to existing methods of maintenance prioritization while also reducing the computation time needed to identify optimal maintenance actions as more information is gathered. This is especially beneficial in manufacturing settings where maintenance decisions must be made quickly to minimize the negative performance impact of machine downtime.
more »
« less
- Award ID(s):
- 1854659
- PAR ID:
- 10337308
- Date Published:
- Journal Name:
- Journal of Computing and Information Science in Engineering
- Volume:
- 22
- Issue:
- 4
- ISSN:
- 1530-9827
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Decision-making under uncertainty (DMU) is present in many important problems. An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time. Reinforcement Learning (RL), a popular approach for DMU problems, learns a policy by interacting with a model of the environment offline. Unfortunately, if the environment changes the policy can become stale and take sub-optimal actions, and relearning the policy for the updated environment takes time and computational effort. An alternative is online planning approaches such as Monte Carlo Tree Search (MCTS), which perform their computation at decision time. Given the current environment, MCTS plans using high-fidelity models to determine promising action trajectories. These models can be updated as soon as environmental changes are detected to immediately incorporate them into decision making. However, MCTS’s convergence can be slow for domains with large state-action spaces. In this paper, we present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses. Our approach, called Policy Augmented MCTS (PA-MCTS), integrates a policy’s actin-value estimates into MCTS, using the estimates to seed the action trajectories favored by the search. We hypothesize that PA-MCTS will converge more quickly than standard MCTS while making better decisions than the policy can make on its own when faced with nonstationary environments. We test our hypothesis by comparing PA-MCTS with pure MCTS and an RL agent applied to the classical CartPole environment. We find that PC-MCTS can achieve higher cumulative rewards than the policy in isolation under several environmental shifts while converging in significantly fewer iterations than pure MCTS.more » « less
-
Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings -- on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce \textit{Policy-Augmented Monte Carlo tree search} (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines.more » « less
-
Abstract In cybersecurity, incomplete inspection, resulting mainly from computers being turned off during the scan, leads to a challenge for scheduling maintenance actions. This article proposes the application of partially observable decision processes to derive cost‐effective cyber maintenance actions that minimize total costs. We consider several types of hosts having vulnerabilities at various levels of severity. The maintenance cost structure in our proposed model consists of the direct costs of maintenance actions in addition to potential incident costs associated with different security states. To assess the benefits of optimal policies obtained from partially observable Markov decision processes, we use real‐world data from a major university. Compared with alternative policies using simulations, the optimal control policies can significantly reduce expected maintenance expenditures per host and relatively quickly mitigate the most important vulnerabilities.more » « less
-
Performing object retrieval in real-world workspaces must tackle challenges including uncertainty and clutter. One option is to apply prehensile operations, which can be time consuming in highly-cluttered scenarios. On the other hand, non-prehensile actions, such as pushing simultaneously multiple objects, can help to quickly clear a cluttered workspace and retrieve a target object. Such actions, however, can also lead to increased uncertainty as it is difficult to estimate the outcome of pushing operations. The proposed framework in this work integrates topological tools and Monte-Carlo Tree Search (MCTS) to achieve effective and robust pushing for object retrieval. It employs persistent homology to automatically identify manageable clusters of blocking objects without the need for manually adjusting hyper-parameters. Then, MCTS uses this information to explore feasible actions to push groups of objects, aiming to minimize the number of operations needed to clear the path to the target. Real-world experiments using a Baxter robot, which involves some noise in actuation, show that the proposed framework achieves a higher success rate in solving retrieval tasks in dense clutter than alternatives. Moreover, it produces solutions with few pushing actions improving the overall execution time. More critically, it is robust enough that it allows one to plan the sequence of actions offline and then execute them reliably on a Baxter robot.more » « less
An official website of the United States government

