Agents that plan and act in the real world must deal with the fact that time passes as they are planning. When timing is tight, there may be insufficient time to complete the search for a plan before it is time to act. By commencing execution before search concludes, one gains time to search by making planning and execution concurrent. However, this incurs the risk of making incorrect action choices, especially if actions are irreversible. This tradeoff between opportunity and risk is the problem addressed in this paper. Our main contribution is to formally define this setting as an abstract metareasoning problem. We find that the abstract problem is intractable. However, we identify special cases that are solvable in polynomial time, develop greedy solution algorithms, and, through tests on instances derived from search problems, find several methods that achieve promising practical performance. This work lays the foundation for a principled time-aware executive that concurrently plans and executes.
more »
« less
Evaluating Distributional Predictions of Search Time: Put Up or Shut Up Games (Extended Abstract)
Metareasoning can be a helpful technique for controlling search in situations where computation time is an important resource, such as real-time planning and search, algorithm portfolios, and concurrent planning and execution. Metareasoning often involves an estimate of the remaining search time of a running algorithm, and several ways to compute such estimates have been presented in the literature. In this paper, we argue that many applications actually require a full estimated probability distribution over the remaining time, rather than just a point estimate of expected search time. We study several methods for estimating such distributions, including some novel adaptations of existing schemes.To properly evaluate the estimates, we introduce `put-up or shut-up games', which probe the distributional estimates without requiring infeasible computation.Our experimental evaluation reveals that estimates that are more accurate in expected value do not necessarily deliver better distributions, yielding worse scores in the game.
more »
« less
- Award ID(s):
- 2008594
- PAR ID:
- 10546019
- Publisher / Repository:
- AAAI Press
- Date Published:
- Journal Name:
- Proceedings of the International Symposium on Combinatorial Search
- Volume:
- 17
- ISSN:
- 2832-9171
- Page Range / eLocation ID:
- 277 to 278
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Standard temporal planning assumes that planning takes place offline, and then execution starts at time 0. Recently, situated temporal planning was introduced, where planning starts at time 0, and execution occurs after planning terminates. Situated temporal planning reflects a more realistic scenario where time passes during planning. However, in situated temporal planning a complete plan must be generated before any action is executed. In some problems with time pressure, timing is too tight to complete planning before the first action must be executed. For example, an autonomous car that has a truck backing towards it should probably move out of the way now, and plan how to get to its destination later. In this paper, we propose a new problem setting: concurrent planning and execution, in which actions can be dispatched (executed) before planning terminates. Unlike previous work on planning and execution, we must handle wall clock deadlines that affect action applicability and goal achievement (as in situated planning) while also supporting dispatching actions before a complete plan has been found. We extend previous work on metareasoning for situated temporal planning to develop an algorithm for this new setting. Our empirical evaluation shows that when there is strong time pressure, our approach outperforms situated temporal planning.more » « less
-
Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings -- on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce \textit{Policy-Augmented Monte Carlo tree search} (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines.more » « less
-
The performance of global ocean biogeochemical models can be quantified as the misfit between modeled tracer distributions and observations, which is sought to be minimized during parameter optimization. These models are computationally expensive due to the long spin‐up time required to reach equilibrium, and therefore optimization is often laborious. To reduce the required computational time, we investigate whether optimization of a biogeochemical model with shorter spin‐ups provides the same optimized parameters as one with a full‐length, equilibrated spin‐up over several millennia. We use the global ocean biogeochemical model MOPS with a range of lengths of model spin‐up and calibrate the model against synthetic observations derived from previous model runs using a derivative‐free optimization algorithm (DFO‐LS). When initiating the biogeochemical model with tracer distributions that differ from the synthetic observations used for calibration, a minimum spin‐up length of 2,000 years was required for successful optimization due to certain parameters which influence the transport of matter from the surface to the deeper ocean, where timescales are longer. However, preliminary results indicate that successful optimization may occur with an even shorter spin‐up by a judicious choice of initial condition, here the synthetic observations used for calibration, suggesting a fruitful avenue for future research.more » « less
-
Multi-modal journey planning, which allows multiple types of transport within a single trip, is becoming increasingly popular, due to a strong practical interest and an increasing availability of data. In real life, transport networks feature uncertainty. Yet, most approaches assume a deterministic environment, making plans more prone to failures such as missed connections and major delays in the arrival. This paper presents an approach to computing optimal contingent plans in multi-modal journey planning. The problem is modeled as a search in an and/or state space. We describe search enhancements used on top of the AO* algorithm. Enhancements include admissible heuristics, multiple types of pruning that preserve the completeness and the optimality, and a hybrid search approach with a deterministic and a nondeterministic search. We demonstrate an NP-hardness result, with the hardness stemming from the dynamically changing distributions of the travel time random variables. We perform a detailed empirical analysis on realistic transport networks from cities such as Montpellier, Rome and Dublin. The results demonstrate the effectiveness of our algorithmic contributions, and the benefits of contingent plans as compared to standard sequential plans, when the arrival and departure times of buses are characterized by uncertainty.more » « less
An official website of the United States government

