Title: DiSProD: Differentiable Symbolic Propagation of Distributions for Planning
The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.
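A minimal sketch of the underlying idea, assuming simple 1-D Gaussian dynamics rather than the paper's symbolic-graph construction: propagate the mean and variance of the state distribution through the stochastic dynamics in closed form, score the whole trajectory distribution, and improve the action sequence by gradient ascent. All constants, the goal at x = 1, and the finite-difference gradients (standing in for automatic differentiation over the graph) are illustrative assumptions.

import numpy as np

HORIZON, STEPS, LR, EPS = 10, 1000, 0.01, 1e-4

def expected_value(actions, mu=0.0, var=0.0, noise_var=0.01):
    # Closed-form distribution propagation for x' = x + a + noise:
    # the mean shifts by the action, the variance accumulates noise.
    total = 0.0
    for a in actions:
        mu, var = mu + a, var + noise_var
        # E[-(x - 1)^2] = -((mu - 1)^2 + var) for Gaussian x, goal at 1.
        total += -((mu - 1.0) ** 2 + var)
    return total

def plan():
    # Gradient ascent on the value of the propagated distribution;
    # finite differences stand in for the differentiable graph.
    actions = np.zeros(HORIZON)
    for _ in range(STEPS):
        base = expected_value(actions)
        grad = np.zeros(HORIZON)
        for i in range(HORIZON):
            bumped = actions.copy()
            bumped[i] += EPS
            grad[i] = (expected_value(bumped) - base) / EPS
        actions += LR * grad
    return actions

print(np.round(plan(), 2))  # ~[1, 0, 0, ...]: step to the goal, then hold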
Award ID(s):
2246261
PAR ID:
10499803
Publisher / Repository:
International Joint Conferences on Artificial Intelligence
Date Published:
ISBN:
978-1-956792-03-4
Page Range / eLocation ID:
5324 to 5332
Format(s):
Medium: X
Location:
Macau, SAR China
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper investigates online stochastic planning for problems with large factored state and action spaces. One promising approach in recent work estimates the quality of applicable actions in the current state through aggregate simulation from the states they reach. This leads to a significant speedup compared to search over concrete states and actions, and suffices to guide decision making in cases where the performance of a random policy is informative of the quality of a state. The paper makes two significant improvements to this approach. The first, taking inspiration from lifted belief propagation, exploits the structure of the problem to derive a more compact computation graph for aggregate simulation. The second improvement replaces the random policy embedded in the computation graph with symbolic variables that are optimized simultaneously with the search for high-quality actions. This expands the scope of the approach to problems that require deep search and where information is lost quickly with random steps. An empirical evaluation shows that these ideas significantly improve performance, leading to state-of-the-art results on hard planning problems. (A toy sketch of aggregate simulation appears after this list.)
  2. The paper introduces a new algorithm for planning in partially observable Markov decision processes (POMDPs) based on the idea of aggregate simulation. The algorithm uses product distributions to approximate the belief state and shows how to build a representation graph of an approximate action-value function over belief space. The graph captures the result of simulating the model in aggregate under independence assumptions, giving a symbolic representation of the value function. The algorithm supports large observation spaces using sampling networks, a representation of the process of sampling values of observations, which is integrated into the graph representation. Following previous work in MDPs, this approach enables action selection in POMDPs through gradient optimization over the graph representation. This approach complements recent algorithms for POMDPs which are based on particle representations of belief states and an explicit search for action selection. Our approach enables scaling to large factored action spaces in addition to large state spaces and observation spaces. An experimental evaluation demonstrates that the algorithm provides excellent performance relative to the state of the art on large POMDP problems. (A minimal sketch of a product-distribution belief update appears after this list.)
  3. It is well known that the problems of stochastic planning and probabilistic inference are closely related. This paper makes two contributions in this context. The first is to provide an analysis of the recently developed SOGBOFA heuristic planning algorithm that was shown to be effective for problems with large factored state and action spaces. It is shown that SOGBOFA can be seen as a specialized inference algorithm that computes its solutions through a combination of a symbolic variant of belief propagation and gradient ascent. The second contribution is a new solver for Marginal MAP (MMAP) inference. We introduce a new reduction from MMAP to maximum expected utility problems which are suitable for the symbolic computation in SOGBOFA. This yields a novel algebraic gradient-based solver (AGS) for MMAP. An experimental evaluation illustrates the potential of AGS in solving difficult MMAP problems. 
    more » « less
  4. The metric dimension of a graph is the smallest number of nodes required to identify all other nodes uniquely based on shortest path distances. Applications of metric dimension include discovering the source of a spread in a network, canonically labeling graphs, and embedding symbolic data in low-dimensional Euclidean spaces. This survey gives a self-contained introduction to metric dimension and an overview of the quintessential results and applications. We discuss methods for approximating the metric dimension of general graphs, and specific bounds and asymptotic behavior for deterministic and random families of graphs. We conclude with related concepts and directions for future work. (A brute-force sketch of the definition appears after this list.)
  5. Reinforcement learning in problems with symbolic state spaces is challenging due to the need for reasoning over long horizons. This paper presents a new approach that utilizes relational abstractions in conjunction with deep learning to learn a generalizable Q-function for such problems. The learned Q-function can be efficiently transferred to related problems that have different object names and object quantities, and thus, entirely different state spaces. We show that the learned, generalized Q-function can be utilized for zero-shot transfer to related problems without an explicit, hand-coded curriculum. Empirical evaluations on a range of problems show that our method facilitates efficient zero-shot transfer of learned knowledge to much larger problem instances containing many objects. 
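For item 1 above, a toy illustration of aggregate simulation under independence assumptions, not the paper's compact computation graph or symbolic policy: rather than sampling concrete factored states, propagate the marginal probability of each binary state variable through the transition model, so a single pass aggregates many trajectories, and relaxed action marginals in [0, 1] can be optimized by gradient methods. The two-variable domain and its transition probabilities are invented for this example.

# Factored domain: binary state bits (p, q), relaxed action marginal a.
#   P(p'=1) = 0.9*a + (1 - a)*p      acting tends to switch p on
#   P(q'=1) = p*q + (1 - p)*0.1      q persists only while p holds
# Reward at each step is E[q], so the value is the sum of q-marginals.

def aggregate_value(a_seq, p=0.0, q=0.0):
    # Simulate in aggregate: push marginals, not sampled states.
    value = 0.0
    for a in a_seq:
        p, q = 0.9 * a + (1.0 - a) * p, p * q + (1.0 - p) * 0.1
        value += q  # expected reward at this step
    return value

print(aggregate_value([0.5] * 10))  # aggregate of random-policy runs
print(aggregate_value([1.0] * 10))  # aggregate under an always-act policy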
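For item 2 above, a minimal sketch of the product-distribution belief approximation, omitting the paper's sampling networks and graph construction: the belief is kept as independent Bernoulli marginals over state bits; each step pushes every marginal through its transition and then conditions it on a noisy observation of that bit, staying inside the product family. The persistence and observation-accuracy probabilities are illustrative assumptions.

def predict(b, stay=0.9):
    # P(x'=1) when the bit keeps its value w.p. `stay`, flips otherwise.
    return stay * b + (1.0 - stay) * (1.0 - b)

def condition(b, obs, acc=0.8):
    # Bayes update of a Bernoulli marginal on an observation that
    # reports the bit correctly with probability `acc`.
    like1 = acc if obs == 1 else 1.0 - acc
    like0 = (1.0 - acc) if obs == 1 else acc
    num = like1 * b
    return num / (num + like0 * (1.0 - b))

belief = [0.5, 0.5]                      # independent prior marginals
for obs in [(1, 0), (1, 0), (1, 1)]:     # one observation per state bit
    belief = [condition(predict(b), o) for b, o in zip(belief, obs)]
print(belief)                            # product-form posterior marginals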
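For item 4 above, the definition translates directly into a brute-force check that is practical only for small graphs: a set R resolves the graph if every node has a distinct vector of shortest-path distances to the nodes of R, and the metric dimension is the size of the smallest resolving set. The path graph used below is an illustrative example.

from itertools import combinations

def bfs_distances(adj):
    # All-pairs shortest-path lengths in an unweighted graph via BFS.
    dist = {}
    for src in adj:
        d, frontier = {src: 0}, [src]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if v not in d:
                        d[v] = d[u] + 1
                        nxt.append(v)
            frontier = nxt
        dist[src] = d
    return dist

def metric_dimension(adj):
    # Smallest reference set whose distance vectors identify all nodes.
    nodes, dist = list(adj), bfs_distances(adj)
    for k in range(1, len(nodes) + 1):
        for ref in combinations(nodes, k):
            sigs = {tuple(dist[v][r] for r in ref) for v in nodes}
            if len(sigs) == len(nodes):  # every signature is unique
                return k, ref
    return len(nodes), tuple(nodes)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
print(metric_dimension(path))                  # -> (1, (0,)): one endpoint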