This content will become publicly available on July 1, 2023

Title: Non-parametric Models for Long-term Autonomy
In this thesis we propose novel estimation techniques for localization and planning problems, which are key challenges in long-term autonomy. Our methods take inspiration from non-parametric estimation and use tools such as kernel density estimation, non-linear least-squares optimization, binary masking, and random sampling. We show that these methods, by avoiding explicit parametric models, outperform existing methods that use them. Despite the seeming differences between localization and planning, we demonstrate in this thesis that the problems share core structural similarities. When real or simulation-sampled measurements are expensive, noisy, or high variance, non-parametric estimation techniques give higher-quality results in less time. We first address two localization problems. In order to permit localization with a set of ad hoc-placed radios, we propose an ultra-wideband (UWB) graph realization system to localize the radios. Our system achieves high accuracy and robustness by using kernel density estimation for measurement probability densities, by explicitly modeling antenna delays, and by optimizing this combination with a non-linear least-squares formulation. Next, to support robotic navigation, we present a flexible system for simultaneous localization and mapping (SLAM) that combines elements from both traditional dense metric SLAM and topological SLAM, using a binary "masking function" to focus attention. This masking function controls which lidar scans are available for loop closures. We provide several masking functions based on approximate topological class detectors. We then examine planning problems in the final chapter and in the appendix. In order to plan with uncertainty around multiple dynamic agents, we describe Monte-Carlo Policy-Tree Decision Making (MCPTDM), a framework for efficiently computing policies in partially-observable, stochastic, continuous problems. 
MCPTDM composes a sequence of simpler closed-loop policies and uses marginal action costs and particle repetition to improve cost estimates and sample efficiency by reducing variance. Finally, in the appendix we explore Learned Similarity Monte-Carlo Planning (LSMCP), where we seek to enhance the sample efficiency of partially observable Monte Carlo tree search-based planning by taking advantage of similarities in the final outcomes of similar states and actions. We train a multilayer perceptron to learn a similarity function which we then use to enhance value estimates in the planning. Collectively, we show in this thesis that non-parametric methods promote long-term autonomy by reducing error and increasing robustness across multiple domains.
Sponsoring Org: National Science Foundation
More Like this
  1. This work presents novel techniques for tightly integrated online information fusion and planning in human-autonomy teams operating in partially known environments. Motivated by dynamic target search problems, we present a new map-based sketch interface for online soft-hard data fusion. This interface lets human collaborators efficiently update map information and continuously build their own highly flexible ad hoc dictionaries for making language-based semantic observations, which can be actively exploited by autonomous agents in optimal search and information gathering problems. We formally link these capabilities to POMDP algorithms for optimal planning under uncertainty, and develop a new Dynamically Observable Monte Carlo planning (DOMCP) algorithm as an efficient means for updating online sampling-based planning policies for POMDPs with non-static observation models. DOMCP is validated on a small scale robot localization problem, and then demonstrated with our new user interface on a simulated dynamic target search scenario in a partially known outdoor environment.
  2. When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO)—one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO’s applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
  3. Model-based Bayesian Reinforcement Learning (BRL) provides a principled solution to dealing with the exploration-exploitation trade-off, but such methods typically assume a fully observable environment. The few Bayesian RL methods that are applicable in partially observable domains, such as the Bayes-Adaptive POMDP (BA-POMDP), scale poorly. To address this issue, we introduce the Factored BA-POMDP model (FBA-POMDP), a framework that is able to learn a compact model of the dynamics by exploiting the underlying structure of a POMDP. The FBA-POMDP framework casts the problem as a planning task, for which we adapt the Monte-Carlo Tree Search planning algorithm and develop a belief tracking method to approximate the joint posterior over the state and model variables. Our empirical results show that this method outperforms a number of BRL baselines and is able to learn efficiently when the factorization is known, as well as learn both the factorization and the model parameters simultaneously.
  4. A core capability of robots is to reason about multiple objects under uncertainty. Partially Observable Markov Decision Processes (POMDPs) provide a means of reasoning under uncertainty for sequential decision making, but are computationally intractable in large domains. In this paper, we propose Object-Oriented POMDPs (OO-POMDPs), which represent the state and observation spaces in terms of classes and objects. The structure afforded by OO-POMDPs supports a factorization of the agent’s belief into independent object distributions, which enables the size of the belief to scale linearly rather than exponentially in the number of objects. We formulate a novel Multi-Object Search (MOS) task as an OO-POMDP for mobile robotics domains in which the agent must find the locations of multiple objects. Our solution exploits the structure of OO-POMDPs by featuring human language to selectively update the belief at task onset. Using this structure, we develop a new algorithm for efficiently solving OO-POMDPs: Object-Oriented Partially Observable Monte-Carlo Planning (OO-POMCP). We show that OO-POMCP with grounded language commands is sufficient for solving challenging MOS tasks both in simulation and on a physical mobile robot.
  5. We report on a systematic study of guest cation (i.e., Li, Na, or Mg) diffusion within spinel intercalation compounds, a promising class of materials for Li-, Na-, and Mg-ion batteries. Using kinetic Monte Carlo simulations, we identify factors that are responsible for a strong concentration dependence of the cation diffusion coefficient. We focus on spinels in which the guest cations prefer the octahedral sites and where diffusion is mediated by vacancy clusters. Starting with MgyTiS2, we predict an abrupt drop in the Mg diffusion coefficient that spans several orders of magnitude around y ≈ 0.5 due to the onset of highly correlated Mg diffusion. The prediction is consistent with previous experimental studies that are only able to achieve half the theoretical capacity of MgyTiS2. We next perform a parametric study of diffusion in spinels using kinetic Monte Carlo simulations applied to lattice model Hamiltonians and identify a critical topological weakness of the spinel crystal structure that makes it prone to highly correlated cation diffusion at intermediate-to-high guest cation concentrations. We find that the onset of this highly correlated diffusion becomes more pronounced as the nearest-neighbor repulsion between pairs of guest cations becomes stronger, since this increases the dependence of long-range cation diffusion on triple-vacancy clusters. The results of this study provide guidance with which the concentration dependence of cation diffusion coefficients in spinel can be tailored to reduce the onset of sluggish diffusion at high cation concentrations. The conclusions drawn from this study also apply to other close-packed anion hosts such as disordered rocksalt electrodes and partially ordered spinels.