 NSFPAR ID:
 10213291
 Date Published:
 Journal Name:
 Annual Review of Control, Robotics, and Autonomous Systems
 Volume:
 4
 Issue:
 1
 ISSN:
 2573-5144
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

We consider damped stochastic systems in a controlled (time-varying) potential and study their transition between specified Gibbs-equilibria states in finite time. By the second law of thermodynamics, the minimum amount of work needed to transition from one equilibrium state to another is the difference between the Helmholtz free energy of the two states and can only be achieved by a reversible (infinitely slow) process. The minimal gap between the work needed in a finite-time transition and the work during a reversible one turns out to equal the square of the optimal mass transport (Wasserstein-2) distance between the two endpoint distributions times the inverse of the duration needed for the transition. This result, in fact, relates non-equilibrium optimal control strategies (protocols) to gradient flows of entropy functionals via the Jordan-Kinderlehrer-Otto scheme. The purpose of this paper is to introduce ideas and results from the emerging field of stochastic thermodynamics in the setting of classical regulator theory, and to draw connections and derive such fundamental relations from a control perspective in a multivariable setting.
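The finite-time bound stated above can be illustrated numerically. The following is a minimal sketch, assuming one-dimensional Gaussian endpoint distributions (for which the Wasserstein-2 distance has a closed form) and a friction coefficient `gamma` that sets the units; both function names and `gamma` are illustrative assumptions, not notation from the paper.

```python
def w2_squared_gaussian_1d(m0, s0, m1, s1):
    """Squared Wasserstein-2 distance between N(m0, s0^2) and N(m1, s1^2)."""
    return (m0 - m1) ** 2 + (s0 - s1) ** 2


def min_excess_work(w2_sq, duration, gamma=1.0):
    """Minimal gap between finite-time work and the free-energy difference.

    Implements gap = gamma * W2^2 / duration. The prefactor gamma is an
    assumed friction coefficient; the abstract states only the
    W2^2 / duration scaling.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    return gamma * w2_sq / duration


# Endpoint states: N(0, 1) and N(2, 0.25), i.e. standard deviations 1 and 0.5.
w2_sq = w2_squared_gaussian_1d(0.0, 1.0, 2.0, 0.5)
for duration in (1.0, 10.0, 100.0):
    # The gap shrinks like 1/duration and vanishes in the reversible limit.
    print(duration, min_excess_work(w2_sq, duration))
```

The loop shows the gap decaying toward zero as the duration grows, consistent with the reversible (infinitely slow) limit.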

Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the probably approximately correct (PAC)-Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (relative entropy programming in particular) in the setting where we are optimizing over a finite policy space. In the more general setting of continuously parameterized policies (e.g., neural network policies), we minimize this upper bound using stochastic gradient descent. We present simulated results of our approach applied to learning (1) reactive obstacle avoidance policies and (2) neural network-based grasping policies. We also present hardware results for the Parrot Swing drone navigating through different obstacle environments. Our examples demonstrate the potential of our approach to provide strong generalization guarantees for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, rich sensory inputs (e.g., depth images), and neural network-based policies.
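To make the kind of upper bound described above concrete, here is a minimal sketch for a finite policy space. It assumes costs in [0, 1] and uses one common PAC-Bayes form (empirical cost plus a square-root term in the KL divergence); the abstract does not state which variant the authors use, so the exact formula and the names `pac_bayes_bound`, `costs`, `p`, and `p0` are assumptions.

```python
import math


def kl_divergence(p, p0):
    """KL(p || p0) for discrete distributions; p0 must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, p0) if pi > 0)


def pac_bayes_bound(costs, p, p0, delta):
    """Upper bound on expected cost in novel environments (assumed form).

    costs[i][j] is the cost in [0, 1] of policy j in training environment i,
    p is the posterior over policies, p0 the prior, and the bound is taken
    to hold with probability at least 1 - delta over the training set.
    """
    n = len(costs)
    # Expected training cost of the stochastic policy drawn from p.
    empirical = sum(sum(pj * c for pj, c in zip(p, row)) for row in costs) / n
    # Complexity term: grows with KL(p || p0), shrinks with more environments.
    penalty = math.sqrt(
        (kl_divergence(p, p0) + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    )
    return empirical + penalty


# Two policies evaluated in 100 training environments (synthetic costs).
costs = [[0.2, 0.6]] * 100
prior = [0.5, 0.5]
posterior = [0.9, 0.1]
print(pac_bayes_bound(costs, posterior, prior, delta=0.01))
```

Shifting the posterior toward the low-cost policy lowers the empirical term but raises the KL penalty, which is the trade-off a bound-minimizing learning algorithm has to balance.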

We consider stochastic systems of interacting particles or agents, with dynamics determined by an interaction kernel, which only depends on pairwise distances. We study the problem of inferring this interaction kernel from observations of the positions of the particles, in either continuous or discrete time, along multiple independent trajectories. We introduce a nonparametric inference approach to this inverse problem, based on a regularized maximum likelihood estimator constrained to suitable hypothesis spaces adaptive to data. We show that a coercivity condition enables us to control the condition number of this problem and prove the consistency of our estimator, and that in fact it converges at a near-optimal learning rate, equal to the min–max rate of one-dimensional nonparametric regression. In particular, this rate is independent of the dimension of the state space, which is typically very high. We also analyze the discretization error in the case of discrete-time observations, showing that it is of order 1/2 in terms of the time spacings between observations. This term, when large, dominates the sampling error and the approximation error, preventing convergence of the estimator. Finally, we exhibit an efficient parallel algorithm to construct the estimator from data, and we demonstrate the effectiveness of our algorithm with numerical tests on prototype systems including stochastic opinion dynamics and a Lennard-Jones model.
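The forward model behind this inverse problem can be sketched with an Euler-Maruyama simulation. The following assumes first-order dynamics of the form dX_i = (1/N) Σ_j φ(|X_j − X_i|)(X_j − X_i) dt + σ dB_i, which is a common form for such systems but is not spelled out in the abstract; the function `simulate_particles` and the example kernel are hypothetical, and the sketch generates trajectories only (it does not implement the estimator).

```python
import numpy as np


def simulate_particles(x0, kernel, sigma, dt, n_steps, rng):
    """Euler-Maruyama simulation of N particles with a distance-based kernel.

    x0 has shape (N, d); kernel maps an array of pairwise distances to
    interaction strengths and must be finite at zero distance.
    Returns an array of shape (n_steps + 1, N, d).
    """
    x = np.array(x0, dtype=float)
    n = x.shape[0]
    trajectory = [x.copy()]
    for _ in range(n_steps):
        diff = x[None, :, :] - x[:, None, :]   # diff[i, j] = x_j - x_i
        dist = np.linalg.norm(diff, axis=2)    # pairwise distances, (N, N)
        drift = (kernel(dist)[:, :, None] * diff).sum(axis=1) / n
        noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + drift * dt + noise
        trajectory.append(x.copy())
    return np.stack(trajectory)


def opinion_kernel(r):
    """Illustrative opinion-dynamics kernel: strong influence at short range."""
    return np.where(r < 1.0, 1.0, 0.1)


rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 5.0, size=(10, 1))
traj = simulate_particles(x0, opinion_kernel, sigma=0.05, dt=0.01, n_steps=500, rng=rng)
print(traj.shape)
```

Observations of `traj` at discrete times, along several independent runs, are the kind of data from which the interaction kernel would be inferred.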

The Gromov-Wasserstein (GW) formalism can be seen as a generalization of the optimal transport (OT) formalism for comparing two distributions associated with different metric spaces. It is a quadratic optimization problem, and solving it usually has computational costs that can rise sharply if the problem size exceeds a few hundred points. Recently, fast techniques based on entropy regularization have been developed to solve an approximation of the GW problem quickly. There are issues, however, with the numerical convergence of those regularized approximations to the true GW solution. To circumvent those issues, we introduce a novel strategy to solve the discrete GW problem using methods taken from statistical physics. We build a temperature-dependent free energy function that reflects the GW problem’s constraints. To account for possible differences of scales between the two metric spaces, we introduce a scaling factor s in the definition of the energy. From the extremum of the free energy, we derive a mapping between the two probability measures that are being compared, as well as a distance between those measures. This distance is equal to the GW distance when the temperature goes to zero. The optimal scaling factor itself is obtained by minimizing the free energy with respect to s. We illustrate our approach on the problem of comparing shapes defined by unstructured triangulations of their surfaces. We use several synthetic and “real life” datasets. We demonstrate the accuracy and automaticity of our approach in non-rigid registration of shapes. We provide numerical evidence that there is a strong correlation between the GW distances computed from low-resolution, surface-based representations of proteins and the analogous distances computed from atomistic models of the same proteins.
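The quadratic objective underlying the discrete GW problem can be sketched as follows. This evaluates the standard squared-loss GW cost for a given coupling, with a scale factor `s` applied to the second distance matrix; it is not the authors' temperature-dependent free energy, and where exactly `s` enters their energy is an assumption here. The function name `gw_cost` is illustrative.

```python
import numpy as np


def gw_cost(d1, d2, coupling, s=1.0):
    """Discrete Gromov-Wasserstein cost of a coupling (squared loss).

    Computes sum_{i,j,k,l} (d1[i, k] - s * d2[j, l])**2 * T[i, j] * T[k, l],
    where d1 is (n, n), d2 is (m, m), and the coupling T is (n, m).
    Memory use is O(n^2 m^2), so this is only suitable for small problems.
    """
    # Broadcast to a 4-index tensor with axes (i, j, k, l).
    diff_sq = (d1[:, None, :, None] - s * d2[None, :, None, :]) ** 2
    return float(np.einsum("ijkl,ij,kl->", diff_sq, coupling, coupling))


# Three points on a line, and a copy of the same shape scaled by a factor 2.
points = np.array([0.0, 1.0, 3.0])
d1 = np.abs(points[:, None] - points[None, :])
d2 = 2.0 * d1
identity_coupling = np.eye(3) / 3.0  # uniform masses, point i matched to i

print(gw_cost(d1, d2, identity_coupling, s=1.0))  # positive: scales differ
print(gw_cost(d1, d2, identity_coupling, s=0.5))  # zero once the scale is matched
```

The example shows why a scale factor matters: two shapes that differ only by size have a nonzero cost at s = 1 and a vanishing cost at the matching scale.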

An adaptive, adversarial methodology is developed for the optimal transport problem between two distributions $\mu$ and $\nu$, known only through a finite set of independent samples $(x_i)_{i=1..n}$ and $(y_j)_{j=1..m}$. The methodology automatically creates features that adapt to the data, thus avoiding reliance on a priori knowledge of the distributions underlying the data. Specifically, instead of a discrete point-by-point assignment, the new procedure seeks an optimal map $T(x)$ defined for all $x$, minimizing the Kullback–Leibler divergence between $(T(x_i))$ and the target $(y_j)$. The relative entropy is given a sample-based, variational characterization, thereby creating an adversarial setting: as one player seeks to push forward one distribution to the other, the second player develops features that focus on those areas where the two distributions fail to match. The procedure solves local problems that seek the optimal transfer between consecutive, intermediate distributions between $\mu$ and $\nu$. As a result, maps of arbitrary complexity can be built by composing the simple maps used for each local problem. Displacement interpolation is used to guarantee global from local optimality. The procedure is illustrated through synthetic examples in one and two dimensions.
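A sample-based, variational characterization of relative entropy can be sketched as below. This uses the Donsker-Varadhan form, KL(p‖q) = sup_g E_p[g] − log E_q[exp(g)], which is one standard choice; the abstract does not say which variational form the authors use, so this is an assumption, and `dv_lower_bound` and the test function `g` are hypothetical names.

```python
import numpy as np


def dv_lower_bound(g, x_samples, y_samples):
    """Sample estimate of the Donsker-Varadhan lower bound on KL(p || q).

    x_samples are drawn from p (e.g., pushed-forward points T(x_i)) and
    y_samples from q (the target points y_j). The feature function g plays
    the role of the adversary: maximizing over g tightens the bound.
    """
    gx = g(np.asarray(x_samples, dtype=float))
    gy = g(np.asarray(y_samples, dtype=float))
    return float(gx.mean() - np.log(np.mean(np.exp(gy))))


rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=5000)  # samples from p = N(1, 1)
y = rng.normal(0.0, 1.0, size=5000)  # samples from q = N(0, 1)

# A linear feature; for these two Gaussians the log-density ratio is linear
# in x, so this g is close to an optimal adversary.
print(dv_lower_bound(lambda z: z, x, y))
# A constant feature sees no mismatch and yields a bound of zero.
print(dv_lower_bound(lambda z: np.zeros_like(z), x, y))
```

In an adversarial scheme of the kind described, one player would adjust the map producing `x_samples` to reduce this quantity while the other adjusts `g` to increase it.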