 Award ID(s):
 1751636
 Publication Date:
 NSFPAR ID:
 10326613
 Journal Name:
 IEEE Transactions on Control Systems Technology
 ISSN:
 1558-0865
 Sponsoring Org:
 National Science Foundation
More Like this

We propose a neural network approach for solving high-dimensional optimal control problems. In particular, we focus on multi-agent control problems with obstacle and collision avoidance. These problems immediately become high-dimensional, even for moderate phase-space dimensions per agent. Our approach fuses the Pontryagin Maximum Principle and Hamilton-Jacobi-Bellman (HJB) approaches and parameterizes the value function with a neural network. Our approach yields controls in a feedback form for quick calculation and robustness to moderate disturbances to the system. We train our model using the objective function and optimality conditions of the control problem. Therefore, our training algorithm neither involves a data generation phase nor solutions from another algorithm. Our model uses empirically effective HJB penalizers for efficient training. By training on a distribution of initial states, we ensure the controls' optimality is achieved on a large portion of the state-space. Our approach is grid-free and scales efficiently to dimensions where grids become impractical or infeasible. We demonstrate our approach's effectiveness on a 150-dimensional multi-agent problem with obstacles.
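
For orientation, the relationship between the value function, the HJB equation, and a feedback control can be written for a generic finite-horizon problem. This is a textbook sketch; the symbols (state z, control u, dynamics f, running cost L, terminal cost G, value function Φ) are assumptions chosen for illustration and are not taken from the paper.

```latex
% Generic finite-horizon optimal control problem (illustrative notation only).
\begin{align*}
  \Phi(t, x) &= \min_{u} \int_t^T L\bigl(s, z(s), u(s)\bigr)\, ds + G\bigl(z(T)\bigr),
  \qquad \dot z(s) = f\bigl(s, z(s), u(s)\bigr), \quad z(t) = x, \\
  -\partial_t \Phi(t, x) &= \min_{u} \Bigl\{ L(t, x, u) + \nabla_x \Phi(t, x) \cdot f(t, x, u) \Bigr\},
  \qquad \Phi(T, x) = G(x), \\
  u^*(t, x) &\in \operatorname*{arg\,min}_{u} \Bigl\{ L(t, x, u) + \nabla_x \Phi(t, x) \cdot f(t, x, u) \Bigr\}.
\end{align*}
```

Under these assumptions, replacing Φ with a neural network makes the last line a feedback control that depends only on the current time and state, and the residual of the second line is one natural candidate for a penalty term during training.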

We present a closed-loop multi-arm motion planner that is scalable and flexible with team size. Traditional multi-arm robotic systems have relied on centralized motion planners, whose run times often scale exponentially with team size, and thus, fail to handle dynamic environments with open-loop control. In this paper, we tackle this problem with multi-agent reinforcement learning, where a shared policy network is trained to control each individual robot arm to reach its target end-effector pose given observations of its workspace state and target end-effector pose. The policy is trained using Soft Actor-Critic with expert demonstrations from a sampling-based motion planning algorithm (i.e., BiRRT). By leveraging classical planning algorithms, we can improve the learning efficiency of the reinforcement learning algorithm while retaining the fast inference time of neural networks. The resulting policy scales sublinearly and can be deployed on multi-arm systems with variable team sizes. Thanks to the closed-loop and decentralized formulation, our approach generalizes to 5-10 multi-arm systems and dynamic moving targets (>90% success rate for a 10-arm system), despite being trained on only 1-4 arm planning tasks with static targets.

Supervised machine learning via artificial neural network (ANN) has gained significant popularity for many geomechanics applications that involve multi‐phase flow and poromechanics. For unsaturated poromechanics problems, the multi‐physics nature and the complexity of the hydraulic laws make it difficult to design the optimal setup, architecture, and hyper‐parameters of the deep neural networks. This paper presents a meta‐modeling approach that utilizes deep reinforcement learning (DRL) to automatically discover optimal neural network settings that maximize a pre‐defined performance metric for the machine learning constitutive laws. This meta‐modeling framework is cast as a Markov Decision Process (MDP) with well‐defined states (subsets of states representing the proposed neural network (NN) settings), actions, and rewards. Following the selection rules, the artificial intelligence (AI) agent, represented in DRL via NN, self‐learns from taking a sequence of actions and receiving feedback signals (rewards) within the selection environment. By utilizing the Monte Carlo Tree Search (MCTS) to update the policy/value networks, the AI agent replaces the human modeler to handle the otherwise time‐consuming trial‐and‐error process that leads to the optimized choices of setup from a high‐dimensional parametric space. This approach is applied to generate two key constitutive laws for the unsaturated poromechanics problems: (1) the path‐dependent retention …

We develop a convex analytic framework for ReLU neural networks which elucidates the inner workings of hidden neurons and their function space characteristics. We show that neural networks with rectified linear units act as convex regularizers, where simple solutions are encouraged via extreme points of a certain convex set. For one-dimensional regression and classification, as well as rank-one data matrices, we prove that finite two-layer ReLU networks with norm regularization yield linear spline interpolation. We characterize the classification decision regions in terms of a closed-form kernel matrix and minimum L1 norm solutions. This is in contrast to Neural Tangent Kernel which is unable to explain neural network predictions with finitely many neurons. Our convex geometric description also provides intuitive explanations of hidden neurons as auto-encoders. In higher dimensions, we show that the training problem for two-layer networks can be cast as a finite-dimensional convex optimization problem with infinitely many constraints. We then provide a family of convex relaxations to approximate the solution, and a cutting-plane algorithm to improve the relaxations. We derive conditions for the exactness of the relaxations and provide simple closed-form formulas for the optimal neural network weights in certain cases. We also …
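
The one-dimensional statement above can be made concrete with a small sketch. It shows only that a linear spline through given points is exactly representable by a finite two-layer ReLU network; it does not reproduce the paper's result that norm-regularized training selects such a solution. The function names and sample data are hypothetical, and the knots are assumed to be sorted and distinct.

```python
import numpy as np

def fit_relu_interpolant(x_knots, y_knots):
    """Build a two-layer ReLU network that linearly interpolates the data.

    The network is f(t) = out_bias + sum_i out_weights[i] * relu(t + biases[i]),
    with one hidden unit per interval, each kinked at that interval's left knot.
    Assumes x_knots is sorted with distinct entries (illustrative helper).
    """
    x = np.asarray(x_knots, dtype=float)
    y = np.asarray(y_knots, dtype=float)
    slopes = np.diff(y) / np.diff(x)              # slope on each interval
    # First weight is the initial slope; the rest are slope changes at knots.
    out_weights = np.diff(slopes, prepend=0.0)
    biases = -x[:-1]                              # hidden unit i turns on at x[i]
    return biases, out_weights, y[0]

def relu_network(t, biases, out_weights, out_bias):
    """Evaluate the network at points t (hidden input weights are fixed to 1)."""
    t = np.asarray(t, dtype=float)
    hidden = np.maximum(t[:, None] + biases[None, :], 0.0)
    return hidden @ out_weights + out_bias

# Hypothetical sample data.
x_knots = np.array([0.0, 1.0, 2.5, 4.0])
y_knots = np.array([1.0, -1.0, 0.5, 0.5])
params = fit_relu_interpolant(x_knots, y_knots)

# Compare with NumPy's piecewise-linear interpolation on the data range.
t = np.linspace(0.0, 4.0, 201)
max_gap = np.max(np.abs(relu_network(t, *params) - np.interp(t, x_knots, y_knots)))
```

Here each hidden unit contributes one change of slope, so a network with n - 1 hidden units suffices for n data points on the interval they span.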

We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer by layer a compositional hypothesis class (i.e., a feedforward, multilayer architecture) in a supervised setting. In terms of the models, we present a principled method to “kernelize” (partly or completely) any neural network (NN). With this method, we obtain a counterpart of any given NN that is powered by kernel machines instead of neurons. In terms of learning, when learning a feedforward deep architecture in a supervised setting, one needs to train all the components simultaneously using backpropagation (BP) since there are no explicit targets for the hidden layers (Rumelhart, Hinton, & Williams, 1986). We consider without loss of generality the two-layer case and present a general framework that explicitly characterizes a target for the hidden layer that is optimal for minimizing the objective function of the network. This characterization then makes possible a purely greedy training scheme that learns one layer at a time, starting from the input layer. We provide instantiations of the abstract framework under certain architectures and objective functions. Based on these instantiations, we present a layerwise training algorithm for an l-layer feedforward network for classification, where …