skip to main content


Title: Safe and Efficient Model Predictive Control Using Neural Networks: An Interior Point Approach
Model predictive control (MPC) provides a useful means for controlling systems with constraints, but suffers from the computational burden of repeatedly solving an optimization problem in real time. Offline (explicit) solutions for MPC attempt to alleviate real time computational challenges using either multiparametric programming or machine learning. The multiparametric approaches are typically applied to linear or quadratic MPC problems, while learning-based approaches can be more flexible and are less memory-intensive. Existing learning-based approaches offer significant speedups, but the challenge becomes ensuring constraint satisfaction while maintaining good performance. In this paper, we provide a neural network parameterization of MPC policies that explicitly encodes the constraints of the problem. By exploring the interior of the MPC feasible set in an unsupervised learning paradigm, the neural network finds better policies faster than projection-based methods and exhibits substantially shorter solve times. We use the proposed policy to solve a robust MPC problem, and demonstrate the performance and computational gains on a standard test system.  more » « less
Award ID(s):
1807142
NSF-PAR ID:
10388612
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the IEEE Conference on Decision and Control
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Safe operations of autonomous mobile robots in close proximity to humans, creates a need for enhanced trajectory tracking (with low tracking errors). Linear optimal control techniques such as Linear Quadratic Regulator (LQR) and Model Predictive Control (MPC) have been used successfully for low-speed applications while leveraging their model-based methodology with manageable computational demands. However, model and parameter uncertainties or other unmodeled nonlinearities may cause poor control actions and constraint violations. Nonlinear MPC has emerged as an alternate optimal-control approach but needs to overcome real-time deployment challenges (including fast sampling time, design complexity, and limited computational resources). In recent years, the optimal control-based deployments have benefitted enormously from the ability of Deep Neural Networks (DNNs) to serve as universal function approximators. This has led to deployments in a plethora of previously inaccessible applications – but many aspects of generalizability, benchmarking, and systematic verification and validation coupled with benchmarking have emerged. This paper presents a novel approach to fusing Deep Reinforcement Learning-based (DRL) longitudinal control with a traditional PID lateral controller for autonomous navigation. Our approach follows (i) Generation of an adequate fidelity simulation scenario via a Real2Sim approach; (ii) training a DRL agent within this framework; (iii) Testing the performance and generalizability on alternate scenarios. We use an initial tuned set of the lateral PID controller gains for observing the vehicle response over a range of velocities. Then we use a DRL framework to generate policies for an optimal longitudinal controller that successfully complements the lateral PID to give the best tracking performance for the vehicle. 
    more » « less
  2. Abstract

    This paper introduces a new graph neural network architecture for learning solutions of Capacitated Vehicle Routing Problems (CVRP) as policies over graphs. CVRP serves as an important benchmark for a wide range of combinatorial planning problems, which can be adapted to manufacturing, robotics and fleet planning applications. Here, the specific aim is to demonstrate the significant real-time executability and (beyond training) scalability advantages of the new graph learning approach over existing solution methods. While partly drawing motivation from recent graph learning methods that learn to solve CO problems such as multi-Traveling Salesman Problem (mTSP) and VRP, the proposed neural architecture presents a novel encoder-decoder architecture. Here the encoder is based on Capsule networks, which enables better representation of local and global information with permutation invariant node embeddings; and the decoder is based on the Multi-head attention (MHA) mechanism allowing sequential decisions. This architecture is trained using a policy gradient Reinforcement Learning process. The performance of our approach is favorably compared with state-of-the-art learning and non-learning methods for a benchmark suite of Capacitated-VRP (CVRP) problems. A further study on the CVRP with demand uncertainties is conducted to explore how this Capsule-Attention Mechanism architecture can be extended to handle real-world uncertainties by embedding them through the encoder.

     
    more » « less
  3. The main objective of Personalized Tour Recommendation (PTR) is to generate a sequence of point-of-interest (POIs) for a particular tourist, according to the user-specific constraints such as duration time, start and end points, the number of attractions planned to visit, and so on. Previous PTR solutions are based on either heuristics for solving the orienteering problem to maximize a global reward with a specified budget or approaches attempting to learn user visiting preferences and transition patterns with the stochastic process or recurrent neural networks. However, existing learning methodologies rely on historical trips to train the model and use the next visited POI as the supervised signal, which may not fully capture the coherence of preferences and thus recommend similar trips to different users, primarily due to the data sparsity problem and long-tailed distribution of POI popularity. This work presents a novel tour recommendation model by distilling knowledge and supervision signals from the trips in a self-supervised manner. We propose Contrastive Trajectory Learning for Tour Recommendation (CTLTR), which utilizes the intrinsic POI dependencies and traveling intent to discover extra knowledge and augments the sparse data via pre-training auxiliary self-supervised objectives. CTLTR provides a principled way to characterize the inherent data correlations while tackling the implicit feedback and weak supervision problems by learning robust representations applicable for tour planning. We introduce a hierarchical recurrent encoder-decoder to identify tourists’ intentions and use the contrastive loss to discover subsequence semantics and their sequential patterns through maximizing the mutual information. Additionally, we observe that a data augmentation step as the preliminary of contrastive learning can solve the overfitting issue resulting from data sparsity. We conduct extensive experiments on a range of real-world datasets and demonstrate that our model can significantly improve the recommendation performance over the state-of-the-art baselines in terms of both recommendation accuracy and visiting orders. 
    more » « less
  4. The large number of antennas in massive MIMO systems allows the base station to communicate with multiple users at the same time and frequency resource with multi-user beamforming. However, highly correlated user channels could drastically impede the spectral efficiency that multi-user beamforming can achieve. As such, it is critical for the base station to schedule a suitable group of users in each time and frequency resource block to achieve maximum spectral efficiency while adhering to fairness constraints among the users. In this paper, we consider the resource scheduling problem for massive MIMO systems with its optimal solution known to be NP-hard. Inspired by recent achievements in deep reinforcement learning (DRL) to solve problems with large action sets, we propose SMART, a dynamic scheduler for massive MIMO based on the state-of-the-art Soft Actor-Critic (SAC) DRL model and the K-Nearest Neighbors (KNN) algorithm. Through comprehensive simulations using realistic massive MIMO channel models as well as real-world datasets from channel measurement experiments, we demonstrate the effectiveness of our proposed model in various channel conditions. Our results show that our proposed model performs very close to the optimal proportionally fair (Opt-PF) scheduler in terms of spectral efficiency and fairness with more than one order of magnitude lower computational complexity in medium network sizes where Opt-PF is computationally feasible. Our results also show the feasibility and high performance of our proposed scheduler in networks with a large number of users and resource blocks. 
    more » « less
  5. null (Ed.)
    With increase in the frequency of natural disasters such as hurricanes that disrupt the supply from the grid, there is a greater need for resiliency in electric supply. Rooftop solar photovoltaic (PV) panels along with batteries can provide resiliency to a house in a blackout due to a natural disaster. Our previous work showed that intelligence can reduce the size of a PV+battery system for the same level of post-blackout service compared to a conventional system that does not employ intelligent control. The intelligent controller proposed is based on model predictive control (MPC), which has two main challenges. One, it requires simple yet accurate models as it involves real-time optimization. Two, the discrete actuation for residential loads (on/off) makes the underlying optimization problem a mixed-integer program (MIP) which is challenging to solve. An attractive alternative to MPC is reinforcement learning (RL) as the real-time control computation is both model-free and simple. These points of interest accompany certain trade-offs; RL requires computationally expensive offline learning, and its performance is sensitive to various design choices. In this work, we propose an RL-based controller. We compare its performance with the MPC controller proposed in our prior work and a non-intelligent baseline controller. The RL controller is found to provide a resiliency performance — by commanding critical loads and batteries—similar to MPC with a significant reduction in computational effort. 
    more » « less