Title: Learning Markov Models Via Low-Rank Optimization
Modeling unknown systems from data is a precursor to system optimization and sequential decision making. In this paper, we focus on learning a Markov model from a single trajectory of states. Suppose that the transition model has a small rank despite having a large state space, meaning that the system admits a low-dimensional latent structure. We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states. We propose two maximum-likelihood estimation methods: a convex approach with nuclear norm regularization and a nonconvex approach with a rank constraint. We explicitly derive the statistical rates of both estimators in terms of the Kullback-Leibler divergence and the $\ell_2$ error and also establish a minimax lower bound to assess the tightness of these rates. For computing the nonconvex estimator, we develop a novel DC (difference of convex functions) programming algorithm that starts with the convex M-estimator and then successively refines the solution until convergence. Empirical experiments demonstrate the consistent superiority of the nonconvex estimator over the convex one.
Authors:
Award ID(s):
2015366
Publication Date:
NSF-PAR ID:
10331950
Journal Name:
Operations Research
ISSN:
0030-364X
Sponsoring Org:
National Science Foundation
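
To make the estimation pipeline in the abstract concrete, here is a minimal Python sketch under stated assumptions: it tabulates empirical transition frequencies from a single state trajectory and then substitutes a simple rank-truncation-plus-simplex-projection step for the paper's rank-constrained MLE. The DC refinement algorithm itself is not reproduced; the function names and the fall-back-to-uniform rule for unvisited states are illustrative choices, not the paper's estimator.

```python
import numpy as np

def empirical_transitions(trajectory, n_states):
    """Empirical transition frequencies from a single trajectory of states."""
    counts = np.zeros((n_states, n_states))
    for s, t in zip(trajectory[:-1], trajectory[1:]):
        counts[s, t] += 1.0
    rows = counts.sum(axis=1, keepdims=True)
    # Unvisited states fall back to the uniform distribution (our choice).
    P = np.full((n_states, n_states), 1.0 / n_states)
    np.divide(counts, rows, out=P, where=rows > 0)
    return P

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1), 0.0)

def low_rank_markov(P_hat, rank):
    """Truncated SVD followed by row-wise simplex projection: a crude
    stand-in for the rank-constrained MLE, not the paper's DC algorithm."""
    U, s, Vt = np.linalg.svd(P_hat, full_matrices=False)
    P_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return np.vstack([project_simplex(row) for row in P_r])
```

A call such as `low_rank_markov(empirical_transitions(traj, n), rank=3)` returns a row-stochastic estimate built from a rank-3 truncation, which is the kind of object both estimators in the paper target.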
More Like this
  1. Planning safe trajectories for nonlinear dynamical systems subject to model uncertainty and disturbances is challenging. In this work, we present a novel approach to tackle chance-constrained trajectory planning problems with nonconvex constraints, whereby obstacle-avoidance chance constraints are reformulated using the signed distance function. We propose a novel sequential convex programming algorithm and prove that, under a discrete-time problem formulation, it is guaranteed to converge to a solution satisfying first-order optimality conditions. We demonstrate the approach on an uncertain 6-degrees-of-freedom spacecraft system and show that the solutions satisfy a given set of chance constraints. (A toy sketch of the sequential convex programming step follows below.)
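
As a toy illustration of the sequential convex programming step described above (with the chance constraints and spacecraft dynamics omitted), the sketch below linearizes the signed distance to a single circular obstacle around the current trajectory iterate and re-solves the resulting convex subproblem. The obstacle data, margin, and use of SciPy's SLSQP as the convex subproblem solver are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy SCP loop: a 2-D path from start to goal around one circular
# obstacle, with the signed-distance constraint linearized at each
# iterate. All problem data here are illustrative assumptions.
start, goal, T = np.array([0.0, 0.0]), np.array([1.0, 0.0]), 10
obs_c, obs_r, margin = np.array([0.5, 0.1]), 0.3, 0.05

def signed_dist(p):
    return np.linalg.norm(p - obs_c) - obs_r

def scp_plan(n_scp=8):
    traj = np.linspace(start, goal, T + 1)      # straight-line initialization
    for _ in range(n_scp):
        ref = traj.copy()
        cons = [{"type": "eq", "fun": lambda z: z[:2] - start},
                {"type": "eq", "fun": lambda z: z[-2:] - goal}]
        for t in range(1, T):
            p0 = ref[t]
            g = (p0 - obs_c) / np.linalg.norm(p0 - obs_c)   # gradient of signed distance
            cons.append({"type": "ineq",                    # linearized sd >= margin
                         "fun": lambda z, t=t, p0=p0, g=g:
                             signed_dist(p0) + g @ (z[2*t:2*t+2] - p0) - margin})
        cost = lambda z: np.sum(np.diff(z.reshape(T + 1, 2), axis=0) ** 2)
        traj = minimize(cost, ref.ravel(), constraints=cons).x.reshape(T + 1, 2)
    return traj
```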
  2. Low-rank matrix models have been universally useful for numerous applications, from classical system identification to more modern matrix completion in signal processing and statistics. The nuclear norm has been employed as a convex surrogate for low-rankness, since it induces low-rank solutions to inverse problems. While the nuclear norm for low-rankness has an excellent analogy with the $\ell_1$ norm for sparsity through the singular value decomposition, other matrix norms also induce low-rankness. In particular, as one interprets a matrix as a linear operator between Banach spaces, various tensor product norms generalize the role of the nuclear norm. We provide a tensor-norm-constrained estimator for the recovery of approximately low-rank matrices from local measurements corrupted with noise. A tensor-norm regularizer is designed to adapt to the local structure. We derive a statistical analysis of the estimator for matrix completion and decentralized sketching by applying Maurey's empirical method to tensor products of Banach spaces. The estimator provides a near-optimal error bound in a minimax sense and admits a polynomial-time algorithm for these applications. (A basic nuclear-norm proximal step is sketched below for contrast.)
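
For contrast with the tensor-norm estimator, the standard nuclear-norm surrogate mentioned above is easy to sketch: singular-value soft-thresholding is the proximal operator of the nuclear norm, and plugging it into proximal gradient yields a basic noisy matrix-completion solver. The regularization weight, step size, and iteration count below are placeholder choices.

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: prox of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete(Y, mask, lam=1.0, step=1.0, n_iters=200):
    """Proximal gradient for  min_X 0.5*||mask*(X - Y)||_F^2 + lam*||X||_*."""
    X = np.zeros_like(Y)
    for _ in range(n_iters):
        grad = mask * (X - Y)               # gradient of the smooth data-fit term
        X = svt(X - step * grad, step * lam)
    return X
```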

  3. Edge machine learning can deliver low-latency and private artificial intelligence (AI) services for mobile devices by leveraging computation and storage resources at the network edge. This paper presents an energy-efficient edge processing framework to execute deep learning inference tasks at edge computing nodes whose wireless connections to mobile devices are prone to channel uncertainties. Aiming to minimize the sum of computation and transmission power consumption with probabilistic quality-of-service (QoS) constraints, we formulate a joint inference tasking and downlink beamforming problem that is characterized by a group-sparse objective function. We provide a statistical-learning-based robust optimization approach to approximate the highly intractable probabilistic QoS constraints by nonconvex quadratic constraints, which are further reformulated as matrix inequalities with a rank-one constraint via matrix lifting. We design a reweighted power minimization approach that alternates between reweighted ℓ1 minimization with difference-of-convex-functions (DC) regularization and weight updates, where the reweighting enhances group sparsity and the DC regularization induces rank-one solutions. Numerical results demonstrate that the proposed approach outperforms other state-of-the-art approaches. (A simplified sketch of the reweighting loop follows below.)
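
The reweighting idea above can be illustrated on a plain group-sparse least-squares problem; the matrix lifting, rank-one DC regularization, and beamforming model are omitted, so everything below is a simplified assumption rather than the paper's method.

```python
import numpy as np

def reweighted_group_l1(A, b, groups, lam=0.1, eps=1e-3, n_outer=5, n_inner=200):
    """Iteratively reweighted group-l1 for  min 0.5*||Ax-b||^2 + lam*sum_g w_g*||x_g||.
    `groups` is a list of index arrays partitioning the coordinates."""
    x = np.zeros(A.shape[1])
    w = np.ones(len(groups))
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part
    for _ in range(n_outer):
        for _ in range(n_inner):             # inner proximal-gradient solve
            z = x - A.T @ (A @ x - b) / L
            for i, g in enumerate(groups):   # block soft-thresholding per group
                ng = np.linalg.norm(z[g])
                x[g] = 0.0 if ng == 0 else max(0.0, 1 - lam * w[i] / (L * ng)) * z[g]
        # Reweight: small groups get large weights and are pushed to zero.
        w = 1.0 / (np.array([np.linalg.norm(x[g]) for g in groups]) + eps)
    return x
```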
  4. Many of the infrastructure sectors considered crucial by the Department of Homeland Security include networked systems (physical and temporal) that function to move some commodity, like electricity, people, or even communication, from one location of importance to another. The costs associated with these flows make up the price of the network's normal functionality. These networks have limited capacities, which cause the marginal cost of a unit of flow across an edge to increase as congestion builds. In order to limit the expense of a network's normal demand, we aim to increase the resilience of the system, and specifically the resilience of the arc capacities. Divisions of critical infrastructure have faced difficulties in recent years as inadequate resources have been available for needed upgrades and repairs. Without being able to determine future factors that cause damage, both minor and extreme, to the networks, officials must decide how best to allocate the limited funds now so that these essential systems can withstand the heavy weight of society's reliance. We model these resource allocation decisions using a two-stage stochastic program (SP) for the purpose of network protection. Starting with a general form for a basic two-stage SP, we enforce assumptions that specify characteristics key to this type of decision model. The second-stage objective, which represents the price of the network's routine functionality, is nonlinear, as it reflects the increasing marginal cost per unit of additional flow across an arc. After the model has been designed to reflect the network protection problem, we are left with a nonconvex, nonlinear, nonseparable risk-neutral program. This research focuses on key reformulation techniques that transform the problematic model into one that is convex, separable, and much more solvable. Our approach uses perspective functions to convexify the feasibility set of the second stage and second-order conic constraints to represent nonlinear constraints in a form that better allows the use of computational solvers (the perspective-function step is illustrated below). Once these methods have been applied to the risk-neutral model, we introduce a risk measure into the first stage that allows us to control the balance between an efficient, solvable model and the need to hedge against extreme events. Using Benders cuts that exploit linear separability, we give a decomposition and solution algorithm for the general network model. The innovations included in this formulation are then implemented on a transportation network with given flow demand.
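
The perspective-function step mentioned in the abstract can be stated compactly; taking a quadratic arc cost as an illustrative stand-in for the congestion cost:

```latex
% Perspective convexification (illustrative quadratic arc cost).
% For convex $f$, the perspective $g(x,t) = t\,f(x/t)$ ($t > 0$) is jointly convex.
% With $f(x) = x^2$, the second-stage epigraph constraint
%     \frac{x^2}{t} \le s, \qquad s, t \ge 0,
% is equivalent to the second-order cone constraint
%     \left\| \begin{pmatrix} 2x \\ s - t \end{pmatrix} \right\|_2 \le s + t,
% which is the form conic solvers accept (check: squaring both sides gives
% $4x^2 + (s-t)^2 \le (s+t)^2$, i.e. $x^2 \le st$).
```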
  5. Recently, there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization, and in particular the critical role of small random initialization, are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Concretely, we focus on the problem of reconstructing a low-rank matrix from a few measurements via a natural nonconvex formulation. In this setting, we show that the trajectory of the gradient descent iterations from small random initialization can be approximately decomposed into three phases: (I) a spectral or alignment phase, where we show that the iterates have an implicit spectral bias akin to spectral initialization, allowing us to show that at the end of this phase the column space of the iterates and the underlying low-rank matrix are sufficiently aligned; (II) a saddle avoidance/refinement phase, where we show that the trajectory of the gradient iterates moves away from certain degenerate saddle points; and (III) a local refinement phase, where we show that after avoiding the saddles the iterates converge quickly to the underlying low-rank matrix. Underlying our analysis are insights for the analysis of overparameterized nonconvex optimization schemes that may have implications for computational problems beyond low-rank reconstruction. (A small numerical sketch of these phases follows below.)
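
To make the three-phase picture concrete, here is a small numerical sketch of factorized gradient descent from small random initialization on a Gaussian matrix-sensing instance; the dimensions, step size, and initialization scale are arbitrary illustrative choices, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 30, 2, 400                         # ambient size, true rank, #measurements

Ustar = rng.normal(size=(n, r))
Mstar = Ustar @ Ustar.T                      # ground-truth low-rank PSD matrix
A = rng.normal(size=(m, n, n))
y = np.einsum("kij,ij->k", A, Mstar)         # linear measurements <A_k, M*>

alpha, step = 1e-3, 2e-3                     # *small* random init scale, step size
U = alpha * rng.normal(size=(n, n))          # overparameterized factor (full width)
for it in range(1000):
    R = np.einsum("kij,ij->k", A, U @ U.T) - y   # residuals on the measurements
    G = np.einsum("k,kij->ij", R, A) / m         # gradient of the loss w.r.t. U U^T
    U -= step * (G + G.T) @ U                    # chain rule through the factorization

print(np.linalg.norm(U @ U.T - Mstar) / np.linalg.norm(Mstar))  # relative error
```

With a small `alpha`, the early iterations barely change the loss while the column space of `U` rotates toward the top eigendirections of the target, consistent with the alignment phase (I) the abstract describes, before the later iterations refine the fit.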