 NSFPAR ID:
 10416538
 Date Published:
 Journal Name:
Journal of Machine Learning Research
 Volume:
 23
 ISSN:
1533-7928
 Page Range / eLocation ID:
1-44
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

In the low-rank matrix completion (LRMC) problem, the low-rank assumption means that the columns (or rows) of the matrix to be completed are points on a low-dimensional linear algebraic variety. This paper extends this thinking to cases where the columns are points on a low-dimensional nonlinear algebraic variety, a problem we call Low Algebraic Dimension Matrix Completion (LADMC). Matrices whose columns belong to a union of subspaces are an important special case. We propose a LADMC algorithm that leverages existing LRMC methods on a tensorized representation of the data. For example, a second-order tensorized representation is formed by taking the Kronecker product of each column with itself, and we consider higher-order tensorizations as well. This approach succeeds in many cases where traditional LRMC is guaranteed to fail, because the data are low-rank in the tensorized representation but not in the original representation. We also provide a formal mathematical justification for the success of our method. In particular, we give bounds on the rank of these data in the tensorized representation, and we prove sampling requirements that guarantee uniqueness of the solution. We also provide experimental results showing that the new approach outperforms existing state-of-the-art methods for matrix completion under a union-of-subspaces model.
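A minimal sketch of the second-order tensorization the abstract describes (illustrative only, not the authors' full LADMC algorithm; the data dimensions and subspace setup below are assumptions for the example): columns from a union of subspaces can be high-rank in the original representation yet low-rank after each column is lifted to its Kronecker product with itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_per = 10, 50

# Columns drawn from a union of two 2-dimensional subspaces of R^10.
U1, U2 = rng.standard_normal((d, 2)), rng.standard_normal((d, 2))
X = np.hstack([U1 @ rng.standard_normal((2, n_per)),
               U2 @ rng.standard_normal((2, n_per))])

# Second-order tensorized representation: each column x becomes
# kron(x, x), a vector of dimension d^2.
X2 = np.stack([np.kron(x, x) for x in X.T], axis=1)

# The lifted columns of a single r-dimensional subspace live in the
# r(r+1)/2-dimensional space of symmetric tensors, so the union of two
# 2-dimensional subspaces gives rank at most 2 * 3 = 6 in dimension 100.
print("rank of X :", np.linalg.matrix_rank(X))    # 4
print("rank of X2:", np.linalg.matrix_rank(X2))   # 6
```

An observed column with entries known on an index set Omega yields known entries of its lifted column on Omega x Omega, which is what lets a standard LRMC solver be run on the tensorized matrix.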

Summary: In this paper, we propose an efficient numerical scheme for solving some large-scale ill-posed linear inverse problems arising from image restoration. In order to accelerate the computation, two different hidden structures are exploited. First, the coefficient matrix is approximated as the sum of a small number of Kronecker products. This procedure not only introduces one more level of parallelism into the computation but also enables the use of computationally intensive matrix-matrix multiplications in the subsequent optimization procedure. We then derive the corresponding Tikhonov-regularized minimization model and extend the fast iterative shrinkage-thresholding algorithm (FISTA) to solve the resulting optimization problem. Because the matrices appearing in the Kronecker product approximation are all structured matrices (Toeplitz, Hankel, etc.), we can further exploit their fast matrix-vector multiplication algorithms at each iteration. The proposed algorithm is thus called structured FISTA (sFISTA). In particular, we show that the approximation error introduced by sFISTA is well under control and that sFISTA can reach the same image restoration accuracy level as FISTA. Finally, both a theoretical complexity analysis and numerical results are provided to demonstrate the efficiency of sFISTA.
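A minimal sketch of the structure being exploited (assumed shapes, not the paper's implementation): when the coefficient matrix is approximated as a sum of Kronecker products, A ≈ Σ_k B_k ⊗ C_k, the product A @ x reduces to small matrix-matrix multiplications via the identity (B ⊗ C) vec(X) = vec(C X Bᵀ), so A is never formed explicitly.

```python
import numpy as np

def kron_sum_matvec(Bs, Cs, x):
    """Compute (sum_k kron(B_k, C_k)) @ x without forming the Kronecker
    products. Each B_k is p x q, each C_k is m x n, and x has length
    n*q (the column-major vectorization of an n x q image)."""
    n, q = Cs[0].shape[1], Bs[0].shape[1]
    X = x.reshape(n, q, order="F")          # un-vectorize (column-major)
    Y = sum(C @ X @ B.T for B, C in zip(Bs, Cs))
    return Y.reshape(-1, order="F")         # re-vectorize

# Sanity check against the explicit Kronecker sum on a small example.
rng = np.random.default_rng(1)
Bs = [rng.standard_normal((5, 4)) for _ in range(2)]
Cs = [rng.standard_normal((6, 3)) for _ in range(2)]
A = sum(np.kron(B, C) for B, C in zip(Bs, Cs))
x = rng.standard_normal(3 * 4)
assert np.allclose(A @ x, kron_sum_matvec(Bs, Cs, x))
```

When the B_k and C_k are additionally Toeplitz or Hankel, the inner products C @ X and X @ Bᵀ admit fast FFT-based algorithms, which is the second structure the paper exploits.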
We consider the development of practical stochastic quasi-Newton methods, in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs). In DNN training, the number of variables and components of the gradient n is often of the order of tens of millions, and the Hessian has n^2 elements. Consequently, computing and storing a full n x n BFGS approximation, or storing a modest number of (step, change-in-gradient) vector pairs for use in an L-BFGS implementation, is out of the question. In our proposed methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices. This is analogous to the approach in KFAC, which computes a Kronecker-factored block-diagonal approximation to the Fisher matrix in a stochastic natural gradient method. Because of the indefinite and highly variable nature of the Hessian in a DNN, we also propose a new damping approach to keep the upper as well as the lower bounds of the BFGS and L-BFGS approximations bounded. In tests on autoencoder feed-forward network models with either nine or thirteen layers applied to three datasets, our methods outperformed or performed comparably to KFAC and state-of-the-art first-order stochastic methods.
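A minimal sketch of why a Kronecker-factored block-diagonal curvature approximation is tractable (hypothetical shapes and function names, not the authors' implementation): for a layer with an m x n weight matrix, approximating its Hessian block as A ⊗ G, with A only n x n and G only m x m, lets the inverse be applied to the gradient with two small solves instead of storing an (mn) x (mn) block. The damping split below is a common heuristic (used, e.g., in KFAC); the paper proposes its own damping scheme.

```python
import numpy as np

def precondition_gradient(grad_W, A, G, damping=1e-2):
    """Apply an approximate (A kron G + damping*I)^{-1} to vec(grad_W)
    and return the result reshaped to the m x n weight shape.
    The damping is split across the two Kronecker factors."""
    n, m = A.shape[0], G.shape[0]
    lam = np.sqrt(damping)
    A_d = A + lam * np.eye(n)
    G_d = G + lam * np.eye(m)
    # (A kron G)^{-1} vec(X) = vec(G^{-1} X A^{-1}) for column-major vec:
    # first right-solve against A_d, then left-solve against G_d.
    return np.linalg.solve(G_d, np.linalg.solve(A_d.T, grad_W.T).T)

# For m = n = 1000, the two factors hold about 2e6 entries, versus the
# 1e12 entries of the explicit (mn) x (mn) Hessian block.
```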

We address the problem of high-rank matrix completion with side information. In contrast to existing work dealing with side information, which assumes that the data matrix is low-rank, we consider the more general scenario where the columns of the data matrix are drawn from a union of low-dimensional subspaces, which can lead to a high-rank matrix. Our goal is to complete the matrix while taking advantage of the side information. To do so, we use the self-expressive property of the data, searching for a sparse representation of each column of the matrix as a combination of a few other columns. More specifically, we propose a factorization of the data matrix as the product of side information matrices with an unknown interaction matrix, under which each column of the data matrix can be reconstructed using a sparse combination of other columns. As the proposed optimization, which searches for missing entries and sparse coefficients, is non-convex and NP-hard, we propose a lifting framework in which we couple the sparse coefficients and missing values and define an equivalent optimization that is amenable to convex relaxation. We also propose a fast implementation of our convex framework using a Linearized Alternating Direction Method. Through extensive experiments on both synthetic and real data, and in particular by studying the problem of multi-label learning, we demonstrate that our method outperforms existing techniques in both low-rank and high-rank data regimes.
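A minimal sketch of the self-expressive property the method builds on (illustrative only, on fully observed data; the paper's contribution is the lifted convex program that couples the sparse coefficients with the missing entries): each column x_j is written as X @ c_j with c_j sparse and its j-th entry zero, solved here with plain projected ISTA for the lasso.

```python
import numpy as np

def self_expressive_codes(X, lam=0.1, n_iter=500):
    """Sparse coefficient matrix C with X ~= X @ C and zero diagonal,
    minimizing 0.5*||X - X C||_F^2 + lam*||C||_1 by ISTA."""
    n = X.shape[1]
    C = np.zeros((n, n))
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = X.T @ (X @ C - X)                  # gradient of the fit term
        C = C - step * grad
        C = np.sign(C) * np.maximum(np.abs(C) - step * lam, 0.0)  # soft-threshold
        np.fill_diagonal(C, 0.0)                  # forbid trivial self-representation
    return C

# Columns drawn from a union of two subspaces mostly select columns from
# their own subspace, which is what makes the representation useful for
# completion under a union-of-subspaces model.
rng = np.random.default_rng(2)
U1, U2 = rng.standard_normal((20, 3)), rng.standard_normal((20, 3))
X = np.hstack([U1 @ rng.standard_normal((3, 15)),
               U2 @ rng.standard_normal((3, 15))])
C = self_expressive_codes(X)
```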
