 Award ID(s):
 1838179
 NSFPAR ID:
 10310557
 Date Published:
 Journal Name:
 International Conference on Learnining Representations (ICLR)
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

We develop a convex analytic approach to analyze finite width twolayer ReLU networks. We first prove that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set, where simple solutions are encouraged via its convex geometrical properties. We then leverage this characterization to show that an optimal set of parameters yield linear spline interpolation for regression problems involving one dimensional or rankone data. We also characterize the classification decision regions in terms of a kernel matrix and minimum `1norm solutions. This is in contrast to Neural Tangent Kernel which is unable to explain predictions of finite width networks. Our convex geometric characterization also provides intuitive explanations of hidden neurons as autoencoders. In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints. Then, we apply certain convex relaxations and introduce a cuttingplane algorithm to globally optimize the network. We further analyze the exactness of the relaxations to provide conditions for the convergence to a global optimum. Our analysis also shows that optimal network parameters can be also characterized as interpretable closedform formulas in some practically relevant special cases.more » « less

null (Ed.)We develop exact representations of training twolayer neural networks with rectified linear units (ReLUs) in terms of a single convex program with number of variables polynomial in the number of training samples and the number of hidden neurons. Our theory utilizes semiinfinite duality and minimum norm regularization. We show that ReLU networks trained with standard weight decay are equivalent to block `1 penalized convex models. Moreover, we show that certain standard convolutional linear networks are equivalent semidefinite programs which can be simplified to `1 regularized linear models in a polynomial sized discrete Fourier feature space.more » « less

Abstract Given the costs and a feasible solution for a minimum cost flow problem on a countably infinite network, inverse optimization involves finding new costs that are close to the original ones and that make the given solution optimal. We study this problem using the weighted absolute sum metric to quantify closeness of cost vectors. We provide sufficient conditions under which known results from inverse optimization in minimum cost flow problems on finite networks extend to the countably infinite case. These conditions ensure that recent duality results on countably infinite linear programs can be applied to our setting. Specifically, they enable us to prove that the inverse optimization problem can be reformulated as a capacitated, minimum cost circulation problem on a countably infinite network. Finite‐dimensional truncations of this problem can be solved in polynomial time when the weights equal one, which yields an efficient solution method. The circulation problem can also be solved via the shadow simplex method, where each finite‐dimensional truncation is tackled using the usual network Simplex algorithm that is empirically known to be computationally efficient. We illustrate these results on an infinite horizon shortest path problem.

We study training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations with a polynomial complexity with respect to the number of data samples, the number of neurons, and data dimension. More specifically, we develop a convex analytic framework utilizing semiinfinite duality to obtain equivalent convex optimization problems for several two and threelayer CNN architectures. We first prove that twolayer CNNs can be globally optimized via an `2 norm regularized convex program. We then show that multilayer circular CNN training problems with a single ReLU layer are equivalent to an `1 regularized convex program that encourages sparsity in the spectral domain. We also extend these results to threelayer CNNs with two ReLU layers. Furthermore, we present extensions of our approach to different pooling methods, which elucidates the implicit architectural bias as convex regularizers.more » « less

null (Ed.)We develop a convex analytic framework for ReLU neural networks which elucidates the inner workings of hidden neurons and their function space characteristics. We show that neural networks with rectified linear units act as convex regularizers, where simple solutions are encouraged via extreme points of a certain convex set. For one dimensional regression and classification, as well as rankone data matrices, we prove that finite twolayer ReLU networks with norm regularization yield linear spline interpolation. We characterize the classification decision regions in terms of a closed form kernel matrix and minimum L1 norm solutions. This is in contrast to Neural Tangent Kernel which is unable to explain neural network predictions with finitely many neurons. Our convex geometric description also provides intuitive explanations of hidden neurons as auto encoders. In higher dimensions, we show that the training problem for twolayer networks can be cast as a finite dimensional convex optimization problem with infinitely many constraints. We then provide a family of convex relaxations to approximate the solution, and a cuttingplane algorithm to improve the relaxations. We derive conditions for the exactness of the relaxations and provide simple closed form formulas for the optimal neural network weights in certain cases. We also establish a connection to ℓ0ℓ1 equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing. Extensive experimental results show that the proposed approach yields interpretable and accurate models.more » « less