Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time
We study the training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations whose complexity is polynomial in the number of data samples, the number of neurons, and the data dimension. More specifically, we develop a convex analytic framework utilizing semi-infinite duality to obtain equivalent convex optimization problems for several two- and three-layer CNN architectures. We first prove that two-layer CNNs can be globally optimized via an ℓ2 norm regularized convex program. We then show that multi-layer circular CNN training problems with a single ReLU layer are equivalent to an ℓ1 regularized convex program that encourages sparsity in the spectral domain. We also extend these results to three-layer CNNs with two ReLU layers. Furthermore, we present extensions of our approach to different pooling methods, which elucidate the implicit architectural bias as convex regularizers.
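The spectral-domain claim for circular CNNs rests on a standard fact: circular convolution is diagonalized by the discrete Fourier transform, so a circular convolutional layer acts as a linear model with one weight per frequency, and an ℓ1-type penalty on those weights promotes sparsity over frequencies. The short NumPy check below is only an illustration of this fact, not code from the paper; the signal length d = 8 and the length-3 filter are arbitrary choices made for the sketch.

import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)        # one input signal
u = np.zeros(d)
u[:3] = rng.standard_normal(3)    # length-3 filter, zero-padded to the full circle

# Circular convolution computed directly in the signal domain.
y_direct = np.array([sum(x[m] * u[(n - m) % d] for m in range(d)) for n in range(d)])

# The same operation via the DFT: pointwise multiplication of the two spectra.
y_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(u)).real
assert np.allclose(y_direct, y_fft)

# A circular convolutional layer is therefore a linear map on the DFT features of x,
# with per-frequency weights given by the DFT of the filter; penalizing these weights
# with an l1-type norm encourages sparsity in the spectral domain.
print(np.fft.fft(u))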
NSF-PAR ID: 10310558
Journal Name: International Conference on Learning Representations (ICLR) 2021
We develop exact representations of the training of two-layer neural networks with rectified linear units (ReLUs) in terms of a single convex program whose number of variables is polynomial in the number of training samples and the number of hidden neurons. Our theory utilizes semi-infinite duality and minimum norm regularization. We show that ReLU networks trained with standard weight decay are equivalent to block ℓ1 penalized convex models. Moreover, we show that certain standard convolutional linear networks are equivalent to semidefinite programs, which can be simplified to ℓ1 regularized linear models in a polynomial-sized discrete Fourier feature space.
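As a concrete, hedged illustration of the equivalence described above, consider the squared-loss, scalar-output case; the notation D_i, P, and β below is assumed for this sketch, and the precise statement and conditions are in the paper. Training a two-layer ReLU network with m neurons and weight decay,

\[
\min_{\{u_j,\,\alpha_j\}_{j=1}^{m}} \ \frac{1}{2}\Big\|\sum_{j=1}^{m}(X u_j)_+\,\alpha_j - y\Big\|_2^2 \;+\; \frac{\beta}{2}\sum_{j=1}^{m}\big(\|u_j\|_2^2 + \alpha_j^2\big),
\]

attains, for m above a data-dependent critical width, the same optimal value as the finite convex program

\[
\min_{\{v_i,\,w_i\}_{i=1}^{P}} \ \frac{1}{2}\Big\|\sum_{i=1}^{P} D_i X\,(v_i - w_i) - y\Big\|_2^2 \;+\; \beta\sum_{i=1}^{P}\big(\|v_i\|_2 + \|w_i\|_2\big)
\quad \text{s.t.}\quad (2D_i - I_n)X v_i \ge 0,\ (2D_i - I_n)X w_i \ge 0,\ i = 1,\dots,P,
\]

where X \in \mathbb{R}^{n\times d} is the data matrix and the diagonal matrices D_i \in \{0,1\}^{n\times n} enumerate the P possible ReLU activation patterns (hyperplane arrangements) induced by X, with P polynomial in n for fixed rank of X. The group regularizer \sum_i(\|v_i\|_2 + \|w_i\|_2) is the block ℓ1 penalty referred to above; in the convolutional settings the data matrix is replaced by patch matrices, and for circular linear convolutions the program reduces to an ℓ1 regularized model over discrete Fourier features.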