

Title: Sparse and smooth signal estimation: Convexification of L0 formulations
Signal estimation problems with smoothness and sparsity priors can be naturally modeled as quadratic optimization with L0-“norm” constraints. Since such problems are non-convex and hard to solve, the standard approach is, instead, to tackle their convex surrogates based on L1-norm relaxations. In this paper, we propose new iterative (convex) conic quadratic relaxations that exploit not only the L0-“norm” terms, but also the fitness and smoothness functions. The iterative convexification approach substantially closes the gap between the L0-“norm” and its L1 surrogate. These stronger relaxations lead to significantly better estimators than L1-norm approaches and also allow one to utilize affine sparsity priors. In addition, the parameters of the model and the resulting estimators are easily interpretable. Experiments with a tailored Lagrangian decomposition method indicate that the proposed iterative convex relaxations yield solutions within 1% of the exact L0 approach and can tackle instances with up to 100,000 variables in under one minute.
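For concreteness, a representative form of the estimation problem described in the abstract combines a quadratic fitness term, a quadratic smoothness term, and an L0 sparsity term; the standard convex surrogate replaces the L0 term with the L1 norm. The notation below is illustrative rather than quoted from the paper, whose formulation also accommodates affine sparsity priors:

```latex
% Illustrative L0 formulation (notation assumed, not quoted from the paper):
\min_{x \in \mathbb{R}^n} \;
  \underbrace{\|y - x\|_2^2}_{\text{fitness}}
  + \lambda \underbrace{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}_{\text{smoothness}}
  + \mu \underbrace{\|x\|_0}_{\text{sparsity}}

% Standard L1 surrogate of the same problem:
\min_{x \in \mathbb{R}^n} \;
  \|y - x\|_2^2
  + \lambda \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2
  + \mu \|x\|_1
```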
Award ID(s): 1818700
NSF-PAR ID: 10289594
Author(s) / Creator(s): ; ;
Editor(s): Mirrokni, V
Date Published:
Journal Name: Journal of Machine Learning Research
Volume: 22
Issue: 52
ISSN: 1532-4435
Page Range / eLocation ID: 1-43
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Sparsification of neural networks is an effective complexity-reduction method for improving efficiency and generalizability. We consider the problem of learning a one-hidden-layer convolutional neural network with ReLU activation function via gradient descent under sparsity-promoting penalties. It is known that when the input data is Gaussian distributed, no-overlap networks (without penalties) in regression problems with ground truth can be learned in polynomial time with high probability. We propose a relaxed variable splitting method integrating thresholding and gradient descent to overcome the non-smoothness of the loss function. Sparsity in the network weights is realized during the optimization (training) process. We prove that under L1, L0, and transformed-L1 penalties, no-overlap networks can be learned with high probability, and the iterative weights converge to a global limit which is a transformation of the true weight under a novel thresholding operation. Numerical experiments confirm the theoretical findings and compare the accuracy and sparsity trade-off among the penalties.
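As a rough illustration of the penalties named above, here is a minimal sketch of the soft- and hard-thresholding operators typically associated with the L1 and L0 penalties. The paper's relaxed variable splitting scheme and its transformed-L1 operator are not reproduced; the commented update below is only a generic penalized gradient step with assumed names (grad_loss, lr, mu):

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrinks entries toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def hard_threshold(w, t):
    """Proximal operator of the L0 penalty: zeroes out small entries."""
    return np.where(np.abs(w) > t, w, 0.0)

# Hypothetical use inside a penalized gradient step (grad_loss, lr, mu assumed):
# w = hard_threshold(w - lr * grad_loss(w), lr * mu)
```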
  2. We develop a convex analytic framework for ReLU neural networks which elucidates the inner workings of hidden neurons and their function space characteristics. We show that neural networks with rectified linear units act as convex regularizers, where simple solutions are encouraged via extreme points of a certain convex set. For one-dimensional regression and classification, as well as rank-one data matrices, we prove that finite two-layer ReLU networks with norm regularization yield linear spline interpolation. We characterize the classification decision regions in terms of a closed-form kernel matrix and minimum L1-norm solutions. This is in contrast to the Neural Tangent Kernel, which is unable to explain neural network predictions with finitely many neurons. Our convex geometric description also provides intuitive explanations of hidden neurons as autoencoders. In higher dimensions, we show that the training problem for two-layer networks can be cast as a finite-dimensional convex optimization problem with infinitely many constraints. We then provide a family of convex relaxations to approximate the solution, and a cutting-plane algorithm to improve the relaxations. We derive conditions for the exactness of the relaxations and provide simple closed-form formulas for the optimal neural network weights in certain cases. We also establish a connection to ℓ0-ℓ1 equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing. Extensive experimental results show that the proposed approach yields interpretable and accurate models.
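To illustrate the flavor of the convex reformulation described above, here is a minimal sketch that fits a two-layer ReLU network by fixing a sampled set of ReLU activation patterns and solving a group-norm-regularized convex program. This subsampled relaxation follows the general idea in the abstract but is not the paper's exact program, relaxation family, or cutting-plane algorithm; the problem sizes and the weight beta are assumptions:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, m = 40, 3, 20                        # samples, input dim, sampled ReLU patterns
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
beta = 0.1                                 # regularization weight (assumed)

# Sample candidate activation patterns from random hyperplanes through the data.
U = rng.standard_normal((d, m))
D = (X @ U >= 0).astype(float)             # n x m matrix of 0/1 activation patterns

V = cp.Variable((d, m))                    # weights of "positive" neurons
W = cp.Variable((d, m))                    # weights of "negative" neurons
pred = cp.sum(cp.multiply(D, X @ (V - W)), axis=1)
reg = cp.sum(cp.norm(V, 2, axis=0) + cp.norm(W, 2, axis=0))
constraints = [cp.multiply(2 * D - 1, X @ V) >= 0,   # keep each neuron consistent
               cp.multiply(2 * D - 1, X @ W) >= 0]   # with its fixed activation pattern
problem = cp.Problem(cp.Minimize(cp.sum_squares(pred - y) + beta * reg), constraints)
problem.solve()
```

Enumerating all activation patterns rather than a random sample would give a finite-dimensional convex program; sampling only a subset, as here, yields a tractable relaxation.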
  3. Gravity surveys constitute an important method for investigating the Earth's interior based on density contrasts related to differences in Earth materials. Because lithology depends on the environment and the period of formation, there are generally clear boundaries between rocks with different lithologies. Inversions with convex functions approximating the L0 norm are used to detect boundaries in reconstructed models. Optimal solutions are easy to find thanks to the convex transformations; however, the volume of the reconstructed model depends on the weighting parameter and the density constraint rather than on the model sparsity. To determine and adapt the modelling size, a novel non-convex framework for gravity inversion is proposed. The proposed optimization aims to directly reduce the L0 norm of the density matrix. An improved iterative hard thresholding algorithm is developed to linearly reduce the L0 penalty during the inner iterations. Accordingly, it is possible to determine the modelling scale during the iteration and achieve an expected scale for the reconstructed model. Both simple and complex model experiments demonstrate that the proposed method efficiently reconstructs models. In addition, granites formed during the Yanshanian and Indosinian periods in the Nanling region, China, are reconstructed, with the modelling size evaluated in agreement with the magnetotelluric profile and density statistics of rock samples. The known ores occur at the contact zones between the sedimentary rocks and the reconstructed Yanshanian granites. The ore-forming bodies, periods, and processes are identified, providing guidance for further deep resource exploration in the study area.
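For reference, a minimal sketch of generic iterative hard thresholding for a linear inverse problem with an L0 constraint; the paper's improved variant, which linearly tightens the L0 penalty within the inner iterations, and the gravity-specific forward operator are not reproduced, and the operator A, data b, and sparsity level k below are assumptions:

```python
import numpy as np

def iterative_hard_thresholding(A, b, k, iters=200, step=None):
    """Generic IHT sketch: approximately minimize ||A x - b||^2 s.t. ||x||_0 <= k."""
    n = A.shape[1]
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / ||A||_2^2 keeps the gradient steps stable
    x = np.zeros(n)
    for _ in range(iters):
        x = x - step * (A.T @ (A @ x - b))         # gradient step on the data-fit term
        keep = np.argsort(np.abs(x))[-k:]          # indices of the k largest-magnitude entries
        mask = np.zeros(n, dtype=bool)
        mask[keep] = True
        x[~mask] = 0.0                             # hard thresholding: enforce the sparsity level
    return x
```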
  4. Sparsification of neural networks is an effective complexity-reduction method for improving efficiency and generalizability. Binarized activation offers an additional computational saving for inference. Due to the vanishing gradient issue in training networks with binarized activation, a coarse gradient (a.k.a. straight-through estimator) is adopted in practice. In this paper, we study the problem of coarse gradient descent (CGD) learning of a one-hidden-layer convolutional neural network (CNN) with binarized activation function and sparse weights. It is known that when the input data is Gaussian distributed, a no-overlap one-hidden-layer CNN with ReLU activation and general weights can be learned by GD in polynomial time with high probability in regression problems with ground truth. We propose a relaxed variable splitting method integrating thresholding and coarse gradient descent. Sparsity in the network weights is realized through thresholding during the CGD training process. We prove that under thresholding of L1, L0, and transformed-L1 penalties, a no-overlap binary activation CNN can be learned with high probability, and the iterative weights converge to a global limit which is a transformation of the true weight under a novel sparsifying operation. We also obtain explicit error estimates of the sparse weights relative to the true weights.
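As a generic illustration of the coarse gradient (straight-through estimator) idea mentioned above, here is a minimal PyTorch sketch of a binarized activation whose forward pass uses the hard 0/1 step function while the backward pass substitutes a surrogate gradient; this is a standard construction, not the paper's specific CNN model or analysis:

```python
import torch

class BinarizedActivation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()                       # hard 0/1 activation in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Straight-through surrogate: pass gradients only where |x| is small.
        return grad_output * (x.abs() <= 1).float()

binarize = BinarizedActivation.apply                 # usable like any other activation function
```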
  5. We study the low-rank phase retrieval problem, where our goal is to recover a $d_1\times d_2$ low-rank matrix from a series of phaseless linear measurements. This is a fourth-order inverse problem, as we are trying to recover factors of a matrix that have been observed, indirectly, through some quadratic measurements. We propose a solution to this problem using the recently introduced technique of anchored regression. This approach uses two different types of convex relaxations: we replace the quadratic equality constraints for the phaseless measurements by a search over a polytope and enforce the rank constraint through nuclear norm regularization. The result is a convex program in the space of $d_1 \times d_2$ matrices. We analyze two specific scenarios. In the first, the target matrix is rank-$1$, and the observations are structured to correspond to a phaseless blind deconvolution. In the second, the target matrix has general rank, and we observe the magnitudes of the inner products against a series of independent Gaussian random matrices. In each of these problems, we show that anchored regression returns an accurate estimate from a near-optimal number of measurements given that we have access to an anchor matrix of sufficient quality. We also show how to create such an anchor in the phaseless blind deconvolution problem from an optimal number of measurements and present a partial result in this direction for the general rank problem.
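To give one possible reading of the convex program described above (phaseless quadratic constraints relaxed to a polytope, rank enforced via nuclear-norm regularization), here is a minimal CVXPY sketch; the anchor matrix, the measurement ensemble, the weight lam, and the problem sizes are all assumptions for illustration, not the paper's exact formulation or guarantees:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
d1, d2, m = 6, 5, 60
X_true = np.outer(rng.standard_normal(d1), rng.standard_normal(d2))       # rank-1 target
B = rng.standard_normal((m, d1, d2))                                       # Gaussian measurement matrices
y = np.abs(np.tensordot(B, X_true, axes=([1, 2], [0, 1])))                 # phaseless data |<B_i, X_true>|
X_anchor = X_true + 0.3 * rng.standard_normal((d1, d2))                    # imperfect anchor (assumed available)
lam = 0.5                                                                   # nuclear-norm weight (assumed)

X = cp.Variable((d1, d2))
inner = cp.hstack([cp.sum(cp.multiply(B[i], X)) for i in range(m)])         # <B_i, X> for each measurement
objective = cp.Maximize(cp.sum(cp.multiply(X_anchor, X)) - lam * cp.normNuc(X))
constraints = [cp.abs(inner) <= y]                                          # polytope relaxation of |<B_i, X>| = y_i
cp.Problem(objective, constraints).solve(solver=cp.SCS)                     # SCS handles the nuclear-norm cone
```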