Title: Training Quantized Neural Networks to Global Optimality via Semidefinite Programming
Neural networks (NNs) have been extremely successful across many tasks in machine learning. Quantization of NN weights has become an important topic due to its impact on their energy efficiency, inference time and deployment on hardware. Although post-training quantization is well-studied, training optimal quantized NNs involves combinatorial non-convex optimization problems which appear intractable. In this work, we introduce a convex optimization strategy to train quantized NNs with polynomial activations. Our method leverages hidden convexity in two-layer neural networks from the recent literature, semidefinite lifting, and Grothendieck’s identity. Surprisingly, we show that certain quantized NN problems can be solved to global optimality provably in polynomial time in all relevant parameters via tight semidefinite relaxations. We present numerical examples to illustrate the effectiveness of our method.
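To make the lifting idea concrete, below is a minimal sketch (an illustrative assumption, not the paper's exact formulation) for a single quadratic-activation neuron with binary weights: since the loss depends on the weight vector w only through w w^T, the training problem relaxes to an SDP over a unit-diagonal PSD matrix Z, and a Gaussian rounding step in the spirit of Grothendieck's identity extracts binary weights. The data model, dimensions, and cvxpy usage are all illustrative choices.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 5                          # samples, input dimension (illustrative)
X = rng.standard_normal((n, d))
w_true = rng.choice([-1.0, 1.0], size=d)
y = (X @ w_true) ** 2                 # labels from a planted binary-weight neuron

# Lifted variable Z stands in for w w^T; diag(Z) = 1 encodes w_i in {-1, +1}.
Z = cp.Variable((d, d), PSD=True)
preds = cp.sum(cp.multiply(X @ Z, X), axis=1)   # x_i^T Z x_i for each sample
prob = cp.Problem(cp.Minimize(cp.sum_squares(preds - y)), [cp.diag(Z) == 1])
prob.solve()

# Gaussian rounding: sample g ~ N(0, Z) and take signs, in the spirit of
# Grothendieck-style rounding arguments.
L = np.linalg.cholesky(Z.value + 1e-8 * np.eye(d))
w_hat = np.sign(L @ rng.standard_normal(d))
print("rounded fit error:", np.linalg.norm((X @ w_hat) ** 2 - y))
```

When the relaxation is tight the optimal Z is rank one, so the rounding step recovers the planted weights exactly; tightness of this kind is what the paper establishes for its setting.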
Award ID(s):
2037304
PAR ID:
10350300
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
162
ISSN:
2640-3498
Page Range / eLocation ID:
694-704
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose an enhanced semidefinite program (SDP) relaxation to enable tight and efficient verification of neural networks (NNs). The tightness improvement is achieved by introducing a nonlinear constraint to existing SDP relaxations previously proposed for NN verification. The efficiency of the proposal stems from the iterative nature of the proposed algorithm, which solves the resulting non-convex SDP by recursively solving auxiliary convex layer-based SDP problems. We show formally that the solution generated by our algorithm is tighter than state-of-the-art SDP-based solutions for the problem. We also show that the solution sequence converges to the optimal solution of the non-convex enhanced SDP relaxation. Experimental results on standard benchmarks in the area show that our algorithm achieves state-of-the-art performance whilst maintaining an acceptable computational cost.
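For orientation, here is a hedged cvxpy sketch of the standard layer-based SDP relaxation for bounding a one-layer ReLU network over a box, the kind of baseline relaxation the entry above tightens; the enhanced nonlinear constraint itself is not reproduced here. The weights, bounds, and dimensions are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 4                           # input dim, hidden width (illustrative)
W = rng.standard_normal((m, d))       # hypothetical layer weights
b = rng.standard_normal(m)
c = rng.standard_normal(m)            # output weights whose response we bound
l, u = -np.ones(d), np.ones(d)        # input box l <= x <= u

# Lift v = [1; x; z] (with z = relu(Wx + b)) into a PSD matrix P ~ v v^T.
P = cp.Variable((1 + d + m, 1 + d + m), PSD=True)
x, z = P[0, 1:1 + d], P[0, 1 + d:]
Xx = P[1:1 + d, 1:1 + d]              # block ~ x x^T
M = P[1:1 + d, 1 + d:]                # block ~ x z^T
Zz = P[1 + d:, 1 + d:]                # block ~ z z^T

cons = [
    P[0, 0] == 1,
    x >= l, x <= u,
    cp.diag(Xx) <= cp.multiply(l + u, x) - l * u,       # lifted box constraint
    z >= 0, z >= W @ x + b,                             # ReLU lower envelope
    cp.diag(Zz) == cp.diag(W @ M) + cp.multiply(b, z),  # z*(z - Wx - b) = 0, lifted
]
prob = cp.Problem(cp.Maximize(c @ z), cons)
prob.solve()
print("certified upper bound on c^T relu(Wx + b):", prob.value)
```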
  2. We demonstrate digital predistortion (DPD) using a novel neural-network (NN) method to combat the nonlinearities in power amplifiers (PAs), which limit the power efficiency of mobile devices, increase the error vector magnitude, and cause inadequate spectral containment. DPD is commonly done with polynomial-based methods that use an indirect-learning architecture (ILA), which can be computationally intensive, especially for mobile devices, and overly sensitive to noise. Our approach using NNs avoids the problems associated with ILAs by first training a NN to model the PA and then training a predistorter by backpropagating through the PA NN model. The NN DPD effectively learns the unique PA distortions, which may not easily fit a polynomial-based model, and hence may offer a favorable tradeoff between computation overhead and DPD performance. We demonstrate the performance of our NN method using two different power amplifier systems and investigate the complexity tradeoffs.
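A hedged PyTorch sketch of the two-stage recipe in the entry above, using a stand-in tanh nonlinearity as the PA and real-valued signals for brevity (an actual DPD operates on complex baseband I/Q samples): first fit a NN model of the PA, then freeze it and train the predistorter by backpropagating through the frozen model toward a linear target response.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1, 1, 512).unsqueeze(1)   # training signal (illustrative)
pa_out = torch.tanh(2.0 * x)                  # stand-in nonlinear PA response

def mlp():
    return nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

# Stage 1: fit a NN model of the PA from input/output pairs.
pa_model = mlp()
opt = torch.optim.Adam(pa_model.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(pa_model(x), pa_out)
    loss.backward()
    opt.step()

# Stage 2: freeze the PA model and train the predistorter through it,
# targeting a linear response g * x (g = 0.9 here, an arbitrary choice).
for p in pa_model.parameters():
    p.requires_grad_(False)
dpd = mlp()
opt = torch.optim.Adam(dpd.parameters(), lr=1e-2)
target = 0.9 * x
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(pa_model(dpd(x)), target)
    loss.backward()
    opt.step()
print("linearization MSE:", nn.functional.mse_loss(pa_model(dpd(x)), target).item())
```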
  3. Decentralized optimization has recently attracted much attention in machine learning because it is more communication-efficient than the centralized fashion. Quantization is a promising method to further reduce the communication cost by compressing the gradients exchanged in each communication round, and some quantized decentralized algorithms have recently been studied. However, quantized decentralized algorithms for non-convex constrained machine learning problems are still limited. The Frank-Wolfe (a.k.a. conditional gradient or projection-free) method is very efficient for many constrained optimization tasks, such as training low-rank or sparsity-constrained models. In this paper, to fill this gap in decentralized quantized constrained optimization, we propose a novel communication-efficient Decentralized Quantized Stochastic Frank-Wolfe (DQSFW) algorithm for non-convex constrained learning models. We first construct a counterexample showing that the vanilla decentralized quantized stochastic Frank-Wolfe algorithm can diverge, and we therefore equip DQSFW with a gradient tracking technique that guarantees convergence to a stationary point of the non-convex problem. In our theoretical analysis, we prove that DQSFW achieves the same gradient complexity as the standard stochastic Frank-Wolfe and centralized Frank-Wolfe algorithms, but with much lower communication cost. Experiments on matrix completion and model compression applications demonstrate the efficiency of our new algorithm.
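The sketch below combines the ingredients named in the entry above (decentralized gossip, a stochastic quantizer, a Frank-Wolfe linear oracle over an l1 ball, and a gradient tracking variable) on a toy least-squares problem. It is an illustrative composition, not the paper's exact DQSFW; the quantizer, mixing matrix, and step sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, radius = 4, 10, 5.0            # nodes, dimension, l1-ball radius (illustrative)
A = rng.standard_normal((n, 20, d))  # local least-squares data per node
b = rng.standard_normal((n, 20))
Wmix = np.zeros((n, n))              # doubly stochastic ring gossip matrix
for i in range(n):
    Wmix[i, i] = 0.5
    Wmix[i, (i + 1) % n] = Wmix[i, (i - 1) % n] = 0.25

def grad(i, w):                      # local least-squares gradient
    return A[i].T @ (A[i] @ w - b[i])

def quantize(v, levels=64):          # illustrative stochastic rounding quantizer
    s = np.abs(v).max() + 1e-12
    q = np.floor(np.abs(v) / s * levels + rng.random(v.shape))
    return np.sign(v) * q * s / levels

def lmo(g):                          # Frank-Wolfe linear oracle for the l1 ball
    e = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    e[i] = -radius * np.sign(g[i])
    return e

w = np.zeros((n, d))
g = np.array([grad(i, w[i]) for i in range(n)])   # gradient tracking variable
for t in range(200):
    gamma = 2.0 / (t + 2)                         # classic FW step size
    w_new = Wmix @ np.array([quantize(wi) for wi in w])      # quantized gossip
    w_new += gamma * (np.array([lmo(g[i]) for i in range(n)]) - w_new)
    g = Wmix @ np.array([quantize(gi) for gi in g]) + np.array(
        [grad(i, w_new[i]) - grad(i, w[i]) for i in range(n)])  # tracking update
    w = w_new
print("consensus error:", np.linalg.norm(w - w.mean(0)))
```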
  4. We develop exact representations of training two-layer neural networks with rectified linear units (ReLUs) in terms of a single convex program with a number of variables polynomial in the number of training samples and the number of hidden neurons. Our theory utilizes semi-infinite duality and minimum norm regularization. We show that ReLU networks trained with standard weight decay are equivalent to block ℓ1-penalized convex models. Moreover, we show that certain standard convolutional linear networks are equivalent to semidefinite programs which can be simplified to ℓ1-regularized linear models in a polynomial-sized discrete Fourier feature space.
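A minimal cvxpy sketch of the block-ℓ1 convex program described in the entry above, under a stated simplification: the exact reformulation enumerates every hyperplane arrangement pattern of the data, whereas this sketch samples a handful of patterns at random, which is a common heuristic rather than the exact equivalent program.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, P, beta = 20, 3, 8, 1e-3        # samples, dim, sampled patterns, regularizer
X = rng.standard_normal((n, d))
y = np.maximum(X @ rng.standard_normal(d), 0)   # planted ReLU labels

# Candidate ReLU activation patterns D = 1[Xu >= 0] from random hyperplanes.
U = rng.standard_normal((d, P))
D = (X @ U >= 0).astype(float)        # n x P mask matrix

V = cp.Variable((d, P))               # positive-branch neurons
Wv = cp.Variable((d, P))              # negative-branch neurons
resid = cp.sum(cp.multiply(D, X @ (V - Wv)), axis=1) - y
reg = cp.sum(cp.norm(V, axis=0)) + cp.sum(cp.norm(Wv, axis=0))  # block l1 penalty
cons = []
for i in range(P):
    Di = np.diag(2 * D[:, i] - 1)     # pattern consistency: (2D - I) X v >= 0
    cons += [Di @ X @ V[:, i] >= 0, Di @ X @ Wv[:, i] >= 0]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(resid) + beta * reg), cons)
prob.solve()
print("training loss:",
      0.5 * np.sum((np.sum(D * (X @ (V.value - Wv.value)), 1) - y) ** 2))
```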