Title: Conic Descent Redux for Memory-Efficient Optimization
Conic programming has well-documented merits in a gamut of signal processing and machine learning tasks. This contribution revisits a recently developed first-order conic descent (CD) solver and advances it in three aspects: intuition, theory, and algorithmic implementation. It is found that CD affords an intuitive geometric derivation that originates from the dual problem. This opens the door to novel algorithmic designs, exemplified by a momentum variant of CD, momentum conic descent (MOCO). Diving deeper into the dual behavior of CD and MOCO reveals: i) an analytically justified stopping criterion; and ii) the potential to design preconditioners that speed up dual convergence. Lastly, to scale semidefinite programming (SDP), especially for low-rank solutions, a memory-efficient MOCO variant is developed and numerically validated.
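The paper's CD and MOCO updates are not reproduced in this record. As a rough illustration of the generic idea the abstract gestures at, adding momentum to a first-order method over a cone, the sketch below applies heavy-ball momentum to projected gradient descent on the PSD cone; the toy objective, step sizes, and all function names are illustrative assumptions, not the authors' algorithm.

import numpy as np

def project_psd(X):
    # Project a symmetric matrix onto the PSD cone by clipping eigenvalues.
    w, V = np.linalg.eigh((X + X.T) / 2)
    return (V * np.clip(w, 0, None)) @ V.T

def momentum_psd_descent(grad_f, X0, step=0.01, beta=0.9, iters=500):
    # min f(X) s.t. X PSD, via projected gradient with heavy-ball momentum.
    # NOT the paper's MOCO; a generic momentum-plus-cone illustration only.
    X, M = X0.copy(), np.zeros_like(X0)
    for _ in range(iters):
        M = beta * M - step * grad_f(X)   # momentum buffer
        X = project_psd(X + M)            # conic (PSD) projection
    return X

# Toy example: recover a low-rank PSD matrix from a noisy observation A.
rng = np.random.default_rng(0)
U = rng.standard_normal((20, 2))
A = U @ U.T + 0.01 * rng.standard_normal((20, 20))
X_hat = momentum_psd_descent(lambda X: X - (A + A.T) / 2, np.zeros((20, 20)))
print("rank of solution:", np.linalg.matrix_rank(X_hat, tol=1e-3))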
Ji, Ziwei; Srebro, Nathan; Telgarsky, Matus
(Proceedings of Machine Learning Research)
We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of O(1/t^2). This contrasts with a rate of O(1/log(t)) for standard gradient descent, and O(1/t) for normalized gradient descent. The momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual, which results in a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.
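As a minimal sketch of the setting this abstract describes, the following applies a standard Nesterov-style extrapolation to gradient descent on the exponential loss of a linear classifier over well-separated synthetic data, and reports the normalized margin. The data, step size, and momentum schedule are illustrative assumptions, not the paper's exact primal update.

import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(3, 0.5, (50, 2)), rng.normal(-3, 0.5, (50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

def exp_loss_grad(w):
    # Gradient of (1/n) * sum_i exp(-y_i * x_i' w).
    margins = y * (X @ w)
    return -(X.T @ (y * np.exp(-margins))) / len(y)

w, w_prev = np.zeros(2), np.zeros(2)
for t in range(1, 2001):
    v = w + (t - 1) / (t + 2) * (w - w_prev)   # Nesterov extrapolation
    w_prev, w = w, v - 0.1 * exp_loss_grad(v)

# Normalized margin: min_i y_i x_i' w / ||w||; grows as the margin is maximized.
print("normalized margin:", (y * (X @ w)).min() / np.linalg.norm(w))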
Augmented Lagrangian Methods (ALMs) are widely employed in solving constrained optimization problems, and several efficient solvers have been developed within this framework. Under the quadratic growth assumption, it is known that the dual iterates and the Karush–Kuhn–Tucker (KKT) residuals of ALMs applied to conic programs converge linearly. In contrast, the convergence rate of the primal iterates has remained elusive. In this paper, we resolve this challenge by establishing new quadratic growth and error bound properties for primal and dual conic programs under the standard strict complementarity condition. Our main results reveal that both primal and dual iterates of the ALMs converge linearly, contingent solely on the assumptions of strict complementarity and a bounded solution set. This finding provides a positive answer to an open question regarding the asymptotically linear convergence of the primal iterates of ALMs applied to conic optimization.
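For context, a plain augmented Lagrangian iteration on an equality-constrained quadratic program is sketched below: an exact primal minimization of the augmented Lagrangian followed by the dual multiplier update whose linear convergence is the subject of this abstract. The toy problem and the penalty parameter rho are assumptions for illustration, not the paper's conic setting.

import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 3
B = rng.standard_normal((n, n))
Q = np.eye(n) + 0.1 * B @ B.T     # positive definite objective Hessian
c = rng.standard_normal(n)
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

# min 0.5 x'Qx - c'x  s.t.  Ax = b, via the augmented Lagrangian
# L_rho(x, lam) = f(x) + lam'(Ax - b) + (rho/2)||Ax - b||^2.
rho, lam = 10.0, np.zeros(m)
for _ in range(50):
    # Primal step: minimize L_rho in x exactly (here, a linear solve).
    x = np.linalg.solve(Q + rho * A.T @ A, c - A.T @ lam + rho * A.T @ b)
    # Dual step: gradient ascent on the dual with step size rho.
    lam += rho * (A @ x - b)

print("KKT residual ||Ax - b||:", np.linalg.norm(A @ x - b))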
Kaku, Aakash; Upadhya, Sahana; Razavian, Narges
(Advances in Neural Information Processing Systems)
We show that bringing intermediate layers' representations of two augmented versions of an image closer together in self-supervised learning helps to improve the momentum contrastive (MoCo) method. To this end, in addition to the contrastive loss, we minimize the mean squared error between the intermediate layer representations or make their cross-correlation matrix closer to an identity matrix. Both loss objectives either outperform standard MoCo or achieve similar performances on three diverse medical imaging datasets: NIH Chest X-rays, Breast Cancer Histopathology, and Diabetic Retinopathy. The gains of the improved MoCo are especially large in a low-labeled data regime (e.g., 1% labeled data) with an average gain of 5% across three datasets. We analyze the models trained using our novel approach via feature similarity analysis and layer-wise probing. Our analysis reveals that models trained via our approach have higher feature reuse compared to a standard MoCo and learn informative features earlier in the network. Finally, by comparing the output probability distribution of models fine-tuned on small versus large labeled data, we conclude that our proposed method of pre-training leads to lower Kolmogorov-Smirnov distance, as compared to a standard MoCo. This provides additional evidence that our proposed method learns more informative features in the pre-training phase which could be leveraged in a low-labeled data regime.
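A minimal sketch of the stated recipe, adding an MSE term between intermediate-layer representations of two views on top of an InfoNCE-style contrastive loss, follows; the encoder architecture, layer sizes, temperature tau, and weight alpha are illustrative assumptions rather than the paper's configuration.

import torch
import torch.nn.functional as F

class Encoder(torch.nn.Module):
    def __init__(self, d_in=32, d_mid=64, d_out=16):
        super().__init__()
        self.f1 = torch.nn.Linear(d_in, d_mid)
        self.f2 = torch.nn.Linear(d_mid, d_out)
    def forward(self, x):
        h = torch.relu(self.f1(x))   # intermediate-layer representation
        return h, self.f2(h)         # (intermediate, final)

def loss_with_intermediate_mse(enc, x1, x2, alpha=0.5, tau=0.1):
    # Contrastive term on final features plus MSE on intermediate ones.
    h1, z1 = enc(x1)
    h2, z2 = enc(x2)
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                       # pairwise similarities
    labels = torch.arange(len(x1))                 # positives on the diagonal
    contrastive = F.cross_entropy(logits, labels)  # InfoNCE-style term
    return contrastive + alpha * F.mse_loss(h1, h2)

enc = Encoder()
x = torch.randn(8, 32)
loss = loss_with_intermediate_mse(enc, x + 0.1 * torch.randn_like(x), x)
loss.backward()
print(float(loss))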
Wang, Jianyu; Liang, Hao; Joshi, Gauri
(IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
Distributed stochastic gradient descent (SGD) is essential for scaling machine learning algorithms to a large number of computing nodes. However, infrastructure variability, such as high communication delay or random node slowdown, greatly impedes the performance of distributed SGD, especially in wireless systems or sensor networks. In this paper, we propose an algorithmic approach named Overlap Local-SGD (and its momentum variant) to overlap communication and computation so as to speed up the distributed training procedure. The approach can also help mitigate straggler effects. We achieve this by adding an anchor model on each node. After multiple local updates, locally trained models are pulled back towards the synchronized anchor model rather than communicating with others. Experimental results of training a deep neural network on the CIFAR-10 dataset demonstrate the effectiveness of Overlap Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions.
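A toy simulation of the anchor-model idea follows: each node runs several local SGD steps on a quadratic objective, is pulled back toward a shared anchor instead of communicating with all peers, and the anchor is then synchronized by averaging. The pull strength, step sizes, and objective are assumptions for illustration, not the exact Overlap Local-SGD schedule.

import numpy as np

rng = np.random.default_rng(3)
K, d, tau = 4, 5, 10                      # workers, dimension, local steps
target = rng.standard_normal(d)           # common optimum of the toy loss
anchor = np.zeros(d)
workers = [anchor.copy() for _ in range(K)]

for _round in range(20):
    for k in range(K):
        for _ in range(tau):              # local SGD on 0.5 * ||w - target||^2
            g = (workers[k] - target) + 0.1 * rng.standard_normal(d)
            workers[k] -= 0.1 * g
        # Pull the local model back toward the anchor (no peer communication).
        workers[k] += 0.5 * (anchor - workers[k])
    anchor = np.mean(workers, axis=0)     # periodic anchor synchronization

print("distance to optimum:", np.linalg.norm(anchor - target))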
Papp, Dávid; Yıldız, Sercan
(INFORMS Journal on Computing)
We present alfonso, an open-source Matlab package for solving conic optimization problems over nonsymmetric convex cones. The implementation is based on the authors’ corrected analysis of a method of Skajaa and Ye. It enables optimization over any convex cone as long as a logarithmically homogeneous self-concordant barrier is available for the cone or its dual. This includes many nonsymmetric cones, for example, hyperbolicity cones and their duals (such as sum-of-squares cones), semidefinite and second-order cone representable cones, power cones, and the exponential cone. Besides enabling the solution of problems that cannot be cast as optimization problems over a symmetric cone, algorithms for nonsymmetric conic optimization also offer performance advantages for problems whose symmetric cone programming representation requires a large number of auxiliary variables or has a special structure that can be exploited in the barrier computation. The worst-case iteration complexity of alfonso is the best known for nonsymmetric cone optimization: O(√ν log(1/ε)) iterations to reach an ε-optimal solution, where ν is the barrier parameter of the barrier function used in the optimization. Alfonso can be interfaced with a Matlab function (supplied by the user) that computes the Hessian of a barrier function for the cone. A simplified interface is also available to optimize over the direct product of cones for which a barrier function has already been built into the software. This interface can be easily extended to include new cones. Both interfaces are illustrated by solving linear programs. The oracle interface and the efficiency of alfonso are also demonstrated using an optimal design of experiments problem in which the tailored barrier computation greatly decreases the solution time compared with using state-of-the-art, off-the-shelf conic optimization software.
Summary of Contribution: The paper describes an open-source Matlab package for optimization over nonsymmetric cones. A particularly important feature of this software is that, unlike other conic optimization software, it enables optimization over any convex cone as long as a suitable barrier function is available for the cone or its dual, not limiting the user to a small number of specific cones. Nonsymmetric cones for which such barriers are already known include, for example, hyperbolicity cones and their duals (such as sum-of-squares cones), semidefinite and second-order cone representable cones, power cones, and the exponential cone. Thus, the scope of this software is far larger than most current conic optimization software. This does not come at the price of efficiency, as the worst-case iteration complexity of our algorithm matches the iteration complexity of the most successful interior-point methods for symmetric cones. Besides enabling the solution of problems that cannot be cast as optimization problems over a symmetric cone, our software can also offer performance advantages for problems whose symmetric cone programming representation requires a large number of auxiliary variables or has a special structure that can be exploited in the barrier computation. This is also demonstrated in this paper via an example in which our code significantly outperforms Mosek 9 and SCS 2.
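Alfonso's actual Matlab oracle interface is not reproduced here. As a language-agnostic sketch of what such a barrier oracle supplies (value, gradient, Hessian, and the barrier parameter ν), below is the standard logarithmically homogeneous self-concordant barrier for the nonnegative orthant, written in Python purely for illustration; the function name and return convention are assumptions, not alfonso's API.

import numpy as np

def orthant_barrier(x):
    # f(x) = -sum(log x_i) on x > 0: the standard barrier for R^n_+,
    # with barrier parameter nu = n. Returns (value, gradient, Hessian, nu).
    assert np.all(x > 0), "barrier is only defined in the cone interior"
    value = -np.sum(np.log(x))
    grad = -1.0 / x
    hess = np.diag(1.0 / x**2)
    return value, grad, hess, len(x)

x = np.array([1.0, 2.0, 0.5])
v, g, H, nu = orthant_barrier(x)
print("barrier value:", v, "barrier parameter nu:", nu)
# Logarithmic homogeneity check: f(t*x) = f(x) - nu*log(t).
t = 3.0
print(np.isclose(orthant_barrier(t * x)[0], v - nu * np.log(t)))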
Li, Bingcong, and Giannakis, Georgios B. Conic Descent Redux for Memory-Efficient Optimization. Retrieved from https://par.nsf.gov/biblio/10518927. Web. doi:10.1109/IEEECONF59524.2023.10476894.
@article{osti_10518927,
title = {Conic Descent Redux for Memory-Efficient Optimization},
url = {https://par.nsf.gov/biblio/10518927},
DOI = {10.1109/IEEECONF59524.2023.10476894},
abstractNote = {Conic programming has well-documented merits in a gamut of signal processing and machine learning tasks. This contribution revisits a recently developed first-order conic descent (CD) solver and advances it in three aspects: intuition, theory, and algorithmic implementation. It is found that CD affords an intuitive geometric derivation that originates from the dual problem. This opens the door to novel algorithmic designs, exemplified by a momentum variant of CD, momentum conic descent (MOCO). Diving deeper into the dual behavior of CD and MOCO reveals: i) an analytically justified stopping criterion; and ii) the potential to design preconditioners that speed up dual convergence. Lastly, to scale semidefinite programming (SDP), especially for low-rank solutions, a memory-efficient MOCO variant is developed and numerically validated.},
publisher = {IEEE},
author = {Li, Bingcong and Giannakis, Georgios B},
}