MoXCo: How I learned to stop exploring and love my local minima?

Singh, Esha; Sabach, Shoham; Wang, Yu-Xiang

Citation Details

This content will become publicly available on March 24, 2026

Deep neural networks are well known for their generalization capabilities, largely attributed to optimizers’ ability to find "good" solutions in high-dimensional loss landscapes. This work aims to deepen the understanding of optimization specifically through the lens of loss landscapes. We propose a generalized framework for adaptive optimization that favors convergence to these "good" solutions. Our approach shifts the optimization paradigm from merely finding solutions quickly to discovering solutions that generalize well, establishing a careful balance between optimization efficiency and model generalization. We empirically validate our claims using a two-layer, fully connected neural network with ReLU activation and demonstrate practical applicability through binary quantization of ResNets. Our numerical results demonstrate that these adaptive optimizers facilitate exploration, leading to faster convergence and narrowing the generalization gap between stochastic gradient descent and other adaptive methods.

Award ID(s): 2134214, 2536920

PAR ID: 10649006

Author(s) / Creator(s): Singh, Esha; Sabach, Shoham; Wang, Yu-Xiang

Publisher / Repository: Proceedings of the Conference on Parsimony and Learning (CPAL 2025)

Date Published: 2025-03-24

Journal Name: Proceedings of Machine Learning Research

ISSN: 2640-3498

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article: The DOI is not currently available.