skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: An Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization
Award ID(s):
2240788
PAR ID:
10526736
Author(s) / Creator(s):
;
Publisher / Repository:
International Conference on Machine Learning (ICML)
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Al-Baali, Mehiddin; Purnama, Anton; Grandinetti, Lucio (Ed.)
    Second order, Newton-like algorithms exhibit convergence properties superior to gradient-based or derivative-free optimization algorithms. However, deriving and computing second order derivatives--needed for the Hessian-vector product in a Krylov iteration for the Newton step--often is not trivial. Second order adjoints provide a systematic and efficient tool to derive second derivative infor- mation. In this paper, we consider equality constrained optimization problems in an infinite-dimensional setting. We phrase the optimization problem in a general Banach space framework and derive second order sensitivities and second order adjoints in a rigorous and general way. We apply the developed framework to a partial differential equation-constrained optimization problem. 
    more » « less
  2. Multi-block minimax bilevel optimization has been studied recently due to its great potential in multi-task learning, robust machine learning, and few-shot learning. However, due to the complex three-level optimization structure, existing algorithms often suffer from issues such as high computing costs due to the second-order model derivatives or high memory consumption in storing all blocks’ parameters. In this paper, we tackle these challenges by proposing two novel fully first-order algorithms named FOSL and MemCS. FOSL features a fully single-loop structure by updating all three variables simultaneously, and MemCS is a memory-efficient double-loop algorithm with cold-start initialization. We provide a comprehensive convergence analysis for both algorithms under full and partial block participation, and show that their sample complexities match or outperform those of the same type of methods in standard bilevel optimization. We evaluate our methods in two applications: the recently proposed multi-task deep AUC maximization and a novel rank-based robust meta-learning. Our methods consistently improve over existing methods with better performance over various datasets. 
    more » « less
  3. Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known about the implicit regularization that provides a fine-grained characterization of which particular solutions are reached. This paper shows that zeroth-order optimization with the standard two-point estimator favors solutions with small trace of Hessian, a measure widely used to distinguish between sharp and flat minima. The authors provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions, defining flat minima as minimizers that achieve the smallest trace of Hessian among all optimal solutions. Experiments on binary classification tasks with convex losses and language model fine-tuning support the theoretical findings. 
    more » « less