skip to main content


Title: Variable Smoothing for Weakly Convex Composite Functions
Abstract We study minimization of a structured objective function, being the sum of a smooth function and a composition of a weakly convex function with a linear operator. Applications include image reconstruction problems with regularizers that introduce less bias than the standard convex regularizers. We develop a variable smoothing algorithm, based on the Moreau envelope with a decreasing sequence of smoothing parameters, and prove a complexity of $${\mathcal {O}}(\epsilon ^{-3})$$ O ( ϵ - 3 ) to achieve an $$\epsilon $$ ϵ -approximate solution. This bound interpolates between the $${\mathcal {O}}(\epsilon ^{-2})$$ O ( ϵ - 2 ) bound for the smooth case and the $${\mathcal {O}}(\epsilon ^{-4})$$ O ( ϵ - 4 ) bound for the subgradient method. Our complexity bound is in line with other works that deal with structured nonsmoothness of weakly convex functions.  more » « less
Award ID(s):
2023239 1740707 1934612 1628384
PAR ID:
10268362
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Journal of Optimization Theory and Applications
Volume:
188
Issue:
3
ISSN:
0022-3239
Page Range / eLocation ID:
628 to 649
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose a new primal-dual homotopy smoothing algorithm for a linearly constrained convex program, where neither the primal nor the dual function has to be smooth or strongly convex. The best known iteration complexity solving such a non-smooth problem is O(ε−1). In this paper, we show that by leveraging a local error bound condition on the dual function, the proposed algorithm can achieve a better primal convergence time of O 􏰕ε−2/(2+β) log2(ε−1)􏰖, where β ∈ (0, 1] is a local error bound parameter. As an example application of the general algorithm, we show that the distributed geometric median problem, which can be formulated as a constrained convex program, has its dual function non-smooth but satisfying the aforementioned local error bound condition with β = 1/2, therefore enjoying a convergence time of O 􏰕ε−4/5 log2(ε−1)􏰖. This result improves upon the O(ε−1) convergence time bound achieved by existing distributed optimization algorithms. Simulation experiments also demonstrate the performance of our proposed algorithm. 
    more » « less
  2. We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. This improves the current best-known complexity for finding a (δ,ϵ)-stationary point from O(ϵ^(-4),δ^(-1)) stochastic gradient queries to O(ϵ^(-3),δ^(-1)), which we also show to be optimal. Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning. For deterministic and second-order smooth objectives, applying more advanced optimistic online learning techniques enables a new complexity of O(ϵ^(-1.5),δ^(-0.5)). Our techniques also recover all optimal or best-known results for finding ϵ stationary points of smooth or second-order smooth objectives in both stochastic and deterministic settings. 
    more » « less
  3. In this paper, we revisit the bilevel optimization problem, in which the upper-level objective function is generally nonconvex and the lower-level objective function is strongly convex. Although this type of problem has been studied extensively, it still remains an open question how to achieve an $\mathcal{O}(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic bilevel optimization without any second-order derivative computation. To fill this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named FdeHBO, which features a simple fully single-loop structure, a projection-aided finite-difference Hessian/Jacobian-vector approximation, and momentum-based updates. Theoretically, we show that FdeHBO requires $\mathcal{O}(\epsilon^{-1.5})$ iterations (each using $\mathcal{O}(1)$ samples and only first-order gradient information) to find an $\epsilon$-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an $\mathcal{O}(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization. 
    more » « less
  4. This work proposes a new algorithm – the Single-timescale Double-momentum Stochastic Approximation (SUSTAIN) –for tackling stochastic unconstrained bilevel optimization problems. We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth. Unlike prior works which rely on two-timescale or double loop techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the error in the stochastic gradient updates due to inaccurate solution to both subproblems. If the upper objective function is smooth but possibly non-convex, we show that SUSTAIN requires ${O}(\epsilon^{-3/2})$ iterations (each using $O(1)$ samples) to find an $\epsilon$-stationary solution. The $\epsilon$-stationary solution is defined as the point whose squared norm of the gradient of the outer function is less than or equal to $\epsilon$. The total number of stochastic gradient samples required for the upper and lower level objective functions match the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case when the upper level objective function is strongly-convex. 
    more » « less
  5. Abstract

    This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization that does not require projections onto the feasible set. While the well-established Frank–Wolfe algorithm and its variants already avoid projections, they are primarily designed for smooth objective functions. In contrast, our proposed algorithm can handle nonsmooth problems with general convex functional inequality constraints. It achieves an$$\epsilon $$ϵ-suboptimal solution in$$\mathcal {O}(\epsilon ^{-2})$$O(ϵ-2)iterations, with each iteration requiring only a single (potentially inexact) Linear Minimization Oracle call and a (possibly inexact) subgradient computation. This performance is consistent with existing lower bounds. Similar performance is observed when deterministic subgradients are replaced with stochastic subgradients. In the special case where there are no functional inequality constraints, our algorithm competes favorably with a recent nonsmooth projection-free method designed for constraint-free problems. Our approach utilizes a simple separation scheme in conjunction with a new Lagrange multiplier update rule.

     
    more » « less