Title: The importance of better models in stochastic optimization
Abstract: Standard stochastic optimization methods are brittle, sensitive to stepsize choice and other algorithmic parameters, and they exhibit instability outside of well-behaved families of objectives. To address these challenges, we investigate models for stochastic optimization and learning problems that exhibit better robustness to problem families and algorithmic parameters. With appropriately accurate models—which we call the aprox family—stochastic methods can be made stable, provably convergent, and asymptotically optimal; even modeling that the objective is nonnegative is sufficient for this stability. We extend these results beyond convexity to weakly convex objectives, which include compositions of convex losses with smooth functions common in modern machine learning. We highlight the importance of robustness and accurate modeling with experimental evaluation of convergence time and algorithm sensitivity.
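The simplest member of the aprox family, the "truncated" model, uses only the fact that the loss is nonnegative, and its update has a closed form that makes the stepsize-robustness claim concrete. A minimal Python sketch (function and variable names, and the demo problem, are illustrative, not taken from the paper's code):

```python
import numpy as np

def truncated_model_step(x, grad, loss_val, alpha):
    """One step of SGD with the truncated (nonnegative) model:
    x+ = argmin_y  max(loss_val + grad.(y - x), 0) + ||y - x||^2 / (2 * alpha).
    Closed form: the step length is capped so the linear model of the loss
    never goes below zero."""
    g2 = np.dot(grad, grad)
    if g2 == 0.0:
        return x  # zero gradient: the model is flat, stay put
    step = min(alpha, loss_val / g2)
    return x - step * grad

# Tiny demo on a nonnegative loss f(x; a, b) = 0.5 * (a.x - b)^2.
rng = np.random.default_rng(0)
a_true = np.array([1.0, -2.0])
x = np.zeros(2)
for _ in range(200):
    a = rng.normal(size=2)
    b = a @ a_true
    resid = a @ x - b
    loss_val = 0.5 * resid ** 2        # nonnegative by construction
    grad = resid * a
    # deliberately large, "unsafe" stepsize
    x = truncated_model_step(x, grad, loss_val, alpha=10.0)
```

Because the step length is capped at loss_val/||grad||^2, even a wildly large alpha cannot overshoot the point where the model predicts zero loss; this truncation is the source of the stability to stepsize choice that the abstract describes.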
Award ID(s): 1553086
PAR ID: 10383072
Author(s) / Creator(s):
Date Published:
Journal Name: Proceedings of the National Academy of Sciences
Volume: 116
Issue: 46
ISSN: 0027-8424
Page Range / eLocation ID: 22924 to 22930
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. This improves the current best-known complexity for finding a (δ,ϵ)-stationary point from O(ϵ^(-4)δ^(-1)) stochastic gradient queries to O(ϵ^(-3)δ^(-1)), which we also show to be optimal. Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning. For deterministic and second-order smooth objectives, applying more advanced optimistic online learning techniques enables a new complexity of O(ϵ^(-1.5)δ^(-0.5)). Our techniques also recover all optimal or best-known results for finding ϵ-stationary points of smooth or second-order smooth objectives in both stochastic and deterministic settings.
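The reduction above turns the sequence of stochastic gradients into an online learning problem, where standard regret bounds finish the argument. The workhorse of those bounds is online (projected) gradient descent; the sketch below shows that generic online learner on linear losses, not the paper's full conversion (all names are illustrative):

```python
import numpy as np

def online_gradient_descent(loss_grads, lr, radius):
    """Online gradient descent over the Euclidean ball of the given radius.
    loss_grads: gradient vectors g_t, revealed one round at a time.
    Regret against the best fixed point in the ball is O(radius * G * sqrt(T))."""
    x = np.zeros(loss_grads[0].size)
    iterates = []
    for g in loss_grads:
        iterates.append(x.copy())
        x = x - lr * g
        nrm = np.linalg.norm(x)
        if nrm > radius:            # project back onto the ball
            x = x * (radius / nrm)
    return iterates

# Linear losses <g_t, x>: compare OGD's total loss with the best fixed point.
rng = np.random.default_rng(2)
T = 500
grads = [rng.normal(size=3) + np.array([0.5, 0.0, -0.5]) for _ in range(T)]
its = online_gradient_descent(grads, lr=1.0 / np.sqrt(T), radius=1.0)
ogd_loss = sum(g @ x for g, x in zip(grads, its))
g_sum = np.sum(grads, axis=0)
best_fixed = -g_sum / np.linalg.norm(g_sum)   # best point in hindsight
regret = ogd_loss - g_sum @ best_fixed
```

With lr proportional to 1/sqrt(T), the regret grows only as sqrt(T), so the average regret vanishes; converting such a regret bound into a stationarity guarantee is the heart of the reduction the abstract describes.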
  2. We study stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. Building upon recent advances in algorithmic high-dimensional robust statistics, in each SGD iteration the master employs a non-trivial decoding to estimate the true gradient from the unbiased stochastic gradients received from workers, some of which may be corrupt. We provide convergence analyses for both strongly-convex and non-convex smooth objectives under standard SGD assumptions. In both settings, the approximation error of our solution is controlled by the mini-batch size of the stochastic gradients and can be made as small as desired, provided that workers use a sufficiently large mini-batch. Our algorithm tolerates a Byzantine fraction of less than 1/3. It approximately finds the optimal parameters in the strongly-convex setting exponentially fast, and reaches an approximate stationary point in the non-convex setting at a rate of 1/T, thus matching the convergence rates of vanilla SGD in the Byzantine-free setting.
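The "non-trivial decoding" performed by the master comes from high-dimensional robust statistics. The paper's estimator is more sophisticated, but a coordinate-wise trimmed mean is a simple stand-in that shows the role of robust aggregation (this aggregator and all names are illustrative, not the paper's method):

```python
import numpy as np

def trimmed_mean_aggregate(worker_grads, byzantine_frac=0.3):
    """Aggregate one gradient per worker, discarding the largest and smallest
    values in each coordinate before averaging.
    worker_grads: array of shape (n_workers, dim)."""
    grads = np.asarray(worker_grads)
    n = grads.shape[0]
    k = int(np.floor(byzantine_frac * n))  # values trimmed from each end
    sorted_grads = np.sort(grads, axis=0)  # sort each coordinate independently
    if k > 0:
        sorted_grads = sorted_grads[k:n - k]
    return sorted_grads.mean(axis=0)

# 7 honest workers report gradients near the true gradient [1, -1];
# 3 Byzantine workers report huge adversarial values.
rng = np.random.default_rng(1)
honest = rng.normal(loc=[1.0, -1.0], scale=0.1, size=(7, 2))
byzantine = np.full((3, 2), 1e6)
est = trimmed_mean_aggregate(np.vstack([honest, byzantine]), byzantine_frac=0.3)
```

Averaging the raw reports would be destroyed by the corrupt values, while the trimmed estimate stays near the true gradient; the master then takes an ordinary SGD step with the robust estimate.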
  3. Solving Convex Discrete Optimization via Simulation via Stochastic Localization Algorithms Many decision-making problems in operations research and management science require the optimization of large-scale complex stochastic systems. For a number of applications, the objective function exhibits convexity in the discrete decision variables or the problem can be transformed into a convex one. In “Stochastic Localization Methods for Convex Discrete Optimization via Simulation,” Zhang, Zheng, and Lavaei propose provably efficient simulation-optimization algorithms for general large-scale convex discrete optimization via simulation problems. By utilizing the convex structure and the idea of localization and cutting-plane methods, the developed stochastic localization algorithms demonstrate a polynomial dependence on the dimension and scale of the decision space. In addition, the simulation cost is upper bounded by a value that is independent of the objective function. The stochastic localization methods also exhibit a superior numerical performance compared with existing algorithms. 
  4. Distributionally robust optimization (DRO) is a powerful framework for training models that are robust to data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions and exclude the practical and challenging case of non-convex losses, e.g., neural networks. This paper develops a stochastic algorithm for non-convex constrained DRO, together with its performance analysis. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus it is suitable for large-scale applications. We focus on uncertainty sets defined by the general Cressie-Read family of divergences, which includes the chi^2-divergence as a special case. We prove that our algorithm finds an epsilon-stationary point with improved computational complexity compared with existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
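The CVaR objective mentioned above illustrates what a distributionally robust loss computes: by the Rockafellar-Uryasev dual form, CVaR at level alpha equals min over eta of eta + E[max(loss - eta, 0)]/alpha, which on a finite sample is just the mean of the worst alpha-fraction of losses. The snippet below evaluates that objective; it is an illustration of the CVaR loss itself, not the paper's algorithm:

```python
import numpy as np

def cvar_loss(losses, alpha):
    """Empirical CVaR at level alpha via its dual:
    min_eta  eta + mean(max(losses - eta, 0)) / alpha.
    When alpha * n is an integer, the minimizing eta is the (1 - alpha)-quantile,
    so CVaR is exactly the mean of the worst alpha-fraction of losses."""
    losses = np.sort(np.asarray(losses))[::-1]   # descending
    k = max(1, int(np.ceil(alpha * losses.size)))
    return losses[:k].mean()                     # average of the worst k losses

losses = np.array([0.1, 0.2, 0.3, 5.0, 6.0])
avg = losses.mean()                    # standard ERM objective: 2.32
robust = cvar_loss(losses, alpha=0.4)  # mean of the worst 40%: 5.5
```

Minimizing the CVaR loss rather than the plain average forces the model to perform well on the hardest alpha-fraction of examples, which is exactly the robustness to distribution shift that DRO formalizes.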
  5. As systems grow in scale and intricacy, the challenges of parametric misspecification become pronounced. These concerns are further exacerbated in compositional settings, which emerge in problems complicated by modeling risk and robustness. In “Data-Driven Compositional Optimization in Misspecified Regimes,” the authors consider the resolution of compositional stochastic optimization problems plagued by parametric misspecification. In settings where such misspecification may be resolved via a parallel learning process, the authors develop schemes that can contend with diverse forms of risk, dynamics, and nonconvexity. They provide asymptotic and rate guarantees for unaccelerated and accelerated schemes for convex, strongly convex, and nonconvex problems in a two-level regime, with extensions to the multilevel setting. Surprisingly, the nonasymptotic rate guarantees show no degradation from the rate statements obtained in a correctly specified regime, and the schemes achieve optimal (or near-optimal) sample complexities for general T-level strongly convex and nonconvex compositional problems.