Title: Stochastic Optimization under Distributional Drift
We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. The efficiency estimates we obtain clearly decouple the contributions of optimization error, gradient noise, and time drift. Notably, we identify a low drift-to-noise regime in which the tracking efficiency of the proximal stochastic gradient method benefits significantly from a step decay schedule. Numerical experiments illustrate our results.
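For intuition, here is a minimal Python sketch of the kind of method the abstract refers to: a proximal stochastic gradient iteration with iterate averaging and a step-decay schedule, tracking the minimizer of a drifting l1-regularized quadratic. The problem instance, drift model, noise level, and schedule are illustrative assumptions, not values or guarantees from the paper.

```python
# Minimal sketch (not the paper's exact method): proximal stochastic gradient
# with iterate averaging and a step-decay schedule, tracking the minimizer of
# the drifting objective f_t(x) = 0.5 * ||x - m_t||^2 + lam * ||x||_1.
import numpy as np

rng = np.random.default_rng(0)
d, lam = 10, 0.05            # dimension and l1 weight (assumed)
drift, noise = 1e-3, 0.1     # per-step drift and gradient-noise scales (assumed)
T, K = 2000, 500             # horizon and step-decay period (assumed)
eta0 = 0.5                   # initial step size (assumed)

def soft_threshold(v, thr):
    """Proximal operator of thr * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

m = np.zeros(d)              # drifting minimizer of the smooth part
x = np.zeros(d)              # current iterate
x_avg = np.zeros(d)          # running (uniform) iterate average

for t in range(T):
    m += drift * rng.standard_normal(d)            # unknown distributional drift
    eta = eta0 * 0.5 ** (t // K)                   # step-decay schedule
    g = (x - m) + noise * rng.standard_normal(d)   # stochastic gradient of the smooth part
    x = soft_threshold(x - eta * g, eta * lam)     # proximal stochastic gradient step
    x_avg += (x - x_avg) / (t + 1)                 # iterate averaging

print("tracking error of averaged iterate:", np.linalg.norm(x_avg - m))
```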
Award ID(s): 1651851, 2023166
PAR ID: 10471583
Publisher / Repository: Journal of Machine Learning Research
Journal Name: Journal of Machine Learning Research
Volume: 24
Issue: 147
ISSN: 1533-7928
Page Range / eLocation ID: 1-56
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We study a distributed method called SAB–TV, which employs gradient tracking to collaboratively minimize the strongly convex sum of smooth local cost functions for networked agents communicating over a time-varying directed graph. Each agent, assumed to have access to a stochastic first-order oracle providing an unbiased estimate of the gradient of its local cost function, maintains an auxiliary variable to asymptotically track the stochastic gradient of the global cost. The decision variable and the gradient tracker are updated over time through limited information exchange with local neighbors using row- and column-stochastic weights, guaranteeing both consensus and optimality. With a sufficiently small constant step size, we demonstrate that, in expectation, SAB–TV converges linearly to a neighborhood of the optimal solution. Numerical simulations illustrate the effectiveness of the proposed algorithm.
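As a rough illustration of the gradient-tracking mechanism described above, the following Python sketch runs an AB-style update with row-stochastic and column-stochastic weights on a fixed directed ring; the graph, local costs, step size, and noise level are illustrative assumptions, and the actual SAB–TV algorithm additionally handles time-varying graphs.

```python
# Minimal gradient-tracking sketch with row-stochastic (A) and column-stochastic (B)
# weights on a fixed directed ring; scalar quadratic local costs (all assumed).
import numpy as np

rng = np.random.default_rng(1)
n, alpha, noise = 5, 0.05, 0.01          # agents, step size, gradient-noise scale (assumed)
c = rng.standard_normal(n)               # local costs f_i(x) = 0.5 * (x - c_i)^2
x_star = c.mean()                        # minimizer of the global sum

A = np.zeros((n, n)); B = np.zeros((n, n))
for i in range(n):
    A[i, i] = A[i, (i - 1) % n] = 0.5    # rows of A sum to 1 (row-stochastic)
    B[i, i] = B[(i + 1) % n, i] = 0.5    # columns of B sum to 1 (column-stochastic)

def stoch_grad(x):
    """Unbiased noisy gradients of the local costs."""
    return (x - c) + noise * rng.standard_normal(n)

x = np.zeros(n)                          # local decisions
g = stoch_grad(x)
y = g.copy()                             # trackers of the global stochastic gradient

for _ in range(500):
    x = A @ x - alpha * y                # consensus step plus descent along tracked gradient
    g_new = stoch_grad(x)
    y = B @ y + g_new - g                # gradient-tracking update
    g = g_new

print("max distance to optimum:", np.abs(x - x_star).max())
```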
  2. We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from τ rounds ago. First, we show that without stochastic noise, delays strongly affect the attainable optimization error: in fact, the error can be as bad as that of non-delayed gradient descent run on only 1/τ of the gradients. In sharp contrast, we quantify how stochastic noise makes the effect of delays negligible, improving on previous work which only showed this phenomenon asymptotically or for much smaller delays. Also, in the context of distributed optimization, the results indicate that the performance of gradient descent with delays is competitive with synchronous approaches such as mini-batching. Our results are based on a novel technique for analyzing convergence of optimization algorithms using generating functions.
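A minimal Python sketch of the delayed-gradient setting above on a quadratic, assuming a simple diagonal Hessian, a fixed delay τ, and a heuristic step size; it illustrates the setup only, not the paper's bounds.

```python
# Gradient descent with gradients delayed by tau rounds on f(x) = 0.5 x^T H x.
# Dimension, delay, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
d, tau, T = 20, 10, 400                   # dimension, delay, iterations (assumed)
H = np.diag(np.linspace(0.1, 1.0, d))     # quadratic objective's Hessian
eta = 0.5 / (tau + 1)                     # step size shrunk with the delay (assumed heuristic)

x = rng.standard_normal(d)
history = [x.copy()]                      # past iterates, used for delayed gradients

for t in range(T):
    x_old = history[max(0, t - tau)]      # iterate from tau rounds ago
    x = x - eta * (H @ x_old)             # apply the stale gradient
    history.append(x.copy())

print("f(x_T) =", 0.5 * x @ H @ x)
```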
  3. We study the stochastic heat equation on R^d subject to a centered Gaussian noise that is white in time and colored in space. The drift term is assumed to satisfy an Osgood-type condition and the diffusion coefficient may have certain related growth. We show that there exists a random field solution which does not explode in finite time. This complements and improves upon recent results on blow-up of solutions to stochastic partial differential equations.
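For orientation, a schematic form of the equation and of an Osgood-type non-explosion condition on the drift (the paper's precise assumptions on the noise and coefficients may differ):

```latex
% Schematic statement only; the exact assumptions in the paper may differ.
\[
  \partial_t u(t,x) \;=\; \Delta u(t,x) + b\bigl(u(t,x)\bigr) + \sigma\bigl(u(t,x)\bigr)\,\dot{W}(t,x),
  \qquad (t,x) \in (0,\infty)\times\mathbb{R}^d,
\]
% \dot{W} is a centered Gaussian noise, white in time and colored in space,
% and the drift b satisfies an Osgood-type condition ruling out finite-time blow-up:
\[
  \int_{1}^{\infty} \frac{\mathrm{d}s}{b(s)} \;=\; \infty .
\]
```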
  4. Many complex fluids can be described by continuum hydrodynamic field equations, to which noise must be added in order to capture thermal fluctuations. In almost all cases, the resulting coarse-grained stochastic partial differential equations carry a short-scale cutoff, which is also reflected in numerical discretisation schemes. We draw together our recent findings concerning the construction of such schemes and the interpretation of their continuum limits, focusing, for simplicity, on models with a purely diffusive scalar field, such as ‘Model B’ which describes phase separation in binary fluid mixtures. We address the requirement that the steady-state entropy production rate (EPR) must vanish for any stochastic hydrodynamic model in thermal equilibrium. Only if this is achieved can the given discretisation scheme be relied upon to correctly calculate the nonvanishing EPR for ‘active field theories’ in which new terms are deliberately added to the fluctuating hydrodynamic equations that break detailed balance. To compute the correct probabilities of forward and time-reversed paths (whose ratio determines the EPR), we must make a careful treatment of so-called ‘spurious drift’ and other closely related terms that depend on the discretisation scheme. We show that such subtleties can arise not only in the temporal discretisation (as is well documented for stochastic ODEs with multiplicative noise) but also from spatial discretisation, even when noise is additive, as most active field theories assume. We then review how such noise can become multiplicative via off-diagonal couplings to additional fields that thermodynamically encode the underlying chemical processes responsible for activity. In this case, the spurious drift terms need careful accounting, not just to evaluate the EPR correctly but also to numerically implement the Langevin dynamics itself.
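To make the discretisation discussion concrete, here is a minimal 1D Python sketch of fluctuating Model B with a central-difference Laplacian, conserved (divergence-form) additive noise, and an Euler-Maruyama step; all parameters are illustrative assumptions, and this naive scheme deliberately ignores the spurious-drift and EPR subtleties the passage is about.

```python
# Naive 1D 'Model B' sketch: dphi/dt = Laplacian(mu) + div(noise), periodic lattice.
# Parameters and the simple explicit scheme are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
N, dx, dt, steps = 64, 1.0, 0.01, 2000    # lattice size, spacings, time steps (assumed)
a, b, kappa, D = -0.5, 0.5, 1.0, 0.1      # free-energy and noise parameters (assumed)

phi = 0.01 * rng.standard_normal(N)       # small random initial field

def lap(f):
    """Periodic central-difference Laplacian."""
    return (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx**2

for _ in range(steps):
    mu = a * phi + b * phi**3 - kappa * lap(phi)             # chemical potential dF/dphi
    flux_noise = np.sqrt(2 * D / (dx * dt)) * rng.standard_normal(N)
    div_noise = (flux_noise - np.roll(flux_noise, 1)) / dx   # conserved (divergence-form) noise
    phi = phi + dt * (lap(mu) + div_noise)                   # Euler-Maruyama update

print("mean (conserved):", phi.mean(), " variance:", phi.var())
```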
  5. Recently, decentralized optimization has attracted much attention in machine learning because it is more communication-efficient than the centralized approach. Quantization is a promising method to reduce the communication cost by shrinking each message via gradient compression. To further improve communication efficiency, several quantized decentralized algorithms have recently been studied. However, quantized decentralized algorithms for nonconvex constrained machine learning problems remain limited. The Frank-Wolfe (a.k.a. conditional gradient, or projection-free) method is very efficient for many constrained optimization tasks, such as training low-rank or sparsity-constrained models. In this paper, to fill this gap in decentralized quantized constrained optimization, we propose a novel communication-efficient Decentralized Quantized Stochastic Frank-Wolfe (DQSFW) algorithm for non-convex constrained learning models. We first construct a counterexample showing that the vanilla decentralized quantized stochastic Frank-Wolfe algorithm can diverge. We therefore equip DQSFW with a gradient tracking technique to guarantee convergence to a stationary point of the non-convex problem. In our theoretical analysis, we prove that DQSFW achieves the same gradient complexity as the standard stochastic Frank-Wolfe and centralized Frank-Wolfe algorithms to reach a stationary point, while incurring much less communication cost. Experiments on matrix completion and model compression applications demonstrate the efficiency of our new algorithm.
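As a toy illustration of the ingredients named above (decentralized averaging, gradient tracking, stochastic Frank-Wolfe over an l1 ball, and quantized messages), the following Python sketch combines them on a simple quadratic problem; the quantizer, mixing matrix, local costs, and step sizes are illustrative assumptions, and this is not the DQSFW algorithm itself.

```python
# Toy decentralized quantized stochastic Frank-Wolfe-style loop with gradient tracking.
# All problem data, the quantizer, and the mixing matrix are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
n, d, radius, noise = 4, 8, 1.0, 0.01     # agents, dimension, l1 radius, noise (assumed)
C = rng.standard_normal((n, d))           # local costs f_i(x) = 0.5 * ||x - C_i||^2
W = np.full((n, n), 1.0 / n)              # doubly stochastic mixing (complete graph, assumed)

def quantize(z, step=0.01):
    """Uniform quantization of exchanged messages (assumed scheme)."""
    return step * np.round(z / step)

def lmo(g):
    """Linear minimization oracle over the l1 ball of the given radius."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

X = np.zeros((n, d))                      # local decisions
G = (X - C) + noise * rng.standard_normal((n, d))
D = G.copy()                              # gradient trackers

for k in range(1, 200):
    gamma = 2.0 / (k + 2)                 # standard Frank-Wolfe step size
    X = quantize(W @ X)                   # quantized consensus on decisions
    V = np.array([lmo(D[i]) for i in range(n)])
    X = (1 - gamma) * X + gamma * V       # Frank-Wolfe update
    G_new = (X - C) + noise * rng.standard_normal((n, d))
    D = quantize(W @ D) + G_new - G       # quantized gradient tracking
    G = G_new

print("max disagreement across agents:", np.abs(X - X.mean(axis=0)).max())
```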