We propose a fast stochastic Hamilton Monte Carlo (HMC) method for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by recent advances in stochastic optimization. We show that, to reach $$\epsilon$$ accuracy in 2-Wasserstein distance, our algorithm achieves $$\tilde O\big(n+\kappa^{2}d^{1/2}/\epsilon+\kappa^{4/3}d^{1/3}n^{2/3}/\epsilon^{2/3}\big)$$ gradient complexity (i.e., number of component gradient evaluations), which outperforms the state-of-the-art HMC and stochastic gradient HMC methods in a wide regime. We also extend our algorithm to sampling from smooth and general log-concave distributions, and prove the corresponding gradient complexity as well. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm.
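The variance reduction idea mentioned in this abstract can be illustrated with a generic SVRG-style gradient estimator. The sketch below is not the authors' algorithm: the least-squares components, the sum convention $$U(x)=\sum_i f_i(x)$$, and the names `component_grad`, `full_grad`, and `vr_grad` are all assumptions made for illustration. It shows only how a stored full gradient at a snapshot point is corrected by a single component difference, which keeps the estimator unbiased.

```python
import numpy as np

def component_grad(x, a_i, b_i):
    # Gradient of the hypothetical component f_i(x) = 0.5 * (a_i @ x - b_i)**2.
    return (a_i @ x - b_i) * a_i

def full_grad(x, A, b):
    # Gradient of U(x) = sum_i f_i(x); costs n component gradient evaluations.
    return A.T @ (A @ x - b)

def vr_grad(x, snapshot, snapshot_full_grad, A, b, i):
    # SVRG-style estimate of grad U(x) using one sampled component i:
    # full gradient at the snapshot plus a rescaled component correction.
    n = A.shape[0]
    correction = component_grad(x, A[i], b[i]) - component_grad(snapshot, A[i], b[i])
    return snapshot_full_grad + n * correction
```

Averaging `vr_grad` over a uniformly random index `i` recovers `full_grad(x, A, b)`, and the estimate becomes exact when `x` equals the snapshot, which is why the variance shrinks as the iterate stays close to the snapshot.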
Projected Stochastic Gradient Langevin Algorithms for Constrained Sampling and Non-Convex Learning
Langevin algorithms are gradient descent methods with additive noise. They have been used for decades in Markov Chain Monte Carlo (MCMC) sampling, optimization, and learning. Their convergence properties for unconstrained non-convex optimization and learning problems have been studied widely in the last few years. Other work has examined projected Langevin algorithms for sampling from log-concave distributions restricted to convex compact sets. For learning and optimization, log-concave distributions correspond to convex losses. In this paper, we analyze the case of non-convex losses with compact convex constraint sets and IID external data variables. We term the resulting method the projected stochastic gradient Langevin algorithm (PSGLA). We show the algorithm achieves a deviation of $$O(T^{-1/4}(\log T)^{1/2})$$ from its target distribution in 1-Wasserstein distance. For optimization and learning, we show that the algorithm achieves $$\epsilon$$-suboptimal solutions, on average, provided that it is run for a time that is polynomial in $$\epsilon^{-1}$$ and slightly super-exponential in the problem dimension.
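A projected stochastic gradient Langevin iteration of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact scheme: the constraint set is taken to be a Euclidean ball, the loss `noisy_grad` differentiates is a made-up non-convex example, and the noise scale $$\sqrt{2\eta/\beta}$$ follows one common convention for targeting a density proportional to $$\exp(-\beta f)$$. The names `project_to_ball`, `noisy_grad`, and `projected_sgld` are hypothetical.

```python
import numpy as np

def project_to_ball(x, radius):
    # Euclidean projection onto the compact convex set {x : ||x|| <= radius}.
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def noisy_grad(x, z):
    # Gradient in x of the hypothetical non-convex loss
    # f(x, z) = sum((x**2 - 1)**2) / 4 + z @ x, where z is an external data sample.
    return x * (x**2 - 1.0) + z

def projected_sgld(x0, step, beta, n_steps, radius, rng):
    # Gradient step on a stochastic loss, plus Gaussian noise, then projection.
    x = project_to_ball(np.asarray(x0, dtype=float), radius)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        z = rng.standard_normal(x.size)  # IID external data variable
        w = rng.standard_normal(x.size)  # injected Langevin noise
        x = project_to_ball(
            x - step * noisy_grad(x, z) + np.sqrt(2.0 * step / beta) * w, radius
        )
        samples[k] = x
    return samples

rng = np.random.default_rng(0)
samples = projected_sgld(np.zeros(2), step=1e-2, beta=4.0, n_steps=2000, radius=2.0, rng=rng)
```

Every iterate stays inside the ball because the projection is applied after each noisy step; swapping in a different convex compact set only requires replacing `project_to_ball`.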
- Award ID(s): 2122856
- PAR ID: 10354811
- Editor(s): Belkin, M.; Kpotufe, S.
- Date Published:
- Journal Name: Proceedings of Thirty Fourth Conference on Learning Theory
- Volume: PMLR 134
- Page Range / eLocation ID: 2891-2937
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
The connections between (convex) optimization and (logconcave) sampling have been considerably enriched in the past decade with many conceptual and mathematical analogies. For instance, the Langevin algorithm can be viewed as a sampling analogue of gradient descent and has condition-number-dependent guarantees on its performance. In the early 1990s, Nesterov and Nemirovski developed the Interior-Point Method (IPM) for convex optimization based on self-concordant barriers, providing efficient algorithms for structured convex optimization, often faster than the general method. This raises the following question: can we develop an analogous IPM for structured sampling problems? In 2012, Kannan and Narayanan proposed the Dikin walk for uniformly sampling polytopes, and an improved analysis was given in 2020 by Laddha-Lee-Vempala. The Dikin walk uses a local metric defined by a self-concordant barrier for linear constraints. Here we generalize this approach by developing and adapting IPM machinery together with the Dikin walk for poly-time sampling algorithms. Our IPM-based sampling framework provides an efficient warm start and goes beyond uniform distributions and linear constraints. We illustrate the approach on important special cases, in particular giving the fastest algorithms to sample uniform, exponential, or Gaussian distributions on a truncated PSD cone. The framework is general and can be applied to other sampling algorithms.
Various differentially private algorithms instantiate the exponential mechanism, and require sampling from the distribution $$\exp(-f)$$ for a suitable function $$f$$. When the domain of the distribution is high-dimensional, this sampling can be challenging. Using heuristic sampling schemes such as Gibbs sampling does not necessarily lead to provable privacy. When $$f$$ is convex, techniques from log-concave sampling lead to polynomial-time algorithms, albeit with large polynomials. Langevin dynamics-based algorithms offer much faster alternatives under some distance measures such as statistical distance. In this work, we establish rapid convergence for these algorithms under distance measures more suitable for differential privacy. For smooth, strongly-convex $$f$$, we give the first results proving convergence in Rényi divergence. This gives us fast differentially private algorithms for such $$f$$. Our techniques are simple and generic and apply also to underdamped Langevin dynamics.
Traditional models of supervised learning require a learner, given examples from an arbitrary joint distribution on $$\mathbb{R}^d \times \{\pm 1\}$$, to output a hypothesis that competes (to within $$\epsilon$$) with the best fitting concept from a class. To overcome hardness results for learning even simple concept classes, this paper introduces a smoothed-analysis framework that only requires competition with the best classifier robust to small random Gaussian perturbations. This subtle shift enables a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (multi-index model) and (2) has bounded Gaussian surface area. This class includes functions of halfspaces and low-dimensional convex sets, which are only known to be learnable in non-smoothed settings with respect to highly structured distributions like Gaussians. The analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, the authors present the first algorithm for agnostically learning intersections of $$k$$-halfspaces in time $$k \cdot \mathrm{poly}(\log k, \epsilon, \gamma)$$, where $$\gamma$$ is the margin parameter. Previously, the best-known runtime was exponential in $$k$$ (Arriaga and Vempala, 1999).
From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast as min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesically convex-concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative: we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case where RCEG and Riemannian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on curvature of the manifold.

