skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Breaking Reversibility Accelerates Langevin Dynamics for Non-Convex Optimization
Langevin dynamics (LD) has been proven to be a powerful technique for optimizing a non-convex objective as an efficient algorithm to find local minima while eventually visiting a global minimum on longer time-scales. LD is based on the first-order Langevin diffusion which is reversible in time. We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dynamics (ULD) and the Langevin dynamics with a non-symmetric drift (NLD). Adopting the techniques of Tzen et al. (2018) for LD to non-reversible diffusions, we show that for a given local minimum that is within an arbitrary distance from the initialization, with high probability, either the ULD trajectory ends up somewhere outside a small neighborhood of this local minimum within a recurrence time which depends on the smallest eigenvalue of the Hessian at the local minimum or they enter this neighborhood by the recurrence time and stay there for a potentially exponentially long escape time. The ULD algorithm improves upon the recurrence time obtained for LD in Tzen et al. (2018) with respect to the dependency on the smallest eigenvalue of the Hessian at the local minimum. Similar results and improvements are obtained for the NLD algorithm. We also show that non-reversible variants can exit the basin of attraction of a local minimum faster in discrete time when the objective has two local minima separated by a saddle point and quantify the amount of improvement. Our analysis suggests that non-reversible Langevin algorithms are more efficient to locate a local minimum as well as exploring the state space.  more » « less
Award ID(s):
1814888 1723085
PAR ID:
10256959
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
33
ISSN:
2640-3498
Page Range / eLocation ID:
7850 - 17862
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Based on the prior work of Chahl and Gopalakrishnan (2019) to infer particle-ion collision time distributions using a Langevin Dynamics (LD) approach, we develop a model for the non-dimensional diffusion charging collision kernel β_i or H that is applicable for 0≤Ψ_E≤60,0≤Ψ_I/Ψ_E ≤1,Kn_D≤2000 (defined in the main text). The developed model for β_i for attractive Coulomb and image potential interactions, along with the model for β_i for repulsive Coulomb and image potential interactions from Gopalakrishnan et al. (2013b), is tested against published diffusion charging experimental data. Current state of the art charging models, Fuchs (1963) and Wiedensohler (1988) regression for bipolar charging, are also evaluated and discussed. Comparisons reveal that the LD-based model accurately describes unipolar fractions for 10 – 100 nm particles measured in air (Adachi et al., 1985), nitrogen and argon but not in helium (Adachi et al., 1987). Fuchs model and the LD-based model yield similar predictions in the experimental conditions considered, except in helium. In the case of bipolar charging, the LD-based model captures the experimental trends quantitatively (within ±20%) across the entire size range of 4 – 40 nm producing superior agreement than Wiedensohler’s regression. The latter systematically underpredicts charge fraction below ~20 nm in air (by up to 40%) for the data presented in Adachi et al. (1985). Comparison with the data of Gopalakrishnan et al. (2015), obtained in UHP air along with measurements of the entire ion mass-mobility distribution, shows excellent agreement with the predictions of the LD-based model. This demonstrates the capability to accommodate arbitrary ion populations in any background gas, when such data is available. Wiedensohler’s regression, derived for bipolar charging in air using average ion mass-mobility, also describes the data reasonably well in the conditions examined. However, both models failed to capture the fraction of singly and doubly charged particles in carbon dioxide warranting further investigation. 
    more » « less
  2. Consider a system of homogeneous interacting diffusive particles labeled by the nodes of a unimodular Galton–Watson tree, where the state of each node evolves infinitesi- mally like a d-dimensional diffusion whose drift coefficient depends on (the histories of) its own state and the states of neighboring nodes, and whose diffusion coefficient depends only on (the history of) its own state. Under suitable regularity assumptions on the coefficients, an autonomous characterization is obtained for the marginal dis- tribution of the dynamics of the neighborhood of a typical node in terms of a certain local equation, which is a new kind of stochastic differential equation that is nonlinear in the sense of McKean. This equation describes a finite-dimensional non-Markovian stochastic process whose infinitesimal evolution at any time depends not only on the structure and current state of the neighborhood, but also on the conditional law of the current state given the past of the states of neighborhing nodes until that time. Such marginal distributions are of interest because they arise as weak limits of both marginal distributions and empirical measures of interacting diffusions on many sequences of sparse random graphs, including the configuration model and Erdös–Rényi graphs whose average degrees converge to a finite non-zero limit. The results obtained complement classical results in the mean-field regime, which characterize the limiting dynamics of homogeneous interacting diffusions on complete graphs, as the num- ber of nodes goes to infinity, in terms of a corresponding nonlinear Markov process. However, in the sparse graph setting, the topology of the graph strongly influences the dynamics, and the analysis requires a completely different approach. The proofs of existence and uniqueness of the local equation rely on delicate new conditional independence and symmetry properties of particle trajectories on unimodular Galton– Watson trees, as well as judicious use of changes of measure. 
    more » « less
  3. SUMMARY This paper revisits and extends the adjoint theory for glacial isostatic adjustment (GIA) of Crawford et al. (2018). Rotational feedbacks are now incorporated, and the application of the second-order adjoint method is described for the first time. The first-order adjoint method provides an efficient means for computing sensitivity kernels for a chosen objective functional, while the second-order adjoint method provides second-derivative information in the form of Hessian kernels. These latter kernels are required by efficient Newton-type optimization schemes and within methods for quantifying uncertainty for non-linear inverse problems. Most importantly, the entire theory has been reformulated so as to simplify its implementation by others within the GIA community. In particular, the rate-formulation for the GIA forward problem introduced by Crawford et al. (2018) has been replaced with the conventional equations for modelling GIA in laterally heterogeneous earth models. The implementation of the first- and second-order adjoint problems should be relatively easy within both existing and new GIA codes, with only the inclusions of more general force terms being required. 
    more » « less
  4. State-of-the-art hypergraph partitioners follow the multilevel paradigm that constructs multiple levels of progressively coarser hypergraphs that are used to drive cut refinements on each level of the hierarchy. Multilevel partitioners are subject to two limitations: (i) Hypergraph coarsening processes rely on local neighborhood structure without fully considering the global structure of the hypergraph. (ii) Refinement heuristics can stagnate on local minima. In this paper, we describe SpecPart, the first supervised spectral framework that directly tackles these two limitations. SpecPart solves a generalized eigenvalue problem that captures the balanced partitioning objective and global hypergraph structure in a low-dimensional vertex embedding while leveraging initial high-quality solutions from multilevel partitioners as hints. SpecPart further constructs a family of trees from the vertex embedding and partitions them with a tree-sweeping algorithm. Then, a novel overlay of multiple tree-based partitioning solutions, followed by lifting to a coarsened hypergraph, where an ILP partitioning instance is solved to alleviate local stagnation. We have validated SpecPart on multiple sets of benchmarks. Experimental results show that for some benchmarks, our SpecPart can substantially improve the cutsize by more than 50% with respect to the best published solutions obtained with leading partitioners hMETIS and KaHyPar. 
    more » « less
  5. We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize the communication cost. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients. We observe that there is a trade-off between the pairs among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero. 
    more » « less